This situation presents two challenges: computing an overall F1 score when you only have per-batch values, and doing so in a multilabel setting where micro, macro, and weighted F1 scores are all expected. What a journey!
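A minimal sketch of the idea, with made-up batch data: instead of averaging per-batch F1 scores (which is biased), accumulate true/false positive and false negative counts per label across batches, then derive the micro, macro, and weighted scores at the end.

```python
import numpy as np

# Hypothetical per-batch (predictions, labels) pairs for a 3-label problem.
batches = [
    (np.array([[1, 0, 1], [0, 1, 0]]), np.array([[1, 0, 0], [0, 1, 1]])),
    (np.array([[1, 1, 0], [0, 0, 1]]), np.array([[1, 1, 0], [1, 0, 1]])),
]

n_labels = 3
tp = np.zeros(n_labels)
fp = np.zeros(n_labels)
fn = np.zeros(n_labels)

# Accumulate raw counts batch by batch; F1 is computed once at the end.
for y_pred, y_true in batches:
    tp += ((y_pred == 1) & (y_true == 1)).sum(axis=0)
    fp += ((y_pred == 1) & (y_true == 0)).sum(axis=0)
    fn += ((y_pred == 0) & (y_true == 1)).sum(axis=0)

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return np.where(denom > 0, 2 * tp / denom, 0.0)

per_label = f1(tp, fp, fn)                 # one F1 per label
micro = f1(tp.sum(), fp.sum(), fn.sum())   # pool all counts, then F1
macro = per_label.mean()                   # unweighted mean over labels
support = tp + fn                          # true positives per label
weighted = (per_label * support / support.sum()).sum()
```

The three averages differ exactly as in scikit-learn's `f1_score`: micro pools counts over all labels, macro treats every label equally, and weighted weights each label's F1 by its support.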
Splitting a multilabel dataset into train and test sets is trickier than the single-label case: you can't simply split within each class, because each sample carries several labels at once. You have to be more clever and stratify - here's how.
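To make the idea concrete, here is a simplified greedy sketch of iterative stratification (the function name and toy data are made up): process labels from rarest to most common, and send each sample to whichever split still needs that label most.

```python
import numpy as np

def stratified_multilabel_split(Y, test_frac=0.2, seed=0):
    """Greedy multilabel stratification sketch.

    Y is a (n_samples, n_labels) binary indicator matrix. Returns
    (train_indices, test_indices) that roughly preserve each label's
    frequency in both splits.
    """
    rng = np.random.default_rng(seed)
    n, n_labels = Y.shape
    # Per-split quota of positives for each label: row 0 = train, row 1 = test.
    desired = np.stack([Y.sum(0) * (1 - test_frac), Y.sum(0) * test_frac])
    assigned = np.full(n, -1)
    remaining = set(range(n))
    while remaining:
        counts = Y[list(remaining)].sum(axis=0).astype(float)
        counts = np.where(counts == 0, np.inf, counts)  # ignore exhausted labels
        if np.isinf(counts).all():
            # Leftover samples have no positive labels: assign them randomly.
            for i in remaining:
                assigned[i] = rng.random() < test_frac
            break
        label = int(np.argmin(counts))  # rarest label still unassigned
        idx = [i for i in remaining if Y[i, label] == 1]
        rng.shuffle(idx)
        for i in idx:
            # Give the sample to the split whose quota for this label
            # is least filled, then shrink that split's quotas.
            split = int(np.argmax(desired[:, label]))
            assigned[i] = split
            desired[split] -= Y[i]
            remaining.discard(i)
    return np.where(assigned == 0)[0], np.where(assigned == 1)[0]
```

Handling rare labels first matters: they have the fewest samples, so they are the easiest to accidentally concentrate in one split if left for last.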
The tf.saved_model API makes saving easy. Restoring the model and running inference is a bit trickier when the input tensors come from a tf.data.Dataset. We'll see here how this works.
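A compact sketch of the round trip, using a toy `tf.Module` as a stand-in for a real trained model (the class, variable, and export path are all hypothetical): save with an explicit signature, reload without the original Python code, and feed dataset batches to the `serving_default` signature.

```python
import os
import tempfile
import tensorflow as tf

class Scaler(tf.Module):
    """Toy stand-in for a trained model (hypothetical, for the sketch)."""
    def __init__(self):
        super().__init__()
        self.w = tf.Variable(2.0)

    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
    def __call__(self, x):
        # Serving signatures must return a dict of named tensors.
        return {"scaled": x * self.w}

export_dir = os.path.join(tempfile.mkdtemp(), "toy_saved_model")
module = Scaler()
tf.saved_model.save(
    module, export_dir,
    signatures=module.__call__.get_concrete_function(),
)

# Restore without the original Python class, then look up the signature.
loaded = tf.saved_model.load(export_dir)
infer = loaded.signatures["serving_default"]

# Batches from a tf.data.Dataset can be fed straight to the signature,
# as long as their shape and dtype match the saved input_signature.
dataset = tf.data.Dataset.from_tensor_slices(tf.ones((8, 4))).batch(4)
parts = [infer(x=batch)["scaled"] for batch in dataset]
predictions = tf.concat(parts, axis=0)
```

The key detail is that signatures are called with keyword arguments and return dicts, so batching through the dataset just means collecting one output tensor per batch and concatenating at the end.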
Here's my personal setup: links, descriptions, and configuration, from ZSH to Pyenv by way of Spaceship and Tmux.
Here is my second AMI, upgraded from the previous one with a lot of what you need.