I am using TensorFlow Datasets' tfds.load function to load my data:
    import tensorflow_datasets as tfds
    import tensorflow as tf

    (raw_train, raw_validation, raw_test), metadata = tfds.load(
        'cats_vs_dogs',
        split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
        with_info=True,
        as_supervised=True,
    )
Now, I have some additional images of cats and dogs on my local PC (for example Cat1.jpg). I would like to add them to this data. How can I do this?
Note that I have not just one file but many. Furthermore, this is just a binary classification example; the same question holds for multi-class classification, so it would be good to also have a solution for that.
Update: I tried different approaches, such as reading in the images with tf.keras.preprocessing.image_dataset_from_directory from tf-nightly. Unfortunately, it is not that easy. There are a lot of problems; for example, the resulting dataset has a different dtype and cannot be merged with the original one. I have no solution for this problem. I put a bounty on it because I really need detailed code and a working solution, not just general thoughts on how this could be achieved in theory. The solution does not have to use image_dataset_from_directory; any detailed code that works is fine with me.
I did not want to post any code, as I think there are better ways to solve this. However, please find the way I tried it here (in colab):
    !pip install tf-nightly
    #!pip uninstall tf-nightly
    import tensorflow as tf
    print(tf.__version__)

    train_ds = tf.keras.preprocessing.image_dataset_from_directory(
        '/tmp/Test/',
        image_size=(224, 224),
        batch_size=32,
        # label_mode='int'
    )
There is a Test folder in /tmp with two subfolders, cat and dog. Each contains some random pictures from an image search for cats and dogs.
The resulting train_ds is a
<BatchDataset shapes: ((None, 224, 224, 3), (None,)), types: (tf.float32, tf.int32)>
    import os
    import shutil

    os.listdir("/tmp/Test")
    # First find where the ".ipynb_checkpoints" is located.
    shutil.rmtree("/tmp/Test/.ipynb_checkpoints")

    import tensorflow_datasets as tfds

    (raw_train, raw_validation, raw_test), metadata = tfds.load(
        'cats_vs_dogs',
        split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
        with_info=True,
        as_supervised=True,
    )
raw_train for example is a
<DatasetV1Adapter shapes: ((None, None, 3), ()), types: (tf.uint8, tf.int64)>.
    def _normalize_img(img, label):
        img = tf.cast(img, tf.float32) / 255.
        img = tf.image.resize(img, (224, 224))
        label = tf.cast(label, tf.int64)
        # Note: values are in [0, 1] at this point, so casting back to uint8
        # truncates almost every pixel to 0.
        img = tf.cast(img, tf.uint8)
        return (img, label)

    # ds = tfds.load('mnist', split='train', as_supervised=True)
    ds = raw_train.map(_normalize_img)
ds is now a
<DatasetV1Adapter shapes: ((224, 224, 3), ()), types: (tf.uint8, tf.int64)>
This does not solve it, as the data is still not properly matched/concatenated. Furthermore, in the multi-class case I have no way to check that the labels match.
So I do not need general thoughts about how this could be achieved in theory. I need a detailed, working solution with code, and not just for the binary case in this example but also for multi-class problems, where I have the same issue. How do I match the labels read from disk with the labels from tfds.load in the multi-class case, so that there are no mismatches such as mixed-up classes, e.g. cats becoming horses (in the case of cats vs. dogs vs. horses)?
I also tried to point an ImageDataGenerator directly at the raw_train dataset. If that had worked, I could have proceeded with ImageDataGenerator in general, although I actually did not want this. I just want to add images to the raw_train dataset. I tried this:
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_image_generator = ImageDataGenerator(
        rescale=1./255,
    )

    train_datagen = train_image_generator.flow_from_directory(
        directory=raw_train,
        target_size=(224, 224),
        shuffle=True,
        batch_size=128,
        class_mode='binary'
    )
The idea was then to match/concatenate the results of these data generators. But it is not possible to just point this at raw_train; it raises an error.
The objects returned by tfds.load are instances of tf.data.Dataset. Therefore, you can build a new tf.data.Dataset instance from your local images and then use the concatenate method to join the two. Note that concatenate requires both datasets to yield elements with the same structure and dtypes, so bring both into a common format first. There are at least three ways to build such a dataset from images on disk:
You can use the newly added tf.keras.preprocessing.image_dataset_from_directory function. At the time of writing, it is only available in tf-nightly. You can find an example of working with this function here.
Alternatively, you can use the tf.data API, which gives you much more control over the loading process as well as over further transformations of the images and their labels. Here is an example of how to achieve this.
Or you can first load the images as a NumPy array using whatever library or method you prefer, and construct a second array with the corresponding labels. Then you can create a tf.data.Dataset instance using the from_tensor_slices method. You can find an example here. Note that this method is NOT recommended if you have lots of images: the NumPy array would become very large, which makes the data pipeline memory-wasteful or impossible to build.
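As a rough illustration of the second approach (the tf.data API), the sketch below lists local files per class folder, assigns each file the index that its folder name has in the tfds label list, and concatenates the result with the tfds training split. It is a minimal sketch under several assumptions, not a definitive implementation: the local images are JPEG files, the sub-folder names (for example cat and dog) match the class names reported by the dataset metadata, and at least one local image exists. The names list_labeled_files, build_merged_train_dataset, IMG_SIZE and JPEG_SUFFIXES are hypothetical helpers introduced only for this example.

```python
import pathlib

IMG_SIZE = (224, 224)
JPEG_SUFFIXES = {".jpg", ".jpeg"}  # assumption: local images are JPEG files


def list_labeled_files(root, class_names):
    """Return (paths, labels); each label is the index of the file's
    sub-folder name in class_names, so local labels follow the same
    integer encoding as the tfds dataset."""
    paths, labels = [], []
    for index, name in enumerate(class_names):
        class_dir = pathlib.Path(root) / name
        if not class_dir.is_dir():
            continue  # no local images for this class
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in JPEG_SUFFIXES:
                paths.append(str(path))
                labels.append(index)
    return paths, labels


def build_merged_train_dataset(local_root):
    # Imported here so list_labeled_files stays usable without TensorFlow.
    import tensorflow as tf
    import tensorflow_datasets as tfds

    raw_train, metadata = tfds.load(
        "cats_vs_dogs", split="train[:80%]", with_info=True, as_supervised=True
    )
    # Class names in the order of the integer labels used by tfds.
    class_names = metadata.features["label"].names

    def load_image(path, label):
        # Decode to uint8 with 3 channels and use int64 labels, like tfds.
        image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
        return image, tf.cast(label, tf.int64)

    def preprocess(image, label):
        image = tf.image.resize(image, IMG_SIZE)  # resize returns float32
        return image / 255.0, label

    paths, labels = list_labeled_files(local_root, class_names)
    local_train = tf.data.Dataset.from_tensor_slices((paths, labels)).map(load_image)

    # Applying the same preprocessing to both datasets gives them the same
    # shapes and dtypes before they are concatenated.
    return raw_train.map(preprocess).concatenate(local_train.map(preprocess))
```

Calling build_merged_train_dataset('/tmp/Test') should then give a single dataset of (float32 image, int64 label) pairs that can be shuffled and batched as usual. Because labels are derived from the metadata's class-name list rather than from folder order, the same idea carries over to multi-class datasets, and stray folders such as .ipynb_checkpoints are ignored since only folders named after a class are read.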
Answered by today on January 1, 2022