Add images from disk to a Tensorflow dataset

Stack Overflow Asked by Stat Tistician on January 1, 2022

I am using Tensorflow Datasets’ tfds.load function to load my data:

import tensorflow_datasets as tfds
import tensorflow as tf

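(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'cats_vs_dogs',  # assumed dataset name (cats vs. dogs)
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,      # required to get the metadata object
    as_supervised=True,  # yields (image, label) pairs
)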

Now I have some additional images of cats and dogs on my local PC (for example Cat1.jpg). I would like to add them to this data. How can I do this?

Note that I have not just one file but many. Furthermore, this is just a binary classification example; the same question holds for multi-class classification, so it would be good to have a solution for that as well.

Update: I tried different ways, such as reading in the images with tf.keras.preprocessing.image_dataset_from_directory from tf-nightly, but unfortunately it is not that easy. There are a lot of problems; for example, the resulting dataset has different dtypes and cannot be merged with the original one. I have no solution for this problem. I put a bounty on it because I really need detailed code, a working solution, and not just general thoughts about how this could be achieved in theory. I don't need a solution with image_dataset_from_directory; if anyone has any solution with detailed code that works, I am fine with that.

I did not want to post any code, as I think there are better ways to solve this. However, here is the way I tried it (in Colab):

!pip install tf-nightly
#!pip uninstall tf-nightly

import tensorflow as tf

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    '/tmp/Test',  # folder with one subfolder per class
    image_size = (224,224),
    batch_size = 32,
    # label_mode = 'int'
)

There is a Test folder in /tmp with two subfolders, cat and dog, which contain some random pictures from a search for cats and dogs.

The resulting train_ds is a <BatchDataset shapes: ((None, 224, 224, 3), (None,)), types: (tf.float32, tf.int32)>

import os
import shutil

os.listdir("/tmp/Test")  # First, find where the ".ipynb_checkpoints" folder is located.
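If the listing shows a `.ipynb_checkpoints` entry (Jupyter/Colab can create one), it can be deleted with `shutil.rmtree`, assuming nothing in it is needed. A minimal sketch with a hypothetical helper, `remove_checkpoint_dirs`, that also checks the class subfolders:

```python
import os
import shutil


def remove_checkpoint_dirs(root):
    """Hypothetical helper: delete every '.ipynb_checkpoints' folder under root.

    Returns the list of removed paths.
    """
    removed = []
    # os.walk is top-down by default, so removing the name from dirnames
    # stops it from descending into the folder we just deleted.
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".ipynb_checkpoints" in dirnames:
            path = os.path.join(dirpath, ".ipynb_checkpoints")
            shutil.rmtree(path)
            dirnames.remove(".ipynb_checkpoints")
            removed.append(path)
    return removed
```

With the folder from the question this would be called as `remove_checkpoint_dirs("/tmp/Test")`.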


import tensorflow_datasets as tfds
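(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'cats_vs_dogs',  # assumed dataset name (cats vs. dogs)
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,      # required to get the metadata object
    as_supervised=True,  # yields (image, label) pairs
)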

raw_train, for example, is a <DatasetV1Adapter shapes: ((None, None, 3), ()), types: (tf.uint8, tf.int64)>.
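The two printed specs show why the datasets do not concatenate: train_ds yields batches of (float32, int32), while raw_train yields single (uint8, int64) examples, and `Dataset.concatenate` requires matching dtypes. A minimal sketch of bringing a dataset shaped like train_ds into raw_train's layout, assuming its float32 pixels are still in the 0-255 range (image_dataset_from_directory does not rescale them). `to_raw_layout` is a hypothetical helper and the data is made up:

```python
import tensorflow as tf


def to_raw_layout(batched_ds):
    """Hypothetical helper: turn batches of (float32 images, int32 labels)
    into single (uint8 image, int64 label) examples like raw_train."""

    def cast_pair(image, label):
        # Round and clip before casting so 254.9 becomes 255 and nothing wraps.
        image = tf.cast(tf.clip_by_value(tf.round(image), 0.0, 255.0), tf.uint8)
        return image, tf.cast(label, tf.int64)

    return batched_ds.unbatch().map(cast_pair)


# Made-up stand-in for train_ds: 5 random images in batches of 2.
images = tf.random.uniform((5, 224, 224, 3), maxval=255.0)
labels = tf.constant([0, 1, 0, 1, 1], dtype=tf.int32)
fake_train_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(2)

local_ds = to_raw_layout(fake_train_ds)
print(local_ds.element_spec)
```

Only the dtypes have to agree exactly; the image shapes (None, None, 3) and (224, 224, 3) are compatible, so a result like this can be passed to `raw_train.concatenate`. The labels, however, are still indices in folder order, which matters in the multi-class case.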

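def _normalize_img(img, label):
  # tf.image.resize returns float32 values in the original 0-255 range.
  img = tf.image.resize(img, (224,224))
  # Cast back to uint8 WITHOUT dividing by 255 first: dividing and then
  # casting to uint8 would truncate almost every pixel to 0.
  img = tf.cast(img, tf.uint8)
  label = tf.cast(label, tf.int64)
  return (img, label)
# ds = tfds.load('mnist', split='train', as_supervised=True)
ds = raw_train.map(_normalize_img)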

ds is now a <DatasetV1Adapter shapes: ((224, 224, 3), ()), types: (tf.uint8, tf.int64)>


This does not solve the problem, because the data is not properly matched/concatenated. Furthermore, in the multi-class case I have no way to check that the labels match.

So I do not need general thoughts about how this could be achieved in theory. I need a detailed, working solution with detailed code, and not just for the binary case in this example: I also need it for multi-class problems, where I have the same issue. So how do I match the "read-in labels" with the labels resulting from tfds.load in the multi-class case, so that there are no mismatches such as mixed-up classes? For example, cats must not become horses (in the case of cats vs. dogs vs. horses).
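One way to avoid label mix-ups is to translate labels by class *name* instead of trusting that two integer encodings agree. image_dataset_from_directory numbers classes by the alphanumerically sorted subfolder names (exposed as `train_ds.class_names`), while a TFDS dataset with a `ClassLabel` feature exposes its names via `metadata.features['label'].names`. A minimal sketch, assuming the local folder names are spelled exactly like the TFDS label names; `make_label_remapper` is a hypothetical helper and the class lists are made up:

```python
import tensorflow as tf


def make_label_remapper(local_class_names, tfds_label_names):
    """Hypothetical helper: return a map() function that converts labels
    indexed by local_class_names into labels indexed by tfds_label_names."""
    missing = [n for n in local_class_names if n not in tfds_label_names]
    if missing:
        # Fail loudly instead of silently assigning a wrong class.
        raise ValueError(f"Local classes not found in TFDS labels: {missing}")

    # lookup[i] = TFDS index of the i-th local class name.
    lookup = tf.constant(
        [tfds_label_names.index(n) for n in local_class_names], dtype=tf.int64
    )

    def remap(image, label):
        return image, tf.gather(lookup, tf.cast(label, tf.int32))

    return remap


# Made-up example: local folders sort as cat/dog/horse, TFDS order is dog/horse/cat.
tfds_names = ["dog", "horse", "cat"]
remap = make_label_remapper(["cat", "dog", "horse"], tfds_names)

demo = tf.data.Dataset.from_tensor_slices(
    (tf.zeros((3, 4, 4, 3)), tf.constant([0, 1, 2], dtype=tf.int32))
)
print([int(label) for _, label in demo.map(remap)])  # [2, 0, 1]
```

In the question's setting this would be applied as `train_ds.unbatch().map(remap)` with the remapper built from `train_ds.class_names` and `metadata.features['label'].names`. Alternatively, image_dataset_from_directory accepts a `class_names` argument that fixes the class order explicitly.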

Second way:
I also tried to point an ImageDataGenerator directly at the raw_train dataset. If that had worked, I could have used ImageDataGenerator in general, although I did not actually want to. I just want to add images to the raw_train dataset. I tried this:

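from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_image_generator = ImageDataGenerator(
  # augmentation/rescaling arguments omitted
)

train_datagen = train_image_generator.flow_from_directory(
  '/tmp/Test',  # assumed: the same local folder as above
  target_size=(224, 224),
)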

And then match/concatenate the results of these data generators. But it is not possible to simply point this at raw_train; it gives an error.
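ImageDataGenerator's `flow_from_directory` reads image files from a folder and `flow` expects in-memory arrays, so neither accepts a `tf.data.Dataset` such as raw_train. If augmentation was the goal, a tf.data-native alternative is to map random `tf.image` operations over the dataset. A minimal sketch; the transformations chosen (horizontal flip, brightness jitter with `max_delta=0.1`) are illustrative, not taken from the question, and the data is made up:

```python
import tensorflow as tf


def augment(image, label):
    """Random per-element augmentations; dtype and shape are preserved."""
    image = tf.image.random_flip_left_right(image)
    # For uint8 input, the brightness delta is applied in [0, 1] float space
    # and the result is converted back to uint8 with saturation.
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, label


# Made-up stand-in for raw_train after resizing to 224x224.
images = tf.zeros((4, 224, 224, 3), dtype=tf.uint8)
labels = tf.constant([0, 1, 1, 0], dtype=tf.int64)
fake_raw_train = tf.data.Dataset.from_tensor_slices((images, labels))

augmented = fake_raw_train.map(augment)
print(augmented.element_spec)
```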

One Answer

The objects returned by tfds.load are instances of tf.data.Dataset. Therefore, you can build a new tf.data.Dataset instance from your local images and then use the concatenate method to join the two together. To build such a dataset from the images on disk, there are at least three different ways:

  • You can use the newly added tf.keras.preprocessing.image_dataset_from_directory function. At the moment, this is only available in tf-nightly. You can find an example of working with this function here.

  • Alternatively, you can use the tf.data API for much more control over the loading process, as well as further transformations on images and their labels. Here is an example of how to achieve this.

  • Or you can first load the images as a NumPy array using whatever library or method you like, and construct another array with the corresponding labels. Then you can create a tf.data.Dataset instance using the from_tensor_slices method. You can find an example here. Note that this method is NOT recommended if you have lots of images, because the resulting NumPy array would be very large, which makes the data pipeline memory-wasteful or impossible to build.
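A minimal sketch of the tf.data route, assuming JPEG files in one subfolder per class. Label indices come from a class list passed in (for example `metadata.features['label'].names`) rather than from folder order, which is what keeps classes from being mixed up in the multi-class case, provided the folders are named exactly like the TFDS labels. `load_local_images`, `resize_to_uint8` and `IMG_SIZE` are hypothetical names:

```python
import os
import tensorflow as tf

IMG_SIZE = (224, 224)  # assumed target size, as in the question


def resize_to_uint8(image, label):
    """Resize to IMG_SIZE and return (uint8 image, int64 label) like raw_train."""
    image = tf.image.resize(image, IMG_SIZE)  # float32, same value range as input
    image = tf.cast(tf.clip_by_value(tf.round(image), 0.0, 255.0), tf.uint8)
    return image, tf.cast(label, tf.int64)


def load_local_images(root_dir, class_names):
    """Hypothetical helper: (uint8 image, int64 label) dataset built from
    root_dir/<class_name>/*.jpg; the label is the index of the folder name
    in class_names. Subfolders not listed in class_names are ignored."""
    paths, labels = [], []
    for index, name in enumerate(class_names):
        class_dir = os.path.join(root_dir, name)
        if not os.path.isdir(class_dir):
            continue  # no local images for this class
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith((".jpg", ".jpeg")):
                paths.append(os.path.join(class_dir, fname))
                labels.append(index)

    def load(path, label):
        image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
        return resize_to_uint8(image, label)

    ds = tf.data.Dataset.from_tensor_slices(
        (tf.constant(paths, dtype=tf.string), tf.constant(labels, dtype=tf.int64))
    )
    return ds.map(load)
```

With the objects from the question, the pieces would fit together roughly as `local_ds = load_local_images("/tmp/Test", metadata.features["label"].names)` and `combined = raw_train.map(resize_to_uint8).concatenate(local_ds).shuffle(1000)`; shuffling keeps the local images from all landing at the end.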

Answered by today on January 1, 2022
