Preparing a Multi-Labelled Image Dataset

When each image has multiple output labels, Caffe requires the input data to be formatted accordingly. One option is to create an HDF5 dataset that contains the images together with their labels.

After checking this MATLAB demo script, I prepared a function that generates an HDF5 dataset from the images and labels listed in a specified .txt file. For example, given the following list of images with four output labels each, the function generates a .h5 file that can be used for training with Caffe:

./images/img1.jpg 1 1 -1 1
./images/img2.jpg -1 1 1 -1
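
As a minimal sketch, assuming the list format shown above (a path followed by whitespace-separated numeric labels), the list can also be read without hard-coding the number of labels by building the format string for textscan. The temporary file and the variable names list_file and num_labels are hypothetical and only serve the illustration:

```matlab
% Write the two example lines to a temporary list file (illustration only)
list_file = [tempname '.txt'];
fid = fopen(list_file, 'w');
fprintf(fid, './images/img1.jpg 1 1 -1 1\n');
fprintf(fid, './images/img2.jpg -1 1 1 -1\n');
fclose(fid);

num_labels = 4;                              % labels per image (assumed known)
fmt = ['%s' repmat(' %f', 1, num_labels)];   % expands to '%s %f %f %f %f'

fid = fopen(list_file, 'r');
C = textscan(fid, fmt);                      % one cell per column of the list
fclose(fid);
delete(list_file);

names  = C{1};                               % N x 1 cell array of image paths
labels = [C{2:end}];                         % N x num_labels numeric matrix
```

Here names holds one path per row and labels holds the matching label vector in the same row, which is the shape the loop in this post indexes with labels(last_read+i,:).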

For this example, the function reads the images from the listed paths in chunks, together with their label values, and stores the data in the HDF5 dataset:

%% Pattern for reading the list of images from the .txt file
pattern = '%s %d %d %d %d';
[names, l1, l2, l3, l4] = textread(images_labels, pattern);
labels = [l1, l2, l3, l4];
%%

created_flag = false;
totalct = 0;
% Only full chunks are written; leftover samples (fewer than chunksz) are skipped
for batchno = 1:floor(num_total_samples/chunksz)
  fprintf('batch no. %d\n', batchno);
  last_read = (batchno-1)*chunksz;
  batchImages = [];
  batchLabels = zeros(chunksz, size(labels, 2));
  for i = 1:chunksz
    imgPath = names{last_read+i};
    img = imread(imgPath);
    % resize to the expected [rows, cols] if the image size differs
    if size(img,1) ~= image_size(1) || size(img,2) ~= image_size(2)
      img = imresize(img, image_size);
    end
    batchImages(:,:,:,i) = img; % all images must have the same number of channels
    batchLabels(i,:) = labels(last_read+i,:);
  end

  % store to hdf5 (labels are transposed so that each column is one sample)
  startloc = struct('dat', [1,1,1,totalct+1], 'lab', [1,totalct+1]);
  curr_dat_sz = store2hdf5(result_file, batchImages, batchLabels', ~created_flag, startloc, chunksz, Inf);
  created_flag = true; % flag set so that the file is created only once
  totalct = curr_dat_sz(end); % updated dataset size (#samples)
end
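
To check what ended up in the file, the datasets can be read back with MATLAB's built-in HDF5 functions. The sketch below is self-contained: instead of calling store2hdf5, it writes a tiny stand-in file with h5create and h5write, assuming the dataset names /data and /label and the layout used in the loop above (samples along the fourth dimension of the image array, one label column per sample). The file name, sizes, and random pixel values are made up for illustration; if your copy of store2hdf5 uses different dataset names, adjust them accordingly.

```matlab
% Stand-in for the generated file: 2 samples, 4x4 pixels, 3 channels, 4 labels
h5_file = [tempname '.h5'];
data   = single(rand(4, 4, 3, 2));          % dummy image data
labels = single([1 1 -1 1; -1 1 1 -1]');    % 4 x 2: one column per sample

h5create(h5_file, '/data',  size(data),   'Datatype', 'single');
h5create(h5_file, '/label', size(labels), 'Datatype', 'single');
h5write(h5_file, '/data',  data);
h5write(h5_file, '/label', labels);

% Inspect the file and read both datasets back
h5disp(h5_file);                            % prints dataset names, sizes and types
read_data   = h5read(h5_file, '/data');
read_labels = h5read(h5_file, '/label');

fprintf('samples: %d, labels per sample: %d\n', ...
        size(read_data, 4), size(read_labels, 1));
delete(h5_file);
```

For training, Caffe's HDF5Data layer takes as its source a text file that lists the paths of the .h5 files, so the path of the generated file has to be written to such a list.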

The complete working example can be found here.
