Preparing Multi-labelled Image dataset
In case of multiple output labels, caffe requires the input data to be formatted accordingly. A possible option is to create hdf5 dataset which contains the images and belonging labels.
After checking this matlab demo script, I have prepared a function which generates a hdf5 dataset from the images and labels stored in specified .txt file. For the following images with four output labels, the function generates a .h5 file which could be used for caffe training procedures:
./images/img1.jpg 1 1 -1 1 ./images/img2.jpg -1 1 1 -1
For the given example the function reads the chunks of images on the specified paths and label values and stores the data into hdf5 dataset:
%% Pattern for reading list of images from .txt file
pattern = '%s %d %d %d %d';
[names, l1, l2, l3, l4] = textread(images_labels, pattern);
labels = [l1, l2, l3, l4];
%%
created_flag=false;
totalct=0;
for batchno=1:num_total_samples/chunksz
fprintf('batch no. %d\n', batchno);
last_read=(batchno-1)*chunksz;
batchImages = [];
batchLabels = [];
for i = 1 : chunksz
imgPath = names(last_read+i);
img = imread(imgPath{1});
if size(img,1) ~= image_size(1) || size(img,2) ~= image_size(2)
img = imresize(img, image_size);
end
batchImages(:,:,:,i) = img;
batchLabels = [batchLabels; labels(last_read+i,:)];
end
% store to hdf5
startloc=struct('dat',[1,1,1,totalct+1], 'lab', [1,totalct+1]);
curr_dat_sz=store2hdf5(result_file, batchImages, batchLabels', ~created_flag, startloc, chunksz, Inf);
created_flag=true;% flag set so that file is created only once
totalct=curr_dat_sz(end);% updated dataset size (#samples)
end
The complete working example can be found here.