The convolutional neural networks


Computer Vision is an interdisciplinary field that emerged in the late 1960s. Its purpose is to reproduce the human visual system through methods for acquiring, processing, analyzing, and understanding digital images. The discipline develops artificial systems that extract information from input images and turn it into descriptions of the world. Today, one of the most important tasks in Computer Vision is image segmentation. This is the automatic assignment to each pixel of a label indicating its category (for example, “asphalt” or “background”). The task is generally solved using deep convolutional neural networks such as SegNet.
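Here is a minimal illustration of per-pixel labeling. The sketch assumes a network such as SegNet has already produced one score per class at every pixel. It then assigns each pixel the highest-scoring class, which is a common way to turn a segmentation network's output into a label map. The class list and the score values are made up for the example, and only NumPy is used.

```python
import numpy as np

# Hypothetical class list, following the examples in the text.
CLASSES = ["background", "asphalt"]

def scores_to_label_map(scores: np.ndarray) -> np.ndarray:
    """Turn per-pixel class scores of shape (C, H, W) into a label map (H, W).

    Each pixel receives the index of its highest-scoring class.
    """
    return np.argmax(scores, axis=0)

# Toy 2x3 "image" with made-up scores for 2 classes at each pixel.
scores = np.array([
    [[0.9, 0.2, 0.1],
     [0.8, 0.3, 0.4]],   # class 0: background
    [[0.1, 0.8, 0.9],
     [0.2, 0.7, 0.6]],   # class 1: asphalt
])
labels = scores_to_label_map(scores)
print(labels)
# [[0 1 1]
#  [0 1 1]]
print([[CLASSES[k] for k in row] for row in labels])
```

In a real pipeline the score tensor would come from the network's final layer, typically after a softmax. The argmax step is the same in both cases because softmax preserves the ordering of the scores.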

The main difficulty with this approach lies in training the network, because labeling real datasets by hand is very expensive. Training on synthetic datasets is a possible alternative. However, networks trained this way have been found to generalize worse to real images, regardless of how realistic the synthetic data are.
This thesis addresses how to choose the best synthetic dataset. In other words, it asks how to configure a synthetic dataset so that, from the network's point of view, it is “similar” to the corresponding real one, in order to reduce the cost of the training process.

We studied a statistical test based on the Maximum Mean Discrepancy (MMD), which addresses whether two datasets are drawn from the same probability distribution. Under certain conditions on the kernel, the MMD is a metric between probability distributions. It can therefore be used to measure “distances” between datasets. The associated witness function can then help locate where these differences lie, by pointing to the images that are most and least distant according to this metric.
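The following NumPy sketch shows one common way to carry out such a test. It assumes a Gaussian kernel k with a hand-picked bandwidth `sigma`; the median pairwise distance is a frequent alternative heuristic. The p-value comes from a permutation test. The sketch uses the unbiased estimator MMD²_u = 1/(m(m−1)) Σ_{i≠j} k(x_i, x_j) + 1/(n(n−1)) Σ_{i≠j} k(y_i, y_j) − 2/(mn) Σ_{i,j} k(x_i, y_j). It also uses the empirical witness function f(t) = (1/m) Σ_i k(x_i, t) − (1/n) Σ_j k(y_j, t). The most negative values of f point to the samples of the second dataset that differ most from the first. The function names and the toy Gaussian data are illustrative only, and this is not the thesis's own implementation.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for every row pair of a and b."""
    sq = (np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2_unbiased(x, y, sigma):
    """Unbiased estimate of MMD^2 between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    # Drop the diagonal (i == j) terms from the within-sample sums.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * kxy.mean()

def permutation_test(x, y, sigma, n_perm=200, seed=0):
    """Return (MMD^2, p-value) for H0: x and y share the same distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(x, y, sigma)
    pooled = np.vstack([x, y])
    m = len(x)
    count = 0
    for _ in range(n_perm):
        # Reassign samples to the two groups at random and recompute the statistic.
        idx = rng.permutation(len(pooled))
        stat = mmd2_unbiased(pooled[idx[:m]], pooled[idx[m:]], sigma)
        count += int(stat >= observed)
    return observed, (count + 1) / (n_perm + 1)

def witness(t, x, y, sigma):
    """Unnormalized empirical witness function evaluated at points t (k, d).

    Positive values: t resembles x more; negative values: t resembles y more.
    """
    return (gaussian_kernel(t, x, sigma).mean(axis=1)
            - gaussian_kernel(t, y, sigma).mean(axis=1))

# Usage with toy data standing in for flattened images or feature vectors.
rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, size=(100, 5))
y = rng.normal(0.5, 1.0, size=(100, 5))
sigma = np.sqrt(x.shape[1])  # hypothetical bandwidth choice
stat, p = permutation_test(x, y, sigma)
print(f"MMD^2 = {stat:.4f}, p = {p:.3f}")
# Rank the samples of y by witness value: most negative = least like x.
order = np.argsort(witness(y, x, y, sigma))
print("y samples most distant from x:", order[:5])
```

With images, `x` and `y` would be the flattened pixel arrays (or network features) of the two datasets. Sorting by the witness value gives the "most and least distant" images described above.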

We first evaluated these tools in a controlled setting, comparing datasets extracted from the MNIST handwritten-digit dataset. The comparison used both the images themselves and the features extracted by the first convolutional layer of LeNet, a simple CNN for classification. The aim was to interpret, from the network's point of view, the differences between the compared datasets.
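The sketch below makes the “network point of view” concrete. It computes LeNet-style first-layer feature maps in plain NumPy, using 6 filters of size 5×5 on 32×32 inputs as in LeNet-5. It then flattens the maps into vectors that an MMD test can compare. The filters are random placeholders for trained weights, the tanh activation is an assumption, and random arrays stand in for MNIST digits.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def first_conv_layer(images, filters, biases):
    """Valid 2-D convolution (cross-correlation, as in CNN libraries) + tanh.

    images:  (N, H, W) grayscale images
    filters: (F, k, k) kernels; biases: (F,)
    returns: (N, F, H-k+1, W-k+1) feature maps
    """
    k = filters.shape[-1]
    # (N, H-k+1, W-k+1, k, k): every k x k patch of every image.
    patches = sliding_window_view(images, (k, k), axis=(1, 2))
    maps = np.einsum("nijab,fab->nfij", patches, filters)
    return np.tanh(maps + biases[None, :, None, None])

rng = np.random.default_rng(0)
digits = rng.random((10, 28, 28))                   # stand-in for 10 MNIST images
padded = np.pad(digits, ((0, 0), (2, 2), (2, 2)))   # 28x28 -> 32x32, as LeNet-5 expects
filters = rng.normal(0.0, 0.1, size=(6, 5, 5))      # hypothetical: use trained weights instead
biases = np.zeros(6)
maps = first_conv_layer(padded, filters, biases)    # (10, 6, 28, 28)
features = maps.reshape(len(maps), -1)              # (10, 4704) vectors for the MMD test
print(maps.shape, features.shape)
# (10, 6, 28, 28) (10, 4704)
```

Feeding `features` from two datasets into an MMD test compares them as the first layer "sees" them, rather than pixel by pixel. Each feature map can also be inspected separately to see which filters respond differently.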

We then applied these methods to a pair of synthetic datasets, as seen through SegNet, a complex segmentation network. The goal was to study its sensitivity to variations in the environment.