An Exploratory Study Into Capsule Networks And How To Make Them Deeper

In the last few years, deep learning models have achieved outstanding results in a wide variety of tasks such as object recognition and machine translation. By extracting abstract features from raw data and learning representations that are expressed in terms of simpler ones, deep neural networks outperform traditional computer vision and machine learning methods.

Despite their success, convolutional neural networks have some limitations. They fail to exploit the underlying linear manifold of images, and they are a poor fit for the psychology of shape perception. This is the reason why Hinton argues that the higher levels of a vision system should resemble the representations used in computer graphics. He introduced in [7] the idea that an individual neuron cannot represent the pose of an object. Instead, the pose of an entity should be spread over many numbers, a whole group of active neurons called a capsule. Furthermore, part-whole transformations play an essential role in hierarchical modeling in computer vision. Hinton stated in [15] that we need to route information in the image to the neurons that know how to deal with it. Convolutional networks perform only a primitive form of routing by means of pooling: the pooling unit selects the most active neuron solely on the basis of its magnitude. Routing can instead be seen as an iterative process: it is better to route the information to the single capsule that can make sense of it. Active capsules at one level make predictions, via the part-whole transformation matrices, for the pose of a capsule in the layer above. When multiple votes agree, the higher-level capsule becomes active. Hinton proposed two different routing algorithms: the first, in [15], is based on the scalar product; the second, in [8], is based on the expectation-maximization algorithm.
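The scalar-product variant of routing described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the implementation from [15] or from our framework: array shapes, the function names `squash` and `dynamic_routing`, and the choice of three iterations are assumptions made for the example.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linearity from [15]: short vectors shrink toward 0,
    # long vectors toward unit length, preserving orientation.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: (num_lower, num_upper, dim) -- vote u_hat[i, j] of lower
    # capsule i for the pose of upper capsule j, already multiplied
    # by the part-whole transformation matrix.
    num_lower, num_upper, dim = u_hat.shape
    b = np.zeros((num_lower, num_upper))  # routing logits, start uniform
    for _ in range(num_iters):
        # Coupling coefficients: each lower capsule distributes its
        # output over upper capsules via a softmax on its logits.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = np.einsum('ij,ijd->jd', c, u_hat)  # weighted sum of votes
        v = squash(s)                          # upper-capsule outputs
        # Agreement: votes aligned with v_j (large scalar product)
        # strengthen their route in the next iteration.
        b = b + np.einsum('ijd,jd->ij', u_hat, v)
    return v, c

rng = np.random.default_rng(0)
v, c = dynamic_routing(rng.normal(size=(8, 4, 6)))
```

After a few iterations, the coupling coefficients `c` concentrate on the upper capsules whose outputs agree with the incoming votes, which is exactly the "routing to the capsule that can make sense of it" described above.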

The aim of this thesis is to build a PyTorch framework to gain a deeper understanding of capsule network theory and its advantages over convolutional networks on image classification tasks.
The thesis is structured as follows. Chapter 2 and Chapter 3 cover the main ideas behind neural networks and deep learning methods, respectively. Chapter 4 discusses the drawbacks of convolutional networks in image and object recognition. Then, in Chapter 5, we introduce capsule networks, which attempt to solve the problems of these state-of-the-art methods. In Chapter 6, we present the datasets used to test our capsule network framework. Chapter 7 gives more details on the capsule network architectures used in our experiments, whose results are discussed in Chapter 8. In Chapter 9, we discuss some limitations of capsule networks and some future developments of our framework.
