Deep learning is part of a broader family of machine learning methods based on artificial neural networks, which learn by example. Learning can be supervised, semi-supervised, or unsupervised. Artificial neural networks were inspired by information processing and distributed communication nodes in biological systems. Deep learning architectures (e.g. convolutional neural networks) are applied to a range of highly complex problems across industries. Overall, these architectures automatically produce results that are comparable to, and in some cases surpass, human expert performance.
Computer vision applications that involve highly complex scenes typically benefit greatly from deep learning. The following example in the autonomous vehicle sector illustrates this.
This is why deep learning has found its way into many industries. In evaluation processes based on medical imaging, it is used to achieve a higher degree of surgical expertise. In the food industry, deep learning allows for faster and higher-quality item segmentation and category classification. Other sectors that use deep learning include security and logistics.
What is deep learning?
Challenges in computer vision often focus on detecting various items in still or moving images. Convolutional neural networks, a specific category of deep learning models, lend themselves well to explaining the concept of deep learning, because they are built from 2D filtering operations.
In the example below, a specific 2D filtering operation, known as a convolution, is applied to an image to calculate the intensity of the horizontal edges present at each location in the image.
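The filtering operation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the exact filter used in the example image: the helper name `filter2d`, the 3x3 kernel values, and the tiny test image are all assumptions chosen for readability. As in most deep learning libraries, the kernel is applied without flipping it.

```python
def filter2d(image, kernel):
    """Slide a small kernel over a grayscale image (a list of rows).

    Hypothetical helper for illustration. Each output value is the sum of
    the pixels under the kernel, weighted by the kernel entries.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Assumed kernel: responds where pixels below are brighter than pixels above.
horizontal_edge_kernel = [
    [-1, -1, -1],
    [0, 0, 0],
    [1, 1, 1],
]

# Made-up 5x5 image: dark top (0), bright bottom (1), so one horizontal edge.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

edges = filter2d(image, horizontal_edge_kernel)
for row in edges:
    print(row)
```

The output is large where the kernel window straddles the dark-to-bright transition and zero where the window covers a uniform region, which is the "intensity of the horizontal edges" at each location.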
More complex features in images can be interpreted by applying many straightforward filters. The schematic below illustrates the deep learning filtering sequence that recognizes the character in an image (in this case the character ‘A’). Each column represents a layer, and each blue circle represents a filtering operation applied to the previous layer. This filtering operation measures the resemblance to a particular shape by combining the results of the previous layer. The grey value within the circles of the filters indicates the level of resemblance to that shape, ranging from black (0% resemblance) to white (100% resemblance).
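To make the idea of one layer combining the results of the previous layer concrete, here is a small sketch. Everything in it is an assumption made for illustration: the stroke names, the hand-picked weights, and the use of a sigmoid function to squash each result into the 0 to 1 resemblance range. In a real network these values are learned rather than written by hand.

```python
import math


def unit(inputs, weights, bias):
    """One 'circle': combine the previous layer's outputs into a 0..1 score."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashing (an assumption)


def layer(inputs, units):
    """Apply every unit of a layer to the same previous-layer outputs."""
    return [unit(inputs, weights, bias) for weights, bias in units]


# Made-up outputs of an earlier layer: resemblance to a left diagonal
# stroke, a right diagonal stroke, and a horizontal bar.
stroke_scores = [0.9, 0.8, 0.7]

# Hand-picked (weights, bias) pairs for the next layer.
next_layer = [
    ([4.0, 4.0, 4.0], -8.0),    # "A-like" shape: needs all three strokes
    ([4.0, -4.0, -4.0], -2.0),  # other shape: left stroke only
]

shape_scores = layer(stroke_scores, next_layer)
print(shape_scores)
```

With these made-up numbers the first unit reports a high resemblance and the second a low one, mirroring the light and dark circles in the schematic.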
The end result of this recognition process is a score for each possible class, which is used for classification of the image. The many layers of filters in this approach illustrate the 'deep' in deep learning.
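A short sketch of how per-class scores can be turned into a classification is given here. The text only states that a score per class is produced; normalizing the scores with a softmax function and picking the highest one is a common convention that is assumed here, and the labels and score values are invented for the example.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the label with the highest probability and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]


labels = ["A", "B", "C"]       # hypothetical classes
raw_scores = [2.0, 0.5, -1.0]  # made-up output of the last layer

predicted, confidence = classify(raw_scores, labels)
print(predicted, round(confidence, 3))
```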
Teaching improves predictive quality
Applying consecutive filters can clearly detect highly complex patterns. However, it is infeasible to select each of the filters manually in order to achieve the best results. The solution is to teach the model by feeding it examples with known correct results, so that the appropriate filters and filtering sequence are determined automatically.
As a first step, an architecture is defined with random filters. We then provide the model with an image from our training set for which the correct result is known. In this case, we show the model an image of the character 'A' and correct the model outcome.
The initial model will make a prediction. As the model has never seen any image data before, this prediction is very likely to be incorrect. The model is then adjusted very slightly so that it gives a better prediction for this example.
This way, through consecutive teaching with many examples of each possible outcome, the model can attain very reliable results.
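The teaching procedure described above (start from random filters, predict, adjust slightly, repeat) can be sketched with a deliberately tiny model. This is an illustration under stated assumptions rather than a real convolutional network: the model has a single filter over a flattened 3x3 image, the two training images are made up, and the adjustment rule is plain gradient descent on a logistic output with an assumed learning rate of 0.1.

```python
import math
import random


def predict(weights, bias, pixels):
    """Score between 0 and 1: how strongly the model believes 'class 1'."""
    z = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights, bias, pixels, target, learning_rate=0.1):
    """Nudge the filter slightly so this example is predicted a bit better."""
    error = predict(weights, bias, pixels) - target
    new_weights = [w - learning_rate * error * p
                   for w, p in zip(weights, pixels)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias


# Made-up 3x3 training images, flattened row by row, with known answers.
horizontal_bar = [0, 0, 0, 1, 1, 1, 0, 0, 0]  # correct result: 1
vertical_bar = [0, 1, 0, 0, 1, 0, 0, 1, 0]    # correct result: 0
training_set = [(horizontal_bar, 1.0), (vertical_bar, 0.0)]

# Step 1: start from a random filter.
random.seed(0)
weights = [random.uniform(-0.5, 0.5) for _ in range(9)]
bias = 0.0

# Steps 2 and 3, repeated: predict, compare with the known result, adjust.
for _ in range(500):
    for pixels, target in training_set:
        weights, bias = train_step(weights, bias, pixels, target)

print(predict(weights, bias, horizontal_bar))
print(predict(weights, bias, vertical_bar))
```

Each individual adjustment is small, but after many passes over the examples the predictions move close to the known correct results. Real networks apply the same idea to many filters across many layers at once.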
This process is similar to how humans learn: children learn through examples and repetition, and gradually come to distinguish the shapes of characters over time.