Teaching computers how to see like humans with Convolutional Neural Networks


Digital Image Processing:

Some years ago, Google’s index already held more than a trillion images. That number is unlikely to have gone down, given the ever-increasing use of the internet and of pictures to convey messages. And that counts only the images Google has indexed; a significant share of the images on the internet is not indexed at all. The same is true of video: with over 1 billion videos on YouTube alone, video makes up a massive amount of data.

Videos and images are not found only on the internet. Security cameras record the areas they cover around the clock, and pictures and videos are shot every day for personal and business use. These, too, add up to a hefty amount of data.

This volume of data is far beyond what people can review by hand, so it has to be processed automatically. Machine learning and computer vision are among the techniques that give us a real chance of deciphering this data and making sense of it. Digital image processing, the use of computational methods to analyze and distinguish between images and videos, is the starting point.

Digital image processing makes image and video data far more usable for analysis. Law enforcement agencies use it widely: they compare a suspect's facial features against their own databases and against video feeds coming in from around the country. The fingerprint-matching software used by the police and other agencies also relies on image processing.

Neural Networks:

Computers do not perceive and interpret images and videos the way humans and animals do. Our minds learn by insight, which lets us make sense of visual input much faster and more accurately than a computer. We can tell whether an object is a cat, a dog, or a chair regardless of its color or type. A machine takes longer to reach the same result, because it first has to build a digital representation of the image and then compare it with the images and objects stored in its database.

Neural networks are a machine learning method that can be used to process images. They give the machine real learning capabilities and bring it closer to the way the human brain works. The human brain receives information from the sensory organs, analyzes it, and generates a response. Similarly, a neural network has an input layer, one or more hidden layers, and an output layer, each made up of nodes. Information arrives at the input nodes, the hidden nodes analyze and transform it, and the output nodes produce the suggested result.
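To make the input, hidden, and output idea concrete, here is a minimal sketch of information flowing through a tiny network. The weights, biases, and the names `layer` and `forward` are made up for illustration; a real network learns its weights from data and would normally be built with a numerical library rather than plain Python lists.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(values, weights, biases):
    """One layer: each node takes a weighted sum of its inputs, adds a bias, applies sigmoid."""
    return [
        sigmoid(sum(v * w for v, w in zip(values, node_weights)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

# Hypothetical, hand-picked weights; a trained network would learn these.
HIDDEN_WEIGHTS = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # 2 hidden nodes, 3 inputs each
HIDDEN_BIASES = [0.0, 0.1]
OUTPUT_WEIGHTS = [[1.0, -1.0]]                          # 1 output node, 2 hidden inputs
OUTPUT_BIASES = [0.0]

def forward(inputs):
    """Input nodes -> hidden nodes -> output node."""
    hidden = layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)

prediction = forward([0.2, 0.7, 0.1])
print(prediction)  # a single value between 0 and 1
```

The single output value could be read as a score for one label, for example how strongly the network believes the input is a cat.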


Convolutional Neural Networks:

A convolutional neural network is a modified version of the neural network. Ideally, the network would take in as many inputs as possible, which means capturing most, if not all, of the pixels in an image and processing them for further analysis. Makes sense, right?

Well, not really. Believe it or not, even with the large computers, multi-core CPUs, and GPUs we have at our disposal these days, this is not practical because of constraints on time and processing power.

A convolutional neural network therefore reduces the amount of data used in the analysis. An image usually contains hundreds of thousands or even millions of pixels, and with current computing capacity it is not viable to analyze each one individually. Pixels that are close together also tend to be very similar, while pixels farther apart differ more. So the data is reduced by grouping neighboring pixels: one representative pixel is selected from each group and placed alongside the representatives of the other groups. Because these representatives are far fewer in number, they can be analyzed efficiently.
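One common way to pick a representative from each group is max pooling, which keeps the brightest pixel in every small block. The sketch below assumes a greyscale image stored as a list of rows and a hypothetical helper named `max_pool`; other choices, such as taking the average of each group, are also possible.

```python
def max_pool(image, size=2):
    """Downsample a 2D grid by keeping the largest value in each size x size group."""
    rows = len(image) // size
    cols = len(image[0]) // size
    return [
        [
            max(
                image[r * size + dr][c * size + dc]
                for dr in range(size)
                for dc in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# A made-up 4 x 4 greyscale image.
image = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool(image)
print(pooled)  # [[4, 2], [2, 8]]
```

Sixteen pixels shrink to four, and each surviving value still says something about the region it came from.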


Source: kdnuggets.com


  1. The image is divided into smaller parts in the form of tiles.
  2. The tiles are sent to a neural network, which converts them from a grid of tiles into arrays.
  3. The arrays represent each area of the picture numerically. They have three dimensions: height, width, and color channel. A fourth dimension, time, is added if the input is a video.
  4. The multidimensional arrays are then passed through a downsampling function, which removes unnecessary and redundant information so that only the required amount is analyzed.
  5. This data is then forwarded to a conventional neural network.
  6. The computations are done quickly, and the desired output is generated in the form of labels.
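The tiling and array steps can be sketched as follows. This is only an illustration under simplifying assumptions: the image is a nested list of `[red, green, blue]` pixels, the tiles do not overlap, and `split_into_tiles` and `shape` are hypothetical helper names.

```python
def split_into_tiles(image, tile_size):
    """Cut a height x width x channels image into non-overlapping square tiles."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            tile = [row[left:left + tile_size] for row in image[top:top + tile_size]]
            tiles.append(tile)
    return tiles

def shape(array):
    """Return the dimensions of a nested list, e.g. (height, width, channels)."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)

# A made-up 4 x 6 RGB image: each pixel is [red, green, blue].
image = [[[r * 10, c * 10, 0] for c in range(6)] for r in range(4)]
tiles = split_into_tiles(image, tile_size=2)

print(shape(image))     # (4, 6, 3): height, width, color channels
print(len(tiles))       # 6 tiles
print(shape(tiles[0]))  # (2, 2, 3)

# A video adds a time axis in front: frames x height x width x channels.
video = [image, image, image]
print(shape(video))     # (3, 4, 6, 3)
```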

The first three steps are together termed convolution, which is where this type of neural network gets its name. A real system contains many convolution and downsampling layers, applied one after another to reduce the data to a manageable size.
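In practice, the convolution operation slides a small grid of weights, called a kernel, across the image and records how strongly each patch matches it. Here is a minimal sketch, assuming a greyscale image, no padding, and a step of one pixel. The kernel values are hand-picked for illustration; a real network learns them during training.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image and sum the element-wise products.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

# Hypothetical kernel that responds to a dark-to-bright vertical edge.
EDGE_KERNEL = [
    [-1, 1],
    [-1, 1],
]

# A made-up image: dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
feature_map = convolve2d(image, EDGE_KERNEL)
print(feature_map)  # [[0, 18, 0], [0, 18, 0]]
```

The output is large only where the edge sits, which is the kind of local pattern later layers build on.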

Why is it better?

Convolutional neural networks are better than conventional image processing methods for a variety of reasons. Conventional methods include converting images to greyscale and then comparing the pixels, comparing the pixels one by one, Scale-Invariant Feature Transform (SIFT), Binary Robust Independent Elementary Features (BRIEF), and Speeded-Up Robust Features (SURF). These methods are either dying out or are inefficient compared with convolutional neural networks, which use less time and fewer resources. SIFT and SURF are somewhat comparable to convolutional neural networks, but they suffer from inaccuracies. Both detect objects using Gaussian-based filters applied across different scales, and to simplify the computations they often discard data that is essential for identifying an object. Convolutional neural networks also simplify the image, but they are designed so that important details are not lost in the process.
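To see why the simplest conventional approach struggles, here is a rough sketch of greyscale conversion followed by a pixel-by-pixel comparison. The helper names are hypothetical, and the greyscale weights are the standard ITU-R BT.601 luma coefficients, which is one common choice among several.

```python
def to_greyscale(image):
    """Convert rows of [r, g, b] pixels to brightness values (ITU-R BT.601 weights)."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
        for row in image
    ]

def mean_abs_difference(a, b):
    """Average absolute per-pixel difference between two same-sized greyscale images."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))

# A made-up 2 x 2 checkerboard of white and black pixels.
original = [[[255, 255, 255], [0, 0, 0]],
            [[0, 0, 0], [255, 255, 255]]]
# The same pattern shifted one pixel to the right (wrapping around).
shifted = [[[0, 0, 0], [255, 255, 255]],
           [[255, 255, 255], [0, 0, 0]]]

same = mean_abs_difference(to_greyscale(original), to_greyscale(original))
moved = mean_abs_difference(to_greyscale(original), to_greyscale(shifted))
print(same, moved)
```

An identical image scores zero, but shifting the same pattern by a single pixel produces the largest possible difference, even though a person would call the two pictures nearly the same.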

Convolutional neural networks are now used around the world. Neural networks are inspired by the human brain, and when used for image processing they give the computer something like human vision. This is why almost all face detection software uses convolutional neural networks for accuracy. They can also run on so few resources that even smartphones use them for face detection.

This technology is relatively new, yet it already has many advantages over traditional image recognition. It is already being used in law enforcement, security, health care, and much more. Only time will tell how much more we can benefit from it.
