A Brief History of Deep Learning

Deep learning is a branch of machine learning concerned with algorithms, known as artificial neural networks, that are inspired by the structure and function of the brain.

Deep learning employs algorithms to process data, resemble the human thinking process, and develop abstractions. It uses layers of algorithms to process data, visually recognize objects, and understand human speech. Information is passed through each layer, with the output of one layer providing the input for the next. The first layer in a network is called the input layer, while the last is called the output layer; all the layers between the two are referred to as hidden layers. Each layer is typically a simple, uniform algorithm containing one kind of activation function.
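The layered flow described above can be sketched in a few lines of Python. This is a minimal, illustrative sketch (the layer sizes, weights, and the choice of ReLU as the activation function are assumptions for the example, not taken from the article): each layer applies weighted sums followed by one activation function, and the output of one layer becomes the input of the next.

```python
# Minimal sketch of a layered network: input layer -> hidden layer -> output layer.
# Each layer is a simple, uniform computation: weighted sums, then one activation.
import random

random.seed(0)

def relu(x):
    # A common activation function: passes positive values, zeroes out negatives.
    return [max(0.0, v) for v in x]

def layer(inputs, weights, activation):
    # One layer: a weighted sum of the inputs per neuron, then the activation.
    sums = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    return activation(sums)

inputs = [0.5, -0.2, 0.1]  # the input layer: 3 values
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
w_output = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]

hidden = layer(inputs, w_hidden, relu)   # hidden layer processes the input
output = layer(hidden, w_output, relu)   # output layer takes the hidden layer's output
print(len(hidden), len(output))          # 4 2
```

The key point is visible in the last two lines: `hidden` is computed from `inputs`, and `output` is computed from `hidden`, so information passes through the network layer by layer.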


One distinctive aspect of deep learning is automatic feature extraction: the algorithm constructs meaningful “features” of the data on its own for the purposes of training, understanding, and learning. In traditional machine learning, by contrast, the data scientist or programmer is responsible for feature extraction.

The history of deep learning dates back to 1943, when Walter Pitts and Warren McCulloch created a computer model based on the neural networks of the human brain. They coined the term “threshold logic” to describe their combination of algorithms and mathematics that mimicked the thought process. Since then, deep learning has evolved steadily, with two significant breaks in its development, both tied to artificial intelligence.

In 1960, Henry J. Kelley was credited with developing the basics of the backpropagation model. In 1962, Stuart Dreyfus developed a simpler version based only on the chain rule.
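The chain rule mentioned here is the mathematical core of backpropagation: for a composed function y = f(g(x)), the derivative is dy/dx = f'(g(x)) · g'(x). A minimal sketch (the sigmoid function and the particular numbers are illustrative assumptions) shows the rule at work, checked against a numerical derivative:

```python
# Chain rule sketch: y = sigmoid(w * x + b). To get dy/dw, multiply the
# derivative of the outer function by the derivative of the inner one.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, x = 0.7, 0.3, 2.0

z = w * x + b        # inner function g(w)
y = sigmoid(z)       # outer function f(z)

# Chain rule: dy/dw = f'(z) * dz/dw, where f'(z) = y * (1 - y) for the sigmoid.
grad_w = y * (1 - y) * x

# Sanity check against a central-difference numerical derivative.
eps = 1e-6
num = (sigmoid((w + eps) * x + b) - sigmoid((w - eps) * x + b)) / (2 * eps)
print(abs(grad_w - num) < 1e-6)  # True
```

Backpropagation applies this same rule repeatedly, layer by layer, to compute how each weight in a deep network contributed to the final error.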

The earliest efforts to develop deep learning algorithms came from Alexey Grigoryevich Ivakhnenko, who developed the Group Method of Data Handling, and Valentin Grigoryevich Lapa, author of Cybernetics and Forecasting Techniques, in 1965. They used models with complicated equations (polynomials) that were then analyzed statistically. From each layer, the best statistically chosen features were forwarded on to the next layer, a slow, manual process.

During the 1970s, the first AI (artificial intelligence) winter kicked in, the result of promises that couldn't be kept. The resulting lack of funding limited research into both deep learning and artificial intelligence, though some individuals carried on their research without funding.

Kunihiko Fukushima built the first convolutional neural networks, designing neural networks with multiple convolutional and pooling layers. In 1979, he developed an artificial neural network called the Neocognitron, which used a hierarchical, multilayered design. This design helped the computer “learn” to recognize visual patterns, and the network resembled modern versions. Fukushima's design also allowed important features to be adjusted manually by increasing the “weight” of certain connections.

Many concepts from the Neocognitron continue to be used. The use of top-down connections and new learning methods has allowed a variety of neural networks to be realized. When more than one pattern is presented at the same time, the Selective Attention Model can separate and recognize individual patterns by shifting its attention from one to the other. A modern Neocognitron has two advantages: it not only identifies patterns with missing information, but can also complete an image by adding the missing information, a process known as inference.

Backpropagation, the use of errors in training deep learning models, evolved significantly in 1970, when Seppo Linnainmaa wrote his master's thesis, which included FORTRAN code for backpropagation. Sadly, the concept was not applied to neural networks until 1985. Yann LeCun provided the first practical demonstration of backpropagation at Bell Labs, combining it with convolutional neural networks to read “handwritten” digits. This system was eventually used to read the numbers on handwritten checks.

The second AI winter (1985 to the 1990s) also affected research on neural networks and deep learning. In 1995, Vladimir Vapnik and Corinna Cortes developed the support vector machine, a system for mapping and recognizing similar data. Long short-term memory (LSTM) for recurrent neural networks was developed in 1997 by Sepp Hochreiter and Juergen Schmidhuber.

In 1999, a significant evolutionary step for deep learning took place: computers became faster at processing data and GPUs (graphics processing units) were developed. GPUs increased computational speeds 1,000 times over a 10-year span. During this time, competition between neural networks and support vector machines emerged.

The “vanishing gradient problem” appeared in 2000. It was discovered that features formed in lower layers were not being learned by the network, because the error signal shrank as it was propagated back through the layers and barely reached them.
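The shrinking of the error signal can be shown with a few lines of Python. In a network with sigmoid activations, each layer multiplies the backpropagated gradient by the sigmoid's derivative, which is at most 0.25, so even in the best case the signal decays geometrically with depth (the 10-layer figure below is an illustrative assumption, not from the article):

```python
# Sketch of the vanishing gradient problem: the sigmoid's derivative is at
# most 0.25 (at z = 0), so backpropagating through many layers multiplies
# the error signal by a factor <= 0.25 per layer.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)  # peaks at 0.25 when z = 0

gradient = 1.0
for depth in range(1, 11):
    gradient *= sigmoid_grad(0.0)  # best case: the largest possible factor
    print(f"layer {depth}: gradient signal = {gradient:.2e}")
# After 10 layers the signal is 0.25**10, roughly 1e-6: the lowest
# layers receive almost no learning signal.
```

This is why later remedies such as layer-by-layer pre-training, and activation functions whose derivatives do not shrink the signal, became important.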

In 2001, a research report by the META Group (now Gartner) described the opportunities and challenges of data growth as three-dimensional: increasing volume, increasing velocity, and an increasing range of data sources and types. It was a call to prepare for the onslaught of Big Data.

In 2009, Fei-Fei Li, an artificial intelligence (AI) professor at Stanford, launched ImageNet, a free database assembled with more than 14 million labeled images.

By 2011, the speed of GPUs had increased remarkably, making it possible to train neural networks without layer-by-layer pre-training. With the increase in computing speed, deep learning gained significant advantages in efficiency. In 2012, Google Brain released the results of an unusual project known as the Cat Experiment, which explored the difficulties of “unsupervised learning”: a convolutional neural network is given unlabeled data and is then asked to find recurring patterns.

The Cat Experiment used a neural net spread over 1,000 computers. Ten million unlabeled images were taken randomly from YouTube and shown to the system, and the training software was then allowed to run. Unsupervised learning plays a vital role in the field of deep learning. Compared to its forerunners, the Cat Experiment performed about 70 percent better at processing unlabeled images.

At present, the evolution of artificial intelligence and the processing of Big Data both depend on deep learning.


