Machine learning gives a system the ability to learn and improve from experience automatically, without being explicitly programmed. It focuses on developing computer programs that can use data to learn.
Today, we hear people talk about artificial intelligence and the growing power of computers and robots, which can perform various tasks better than humans. Every day our machines are getting smarter and able to perform tasks that once seemed beyond imagination.
Have you ever wondered how spam emails are automatically detected and sent to the spam folder? How Facebook detects the faces in a photo you uploaded? In general, how are machines trained to perform a task that humans can do, without the solution being hard-coded?
1- What is the connection between Artificial Intelligence and Machine Learning?
Experts define Artificial Intelligence (AI) as an area of computer science that involves the creation of intelligent machines that work and react like humans. This ranges from simple problems such as reading a document to more complex scenarios like speech recognition or high-level decision making. So where does machine learning (ML) stand?
Figure 1 shows the relation between AI and ML. Artificial intelligence is a wide area with a vast range of possibilities, and machine learning forms a part of it. Machine learning gives a system the ability to learn and improve from experience automatically, without being explicitly programmed, by developing computer programs that can use data to learn.
Deep Learning (DL) is another subset of AI, or more specifically of ML. It builds on the ideas of machine learning using neural networks with many layers (hence the name deep learning) and is currently the hot topic in this area of science. The theory behind any machine learning algorithm or technique is mathematical and draws heavily on statistics, stochastic methods, matrices, and set theory.
In other words, machine learning suits people with strong mathematical skills and a good sense of programming. When you search for AI on Google, the images that come up look something like figure 2. These are just fancy representations, and real systems do not come close to images like this one. In reality, a neural network for machine learning looks something like figure 3.
Fig. 1. Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL)
It is a combination of various neurons and the links joining them together, imitating the structure of the human nervous system.
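To make the picture of neurons and links concrete, here is a minimal sketch of how a signal passes through a small network. The layer sizes, the sigmoid activation, and the weight values in `EXAMPLE_LAYERS` are all illustrative assumptions rather than anything taken from a real trained network.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    """One neuron: a weighted sum over its incoming links, then an activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(total)

def forward(inputs, layers):
    """Pass inputs through each layer; a layer is a list of (weights, bias) pairs."""
    activations = inputs
    for layer in layers:
        activations = [neuron(activations, w, b) for w, b in layer]
    return activations

# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden neurons -> 1 output.
EXAMPLE_LAYERS = [
    [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)],  # hidden layer
    [([1.2, -0.7], 0.05)],                     # output layer
]

output = forward([1.0, 0.0], EXAMPLE_LAYERS)
```

Each number in a weight list plays the role of one link between two neurons. Training a network amounts to adjusting these numbers, which this sketch does not attempt.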
2- Applications of Machine Learning
Below are a few modern-day problems and applications of machine learning that we see in day-to-day life. Every problem in machine learning is approached by asking how a human mind would solve it and then trying to replicate the same in smart computers or machines.
2.1- Classification Problems
Classification is one of the most basic and earliest tasks performed by smart computers. It involves separating images according to the features present in them. For example, suppose there are 100 images of cats and dogs. Manually separating these images into the individual classes “dogs” and “cats” can take up to an hour, but networks trained for this task can do it in a few seconds.
When a network is trained, it learns to identify and separate the features of a dog from those of a cat. For a human, these features could be anything from the size and shape to the color and behavior of the animal. The computer may or may not use the same features. This is where machine learning and deep learning differ.
In classical machine learning, the user has to specify the features to be learnt, whereas in deep learning the machine itself learns its own features from the input data.
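The idea of user-specified features can be sketched with a very simple classifier. The example below assumes two hand-chosen features per animal (body weight and shoulder height) and uses a nearest-centroid rule. Both the features and the numbers in `TRAINING_DATA` are made up for illustration. A deep learning model would instead take the raw image pixels and work out useful features itself.

```python
import math

def centroid(rows):
    """Average each feature column over a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train(examples):
    """Group labelled feature vectors by class and keep one centroid per class."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Hypothetical hand-chosen features: (body weight in kg, shoulder height in cm).
TRAINING_DATA = [
    ([4.0, 24.0], "cat"), ([5.0, 25.0], "cat"), ([3.5, 23.0], "cat"),
    ([20.0, 55.0], "dog"), ([30.0, 60.0], "dog"), ([12.0, 40.0], "dog"),
]

model = train(TRAINING_DATA)
label = predict(model, [4.5, 26.0])  # a small, light animal
```

The quality of such a classifier depends almost entirely on how well the chosen features separate the classes, which is exactly the burden deep learning tries to remove.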
Fig. 2. Artificial Intelligence
Today, networks like ResNet achieve a top-5 accuracy of more than 96% on the ImageNet classification benchmark, which uses a 1,000-category subset of the data. The full ImageNet dataset has more than 14 million images from 21,500+ categories, including “dog”, “balloon”, and “strawberry”. With the advancement of computing technologies, deeper neural networks can be built, which may outperform current networks and achieve almost perfect results.
2.2- Email Classification Using Machine Learning
Nearly everyone nowadays uses some email service for daily work. When email was introduced, it caught people’s attention very quickly, and with it came the problem we now know as spam. Spam can be anything from basic advertising to irrelevant invitations or malicious content.
Today, email services have evolved to a stage where you need not worry about spam filling the inbox. Spam is automatically identified and sent to a separate spam folder. This is another achievement of machine learning over the last decade. To identify an email as spam or not spam, the computer takes hints from the types of words in the message, their frequency of occurrence, the presence of certain types of images, and the email ID of the sender.
Spam emails tend to make excessive use of words like “free”, “get”, and “coupons”. Training a network for this job is relatively easy and highly effective. Over the years, users were asked to mark emails as spam or not spam. The emails marked as spam were then collected to train the network to tell spam apart from useful emails (don’t worry, the privacy of your emails is protected, as there is no human intervention and the messages are well encrypted).
Fig. 3. Neural Network (courtesy: medium.com)
Even though 100% accuracy is yet to be achieved, the system is very accurate. Even today, users have the option to mark an email as spam if it was not already sorted, or to unmark a legitimate email that was mistakenly marked as spam. This information is used to further improve the network and make the system more accurate. That is why we say that the machine, like a human, is continuously learning.
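The word-frequency idea from this section can be sketched with a small naive Bayes classifier. This is a stand-in technique chosen for brevity, not the method any particular email service uses, and the four messages in `LABELLED` are invented examples. The label "ham" is the usual shorthand for "not spam".

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the message and split it into words."""
    return text.lower().split()

def train(messages):
    """Count words per class from a list of (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    message_counts = Counter()
    for text, label in messages:
        message_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, message_counts, vocabulary

def classify(model, text):
    """Return the class with the higher log-probability for this message."""
    word_counts, message_counts, vocabulary = model
    total_messages = sum(message_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: how common this class was among the training messages.
        score = math.log(message_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so an unseen word does not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Invented training messages, standing in for emails that users have marked.
LABELLED = [
    ("get free coupons now", "spam"),
    ("free prize get it today", "spam"),
    ("meeting notes for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

model = train(LABELLED)
result = classify(model, "free coupons today")
```

A user correction fits naturally into this sketch: appending the corrected `(text, label)` pair to `LABELLED` and calling `train` again updates the word counts, which is one simple way to picture the continuous learning described above.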
2.3- Natural Language Processing (NLP)
Understanding and processing day-to-day language is a big task for machines. Natural language processing goes well beyond what exists today in the form of document readers or speech-to-text converters. Those are very preliminary tasks for computers. NLP includes teaching the machine to understand the various emotions and intentions expressed in a text.
These include anger, happiness, a command, a joke, or tougher text that contains sarcasm. A perfect machine would be one that can understand when a human statement is a command to carry out and when it is a mere joke or a sarcastic remark. We would also expect the machine to generate a suitable reply that is both relevant and polite.
NLP is, to some extent, already used by companies like Apple Inc. and Google in applications like Siri and Google Assistant. When the machine receives a sentence, it needs to analyze the whole statement as one unit, because the words at the start of the sentence can affect its meaning as much as the words in the middle or at the end.
How the words are used also carries information about the type of the sentence (or its emotion). This requires a neural network that processes its input as a sequence over time. Examples of such networks are Long Short-Term Memory (LSTM) networks, which are a type of Recurrent Neural Network (RNN). These can remember earlier inputs while processing later ones and produce the output accordingly.
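How an LSTM carries earlier inputs forward can be sketched with a single cell that works on plain numbers instead of word vectors. The gate equations below follow the standard LSTM formulation, but the weights in `EXAMPLE_WEIGHTS` are hypothetical and untrained, and a real NLP model would use vectors and matrices rather than scalars.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, weights):
    """One time step of a scalar LSTM cell.

    weights maps each gate name to a tuple (w_x, w_h, bias).
    Returns the new hidden state h and the new cell state c.
    """
    def gate(name, activation):
        w_x, w_h, bias = weights[name]
        return activation(w_x * x + w_h * h_prev + bias)

    forget = gate("forget", sigmoid)         # how much old memory to keep
    write = gate("input", sigmoid)           # how much new information to store
    read = gate("output", sigmoid)           # how much memory to expose
    candidate = gate("candidate", math.tanh) # the new information itself

    c = forget * c_prev + write * candidate
    h = read * math.tanh(c)
    return h, c

def run_sequence(xs, weights):
    """Feed a sequence through the cell one element at a time."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, weights)
    return h

# Hypothetical, untrained weights chosen only to make the example run.
EXAMPLE_WEIGHTS = {
    "forget": (0.0, 0.0, 1.0),
    "input": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 1.0),
    "candidate": (1.0, 0.5, 0.0),
}

early = run_sequence([1.0, 0.0, 0.0], EXAMPLE_WEIGHTS)
late = run_sequence([0.0, 0.0, 1.0], EXAMPLE_WEIGHTS)
```

The two sequences contain the same values in a different order, yet `early` and `late` differ. This is the property the text relies on: the cell state `c` lets the position of a word in the sentence influence the final output.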
2.4- Depth Analysis & Machine Learning
Imagine knowing the exact length of an object from its image. Machine learning is making this possible by extracting distance or depth information from an image. Let’s say there is a picture of a family standing in front of a building. The idea behind depth analysis is to determine the height of each family member in the photo.
It can also be used to determine the height of the building in the picture or, in general, the dimensions of any object in the image. The challenge in depth analysis arises because images are 2-D representations of the 3-D world we see.
When a camera takes a photo, the range or depth information is lost, as the 3-D object is captured on a 2-D plane. Let x, y and z be the three axes of the real world, with x and y being horizontal and vertical and z being the distance from the camera.
When the image is taken by the camera, it loses this z-axis information and stores the image in a 2-D plane. Mathematically, the z-axis information cannot be recovered exactly from a single image, so it has to be estimated. This is typically done using a Convolutional Neural Network (CNN), a type of deep learning technique designed to extract information from images.
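With these axes, the loss of depth can be illustrated using a simple pinhole camera model. The model and the `focal_length` parameter are assumptions made for this sketch, since the article does not name a specific camera model.

```python
def project(point, focal_length=1.0):
    """Project a 3-D point (x, y, z) onto the 2-D image plane.

    z is the distance from the camera and must be positive.
    The result has only two coordinates, so z is no longer stored.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)

# Two hypothetical points: one near the camera, one twice as far and twice as large.
near_point = (1.0, 2.0, 4.0)
far_point = (2.0, 4.0, 8.0)

near_pixel = project(near_point)
far_pixel = project(far_point)
```

Both points land on the same image coordinates, so a small nearby object and a large distant one can look identical in the photo. This ambiguity is what a depth-estimation network has to resolve from other visual cues.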
CNNs are feed-forward networks consisting of convolution layers, which perform the convolution operation on the input. A typical CNN is shown in figure 4. The convolution operation is followed by a down-sampling layer such as max-pooling or average-pooling.
Down-sampling reduces the number of variables in the network and makes it less sensitive to small shifts in the input. CNNs usually end with fully-connected layers, also known as dense layers, or their output is given as input to some other network for cascading. The CNN is trained with images of objects with known dimensions.
Once trained, the network can predict the dimensions of an object in an image. It can also be used to find the distance of an object from the camera. Say you took a picture of a car traveling on the road; the network can estimate the distance of that car from the camera. Google is currently working on this, with good success in estimating the dimensions of small nearby objects such as stationery, dishes and other everyday objects.
Fig. 4. Convolutional Neural Network
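The convolution and max-pooling steps of a CNN can be sketched in a few lines of plain Python. The 5x5 `IMAGE` and the edge-detecting `KERNEL` are made-up values. In a real CNN the kernel values are learned during training, and a library would perform these operations far more efficiently.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most CNN libraries, the kernel is not flipped, so strictly
    speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Hypothetical 5x5 image: dark on the left (0), bright on the right (1).
IMAGE = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
KERNEL = [[-1, 1], [-1, 1]]

features = convolve2d(IMAGE, KERNEL)  # 4x4 feature map
pooled = max_pool(features)           # 2x2 after down-sampling
```

The feature map is non-zero only where the edge sits, and pooling shrinks it from 16 values to 4 while still recording that an edge was found on the left side.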