What is Computer Vision in AI and Machine Learning?

Raktim Singh

July 6, 2021

Computer vision is a field of artificial intelligence, within computer science, that aims to give computers a visual understanding of the world.

It helps in building machines that can look at an image or video, understand it, and take relevant action.

We can say that the goal of computer vision is to use observed image data to infer something about the world.

A machine with computer vision runs algorithms that can, for example, track objects in video footage and reconstruct 3D models of them.

Many of us have heard of ‘Amazon Go’ stores. There, shoppers do not have to wait in long checkout queues to pay for the items they purchase.

Using computer vision, deep learning, and sensors, machines identify the items picked from the shelves, add them to a virtual cart, and charge the shopper's Amazon account.

The main aim of computer vision is to mimic human vision using digital images, through three main processing components executed one after the other:

Image acquisition
Image processing
Image analysis and understanding
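
The three stages above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the function names (`acquire_image`, `process_image`, `analyze_image`) are hypothetical, the "camera" is a hard-coded 4x4 image, and plain Python lists stand in for the array types a real system would use.

```python
# Hypothetical three-stage pipeline: acquisition -> processing -> analysis.

def acquire_image():
    """Stage 1 (acquisition): stand-in for a camera; returns a tiny RGB image."""
    dark, bright = (10, 10, 10), (250, 240, 230)
    # 4x4 image with a bright 2x2 square in the middle (made-up data).
    return [
        [dark, dark, dark, dark],
        [dark, bright, bright, dark],
        [dark, bright, bright, dark],
        [dark, dark, dark, dark],
    ]

def process_image(rgb):
    """Stage 2 (processing): convert RGB pixels to grayscale values in 0..1."""
    return [[(r + g + b) / (3 * 255) for (r, g, b) in row] for row in rgb]

def analyze_image(gray, threshold=0.5):
    """Stage 3 (analysis): bounding box (top, left, bottom, right) of bright pixels."""
    hits = [(y, x) for y, row in enumerate(gray)
            for x, value in enumerate(row) if value > threshold]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys), max(xs))

box = analyze_image(process_image(acquire_image()))
print(box)  # (1, 1, 2, 2)
```

Each stage only consumes the output of the previous one, which is why the components can be executed one after the other.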

As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images.

The image data can take various forms, such as video sequences, views from multiple cameras, or multidimensional data from a 3D scanner.

As a technological discipline, computer vision aims to apply its theories and models to the construction of computer vision systems.

Computer vision also includes scene reconstruction, object recognition, event detection, 3D scene modeling, motion estimation, video tracking, and image restoration.

So let us understand what computer vision in AI is and how it began.

It started in the 1960s at universities that were studying artificial intelligence.

It was designed to imitate the human visual system so that robots could have intelligent behavior.

In the 1970s, research aimed to recover three-dimensional structure from images as a step toward full scene understanding.

The following decade brought more rigorous analysis and the quantitative aspects of computer vision, with concepts such as scale-space, texture, and focus.

By the 1990s, research in projective 3D reconstruction had led to a better understanding of camera calibration, which in turn led to methods for 3D reconstruction from multiple images.

Recent work has seen a resurgence of feature-based methods, used in conjunction with machine learning techniques.

Deep learning techniques have brought new life to the field of computer vision. This is where computer vision in AI stands now.

How it works

Computer vision needs lots of data (images, pictures, videos).

It analyzes the data over and over until it discerns distinctions and ultimately recognizes images.

For a computer vision algorithm, pictures are arrays of color pixels that can be statistically mapped to a certain description. In digital form, an image is stored as an array of numbers.
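
To make the "array of numbers" idea concrete, here is a minimal sketch with a made-up 2x3 image. Plain Python lists are used for simplicity; the pixel values and variable names are illustrative only.

```python
# A 2x3 RGB image: rows of pixels, each pixel an (R, G, B) triple in 0..255.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]

# The "shape" of the image: rows, columns, color channels.
height, width, channels = len(image), len(image[0]), len(image[0][0])
print(height, width, channels)  # 2 3 3

# Flatten the image into one long vector of numbers.
vector = [value for row in image for pixel in row for value in pixel]
print(len(vector))  # 18

# A simple statistic computed over the raw numbers: mean brightness.
mean_brightness = sum(vector) / len(vector)
```

Everything a vision algorithm does, from statistics to neural networks, operates on numbers laid out like this.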

With CV, a machine tries to understand the content of digital images. This involves extracting a description from the image, which may be an object, a text description, a three-dimensional model, and so on.

Two essential technologies are used to accomplish this:

A convolutional neural network (CNN)
A type of machine learning called deep learning

ANN (Artificial Neural Network) & CNN

An ANN is a network of interconnected, layered processing elements that work together to power computer vision. ANNs are loosely modeled on the neural networks of the human brain. This allows computers to see images and videos and learn what is in them.

The most popular architecture used for image classification is the convolutional neural network (CNN), a variant of the ANN with convolution and pooling layers.
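
As a rough illustration of what a pooling layer does, the sketch below applies 2x2 max pooling to a small grid of numbers. The function name `max_pool2d` and the input values are assumptions made for this example; deep learning libraries provide their own, much faster versions.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; leftover rows/columns are dropped."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            # Keep only the largest value in each size x size window.
            max(feature_map[y + i][x + j]
                for i in range(size) for j in range(size))
            for x in range(0, w - size + 1, size)
        ]
        for y in range(0, h - size + 1, size)
    ]

activations = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool2d(activations)
print(pooled)  # [[4, 2], [2, 8]]
```

Pooling shrinks the grid while keeping the strongest response in each region, so later layers work with a smaller, coarser summary.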

CNN

A CNN helps a machine learning or deep learning model “look” by breaking images down into pixels that are given tags or labels. It uses the labels to perform convolutions (a mathematical operation on two functions that produces a third function) and makes predictions about what it is “seeing.”
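
The convolution operation itself can be sketched in a few lines. This is a minimal example, not a CNN: the kernel here is hand-written to respond to vertical edges, whereas a real CNN learns its kernel values from data. The name `convolve2d` and the tiny image are assumptions for illustration.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1), summing products.

    Like most CNN libraries, this does not flip the kernel, so strictly
    speaking it computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

# Grayscale image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written kernel that responds where brightness rises left to right.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, edge_kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The output is large only in the middle column, exactly where the dark region meets the bright one.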

As humans, when we see an image from a distance, we first try to figure out its contours. In the same way, a CNN first discerns hard edges and simple shapes, then fills in information as it runs iterations of its predictions.

In this way, it recognizes or sees images in a way similar to humans.

The neural network runs convolutions and checks the accuracy of its predictions in a series of iterations until the predictions start to come true.

A CNN is used to understand single images.

On the other hand, a recurrent neural network (RNN) is used in a similar way for video applications to help computers understand how pictures in a series of frames are related to one another.

An RNN can handle sequential data by taking into account both the current input and previously received inputs. Thanks to its internal memory, an RNN can remember earlier inputs.
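
The "internal memory" can be illustrated with a single-number recurrent cell. This is a deliberately tiny sketch: the weights (`w_x`, `w_h`) are fixed, made-up values rather than learned ones, and a real video model would use vectors and matrices instead of single numbers.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, bias=0.0):
    """One recurrent step: mix the current input with the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + bias)

def run_rnn(inputs, h0=0.0):
    """Feed a sequence through the cell, carrying the hidden state along."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h)
        states.append(h)
    return states

# Two sequences that end with the same input but have different histories.
states_a = run_rnn([1.0, 0.0, 1.0])
states_b = run_rnn([0.0, 0.0, 1.0])

# The final states differ because the cell remembers what came earlier.
print(states_a[-1] != states_b[-1])  # True
```

The hidden state `h` is the memory: it is the only channel through which earlier frames can influence how the current one is interpreted.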

Machine Learning

Machine learning uses algorithmic models that enable a computer to teach itself about the context of visual data. Once enough data is fed through the model, the computer will “look” at the data and teach itself to tell one image from another. Algorithms enable the machine to learn by itself, rather than someone programming it to recognize an image.
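
A very small example can show what "learning from data rather than being programmed" means. The sketch below uses a nearest-centroid classifier, which is far simpler than the deep learning models discussed above and is swapped in here only because it fits in a few lines. The four-pixel "images", the labels, and the function names are all made up for illustration.

```python
def train_centroids(examples):
    """Learn one average pixel vector per label from (pixels, label) pairs."""
    sums, counts = {}, {}
    for pixels, label in examples:
        acc = sums.setdefault(label, [0.0] * len(pixels))
        for i, value in enumerate(pixels):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, pixels):
    """Return the label whose learned average is closest to `pixels`."""
    def squared_distance(center):
        return sum((p - c) ** 2 for p, c in zip(pixels, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Tiny labeled training set: each "image" is four brightness values in 0..1.
training = [
    ([0.9, 0.8, 0.9, 1.0], "bright"),
    ([1.0, 0.9, 0.7, 0.9], "bright"),
    ([0.1, 0.0, 0.2, 0.1], "dark"),
    ([0.0, 0.2, 0.1, 0.0], "dark"),
]
model = train_centroids(training)

print(predict(model, [0.8, 0.7, 0.9, 0.8]))  # bright
print(predict(model, [0.2, 0.1, 0.0, 0.3]))  # dark
```

Nowhere does the code state a rule such as "bright means values above 0.5"; the boundary between the two classes comes entirely from the examples.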

Related fields to computer vision

Neurobiology

Neurobiology plays an important role in computer vision. There has been extensive research on the eyes, neurons, and brain structures involved in processing visual stimuli in both humans and various animals. This has led to a subfield of computer vision in which artificial systems mimic the processing of biological systems.

Robotic navigation

Computer vision helps by providing information about the environment where a path is to be navigated.

Examples of computer vision technology

Content organization

Computer vision helps us organize our content. An example of this is Apple Photos.

The app can access our photo collections, automatically structure them, and add text to them to create a curated view of our best moments.

Facial recognition

Facial recognition plays a crucial role in biometric authentication. Many mobile devices now offer face unlock. The front camera captures the face, and based on an analysis of that image, the device decides whether to authorize the person.

Augmented reality

Computer vision is a key element in augmented reality apps. AR apps identify physical objects in a given space and use this information to provide more information about those objects.

Self-driving cars

Cars can make sense of their surroundings with the help of computer vision. In a smart vehicle, several cameras capture video from different angles and send it as input to the software. The video is processed to detect objects such as cars, traffic lights, and pedestrians.

Health

Around 90% of the medical data used for diagnosis is reportedly in the form of images. Imaging technologies such as MRI and X-rays have proved beneficial. For instance, diabetic retinopathy can be detected with the help of computer vision algorithms. Cancer detection is also possible with its help: computer vision can identify tumor regions without confusing them with normal areas that look like a tumor.

Sports

CV helps athletes during their training by recognizing activity patterns and analyzing performance.

In gymnastics, for example, the system could examine a gymnast's performance and prepare a report on strengths and areas for improvement.

CV is also improving referee decisions by tracking players and objects in sports.

The Hawk-Eye system is used in tennis, cricket, and football.

As per Wikipedia:

Hawk-Eye is a computer vision system used in numerous sports such as cricket, tennis, Gaelic football, badminton, hurling, rugby union, association football, and volleyball, to visually track the trajectory of the ball and display a profile of its statistically most likely path as a moving image.

So, in cricket, CV tracks the ball and predicts its trajectory. Based on that, the umpire can decide whether a batsman was out (LBW) or not.
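
The idea of predicting a path from tracked positions can be illustrated with a simple line fit. This is not how Hawk-Eye actually works; it is a toy sketch that assumes the ball moves in a straight line when seen from above, uses made-up tracking data, and treats the target half-width of 0.11 m as an assumed parameter.

```python
def fit_line(points):
    """Least-squares fit of x = a + b * z to a list of (z, x) points."""
    n = len(points)
    mean_z = sum(z for z, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    var_z = sum((z - mean_z) ** 2 for z, _ in points)
    cov_zx = sum((z - mean_z) * (x - mean_x) for z, x in points)
    b = cov_zx / var_z
    a = mean_x - b * mean_z
    return a, b

def predict_lateral(points, z_target):
    """Extrapolate the fitted line to the distance `z_target`."""
    a, b = fit_line(points)
    return a + b * z_target

# Made-up tracked positions in metres: (distance to the stumps, sideways offset).
tracked = [(3.0, 0.30), (2.5, 0.25), (2.0, 0.20), (1.5, 0.15)]

# Where would the ball be, sideways, when it reaches the stumps (z = 0)?
x_at_stumps = predict_lateral(tracked, 0.0)

# Assumed half-width of the target; a real system models far more than this.
would_hit = abs(x_at_stumps) <= 0.11
print(would_hit)  # True
```

A production system would fit a full 3D trajectory from several cameras and report its uncertainty, but the principle is the same: observed positions go in, and a predicted path comes out.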

Agriculture

Common agricultural problems such as weed emergence or nutrient deficiency are managed with the help of computer vision.

CV helps in monitoring the health of plants. With CV, we can now prevent and control weeds, insects and other pests, and plant diseases.

It also helps in monitoring the health of cattle. By analyzing the behavior and body movements of animals, one can detect whether a particular animal is sick.

Amazon Echo Look

Amazon Echo Look is dedicated to fashion. It features a voice-activated camera intended to capture detailed pictures of your outfits. AI components help you choose from a wide range of clothing.

Document scanning

In almost all industries, various legal documents are required when making a contract.

Those legal documents need to be signed by all parties involved in the deal.

As of now, a person has to manually check whether the document is complete and duly signed by all parties. This work can now be automated with CV.

Some of the big players involved in CV are:

Amazon, Chooch AI, Clarifai, Deepomatic, Google, IBM, Microsoft, Neurala, and SAS.

Apart from these big players, I have listed below some other players who are creating new products based on CV.

SenseTime

They work mainly on facial recognition technology.

Nauto

They work with commercial and autonomous fleets to reduce the number of collisions using advanced AI solutions.

Hawk-Eye Innovations

Operating mainly in the sports niche, Hawk-Eye helps umpires make better decisions.

It can track balls accurately using vision processing technology. If an official makes an incorrect decision, teams can challenge the call and the technology will provide a definitive answer.

Intello Labs

They help in agriculture. Farmers can now take photographs of specific crops and upload them to the Intello Labs database.

Depending on their needs, they can learn about weeds, pests, diseases, and more.

OrCam

OrCam helps people who are fully or partially blind. With its product, users can read digital and printed text, recognize hand gestures, identify products in real time, and pick out bank notes for payments.

Conclusion: What is Computer Vision in AI and Machine Learning

Understanding computer vision in artificial intelligence starts with its definition.

Computer vision is a field that studies the theory and practice of processing digital images.

Computer vision has the power to change fields ranging from health care and security to gaming and many more.

Computer vision's applications are much broader than they may first appear.

The field itself is also transforming, expanding its scope and finding new applications in domains as diverse as vision-based vehicle navigation, monitoring and analyzing the performance of computer systems and power grids, security systems, robotics, image-guided surgery, dentistry, and more.

Privacy can be an issue here, since our visual data may be exposed to access by others. Computer vision therefore has both advantages and disadvantages, but used vigilantly, it can help change the world.

I hope this gave you clarity on what computer vision in AI and machine learning is.