What is computer vision?

Computer vision

computer vision, machine vision, CV

Computer vision is the field of artificial intelligence that teaches machines to understand images and video — to detect objects, classify scenes or read text — rather than treat them as raw pixels.

It lets machines detect objects, classify images and read text from photos and video.
Modern methods are built on deep learning and neural networks.
It does for images what natural language processing does for text.

Computer vision is the field of artificial intelligence concerned with how machines can understand the content of images and video. Its tasks span a broad range: recognizing objects in a photo, classifying an entire scene, spotting defects on a production line, or reading text from a document. The goal is always the same: to move from raw pixels to information you can act on.

Modern computer vision is built primarily on deep learning and neural networks, which learn to recognize visual patterns from many examples instead of following hand-written rules. You can think of it as the counterpart of natural language processing: one field works on images, the other on text. When the two come together in a single model, the result is called multimodality, the ability to process images and text together.
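To make "learning from examples rather than hand-written rules" concrete, here is a minimal sketch in plain Python. It is not a neural network: it uses a much simpler stand-in technique (a nearest-centroid classifier) on invented 3x3 toy images, and the names `TRAIN`, `fit_centroids` and `predict` are illustrative assumptions. The point is only that nothing in the code says what a vertical or horizontal line looks like; that knowledge comes from the labeled examples.

```python
# Toy 3x3 grayscale "images", flattened row by row (0 = dark, 1 = bright).
# The images and labels are invented for illustration only.
TRAIN = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], "vertical"),
    ([0, 1, 0, 0, 1, 0, 0, 1, 1], "vertical"),    # with one noisy pixel
    ([1, 1, 0, 0, 1, 0, 0, 1, 0], "vertical"),    # with one noisy pixel
    ([0, 0, 0, 1, 1, 1, 0, 0, 0], "horizontal"),
    ([0, 0, 1, 1, 1, 1, 0, 0, 0], "horizontal"),  # with one noisy pixel
    ([0, 0, 0, 1, 1, 1, 1, 0, 0], "horizontal"),  # with one noisy pixel
]


def fit_centroids(examples):
    """'Learning' step: average the pixels of all examples in each class."""
    sums, counts = {}, {}
    for pixels, label in examples:
        acc = sums.setdefault(label, [0.0] * len(pixels))
        for i, value in enumerate(pixels):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, pixels):
    """Return the label whose average image is closest to the input."""
    def squared_distance(centroid):
        return sum((p - c) ** 2 for p, c in zip(pixels, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = fit_centroids(TRAIN)

# A vertical line with its bottom pixel missing; not part of the training data.
damaged_vertical = [0, 1, 0, 0, 1, 0, 0, 0, 0]
print(predict(centroids, damaged_vertical))  # prints: vertical
```

Deep learning models follow the same pattern at a far larger scale: the examples, not the programmer, determine what the model looks for.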

In enterprise use, computer vision powers quality control in manufacturing, automated reading of invoices and documents, analysis of camera footage, and identity verification. Most of the time you don't build such a solution from scratch. Instead, you take a pretrained model, one already trained on large image datasets, and adapt it to the specific case, for example to recognize one type of part in photos from the shop floor.
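The idea of adapting a ready model can be sketched as "keep the feature extractor fixed, train only a small classifier on top". The example below is a toy illustration in plain Python under loud assumptions: `frozen_features` is a hand-written stand-in for a pretrained network (a real one would be loaded from a deep learning library), the 3x3 images are invented, and `train_head` and `is_target_part` are hypothetical helper names.

```python
import math


def frozen_features(pixels):
    """Stand-in for a pretrained backbone (an assumption for this sketch).

    It is never updated during training. It maps a flattened 3x3 image to
    two numbers: brightness of the middle column and of the middle row.
    """
    column = (pixels[1] + pixels[4] + pixels[7]) / 3
    row = (pixels[3] + pixels[4] + pixels[5]) / 3
    return [column, row]


def train_head(examples, epochs=200, lr=0.5):
    """Fit a small logistic-regression head on top of the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for pixels, target in examples:
            feats = frozen_features(pixels)
            z = sum(w * f for w, f in zip(weights, feats)) + bias
            prob = 1 / (1 + math.exp(-z))
            error = prob - target  # gradient of the log loss with respect to z
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias


def is_target_part(head, pixels):
    """True if the trained head scores the image as the target part."""
    weights, bias = head
    z = sum(w * f for w, f in zip(weights, frozen_features(pixels))) + bias
    return z > 0


# Invented shop-floor data: label 1 = the part we care about (bar-shaped,
# upright), label 0 = anything else.
LABELED = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], 1),
    ([0, 0, 0, 1, 1, 1, 0, 0, 0], 0),
    ([0, 1, 0, 0, 1, 0, 0, 1, 1], 1),
    ([0, 0, 1, 1, 1, 1, 0, 0, 0], 0),
]

head = train_head(LABELED)
print(is_target_part(head, [0, 1, 0, 0, 1, 0, 0, 0, 0]))  # partly hidden target part
print(is_target_part(head, [0, 0, 0, 1, 1, 1, 1, 0, 0]))  # a different part
```

Only the two weights and the bias are learned here, which is why a handful of labeled photos can be enough. In a real project the frozen part would be a large pretrained network and the head a new final layer, but the division of labor is the same.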

Related terms