Our sense of sight allows us to perceive shapes, colours and movements around us. The images captured by our eyes are analysed and processed by our brains, enabling us to see the world we live in. This fascinating and complex function allows us to connect with our surroundings. However, vision is not limited to humans and other living organisms; computers can use it too.
Computer vision is the process of extracting information from, analysing, understanding and learning from single images or videos. Its aim is to enable a computer to perform the same visual tasks as a human, with the same accuracy and capability. In fact, computer vision applications are already ubiquitous. The face detector on your mobile phone that sets the focus or adjusts the lighting when you take a photo, the number plate recognition that logs your car as you enter a charging zone, and Google’s organisation of your photos, which lets you search for visual content or group photos of the same person, all use computer vision.
Yet human vision is notoriously difficult to replicate. We can look at a picture and, almost subconsciously, identify objects such as trees and houses, whereas computers were originally designed to work with numbers. Using machines for vision, for example to identify an individual’s face in a photo, is a fairly recent phenomenon. Before a computer can identify a person or a specific object, it must first be taught to do so.
Essentially, teaching a computer to match or distinguish faces in photos requires several million photos of several thousand people. An algorithm learns how to separate photos of the same person from photos of other people. The data speaks for itself: it determines what best distinguishes one person’s face from another, represented as a set of numbers that provides a unique digital ‘signature’ for each face. When you upload a photo, the algorithm creates a ‘signature’ for that face and matches it against the signatures of the faces already in your photo collection. In a similar way, a computer can be taught to recognise different animals, vehicles or plants.
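To make the idea of matching ‘signatures’ a little more concrete, here is a minimal Python sketch. It assumes each face has already been turned into a short list of numbers by a trained model (not shown here), and it compares signatures using Euclidean distance, which is one common choice rather than necessarily the method any particular product uses. The function names, the four-number signatures and the 0.6 threshold are all hypothetical, chosen purely for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two face signatures of equal length."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_face(new_signature, collection, threshold=0.6):
    """Return the name of the closest stored face, or None if none is close enough.

    `collection` maps a person's name to a stored signature. The threshold is a
    hypothetical value; a real system would tune it on labelled photos.
    """
    best_name, best_dist = None, float("inf")
    for name, signature in collection.items():
        d = euclidean_distance(new_signature, signature)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None


# Hypothetical 4-number signatures; real systems use many more numbers,
# produced by a model trained on millions of photos.
collection = {
    "alice": [0.1, 0.9, 0.3, 0.5],
    "bob": [0.8, 0.2, 0.7, 0.1],
}

print(match_face([0.12, 0.88, 0.31, 0.52], collection))  # alice
print(match_face([0.5, 0.5, 0.0, 0.9], collection))      # None (no stored face is close enough)
```

The threshold matters: without it, every new face would be labelled as whichever stored person happens to be nearest, even a complete stranger.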
This technology allows computers to identify many different objects within a single image or video stream, all within a matter of seconds. The future of computer vision looks promising, with a wide variety of real-world uses. Professor Andrew Zisserman FRS is a leader in the field of computer vision and the winner of the 2017 Royal Society Milner Award. His research focuses on visual recognition, image retrieval and multi-view geometry. Join us at this year’s Milner Award lecture given by Professor Zisserman, ‘How can a computer understand what is happening in a video?’, taking place on 22 November.