Azure Cognitive Services: Computer Vision

Posted on Nov 29, 2018 | Jason Nissen

Azure provides a robust set of artificial intelligence (AI) features called Cognitive Services that developers can easily utilize within their own applications. The Cognitive Services offering is currently divided into five main categories: Vision, Speech, Language, Knowledge, and Search.

One of the AI services under the Vision category is Computer Vision. This service analyzes an image and extracts information about the content within the image.

Examples of what Computer Vision can do include:

Tag visual items
Categorize an image
Describe an image
Recognize faces within an image
Evaluate colors and generate thumbnails
Provide details for moderating content within images

Tags, Category, and Description

Computer Vision can tag visual items within an image based on more than 2,000 recognizable objects, such as people, scenery, and actions. Along with each tag, Computer Vision will include a confidence score. The results are sorted by the confidence score from highest to lowest.

In addition to tagging, Computer Vision can return taxonomy-based categories drawn from a list of 87 concepts, such as faces, food, nature, and abstract.

Using the tags, Computer Vision will generate a description of the image displayed as human-readable text in a complete sentence.
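As a rough sketch of how an application might ask for these three features, the snippet below builds a call to the Analyze Image REST operation using only the Python standard library. The region in the endpoint, the v2.0 API version, the placeholder key, and the image URL are assumptions for illustration; substitute the values from your own Cognitive Services resource.

```python
import json
import urllib.parse
import urllib.request

# Assumed placeholders: replace with your own resource's endpoint and key.
ENDPOINT = "https://westus.api.cognitive.microsoft.com"
SUBSCRIPTION_KEY = "<your-subscription-key>"


def build_analyze_request(image_url, features=("Tags", "Categories", "Description")):
    """Build (but do not send) a POST request for the Analyze Image operation."""
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    return urllib.request.Request(
        url=f"{ENDPOINT}/vision/v2.0/analyze?{query}",
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical image URL, used only to show the shape of the request.
request = build_analyze_request("https://example.com/pug.jpg")
print(request.full_url)
# To send it with a real key: json.load(urllib.request.urlopen(request))
```

Building the request separately from sending it keeps the example runnable without a subscription key.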

Let’s look at an example:

Pug
Pic: Instagrammer @snissenful

Tags

Name	Confidence
dog	0.99915123
sitting	0.9753965
indoor	0.9698432
black	0.8994449
laying	0.7277232
white	0.6921732
pug	0.6921732
bulldog	0.11564628

Category

Name	Confidence
animal_dog	0.98828125

Description

Caption	Confidence
a large black dog lying on the ground	0.8761196

Above are the results returned from Computer Vision for an image of my pug, Jasmine. For the tags, it was able to identify a black and white dog lying indoors. Also, based on the confidence scores returned, it was able to determine the dog is most likely a pug. It returned the correct category of animal – dog, but the description of “a large black dog lying on the ground” isn’t quite accurate. It should be “a small, portly pug flopping on the couch.”
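To make the role of the confidence scores concrete, here is a small sketch that keeps only the tags above a minimum confidence. The tag values are copied from the table above and arranged in the shape the Analyze Image response uses (a list of objects with name and confidence). The helper name confident_tags and the thresholds are arbitrary choices for illustration.

```python
# Tag results copied from the table above, in the JSON shape of the response.
analysis = {
    "tags": [
        {"name": "dog", "confidence": 0.99915123},
        {"name": "sitting", "confidence": 0.9753965},
        {"name": "indoor", "confidence": 0.9698432},
        {"name": "black", "confidence": 0.8994449},
        {"name": "laying", "confidence": 0.7277232},
        {"name": "white", "confidence": 0.6921732},
        {"name": "pug", "confidence": 0.6921732},
        {"name": "bulldog", "confidence": 0.11564628},
    ]
}


def confident_tags(result, threshold=0.5):
    """Return tag names whose confidence meets the threshold, highest first."""
    tags = sorted(result.get("tags", []), key=lambda t: t["confidence"], reverse=True)
    return [t["name"] for t in tags if t["confidence"] >= threshold]


print(confident_tags(analysis))                 # everything except "bulldog"
print(confident_tags(analysis, threshold=0.9))  # ['dog', 'sitting', 'indoor']
```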

Faces

Face detection is another feature of Computer Vision. This AI technology detects human faces within images and returns the face coordinates as well as each person’s gender and age.

Far Reach Partners

Faces

Index	Age	Gender	Top	Left	Width	Height
…	…	Male	167	558	…	…
…	…	Male	…	666	…	…
…	…	Female	117	311	…	…
…	…	Male	…	149	…	…
…	…	Female	126	459	…	…

Above are the results returned from Computer Vision for an image of the Far Reach partners. It detected all five partners’ faces and identified the appropriate gender for each. The estimated ages were accurate for a couple of the partners, but the others were off by 10 years. Some of us must be aging faster than others.
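For developers, a sketch of reading face results from a response might look like the following. The field names (faces, age, gender, faceRectangle) follow the Analyze Image response format as I understand it. The sample values are hypothetical placeholders, not the results for the photo above.

```python
# Hypothetical sample in the shape returned when visualFeatures includes Faces.
# These values are placeholders, not the actual results for the photo above.
sample = {
    "faces": [
        {"age": 40, "gender": "Male",
         "faceRectangle": {"left": 100, "top": 50, "width": 80, "height": 80}},
        {"age": 35, "gender": "Female",
         "faceRectangle": {"left": 300, "top": 60, "width": 70, "height": 70}},
    ]
}


def face_boxes(result):
    """Yield (gender, age, (x1, y1, x2, y2)) for each detected face."""
    for face in result.get("faces", []):
        r = face["faceRectangle"]
        # Convert left/top/width/height into two corner points.
        box = (r["left"], r["top"], r["left"] + r["width"], r["top"] + r["height"])
        yield face["gender"], face["age"], box


for gender, age, box in face_boxes(sample):
    print(f"{gender}, about {age}: box {box}")
```

Corner coordinates are often handier than left/top/width/height when drawing rectangles over the original image.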

Colors

Computer Vision can also analyze color schemes by extracting individual colors from an image. The colors are analyzed in three contexts: the foreground, the background, and the image as a whole. An accent color is also extracted; it represents the most visible color to users, based on a mix of dominant colors and saturation.

Flower
Pic: Instagrammer @snissenful

Tags

Name	Confidence
plant	0.964872241
flower	0.942176044
red	0.8848852
garden	0.133545712
flora	0.07647245
leaf	0.0756237

Category

Name	Confidence
plant_flower	0.85546875

Description

Caption	Confidence
a close up of a flower	0.983600438

Colors

Property	Value
Dominant color background	Green
Dominant color foreground	Red
Accent color	#C40724

In this example, Computer Vision determined a dominant background color of green, a dominant foreground color of red, and an accent color with the hex value of #C40724. Not bad!
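If you want to use the accent color in code, for example to theme a page around the image, a hex string usually has to be converted to RGB components first. The sketch below does that for the values in this example. The dictionary keys follow the color block of the Analyze Image response as I understand it, and the helper name hex_to_rgb is my own.

```python
def hex_to_rgb(accent):
    """Convert 'C40724' or '#C40724' to an (r, g, b) tuple of ints."""
    value = accent.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected a 6-digit hex color, got {accent!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


# Color results from the flower example, in the shape of the response's color block.
color = {
    "dominantColorBackground": "Green",
    "dominantColorForeground": "Red",
    "accentColor": "C40724",
}

print(hex_to_rgb(color["accentColor"]))  # (196, 7, 36)
```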

Thumbnails

Not all images are suited for all devices, so it’s sometimes necessary to generate different thumbnail sizes to provide a better user experience on certain devices. Computer Vision generates thumbnails by identifying a region of interest (ROI). Its thumbnail algorithm recognizes the main object, which is the region of interest, and removes distracting elements from the image. It then crops the image based on the identified ROI. Finally, it changes the aspect ratio to fit the target thumbnail dimensions.
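To illustrate the crop step, here is a simplified sketch. It is not the algorithm Computer Vision uses internally. Given an image size, a region of interest, and target thumbnail dimensions, it computes the largest crop with the target aspect ratio, centered on the ROI where possible and clamped to the image bounds. The function name and the sample numbers are assumptions.

```python
def crop_box(image_w, image_h, roi, target_w, target_h):
    """Return (left, top, width, height) of a crop matching the target aspect ratio.

    roi is (left, top, width, height). The crop is the largest box with the
    target aspect ratio that fits in the image, centered on the ROI where possible.
    """
    target_ratio = target_w / target_h
    if image_w / image_h > target_ratio:
        # Image is wider than the target: keep full height, trim the sides.
        crop_h = image_h
        crop_w = round(image_h * target_ratio)
    else:
        # Image is as tall as or taller than the target: keep full width.
        crop_w = image_w
        crop_h = round(image_w / target_ratio)

    roi_left, roi_top, roi_w, roi_h = roi
    center_x = roi_left + roi_w / 2
    center_y = roi_top + roi_h / 2

    # Center on the ROI, then clamp so the crop stays inside the image.
    left = min(max(round(center_x - crop_w / 2), 0), image_w - crop_w)
    top = min(max(round(center_y - crop_h / 2), 0), image_h - crop_h)
    return left, top, crop_w, crop_h


# A 1200x800 image with the subject right of center, cropped for a square thumbnail.
print(crop_box(1200, 800, (700, 200, 300, 300), 100, 100))  # (400, 0, 800, 800)
```

After cropping, the box would be scaled down to the target dimensions with any image library.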

Cat
Pic: Instagrammer @snissenful

In this example, the original image above results in the thumbnail images below. Each thumbnail has been automatically cropped to its target dimensions based on the region of interest, which in this case is the kitty in a bucket. This feature is referred to as smart cropping.
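A sketch of requesting smart-cropped thumbnails in several sizes through the Generate Thumbnail REST operation follows. The endpoint region, the v2.0 API version, the key, the image URL, and the three sizes are all placeholders chosen for illustration.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://westus.api.cognitive.microsoft.com"  # assumed region
SUBSCRIPTION_KEY = "<your-subscription-key>"             # placeholder


def build_thumbnail_request(image_url, width, height, smart_cropping=True):
    """Build (but do not send) a Generate Thumbnail request for one target size."""
    query = urllib.parse.urlencode({
        "width": width,
        "height": height,
        "smartCropping": str(smart_cropping).lower(),
    })
    return urllib.request.Request(
        url=f"{ENDPOINT}/vision/v2.0/generateThumbnail?{query}",
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# One request per target size. When sent, the response body is the binary image.
sizes = [(50, 50), (100, 200), (200, 100)]
thumbnail_requests = {
    size: build_thumbnail_request("https://example.com/cat.jpg", *size)
    for size in sizes
}
for size, req in thumbnail_requests.items():
    print(size, req.full_url)
```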

cat thumbnails
Pic: Instagrammer @snissenful

There are several other capabilities within Computer Vision beyond those described above. You can also…

Identify celebrities and landmarks utilizing models
Define your own models for performing custom image recognition
Extract text using optical character recognition (OCR) and handwriting recognition
Evaluate images for potential adult content to assist with content moderation
Determine other image metadata such as dimensions, image type, and format
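As one example from this list, a content moderation check might look like the sketch below. The field names follow the adult block returned when the Adult visual feature is requested, as I understand the response format. The sample scores, the helper name needs_review, and the thresholds are hypothetical.

```python
# Hypothetical sample in the shape of the response's "adult" block.
sample = {
    "adult": {
        "isAdultContent": False,
        "adultScore": 0.02,
        "isRacyContent": False,
        "racyScore": 0.61,
    }
}


def needs_review(result, threshold=0.5):
    """Flag an image when either flag is set or either score meets the threshold."""
    adult = result.get("adult", {})
    return (
        adult.get("isAdultContent", False)
        or adult.get("isRacyContent", False)
        or adult.get("adultScore", 0.0) >= threshold
        or adult.get("racyScore", 0.0) >= threshold
    )


print(needs_review(sample))                 # True: racyScore 0.61 >= 0.5
print(needs_review(sample, threshold=0.7))  # False
```

Applying your own threshold on top of the service's flags lets you send borderline images to a human moderator.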

A picture is worth a thousand words. And with the help of Computer Vision, developers can easily determine a few of those words through code.
