Azure provides a robust set of artificial intelligence (AI) features called Cognitive Services that developers can easily utilize within their own applications. The Cognitive Services offering is currently divided into five main categories: Vision, Speech, Language, Knowledge, and Search.
One of the AI services under the Vision category is Computer Vision. This service analyzes an image and extracts information about the content within the image.
Examples of what Computer Vision can do include:
- Tag visual items
- Categorize an image
- Describe an image
- Recognize faces within an image
- Evaluate colors and generate thumbnails
- Provide details for moderating content within images
Tags, Category, and Description
Computer Vision can tag visual items within an image based on more than 2,000 recognizable objects, such as people, scenery, and actions. Along with each tag, Computer Vision will include a confidence score. The results are sorted by the confidence score from highest to lowest.
In addition to tagging, it is also able to return taxonomy-based categories based on a list of 87 concepts, like faces, food, nature, and abstract.
Using the tags, Computer Vision will generate a description of the image displayed as human-readable text in a complete sentence.
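For the code-minded, here is a minimal sketch of what asking for these three features might look like over REST. It only builds the request and does not send it. The endpoint host, key, and image URL are placeholders, and the `v3.2` path segment, the `visualFeatures` parameter, and the `Ocp-Apim-Subscription-Key` header reflect my understanding of the REST API, so check them against the current reference for your resource.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_analyze_request(endpoint, key, image_url, features):
    """Build (but do not send) a POST request asking for the given visual features."""
    # Keep commas unescaped so the feature list stays readable in the URL.
    query = urlencode({"visualFeatures": ",".join(features)}, safe=",")
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"  # version path is an assumption
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


request = build_analyze_request(
    endpoint="https://example-resource.cognitiveservices.azure.com",  # placeholder
    key="<your-subscription-key>",  # placeholder
    image_url="https://example.com/pug.jpg",  # placeholder
    features=["Tags", "Categories", "Description"],
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` with a real endpoint and key would send it. The sketch stops short of that so it runs without an Azure subscription.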
Let’s look at an example:
Pic: Instagrammer @snissenful
Tags
Name | Confidence |
---|---|
dog | 0.99915123 |
sitting | 0.9753965 |
indoor | 0.9698432 |
black | 0.8994449 |
laying | 0.7277232 |
white | 0.6921732 |
pug | 0.6921732 |
bulldog | 0.11564628 |
Category
Name | Confidence |
---|---|
animal_dog | 0.98828125 |
Description
Caption | Confidence |
---|---|
a large black dog lying on the ground | 0.8761196 |
Above are the results returned from Computer Vision for an image of my pug, Jasmine. For the tags, it was able to identify a black and white dog lying indoors. Also, based on the confidence scores returned, it was able to determine the dog is most likely a pug. It returned the correct category of animal – dog, but the description of “a large black dog lying on the ground” isn’t quite accurate. It should be “a small, portly pug flopping on the couch.”
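In code, confidence scores like these are typically used as a cutoff. The sketch below is one way to do that, assuming the response is a dictionary with a `tags` list of `name` and `confidence` entries (that shape is my assumption; the values are copied from the tag table above). The 0.5 threshold is arbitrary.

```python
# Response fragment shaped like the tag results above (shape assumed, values from the table).
analysis = {
    "tags": [
        {"name": "dog", "confidence": 0.99915123},
        {"name": "sitting", "confidence": 0.9753965},
        {"name": "indoor", "confidence": 0.9698432},
        {"name": "black", "confidence": 0.8994449},
        {"name": "laying", "confidence": 0.7277232},
        {"name": "white", "confidence": 0.6921732},
        {"name": "pug", "confidence": 0.6921732},
        {"name": "bulldog", "confidence": 0.11564628},
    ]
}


def confident_tags(result, threshold=0.5):
    """Return tag names whose confidence meets the threshold, highest first."""
    tags = sorted(result.get("tags", []), key=lambda t: t["confidence"], reverse=True)
    return [t["name"] for t in tags if t["confidence"] >= threshold]


print(confident_tags(analysis))
# ['dog', 'sitting', 'indoor', 'black', 'laying', 'white', 'pug']
```

With this cutoff, "pug" stays and "bulldog" drops out, which matches the reading of the scores above.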
Faces
Face detection is another feature of Computer Vision. This AI technology provides the ability to detect human faces within images and return the face coordinates as well as each person’s gender and age.
Faces
index | age | gender | coordinates |
---|---|---|---|
0 | 52 | Male | top 167, left 558, width 65, height 65 |
1 | 51 | Male | top 77, left 666, width 64, height 64 |
2 | 43 | Female | top 117, left 311, width 54, height 54 |
3 | 34 | Male | top 69, left 149, width 52, height 52 |
4 | 49 | Female | top 126, left 459, width 50, height 50 |
Above are the results returned from Computer Vision for an image of the Far Reach partners. For the faces, it was able to detect all five faces of the partners and identify the appropriate gender. The estimated ages were accurate for a couple of partners, but the others were off by 10 years… some of us must be aging faster than others.
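The coordinates come back as top, left, width, and height, but many drawing and cropping tools want corner coordinates instead. Here is a small sketch of that conversion using the values from the table above. The `faceRectangle` key and the dictionary layout are my assumptions about the response shape, and `Box` and `to_box` are illustrative names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Corner-style rectangle: (left, top, right, bottom)."""
    left: int
    top: int
    right: int
    bottom: int


def to_box(face_rectangle):
    """Convert a top/left/width/height rectangle into corner coordinates."""
    left = face_rectangle["left"]
    top = face_rectangle["top"]
    return Box(left, top, left + face_rectangle["width"], top + face_rectangle["height"])


# Values copied from the faces table above; the key names are assumed.
faces = [
    {"age": 52, "gender": "Male", "faceRectangle": {"top": 167, "left": 558, "width": 65, "height": 65}},
    {"age": 51, "gender": "Male", "faceRectangle": {"top": 77, "left": 666, "width": 64, "height": 64}},
    {"age": 43, "gender": "Female", "faceRectangle": {"top": 117, "left": 311, "width": 54, "height": 54}},
    {"age": 34, "gender": "Male", "faceRectangle": {"top": 69, "left": 149, "width": 52, "height": 52}},
    {"age": 49, "gender": "Female", "faceRectangle": {"top": 126, "left": 459, "width": 50, "height": 50}},
]

boxes = [to_box(face["faceRectangle"]) for face in faces]
# Order the faces as they appear in the photo, left to right.
left_to_right = sorted(range(len(faces)), key=lambda i: boxes[i].left)

print(boxes[0])        # Box(left=558, top=167, right=623, bottom=232)
print(left_to_right)   # [3, 2, 4, 0, 1]
```

The `(left, top, right, bottom)` order is the one Pillow's `Image.crop` expects, so these boxes could be passed to it directly to cut out each face.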
Colors
Computer Vision can also perceive color schemes using an algorithm that is capable of extracting individual colors from an image. The colors are analyzed in three different contexts: foreground, background, and as a whole. An accent color is extracted from an image and represents the most visible color to users through a mix of dominant colors and saturation.
Pic: Instagrammer @snissenful
Tags
Name | Confidence |
---|---|
plant | 0.964872241 |
flower | 0.942176044 |
red | 0.8848852 |
garden | 0.133545712 |
flora | 0.07647245 |
leaf | 0.0756237 |
Category
Name | Confidence |
---|---|
plant_flower | 0.85546875 |
Description
Caption | Confidence |
---|---|
a close up of a flower | 0.983600438 |
Colors
Dominant color background | Green |
Dominant color foreground | Red |
Accent color | #C40724 |
In this example, Computer Vision determined a dominant background color of green, a dominant foreground color of red, and an accent color with the hex value of #C40724. Not bad!
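An accent color given as a hex value usually needs to be split into red, green, and blue channels before it can be used in styling or drawing code. A minimal sketch of that conversion follows; `hex_to_rgb` is an illustrative helper of my own, not part of the service.

```python
def hex_to_rgb(hex_color):
    """Convert a six-digit hex color such as '#C40724' into an (R, G, B) tuple."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected six hex digits, got {hex_color!r}")
    # Each channel is two hex digits: positions 0-1, 2-3, and 4-5.
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#C40724"))  # (196, 7, 36)
```

The red channel dominates by a wide margin, which lines up with the red foreground color reported for the flower.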
Thumbnails
Not all images are suited for all devices. Therefore, it’s sometimes necessary to generate different thumbnail sizes to provide a better user experience on certain devices. Computer Vision is able to generate thumbnails by identifying a region of interest (ROI). It uses a thumbnail algorithm to recognize the main object, which is the region of interest, and remove distracting elements from the image. It then crops the image based on the identified ROI. And finally, it changes the aspect ratio to fit the target thumbnail dimensions.
Pic: Instagrammer @snissenful
In this example, the original image seen above results in the thumbnail images below. The thumbnail images have been automatically cropped according to the different target thumbnail dimensions based on the region of interest, which in this case is the kitty in a bucket in the picture. This feature is referred to as smart cropping.
Pic: Instagrammer @snissenful
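To make the crop step concrete, here is a rough sketch of the geometry involved. This is my own simplification and not the service's actual algorithm: it takes the largest crop with the target aspect ratio, centers it on the region of interest, and shifts it back inside the image where needed. The image size and ROI below are made-up numbers for illustration.

```python
def crop_for_thumbnail(image_w, image_h, roi, target_w, target_h):
    """Return (left, top, right, bottom) for a crop with the target aspect ratio.

    roi is (left, top, width, height). The crop is centered on the ROI as far
    as the image bounds allow.
    """
    target_ratio = target_w / target_h
    # Largest rectangle with the target aspect ratio that fits inside the image.
    if image_w / image_h > target_ratio:
        crop_h = image_h
        crop_w = round(image_h * target_ratio)
    else:
        crop_w = image_w
        crop_h = round(image_w / target_ratio)
    # Center the crop on the middle of the region of interest.
    roi_left, roi_top, roi_w, roi_h = roi
    left = round(roi_left + roi_w / 2 - crop_w / 2)
    top = round(roi_top + roi_h / 2 - crop_h / 2)
    # Shift the crop back inside the image if it spills over an edge.
    left = max(0, min(left, image_w - crop_w))
    top = max(0, min(top, image_h - crop_h))
    return left, top, left + crop_w, top + crop_h


# Hypothetical 1000x800 image with the subject toward the upper right.
roi = (600, 200, 200, 300)
print(crop_for_thumbnail(1000, 800, roi, 100, 100))  # (200, 0, 1000, 800)
print(crop_for_thumbnail(1000, 800, roi, 200, 100))  # (0, 100, 1000, 600)
```

The same ROI yields a different crop for each target shape. Scaling that crop down to the target dimensions would be the final step.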
There are several other capabilities within Computer Vision beyond those described above. You can also…
- Identify celebrities and landmarks using domain-specific models
- Define your own models for performing custom image recognition
- Extract text using optical character recognition (OCR) and handwriting recognition
- Evaluate images for potential adult content to assist with content moderation
- Determine other image metadata such as dimensions, image type, and format
A picture is worth a thousand words. And with the help of Computer Vision, developers can easily determine a few of those words through code.