An Adventure with Cognitive Image Tagging

At Verndale, we decided to create an experimental Sitecore module that uses Artificial Intelligence. It was an interesting exercise that shows both the power and the limits of AI. The code from this experiment is available online, and the Sitecore module can even be installed via NuGet.

The Use Case

Although Alternate Text is required by SEO and Accessibility guidelines, Content Authors often neglect to provide it for images displayed on their website. What if you could use Artificial Intelligence to automatically generate Alt Text for Image Items stored in the Sitecore Media Library?

We decided to handle this need with three distinct features:

When an author uploads an Image to the Media Library, supply an Alt Text value as part of the upload process.
When an author selects an Image Item in the Media Library, provide a button that lets the author request Alt Text from AI, provided no Alt Text has already been entered.
When an author selects an Image Item in the Media Library, provide a button that lets the author override any existing Alt Text value with one from AI.

Breaking Down the Work

To allow for maximum flexibility, the work was broken down into two discrete libraries:

Verndale.CognitiveImageTagging

This library handles the actual connection to Azure Cognitive Services. It specifies configuration file settings that can be used to define one or more endpoints (as well as subscriptions). This allows developers to have one account for local testing and another account for production use.
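To make the "one account for local testing, another for production" idea concrete, here is a minimal, language-neutral sketch in Python. It is not the library's actual configuration format: the `VisionEndpoint` class, the `ENDPOINTS` table, the endpoint names, and the placeholder URLs and keys are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionEndpoint:
    """One named connection to an image analysis account (hypothetical shape)."""
    name: str              # label chosen by the developer, e.g. "local"
    url: str               # base URL of the service resource
    subscription_key: str  # key for the subscription behind that URL


# Placeholder values only; real URLs and keys come from your own Azure account.
ENDPOINTS = {
    "local": VisionEndpoint("local", "https://example-dev.invalid/vision", "DEV-KEY"),
    "production": VisionEndpoint("production", "https://example-prod.invalid/vision", "PROD-KEY"),
}


def get_endpoint(name: str) -> VisionEndpoint:
    """Return the endpoint registered under `name`, or fail with a clear error."""
    try:
        return ENDPOINTS[name]
    except KeyError:
        raise ValueError(f"No endpoint configured with name {name!r}") from None
```

Calling `get_endpoint("local")` on a developer machine and `get_endpoint("production")` on the live site keeps test traffic and test billing away from the production subscription.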

While Azure offers a decent number of API calls regarding image analysis, we opted for the two that would provide the most useful content for Image Alt Text.

The Description service provides “captions” and “tags”. Captions are usually a single sentence that describes the image. Each caption has a “confidence” score that the AI uses to tell you how accurate it thinks the description is. In our library we set a confidence threshold and only accept the highest-confidence caption above that threshold. Tags are single words or short phrases that apply to the image. Tags don’t come with a confidence score, but my experimentation shows that the deeper into the list a tag is, the less likely it is to apply.
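The caption rule can be sketched in a few lines. This is an illustration in Python, not the library's code: the `best_caption` function is hypothetical, the `sample` data is invented, and the dictionary keys (`captions`, `text`, `confidence`, `tags`) are my assumption about the shape of the description result. Treating a score exactly equal to the threshold as passing is also an assumption.

```python
from typing import Optional


def best_caption(description: dict, min_confidence: float) -> Optional[str]:
    """Return the highest-confidence caption at or above the threshold, else None."""
    captions = description.get("captions", [])
    # Keep only captions the service is confident enough about.
    eligible = [c for c in captions if c["confidence"] >= min_confidence]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c["confidence"])["text"]


# Invented sample data in the assumed response shape.
sample = {
    "tags": ["outdoor", "grass", "dog"],
    "captions": [
        {"text": "a dog sitting on grass", "confidence": 0.82},
        {"text": "a dog in a field", "confidence": 0.64},
    ],
}
```

With a threshold of 0.75, `best_caption(sample, 0.75)` returns the first caption; raise the threshold to 0.9 and it returns `None`, which is the signal to fall back to tags.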

A secondary service provides Optical Character Recognition. We use this service to extract text from within the image, which will then be assigned as Alt text.

The OCR service is only called if the Description service reports that the image likely contains text.

Verndale.Feature.CognitiveImageTagging

The Helix nomenclature should give away the fact that this is the Sitecore module. It has a NuGet dependency on the base library, adding connections to the Sitecore content entry interface to allow content authors access to AI.

Once the module is installed, every image uploaded to the Sitecore media library is analyzed and given a default Alt Text value provided by Azure Cognitive Services.

When a content author selects an Image Item in the media library, additional context buttons are available to allow them to add Alt Text at will. A “safe” button that will not override any existing Alt Text is provided, along with an “override” button that will replace any existing Alt Text value with whatever Azure decides to send.
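The difference between the two buttons comes down to a single condition. Here is a hedged sketch in Python rather than the module's Sitecore command code; `apply_alt_text` is a hypothetical name, and keeping the existing value when the AI returns nothing is my assumption, not documented module behavior.

```python
def apply_alt_text(existing: str, suggested: str, override: bool) -> str:
    """Decide which Alt Text value ends up on the image item.

    override=False models the "safe" button, override=True the "override" button.
    """
    # Assumption: an empty suggestion never wipes out what is already there.
    if not suggested.strip():
        return existing
    # The safe button only fills in a blank field; the override button always wins.
    if override or not existing.strip():
        return suggested
    return existing
```

For example, `apply_alt_text("Our CEO at the 2019 summit", "a man in a suit", override=False)` leaves the author's text alone, while the same call with `override=True` replaces it with the AI's suggestion.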

How the Module decides what text to put in the Alt Text field:

If the image contains text, the image is passed through OCR, and the resulting value is used for the Alt Text field.
If image analysis produces Caption text, the highest confidence Caption will be used, as long as it is above the minimum confidence threshold.
If no caption is valid, the Tags are written to the Alt Text field, separated by commas.
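The three rules above can be sketched as one function. This is a Python illustration under stated assumptions, not the module's implementation: `choose_alt_text` and its parameters are hypothetical, captions are modeled as (text, confidence) pairs, a score equal to the threshold counts as passing, and tags are joined with a comma followed by a space.

```python
from typing import Optional, Sequence, Tuple


def choose_alt_text(ocr_text: Optional[str],
                    captions: Sequence[Tuple[str, float]],
                    tags: Sequence[str],
                    min_confidence: float) -> str:
    """Pick an Alt Text value using the OCR, then caption, then tags order."""
    # 1. Text found inside the image takes priority.
    if ocr_text and ocr_text.strip():
        return ocr_text.strip()
    # 2. Otherwise use the highest-confidence caption that clears the threshold.
    eligible = [c for c in captions if c[1] >= min_confidence]
    if eligible:
        return max(eligible, key=lambda c: c[1])[0]
    # 3. Otherwise fall back to the tags, separated by commas.
    return ", ".join(tags)
```

A caption scored at 0.5 is used when the threshold is 0.5, but with the threshold at 0.75 the same image falls through to its tags, for example `"person, beach, water"`.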

Experiments

Messing around during testing showed that our default Caption confidence level is on the high side. Here’s an image with an example caption. The caption is sorta OK, but pretty generic. AI was only 50% sure about this sentence.

[Image: sample photo with its AI-generated caption (not preserved)]

If we turn the Caption confidence threshold up to 75%, the caption is rejected and we get these tags instead. Again, they’re sorta OK, but a little contrived.

[Image: the photo with its AI-generated tags (not preserved)]

I love how the AI seems to make up its own little story about what’s going on in the picture.

Here’s an example of a particularly zealous caption:

[Image: photo with an overzealous AI-generated caption (not preserved)]

I took this photo, and I can verify it’s not the Champlin Fighter Museum.

Perhaps the most shocking result came from processing a published image off the internet. I can’t post the photo here for copyright reasons, but the AI was able to identify all people in the photo and even identified the band they played in. My hypothesis is that captions are not random, but are pulled from “similar” pictures that actually have Alt Text already defined on them.

Lessons Learned

Wiring up an AI request to a few Command buttons in Sitecore was trivial. Sandy Foley had already done the hard work of writing the interface to Azure Cognitive Services, although I had to go back into the library and simplify the calls. Sandy was following Azure examples that use modern C# asynchronous techniques, which are not supported by Sitecore’s 15-year-old Sheer UI API. After I spent a day in Dinosaur land replacing HttpClient with HttpRequest, Sitecore’s Command buttons accepted their new job.

Contacting AI to analyze images takes a few seconds. Plan your install accordingly. I should also mention that it usually takes two or three HTTP requests out to Azure to fully analyze an image (which may contain text), and Azure isn’t free, although it’s pretty affordable. For testing you could probably get away with $5/mo or less.

Does Azure Cognitive Services produce good Alt Text?

No.

Almost Never.

At best, the captions are generalized or tangentially related to the actual image. Good Alt Text is incredibly descriptive. Remember, Alt Text is basically Closed Captioning for the blind. You want to include who, what, when, where, why and how, and AI won’t do that. AI also won’t use curated SEO keywords. Humans are definitely a better solution.

AI captions could get you in legal trouble.

While playing around, I found that AI could not reliably identify images of people, particularly if the image was a close-up, an action shot, or had post-processing applied. It identified one of my selfies as a “cat”.

If you’re going to use this library, I strongly recommend you implement some workflow checkpoints to ensure each and every Alt Text value is reviewed for suitability, to avoid any problems involving SEO or lawyers.

Next Steps

You can deploy this Sitecore Module to any Sitecore 9.1 Solution via NuGet:

https://www.nuget.org/packages/Verndale.Feature.CognitiveImageTagging/

The source code for the Sitecore module is here:

https://github.com/verndale/verndale-sitecore-cognitiveimagetagging

If you just want to play with Azure Cognitive Services, the code (including a console test app) is here:

https://github.com/verndale/verndale-cognitiveimagetagging

You will need an Azure subscription, and you will have to add Cognitive Services to it in order to run either library.

https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/

While not exactly practical, it was fun to play with artificial intelligence. You should definitely try it out.

Constellation for Sitecore
