Computer Vision and Machine Learning in IMatch
Modern AI-based technology makes it possible to automatically add descriptions and keywords to images, to detect persons and objects, and to recognize faces, ethnicity, age, gender and sentiment.
While surely not useful for all IMatch users, the possibilities and the time-saving potential are tantalizing.
Especially if you have 50,000+ untagged files sitting in your digital asset management system, waiting for manual keyword input and processing. And for corporate users with large heaps of totally unorganized files.
Using machine intelligence to do the grunt work can save days or even weeks of manual labor. Combine that with the powerful features in IMatch for organizing images and you get awesome results in almost no time.
Scroll down to see the results.
See the thread More Artificial Intelligence… in the IMatch community for additional information and background.
Many companies offer AI or computer vision services of some sort. This is currently a very hot topic and things are moving fast. Some things work well, others do not. We therefore invested some time in trying out a selection of the available services to see how they could be integrated into IMatch and IMatch Anywhere™.
From the vast number of service providers we picked the following players:
The groundwork we did for the IMatch scripting system for these tests allows us to add additional vendors later with ease.
We created a set of 200 representative sample images from the IMatch marketing collection (commercial stock photos) and Creative Commons-licensed images downloaded from Pixabay. We tried to create a good set of samples, ranging from typical family and vacation photos to product, wildlife and food photography.
The services provided by each vendor differ. Here is an overview:
- Tagging
Providing one or more tags (non-hierarchical keywords) for an image.
- Categorization
Assigning one or more categories from a fixed set of categories to an image.
- Face Detection
Finding faces in images. Optionally detecting gender, age, ethnicity and sentiment.
- Face Recognition
Recognizing persons in images based on a previously created database of persons
- Special Services
Some vendors offer special services for food (ingredients recognition), apparel, celebrities, ‘adult’ content detection, image ‘quality’, ‘focus’, ‘sharpness’, text recognition (OCR) etc.
What Was Tested
For this initial test we concentrated on keywords and descriptions. Note: currently only Microsoft offers automatic descriptions.
How Would This be Used in IMatch?
IMatch would use the tags (keywords) returned by the service, optionally mapping them via your thesaurus to transform the flat keywords into proper hierarchical keywords. IMatch could also use the categories returned by some services to categorize images, or use the categories as additional keywords.
IMatch would provide additional features like filtering, unifying, replacing or ignoring the delivered keywords. This gives you fine control over how the keywords provided by the service are applied to your files.
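To make the idea more concrete, here is a minimal Python sketch of how flat tags from a service could be replaced, ignored, unified and mapped to hierarchical keywords. It is not how IMatch implements this. The function name `apply_keyword_rules`, the sample thesaurus and the `|` separator between hierarchy levels are assumptions made purely for illustration.

```python
def apply_keyword_rules(tags, thesaurus, replacements=None, ignored=None):
    """Map flat tags to hierarchical keywords (illustrative sketch only).

    tags:         flat tags as returned by a tagging service
    thesaurus:    hypothetical mapping from a flat tag to a hierarchical keyword
    replacements: optional mapping from a tag to the tag that should replace it
    ignored:      optional collection of tags to drop entirely
    """
    replacements = replacements or {}
    ignored = {t.lower() for t in (ignored or [])}
    result = []
    seen = set()
    for tag in tags:
        key = tag.strip().lower()          # unify case and whitespace
        key = replacements.get(key, key)   # replace, e.g. "puppy" -> "dog"
        if not key or key in ignored:      # skip empty and ignored tags
            continue
        keyword = thesaurus.get(key, key)  # fall back to the flat tag
        if keyword not in seen:            # drop duplicates, keep order
            seen.add(keyword)
            result.append(keyword)
    return result


# Hypothetical thesaurus; "|" as level separator is an assumption.
THESAURUS = {
    "dog": "animals|mammals|dog",
    "beach": "places|coast|beach",
}

keywords = apply_keyword_rules(
    ["Dog", "puppy", "beach", "photo", "dog"],
    THESAURUS,
    replacements={"puppy": "dog"},
    ignored=["photo"],
)
print(keywords)
```

Tags without a thesaurus entry are kept as flat keywords in this sketch, so nothing delivered by the service is lost silently.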
The descriptions returned by Microsoft Azure (and maybe other services in the future) could be used to automatically fill in the XMP title, headline or description metadata fields.
Detected faces would be automatically imported into the XMP face regions and IMatch Face Annotations. The gender, ethnicity and mood (where provided) could be used for additional keywords or categories. If face recognition is available and enabled, the person names would become available for face regions and IMatch Face Annotations. “Show me all files of Aunt Esmeralda” would become an easy thing to do. Or “Show me all files of Arthur with Trillian”.
How Much Does it Cost?
This is the sad part. Google, Microsoft and all the other vendors don’t work for free 😉
The typical cost for tagging 1000 files is about 1 US$. If you want tags and faces, it will cost you about 2 US$ per 1000 files. All vendors offer a generous amount of free ‘calls’ per month (good for 1000 to 5000 files).
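As a rough back-of-the-envelope aid, the arithmetic can be sketched in a few lines of Python. The function name `estimate_cost_usd` and its parameters are made up for this example, the prices are the approximate figures quoted above, and real vendor pricing differs in detail (tiers, per-feature rates, monthly free quotas).

```python
def estimate_cost_usd(num_files, usd_per_1000=1.0, free_files=0):
    """Rough cost estimate, not a vendor price calculator.

    usd_per_1000: approximate price per 1000 files
                  (about 1.0 for tags, about 2.0 for tags and faces)
    free_files:   number of files covered by the free monthly calls
    """
    billable = max(0, num_files - free_files)
    return billable * usd_per_1000 / 1000


# Tags only, ignoring the free tier.
print(estimate_cost_usd(50_000))
# Tags and faces, with 1000 files covered by the free calls.
print(estimate_cost_usd(50_000, usd_per_1000=2.0, free_files=1000))
```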
Is it Worth It?
Yes, but it depends. If you shoot mostly family photos or travel subjects, you can tag and describe files very efficiently with the tools provided by IMatch.
However, if you start fresh with a huge backlog of images to tag and annotate, spending 50 bucks to get 50,000 files tagged by a machine can be well worth it. You can also mix keywords assigned by you (primary keywords, best quality) with keywords provided by an AI to get an extra layer of conceptual keywords. Great for search functions.
Privacy and Legal Issues
If you plan to use AI-based services like this, you have to be aware of the privacy issues. You are uploading your files to a computer system in another country. This means that this company can see your images. In theory, at least. Many of the vendors also reserve the right (in the small print in their usage agreements) to retain copies of your files or fingerprints to further improve their technology.
If you process images showing people, uploading them to a 3rd party server often requires an explicit written agreement by every person shown. This has become more complicated and serious since the introduction of the GDPR in Europe. Most AI vendors operate their servers in the U.S., which is no longer considered a safe haven for data by many.
Please find below the results for our sample set of 200 files. We’ve used the services to add keywords (tags) to each file. For Microsoft Azure we’ve also used the option to automatically describe images.
Before you open the result documents, consider how long it would take you to add keywords to 200 very different images. Or 1,000. Or 50,000…
For this test we used the free tiers of all services and limited the speed to one file per second. It took less than 4 minutes to add keywords to all 200 files.
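At one file per second, 200 files take roughly 200 seconds, or about 3.3 minutes. A throttle like this can be sketched in Python as follows. This is not the script used for the test. `process_throttled` and the `handler` callback (which would upload one file and return its tags) are hypothetical names, and the `sleep` and `clock` parameters exist only so the loop can be tried out without real waiting.

```python
import time


def process_throttled(files, handler, min_interval=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Call handler(file) for each file, starting at most one call
    per min_interval seconds (illustrative sketch only)."""
    files = list(files)
    results = []
    for index, path in enumerate(files):
        started = clock()
        results.append(handler(path))
        if index < len(files) - 1:
            # Wait for whatever is left of the interval before the next call.
            remaining = min_interval - (clock() - started)
            if remaining > 0:
                sleep(remaining)
    return results


# Stand-in handler: a real one would call the vendor's service.
def fake_tagger(path):
    return {"file": path, "tags": []}


# A short interval keeps the demo quick; the test above used one second.
demo = process_throttled(["a.jpg", "b.jpg", "c.jpg"], fake_tagger,
                         min_interval=0.01)
print(len(demo), "files processed")
```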
These PDF files are fairly large (5 to 10 MB), so please don’t download them over a mobile connection.