I started by writing a common framework for "AI Vision" services. This allows me to handle the substantial differences between the many vendors and to deal with tags, labels, categories, faces, landmarks etc. in a unified way. I then wrote a base adapter class and adapters for the services I've tried so far. They all produce a "model" that describes the image. This model can then be used to perform a wide range of useful tasks inside IMatch.
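To give a rough idea, here is a minimal sketch of such an adapter layer. All names and types are illustrative assumptions, not the actual code:

```typescript
// Minimal sketch of a unified "AI Vision" adapter layer.
// All names here are illustrative, not the actual implementation.

// Placeholder for the unified result; a fuller version is
// sketched after the capability list below.
interface VisionModel {
  labels: { name: string; confidence: number }[];
}

// Every service-specific adapter translates the vendor's request and
// response formats into the one common model.
abstract class VisionServiceAdapter {
  abstract readonly serviceName: string;

  // Analyze one image and return the unified model.
  // 'features' selects what to request (labels, faces, landmarks, ...).
  abstract analyze(imageData: Buffer, features: string[]): Promise<VisionModel>;
}

// Skeleton of an adapter for one vendor.
class AzureVisionAdapter extends VisionServiceAdapter {
  readonly serviceName = 'Azure Vision';

  async analyze(imageData: Buffer, features: string[]): Promise<VisionModel> {
    // Call the vendor's REST API here, then map its response
    // into the vendor-neutral VisionModel.
    throw new Error('not implemented in this sketch');
  }
}
```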

There are some services which deliver landmarks, e.g. the name of a place, building or landmark ("Houses of Parliament", "Niagara Falls", "Palace of Versailles", ...).
Some services can categorize images based on a predefined category hierarchy.
Some services produce often amazingly accurate descriptions of the image ("small brown dog playing with a ball in the garden", "tourists standing in front of the Brandenburger Tor in Berlin").
Some services can detect faces in images. This is great for automatically producing face annotations in images.
Some services also return the ethnicity per face (quite reliable), the gender (quite reliable) and the age (sometimes off by 20 years).
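The unified per-image "model" mentioned above could then collect all of these results in one structure. Roughly like this (field names are my assumptions for illustration):

```typescript
// Sketch of the unified per-image "model"; field names are assumptions.
interface VisionModel {
  labels: { name: string; confidence: number }[]; // generic tags/labels
  categories: string[];    // entries from the service's category hierarchy
  descriptions: string[];  // natural-language captions
  landmarks: string[];     // e.g. "Houses of Parliament", "Niagara Falls"
  faces: FaceInfo[];
}

interface FaceInfo {
  // Normalized bounding box (0..1), usable for face annotations.
  boundingBox: { x: number; y: number; width: number; height: number };
  gender?: string;     // quite reliable
  ethnicity?: string;  // quite reliable
  age?: number;        // sometimes off by 20 years
  personName?: string; // filled in by face *recognition*, see below
}
```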
Some services also offer face recognition, for example Amazon AWS. Here, a face is trained by uploading some samples. Amazon calculates a mathematical model representing the face, but does not store the photo itself. When you later process an image of the same person, AWS can tell you that this is "Uncle Bob". An app could do that, using existing face annotations etc. This plays well with a major feature that will be integrated into IMatch 2018...
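As a hedged illustration of this train/recognize cycle with AWS Rekognition (using the AWS SDK for JavaScript; the collection and person names are examples only, and error handling is omitted):

```typescript
// Sketch of face "training" and recognition with AWS Rekognition.
// Collection and person names are examples only.
import { Rekognition } from 'aws-sdk';

const rekognition = new Rekognition({ region: 'us-east-1' });

// One-time setup: a collection holds the mathematical face models.
// AWS stores only these models, not the uploaded photos themselves.
async function setup(collectionId: string): Promise<void> {
  await rekognition.createCollection({ CollectionId: collectionId }).promise();
}

// "Train" a face by indexing sample photos of the person.
async function trainFace(collectionId: string, personId: string, jpeg: Buffer) {
  await rekognition.indexFaces({
    CollectionId: collectionId,
    ExternalImageId: personId, // e.g. 'Uncle_Bob' (no spaces allowed here)
    Image: { Bytes: jpeg },
  }).promise();
}

// Later: does a new photo show a face we already know?
async function recognize(collectionId: string, jpeg: Buffer) {
  const res = await rekognition.searchFacesByImage({
    CollectionId: collectionId,
    Image: { Bytes: jpeg },
    FaceMatchThreshold: 90, // only return confident matches
  }).promise();
  return (res.FaceMatches ?? []).map(m => m.Face?.ExternalImageId);
}
```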
Of course, what a user wants to be detected in an image is configurable. If you only want labels, you only get labels. This is also a question of cost, naturally.
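In terms of the adapter sketched earlier, this could be as simple as a small per-user configuration (names are assumptions):

```typescript
// Hypothetical per-user configuration: request only the features you
// actually want, and pay only for those.
interface VisionConfig {
  features: ('labels' | 'description' | 'faces' | 'landmarks' | 'categories')[];
}

// A labels-only setup: nothing else is requested from or billed by the service.
const labelsOnly: VisionConfig = { features: ['labels'] };
```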
What they all have in common is that they deliver a set of "labels" per image.
This would be the main source for creating keywords. To make these simple labels more useful, IMatch or the app would have to perform a number of additional steps, e.g.
unifying the keywords or mapping them into the keyword thesaurus to produce proper hierarchical keywords. But also cleanup tasks like applying a user-defined "skip list" for keywords which are known to be wrong or unwanted (your "potted plant" for example, or the "no person" keyword returned by one service for all images without a detected face).
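A sketch of what these steps could look like, assuming a simple skip list and a flat-label-to-hierarchy mapping (the thesaurus entries are made up for illustration):

```typescript
// Sketch of label post-processing: skip list plus thesaurus mapping.
// The entries below are made up for illustration.

// User-defined skip list for labels known to be wrong or unwanted.
const skipList = new Set(['potted plant', 'no person']);

// Maps a flat label onto a hierarchical keyword in the thesaurus.
const thesaurus: Record<string, string> = {
  dog: 'Animals|Mammals|Dog',
  ball: 'Objects|Toys|Ball',
};

function labelsToKeywords(labels: { name: string; confidence: number }[]): string[] {
  return labels
    .filter(l => !skipList.has(l.name.toLowerCase()))     // drop unwanted labels
    .map(l => thesaurus[l.name.toLowerCase()] ?? l.name); // map to hierarchy, else keep flat
}

// Example: labels 'dog', 'ball', 'potted plant' become
// ['Animals|Mammals|Dog', 'Objects|Toys|Ball']
```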
The beauty is: we can do that easily in an app. Once the model for the image has been produced (using whatever service), all the info is there. All that needs to be done is to use the endpoints offered by IMatch/IMWS to create keywords, set the title or description, produce categories or face annotations etc.
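For illustration only, that final step could look roughly like this; the endpoint path, port and payload here are my placeholders, not the documented IMWS API:

```typescript
// Illustration only: pushing the produced keywords back through an
// IMWS-style REST endpoint. Path, port and payload are placeholders,
// NOT the documented IMWS API.
async function applyKeywords(fileId: number, keywords: string[]): Promise<void> {
  await fetch('http://127.0.0.1:8080/v1/keywords', { // hypothetical host/endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ id: fileId, keywords }),
  });
}
```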
Of course many users will have no use for that. If you only shoot family and friends, you can add keywords and descriptions quickly using the features provided by IMatch.
Others may find it worth a few dollars to automatically add keywords and descriptions to 10,000 files in a batch.
Or to have an "index" of all the persons in their images. Or of objects like cars, boats, fruit, wedding motifs, landscape motifs, ...
It all depends on the user. If the user does not need it, it does not interfere or cost money.
But integrating these technologies into IMatch and IMatch Anywhere is important. I try to keep the effort on my part low, though.
It takes a day or two to learn how something like Azure Vision or AWS Rekognition works and to write the "adapter" which links my framework to the new service.
Much is learned and gained, for not too much effort.
The technology used for this could later also be used to interface with cloud-based storage, to implement "uploader" features for services like FB, Twitter, Instagram, whatever. Even, gosh, to connect to an IMatch database which runs in the cloud...
