The Help notes that including variables (e.g., Description, location tags, etc.) can improve the accuracy of the AI responses. At this point I'm mostly looking to develop more descriptive descriptions, keywords, landmarks, scientific names, and a determination of whether the image is B&W or not, using the custom traits AI.ScientificName [Return the scientific name of the object shown in this image] and AI.BlackAndWhite [If this image is monochrome, respond with 'black and white' else return nothing].
Current working prompt (mostly for LM Studio, but also for exploring OpenAI):
{File.MD.description}.
Describe this image in the style of a news caption. Use factual language. This image was taken in {File.MD.Composite\MWG-Location\Location\0}, {File.MD.city}, {File.MD.state}, {File.MD.country}.
Return five to ten keywords for this image.
If this image is monochrome, respond with 'black and white' else return nothing.
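For illustration, once the variables are filled in, the rendered prompt for a single image might look something like this (all values below are made up; the real ones come from each file's metadata):

A view of the restored 1905 grist mill on the east bank of the river.
Describe this image in the style of a news caption. Use factual language. This image was taken in Riverside Park, Springfield, Oregon, United States.
Return five to ten keywords for this image.
If this image is monochrome, respond with 'black and white' else return nothing.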
Observations: Including the description variable helps accuracy considerably (of course!), especially when there are similar but different characteristics in the image (e.g., two somewhat similar-looking buildings). Location data variables (country, state, city and location) also help AI accuracy (e.g., to distinguish between the similar-looking buildings). Sometimes the AI does have glitches or comes up with incorrect responses, so this is something to check and not blindly rely on. Nevertheless, overall accuracy, including landmark recognition, is pretty good.
I need to continue going through the extensive AI help to figure out how best to apply these tools.
Question: Do any of the AI models available in IMatch take advantage of GPS coordinates? That might help with using AI for images that lack other location data or descriptions.
Quote from: jch2103 on March 27, 2025, 12:10:02 AM
Question: Do any of the AI models available in IMatch take advantage of GPS coordinates? That might help with using AI for images that lack other location data or descriptions.
Have you tried it? Like giving the AI context information like "This image was taken at the GPS coordinates latitude nn, longitude nn, consider this when you analyze the image"?
The result of this will of course depend on the model you use. It's more likely to work with the huge cloud-based AIs.
I've run some experiments with Gemini recently, prompting it for GPS coordinates. But the results I got were usually several hundred meters off, even for well-known places in Paris and London.
I had tried this:
{File.MD.description}.
Describe this image in the style of a news caption. Use factual language. This image was taken in {File.MD.Composite\MWG-Location\Location\0}, {File.MD.city}, {File.MD.state}, {File.MD.country}.
This image has the coordinates {File.MD.Composite\GPS-GPSLatitude\GPSLatitude\0} and {File.MD.Composite\GPS-GPSLongitude\GPSLongitude\0}.
Return five to ten keywords for this image.
If this image is monochrome, respond with 'black and white' else return nothing.
for an image with no metadata except dates and GPS coordinates. The AI figured out that the image was in Arizona, but was no more specific. When I reverse-geocoded the image and used the same prompt, it returned information down to the street (but not the address). Same for OpenAI.
I substituted your GPS prompt (without reverse geocoding):
{File.MD.description}.
Describe this image in the style of a news caption. Use factual language. This image was taken in {File.MD.Composite\MWG-Location\Location\0}, {File.MD.city}, {File.MD.state}, {File.MD.country}.
This image was taken at the GPS coordinates {File.MD.Composite\GPS-GPSLatitude\GPSLatitude\0} and {File.MD.Composite\GPS-GPSLongitude\GPSLongitude\0}. Consider this when you analyze the image.
Return five to ten keywords for this image.
If this image is monochrome, respond with 'black and white' else return nothing.
This return was much more specific (the subject was a roadside attraction with dinosaur sculptures in Yermo, California), with some expected variations in different runs. My takeaway: Prompt phrasing does seem to make a difference (as expected).
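For anyone who wants to poke at the same idea outside of IMatch: the prompt-plus-context approach can be tested directly against LM Studio's OpenAI-compatible local endpoint. This is only a sketch; the port, model name, image path, location text and coordinates are placeholders, and AutoTagger handles all of this for you internally.

```python
import base64
from openai import OpenAI  # pip install openai; also works against LM Studio's local server

# LM Studio exposes an OpenAI-compatible API on localhost (adjust port/model to your setup).
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Placeholder values standing in for what IMatch fills in via {File.MD...} variables.
location = "Yermo, San Bernardino County, California, United States"
lat, lon = "34.905", "-116.828"  # hypothetical coordinates, for illustration only

prompt = (
    "Describe this image in the style of a news caption. Use factual language. "
    f"This image was taken in {location}. "
    f"This image was taken at the GPS coordinates {lat} and {lon}. "
    "Consider this when you analyze the image."
)

# Send the image as a base64 data URL together with the text prompt.
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

response = client.chat.completions.create(
    model="local-vision-model",  # whatever vision-capable model is currently loaded
    temperature=0,               # low "creativity" makes runs easier to compare
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```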
Tip: Check your prompt in VarToy to see the result. I'm not sure what MWG\Location actually contains.
You also seem to mix multiple prompts into one?
Because the first part is a prompt suitable for a description.
But then comes a part I would expect as the keyword prompt.
And the last part "monochrome" is a prompt I would use for a trait.
Are these three separate prompts in your setup?
Thanks.
VarToy return is OK.
Yes, I was duplicating input to the AI. I've deleted the unnecessary extra text about keywords from the prompt (it was already present in the AutoTagger setting for Keywords in Preferences).
This image was taken in {File.MD.Composite\MWG-Location\Location\0}, {File.MD.city}, {File.MD.state}, {File.MD.country}.
This image was taken at the GPS coordinates {File.MD.Composite\GPS-GPSLatitude\GPSLatitude\0} and {File.MD.Composite\GPS-GPSLongitude\GPSLongitude\0}. Consider this when you analyze the image.
The second line is there to account for images without Country/State/City/Location. I don't know if the duplicate location data helps or hinders the AI.
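As far as I can tell, a variable with no value simply renders as an empty string, so for an image without those tags the first line would come out with blank gaps, roughly like:

This image was taken in , , , .

while the GPS line still carries usable information, which is why I keep both.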
I have two custom traits:
AI.ScientificName (Return the scientific name of the object shown in this image.) [Interestingly, this sometimes returns 'Homo sapiens' for people.]
AI.BlackAndWhite (If this image is monochrome, respond with 'black and white' else return nothing.) [I've deleted the duplicate language in the prompt.]
Including the location usually helps the AI to produce better descriptions (or keywords). Depends on the model, of course.
Not sure about the GPS coordinates. Depends on how the model was trained and if it has a concept of GPS.
You can do a simple a/b test if you want:
Set the seed to a fixed number (say: 123) and the creativity to the lowest setting.
Select 10 representative images and make two copies of each image (30 images).
Run the prompt as it is on the first 10.
Remove the GPS part and run the prompt on the second 10.
Re-insert the GPS part but remove the location part and run it on the last 10 images.
Compare the results.
Reset the seed to 0 afterwards.
This can be very insightful, as can running the same prompt with different seeds and/or different creativity settings.
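For readers who would rather script such a comparison against an OpenAI-compatible endpoint than copy files around, a rough sketch of the same A/B idea could look like this. Endpoint, model name, file paths and metadata are placeholders, and whether a fixed seed actually makes the output deterministic depends on the backend.

```python
import base64
from pathlib import Path
from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

BASE = "Describe this image in the style of a news caption. Use factual language."
LOC = "This image was taken in {location}."
GPS = ("This image was taken at the GPS coordinates {lat} and {lon}. "
       "Consider this when you analyze the image.")

# Three prompt variants: full context, location only, GPS only.
VARIANTS = {
    "location+gps": f"{BASE} {LOC} {GPS}",
    "location-only": f"{BASE} {LOC}",
    "gps-only": f"{BASE} {GPS}",
}

def describe(image_path: Path, prompt: str) -> str:
    """One request per image/prompt pair, with fixed seed and minimal creativity."""
    image_b64 = base64.b64encode(image_path.read_bytes()).decode("ascii")
    response = client.chat.completions.create(
        model="local-vision-model",
        seed=123,        # fixed seed, as in the recipe above (backends may ignore it)
        temperature=0,   # lowest "creativity"
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Hypothetical per-image metadata; inside IMatch the {File.MD...} variables supply this.
test_images = {
    Path("test/IMG_0001.jpg"): {"location": "Yermo, California, United States",
                                "lat": "34.905", "lon": "-116.828"},
}

for path, meta in test_images.items():
    for name, template in VARIANTS.items():
        print(f"{path.name} [{name}]:\n{describe(path, template.format(**meta))}\n")
```

Comparing the three outputs side by side makes it fairly obvious whether the GPS part adds anything for a given model.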
Note: Google Gemini 3, available in the next IMatch release, knows a lot about places and locations (Google has a treasure trove of data to train on, from Google Maps and Street View).
Thanks. Gemini 3 should be interesting. I also saw this yesterday: https://arstechnica.com/ai/2025/03/google-says-the-new-gemini-2-5-pro-model-is-its-smartest-ai-yet/
Interesting times.
Quote from: jch2103 on March 30, 2025, 06:43:58 PM
Thanks. Gemini 3 should be interesting. I also saw this yesterday: https://arstechnica.com/ai/2025/03/google-says-the-new-gemini-2-5-pro-model-is-its-smartest-ai-yet/
Interesting times.
I've commented on that already (search for it). The new model is a reasoning model, aimed more at math, reasoning and coding. For the intended purpose in IMatch (creating descriptions and keywords for images) this model should not have a big impact. It's not publicly available yet and is still in beta. When it becomes available and I see a benefit, I will make it available in AutoTagger.
I think people will be very surprised at how good Gemini 2.0 (Flash) already is with AutoTagger.
Quote from: jch2103 on March 30, 2025, 06:43:58 PM
Thanks. Gemini 3 should be interesting. I also saw this yesterday: https://arstechnica.com/ai/2025/03/google-says-the-new-gemini-2-5-pro-model-is-its-smartest-ai-yet/
Interesting times.
Thanks
SAutotagger with Gemini.png
How do I use Gemini 2.0 in IMatch?
If I use my API key from Google today, the Model field is locked.
Even the Description field is locked.
Is this just for keywords?
I asked Bing which model would be best for having AI create Descriptions: Gemini 2.0 or 2.5.
(The answer is translated with Google Translate, in case there are any odd turns of phrase.)
Bing seems very convinced that Gemini 2.5 is preferable.
Still, I think Gemini 2.0 would be interesting.
Gemini 2.5 stands out as the superior option for your use, even in an advanced DAM environment like iMatch. Thanks to its enhanced multimodal capabilities and stronger reasoning, Gemini 2.5 is better equipped to understand, interpret, and describe visual elements in your images. This means that the image descriptions it generates can be both more detailed and more contextually relevant.
In practice, this means that when you integrate Gemini 2.5 into iMatch DAM, the added precision and ability to handle complex images are likely to provide clearer, more nuanced image context - something that is crucial when you work with large image archives and need to generate metadata that facilitates search and organization. However, it is important to remember that the final quality often also depends on how well you customize your prompts (prompt engineering) to instruct the model to focus on the right details in your images.
Since iMatch DAM is already designed to work with multiple AI services to handle everything from "Keywords" to "Landmarks" via advanced prompt-based settings, switching to a more advanced model like Gemini 2.5 will not only improve image analysis but also give you greater flexibility to tailor metadata output to your needs. For example, if you have different types of images with varying content or context, fine-tuning your prompts can ensure that each image description is more precise and contextual.
In other words, if you want to maximize the potential of your iMatch DAM setup to automatically generate rich and accurate image descriptions, Gemini 2.5 is the best choice. Have you considered what adjustments to your prompt strategy could elevate your metadata management even further? I can also give you tips on how to systematically test different sets of prompts to further fine-tune the results!
As I said already: Gemini 2.5 is a reasoning model, not a model specifically tuned to describe images. It's for math, coding, reasoning and interactive use. Please look up Google's own description and suggested uses.
I also think that most of what Bing (Copilot) has produced in that answer is proper BS, stuff learned from the Google marketing announcement with a bit of hallucination thrown in. Typical AI slop, if you ask me.
Gemini 2.5 also costs 30 times (!) as much as the Gemini 2.5 Flash (Lite) model, which is perfectly good.
Image1.jpg
Almost 12 bucks for a million tokens, that's steep. Especially if you compare it to the Flash Lite model IMatch will support in the next release.
Image2.jpg
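As a rough back-of-the-envelope illustration (the per-image token count is a pure assumption; real numbers depend on image size, prompt and response length): if one image costs somewhere around 1,000 tokens, then 10,000 images come to roughly 10 million tokens. At almost 12 dollars per million tokens that is on the order of 120 dollars, while a Flash Lite-class model at one thirtieth of the price would land somewhere around 4 dollars for the same batch.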
Let's wait until the IMatch version supporting Gemini is out.
Then wait for a couple of months to allow users to work with it and get some experience.
If I see in telemetry that many users are using Gemini, and if I think Gemini 2.5 is substantially better than Gemini 2.0 and users may benefit from it despite the massive price tag, I may include support for Gemini 2.5 (or Gemini 3.0, or Gemini 3.5, ...) in the future.
I have already studied what Google writes about Gemini Pro, and that was why I also let Bing have a try. Google themselves made it very clear that this version is focused more on reasoning and text processing than on image processing. I tried it briefly before I decided not to go there.
I think the pricing will be important, even if it would be interesting to also test version 2.5; but money is not an issue for me at all right now while I am evaluating things. We are always free to stop using it within 30 days. We are all still in a testing phase. OpenAI costs money too, but is surprisingly cheap, and Gemma is free - but none of that is decisive for me for now, even though I already do real work with both OpenAI and Gemma.
By the way, the Bing answer was produced by the longer-running "Deep Search" query, or whatever they call it. That takes significantly longer, and processing time matters.
Question: Is there something wrong with IMatch when processing Gemma? I have noticed that if there are pending updates, Gemma will sit idle until I force the updates to be written. If I do that, it starts processing instantly. It is not a problem, as long as I remember to process the pending updates first. OpenAI is not behaving like that at all.
Quote from: Stenis on April 06, 2025, 06:15:10 PM
I have already studied what Google writes about Gemini Pro, and that was why I also let Bing have a try. Google themselves made it very clear that this version is focused more on reasoning and text processing than on image processing. I tried it briefly before I decided not to go there.
I think the pricing will be important, even if it would be interesting to also test version 2.5; but money is not an issue for me at all right now while I am evaluating things. We are always free to stop using it within 30 days. We are all still in a testing phase. OpenAI costs money too, but is surprisingly cheap, and Gemma is free - but none of that is decisive for me for now, even though I already do real work with both OpenAI and Gemma.
By the way, the Bing answer was produced by the longer-running "Deep Search" query, or whatever they call it. That takes significantly longer, and processing time matters.
Question: Is there something wrong with IMatch when processing Gemma? I have noticed that if there are pending updates, Gemma will sit idle until I force the updates to be written. If I do that, it starts processing instantly. It is not a problem, as long as I remember to process the pending updates first. OpenAI is not behaving like that at all.