Share Examples?

Started by Mario, March 10, 2025, 11:09:12 AM


Mario

It would be great to see some examples of results you produced with AutoTagger, including

  • a small version of the image
  • the AI used
  • your prompt
  • your results (keywords, description, landmarks, traits)

Most people, me included, learn best from examples, and if you are satisfied with your AutoTagger results, let others know how you did it.
Note: Please do not upload images of people; if you do, blur their faces before uploading.

Jingo

Great idea!

Here is mine: 
[attached image]

OpenAI Prompts:

Description prompt:

[[-c-]] Describe this image in a style to make it easily searchable. Use simple English, common words, factual language, and simple sentences.
Avoid describing anything not directly observable from the image.
{File.MD.city|hasvalue:This image was taken in {File.MD.city},}{File.MD.country|hasvalue: {File.MD.country}.}

Keyword prompt:

[[-c-]] Return ten to fifteen keywords describing this image.
{File.MD.hierarchicalkeywords|hasvalue:This image has the following keywords: {File.MD.hierarchicalkeywords}.}


Results:

AI Description:
This image shows a white-throated sparrow on the ground among leaves. The bird has brown and gray feathers, with a distinctive striped head. It is in a natural outdoor setting, surrounded by fallen leaves. The sparrow appears to be foraging for food.

AI Keywords:
AI|American Sparrow; AI|Brown Feathers; AI|Foraging; AI|Ground; AI|Songbird; AI|Striped Head; Animals|Bird; Animals|Bird|White-throated Sparrow; colorful; Habitat; Objects|Leaves; outdoors; Places|Nature; small bird; wildlife

I haven't yet set up 1:1 keyword mappings for some of the returned hierarchical keywords, but will be doing that next.
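
As an aside, the "|" in those returned keywords separates the levels of a hierarchical keyword path. Just to illustrate the structure (a minimal, hypothetical Python sketch; not the actual IMatch mapping feature):

# Flatten 'Animals|Bird|White-throated Sparrow' style paths into flat keywords.
# Hypothetical helper for illustration only; in IMatch this is done via keyword mappings.
def flatten_keywords(hierarchical: str) -> list[str]:
    flat = set()
    for path in hierarchical.split(";"):
        for level in path.split("|"):
            flat.add(level.strip())
    return sorted(flat)

print(flatten_keywords("Animals|Bird; Animals|Bird|White-throated Sparrow"))
# -> ['Animals', 'Bird', 'White-throated Sparrow']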

Stenis

Thanks for your input, Jingo, and good to have a dedicated topic for prompt examples, Mario. I'll contribute a little later.

One thing I didn't understand is this:

{File.MD.city|hasvalue:This image was taken in {File.MD.city},}{File.MD.country|hasvalue: {File.MD.country}.}

Was there no data in these "Location" elements of yours?


Mario

Quote from: Stenis on March 10, 2025, 03:08:00 PM
One thing I didn't understand is this:

{File.MD.city|hasvalue:This image was taken in {File.MD.city},}{File.MD.country|hasvalue: {File.MD.country}.}

Was there no data in these "Location" elements of yours?
This variable checks if the city tag has a value. If yes, it outputs something like "This image was taken in London, United Kingdom". If there is no city value, the variable outputs nothing.

Not all files in a database may have location data when the prompt runs. The hasvalue check ensures that the phrase is only added to the prompt when there is data.
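
For example, assuming a file where city is "London" and country is "United Kingdom", the variable

{File.MD.city|hasvalue:This image was taken in {File.MD.city},}{File.MD.country|hasvalue: {File.MD.country}.}

expands to "This image was taken in London, United Kingdom." For a file without city and country values, it expands to an empty string, and nothing is added to the prompt.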

Stenis

Thanks for your reply, Mario.

Below you can see the basic "static configuration" I have landed on for now, and a few F7 prompt examples.



I now use all three static prompts, "Description", "Keyword" and "Landmark", and never change that data.
Landmarks (if any are found) are merged into the "Description" data with a variable.
I always use flat keywords.
I even think the "Landmark" prompt works surprisingly well :-)

One other important condition for my workflow is that I still use PhotoMechanic to batch-update about 20 data elements in IMatch.
For me it is much faster than doing it in IMatch.
AutoTagger updates the "Description" and "Keyword" elements with AI.
Compared to my earlier process, this is far faster and more efficient, because I don't have to write descriptions manually. It is as if a stone has fallen from my shoulders, and work has become so much easier.

I just have to read the ready-made texts now and sometimes adjust a few things, but most of the time it is more than good enough as it is.
I'm also very happy with the job OpenAI is doing in the background.
What a change this has brought to my processes!
It feels like getting a whole new photo archiving life!

Fantastic job you have done, Mario!
I was afraid the system would feel like a beta and not be flexible enough, but it is already very well engineered and solid.
Frankly, I did not really expect that, and I am glad to be proven wrong.



My Description prompt (static)

[[-c-]]
Always prioritize geographic data, the year, and people's names.
Always write text inside quotes but without the quote signs.
Describe this image in a style to make it easily searchable.
Use simple English, common words, factual language, and simple sentences.
Avoid describing anything not directly observable from the image.
Write the geographic data and year at the beginning of the text, together with names.



My Keyword prompt (static) (assumption: flat, non-hierarchical keywords only)

[[-c-]]
Keywords never in plural form.
Return five to seven keywords describing this image.
Use simple English, common words, factual language.
Max one word in each keyword.
No geographic data or time info as keywords.
No words with only capital letters.


My Landmark prompt (static) (assumption: merge with existing target (Description) values)

[[-c-]] Return the names of known landmarks and tourist spots in this image to {File.MD.description}
If you cannot detect any landmarks or tourist spots, return ''.
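
To make the merge concrete: at run time, {File.MD.description} expands to the file's current Description text. Assuming (hypothetically) a Description that already reads "Eze Sur Mer Provence France 2012 - By the sea", the AI would receive a prompt like:

Return the names of known landmarks and tourist spots in this image to Eze Sur Mer Provence France 2012 - By the sea
If you cannot detect any landmarks or tourist spots, return ''.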



The F7 prompt (non-static)

Template

Keywords never in plural form
Always write text inside quotes but without the quote signs.

Mandatory text: "Place Country Year" - "motif"
In foreground "Name"


Examples


1.

Keywords never in plural form
Always write text inside quotes but without the quote signs.

Mandatory text: Essaouira Morocco 2025 -
"Essaouira Fishing Harbor"

2.

Keywords never in plural form
Always write text inside quotes but without the quote signs.

Mandatory text: Eze Sur Mer Provence France 2012 -
"By the sea"
Ingrid Sändh in the foreground



Mario

To attach files, use the Attach command available below the full post editor. If you use Quick Reply instead, click "Preview" once to see all the controls for adding attachments.
Copying and pasting images from somewhere does not work.

Stenis

Using OpenAI GPT-4o instead of GPT-4o-mini with AutoTagger is a completely different thing!

Earlier I complained about my pretty poor results using the OpenAI mini model or Gemma 3 4B for the task of figuring out which species were in the pictures. They both struggled to decide which antelope or bird was in a picture. OpenAI got a bit better after I tweaked the prompts and added the names of the animals, and then it often even managed to fetch the Latin names, but at that point I thought the whole idea of speeding up these workflows was jeopardised.

Then I saw that I also had the option of using GPT-4o, tried it, and my jaw dropped. It added the species name, the Latin name, and the family the animal belongs to in almost 100% of cases, and that held for mammals, birds and reptiles alike.

[attached image]

These were my prompt texts:

Mandatory: Write the species and its name in Latin, plus the family of species it belongs to, into Description.
Write the species and family name into Keywords.


These were the results:

Description: A group of Damaliscus lunatus, commonly known as topi, graze in the savanna. The topi, part of the Bovidae family, is characterized by its reddish-brown coat and distinctive dark markings. The scene features scattered trees and open grassland, typical of the African savanna.

Keywords: Antelope, Grass, Savanna, Topi, Tree, Wildlife


[attached image]


Here is another example, with the same prompt as the first example above:
This picture of the really small dik-dik antelope was one that neither Gemma 3 4B nor GPT-4o-mini managed to figure out.

This is absolutely better than I expected. So, when I tag all my safari pictures, I will definitely use the larger OpenAI model. I have quite a few, since I have been to East and South Africa eight times in total (the first trip as early as 1972), and I also have safari pictures from both Sri Lanka and India to take care of. So OpenAI GPT-4o will help a lot. If it solves these problems as effectively as I have seen now, I don't really care about the cost, because what I gain is time and effort that I don't really have at my age. From what I have seen, it also hallucinates less than the other two smaller models.

I really struggled to get a handle on all of this before; no matter how effective PhotoMechanic has been, even in my hands, two data fields/elements always remained: Descriptions and Keywords. On the other hand, a picture library just gets better and more useful the more metadata you manage to add to the pictures; that is the positive and motivating side of it.

There is a new OpenAI model coming soon (version 4.5, I think) that might be even better. It seems there is a real race now. Will we get access to that one too in the future?

Stenis

[attached image]

The larger model seems to cost about 16 times more to use.

Stenis

I wonder if a new trend is building where software companies try to squeeze us a lot more than before. Look at the example below.

On top of all the other sharpening models in Topaz AI, Topaz has now created a Super Focus function that processes our pictures in the cloud, and using it seems to cost a lot.

Looking at the list below: using Super Focus V2 on one of my 33 MP pictures costs 14 points per picture, and their recommended purchase is 400 points for US$39.99. At that rate, 400 points cover 28 pictures, which works out to about US$1.43 per picture. Luckily, I tested it and the results on my pictures are better with the non-cloud models, but the trend is clear, I think: these companies will keep trying to charge us for processing time in the cloud.

Compared to that, AI companies like OpenAI not only give us a number of options to choose between, but also offer options that are very reasonably priced.


You cannot view this attachment.

Mario

Your point?

"Better" models cost more money. You rent "compute" and when the model is more expensive to compute, you pay more.

The new beta Google Gemini 2.5 model costs $10 per million output tokens, while Gemini 2.0 Flash (Lite) costs $0.30, making the 2.5 model about 33 times more expensive.
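
In per-image terms (a rough sketch; the 100 output tokens per image is only an assumption, real responses vary):

# Rough per-image output cost, using the prices above.
# tokens_per_image is an assumption for illustration only.
tokens_per_image = 100
cost_gemini_25 = tokens_per_image * 10.00 / 1_000_000  # $0.00100 per image
cost_gemini_20 = tokens_per_image * 0.30 / 1_000_000   # $0.00003 per image
print(f"2.5: ${cost_gemini_25:.5f}  2.0 Flash Lite: ${cost_gemini_20:.5f}")

Even the expensive model is cheap per image in absolute terms; the 33x factor starts to matter when you process tens of thousands of files.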

If the free Gemma 3 model does not do what you need, AutoTagger offers OpenAI (two models, cheaper and more expensive), Mistral (two models, cheaper and more expensive) and soon Google Gemini (cheaper and more expensive models).
Thankfully you have IMatch, which gives you many options and the freedom of choice.

Stenis

Yes, you have done a wonderful job, Mario, giving IMatch all that flexibility. It is really good that we have the possibility to "step up" when necessary, and my examples show very clearly, at least to me, that it sometimes is necessary, at least if you value efficiency and productivity over the absolute cheapest solution.

On the other hand, historically it has rarely been the most technically advanced software or processes that "win". Mostly it is "the good enough" that takes home the "victory" and the money.

For me, the small example with the African animals makes it very clear that a 16 times higher cost will be worth it in that particular case, for several reasons, and maybe for others as well. Taking on many thousands of safari pictures with the more expensive model might be what it takes to get that job done at all. With the cheaper alternatives it might be so cumbersome that it would never be done or finished.

With the OpenAI mini model I might need to spend much more time adjusting my prompts and rely on a trial-and-error workflow, instead of just letting GPT-4o do the whole job with one single prompt. There is a Swedish expression, "dumsnålhet" (I'm sure you have one in German too), which means embracing a combination of stinginess and stupidity because you are obsessed with the money you pay for the service.

It could have been the same with the Topaz example, but the similarities end there today, because that cloud service just isn't good enough yet, at least for me. In the tests I have done it still doesn't deliver; it mostly produces terrible artifacts. But say you had a "once in a lifetime" shot, and it did deliver and managed to turn a technically mediocre image into something close to a fantastic shot for, in that case, 70 cents? I could have been the market for that, because I do have some historical images that I know are unique, taken at the Saur Revolution when it all started in Afghanistan, leading to a period of war lasting over 40 years.

[attached image]

Stenis

Sorry, there was no more space for pictures above.

[attached image]

It was around noon on one of the last days of April 1978 when I took that picture. Thirty minutes later I had taken my pictures and was sitting in the Khyber Restaurant, which lies on the same gigantic roundabout, waiting for my lunch. First they started to shoot with Kalashnikovs, and after maybe ten minutes they fired the first cannon shot from one of the T-55 tanks parked just 10-15 meters from the restaurant. It was parked with its rear against the restaurant entrance, and the shock wave was enormous. If it hadn't been for the decimeter-wide aluminum blinds in the windows, we might all have been blinded by the glass that flew into the restaurant. The rest is history.

I have quite a few color slides that might need something more powerful than the local sharpening models in Topaz, and I know that a lot of other older photographers have the same problem.

Stenis

I will test how well Mistral performs too. You never know what the EU will run into with the US, and how that will affect the American AI companies.

I would rather buy a German DAM like IMatch than PhotoMechanic, especially when it is superior to PM anyway, the way I use them. I ditched Lightroom many years ago because its image quality was inferior to DxO Optics Pro / DxO PhotoLab, and nothing else can compete with the Danish Capture One when it comes to tethering. ... and I would rather use a European AI system, if it performs decently, than an American one, or a Chinese one like DeepSeek, if that had been a real option.

So what is your opinion of Mistral, Mario?

Mario

Quote from: Stenis on April 08, 2025, 07:02:36 PM
So what is your opinion of Mistral, Mario?
Give it a try. I have written what there is to know in the help: Mistral AI

Stenis

I will, thanks! ... even if OpenAI GPT-4o really impressed me when it comes to identifying animals and writing useful descriptions with a minimum of prompting effort.


Stenis

I processed a lot of different animals yesterday with OpenAI GPT-4o, the bigger model, and it only missed once, with a serval cat half hidden in the tall savannah grass that GPT-4o mistook for a cheetah.

I have just moved this from the ad hoc prompt (F7) into the Description prompt:

Write the animal species name and its name in Latin, plus the family name of the species it belongs to, into Description.
Write the animal species and family names into Keywords.

With that I almost never need to do anything other than press Run.
It is very, very effective.

Stenis

I will soon start a big effort to take on a lot of animal pictures that I have neglected for a long time, because there has been so much to face. For that reason I have felt I really have to be satisfied with my prompting before I start, because I don't want to have to redo it once more in the future. These are pictures taken between 1972 and 2016 in quite a few national parks in both Africa and India. So I discussed this in a few sessions with ChatGPT 4.1 and got the advice to add the section below to the Description prompt, so I did.

If there is an animal in the image, write the animal's common English name, its family, and its scientific (Latin) name, using this format:

Animal: [Common name]
Family: [Family name]
Scientific name: [Latin name]

If there is no animal in the image, do not write anything. Return nothing.


I also added a new section to the Keywords prompt, which I developed using part of the instructions above and adding a few other things too.

If there is an animal, flower or plant in the image, write the animal's, flower's or plant's ["Common name"]

If it is a Mammal write Animal and Mammal
If it is a Bird write Animal and Bird
If it is a Reptile write Animal and Reptile
If it is a Fish write Animal and Fish
If it is an Insect write Animal and Insect
If it is a Flower write Flower
If it is a Tree write Tree
If it is a Plant write Plant
If it is a Fruit write Fruit

If there is no animal, flower or plant in the image, do not write anything.
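
Just to illustrate the mapping logic the prompt above encodes (a hypothetical Python sketch, for illustration only; AutoTagger applies this via the prompt text, not via code):

# Category-to-keyword mapping, mirroring the prompt section above.
CATEGORY_KEYWORDS = {
    "Mammal":  ["Animal", "Mammal"],
    "Bird":    ["Animal", "Bird"],
    "Reptile": ["Animal", "Reptile"],
    "Fish":    ["Animal", "Fish"],
    "Insect":  ["Animal", "Insect"],
    "Flower":  ["Flower"],
    "Tree":    ["Tree"],
    "Plant":   ["Plant"],
    "Fruit":   ["Fruit"],
}

def keywords_for(category: str, common_name: str) -> list[str]:
    # Return the flat keyword list for a detected subject.
    return sorted(set([common_name] + CATEGORY_KEYWORDS.get(category, [])))

print(keywords_for("Mammal", "Plains Zebra"))
# -> ['Animal', 'Mammal', 'Plains Zebra']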


This setup, aimed especially at nature photography, has proven to work very well and leaves very little for me to add when I use it with the GPT-4.1 model. I know it costs more than the mini model, but I think it is well-spent money, since it is more reliable than the mini models. Be aware, though, that this setup is only meant to be used with a flat keyword model. As you can see in the instructions above, I also add that it IS an animal and what type of animal or plant it is, etcetera.

Here are a few examples (without pictures) of different types of animals:

Lizard: (Reptile example)

Serengeti Tanzania 2012 - A striking lizard rests on a rock. The reptile exhibits vibrant purple and blue scales, characteristic of the Chamaeleon species. This reptile showcases a beautiful color pattern. Common name: Agama lizard Family: Agamidae Scientific name: Agama agama

Agama Lizard, Animal, Reptile


Zebra: (Mammal example)

Serengeti Tanzania 2012 - Young plains zebra stands near water. Black and white stripe on body. Light reflects from water. Earth tone background. Typical Serengeti view. Common name: plains zebra Family: Equidae Scientific name: Equus quagga

Animal, Mammal, Plains Zebra, Stripe, Water



Verreaux's Eagle: (Bird example)

Serengeti Tanzania 2012 - A large dark eagle sits on a rock with green grass in the background. The bird faces away, showing detailed feathers. Good light and natural setting. Useful for wildlife and bird search. Common name: Verreaux's eagle Family: Accipitridae Scientific name: Aquila verreauxii

Animal, Bird, Grass, Rock, Verreaux's Eagle


Frangipani flower: (Flower example)

Zanzibar Tanzania 2012 - White frangipani flower with yellow center and dark green leaves. Close-up photo of the flower in daylight. Common name: Frangipani Family: Apocynaceae Scientific name: Plumeria alba

Flower, Frangipani, Green, Leaf, Nature, Petal, White



In order to get a mandatory initial text, I also always include a text like the one below in the ad hoc prompt window. In recent months I have found that, in my case, the OpenAI GPT-4.1 API has proven the most reliable when it comes to following formal instructions on, for example, text formatting. I have found that Google's models have had more problems with that, and that is the reason I now stick with OpenAI.

Mandatory text: Serengeti Tanzania 2012 -


Among more conservative photographers and museum workers involved in digitizing and metadata, I have often met the attitude that their work is so special, creative and demanding that it cannot be rationalized or streamlined at all with automated workflows or AI. I have to say that half a year ago, before I migrated to IMatch 2025 and started to use the latest API models we got a couple of months ago, I was prepared to agree with that.

I think just these small examples show that AutoTagger, with proper prompting, can do an awful lot of smart and effective things for our workflows. Of course there will be occasions when the AI misses or our prompting leaves something to be desired, but we are now very close to zero metadata administration. I must say there are very few texts I correct manually these days.

AutoTagger can't always match texts that I really put some manual effort into, BUT if I am honest, I can see that my PhotoMechanic Descriptions often tend to be shorter and less descriptive, because writing them manually is so time consuming. With AutoTagger, the generated texts and Keywords are generally better written, and the quality is more consistent.

It is also very interesting that the new "reasoning" AI models really can help us write better prompt instructions. This is something that empowers us in our dialogue with IMatch. It is a paradigm shift we are witnessing and experiencing now, and it is very exciting.

jch2103

Thanks; that's very useful information. I'm glad Mario set up this discussion thread. 
John

Stenis

#18
Me too, and the photools community is a very good and responsive resource which has helped me a lot.

I want to stress one thing: I use GPT-4.1 almost exclusively for nature pictures, where I am interested in getting help with identifying species of all sorts. So for safari pictures with animals I use nothing but the GPT-4.1 API. For everything else I prefer GPT-4.1-mini, which absolutely is "good enough" at a fraction of what it costs to use the bigger model.

There is another upside too: the smaller model is both faster and less prone to return unwanted texts like "there is no animal or plant in the picture" or "there are no texts or signs in the picture". So they are both indispensable, but for different reasons. One good thing is that we just need to write one set of prompts for all OpenAI 4.1 API versions.


Mario

Thanks for sharing.

The next release of IMatch brings support for Google Gemini 2.5 Flash Lite, which has just been released.
I'm not sure if it has advantages over the 2.0 model. Google claims it's faster.

I pretty much think that, for the purposes of AI in IMatch (descriptions, headlines, traits and keywords), the AIs are as good as it gets. The focus of AI development now seems to be reasoning models and agentic workflows, which offer no benefit for IMatch use cases.

I'm waiting for Google's Gemma 3n model to become available for Ollama and LM Studio (the version with vision support, I mean). This model is supposed to need less memory than Gemma 3, which could mean that even IMatch users with graphics cards with as little as 4 GB VRAM can work with it. When it becomes available and works well enough, I will add support for it.

For the most part, I only use Gemma 3 12B running locally. It produces more than good enough descriptions, headlines and hierarchical keywords for my needs.

Stenis

Agreed, Mario. I think what we have now in OpenAI 4.1 and also in Gemini Flash has proven totally sufficient for almost all needs we might have. As I wrote, there are situations where I have to use the bigger GPT-4.1 model, when my demands are higher than 4.1-mini can meet, but for more than 95% of my needs the mini model is just fine.

Gemini Flash is really fast already :-)

Maybe we won't need that much more speed and refinement, but the reasoning focus the developers have had lately might make their APIs write even better texts, and sometimes a source like ChatGPT or Google's Gemini might help us write and refine our prompts.

I am very satisfied now with my workflows and have stopped testing different API solutions for now, in order to get some real work done. Finally, I'm really making progress with my picture library metadata without killing myself. IMatch is a real relief :-)

It is very good and impressive, though, that you continue to give us both the latest commercial and free API options. That is a huge selling point for IMatch compared to most other picture library / DAM solutions, which many times just try to lock their users into all sorts of proprietary, maze-like solutions.

Stenis

The cost of using even a commercial service like the OpenAI GPT-4.1 API, and especially GPT-4.1-mini, is also very reasonable. I prefer OpenAI's payment model, where it is possible to have total control of the costs through their system of paying in advance, instead of letting them charge you as you go. We know from real examples that Google has charged people fantastic sums after they activated deep research models just by mistake. At least before, there were no warnings concerning the costs.

It is also very nice to offer the totally free Google Gemma 3n model for people with basic GPU processing power; with a model like that it is impossible to make cost "mistakes" like the ones possible with Google Gemini.


Jingo

I've been using Gemini 2.0 Flash Lite and it is quite successful in returning results for my bird and wildlife/flower photography. It often confuses a few birds that are really close to each other (e.g. Eastern Phoebe vs. Pewee), but I too have a hard time differentiating these... If the AI can't figure it out after two tries, I just figure it out myself, provide a keyword tag, and then let the AI use that keyword to get me detailed descriptions and further keywords.

Overall, it has about a 90% positive return rate on identifying birds, which is really great... especially for the more common things, which saves me time. It even correctly identified a juvenile Bald Eagle which I thought might have been a Golden Eagle... and nailed this female Baltimore Oriole.

A positive use of AI for sure!!

[attached images]

Mario

Quote from: Jingo
I've been using the Gemini 2.0 Flash Lite and
Have you tried the big 2.0 or even the 2.5 model? They are more costly, but may produce better results in these cases.
As Stenis points out, sometimes the faster and cheaper models don't cut it. Keeping a copy of your AutoTagger settings with the bigger model selected can be really helpful in such fringe cases.