AutoTagger Prompt Engineering

Started by monochrome, February 06, 2025, 11:31:51 PM

Previous topic - Next topic

monochrome

I've run about 10k images through the AutoTagger and while the results are good, there are three things that I think make it less useful than it could be, and this thread is an attempt at fixing that by developing good prompts for the AI. (Note - I'm using OpenAI)
  • It hallucinates. The AI makes up context that isn't there. For example, it assumes that any adult near any child is the child's parent. Sometimes right, mostly wrong. Also, it has an inner art critic.
  • Not very searchable language. I want "flowers" not "floral composition", so I can grunt something simple into the search bar and get results.
  • It doesn't use image metadata (time taken, GPS, people tagged) in an intelligent way.
I'll start off, trying for something for points 1 and 2, changes from default in bold:

Quote[[-c-]] Describe this image in a style to make it easily searchable. Use simple English, common words, factual language, and simple sentences. Avoid describing anything not directly observable from the image.

This turned:

QuoteA mother and her two young daughters are seen outdoors, kneeling beside a flower arrangement in a residential area. The children, dressed in bright, colorful clothing, appear engaged with the plants as they explore their surroundings.

Where the family relations are completely fabricated and the children's activity is made up, into:

QuoteThis image shows a woman and two young children near a flower cart. They are looking at colorful flowers. The setting is a sidewalk with a building in the background. The area appears peaceful and has some greenery.

And:
QuoteA close-up view of a still life arrangement featuring a variety of colorful flowers intertwined with clusters of dark blue berries, showcasing the beauty of nature's bounty.

Into:
QuoteThis image shows a close-up of various flowers and fruits. There are purple blueberries and clusters of small, vibrant flowers. The flowers include a yellow flower, a pink flower, and a white flower with a yellow center. The background is green, indicating a natural setting.
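
If you want to experiment with the prompt wording outside of IMatch before changing the AutoTagger settings, here is a minimal sketch in Python using the official openai package. The model name, the image path and the fact that I dropped the [[-c-]] placeholder (an IMatch variable) are my own stand-ins, not how IMatch does it internally:

```python
# Rough sketch for testing prompt wording against a single image, outside IMatch.
# Assumes the official "openai" package and an OPENAI_API_KEY environment variable.
# The model name and file name are placeholders.
import base64
from openai import OpenAI

PROMPT = (
    "Describe this image in a style to make it easily searchable. "
    "Use simple English, common words, factual language, and simple sentences. "
    "Avoid describing anything not directly observable from the image."
)

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model you have configured
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": PROMPT},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

Running a handful of problem images through something like this makes it quick to compare two prompt variants side by side.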

Mario

#1
Very good, thanks for sharing.

Stenis

Quote from: monochrome on February 06, 2025, 11:31:51 PMI've run about 10k images through the AutoTagger and while the results are good, there are three things that I think make it less useful than it could be...


Brilliant, thanks a lot.
I'm also using OpenAI.
I have also seen that some extra input is sometimes needed for certain pictures.

It is very good that we can combine general input like yours in the Description and Keyword elements of the AutoTagger setup with more specific input in the AutoTagger dialog window.

Stenis

Thanks for your input again, Monochrome. It helped a lot; the Description texts got clearly more precise and hallucination-free.

What I did myself to get a better set of keywords was to add that I don't want place info or info about the year a picture was taken.

In Preferences\AutoTagger I added:

[[-c-]] Return five to ten keywords describing this image.
Use simple English, common words, factual language.
Don't save place data or time info. (It helped a lot.)
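
For anyone who wants to test that keyword prompt outside IMatch before putting it into Preferences, here is a small sketch along the same lines as the description example above. The last prompt line ("comma-separated list only") is my own addition so the reply is easy to split, and the model and file names are placeholders:

```python
# Sketch: ask for plain keywords and turn the reply into a clean list.
# Assumes the official "openai" package and OPENAI_API_KEY.
import base64
from openai import OpenAI

KEYWORD_PROMPT = (
    "Return five to ten keywords describing this image. "
    "Use simple English, common words, factual language. "
    "Don't save place data or time info. "
    "Reply with a comma-separated list only."  # my addition, not from the post above
)

client = OpenAI()
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

reply = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": KEYWORD_PROMPT},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
).choices[0].message.content

keywords = [k.strip().lower() for k in reply.split(",") if k.strip()]
print(keywords)  # e.g. ['flowers', 'blueberries', 'close-up']
```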


Stenis

#4
When tagging some images from Paris with OpenAI 4o-mini I "think" I have noticed slightly better quality and more detailed info in the AutoTagger text for JPG images than for RAW. If I deliberately let the AutoTagger create separate texts for a RAW-JPG pair, I think I can often see this effect.

The JPEG example:

Paris France 2017 - Restaurants and shops - "The shop La Cure Gourmande is located on Isle Saint-Louis. It features a wooden exterior with large windows displaying various food items. The interior is filled with colorful products and treats. The shop specializes in pastries and sweets, inviting customers to explore its offerings."

The text I got for the RAW file was not able to place the shop on Isle Saint-Louis, for example.
The RAW texts often seem to contain more fluffy hallucinations than the JPEG texts.

Has anybody else seen this pattern?
Can it be because the undeveloped RAW files iMatch has to handle are harder to interpret than a JPEG?
The problem with undeveloped RAW files seems to be their wide dynamic range, which makes them look soft, less contrasty, and on top of that less sharp.

My first idea was to just tag the RAW and then export the JPEG later, but that doesn't really seem to be a good idea.
Another reason for that workflow has been the virtual copies of the RAW I sometimes make in DxO PhotoLab.
In that case the metadata has to be present in the RAW before I make the virtual copy, otherwise it will not get any metadata at all.
I have read recommendations here to first develop the images and then add the metadata, but for these reasons I can't do that.

I also always shoot RAW+JPG in camera, and I just might have found another reason to keep doing so.
In iMatch I have found that "Run once for all selected files" works very well if I select a JPEG file last and make that file the master for the whole set of files.


Mario

IMatch uses the JPG or the embedded preview JPG of your RAW. See The Cache.
The default image size for OpenAI is 512 pixels. Just process your JPG and your RAW in whatever image editor you use to see if there is any notable difference. In any case, IMatch cannot do anything "different" anyway. What you see in the Viewer is what is sent to the AI, just resized to 512 pixels.

If your RAW files have no or too small embedded previews (check with Help > Support > WIC Diagnostics), IMatch will use Windows WIC or LibRaw to "develop" the file. And this result will vary from whatever your RAW processor produces.

Keep in mind that sending the same image twice to the AI with the same prompt usually produces slightly (or not so slightly) different results!
Unless you dial down creativity and set a seed to "lock" things in. The latest models from OpenAI don't even support a creativity setting (temperature) anymore.
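
If you want to check by hand what the AI actually receives, here is a rough sketch that mimics the 512-pixel resize and pins temperature and seed where the model still accepts them. Pillow and the openai package are assumed, the model and file names are placeholders, and a seed only gives best-effort determinism, not a guarantee:

```python
# Sketch: resize so the long edge is 512 px (roughly what IMatch sends),
# then request a description with creativity dialed down where supported.
import base64
import io

from PIL import Image
from openai import OpenAI

def to_512_jpeg_b64(path):
    img = Image.open(path)
    img.thumbnail((512, 512))  # long edge becomes 512 px, aspect ratio kept
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=90)
    return base64.b64encode(buf.getvalue()).decode("utf-8")

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder
    temperature=0,        # older models only; the latest models reject this
    seed=1234,            # best-effort reproducibility
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in simple, factual English."},
            {"type": "image_url",
             "image_url": {"url": "data:image/jpeg;base64," + to_512_jpeg_b64("photo.jpg")}},
        ],
    }],
)
print(response.choices[0].message.content)
```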


Stenis

Quote from: Mario on April 26, 2025, 08:14:37 AMIMatch uses the JPG or the embedded preview JPG of your RAW. See The Cache. ...



Only vendor-specific converters like Sony's own Imaging Edge read their own proprietary tags in EXIF. Only vendor-specific converters will show both the undeveloped RAW and the JPEG looking the same right out of the camera.

In iMatch there is a clear difference between my RAW and JPEG files - even the ones processed in PhotoLab - because iMatch doesn't care to read the proprietary DxO DOP files, and that is the main reason for these quality differences.

Of course I'm aware that two different iterations of jobs sent to OpenAI will give slightly different results, but in my case the JPEG files seem to give the best results generally. Probably because the processed files hold more details.

Mario

#7
QuoteiMatch doesn't care to read the proprietary DxO DOP files, and that is the main reason for these quality differences.
And how would IMatch be able to do that?
All of this stuff is undocumented, proprietary and specific to a specific version of the DxO rendering engine.
Or the C1 rendering engine. Or the Lightroom rendering engine, or ...

Changes and development settings you make and apply in a RAW processor are inaccessible to other applications unless you persist them, e.g. by exporting a JPG or TIFF or PSD file.
I thought everybody was aware of how RAW processors work by now. All edits are only visible in the RAW processor that created them, unless you export a standard image file. IMatch cannot see the edits or apply them.

IMatch prefers the embedded preview images in your RAW files. Your camera should know how to do a good rendition of your file. If there is no preview image or the preview image is too small, IMatch uses LibRaw or WIC to "develop" a viewable image. Of course what LibRaw / WIC produce will differ a lot from what a good RAW processor will do.
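
If you want to see exactly which preview a particular RAW file carries (and therefore roughly what IMatch has to work with), a small sketch like this dumps the embedded JPEG to disk. It uses the third-party rawpy package, which is nothing IMatch ships; the file name is a placeholder:

```python
# Sketch: extract the camera-written preview JPEG from a RAW file with rawpy.
import rawpy

try:
    with rawpy.imread("photo.ARW") as raw:  # placeholder file name
        thumb = raw.extract_thumb()
        if thumb.format == rawpy.ThumbFormat.JPEG:
            with open("embedded_preview.jpg", "wb") as f:
                f.write(thumb.data)  # the preview, exactly as embedded by the camera
            print("Preview written to embedded_preview.jpg")
        else:
            print("Only a bitmap thumbnail is embedded, no JPEG preview.")
except rawpy.LibRawNoThumbnailError:
    print("This RAW file contains no embedded thumbnail at all.")
```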

The DNG format is superior because it allows storing both the RAW data and a high-quality preview. A RAW processor can use that to store the RAW, its settings, and an up-to-date preview image representing the RAW with all changes applied. But only Adobe does that, AFAIK.

So your experience is totally normal and to be expected. If you want to see your RAW images in IMatch like you see them in your RAW processor, export them or use DNG files in your workflow (if your RAW processor supports them).

Or configure the JPG as the proxy in a version relation.

Or autotag the JPG and apply the results to both the JPG and the RAW.
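
For people who script that last step by hand instead of letting IMatch propagate the metadata, here is a sketch of one way to copy the AI-written description and keywords from the JPG to the RAW with ExifTool. The tag names are the standard XMP fields, the file names are placeholders, and writing into the RAW itself rather than a sidecar is just one possible choice:

```python
# Sketch: copy description and keywords from the tagged JPG into the RAW
# using ExifTool via subprocess. Requires exiftool on the PATH.
import subprocess

src, dst = "photo.jpg", "photo.ARW"  # placeholder file names
subprocess.run(
    [
        "exiftool",
        "-overwrite_original",
        "-tagsFromFile", src,
        "-XMP-dc:Description",  # the AutoTagger description
        "-XMP-dc:Subject",      # the keyword list
        dst,
    ],
    check=True,
)
```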