New OpenAI Model

Started by Mario, April 17, 2025, 01:29:39 PM


Mario

A couple of days ago, OpenAI released their latest models, the o4* generation.

I've just finished integrating the affordable o4-mini model, and the next release of IMatch will support it.
There were some API changes to deal with: some parameter names differ between the gpt-4o* models available so far and the new o4* generation. But it's all working now. I will keep supporting the gpt-4o* models for a while, as OpenAI still offers and supports them.
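To illustrate the kind of parameter difference involved, here is a minimal sketch (Python, plain REST; illustrative only, not the actual IMatch code). One known difference is that the o* reasoning models expect "max_completion_tokens" instead of "max_tokens" and do not accept a "temperature" setting:

    # Illustrative sketch only, not IMatch's actual code.
    import requests

    API_URL = "https://api.openai.com/v1/chat/completions"

    def build_payload(model: str, prompt: str) -> dict:
        payload = {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }
        if model.startswith("o"):          # o4-mini, o3, ...
            # Reasoning models reject "max_tokens" and "temperature".
            payload["max_completion_tokens"] = 1000
        else:                              # gpt-4o, gpt-4o-mini, gpt-4.1, ...
            payload["max_tokens"] = 1000
            payload["temperature"] = 0.2
        return payload

    def describe(model: str, prompt: str, api_key: str) -> str:
        r = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            json=build_payload(model, prompt),
            timeout=120,
        )
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]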

Stenis

#1
I asked you yesterday about the status of support for the newest OpenAI model version, 4.1. Sorry if it was not in a place you expected it to be and that you had issues with the semantics.

Concerning my use of the word "interface": I did not mean the user interface displaying the models we can choose between, but the interface between IMatch and the new API.

"There were some API changes to deal with, some parameter names differ between the gpt-40* models available so far and the new o4* generation etc. But it's now all working. I will support the gpt-40* models for a while, as OpenAI still
offers and supports them."

What you write above, an excerpt from your earlier comment on my post, is just what I meant. Call it what you want, but for me this is simply an example of where the IMatch interface needs to be adapted to the OpenAI 4.1 API.

When I looked at this site yesterday I must have missed this particular thread, or your thread had not even been posted yet. Nothing strange about that, since I always work at night until around 4 in the morning; it is the only time I can count on not being disturbed and can get something done with my archive.

I will probably continue posting things in the wrong places in your structure (sorry for that), since I am not really used to how these big-font headlines relate to the substructures that actually determine where things belong. Hopefully I will learn.

My question was not just about the mini model either. I use the bigger model as well (GPT-4o), and if the newer 4.1 mini is as good as the old 4o at interpreting pictures, support for the new mini model will be highly important and much anticipated. That is why I posted the link with all those diagram examples pointing to some really remarkable gains with the new models, both in efficiency and in cost.

The new models are much faster as well, and speed is the biggest problem with GPT-4o for me today, since it takes at least 20 seconds per picture. If the mini model can do the same job with the same quality in the Descriptions as the bigger model gives today, but at the speed of OpenAI's mini model, or of Gemma 3, which does the same job in 3-4 seconds, that will be a true game changer for both quality and productivity.

I posted my link at 01:29:24:

https://www.photools.com/community/index.php/topic,15128.0.html

You posted this thread at 01:29:39.




Mario

#2
Quote: "I will probably continue posting things in the wrong places in your structure - sorry for that -"
Please try not to. Read the descriptions. The community moderators have only so much tolerance...

Quote: "The new models are much faster as well, and speed is the biggest problem with GPT-4o for me today..."
What makes you think that? The marketing blurb posted on the OpenAI website?
Maybe for some specific reasoning-model use cases that are not relevant for IMatch.

I see no real difference in response time, latency or quality between gpt-4.1, the o4* models and the previous models so far.
For my standard prompt (description, structured keywords and a Headline trait), o4-mini responds in 8 to 15 seconds, depending on how busy OpenAI's data centers and Cloudflare are. It depends a lot on the prompt, of course.
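The exact prompt is not shown in this thread; a hypothetical prompt of that general shape, asking for all three items in one response, might look like this (illustration only):

    # Hypothetical prompt of the shape described above (not the actual
    # prompt used, which is not shown in the thread).
    prompt = (
        "Analyze this image and return a JSON object with exactly "
        "these fields:\n"
        "  description: two or three sentences about the image\n"
        "  keywords: a list of hierarchical keywords, e.g. "
        "animal|bird|heron\n"
        "  headline: one short headline for the Headline trait\n"
        "Return only the JSON object, no other text."
    )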

Stenis

#3
We'll see. I have to test on my own material. What is most important for me is the accuracy with which the mini models actually follow my instructions. In that respect Gemma has been the worst. Generally and verbally it is very good, but I don't like the lack of structure and how badly it follows my instructions for the "mandatory text lines" I need in Descriptions. In that respect OpenAI has been fully satisfactory. I want the first lines in Descriptions to hold structured info about the place, country and year a picture was taken. Only OpenAI has managed that with my prompting so far.

So far only GPT-4o has been able to live up to my demands when analyzing image content involving animals: delivering the species, their Latin name and the family they belong to. GPT-4o has been almost 100% correct on that across many hundreds of pictures tested in production (very happy with that). Both 4o-mini and Gemma were hit and miss, and disappointments in that respect.

If the new mini model manages to interpret species accurately and gets even better at writing and at structuring my mandatory text, I'm home. The present mini model and Gemma take much more prompt adaptation than GPT-4o does, so using GPT-4o today is much simpler, more convenient and less labor-intensive for that reason. What I hope for is higher productivity than we get today through even better accuracy, better-followed instructions and better image analysis. Whether their data centers are overloaded can of course affect throughput, but that is really outside the scope discussed here.

If a service provider overstates its services' performance, that will very soon be exposed by the standardized tests already present on the net, which make comparisons over time possible; that is already happening to a seemingly large extent. So I don't think OpenAI would get away that easily with all the "marketing bull" you talk about.

The difference between our opinions and experiences and those sources is that they use standardized empirical tests, and I trust them far more than the ones we are able to conduct here, even if many of them cover usage in many fields other than "vision" and image analysis, which are out of scope for the way we use these APIs here.

... and the expert on my own way of using these APIs is myself and nobody else, and so far I have had no problem seeing what works most effectively for me. What is very important, though, is to distinguish between the empirically verified technical performance of these different APIs in various benchmarks and the most effective use of them in my own, or even your, workflows; that is the real issue here when it comes to the actual benefits these new versions bring me.

It will be very interesting to see how these new APIs perform in real life, and it is very good that IMatch is as future-proof as it is, able to keep up with the rapid development by adopting new APIs as they are released and as you make them available to us. I won't disturb you for now, so you can concentrate on preparing that new version. Good luck with that task!

Mario


Stenis

Lovely Mario!

Thanks for your info. Looks very interesting.

Stenis


Quote: "AutoTagger already has configurations for the Gemini AI and these models; all you need is to get yourself an API key. Pricing is very affordable. Details about Gemini in AutoTagger and how to get an API key are described in the corresponding help topic."

Really? Where? I have read the documentation too.

By the way, if you need some help testing these new models, just tell us.




Mario


Stenis

Not impressed with the speed at all. The new 4.1 model has been flagged as faster, but after a very short try I experienced 50% longer processing time for both 4.1 and 4.1 mini than for the GPT-4o versions.

It might write better and hallucinate less about "blue skies" etc., but it definitely feels slower than I expected.

What is your initial opinion?

... and when I tried Gemini it crashed :-(

I might give it a new try in the evening after reading a bit more at Google, and maybe regenerate the API key.

Mario

#9
Quote: "... and when I tried Gemini it crashed :-("
What crashed? IMatch? Gemini? Your computer?
If IMatch crashed, did you see the message about the Debug Dump File? If so, did you upload it and send me an email?

Quote: "Not impressed with the speed at all."
Did you enter the correct rate limits for your API key and usage tier in Edit > Preferences > AutoTagger: Rate Limits?
The defaults are for trial/free API keys. If the rate limit RPM is, e.g., 3, AutoTagger will only use one thread, performing one request after another. This severely reduces performance.
I get very good response times with the correct settings.
See Rate Limit in the IMatch Help System.
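The effect of the RPM setting is easy to picture. Here is a rough sketch of the general idea (Python; hypothetical, not IMatch's implementation): request starts are spaced at least 60/RPM seconds apart, so with RPM = 3 a second worker thread would just sit waiting.

    # Rough sketch of rate-limit pacing (hypothetical, not IMatch's code).
    import threading
    import time
    from concurrent.futures import ThreadPoolExecutor

    def tag_files(files, send_request, rpm):
        min_interval = 60.0 / rpm           # seconds between request starts
        lock = threading.Lock()
        next_start = [time.monotonic()]

        def worker(f):
            with lock:                      # reserve the next start slot
                start = max(time.monotonic(), next_start[0])
                next_start[0] = start + min_interval
            time.sleep(max(0.0, start - time.monotonic()))
            return send_request(f)          # one AI request per file

        workers = max(1, min(8, rpm // 60))  # e.g. rpm=3 -> 1, rpm=500 -> 8
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(worker, files))

With the free-key default of 3 RPM, this degenerates to one request every 20 seconds, which matches the slowdown described above.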

Stenis

#10
Yes, Gemini crashes with error messages repeatedly. Sometimes it seems to fail to update especially my Sony ARW RAW files.

[Attached screenshot: Skärmbild 2025-05-01 011223.jpg]

... with that said, Gemini is just fantastic. It is blistering fast compared to OpenAI on my computer. There was a BIG difference between how the rate limits were set for OpenAI and for Gemini:

Gemini: 2,000-4,000 requests per minute

For OpenAI it was set to 3, for example, and it was the same with all four parameters, but it was still slow even after I had entered about the same settings as for Gemini.
Why are the defaults so different, Mario?

Gemini will give an extremely high boost in productivity compared to OpenAI, at least as it works now on my computer, so I will probably switch.

I don't know why I had big performance issues initially even with Gemini. At least that problem is history now, but I still have these update problems with some RAW files that Gemini fails to update.

Despite that, this is a big leap forward!
Thanks for implementing Gemini!


I have tested new updates with the two Gemini models and now both of them worked without any problems, so it seems a little unstable.




jch2103

See my post about Gemini: https://www.photools.com/community/index.php/topic,15162. My issue may be separate from yours; it may just be too much demand for limited resources on their end.
John

Mario

Quote: "Why are the defaults so different, Mario?"
Because each AI vendor imposes different rate limits for different models.
The OpenAI free tier (also applied during the first couple of days of use) is 3 RPM (requests per minute).
I even included a link to the OpenAI page where all of this is listed and explained, but you probably did not bother to look.

When AutoTagger reports "Files with Problems" it is not a crash.
It's just AutoTagger telling you that the AI you use has reported a problem. Each AI, and sometimes each model, has different failure modes, from overload to license key issues. The log file contains more information.
All of this is of course documented in the AutoTagger help, so look there.

Stenis

#13
So far I haven't seen any "AutoTagger: Files with Problems" after activating billing for Gemini 2.0.
From what I have seen after the upgrade that problem is gone, and I don't think that is surprising at all considering the limitations of the free variant. The price/performance info sheets at Google make it pretty clear what to expect from these limitations.

I'm also happy to see that the "mandatory text line" with some geographic data on the first line, which I had problems getting Gemma to produce, also works:

"Karpatos Island Greece 2016 - in Pigadia Town - "

If I wrote that with Gemma 3 4B, it started to rewrite it in its own creative way without any of the structure I wanted, which I did not appreciate. So far almost everything with Gemini 2.0 Flash is a success in my eyes, and since it is so much more efficient than OpenAI the way I have used it, I will migrate to Gemini for now. Gemini 2.0 Flash works so well already that I consider my test period over. From now on it is in production.
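For comparison, a hypothetical prompt fragment of the kind that pins such a mandatory first line might look like this (illustration only; not the actual prompt used in this thread):

    # Hypothetical prompt fragment (illustration only). The idea is to
    # make the fixed first line explicit and non-negotiable before the
    # free-form part of the description.
    prompt = (
        "Describe this photo for an image archive.\n"
        "The FIRST line of the description must be exactly this text, "
        "copied unchanged:\n"
        "Karpatos Island Greece 2016 - in Pigadia Town - \n"
        "After that line, add one or two descriptive sentences."
    )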

I also have to say that I'm really pleased with the Map function and the reverse lookup too. The one in PhotoMechanic has been a pain. I use Google's API for that too, but prefer OpenStreetMap any day of the week because it is often far more detailed. I don't understand why I have ended up finding Google's maps so much inferior.

All in all, I think IMatch is a very efficient option now with all these functions finally in place. I also have to admit that I did not expect AI-generated image texts and keywords to be able to meet my demands, but they have, and that will save me tons of time. Most other metadata can be batched with templates and variables, but until now that hasn't been the case with image Descriptions and Keywords. IMatch even mostly manages to write the names of confirmed faces into the texts in a stylistic manner. There is not all that much more to wish for, really.