Some experiences from using the new OpenAI 4.1 and Google Gemini 2.0 Flash models

Started by Stenis, May 07, 2025, 02:30:51 AM


Stenis

The older OpenAI GPT-4.0 models and Gemma 3 models with Autotagger

At first I was thrilled just to see this AI-supported text generation for Description and Keywords actually working, but I soon realized there were quite a few challenges in getting the models to follow my intentions via the three field-specific prompts and the general fourth prompt. It was also easy to spot some limitations and peculiarities:

OpenAI GPT-4.0 Mini was OK in general and reasonably fast (maybe 4 sec/picture), and it also managed to keep the mandatory structure I wanted on the first line.

A Description example: Marrakesh Morocco 2025 - In the Medina - (Autotagger text)

BUT it wasn't usable at all on my safari pictures, where I wanted it to add "Species", "Family" and "Scientific Latin name". It was in fact hopeless: it mixed up Thomson's gazelles with Grant's gazelles, completely missed dik-dik antelopes and waterbucks, and confused wildebeests with buffaloes and everything in between.

BUT the bigger OpenAI GPT-4.0 nailed almost every species I threw at it, though at a substantially higher cost. At that time I had no problem accepting that, because it just had to be correct.

I also tried Gemma 3 4B with Ollama, and I liked how Gemma treated the language. It wrote more vividly and better than these earlier OpenAI models, BUT:

- First, it was totally unaware of the "mandatory" demands I had for the first line of text, which for me was a dealbreaker.
- Second, it was as bad as GPT-4.0 Mini at nailing the species.


Then we got an upgrade of the OpenAI models, plus Google Gemini 2.0 Flash and 2.0 Flash Lite, and that changed everything, because both largely lived up to what OpenAI and Google had promised in their marketing news.

Both OpenAI GPT-4.1 Mini and Gemini 2.0 Flash Lite were much better at nailing the species I threw at them (almost 100%), but here they part ways, because GPT-4.1 Mini is much slower than Gemini 2.0 Flash Lite.

Gemini 2.0 Flash Lite is now my choice because it keeps the mandatory structure I demand, writes really well, identifies species really well, and even identifies "landmarks" in a way that is often nothing short of astonishing. On top of that it is really fast and efficient, compared both with the older OpenAI 4.0 models and the new ones, and I think it is also cheaper to run than the competition. What this extra speed has done for my workflow's efficiency and my productivity cannot be overestimated: my wait states are almost gone compared to how it was before. So what Google has managed to do with Gemini 2.0 Flash Lite is to lift the quality of the output, increase the output speed, and lower the price of the service, all at once. That will be hard to beat.

Together with the new iMatch implementation of the Gemini 2.0 Flash models, this is a true game changer, and it lifts metadata maintenance to a whole new level compared to just a couple of weeks ago. It is hard to overestimate the importance of speed for productivity, and this well-designed Autotagger interface fits right into the fast workflows Gemini 2.0 Flash has now made possible for all of us.

BUT there are still a few stains! So far I have seen a few problems, and the worst is that it sometimes fails to update files, especially RAW files. Sometimes it is enough to just run the update again, but I have seen cases where I had to use the bigger Gemini 2.0 Flash model instead to solve it. These cases are easy to spot, since the files lack the yellow pen symbol after the update.

Still, all of them initially hang for me and have to be released with the famous three-finger grip of Shift-Alt-S.

Note: Concerning the OpenAI models, you also have to increase the "Rate Limits" setting for each model, so that they meet or match the performance limits in the spec sheets at OpenAI.





Mario


Stenis

Sorry for the spelling! I must have been tired :-)
I wanted to correct it, but it seems to be closed for editing now.

I will add a few examples later so you get a hint of how good these improvements in Gemini 2.0 Flash Lite really are.

Jingo

I have also been playing with the Gemini Lite models for my bird photography, and it is EXCELLENT at identifying and describing the birds. The ability to take the AI keywords, map them to the thesaurus, and file only the hierarchical keywords is a big time saver for me. Game changer... Well done Mario!

Example Images fed in (keywords were not present already):

9aj4w7RD6n.png

Results:
qkMMjbQxJ0.png


Smaller, harder-to-identify bird:
IMatch2025x64_GFxwxZMSly.png

Results:
IMatch2025x64_23Mafq45IU.png

One more for good measure... it looks like a Willet, right? But notice the legs are yellow, not blue/green... will the AI get it right?

IMatch2025x64_hgcqC3triS.png

Bingo! Amazing...

AI|Wildlife; Animals|Bird; Aquatic; Grey; Habitat; Long Beak; Migration; Objects|Water; Ornithology; Places|Nature; Shorebird; Shoreline; Spotted; Wading; Yellow Legs

The bird in the image appears to be a Lesser Yellowlegs (Tringa flavipes). This medium-sized shorebird is characterized by its long, slender, bright yellow legs and a slightly upturned, straight black beak. It has a greyish-brown back and wings, with a white belly.  Lesser Yellowlegs are commonly found in wetlands, marshes, and along coastlines, particularly during migration. They primarily feed on insects, small crustaceans, and aquatic invertebrates, which they forage for in shallow water. During the breeding season, they nest in boreal forests of North America. They are known for their call, a distinctive series of two to four clear, whistled notes.

Mario

Impressive. Keep in mind to switch to a larger image size when the object in the image is quite small - this gives the AI more pixels to work with. Costs a bit more, though.

Stenis

Thanks Jingo for those nice examples, which just confirm my own experiences using Gemini 2.0 Flash Lite. For me this is a double improvement: no need to use the bigger models for jobs like this, as I was forced to do with OpenAI GPT-4.0. It gets cheaper and is also so much faster.

It seems like Gemini is far better at utilizing threads and performing efficient parallel processing than Gemma and OpenAI are and were.

Thanks for the input about the image size, Mario. I have also noticed, though, that some pictures with animals at a distance, which the earlier generation missed, are now handled perfectly by Gemini Lite. It is a really big difference, and the new Gemini models perform in a way that makes them much more reliable than the older models.

So Mario, your implementation of Gemini 2.0 Flash is really a fantastic selling point for the whole iMatch platform that you didn't have a couple of weeks ago. I remember you were a little sceptical at first :-) and Jingo is so right in his comments on the keyword automation and on how the thesaurus can be optimized to solve a keyword problem many other systems have no solution for.

It looks like most things around the AI automation of Description texts, Keywords, and thesaurus maintenance just fell into place together with the iMatch Autotagger.