Autotagger performance data for Ollama + LLaVA 7b

Started by KyeW, July 23, 2025, 08:03:05 AM


KyeW

I replied to another post, but was auto-advised to start a new post, because the other thread was old.

So, first-time poster. Just purchased IMatch for this very reason.

My wife had 20,000 completely untagged images. Just 20 years' worth of typical personal photos.

I used Ollama with the LLaVA 7b model.

All images were updated with keywords (and descriptions too) in 5 hours total.
Exceptionally accurate keywords and descriptions.
Finally, AI doing something useful for society :-)

Then a few more minutes to apply those keyword/description updates to the actual images.

Hardware used - my brand new Davinci Resolve workstation;
Core Ultra 7 265K
128 GB RAM
TUF RTX 5070 Ti 16 GB, PCIe 5 x16
Samsung 9100 NVMe SSDs: 14 GB/s read / 13 GB/s write, PCIe 5.

This has saved her weeks of work. Now finding the image you want, immediately, is child's play. Like watching magic.

Mario

With your 16 GB VRAM graphics card, I recommend trying out the Google Gemma 3 12b model:

ollama run gemma3:12b
I use this model often and it's vastly superior to LLaVA 7b.
See my initial post about it https://www.photools.com/community/index.php/topic,15013.msg105183.html#msg105183
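For anyone who wants to try a model outside of IMatch first, Ollama exposes a small REST API (by default on localhost:11434) where vision models accept base64-encoded images. A minimal Python sketch of the request payload; the prompt text is just an illustration, not AutoTagger's actual prompt:

```python
import base64
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_tagging_request(image_bytes: bytes, model: str = "gemma3:12b") -> dict:
    """Build the JSON payload Ollama's /api/generate endpoint expects
    for a vision model: the image travels as a base64 string."""
    return {
        "model": model,
        "prompt": "List 10 concise keywords and a one-sentence "
                  "description for this photo.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one complete JSON response
    }

# Sending it requires a running Ollama instance, e.g.:
# import urllib.request
# body = json.dumps(build_tagging_request(open("photo.jpg", "rb").read())).encode()
# req = urllib.request.Request(OLLAMA_URL, data=body,
#                              headers={"Content-Type": "application/json"})
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```

This is only for experimenting with prompts and models directly; IMatch's AutoTagger handles all of this for you once the Ollama service is running.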

KyeW

Thank you! I'll try it out.

I forgot to mention your programs best trick.

Twenty minutes before the end of the epic 5-hour update, we got hit by lightning: a hard crash and power outage right in the middle of the keyword generation process.

When the power came back on, I restarted the program and it just seamlessly continued from where it left off.

No fuss at all.

Very impressive.

Mario

Quote: When the power came back on, I restarted the program and it just seamlessly continued from where it left off.

No fuss at all.

Very impressive.
This is how it is supposed to work.

But, a power outage is one of the very few reasons known to be able to damage IMatch databases.
If the database system was writing data to the database file and Windows confirmed the write, but the data never actually reached the disk because it was still in a memory cache when the power failed, the database system has no way of knowing that some data was lost. Its internal "model" of the database file then no longer matches what is actually stored in the file on disk.

When this happens in a rarely used database section, the database system will only discover the problem the next time it tries to read from the damaged section. Then a "physical database error" is displayed, which is fatal.
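The gap described here (data "confirmed" by Windows but still sitting in a volatile cache) is the classic flush-versus-fsync distinction. A minimal Python illustration, not IMatch's actual code:

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data so it is more likely to survive a power failure:
    flush() only hands the bytes to the OS cache; os.fsync() asks the
    OS to push them to the physical disk before we call the write done."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python's buffer -> OS cache
        os.fsync(f.fileno())  # OS cache -> disk

# Without the fsync, data the OS already "confirmed" can still be lost
# in an outage, and the application has no way of knowing.
```

Even fsync cannot help if the drive's own cache lies about completion, which is why a diagnosis after any outage is the right call.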

Please run a database diagnosis via Database menu > Tools to ensure your database is healthy and was not damaged by the power failure.

And don't forget to make daily backups of all your files and keep these backups for a while.
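On the backup advice: a minimal sketch of a date-stamped daily backup with simple retention. The paths and the 14-copy retention are arbitrary assumptions; a dedicated backup or sync tool works just as well.

```python
import shutil
from datetime import date
from pathlib import Path

def daily_backup(source: str, backup_root: str, keep: int = 14) -> Path:
    """Copy `source` into a date-stamped folder under `backup_root`
    and prune the oldest backups beyond `keep` copies."""
    root = Path(backup_root)
    dest = root / f"backup-{date.today():%Y-%m-%d}"
    if not dest.exists():           # at most one backup per day
        shutil.copytree(source, dest)
    # ISO dates sort lexically, so sorting gives chronological order
    backups = sorted(p for p in root.glob("backup-*") if p.is_dir())
    for old in backups[:-keep]:     # drop everything but the newest `keep`
        shutil.rmtree(old)
    return dest
```

Pointed at the IMatch database folder (and run by Task Scheduler), something like this keeps two weeks of history around.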

KyeW

Thanks for the heads up. I ran the diagnosis.

Only warning:
-------------------------------------------------------
Checking Metabase:
  Warning: Found 7 duplicate entries in md_tag. Folding the entries into the newest one.
        should be not pending: [529] 'E:\Pictures\PHOTOS ARCHIVE\2019 Onwards\IMG_20220721_134220.jpg'.
        should be not pending: [5135] 'E:\Pictures\PHOTOS ARCHIVE\Older photos\deefor.jpg'.
        should be not pending: [11310] 'E:\Pictures\PHOTOS ARCHIVE\Older photos\IMG_0428.jpg'.
        should be not pending: [11312] 'E:\Pictures\PHOTOS ARCHIVE\Older photos\IMG_0432.jpg'.
  Warning: 4 files marked as pending for write-back which have NO data to write-back. Fixed.
Completed.
-------------------------------------------------------


Other than that (which was fixed), zero problems. Phew!

You were absolutely correct about the Gemma3:12b model being a dramatic improvement over LLaVA 7b.

Am re-running the keyword/description generation with it now, on the same 20,000 images.

Will take 11 hours 20 minutes. That GPU is really cranking right now.
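For anyone comparing the two runs, the average per-image time implied by these numbers works out as follows. A rough average only; actual per-image speed varies with prompt and image size:

```python
def seconds_per_image(hours: int, minutes: int, images: int) -> float:
    """Convert a total run time into average seconds per image."""
    return (hours * 3600 + minutes * 60) / images

llava = seconds_per_image(5, 0, 20_000)    # LLaVA 7b run: 0.9 s/image
gemma = seconds_per_image(11, 20, 20_000)  # Gemma3:12b run: 2.04 s/image
```

So the larger model is a bit over twice as slow per image on this hardware, in exchange for noticeably better keywords and descriptions.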

Kind regards,
K.

KyeW

Mario, if I could make a polite suggestion: it would be helpful if there were a sticky document somewhere obvious that showed the current best AI model for specific video cards, along with a 'for dummies' doc/video on how to set it up to run on all files (or in batches of 10,000).

I've got a 20 year IT background, so I can cope with having to go looking for this information from all over the place.

But there are lots of other enthusiasts (potential customers) who would be amazed at having access to the autotagging capability, if it were just more easily accessible.

Either way, I'm very grateful for the immediate help that you've given here.


Mario

I agree. The Gemma3:12b is the best model so far for local use with image descriptions and keywords.
It's a bit slower than LLaVA, though.
On my system, it takes between 5 and 9 seconds per image, with quite complex prompts (very detailed hierarchical keyword list, description, headline and two additional traits per image). But once it's done, the images are finished.

Google recently released the Gemma3n model, which has been designed for "the edge" (PCs, phones) and apparently requires less memory than older models for the same performance. I'm waiting for the Ollama / LM Studio teams to make their software compatible and produce vision-enabled models.

If they work well, they could be an alternative for users with graphics cards that have less than 8 GB of VRAM. Maybe.
Or the models may simply run faster when sufficient VRAM is available.
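As a very rough rule of thumb (an approximation, not an official Ollama or Google figure): Ollama's default downloads are quantized to roughly 4 bits per weight, so the weights alone need about half a byte per parameter, before the KV cache, vision encoder and activations are added on top:

```python
def approx_weight_vram_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Very rough VRAM needed for model weights alone (no KV cache,
    no activations): parameters * bits-per-weight / 8 bits-per-byte."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

approx_weight_vram_gb(12)  # ~6.0 GB for a 12b model at 4-bit
approx_weight_vram_gb(7)   # ~3.5 GB for a 7b model at 4-bit
```

That is why gemma3:12b fits comfortably on a 16 GB card but gets tight on 8 GB, and why a smaller or more memory-efficient model matters below that.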

Mario

Quote from: KyeW on July 23, 2025, 03:25:34 PM:
Mario, if I could make a polite suggestion. It would be helpful if there was a sticky document somewhere obvious, that showed current best practice for the best AI model for specific video cards, along with a 'for dummies' doc/video on how to set it up to run on all files (or batches of 10,000)
I don't know more about the models than users. Many users already know more than me.
As for video cards, I only have two (desktop and notebook). I'm sure the tutorials on the Ollama and LM Studio websites explain very well how to set up these products for various environments and video cards.

There is the very detailed IMatch AutoTagger help topic.

The AI Service Providers help topic describes the supported models and how to set them up, including how to get keys for OpenAI, Mistral and Gemini.

The Prompting for AutoTagger help topic explains all I know about prompting and how to create great prompts for AutoTagger.
The default prompts included in AutoTagger work very well for the vast majority of cases and users don't need to tweak anything to get good results with OpenAI, Mistral and Gemini. Just add your key and run the Default settings for immediate results.

I have made two tutorial videos for the IMatch Learning Center: one showing how to install and use Ollama, and one with an AutoTagger overview.

Not sure what else I can do. I've literally spent weeks writing all these help topics, prompt tutorials and related material.
Did you use these resources and still have questions about how to get AutoTagger working?

KyeW

Clearly, you are an extremely technically minded person, having created such profound capability in your program. This is rare these days, as people move toward simpler user experiences - albeit at the cost of capability.

This is not a criticism. It's just an observation. My background in UX/UI design involves the removal of complexity through simplification. In the industry, typically, once someone has experience in coding/programming, they're considered unsuited to move across to UX/UI design, because they view problem solving very differently to non-technically minded people.

I wasn't aware of those extremely detailed Help pages. I will go through them as I come to terms with what is quite a different UI and process approach, compared to typical photographic application interfaces.

Mario

"Reducing UI complexity by reducing the feature set" is an approach used by quite a number of applications. If you need an easier image cataloger, there are many available. That's not what the IMatch user base wants, though.

IMatch is more like Photoshop than Windows Paint. Both are image editors but target completely different audiences. Apple's approach of "If our software cannot do it you don't need it" won't fly with the IMatch user base.

Don't get me wrong; I'm dedicated to making IMatch simpler for the new generation of users. Over the past two years, I've hidden more complex options behind an Expert Mode curtain to prevent overwhelming newcomers and avoid many "shot myself in the foot" scenarios. I've spent months redesigning the user interface with all-new icons and ribbons for a sleeker look.

I will continue trying to find ways to make IMatch easier to use without reducing its feature set. Each feature exists for a reason, as do each option and setting. This may not be immediately obvious to new users, but there is a rationale behind everything.

Every IMatch user utilizes about 50% of the software, but every user uses a different 50%.


Quote: I wasn't aware of those extremely detailed Help pages.

That explains a lot!
It seems some new users don't even know what the <F1> key is for (it's supposed to display context-sensitive help), because they have been trained not to expect documentation (since that's cheaper for vendors), or because they have only used more basic software that doesn't require much explanation.

The onboarding process in IMatch automatically opens the help system, though.

The IMatch help system is your friend. It contains detailed explanations of each feature, usage tips, how-tos, and more. The Command Palette makes it very easy for new users to find and use commands, templates, and favorites. There are many free video tutorials in the IMatch Learning Center, and a large number of helpful posts in the DAM knowledge-base.

When you continue using IMatch, you will find that it is a very deep application with many features useful for typical photographic workflows. However, IMatch is not only used by photographers (pro and hobbyists); it's also utilized by libraries, researchers, small stock photo agencies, artisans, scientists, and many other users who need to manage a large number of digital files on a budget.

IMatch's rich feature set and its adaptability to different workflows and environments inevitably come with a learning curve. This is true for all complex applications. Nobody understands Photoshop, Blender, Resolve, or any capable RAW processor in just a couple of days.

KyeW

As an ex software developer (and one of the rare people who moved to UX/UI design before I retired "youngish"), let me just pass on my respect. I know what it takes to be the sole author of every part of a program, from the installation routine to testing and the billion other tasks where you trailblaze, often as the first human being to solve them. Presumably, you've had to look at a problem, think "this is impossible", but know that you've solved impossible problems before, so you just crack on and do it. Yet again.

My perfect wife's needs are making me get up to speed and develop depth in IMatch, quickly.
The capability of IMatch is not lost on me. Nor is the fact that "The IMatch help system is your friend".
I'll go to F1 first. Then search here. Great detailed help documentation is rare. I'll avail myself of it.

If I wanted 'simple', I wouldn't have purchased a license. Which I did yesterday. I think your pricing is perfect, by the way.

Mario

Quote: I'll go to F1 first. Then search here. Great detailed help documentation is rare. I'll avail myself of it.
Good workflow. Believe it or not, I have to refer to the IMatch help frequently myself. I cannot keep all of this in my head.

Quote: I did yesterday. I think your pricing is perfect, by the way.
Very kind of you, thank you. And thanks for your business.
I believe in delivering a quality product and excellent support for a fair price. It's a German thing...