Best Strategy for 10,000 Unsorted Scans & AI Description Accuracy

Started by biglime5590, June 02, 2025, 05:33:37 AM

Previous topic - Next topic

biglime5590

Hello IMatch Community,

I'm a new IMatch user (and new Mini PC owner, specifically for IMatch as a Mac user!) and am incredibly impressed with its depth after just a few hours. I'm looking for guidance on tackling a large project: ~10,000 scanned family photos, currently with zero metadata.

My goal is to comprehensively tag each photo with keywords, locations, events, descriptions, and people.

I. Overall Workflow & Batch Processing Strategy:
I'm aiming for maximum efficiency. The Face Recognition tool is already proving very effective for "low-hanging fruit."
  • Phased vs. Per-Photo Approach:
    • Would you recommend a phased approach (e.g., Pass 1: All faces; Pass 2: Locations/Events; Pass 3: Keywords/Descriptions)?
    • Or, is it more efficient to process each photo completely in one go?
  • Handling Disorganized Scans: My scans are completely unsorted. Photos from the same event might be thousands of images apart.
    • What IMatch features or strategies (e.g., Collections, Stacks, Timeline use, specific batch processing) are best for grouping these scattered event photos as I discover them?
    • Are there particular "first pass" metadata fields (e.g., rough dates, initial event guesses) that make subsequent, more detailed tagging easier within IMatch?
  • Adding "Year Taken" Information: Many photos will only have an estimated year.
    • What's the recommended IMatch workflow for adding an approximate year to each photo?
    • Should I, for example, create a custom tag for "Estimated Year," populate that, and then use a tool like TimeWiz to batch apply this to the appropriate EXIF/IPTC date fields? Or is there a more direct method?
  • AI for Date Estimation:
    • Is there any existing or potential workflow (perhaps involving scripting or external tools integrated with IMatch) where AI could assist in guessing a photo's date? For instance, if IMatch knows the people in the photo and their birthdates, could AI leverage this to estimate an age for the subjects and thus a likely year for the photo?
  • General Setup: Any other IMatch setup tips or considerations for a project of this scale to ensure success and avoid rework?
II. AI Autotagger (via Variables & External AI - e.g., Gemini Flash):
I've successfully used person variables to feed AI prompts for descriptions, which is amazing! However, I've encountered an issue:
  • Action Misattribution: The AI (Gemini Flash) often knows who is in the photo (from the variables) but arbitrarily assigns actions to the wrong person (e.g., says "Ben is on the swing" when it was Jim, even if Ted was also present).
  • Feeding Face Region Data to AI:
    • Is there a way within IMatch, or a known workflow, to provide the AI with face region data (i.e., bounding boxes/coordinates for each recognized person) alongside the names? This could potentially allow the AI to correctly associate actions with the specific individuals performing them.
    • Maybe I could export versions of photos with face annotations burned in, have the AI process those, and then somehow map the generated metadata back to the originals using IMatch's stacking or versioning features? Or is there a more integrated solution?
  • Alternative AI Prompting/Tools: Are there other IMatch features or prompting strategies I should explore to improve the accuracy of AI-generated descriptions, especially regarding who is doing what? Or any other ideas of how to use AI to make this whole process easier?
Thanks so much for any guidance you can offer.

mastodon


sybersitizen

That's a lot of questions! I'm sure you'll get a lot of answers, but I'll just start with one recommendation:

Don't try to analyze all 10,000 photos at once for all of those things. Just correctly identifying people can be pretty challenging and time consuming. I have many more than 10,000 photos, and for that process I'm doing it in chunks of maybe a few hundred at a time. Otherwise I'd be wading through mountains of unconfirmed faces for each person I want to create.

biglime5590

Thank you both for the replies.

@mastodon - These scans are all just paper. No film.

@sybersitizen - Appreciate that advice! Very sage, and I will definitely chunk the Face ID work.

mastodon

Then there is no clue for grouping the photos. It is easier to tag faces if there are many of them from the same person. So, if possible, start with the photos that are from an event (a birthday, an anniversary). Tag the persons who appear in the most pictures first.

Mario

2. and 4.: Use an AI-first approach with IMatch AutoTagger.
Spending a few dollars for OpenAI or Google will produce descriptions and keywords for your images.
Since IMatch groups files by keywords in the special @Keywords Category, files showing the same motif will be grouped together, wherever they are in your file system.

3. See Working with Uncertain Dates in the IMatch Help and Lost in Time? Tackling Uncertain Dates with Old Photos and Digital Asset Management in the IMatch knowledge base for useful approaches and tags to use.
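As a rough sketch of how an estimated year could be batch-applied to the standard date tags, here is what an ExifTool invocation might look like when built from outside IMatch (IMatch's own batch tools can achieve the same in-app; the filename and the "January 1" convention are assumptions for illustration):

```python
# Sketch only: build an ExifTool command that stamps an approximate date
# onto a scan. The filename and the "January 1" placeholder convention
# are hypothetical; adapt to the tags recommended in the IMatch Help.
def exiftool_estimated_year(path, year):
    """Return an ExifTool command list that writes an approximate date."""
    date = f"{year}:01:01 00:00:00"
    return [
        "exiftool",
        f"-DateTimeOriginal={date}",  # EXIF capture date
        f"-XMP:CreateDate={date}",    # XMP counterpart
        "-overwrite_original",
        path,
    ]

cmd = exiftool_estimated_year("scan_0042.jpg", 1987)
# pass `cmd` to subprocess.run(cmd) once ExifTool is installed
```

The list form avoids shell-quoting problems when the command is eventually executed.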

Regarding II: Improve your prompts! Tell the AI what you expect it to return, give examples etc. 
Prompts always depend on the model you use, your images, what results you expect, what your focus points are etc. Landscape photography may require different prompts than family photos or bird photography. Spending a bit of time to figure out good prompts is always worth it.

Date estimation is really unlikely, unless your images cover widely known events and places. An AI will be unable to guess dates for, say, family photos. Maybe it can make a 50/50 guess based on clothing styles, machinery, or cars visible in the image. You must ask for that explicitly in your prompt.

2. Face regions: I have never tried that and I doubt it is needed. The AI will detect faces and other objects in images automatically; this is part of its job.

biglime5590

@mastodon - Thank you for your input! It is appreciated.

@mario - Thank you SO MUCH for these responses and guides. I will take advantage of AutoTagger, and will read those articles about uncertain dates.

One point I'm hoping to follow up on:

Action Misattribution: The AI (Gemini Flash) is fed the people metadata from a variable, but how would it be able to associate a person's name with a SPECIFIC person in an image? Say there are three people: how would it know that Mary is the one holding a baby, or that Ted is the one pushing the lawn mower, while Bob is the one watering the grass? Based on a lot of tinkering with prompts, it seems to me that the AI will just make a guess. I'm assuming it doesn't have any face region data, which I don't believe is available to pass from a variable.

This is why I was thinking: if the image being analyzed had the face annotations burned in, the AI could not only detect the faces but would also have enough context to assign specific actions to specific individuals correctly.
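To make the burn-in idea concrete, the key step would be converting the normalized MWG face-region values (where x/y is the region center per the MWG spec) into pixel coordinates, so that names could be drawn onto an exported copy with any imaging tool (e.g. Pillow). A rough sketch, with all values hypothetical:

```python
# Sketch: convert MWG face-region values (normalized; x/y is the region
# CENTER per the MWG regions spec) into a pixel bounding box, so a name
# label could be drawn on an exported copy. All values are made up.
def mwg_to_pixel_box(x, y, w, h, img_w, img_h):
    """Return (left, top, right, bottom) in pixels for one MWG region."""
    left = round((x - w / 2) * img_w)
    top = round((y - h / 2) * img_h)
    right = round((x + w / 2) * img_w)
    bottom = round((y + h / 2) * img_h)
    return left, top, right, bottom

# A region centered at 50%/40% of a 1000x800 scan, 20% wide, 30% tall:
box = mwg_to_pixel_box(0.5, 0.4, 0.2, 0.3, 1000, 800)
# box == (400, 200, 600, 440)
```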

What do you think? Am I missing something, or misunderstanding how this works?

Mario

The variable returns the names of persons in the order you have created (Ordering Persons in Images), right?
So, maybe include that information in your prompt, adding something like "The people shown in this image are, in this order from left to right: {File.Persons.Label.Confirmed}". Worth a try!

Face regions positions and dimensions are not available as a variable because there is no need. Regions are complex, repeatable tags, broken down into about 12 individual tags.
If you really must do something like this, you can feed the AI the region x/y and w/h values using the XMP tags, e.g.

These are the face bounding boxes:
X: ({File.MD.MWG::Regions\RegionsRegionListAreaX\RegionAreaX})
Y: ({File.MD.MWG::Regions\RegionsRegionListAreaY\RegionAreaY})
Width: ({File.MD.MWG::Regions\RegionsRegionListAreaW\RegionAreaW})
Height: ({File.MD.MWG::Regions\RegionsRegionListAreaH\RegionAreaH})
These are the person names: ({File.Persons.Label.Confirmed})

I have never needed or tried this. See what Gemini makes of it.
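For illustration, here is roughly what such a prompt fragment might look like once the variables are expanded, assembled here by hand with each name paired to its box (a variation on the variable lines above; names and values are made up):

```python
# Sketch: assemble a prompt fragment pairing each recognized person with
# their normalized MWG bounding box. Names and coordinates are hypothetical;
# in IMatch the values would come from the variables shown above.
def region_prompt(names, boxes):
    """names: person names; boxes: (x, y, w, h) tuples of normalized
    MWG values, in the same order as the names."""
    lines = ["These are the face bounding boxes:"]
    for name, (x, y, w, h) in zip(names, boxes):
        lines.append(f"{name}: X={x}, Y={y}, Width={w}, Height={h}")
    return "\n".join(lines)

prompt = region_prompt(["Mary", "Ted"],
                       [(0.30, 0.40, 0.10, 0.15), (0.70, 0.45, 0.11, 0.16)])
print(prompt)
```

Whether the model actually uses the coordinates is an open question; as noted above, it is untested.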

biglime5590

Amazing! This is awesome insight. Really appreciate it. I'll try both the ordering route, and this XMP region x/y business, to see if I can improve the results. Thanks once more!

jch2103

I would tackle this in batches. I went through a similar exercise, although in my case it's been spread over several years.

I would assign approximate dates (see Mario's links); as you add more data you should be able to refine them. I'd recommend taking advantage of the IMatch Events feature (holidays, birthdays, etc.) as well as People. You may be able to add Locations (using a Metadata Template) to give another way to refine the data.

Good luck! Let us know if you have other questions, of course. 
John

Mario

Quote from: biglime5590 on June 03, 2025, 08:00:58 PM
Amazing! This is awesome insight. Really appreciate it. I'll try both the ordering route, and this XMP region x/y business, to see if I can improve the results. Thanks once more!
Note: I've had the X coordinate twice and a random = in my variable (typing on mobile). I've corrected the variable in my post. Test in VarToy!
Playing with the prompt and trying different routes is often required when somebody wants very specific responses from the AI. Giving the AI more context and telling it what you expect is helpful. It's a learning process.

Stenis

Quote from: biglime5590 on June 02, 2025, 05:33:37 AM
Hello IMatch Community, [...]

Why not read through some of the threads here about face recognition, prompt engineering, and our shared experiences with the different AI models AutoTagger supports, and not least how to handle keywords, before you start? Also read the Help texts on these topics. If you don't, keywords especially will get unnecessarily messy.

It is not a good idea to just let AutoTagger loose on its own. You will probably need to do some testing with prompting. If you haven't confirmed the identity of the people in the pictures, for example, the texts will come out unnecessarily impersonal.

If location data is important to you, it is a good timesaver to also use the Google Maps API with reverse geocoding. It will automatically fill most of the location tags.

Learn how to use the Metadata Panel efficiently.

It is almost always a good strategy to organize all your picture folders under one single top folder and point IMatch at that during the initial indexing.

It is also common to add general static data first, let AutoTagger and the reverse geocoding do the rest, and then finish by adding specific data and correcting AutoTagger output where you feel it is needed.

Since most of my pictures are from travelling, I always add a "mandatory" first line to Descriptions. In the ad hoc prompt (opened with F7) it might look like this:

Mandatory text: Paris France 2012 - and after that, AutoTagger writes the rest.

There is also a very important checkbox in that dialog box that lets you update a single picture or have AutoTagger push the same text to all selected pictures.

If you don't like a common text you get, it might be an idea to let AutoTagger create a few, pick the best, copy it with Ctrl+C, and paste it to the others with Shift+Ctrl+V.