IMatch freezes when reading metadata

Started by thrinn, June 28, 2025, 10:30:23 PM


thrinn

After installing the newest version, I suffer from frequent freezes where the UI is completely blocked. This happens in the "Reading Metadata" phase after a write-back, when IMatch re-reads some of the changed files. But I see it also when I try to explicitly propagate metadata (F4, P).

In the log file I see many ##BLOCK messages like below, often in combination with a "TranBegin wait loop" with a very long time (>130,000 ms):
06.28 22:05:15+ 1234 [71B4] 50  I> TranBegin wait loop: 600 134453ms.
06.28 22:05:15+    0 [71B4] 10  I> ##BLOCK: 29108 (71b4) 'Engine.UPDQDelegate.Meta' TransBegin 0 ms from bool __cdecl CIMRelationManager::Propagate(const class std::vector<unsigned int,class std::allocator<unsigned int> > &,bool,class CBitVector *,const class CBitVector *,bool) (3647) lev: 0

I already dialled down the Performance Profile to "Low", just in case. I use Windows 11 with Defender; the IMatch process and executable are declared as exceptions, likewise the whole folder containing the database, and even the folder containing the photos. We are talking about ARW and JPG, no large TIFFs or videos or such.

I suspect that somehow the file relation rules interfere with the "Reading Metadata" step: Specifically, I shoot ARW+JPG, and I have a Buddy rule as well as a Version rule to link the JPG as buddy and version to the ARW master. Some metadata is propagated, but this has never been a problem before. This includes some XMP metadata, some collections (Pins, Dots), and some categories.

The attached log file (incomplete, as I had to close IMatch the hard way) is from a session where I tried to propagate metadata for about 350 ARW files. 

Is there any way to find out what exactly is being blocked by what? At the moment, I have no idea where to look.

I saw one other bug report from today concerning slow import, but in my case, the initial metadata load when ingesting new files did not have any issues. Propagation and Write-back, however, caused the blocking.

Thorsten
Win 10 / 64, IMatch 2018, IMA

thrinn

Another test: I switched back to the standard Performance Profile. Selected just 16 ARW files. Hid all panels with F9, Backspace. Pressed F4, P to propagate MD. Immediately, IMatch got stuck "Propagating".

Log file shows a lot of "Failed to get transaction or CS lock" messages:
06.28 22:37:32+   63 [6464] 02  I> CIMCollection::InnerCalculate: Failed to get transaction or CS lock for 'Callout'
After waiting about 15 min, IMatch finished propagating. But 15 min to propagate MD for 16 files from ARW to JPG?
At least, this time I did not get any ##BLOCK messages.

Thorsten
Win 10 / 64, IMatch 2018, IMA

dcb

Hi, David from the other thread about slow import. For what it's worth, I've also experienced slowdowns when doing forced propagations using F4, P. I'll get some testing done today and a log through to Mario.
Have you backed up your photos today?

Mario

I see nothing unusual in the two (?) log files in the post from 28-06-2025, 22:52:53, except for ##WAIT logs where a background metadata write thread had to wait < 2 s before it could acquire a transaction. That's an indication of a busy system, but not harmful.

In the log attached to the first post, something really bad happened. IMatch logs

06.28 21:47:26+    0 [797C] 10  M>  <  0 [134000ms #sl] PTMetabase::SetFileValues

Over two (!) minutes for writing the data of one file to the database. This is usually done in 30 to 100 ms.
The same thread has imported metadata from many files before; ExifTool extracts about 400 tags per file in 0.5 to 1.5 s. All pretty normal. This continues until this operation is logged:

06.28 21:45:12+    0 [797C] 10  M> >  0 PTMetabase::SetFileValues

After this operation is logged, we suddenly see 3 to 4 second "idle" times between each log entry. Only independent background threads log something every few seconds. All threads currently in the process of importing file values or updating the database are silent, stopped dead. Nothing is happening anymore. At this time, several threads running in parallel are trying to import metadata values extracted by ExifTool into the database, which usually takes only ~20 ms.

Before this happens, I see that the time ExifTool needs to extract metadata from files goes up from ~400 ms to 1500 ms.
Also, the time for PTMetabase::ImportFiles goes up from 20-200 ms to 1500 ms, but then falls back to 16 ms again, shortly before the "freeze" happens.

This looks as if all disk write operations to the database suddenly stall, like in a deadlock where parallel threads lock each other indefinitely. The shared resource between the threads is the database: all have imported metadata and now want to write data to it. This took ~20 ms before (at most once 200 ms), but now takes 2 minutes?

At
06.28 21:47:26+    0 [797C] 10  I> ##BLOCK: 31100 (797c) 'Engine.UPDQDelegate.Meta'

a thread trying to write metadata to the database bails out with a timeout trying to get a transaction. The other threads running in parallel do the same, shifted by a few seconds. The typical transaction duration for setting metadata earlier in the log file was 20 ms, so no thread should run into any issue. Even if 5 threads come before the 6th, they should complete in ~100 to 200 ms, which causes no issues at all for the 6th.

Instead, all threads wait for the database/disk until they run into hard-coded timeouts and give up. None could get a transaction in the wait time.

The SetFileValues method sets the metadata for one file. It opens a transaction to ensure that either all database updates are performed or none, then processes and writes the metadata and closes the transaction again. The transaction is held by a RAII object that ensures that the transaction is released even if SetFileValues exits unexpectedly or raises an exception. The typical transaction time in the log before the deadlock was ~20 ms.
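
To illustrate the pattern (a sketch in TypeScript rather than the actual C++; the Database and Transaction types below are invented for illustration and are not IMatch internals):

// Sketch only: invented types, not the IMatch engine code.
interface Transaction {
    write(values: Record<string, unknown>): Promise<void>;
    commit(): Promise<void>;
    release(): void; // rolls back if commit() was never reached
}

interface Database {
    beginTransaction(): Promise<Transaction>;
}

async function setFileValues(db: Database, fileId: number,
                             values: Record<string, unknown>): Promise<void> {
    const txn = await db.beginTransaction(); // typically ~20 ms in total
    try {
        await txn.write({ fileId, ...values });
        await txn.commit(); // either all updates land, or none do
    } finally {
        // Plays the role of the RAII guard: the transaction is released
        // even if write() or commit() exits unexpectedly or throws.
        txn.release();
    }
}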

SetFileValues is called every time metadata is imported or you change metadata in other ways (adding keywords, changing a rating, running a metadata template...). It is designed to be super-fast and reliable.

Still, at 06.28 21:45:13 all database writing stops and all background threads busy importing metadata are stalled until they bail out with a timeout 2 minutes later. All of them. Some threads are unlocked only after 533 seconds (!)

This repeats over and over until the log file ends.
No write operations to the database succeed; all end up running into timeouts.
Reading operations (folder scans etc.) work fine. So does writing to the log file. But no write operations to the database file work anymore.

The database and the images are on drive Z:; the log file is on drive C:

The only explanation I currently have (based on experience) for such a sudden problem is a virus checker blocking write access to the database file. Or maybe Z: crapped out? Or was locked by the AV?

Update

Tried to reproduce this. Dumped 50 ARW files into a new folder in a 100K files test database.
Database, log file and image files are on the same SSD.
12 cores with 24 threads HT, performance profile "Performance".

Used Lr to create JPG versions. Set up an ARW file relation that propagates collections, rating, label, title, headline, description, hierarchical keywords and all XMP IPTC and IPTC Ext tags.

Refreshed relations. 50 masters and 50 versions detected.
Changed rating, label for the masters. Added headline, description and hierarchical keywords with AutoTagger.
Selected the 50 masters and forced propagation with <F4>,<P>.

Total runtime of the propagation was less than a minute. No warnings, no locks, no problems.

thrinn

Thank you, Mario, for taking time to analyze this. I will try to track this behaviour down or at least find some influencing factors. Unfortunately I don't have much time today, so it might take some days until I can come back with more information.

Regarding hardware, I will double-check, but both Z: (where the IMatch DB and my pictures are stored) and C: are SSDs. C: is a Kingston 1 TB SSD, Z: is a Samsung 860 EVO 1 TB. Nothing fancy, but also not a bottleneck so far. I also had a look at the Windows Resource Monitor during my tests. I could not see any SSD under heavy load.

I will first try to reproduce it with a test database. I don't want to make too many changes back and forth in my main DB.

Thorsten
Win 10 / 64, IMatch 2018, IMA

Mario

We need to learn more to understand this. I've repeated my 50 RAW propagation test on my notebook and in a fresh Windows 11 container. The notebook was faster (newer SSD) and the virtualized Windows machine took almost 2 minutes (write-back via network bridge). But all completed, no hang. 100K file test database with plenty of categories. Tried to make this as similar as possible to your case.

Mario

Just in case, I've improved monitoring for the next release. In addition to monitoring transaction duration (and disk, memory, and CPU utilization), IMatch now also monitors the average time it takes to begin a transaction, to handle situations where threads are starved by having to wait too long to enter a transaction. The average transaction duration measurement should already take care of that, but who knows...
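
To sketch the idea (TypeScript with invented names, not the actual implementation):

// Rolling average over the last N "begin transaction" wait times.
class RollingAverage {
    private samples: number[] = [];
    constructor(private readonly windowSize = 64) {}

    add(waitMs: number): void {
        this.samples.push(waitMs);
        if (this.samples.length > this.windowSize) {
            this.samples.shift(); // drop the oldest sample
        }
    }

    average(): number {
        if (this.samples.length === 0) return 0;
        return this.samples.reduce((a, b) => a + b, 0) / this.samples.length;
    }
}

// Record how long each beginTransaction() call blocked; the scheduler
// can then consult tranBeginAvg.average() (see the next sketch).
const tranBeginAvg = new RollingAverage();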

Aaannd I've had a neat idea to improve the scheduling and make it scale smoother. This should help on systems with specific ratios of processor cores to disks. And probably help everyone a bit.

All of these performance measures are used to determine whether the background processing system can add more threads (when the system is not too busy) or should reduce the number of parallel threads to reduce system load.
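
A rough sketch of how such scaling logic can look (again with invented metric names and thresholds, not the actual scheduler):

interface LoadMetrics {
    avgTransactionMs: number; // average transaction duration
    avgTranBeginMs: number;   // average time needed to begin a transaction
    diskUtilization: number;  // 0..1
    cpuUtilization: number;   // 0..1
}

function targetThreadCount(current: number, m: LoadMetrics,
                           min = 1, max = 24): number {
    // Back off when a resource is saturated or threads starve while
    // waiting for transactions; grow when there is clear headroom.
    const overloaded = m.diskUtilization > 0.9 || m.cpuUtilization > 0.9
        || m.avgTranBeginMs > 100;
    const headroom = m.diskUtilization < 0.5 && m.cpuUtilization < 0.5
        && m.avgTransactionMs < 50;
    if (overloaded) return Math.max(min, current - 1);
    if (headroom) return Math.min(max, current + 1);
    return current; // balanced: keep the current number of threads
}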

Leaving resources idle is not good for performance; maxing out resources is also bad for performance. The balance is controlled via the performance profile. And this seems to work really well for the vast majority of users. And I have tested this for months on all kinds of hardware.

Exceptions like the one you report or the other currently open case are to be expected. There are a billion Windows PCs out there, and each one is different.

In many of the reported cases, the virus checker was the culprit.
And once it was a cheapo SSD which became 10 times slower once its internal high-speed cache was filled and it had to move data to the slow storage chips it uses. Replacing it with a known-good Samsung SSD (which only costs $20 more) fixed the issue. Now the SSD keeps up the write speed.

What the issue is with your Z: drive (or IMatch) I don't know yet.

thrinn

I think I figured it out, more or less. And the culprit is not IMatch as such but one of my own Apps :o . In my tests, the freezes only appear if the App is active. When I close the App window, I can write back 40 ARW/JPG pairs without any blocking and as fast as expected. Some more testing is needed because I also modified my relation rules for the tests, but it is pretty clear to me that my App is somehow involved. So there is no need for you to spend more time on this "bug".

Just for information:
This App displays some information about the focused file, using calls to v1/files and v1/files/relations like this (focusedFileId.value is the ID of the focused file):
        // Fetch the base file info and its relations in parallel:
        return Promise.all([
            s.IMWS.get("v1/files", {
                id: focusedFileId.value,
                fields: "id,name,versionstate,namene,ext",
                varPersons: "{File.Persons.Label}",
                varPersonIds: "{File.Persons.OID}",
            }) as Promise<FilesResponse>,
            s.IMWS.get("v1/files/relations", {
                id: focusedFileId.value,
            }) as Promise<RelationsResponse>,
        ]);

The App reacts to IMatch messages when the focus changes. So my first guess was that flooding it with messages somehow caused locks. But the App is properly "debounced", and I added some logging to make sure that the App behaves as expected, only calling the endpoints if no new messages are received within 3000 ms.
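
Simplified, the debounce works roughly like this (with stand-in names for the actual handlers):

let debounceTimer: ReturnType<typeof setTimeout> | undefined;

function loadFileInfo(fileId: number): void {
    // ...calls v1/files and v1/files/relations as shown above...
}

// Called for every focus change message from IMatch.
function onFocusChanged(fileId: number): void {
    // Restart the timer on every message; the endpoints are only
    // called once no new message has arrived for 3000 ms.
    if (debounceTimer !== undefined) {
        clearTimeout(debounceTimer);
    }
    debounceTimer = setTimeout(() => loadFileInfo(fileId), 3000);
}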

Therefore, my best guess now is that one of the IMWS calls above, maybe v1/files/relations, tries to get the versions of the focused file, while at the same time IMatch tries to update them, thus blocking each other. Does this make sense?

At least, I have a working solution now: Disabling my App while performing actions like write-back or propagating metadata.

I am sorry that you had to waste valuable time on this "bug" - which is no bug of IMatch at all.
Thorsten
Win 10 / 64, IMatch 2018, IMA

Mario

I don't think this is the reason. There are many calls to HandlerGET_FilesRelations logged, and they all execute in 16 ms. HandlerGET_Files is also called often, but takes < 5 ms.

The last calls to these handlers in the log are way before Z: "stops responding" and all write access to Z: seems to stop dead.