v1/search/filename with scope 'files' takes longer when many files in database

Started by ubacher, April 28, 2019, 05:39:20 PM


ubacher

I have a script which searches a set of files looking for a particular name pattern.
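IMWS.get('v1/search/filename', {
    scope: 'files',
    // Three alternatives joined with the Boolean OR operator (advanced mode):
    // names starting with a date and a 3-digit number, followed by
    // "-Pano", "_stitch" or "--".
    pattern: '^20[0-9]{2}\-[0-9]{2}\-[0-9]{2}.{1}[0-9]{3}\-Pano.* OR ^20[0-9]{2}\-[0-9]{2}\-[0-9]{2}.{1}[0-9]{3}_stitch.* OR ^20[0-9]{2}\-[0-9]{2}\-[0-9]{2}.{1}[0-9]{3}\-\-.*',
    advancedmode: true,
    id: ids,   // the file ids that make up the search scope
    fields: 'id,filename,name'
})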

When I run this in a database with only thousands of files it executes quickly (on my laptop). When I run it in my main database (holding 200,000+ files) it takes tens of seconds to perform this search!
This makes me suspect that the scope is not being restricted to the specified files only (usually a few hundred).

What is very puzzling is that the Task Manager shows very little CPU activity and virtually no disk activity while this is running. So execution is neither CPU- nor I/O-bound.
There is plenty of memory!


Mario

1. You are using a very long and complex regular expression, in advanced mode, with Boolean operators on top.
This means that IMatch has to apply the expression three times to each of the 200,000 files to produce the result.
Doing this for 200,000 files will take a lot longer than for 1000 files. Obviously.

2. How many files are in the scope? What do you supply in ids?
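The three OR alternatives in the pattern share the same date-and-number prefix and differ only in the suffix. As an illustration in plain JavaScript (not IMatch's search engine), the sketch below checks that one expression with a single alternation group matches the same sample names as the three separate expressions, so the prefix only has to be matched once per name. The sample file names are made up, the trailing `.*` is dropped because `test()` only needs to find a match, and whether IMatch's advanced mode benefits from such a rewrite in the same way is an assumption.

```javascript
// The three alternatives from the advanced-mode pattern, as JS regexes
// ('.{1}' is the same as '.', and '-' needs no escaping outside [...]).
const alternatives = [
  /^20[0-9]{2}-[0-9]{2}-[0-9]{2}.[0-9]{3}-Pano/,
  /^20[0-9]{2}-[0-9]{2}-[0-9]{2}.[0-9]{3}_stitch/,
  /^20[0-9]{2}-[0-9]{2}-[0-9]{2}.[0-9]{3}--/,
];

// One regex: the shared prefix once, then one of the three suffixes.
const combined = /^20[0-9]{2}-[0-9]{2}-[0-9]{2}.[0-9]{3}(?:-Pano|_stitch|--)/;

// "A OR B OR C": true if any alternative matches.
function matchesAny(name) {
  return alternatives.some((re) => re.test(name));
}

function matchesCombined(name) {
  return combined.test(name);
}

// Hypothetical sample file names.
const samples = [
  "2019-04-28_123-Pano.jpg",
  "2019-04-28_123_stitch.tif",
  "2019-04-28_123--edit.jpg",
  "2019-04-28_123.jpg",
  "IMG_0001.jpg",
];

for (const name of samples) {
  console.log(name, matchesAny(name), matchesCombined(name));
}
```

For every sample, both functions agree: the first three names match, the last two do not.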

I've had a look at the code and IMatch iterates over all files in the scope (ids parameter) and applies the search term to each file.
So this should be linear, depending mostly on the number of files in ids.
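Mario's description (iterate over the files in the scope, apply the search term to each) can be modelled roughly as below. This is a hypothetical sketch, not IMatch code: the `nameById` map stands in for whatever in-memory index IMatch really uses, and `searchScope` is an invented name. It counts the regex tests performed, to show that the work depends on the number of ids in the scope rather than on the total number of files in the database.

```javascript
// Hypothetical model: nameById is an in-memory index (file id -> name).
// Only the ids in scope are visited; the rest of the database is ignored.
function searchScope(nameById, scopeIds, patterns) {
  const hits = [];
  let regexTests = 0;
  for (const id of scopeIds) {
    const name = nameById.get(id);
    if (name === undefined) continue; // id not in the database
    // "A OR B OR C": try each alternative, stop at the first match.
    for (const re of patterns) {
      regexTests++;
      if (re.test(name)) {
        hits.push(id);
        break;
      }
    }
  }
  return { hits, regexTests };
}

const panoPatterns = [
  /^20[0-9]{2}-[0-9]{2}-[0-9]{2}.[0-9]{3}-Pano/,
  /^20[0-9]{2}-[0-9]{2}-[0-9]{2}.[0-9]{3}_stitch/,
  /^20[0-9]{2}-[0-9]{2}-[0-9]{2}.[0-9]{3}--/,
];

// A large "database" of 200,000 made-up names, one of which matches.
const nameById = new Map();
for (let id = 1; id <= 200000; id++) nameById.set(id, `IMG_${id}.jpg`);
nameById.set(42, "2019-04-28_123-Pano.jpg");

// A scope of 300 ids, similar to the "few hundred" in the question.
const scopeIds = Array.from({ length: 300 }, (_, i) => i + 1);
const result = searchScope(nameById, scopeIds, panoPatterns);
console.log(result.hits, result.regexTests);
```

With 300 ids in scope, `regexTests` comes out at 898 (299 × 3 for the non-matching names, plus 1 for the match), whether the map holds 200,000 names or only 300.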
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

ubacher

I did a lot more tests, and it turns out that the search is not the slow part; it is actually lightning fast.
The problem seems to be locks.
I include a debug log.
I start by reading in 49 files in one directory and 9 in a sub-directory (two calls to IMWS.get('v1/files')).

IMatch seems to be updating the collections and runs into problems there. For example:
04.30 11:45:36+ 1172 [1965C] 02  I>           CIMCollection::InnerCalculate: Failed to get transaction or CS lock for 'Rw2 to Jpg'

'Rw2 to Jpg' is a file relation which is defined but not enabled! All disabled relations are affected.

Next there is, for example:
04.30 11:45:46+ 1172 [1965C] 02  I>           CIMCollection::InnerCalculate: Failed to get transaction or CS lock for 'Emma Bruckner'

Now, Emma Bruckner was once defined in a file annotation, but the keyword has since been removed!

I write markers into the log from my script. The lock problems happen between
A>         ====== Fix Pano starting
and
A>         ====== Fix Pano got files, now searching :  57
When
A>         ====== Fix Pano, found/fixed:  9
is reached, the script has finished.



Mario

I'm not sure what you are trying to explain to me.

Collections are updated for many reasons, from adding or removing files to accessing files, etc.
This is a secondary task that may fail when other parts of IMatch are busy and the database cannot be locked temporarily. IMatch will just retry a bit later.
However, messages of this kind should only appear sparingly, unless the database is super-busy or the disk is too slow to keep up.
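The "fail, then retry later" behaviour for secondary tasks is a common pattern. The sketch below shows the general idea with a non-blocking lock: a task that cannot get the lock logs a message and goes back on the queue instead of waiting. It is an illustration under assumptions only; `TryLock`, `runTick` and the log text are hypothetical and do not reflect how IMatch actually implements its collection updates.

```javascript
// Sketch of the "try the lock, otherwise retry later" pattern.
// Hypothetical names; this is not how IMatch is implemented.
class TryLock {
  constructor() {
    this.held = false;
  }
  tryAcquire() {
    if (this.held) return false; // never blocks
    this.held = true;
    return true;
  }
  release() {
    this.held = false;
  }
}

// Runs all queued secondary tasks once. A task that cannot get the lock
// is logged and re-queued for the next tick instead of waiting.
function runTick(lock, queue, log) {
  const pending = queue.splice(0, queue.length);
  for (const task of pending) {
    if (!lock.tryAcquire()) {
      log.push(`Could not lock for '${task.name}', retrying later`);
      queue.push(task);
      continue;
    }
    try {
      task.run();
    } finally {
      lock.release();
    }
  }
}

const lock = new TryLock();
const queue = [];
const log = [];
let updated = false;
queue.push({ name: "Emma Bruckner", run: () => { updated = true; } });

lock.tryAcquire();         // a foreground operation holds the lock
runTick(lock, queue, log); // update cannot run, stays queued
lock.release();            // foreground operation finishes
runTick(lock, queue, log); // retry succeeds
console.log(log.length, updated, queue.length); // 1 true 0
```

The design choice is that secondary work never blocks the foreground: a failed attempt costs one log line and a retry on the next tick.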

Your database has 330,000 files, which is a lot. Always keep that in mind. Not even IMatch can do magic.
What takes 1 second for a database with 100,000 files will take 3.3 seconds (or longer) for your database.
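The scaling argument is simple proportional arithmetic. A minimal sketch, assuming the cost grows linearly with the number of files (Mario points out the real cost can be higher); `estimateSeconds` is a hypothetical helper:

```javascript
// Back-of-the-envelope estimate, assuming cost grows linearly with file count.
function estimateSeconds(baseSeconds, baseFileCount, fileCount) {
  return baseSeconds * (fileCount / baseFileCount);
}

// 1 s for 100,000 files -> about 3.3 s for 330,000 files.
console.log(estimateSeconds(1, 100000, 330000).toFixed(1)); // 3.3
```

Treat the result as a lower bound: anything that is worse than linear will grow faster.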

Everything, from re-calculating data-driven categories to updating collections, takes time. There is no free lunch.
If too much is going on, the database will be unable to deliver the data fast enough and some secondary tasks will run into timeouts. This is expected and handled for things like collection updates or data-driven category updates. Collections and categories are updated only on demand (when the UI needs the data to display something).
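On-demand updating is commonly implemented as lazy recalculation: data is marked as stale when something changes and is only recomputed when someone reads it. The following hypothetical model (not IMatch code; `LazyCollection`, `getFileRecord` and the `collections` field name are invented for illustration) shows why a request that returns collection data can trigger a recalculation, while a request limited to a few plain fields does not. It mirrors the explanation ubacher arrives at later in the thread, which is his assumption rather than a confirmed detail of IMatch.

```javascript
// Hypothetical model of on-demand (lazy) recalculation; not IMatch code.
class LazyCollection {
  constructor(name, compute) {
    this.name = name;
    this.compute = compute; // expensive function returning member file ids
    this.dirty = true;      // nothing has been computed yet
    this.cached = [];
    this.recalcCount = 0;
  }

  // Called when files are added, removed or changed.
  invalidate() {
    this.dirty = true;
  }

  // Recalculates only if the data is stale AND someone asks for it.
  members() {
    if (this.dirty) {
      this.cached = this.compute();
      this.dirty = false;
      this.recalcCount++;
    }
    return this.cached;
  }
}

// Builds a record with only the requested fields. The collection is read
// (and possibly recalculated) only when "collections" is requested.
function getFileRecord(file, collection, fields) {
  const record = {};
  for (const field of fields) {
    record[field] =
      field === "collections" ? collection.members().includes(file.id) : file[field];
  }
  return record;
}

const coll = new LazyCollection("Rw2 to Jpg", () => [1, 2, 3]);
const file = { id: 2, filename: "C:\\Photos\\a.rw2", name: "a.rw2" };

getFileRecord(file, coll, ["id", "filename", "name"]);
console.log(coll.recalcCount); // 0: no collection data requested

getFileRecord(file, coll, ["id", "filename", "name", "collections"]);
console.log(coll.recalcCount); // 1: reading collection data forced a recalc
```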

Getting a bunch of files or doing a search does not trigger a collection update. If the search index is not yet built, doing the first search can make the database busy for a few hundred ms, maybe even a second or two for databases with 100,000 or 200,000 files.

But you are searching only for file names, and this is done using data already in-memory. No database access needed at all.

I don't know what else your app does or how you have configured the IMatch UI. But IMatch will be running while your app runs, and required screen updates may trigger data-driven category and collection updates. Usually this can run in the background, unless the UI needs the data right now ...

This is where I would start searching. Close some of your panels, especially very expensive panels like Categories. If you are in the Category / Collection view, switch to the Media & Folder view.

The log file only reports a few slow operations, like loading the database (normal for 330K files).
There is also a 'slow' log for the background task which updates the Collection tree (Collection View, Collection Tree in the Filter panel). It takes almost 5s to run. I guess this is pretty normal for a full recalc for 330,000 files.

I see one request for the "search filename" endpoint. It takes only 31 milliseconds (0.031 seconds!) to complete. This is pretty damn fast, so it cannot be the reason for your app not performing well.

-- Mario

ubacher

Your remark that collections are only updated on demand made me look at my code very carefully.
And I found the culprit this way. My call to get files from a folder was like this:
IMWS.get('v1/folders/files', {
    path: otherFolder
    // no 'fields' parameter, so all fields are returned
})


After I specified the fields, there was no longer a delay.
Quote
fields: 'id,filename,name,namene,ext,size'

So I assume that not specifying the fields caused all fields to be returned, and that this included info on collections.
This triggered the updates which caused the delay.
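A small wrapper can make it hard to forget the `fields` parameter. This is a hypothetical helper, not part of IMWS: `getFolderFiles` and `DEFAULT_FILE_FIELDS` are invented names, and the field list is the one from the post above. The client is passed in so the sketch runs with a stub; in an IMatch app you would presumably pass `IMWS`, assuming the `get(endpoint, params)` call shape used in this thread.

```javascript
// Field list from the post above; adjust to what the script actually needs.
const DEFAULT_FILE_FIELDS = "id,filename,name,namene,ext,size";

// Hypothetical helper (not part of IMWS). 'client' is any object with a
// get(endpoint, params) method, e.g. IMWS in an IMatch app (assumption).
function getFolderFiles(client, path, fields = DEFAULT_FILE_FIELDS) {
  if (!fields) {
    // Requesting no fields returns all fields, which is what we want to avoid.
    throw new Error("getFolderFiles: 'fields' must not be empty");
  }
  return client.get("v1/folders/files", { path: path, fields: fields });
}

// Stub client that records calls, so the sketch runs without IMatch.
const calls = [];
const stubClient = {
  get(endpoint, params) {
    calls.push({ endpoint, params });
    return Promise.resolve({});
  },
};

getFolderFiles(stubClient, "C:\\Photos\\2019"); // hypothetical folder path
console.log(calls[0].params.fields); // id,filename,name,namene,ext,size
```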

Problem solved! Thanks!

But you did not comment on the strange update attempts on disabled file relations. They may not cause any problems, but are they kosher?

The collections of names associated with annotations are probably OK; the names still exist in the annotations even if no keyword is associated with them. But can one access such a collection, or is it used only internally?

Mario

If you request no fields, all fields are returned.

Quote
Now, Emma Bruckner was once defined in a file annotation, but the keyword has since been removed!

Have you set @Keywords to remove keywords no longer in use? Otherwise @Keywords retains them so that you can use them again.

Even if the relation is not enabled, it still exists and the corresponding collection as well. Remove relations you don't need anymore.
-- Mario

ubacher

I have removed all keywords.
But when I look at the face annotation (of Emma Bruckner in this case) the name is still shown.
Is there then still a way to search for the name?
(I don't use face annotations at the moment; the existing ones are from a test I did years ago.
So this is not a concern of mine, although it might point to an unexplored issue.)

Mario

Quote
But when I look at the face annotation (of Emma Bruckner in this case) the name is still shown.

Of course. The tags you assign to annotations are unrelated to keywords.
You can set IMatch to copy the tags you assign also into keywords. But removing the keyword from the file or even deleting the keyword in @Keywords has no impact on face annotations.

To find all files with a specific face annotation tag, click on the corresponding collection in the Collection View.
Or use a collection filter.
Or use a formula-based category for the collection.
Or use a data-driven category based on the variable representing the collection
...
-- Mario

ubacher

Learned something! Thanks.

I already thought of a way to use this as an alternate method to mark files which need fixing: make an annotation to show the spot needing attention, then display the collection of all annotated files.

Mario

Quote from: ubacher on May 02, 2019, 07:51:55 AM
Learned something! Thanks.

I already thought of a way to use this as an alternate method to mark files which need fixing: make an annotation to show the spot needing attention, then display the collection of all annotated files.

This is how many users use annotations. Face annotations are just a small part of what the annotation system in IMatch is capable of. It's a powerful and generic way to add graphical and textual annotations to any file, in a non-destructive way. Usually you will find features like this only in high-priced, corporate-grade DAM systems.
-- Mario