Can you recommend a work-flow to split a database into two?

Started by Tallpics, March 20, 2015, 08:44:20 PM

Tallpics

I currently have a database made up of approx 330,000 files with 5,000 categories in 200 folders.

Everything is running very well when you consider how much work IMatch has to do to keep all the information and displays running in the background. Well done Mario.

However I noticed a thread the other day from a guy who was working with over 500,000 pics ( https://www.photools.com/community/index.php?topic=4283.msg28762#msg28762 ) and Mario made a justified comment that IMatch wasn't really designed for such a large collection of pics. That number of pics doesn't fall into the 'normal' user category.

This got me wondering about my future with IMatch. I am a Pro photographer and, due to the number of pics I shoot each year, I can envisage that I will approach 500,000 pics in a couple of years' time.

So that is the background. Now the question :-)

Can anyone recommend to me how I can split my existing database into two parts... so avoiding the 'over-sized' problem (for a while).

My work covers two distinct genres of photography, and I already keep the pictures for each genre in separate folders. This makes me think that it would be fairly simple to make the split into two new databases. But I have a question regarding the off-line cache.

Possible work-flow could be:

1. Duplicate and re-name my existing database.
2. Remove one half of the folders from this database.
3. Open the original database and remove the other folders.
4. I could then remove 'empty' categories from each database.
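
Step 1 can be sketched in Python. This is only a minimal illustration, assuming the database is a single file on disk and that IMatch is closed while it is copied; the function name and the file names are hypothetical.

```python
import shutil
from pathlib import Path


def duplicate_database(original: Path, copy_name: str) -> Path:
    """Copy a database file next to the original under a new name.

    Assumption: the database is one file and is not open in IMatch.
    """
    target = original.with_name(copy_name)
    if target.exists():
        # Never overwrite an existing database by accident.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(original, target)  # copy2 also keeps the file timestamps
    return target


# Example call (hypothetical file names):
# duplicate_database(Path(r"D:\IMatch\all-photos.db"), "gigs.db")
```

The copy still carries the original database ID, so the ID change discussed in the replies below remains necessary.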

However, I guess that removing folders from either database would also remove the cached files from the one original cache folder, as each database would still be associated with it... leaving it empty!

Would it be possible to duplicate and rename the original cache folder and somehow associate it with the new database before making any changes? That way the removal of folders would only affect the associated databases.

I am trying to avoid having to re-scan all the folders in the new database to rebuild the cache.

I have checked the detailed IMatch help files but (I hope I haven't missed something) I can't find any relevant info.

I hope this makes sense. I welcome any advice and comments.

Ferdinand

There's one additional step.  For the duplicated database, you need to change the database ID.  You do this in preferences on the database tab, at the bottom of the dialog.

The database ID is used to set the precise cache folder.  The top level cache folder is set under the cache tab in preferences, and I think the default is C:\ProgramData\photools.com\IMatch5\previewcache\ .  But under this folder there is a sub-folder holding the cache for each database.  So on this computer there are three sub-folders with names like 6E0C407C-D09B-4C60-B443-C02F8CD790F4.
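
To see which per-database sub-folders exist under the cache root, something like the following Python sketch could be used. It assumes that every database sub-folder is named after a GUID like the one above; the function name is hypothetical and nothing here is part of IMatch itself.

```python
import uuid
from pathlib import Path


def database_cache_folders(cache_root: Path) -> list[Path]:
    """Return the sub-folders of cache_root whose names parse as a GUID."""
    folders = []
    for entry in sorted(cache_root.iterdir()):
        if not entry.is_dir():
            continue
        try:
            uuid.UUID(entry.name)  # raises ValueError for non-GUID names
        except ValueError:
            continue
        folders.append(entry)
    return folders


# Example call, using the default root mentioned above:
# for folder in database_cache_folders(
#         Path(r"C:\ProgramData\photools.com\IMatch5\previewcache")):
#     print(folder.name)
```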

So when you change the database ID for the duplicated database, as you must do or IMatch will get very confused over time and use the wrong cache file, you create a new empty cache sub-folder.  So your fears are well founded.

I don't know whether there's scope to copy the contents of the cache folder for the original database to the cache folder for the new database, and then have IMatch clean it out, which I imagine would be faster than building a new cache folder from scratch.  Mario will have to advise on whether this is possible.  You'd need a lot of spare space to be able to do this.
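
The spare-space question can be checked before copying anything. A rough Python sketch, with hypothetical function names, that compares the size of the existing cache folder with the free space on the target drive:

```python
import shutil
from pathlib import Path


def folder_size(folder: Path) -> int:
    """Total size in bytes of all files below folder."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def has_room_for_copy(source: Path, destination: Path) -> bool:
    """True if the drive holding destination has room for a copy of source."""
    free_bytes = shutil.disk_usage(destination).free
    return free_bytes >= folder_size(source)
```

This only counts raw file sizes, so it is best to leave some headroom on top of the reported number.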

Mario

The step to generate a new unique database ID is very important. See also the help on that topic (Open Edit > Preferences > Database and then press <F1>).
The database ID is not only used to separate the cache folders, but also to maintain per-database settings.

There is no "clean cache folder" command that removes cache entries for files no longer in the database. This cleanup is usually automatic, because IMatch removes the cache files when it removes files from the database. I suggest this:

After duplicating your database and generating a new ID, close and re-open the new database. This will generate the cache folder under the root cache folder. You will see a folder structure like:

c:\
  -- imcache
    |-- 22D8588A-3D84-472F-8C53-34D68212E8D1
       |-- 01
       |-- 02
       |-- ...
    |-- F0BFA6B2-9844-46A3-A091-4A5594427B8F


Assuming that 22D8... is the original folder, select all sub-folders (01..) and copy them to F0BF...
This temporarily duplicates your cache.
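
The copy can be done in Explorer, or scripted. A minimal Python sketch of the same step, assuming both folders already exist and IMatch is closed; the function name is hypothetical and the example paths reuse the shortened folder names from above.

```python
import shutil
from pathlib import Path


def copy_cache_subfolders(original: Path, new: Path) -> int:
    """Copy every sub-folder (01, 02, ...) of original into new.

    Returns the number of sub-folders copied. The original is not modified.
    """
    copied = 0
    for sub in sorted(original.iterdir()):
        if sub.is_dir():
            # dirs_exist_ok lets the copy merge into folders that exist already
            shutil.copytree(sub, new / sub.name, dirs_exist_ok=True)
            copied += 1
    return copied


# Example call (replace the shortened names with the full folder names):
# copy_cache_subfolders(Path(r"c:\imcache\22D8..."), Path(r"c:\imcache\F0BF..."))
```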

Now open the original database in IMatch and remove the files you want to manage only in the second database. This will also remove the corresponding cache files from the 22D8 folder. Open the second database and remove the files you want to manage only in your first database. This also removes the cache files from the F0BF folder. You now have separate cache folders and each folder has only the files for the corresponding database.

Some notes about mass data and IMatch...

One of my local test users is a pro photographer who currently manages 350,000 files in IMatch 5. He has a fairly new computer and the database is kept on an SSD. He is more than satisfied with the performance he gets from IMatch.

I have a 160,000 files database with my own images taken over 20 years. It performs well. I first had it on a normal hard disk, now on a SSD I purchased for about 100€. My computer is almost 5 years old, but performance is excellent with that database. I use this computer also to tune IMatch performance - assuming that if it runs fast on an old PC it will run even faster on a modern computer  ;)

If you are a user who creates tons of formula-based categories, uses categories with (computationally expensive) @RegEx formulas, or needs to slice and dice the file collection using data-driven categories, you may run into performance issues much earlier. As you said in your original post, all the powerful features in IMatch come at a price, and when you utilize them all, even a modern computer may run into performance issues with databases managing 150,000 or 200,000 files.

There is no explicit upper limit for how many files you can manage in IMatch 5. But we have to stay sensible. If you look at the 'commercial' systems (often custom-built) used by larger photo agencies, news agencies or corporations, you'll often see dedicated server farms, dedicated database servers, massive storage arrays and probably even dedicated support staff. These systems can manage millions of images. But you won't get them for as little as US$110.

Unfortunately the user with the 500,000 files database has not yet provided a log file. I wonder if this is really an IMatch problem or if we can find something to change on his box to make it work. But without a log file and detailed information about how much memory IMatch uses, which operations are slow, I cannot provide any help to him.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

sinus

Thanks for your question.
I had the same thought as you when I read Mario's answer.

But here he gives us more details about the number of images IMatch can manage, and that makes me more relaxed ;)

At the moment I have about 180,000 files (nef, jpg, pdf, docs, mp3 and so on), and especially with the newest IMatch version I have no problems. IMatch is still quick and stable.

And I guess it is not only the number of files you manage in IMatch that grows; hardware generally gets faster too.
So I think you need not be afraid.

And if one day you want to split your DB, you will be able to do so. If you then run into trouble, we will all still be here  :) ... or most of us!  ;D
Best wishes from Switzerland! :-)
Markus

Tallpics

Thank you Mario, Ferdinand and Sinus for your replies.

Mario, can I just make it clear that IMatch5 is currently running VERY smoothly and fast for me with 330,000 files in multiple folders and 5,000 Categories. You have developed a great update with IMatch5. I have been a very long-term user of earlier versions but they now look really dated..... proof that you have really got things right up-to-date :-)

My only concern was that (in this digital age) the number of pics I take as a Pro Sport and Gig photographer amount to tens of thousands a year. This is unlikely to slow down for me meaning that in the future some limit might be hit in IMatch.

With the information you have now supplied I intend to make the split into two databases as suggested.

This is a small compromise for me to make as I rarely need to search for a pic that would be in another database and maybe there might even be a small gain in overall speed! It will also re-align me nearer to a 'normal' user database size.

For your information I already store the database on a SSD and have just ordered a new 1GB SSD to hold the Cache folders... things should continue to fly :-)

Please keep up the great work, Andrew

Mario

Storing the IMatch Database on an SSD is a real performance booster. The SSD should also contain the system TEMP folder, because many applications create, write, read and delete tons of temporary files.

The cache folder does not necessarily have to reside on an SSD. Loading the image from disk is a really fast process, even on a normal disk. What takes most of the "load time" for a cache image is resizing it to fit whatever size is needed and performing color management. All that is performed in memory, after the image has already been loaded from disk. Of course an SSD makes everything faster, including writing and reading the cache files. And seek times are near zero, which helps with everything.

330,000 files could already be considered a larger database. I have always wanted to ask some of the big DAM vendors, anonymously, what it would take (hardware and money) to manage 300,000 files with their product. Maybe I'll do that some day.

sinus

I am in the process of splitting my DB into two, as suggested above.

I can easily work with two DBs, because one will hold the older images, which I seldom use, plus some music files and so on.

The second (current) DB will hold the newest images, going back, say, 10 years.

I think both DBs should then be quicker and smaller.

I simply copied the DB, renamed one copy, opened it and changed the ID.
Then I set the cache folder to the ID-specific folder, like

C:\ProgramData\photools.com\IMatch5\previewcache\8581C204-B4E1-4A47-B1E3-BF3E1214DB87


(Switching between the DBs showed the same cache path in both. Although there are two sub-folders, the path set for both was
C:\ProgramData\photools.com\IMatch5\previewcache\
so I pointed each to its own sub-folder. I hope this is OK.)


Update: it seems to me that the "simple" path is the correct setting after all:

C:\ProgramData\photools.com\IMatch5\previewcache\

Otherwise IMatch creates a second ID folder inside the cache folder.
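
The behaviour described here can be pictured with a small Python sketch. It is only an illustration of what was observed in this thread, assuming IMatch always appends the database ID to whatever root is configured; the function name is hypothetical and this is not IMatch's actual code.

```python
from pathlib import PureWindowsPath


def effective_cache_folder(configured_root: str, database_id: str) -> PureWindowsPath:
    """Folder that would be used if the ID is always appended to the root."""
    return PureWindowsPath(configured_root) / database_id


DB_ID = "8581C204-B4E1-4A47-B1E3-BF3E1214DB87"
ROOT = "C:\\ProgramData\\photools.com\\IMatch5\\previewcache\\"

# Root only: one ID folder below the cache root.
print(effective_cache_folder(ROOT, DB_ID))

# Root that already contains the ID: the ID folder ends up nested in itself.
print(effective_cache_folder(ROOT + DB_ID, DB_ID))
```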



Now I am in the process of removing folders from the DB.
I hope I will not run into any problems.

Tallpics

Good luck Sinus with the database split.

It would be great if you can report back with an update of how things worked and any problems.

Thanks :-)

cthomas

Quote from: Mario on March 22, 2015, 09:08:39 AM
The SSD should also contain the system TEMP folder
Mario, my system is Windows 8.1 64-bit and I have 32 gigabytes of memory. So how would I move my system TEMP folder from drive C: to my SSD drive?
Carl

Montana, USA
The Big Sky State

Mario

http://answers.microsoft.com/en-us/windows/forum/windows_7-files/change-location-of-temp-files-folder-to-another/19f13330-dde1-404c-aa27-a76c0b450818?auth=1

Works the same on W8.
Of course your computer would benefit most from the SSD if it were the system drive (where Windows is installed, along with all your apps and the TEMP folder). I have changed all my PCs over time to use an SSD as the system drive, and the effect is usually dramatic. Windows boots in 10 seconds instead of a minute, for a start  ;)
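
Before and after changing the TEMP location, it can help to check where it currently points and whether that is on the SSD. A small Python sketch; the helper names and the example paths are hypothetical.

```python
import tempfile
from pathlib import PureWindowsPath


def current_temp_folder() -> str:
    """TEMP folder as seen by the current user session."""
    return tempfile.gettempdir()


def same_drive(path_a: str, path_b: str) -> bool:
    """True if two Windows paths are on the same drive letter."""
    drive_a = PureWindowsPath(path_a).drive.lower()
    drive_b = PureWindowsPath(path_b).drive.lower()
    return drive_a == drive_b


# Example: is TEMP on the same drive as a folder known to be on the SSD?
# print(current_temp_folder())
# print(same_drive(current_temp_folder(), r"S:\imcache"))
```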

Carlo Didier

Quote from: Mario on August 23, 2015, 08:01:42 AM
I have changed all my PC's over time to use a SSD as the system drive, and the effect is usually dramatic.
+1