Number of files in one folder

Started by JohnZeman, November 06, 2017, 03:08:57 AM

JohnZeman

Recently I have been making a lot of changes to the structure of my database, and tonight it occurred to me that moving all of my files (all JPGs) into one single folder would simplify my workflow significantly.  Right now I have just under 60,000 photos in my database.

I know that technically I can move all 60,000 photos into one folder with IMatch. My question is: from a performance standpoint, would that be a smart thing to do?

I doubt if my database will ever exceed 100,000 total.

Edit: I forgot to add my system is Windows 10 with NTFS drives.


Mario

I would advise against such a thing.
Windows still has issues with folders containing more than a few thousand files. You may notice big lags in Windows Explorer, for example.
Also, routines used by IMatch that work with the file system become slower the more files are in a folder.
IMatch would also load information about 60,000 files from the database whenever you click on that folder in your database. This can be slow, depending on your file window layout etc.

If you think this would simplify your workflow, at least add a second level of folders, maybe based on the file year or similar. A few thousand files per folder is OK.
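To illustrate the "second level of folders" idea, here is a minimal Python sketch. It is not an IMatch feature. It only computes a proposed grouping by year and does not move anything. The function name `plan_year_folders` is hypothetical, and using the file's modification time as the "file year" is an assumption; the capture date from the metadata may be the better choice.

```python
import os
from collections import defaultdict
from datetime import datetime


def plan_year_folders(folder):
    """Group the files directly inside `folder` by the year of their modification time.

    Returns a dict mapping a year (as a string) to the sorted file names for that year.
    Nothing is moved or renamed; this is only a plan.
    """
    plan = defaultdict(list)
    for entry in os.scandir(folder):
        # Skip subfolders, only plain files are grouped.
        if entry.is_file():
            # Assumption: modification time stands in for the "file year".
            year = datetime.fromtimestamp(entry.stat().st_mtime).year
            plan[str(year)].append(entry.name)
    return {year: sorted(names) for year, names in plan.items()}
```

The number of names per year shows quickly whether each subfolder would stay in the "few thousand files" range.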
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

JohnZeman

Thanks guys, and Mario I'm not surprised by what you said but I wanted to ask.

The reason I even considered this is because I'm always wary of the possibility of my adding a reprocessed photo to my database but putting it in the wrong folder which would create a virtual duplicate of the original.  I call it a virtual duplicate because it wouldn't be a true binary duplicate since the original had been reprocessed then added to the database with the intention of replacing the original.

So I'm trying to come up with the best way to check for duplicate file names throughout the database.  I know there are ways to do this, it's just a matter of using the best way.

I'm hoping the new @MetadataTag will solve the problem for me, because if I have identical file name images they'll also have virtually identical metadata.

Mario

I'm not sure how this new formula will help with this.

If you have copies of your files on your disk, but these copies are not binary identical (and thus cannot be found with the Dupe checker in IMatch), you will need to use a filter, the File Finder App or one of the Visual Query features.

If the metadata was written by IMatch (and not mangled by some other application), the files should have the same value for XMP::xmpMM\DocumentID\DocumentID\0.

If the file name is the same, the File Finder App may do the trick.

I have a "dupe finder" feature on my to-do list, which allows more control over what a dupe is (file name, certain metadata etc.) ... but there is always so little time and probably not many users for that. But as an app it should be fairly quick to write...  :D

JohnZeman

Thanks Mario.  The problem with using the File Finder app is I'd have to know which file name I'd want to search for duplicates.

What I'm looking for basically is a trigger that simply tells me there are one or more files in my database with duplicate file names.

I think I've come up with a non-IMatch way to do this.  I'm not that proficient at creating apps at this point, so I'm going to write a CMD script that scans my database folders and returns all file names that exist more than once in the database.

At that point it'll be easy for me to resolve the duplications in IMatch.
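A scan like the one described above could look roughly like this. It is a minimal sketch in Python rather than CMD, and not the script from this thread. The function name `find_duplicate_names` is hypothetical. Names are compared in lowercase on the assumption that the folders live on Windows/NTFS, where file names are normally case-insensitive.

```python
import os
from collections import defaultdict


def find_duplicate_names(roots):
    """Map each file name that occurs more than once to all of its full paths.

    `roots` is a list of top-level folders to scan recursively.
    """
    paths_by_name = defaultdict(list)
    for root in roots:
        for folder, _subfolders, files in os.walk(root):
            for name in files:
                # Assumption: case-insensitive comparison, as on Windows/NTFS.
                paths_by_name[name.lower()].append(os.path.join(folder, name))
    # Keep only the names that were seen in more than one place.
    return {
        name: sorted(paths)
        for name, paths in paths_by_name.items()
        if len(paths) > 1
    }
```

Calling `find_duplicate_names` with the list of database folders returns an empty dict when every file name is unique, which makes it usable as the simple "trigger" mentioned above.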

JohnZeman

Disregard my last post about using a CMD script because I've found a MUCH easier, better, and faster way to do what I want.

Directory Opus, the IMatch of the GUI file management world, has a tool that quickly scans one or more folders for duplicate file names.  I just tested it, it's fast, and it does exactly what I want. :)

Mario

Just to mention this: When you select all files in a folder (or multiple folders) and then run the File Finder App - Similarity Match with a value of 0 (= identical file names), it finds all files in the database that have the same file name as the one(s) you have selected.

JohnZeman

Thanks for that too Mario.  For now I'm going to use the other method I mentioned since it finds all duplicate file names in my 60,000 file database in about 1 second.

Mario

DO (Directory Opus) is a specialized app and they probably maintain an in-memory index for this. Great job.

The 2017.11.2 version of the File Finder app takes 8 seconds on my system to find all duplicate file names for 50,500 files (700 matches).
I've tuned the distance=0 case for the next release. 8 seconds for 50K files is OK for a general purpose app. I can live with that.

cthomas

Quote from: Mario on November 06, 2017, 05:08:46 PM

I have a "dupe finder" feature on my to-do list, which allows more control over what a dupe is (file name, certain metadata etc.) ... but there is always so little time and probably not many users for that. But as an app it should be fairly quick to write...  :D

Mario I would use it.  :)
Carl

Montana, USA
The Big Sky State

Mario

Quote from: cthomas on November 07, 2017, 04:57:45 PM
Quote from: Mario on November 06, 2017, 05:08:46 PM

I have a "dupe finder" feature on my to-do list, which allows more control over what a dupe is (file name, certain metadata etc.) ... but there is always so little time and probably not many users for that. But as an app it should be fairly quick to write...  :D

Mario I would use it.  :)

Feel free to post a request in the corresponding board.

Include information about how you would like the App to find duplicates, which file name elements, metadata tags or other attributes should be used to determine duplicate files etc.

Do you know the File Finder App? It may already do what you need.

ubacher

I have had a duplicate file finder script since IM3. It reads in all files in the db, sorts the file names
and then goes through them to see if two adjacent names are identical. Very primitive, but it works and is
reasonably fast.
The File Finder app with similarity=0  makes this script obsolete.
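The sort-and-compare approach described above can be sketched as follows. This is a minimal Python illustration, not the actual script. The function name `adjacent_duplicates` is hypothetical, reading the file list from the database is left out (the function simply takes a list of paths), and comparing names in lowercase is an assumption.

```python
import os


def adjacent_duplicates(paths):
    """Return groups of paths that share a file name.

    The paths are sorted by file name, so identical names end up next to
    each other and one pass over neighbouring entries finds all of them.
    """
    # Assumption: names are compared case-insensitively.
    entries = sorted((os.path.basename(p).lower(), p) for p in paths)
    groups = []
    for i in range(1, len(entries)):
        prev_name, prev_path = entries[i - 1]
        name, path = entries[i]
        if name == prev_name:
            if groups and groups[-1][-1] == prev_path:
                # The previous entry already belongs to a group: extend it.
                groups[-1].append(path)
            else:
                # First pair found for this name: start a new group.
                groups.append([prev_path, path])
    return groups
```

Sorting dominates the cost, so the whole check is O(n log n) in the number of files, which fits the "primitive but reasonably fast" description.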

DigPeter

Silly question: why is there a problem with duplicate file names?  A duplicate can only reside in a different folder to the original.

sinus

Quote from: DigPeter on November 07, 2017, 08:24:00 PM
Silly question: why is there a problem with duplicate file names?  A duplicate can only reside in a different folder to the original.

If you have hundreds of folders and you are not very strict, it easily happens that you add, copy or edit photos with the same name several times and then store them in different folders.

In the end you have several files with the same file name in different folders.
And it is not nice to know that you have hundreds of duplicate files that are not necessary.  ;D
Best wishes from Switzerland! :-)
Markus

ubacher

Quote: Why is there a problem with duplicate file names?

If you have the name of a file (may be a version) and you need to find the original then
it is easy to search for the name. If you only have a few files of the same name and you know what
the image looks like, then it is easy to find. If you have very many identically named files or you don't know
what the image looks like then it gets difficult.

On the other hand if you have a clear naming convention for files and directories then you can find
the file without needing a search. But such a structure calls for unique file names.
A date based hierarchical structure is the most logical and (I think) most
commonly used structure for organizing files. Works neatly if you have camera based files. For files from scanners
or other documents it gets difficult to find a good structure for storage.