Author Topic: Number of files in one folder  (Read 2188 times)

JohnZeman

  • Global Moderator
  • *****
  • Posts: 1451
  • I'm too damn old to act my age.
Number of files in one folder
« on: November 06, 2017, 03:08:57 AM »
Recently I have been making a lot of changes to the structure of my database and tonight it occurred to me that if I moved all of my files (all JPGs) into 1 single folder, that it would simplify my workflow significantly.  Right now I have just under 60,000 photos in my database.

I know that technically I can move all 60,000 photos into one folder with IMatch. My question is: from a performance standpoint, would that be a smart thing to do?

I doubt if my database will ever exceed 100,000 total.

Edit: I forgot to add my system is Windows 10 with NTFS drives.
« Last Edit: November 06, 2017, 03:34:24 AM by JohnZeman »

jch2103

  • Oldtimer
  • ****
  • Posts: 2243
Re: Number of files in one folder
« Reply #1 on: November 06, 2017, 04:48:27 AM »
John

Mario

  • IMatch Developer
  • Administrator
  • *****
  • Posts: 29760
Re: Number of files in one folder
« Reply #2 on: November 06, 2017, 09:04:09 AM »
I would advise against such a thing.
Windows still has issues with folders containing more than a few thousand files. You may notice big lags in Windows Explorer, for example.
Also, routines used by IMatch that work with the file system become slower the more files are in a folder.
IMatch would also load information about 60,000 files from the database whenever you click on that folder in your database. This can be slow, depending on your file window layout etc.

If you think this would simplify your workflow, at least add a second level of folders, maybe based on the file year or similar. A few thousand files per folder is OK.
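A per-year second level like the one suggested above could be planned roughly as follows. This is only a sketch: `year_subfolder` and `plan_moves` are hypothetical helper names, the year is taken from whatever timestamp you pass in, and nothing is moved on disk (it only computes destination paths).

```python
from datetime import datetime
from pathlib import Path


def year_subfolder(root: Path, file_path: Path, taken: datetime) -> Path:
    """Return the destination for file_path inside a per-year folder under root."""
    return root / str(taken.year) / file_path.name


def plan_moves(root: Path, files: dict[Path, datetime]) -> dict[Path, Path]:
    """Map each source file to its per-year destination without touching the disk."""
    return {src: year_subfolder(root, src, ts) for src, ts in files.items()}
```

With roughly 60,000 files spread over several years, a layout like this would typically keep each folder at a few thousand files, assuming the photos are spread fairly evenly across years.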

JohnZeman

  • Global Moderator
  • *****
  • Posts: 1451
  • I'm too damn old to act my age.
Re: Number of files in one folder
« Reply #3 on: November 06, 2017, 04:53:13 PM »
Thanks guys, and Mario I'm not surprised by what you said but I wanted to ask.

The reason I even considered this is because I'm always wary of the possibility of my adding a reprocessed photo to my database but putting it in the wrong folder which would create a virtual duplicate of the original.  I call it a virtual duplicate because it wouldn't be a true binary duplicate since the original had been reprocessed then added to the database with the intention of replacing the original.

So I'm trying to come up with the best way to check for duplicate file names throughout the database.  I know there are ways to do this, it's just a matter of using the best way.

I'm hoping the new @MetadataTag will solve the problem for me, because if I have identical file name images they'll also have virtually identical metadata.

Mario

  • IMatch Developer
  • Administrator
  • *****
  • Posts: 29760
Re: Number of files in one folder
« Reply #4 on: November 06, 2017, 05:08:46 PM »
I'm not sure how this new formula will help with this.

If you have copies of your files on your disk, but these copies are not binary identical (and thus cannot be found with the Dupe checker in IMatch), you will need to use a filter, the File Finder App or one of the Visual Query features.

If the metadata was written by IMatch (and not mangled by some other application), the files should have the same value for XMP::xmpMM\DocumentID\DocumentID\0.

If the file name is the same, the File Finder App may do the trick.

I have a "dupe finder" feature on my to do list, which allows more control over what a dupe is (file name, certain metadata etc.) ... but there is always so little time and probably not many users for that. But as an app it should be fairly quick to write...  :D

JohnZeman

  • Global Moderator
  • *****
  • Posts: 1451
  • I'm too damn old to act my age.
Re: Number of files in one folder
« Reply #5 on: November 06, 2017, 05:22:59 PM »
Thanks Mario.  The problem with using the File Finder app is I'd have to know which file name I'd want to search for duplicates.

What I'm looking for basically is a trigger that simply tells me there are one or more files in my database with duplicate file names.

I think I've come up with a non IMatch way to do this.  I'm not that proficient at creating apps at this point so I'm going to write a CMD script that scans my database folders and returns all file names that exist more than once in the database.

At that point it'll be easy for me to resolve the duplications in IMatch.
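A scan like the one described above could look roughly like this in Python rather than CMD. It is only a sketch, not the script actually used in this thread: `find_duplicate_names` is a hypothetical name, and it walks the folders on disk instead of querying the IMatch database.

```python
import os
from collections import defaultdict


def find_duplicate_names(roots):
    """Return {lowercased file name: [full paths]} for names found more than once."""
    seen = defaultdict(list)
    for root in roots:
        for folder, _subdirs, files in os.walk(root):
            for name in files:
                # Windows treats file names as case-insensitive by default,
                # so compare the lowercased name.
                seen[name.lower()].append(os.path.join(folder, name))
    return {name: paths for name, paths in seen.items() if len(paths) > 1}


if __name__ == "__main__":
    import sys

    # Usage: python find_dupes.py <folder> [<folder> ...]
    for name, paths in sorted(find_duplicate_names(sys.argv[1:]).items()):
        print(name)
        for path in paths:
            print("   " + path)
```

Each name is stored once with a list of its locations, so the whole scan is a single pass over the folder tree.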

JohnZeman

  • Global Moderator
  • *****
  • Posts: 1451
  • I'm too damn old to act my age.
Re: Number of files in one folder
« Reply #6 on: November 06, 2017, 05:47:43 PM »
Disregard my last post about using a CMD script because I've found a MUCH easier, better, and faster way to do what I want.

Directory Opus, the IMatch of the GUI file management world, has a tool that quickly scans one or more folders for duplicate file names.  I just tested it, it's fast, and it does exactly what I want. :)

Mario

  • IMatch Developer
  • Administrator
  • *****
  • Posts: 29760
Re: Number of files in one folder
« Reply #7 on: November 06, 2017, 06:04:46 PM »
Just to mention this: When you select all files in a folder (or multiple folders) and then run the File Finder App - Similarity Match with a value of 0 (= identical file names), it finds all files in the database with the same file names as the one(s) you have selected.

JohnZeman

  • Global Moderator
  • *****
  • Posts: 1451
  • I'm too damn old to act my age.
Re: Number of files in one folder
« Reply #8 on: November 06, 2017, 06:34:48 PM »
Thanks for that too Mario.  For now I'm going to use the other method I mentioned since it finds all duplicate file names in my 60,000 file database in about 1 second.

Mario

  • IMatch Developer
  • Administrator
  • *****
  • Posts: 29760
Re: Number of files in one folder
« Reply #9 on: November 06, 2017, 07:01:34 PM »
DO is a specialized app and they probably maintain an in-memory index for this. Great job.

The 2017.11.2 version of the File Finder app takes 8 seconds on my system to find all duplicate file names for 50,500 files (700 matches).
I've tuned the distance=0 case for the next release. 8 seconds for 50K files is OK for a general purpose app. I can live with that.

cthomas

  • Sr. Member
  • **
  • Posts: 472
Re: Number of files in one folder
« Reply #10 on: November 07, 2017, 04:57:45 PM »

Quote
I have a "dupe finder" feature on my to do list, which allows more control over what a dupe is (file name, certain metadata etc.) ... but there is always so little time and probably not many users for that. But as an app it should be fairly quick to write...  :D

Mario I would use it.  :)
Carl

Montana, USA
The Big Sky State

Mario

  • IMatch Developer
  • Administrator
  • *****
  • Posts: 29760
Re: Number of files in one folder
« Reply #11 on: November 07, 2017, 05:05:04 PM »

Quote
I have a "dupe finder" feature on my to do list, which allows more control over what a dupe is (file name, certain metadata etc.) ... but there is always so little time and probably not many users for that. But as an app it should be fairly quick to write...  :D

Mario I would use it.  :)

Feel free to post a request in the corresponding board.

Include information about how you would like the App to find duplicates, which file name elements, metadata tags or other attributes should be used to determine duplicate files etc.

Do you know the File Finder App? It may already do what you need.

ubacher

  • Oldtimer
  • ****
  • Posts: 2358
Re: Number of files in one folder
« Reply #12 on: November 07, 2017, 07:59:31 PM »
I have had a duplicate file finder script since IM3. It reads in all files in the db, sorts the file names
and then goes through them to see if two adjacent names are identical. Very primitive, but it works and is
reasonably fast.
The File Finder app with similarity=0  makes this script obsolete.
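The sort-and-compare approach described above can be sketched as follows. This is not the original IM3 script: `adjacent_duplicates` is a hypothetical name, and it takes a plain list of Windows-style paths instead of reading them from the database.

```python
from pathlib import PureWindowsPath


def name_key(path):
    """Case-insensitive file name used for sorting and comparing."""
    return PureWindowsPath(path).name.lower()


def adjacent_duplicates(paths):
    """Sort by file name, then collect runs of neighbours that share a name."""
    groups, run = [], []
    for path in sorted(paths, key=name_key):
        # A different name ends the current run; keep it only if it has 2+ files.
        if run and name_key(run[-1]) != name_key(path):
            if len(run) > 1:
                groups.append(run)
            run = []
        run.append(path)
    if len(run) > 1:
        groups.append(run)
    return groups
```

After sorting, identical names sit next to each other, so one linear pass is enough to find every group.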

DigPeter

  • Super Hero
  • ****
  • Posts: 1238
Re: Number of files in one folder
« Reply #13 on: November 07, 2017, 08:24:00 PM »
Silly question: why is there a problem with duplicate file names?  A duplicate can only reside in a different folder from the original.

sinus

  • Global Moderator
  • *****
  • Posts: 4464
  • IMatch-User since 2001 (IMatch 3.6)
Re: Number of files in one folder
« Reply #14 on: November 07, 2017, 09:00:47 PM »
Quote
Silly question: why is there a problem with duplicate file names?  A duplicate can only reside in a different folder from the original.

If you have hundreds of folders and you are not very strict, it is very easy to add, copy or edit photos with the same name several times and then store them in different folders.

Hence you end up with several files with the same file name in different folders.
And it is not nice to know that you have hundreds of duplicate files which are not necessary.  ;D
Best wishes from Switzerland! :-)
Markus

ubacher

  • Oldtimer
  • ****
  • Posts: 2358
Re: Number of files in one folder
« Reply #15 on: November 08, 2017, 05:39:35 PM »
Quote
Why is there a problem with duplicate file names?

If you have the name of a file (perhaps a version) and you need to find the original, then
it is easy to search for the name. If you only have a few files of the same name and you know what
the image looks like, then it is easy to find. If you have very many identically named files or you don't know
what the image looks like then it gets difficult.

On the other hand if you have a clear naming convention for files and directories then you can find
the file without needing a search. But such a structure calls for unique file names.
A date based hierarchical structure is the most logical and (I think) most
commonly used structure for organizing files. Works neatly if you have camera based files. For files from scanners
or other documents it gets difficult to find a good structure for storage.