Duplication of folders in database

Started by suttonbg, July 27, 2016, 11:22:27 AM


suttonbg

Dear Mario,

I seem to be locked in a loop where rescans of my database result in the same set of folders being duplicated in the database. At first, I didn't recognise that I had a problem, so the origins are not logged, but the latest situation is.

The origin is as follows. Part of my image archive is shown in FE.png (in the attached zip file). The fully qualified name of the top-level folder shown in Windows File Explorer is D:\Pictures\Catalog. I attempted to add the new folder "DeltaScanPolaroids", which had not previously been in the database, using drag and drop. The operation seemed successful and it was some time before I noticed that there were two entries in the database. The first was for "Catalog" as shown in File Explorer and the second was for another Catalog, with just one folder, a duplicate of DeltaScanPolaroids. Examination showed that these were not identical copies, as none of the iMatch attributes or categories in one instance were mirrored in the other. Rescanning the database did not improve the situation. At all times when I have run a database diagnosis (frequently during this), there have been no errors.

The next steps are unclear in memory as a certain sense of panic had set in. I decided to move the files in DeltaScanPolaroids to other folders on different drives and then remove this folder's instances from the database. Somewhere in there, I finished up with a second "geosetter test" as shown in "Two_geosetter.png". The File Explorer state remains unchanged through all of this.

At this point, I started logging and the logfile is attached.

I removed both the second geosetter test and Catalog folders from the database, leaving the situation shown in "After_Remove_Folder". Because Catalog now showed a need to rescan, I set that in motion manually. (I do have background processing switched on.) The results of the scan are shown in After_Rescan, which shows the original folder still apparently unscanned and a second complete copy added. It's not quite complete as any iMatch attributes are not attached to these "images".

At this point, I closed the database and thought I had better seek advice on how to get out of this mess.

Probably as a consequence of reconstructing a major folder of the database, the debug log file is quite big (59 MB), so the zipped file of images and log file is 2.5 MB. That seems to exceed the total attachment size allowed. Happy to send it through another path.

Looking forward to whatever assistance you can offer.

Bruce



sinus

Best wishes from Switzerland! :-)
Markus

Mario

Attachment missing.

I'm not sure that I can follow any detail of what you are saying. But whenever a folder shows up a second time in a database, the user has added it again, from another drive or location, instead of using the relocate command to tell IMatch that the original folder was moved.

IMatch tracks disks and folders by the unique disk serial number. This allows IMatch to uniquely identify each drive and to deal with drives that work with removable media (DVD, BR-D, Tape Libraries).

If you move your files to another disk, IMatch will mark the folder as off-line (yellow icon). All the info is still in the database, IMatch is just telling you that the folder is no longer physically accessible. To tell IMatch that you have moved the folder to another location or disk, you use the relocate command.

If you instead add the folder again to your database (by a rescan) IMatch will add all files again. For IMatch, these files are new because they come from another disk. And then you have your disks/folders twice in your database.

Solution: Revert to a database backup from before you started to do whatever you did. Open the database. IMatch will mark as off-line all drives / folders it can no longer find. If you have moved all files from an entire disk to another disk (even if the drive letter is the same!), use the Relocate command available in the right-click menu of the drive in the IMatch Media & Folders View. Relocate the old drive to the new drive. If the drive letter is the same (e.g., C:), relocate the drive to itself. IMatch will pick up the new drive serial number and bring all folders back on-line.

If you have moved individual folders to another disk or another folder, or renamed them, use the Relocate command available in the right-click menu of folders. It is sufficient to relocate only the topmost folders, because relocation is recursive and relocates all child folders automatically.
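The difference between adding a moved folder again and relocating it can be illustrated with a toy model. This is only a sketch, assuming folder entries are identified by drive serial number plus path as described above; `ToyDatabase`, `add_folder` and `relocate` are hypothetical names for illustration, not IMatch's API or data model.

```python
from dataclasses import dataclass, field


@dataclass
class FolderEntry:
    serial: str                                    # drive serial recorded for this folder
    path: str                                      # folder path on that drive
    categories: set = field(default_factory=set)   # metadata that lives only in the database


class ToyDatabase:
    """Hypothetical stand-in for a catalog; not IMatch's real data model."""

    def __init__(self):
        self.folders = []

    def add_folder(self, serial, path):
        # A folder counts as "known" only if serial AND path both match.
        for entry in self.folders:
            if (entry.serial, entry.path) == (serial, path):
                return entry
        entry = FolderEntry(serial, path)
        self.folders.append(entry)
        return entry

    def relocate(self, entry, new_serial, new_path):
        # Relocating updates the existing entry in place, so its metadata is kept.
        entry.serial = new_serial
        entry.path = new_path


db = ToyDatabase()
original = db.add_folder("AAAA-1111", r"D:\Pictures\Catalog")
original.categories.add("Family")

# Files moved to a new disk, then the folder is added again: a second entry appears.
readded = db.add_folder("BBBB-2222", r"D:\Pictures\Catalog")
print(len(db.folders), readded.categories)   # 2 set()

# Starting over and relocating instead keeps a single entry with its metadata.
db2 = ToyDatabase()
kept = db2.add_folder("AAAA-1111", r"D:\Pictures\Catalog")
kept.categories.add("Family")
db2.relocate(kept, "BBBB-2222", r"D:\Pictures\Catalog")
print(len(db2.folders), kept.categories)     # 1 {'Family'}
```

The serial numbers are made up. The point is only that a re-added folder starts as a fresh entry without categories, while a relocated folder keeps everything attached to the original entry.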

See also the Relocate help topic in the IMatch help system (search for relocate in the help index) to find the info.

-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

suttonbg

Dear Mario and Markus,

Thanks for the reply. Yes, there is no attachment. As I noted, it slightly exceeds the allowable size so I would be happy to send it via another route, if available.

Bruce

Mario

Break it up into two attachments.
Or upload it to your cloud space.

Or send to me by email (less than 10 MB). Include a link to this topic in your email.
No Word or other Office documents. Plain text files and screen shots in JPEG or PNG format.
Office documents are not allowed through my firewalls for security reasons.

Did you follow the advice I've given above?
-- Mario

suttonbg

Dear Mario,

1. I have sent the previous zipped attachment to "Support@...".

2. Re the Relocate command. After I upgraded to Windows 10 a few weeks ago, which involved a bit of fiddling with hard drives, iMatch displayed the yellow triangle alert that I needed to relocate folders and files on that drive, which I did and all worked well. In the current situation, no such icons have appeared and I have not physically changed the drives or relocated the files.

3. However, I have attempted to use relocate tonight as you suggested. Both "Catalog" level folders show exactly the same location, and it is correct. However, I tried the relocate command. For one folder, there was no effect. For the other, older version (I say that because this was the folder where previously added iMatch categories and attributes were present on the files), relocating it to itself worked, sort of. It has worked to the extent that I now have only one instance of the Catalog folder and its subfolders. It has not worked in the respect that each folder now contains a duplicate set of thumbnails and twice as many file counts as originally. The actual files as seen by File Explorer remain unique and untouched.

4. I have attached to this post the logfile for that operation.

5. Is there an automatic process for removing the duplicate entries in the database? It's only about 8000 images so I could do it manually as I edit them, but ....

Again, thanks for your help.

Bruce


Mario

Quote: It has not worked in the respect that each folder now contains a duplicate set of thumbnails and twice as many file counts as originally. The actual files as seen by File Explorer remain unique and untouched.

1. You had added that folder again to your database before you relocated.
2. You relocated the old folder to match the current folder / disk.

By applying these two steps you basically folded both folders together. There is only so much IMatch can anticipate and do to prevent a user from shooting himself in the foot - sorry.

The proper way to fix the problem:

1. Remove the folder you have accidentally added a second time to your database using the "Remove from Database" command.
(This is the folder without categories, Attributes and suchlike).

2. Relocate the original folder in your database to match the new location of your files.


Quote: 5. Is there an automatic process for removing the duplicate entries in the database? It's only about 8000 images so I could do it manually as I edit them, but ....

Not really. Why don't you just go back to your last daily backup of the database and fix it from there?
This is the preferred solution. Go back to the backup of your database from before you started with the operations that caused this mess. Then just relocate.

If you don't have daily backups of your database (you should!), you should be able to find the images you have added accidentally using the Recently Added collection. Does this work?

If not, or if you don't do backups (BAD), you can use the filter panel to filter for files without categories or maybe attributes or whatever allows you to tell the duplicate images from the 'real' files.
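The filtering idea can be sketched outside IMatch as well. The snippet below is a minimal illustration, assuming the file entries have been exported into simple records; the keys `id`, `path`, `categories` and `attributes` are hypothetical and do not correspond to any real IMatch export format. Inside IMatch the filter panel does this interactively.

```python
from collections import defaultdict


def find_probable_duplicates(records):
    """Return records that share a path with a tagged record but carry no metadata.

    `records` is a list of dicts with the hypothetical keys 'id', 'path',
    'categories' and 'attributes' (an assumed layout, for illustration only).
    """
    by_path = defaultdict(list)
    for rec in records:
        # Windows paths are case-insensitive, so compare them lower-cased.
        by_path[rec["path"].lower()].append(rec)

    suspects = []
    for group in by_path.values():
        if len(group) < 2:
            continue  # only one entry for this file: nothing to clean up
        tagged = [r for r in group if r["categories"] or r["attributes"]]
        bare = [r for r in group if not (r["categories"] or r["attributes"])]
        # Flag bare entries only when a tagged "real" entry exists for the same file.
        if tagged:
            suspects.extend(bare)
    return suspects


records = [
    {"id": 1, "path": r"D:\Pictures\Catalog\a.dng", "categories": ["Trips"], "attributes": {}},
    {"id": 2, "path": r"D:\Pictures\Catalog\a.dng", "categories": [], "attributes": {}},
    {"id": 3, "path": r"D:\Pictures\Catalog\b.dng", "categories": [], "attributes": {}},
]
print([r["id"] for r in find_probable_duplicates(records)])  # [2]
```

Entry 3 is left alone on purpose: a file that has only one entry and simply has no categories yet is not a duplicate.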

I'm not sure what else you did to the database (you mentioned you moved folders outside IMatch for some kind of damage control) and this may have made the problem worse. Using a suitable backup of the database and starting fresh is the best solution. You can then just relocate the folders to tell IMatch where they are now on your new disk / system. It should take barely a minute.

Log files will not help much in this case. Because all I can see is how IMatch adds thousands of files to a database. Perfectly normal, no problems. It's just that you should have used relocate instead of adding the folder again. IMatch just did what you've asked for.
-- Mario

suttonbg

Dear Mario,

I may have shot myself in the foot, but not in a way either of us would have imagined. I did a little more examination of the problem.

1. First, I followed your suggestion and went back to a P&G backup version that seemed stable, and relocated the offending folder to itself. This provided an apparently sound version of the database.

2. However, as I definitely had not relocated any folders, I was still puzzled as to the origin of the problem. I investigated it as follows.

2a. I reloaded a backup from about 3 weeks ago. Now, my Catalog folder is the main point of entry of images into my database, from scanning and downloading. Almost all editing is done through here and all file and folder moving, deleting, copying etc to final archive folders is managed by iMatch. So it's dynamic and as expected, loading an older version came up with obvious indications of some subfolders with yellow triangle icons, which genuinely had been relocated or removed, as well as some indicating a need for rescanning. However, the main Catalog folder showed no evidence of having been relocated, ie, no yellow triangle. I tested this hypothesis by assigning Photoshop (Ps) as a favourite application and calling it from within iMatch on selected files. As expected, selecting a thumbnail in a folder that indicated relocation produced no result, as obviously Ps could not find the file. Equally, selecting a thumbnail from a folder that appeared not to have been relocated resulted in that file opening in Ps and edits being made, which seemed to suggest that iMatch really did know where to find that file.

2b. There were some curiosities that slowly became apparent.
* While the edits were quickly visible when the file was examined in Bridge, no change appeared in the iMatch thumbnail, which was puzzling, as I mostly use the dng format so that a revised thumbnail is available for iMatch to use.
* I noticed that while I had been doing the editing, iMatch, in the background, had duplicated some of the folders on that drive, including Catalog, but also some others. Inspection showed that the new instances contained the changes that had occurred since the database had been saved. For instance, where some files had been marked as rejected, only the subfolders containing those changes were included in the new instance.
* Attempting a rescan of the original Catalog folder resulted in the new instance being completely populated with all folders and files actually present.

3. At this point, I began again by restoring this version of the database to its starting condition and allowing sufficient time for all folder duplication to occur. I store my images across three physical locations: two external RAID boxes (hardware-managed RAID 1) and an internal drive array on the machine running iMatch.
3a. I first examined the files on the external RAID drives by attempting to call and edit files both internally, opening Ps from within iMatch and externally via Bridge. In all cases, the files opened, edits were made and the saved edits were rapidly visible in both Bridge and iMatch. Vast relief at this point.
3b. Turning to files on the internal drives (called D:, where Catalog is one of the folders), I discovered that I could open for editing thumbnails from both the old and new instances of each folder. Edits could also be made externally through Bridge. However, the effect of the edits was visible only in the new instance of each folder. That is, it seemed as though iMatch could see the files, of which there was only one physical instance, through both old and new versions of the containing folder, but for some reason, it also felt that the folder was different and would only respond to changes through one of them. At no time were there any of the usual indications that it thought a relocation of the folder had taken place.

4. So, where did the pedal shooting occur? I now think it occurred when I upgraded to Windows 10. You might remember that some months ago I asked if anyone had experience with Storage Spaces in Windows 10. Well, in order to continue to use the slightly older but still very competent i7 cpu in the machine running iMatch, I decided to use Storage Spaces to manage the mirrored hard drives. The reason behind this decision is that the motherboard is just old enough for Intel to not offer a hardware RAID manager that will run under Windows 10. Storage Spaces seemed to be the viable alternative.

My research before I made this move suggested that, effectively, Storage Spaces virtualizes the disk storage, which allowed me, nay, encouraged me, to combine a mix of hard drives: SATA and SAS at the moment, with thoughts of including SSDs in the future. Storage Spaces also optimises where the data is physically stored in the drive pool, with more active files being optimized to the faster disks. I think this means that, in the background, Storage Spaces might be physically shifting files between drives, in a manner that is normally totally transparent to the user.

Mario, I wonder if this means that whatever data is used in iMatch to check file location is no longer fixed in the Storage Space environment, even though I, as user, am not commanding any file relocation? If so, it must be a funny signal, strong enough to trigger the automatic insertion of a new instance of changed folders but not triggering the usual yellow triangle.

I'm going to try a workaround. I managed to pick up a good QNAP box secondhand and will move the image contents of D: drive to that, in the expectation that it will provide the stable location environment iMatch expects. However, I suspect that Microsoft will slowly encourage us to use its new systems (Storage Spaces and then Resilient File System) more, so the problem, if I have guessed correctly, will not go away.

Aren't computers fun?

Best wishes,

Bruce

Mario

I have not worked with storage spaces yet. They are more designed for corporate environments. But I had no other feedback about any kind of problems so far.

But IMatch does not work around the Windows file system in any way. IMatch stores the media serial number reported by Windows for each drive and folder in your database. IMatch uses Windows functions to check if a file is accessible (FindFirstFile API function).

If the media serial number recorded for a disk/folder in your database does not match the media serial number reported by Windows when IMatch queries it, IMatch can only assume that the wrong media is in the drive, and then marks the drive / folder as off-line.
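For readers who want to see the values Windows reports for a drive, here is a minimal sketch. It calls the Win32 function `GetVolumeInformationW` through Python's `ctypes`; whether IMatch uses this particular function is an assumption, since the thread only says that it stores the serial number reported by Windows. `read_volume_info`, `format_serial` and `is_offline` are hypothetical helper names, and the comparison merely mirrors the rule described above.

```python
import ctypes
import sys


def format_serial(serial):
    """Format a 32-bit volume serial number the way `vol` shows it, e.g. 1234-ABCD."""
    return f"{(serial >> 16) & 0xFFFF:04X}-{serial & 0xFFFF:04X}"


def read_volume_info(root):
    """Return (label, serial) for a drive root such as 'D:\\', or None if unavailable."""
    if sys.platform != "win32":
        return None  # GetVolumeInformationW exists only on Windows
    label = ctypes.create_unicode_buffer(261)
    fs_name = ctypes.create_unicode_buffer(261)
    serial = ctypes.c_uint32(0)
    max_len = ctypes.c_uint32(0)
    flags = ctypes.c_uint32(0)
    ok = ctypes.windll.kernel32.GetVolumeInformationW(
        ctypes.c_wchar_p(root), label, len(label),
        ctypes.byref(serial), ctypes.byref(max_len), ctypes.byref(flags),
        fs_name, len(fs_name),
    )
    return (label.value, serial.value) if ok else None


def is_offline(recorded_serial, reported_serial):
    """Assumed rule: off-line when no volume is reported or the serial differs."""
    return reported_serial is None or reported_serial != recorded_serial


info = read_volume_info("D:\\")
if info is not None:
    print("label:", info[0], "serial:", format_serial(info[1]))
print(format_serial(0x1234ABCD))  # 1234-ABCD
```

Running this before and after a change to the drive shows directly whether the label, the serial number, or both are different from what was recorded earlier.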

This worked well for more than 10 years, with normal disks, external disks, UNC shares on network servers, drives mounted by NAS boxes via SAMBA, the various disk management and virtualization technologies used by Windows over time, large storage arrays used by commercial customers etc.

Unless you give me some specific details about how using storage spaces changes the volume serial number Windows presents to IMatch, I don't see a need to change anything. If you think that Windows moves files around behind your back, moving them to other hard disks so that IMatch can no longer find them, I would see that as a problem in Windows, not in IMatch.
-- Mario

suttonbg

Dear Mario,

Many thanks for your previous reply. I have done a little more digging since then and think I may have found the answer to my problems.

The problem does not appear to be with storage spaces per se but with my changing the name of the storage space drive with my images from the default "storage space" to the more meaningful "Data_Area".

I have checked that hypothesis by creating a new test database. Into it, I put a test folder that had been copied, with different names, to different drives. For the initial test, I chose a simple, un-mirrored, internal drive. Not a storage space drive. The test went as follows.

1. Create database and for selected files, note the File.Folder.Media.Label, File.Folder.Media.SerialNumber and File.LTID. Back up the database using Pack&Go.

2. Change the name of the drive. Open the database.

At this point, the database appeared as before. Inspection showed that neither the File.Folder.Media.Label nor the File.Folder.Media.SerialNumber had changed.

By calling Photoshop from within iMatch, I edited one of the files and saved the edited version to the same folder.

3. After a few minutes, the folder was duplicated by iMatch. The File.Folder.Media.SerialNumber was unchanged for all files in both duplicates, but the files in the new instance showed the changed drive label in their File.Folder.Media.Label value.

4. I restored the database again to its original status and, before opening it, edited a different file and saved this version to the folder. (The drive at this point has the changed label.) On opening, the database automatically duplicated the folder. Both instances again had the original File.Folder.Media.SerialNumber but different values for File.Folder.Media.Label.

5. I restored the database and changed the drive label back to its original value. The database opened as originally saved and did not duplicate the folder.

6. I repeated 5, but before opening the database, copied an additional file into the folder. The database opened as originally saved and, after a few minutes, included the new file in the original folder in the normal manner.

At no time did the database indicate the need to relocate the folder or drive.

However, the lesson I have drawn is that iMatch does pay attention to the value of File.Folder.Media.Label. The only way I have found to avoid folder duplication if the name of a drive is changed is to immediately, before any file changes occur, open iMatch and relocate the drive to itself, which updates the value for File.Folder.Media.Label.
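The behaviour observed in the test can be summarised with a toy model. This is only a sketch of the hypothesis, assuming that a folder instance is identified by serial number, label and path together; whether iMatch or some component it relies on actually works this way is not established here. `ToyCatalog` and its methods are hypothetical names, and the serial number, labels and path are made up.

```python
class ToyCatalog:
    """Hypothetical model: folder identity = (serial, label, path). Not iMatch code."""

    def __init__(self):
        self.folders = {}  # (serial, label, path) -> set of file names

    def scan(self, serial, label, path, files):
        # A changed label produces a new key, hence a new folder instance.
        self.folders.setdefault((serial, label, path), set()).update(files)

    def relocate_drive(self, serial, new_label):
        # Relocating a drive to itself refreshes the stored label for its folders.
        updated = {}
        for (s, label, path), files in self.folders.items():
            key = (s, new_label if s == serial else label, path)
            updated.setdefault(key, set()).update(files)
        self.folders = updated


SERIAL = "1A2B-3C4D"          # made-up serial number; it never changes in the test
FOLDER = r"E:\TestFolder"     # made-up path

# Steps 2-3: rename the drive, edit a file, and let the folder be scanned again.
renamed = ToyCatalog()
renamed.scan(SERIAL, "OldLabel", FOLDER, {"a.dng"})
renamed.scan(SERIAL, "NewLabel", FOLDER, {"a.dng"})
print(len(renamed.folders))    # 2

# Workaround: relocate the drive to itself before anything is rescanned.
relocated = ToyCatalog()
relocated.scan(SERIAL, "OldLabel", FOLDER, {"a.dng"})
relocated.relocate_drive(SERIAL, "NewLabel")
relocated.scan(SERIAL, "NewLabel", FOLDER, {"a.dng", "b.dng"})
print(len(relocated.folders))  # 1
```

In this model the serial number stays the same throughout, which matches the observation that no relocation warning ever appeared while the folder was still duplicated.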

Thanks again for your support.

Bruce

Mario

Don't change the media label or serial number. If you need to do this, use Relocate to tell IMatch where you have moved your files.
A quick check indicates that IMatch does not look at the label, but maybe there is a 3rd party library which does. Changing the name of a disk is a rare operation, and changing the serial number only happens (if at all) during a complete re-format.

In both cases, the Relocate command is the cure.
-- Mario