Author Topic: Categories to Hierarchical Keywords Script  (Read 9910 times)

Ferdinand

  • 100 years since I was shot and a war was started
  • Global Moderator
  • *****
  • Posts: 1670
Categories to Hierarchical Keywords Script
« on: July 11, 2013, 04:36:41 PM »
A common approach with IMatch 3.6 has been to store keywords in category hierarchies, including the so-called 5 Ws - Who, What, Where, When, Why - such as "Who.Family.Fred" and "Where.France.Paris".  We have used scripts to insert some or all parts of such hierarchies into IPTC Keywords.

I wrote this script to run in IMatch 3.6 to help us all migrate those categories that we want to treat as keywords in IMatch 5 to the correct metadata fields in advance.   I've just released the IMatch 3.6 version on the old scripting forum.  The script will also assist people who want their IMatch categories recognised in Adobe software like Lightroom.

 The version posted here is a direct port of the script to IMatch 5.  It provides one way to continue to maintain your keywords in categories, and still have them populated in your files, in ways that will be visible in @Keywords and the keywords panel.   There are a couple of things to note about this script:
  • The script triggers ExifTool externally to IMatch, and so will write to files - image files and / or XMP sidecars, even if immediate writeback is off;
  • For this reason, you will need to refresh the metadata in IMatch before you see the effect, which IMatch may do automatically or you may need to trigger manually depending on your preferences.
  • The option to create a text file for importing into the thesaurus is probably not necessary, as you can populate the thesaurus from existing keywords, but it may prove to be useful if you choose the exclude option.
Because this is a direct port, anyone interested can compare the two to see some of the changes necessary to migrate a script.  But if you do, note that the approach I've taken with settings is to store them in both the pts file and the registry, and to only read them from the registry if they don't exist in the pts file.  I.e.  the registry is used as a backup of the script settings in the pts file.
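
For anyone who finds this easier to follow as code, the settings approach described above might look roughly like the sketch below.  It is only a Python illustration, not the script itself: the function names are made up, and plain dictionaries stand in for the pts file and the registry.

```python
def load_setting(name, pts_settings, registry_settings, default=None):
    """Return a setting, preferring the pts file over the registry backup."""
    if name in pts_settings:
        return pts_settings[name]
    # Only consult the registry when the pts file has no value.
    if name in registry_settings:
        return registry_settings[name]
    return default


def save_setting(name, value, pts_settings, registry_settings):
    """Store a setting in both places so the registry acts as a backup."""
    pts_settings[name] = value
    registry_settings[name] = value


# Hypothetical setting name, used only for this example.
pts = {}
registry = {"exclude_levels": 1}
restored = load_setting("exclude_levels", pts, registry)
save_setting("exclude_levels", 2, pts, registry)
```

The point of the ordering is that a value in the pts file always wins, and the registry copy only matters when the pts value is missing.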




Please first TEST this script on test images and test databases.

The Location of ExifTool:  If you have iMatch V5 installed in Win7 then the default should work.  If you want to use a different copy of ExifTool, click the browse button and point to it.

Parent Categories for hierarchical keywords:  You click the "Add" button to select the parent categories for each category branch that contains keywords.  People using the 5-Ws would click on Who, What, Why etc.  You can check more than one at a time.  These categories will be written to the file as keywords.  The following options control how they are written.

Keyword Export Options:  Hopefully most of these should be obvious.  For V5 you should at least check the lr:hierarchicalSubject option, as iMatch V5 fills out the other fields from this one.  If you also check IPTC Keywords field 25 and XMP-dc:Subject it may make things faster in V5, but I'm not sure.

Write path elements as well as leaves to IPTC keywords field (25) and XMP dc:subject fields:  This does the same thing as "Preferences | Metadata | Keywords | Path Elements" in V5.  So for "Where.France.Paris" you will get Where and France and Paris as flat keywords, rather than just Paris, but you can control how many of these you get with the next option.

Treat the first {x} levels of path elements as 'Group' or 'Exclude' and exclude them from IPTC and XMP:  For flat keywords, this option leaves out the top {x} path elements or nodes if you select the previous option - "Write Path Elements... ".  So if x=1, it would leave out "Where" and just write "France" and "Paris" as flat keywords.  This option is good, for example, for people who use the 5-W categories, and want to see the W-words in @Keywords but do not want to export the W-words to flat keywords.  Using the next option you can also have these nodes left out of hierarchical keywords and marked as 'Group' in the thesaurus.

Mark these path elements as 'Group': This option does two things.  First it strips the first {x} levels from the hierarchy when writing hierarchical keywords to the image file.  So if x=1, it would leave out "Where" and just write "France|Paris" as a hierarchical keyword.  Second, it wraps [ ] around the first x levels of the hierarchy in the thesaurus, so that they will be treated as group levels by the thesaurus manager, which makes the thesaurus structure consistent with the keywords you exported to the file.  (It is possible to mark levels or nodes as 'Group' levels in V5 even after importing the thesaurus, even if you don't select this option, but that won't change the keywords in the image file.)

Mark these path elements as 'Exclude':  This is a milder and alternative version of 'Group' (i.e. you can't select both).  It still strips the first {x} levels from the hierarchy when writing flat keywords to the image file, but not from hierarchical keywords.  So if x=1, it would leave out "Where" and just write "France" and "Paris" as flat keywords, but still write 'Where|France|Paris' as a hierarchical one.  The thesaurus export file is not affected by this option, i.e. there are no [ ] around the excluded nodes.  When you import the thesaurus file into V5, you MUST manually edit the properties for the excluded nodes in the thesaurus manager, and turn on the 'Exclude' property before you use the thesaurus to write keywords to image files.  'Exclude' is the default option when you leave both 'Group' and 'Exclude' unchecked, and is only included for the sake of completeness.
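
To make the interaction between these options concrete, here is a small Python sketch of how one category could be turned into flat and hierarchical keywords under the rules described above.  It is an illustration only, not the script's actual code: the function name and parameters are hypothetical, and it assumes that category levels are separated by "." and hierarchical keyword levels by "|", as in the examples.

```python
def derive_keywords(category, levels=0, path_elements=True, mode="exclude"):
    """Return (flat_keywords, hierarchical_keyword) for one category.

    category      -- e.g. "Where.France.Paris"
    levels        -- the {x} top levels treated as 'Group' or 'Exclude'
    path_elements -- write path elements as well as the leaf to flat keywords
    mode          -- "group" or "exclude" (the two cannot be combined)
    """
    nodes = category.split(".")
    kept = nodes[levels:]  # nodes left after dropping the top {x} levels

    # Flat keywords: leaf only, or the path elements minus the top levels.
    flat = kept if path_elements else nodes[-1:]

    # Hierarchical keyword: 'Group' strips the top levels, 'Exclude' keeps them.
    hier_nodes = kept if mode == "group" else nodes
    return flat, "|".join(hier_nodes)


exclude_case = derive_keywords("Where.France.Paris", levels=1, mode="exclude")
group_case = derive_keywords("Where.France.Paris", levels=1, mode="group")
leaf_only = derive_keywords("Where.France.Paris", path_elements=False)
```

With x=1, the 'Exclude' case gives the flat keywords France and Paris plus the hierarchical keyword Where|France|Paris, while the 'Group' case gives the same flat keywords but France|Paris as the hierarchical keyword.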

For these (comma-separated) filetypes only write to XMP sidecars
For these (comma-separated) filetypes write IPTC to files and XMP to XMP sidecars
The default is for this script to write metadata to the image file.  These options allow you to vary this for certain file types.  The first one specifies that data will only be written to the XMP sidecar for listed file types.  The second one lists filetypes that will only have IPTC written to them, with the XMP written to an XMP sidecar.  If the image file is read-only, then data will only be written to the XMP sidecar, as IMatch 5 itself does.
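
The routing rules above can be sketched as follows.  This is a hedged Python illustration rather than the script's real logic: the function names are hypothetical, and it assumes that "only write to XMP sidecars" means no IPTC is written at all for those files.

```python
import os


def parse_types(text):
    """Turn a comma-separated list such as "NEF, cr2" into a set of extensions."""
    return {t.strip().lstrip(".").lower() for t in text.split(",") if t.strip()}


def metadata_targets(path, sidecar_only=(), iptc_in_file=(), read_only=False):
    """Decide where IPTC and XMP go: "file", "sidecar" or None (not written)."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if read_only or ext in sidecar_only:
        # Assumption: nothing is written into the image file in this case.
        return {"iptc": None, "xmp": "sidecar"}
    if ext in iptc_in_file:
        return {"iptc": "file", "xmp": "sidecar"}
    # Default: everything goes into the image file.
    return {"iptc": "file", "xmp": "file"}


sidecar_types = parse_types("NEF, cr2")
raw_targets = metadata_targets("IMG_0001.NEF", sidecar_only=sidecar_types)
jpg_targets = metadata_targets("IMG_0001.jpg", sidecar_only=sidecar_types)
```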

Category Thesaurus Output:  This option will produce a text file called "category_thesaurus_export.txt" in the nominated output folder.  If you don't specify a folder you won't get the file.  The way to use this file in V5 is to open the thesaurus manager in the keywords panel, click the import option from the toolbar, make sure that you're searching for .txt files and not .imths files (different format), browse to the output file and select it, then in the "import text file" dialog make sure that you import it into keywords (and not into another field), choose either merge or replace, and press ok.  To import the thesaurus before IMatch imports the new keywords, which is important for the Exclude case, I suggest you follow these steps:

0. Turn off background automatic indexing of new and updated files
1.  Run the script in IMatch 5
2.  Import the Thesaurus file into IMatch 5, as per the instructions above
3.  Manually mark the Excluded nodes as Exclude (I can't automate this, at least not until Mario releases some more scripting code.)
4.  Import or refresh the files.
5.  Re-enable  background automatic indexing of new and updated files, if you want it enabled by default.

Notes:

(a) Because the script runs images through ExifTool in batches, to speed things up, it will appear to be unresponsive for periods of time.  To understand why, suppose that you're running for 89 images. The script will run quickly for the first 50 while the parameters to feed to ExifTool are assembled in an args file, then become unresponsive while the parameters in the args file for the first 50 files are processed by ExifTool, then it will run quickly again while it accumulates the parameters for the remaining 39, then become unresponsive again while it runs the next 39 through ExifTool, and then it will finish.  There's nothing that I can do about the unresponsiveness, not if I want to speed things up by processing them in batches.  When IMatch 5 is writing metadata to images it is able to run ExifTool in the background, but I can't do that in a script, not even in IMatch 5. 
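
The batching described in note (a) can be sketched roughly as below.  This is a Python illustration, not the script's code: the helper names are hypothetical, the batch size of 50 is taken from the example above, and the exact arguments the script writes are an assumption.  The tag names and the -execute option are standard ExifTool, and an args file of this kind is normally passed with "exiftool -@ args.txt".

```python
def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def exiftool_args(path, flat, hierarchical):
    """Build the args-file lines for one image (one argument per line)."""
    args = [f"-IPTC:Keywords={k}" for k in flat]
    args += [f"-XMP-dc:Subject={k}" for k in flat]
    args.append(f"-XMP-lr:HierarchicalSubject={hierarchical}")
    # -execute tells ExifTool to run the command assembled so far.
    args += [path, "-execute"]
    return args


# 89 hypothetical image paths, as in the example above.
images = [f"img_{n:03d}.jpg" for n in range(89)]
batch_sizes = [len(batch) for batch in batches(images)]

sample_lines = exiftool_args("img_000.jpg", ["France", "Paris"], "Where|France|Paris")
```

Assembling the lines is the fast part; the unresponsive periods correspond to the single ExifTool run per batch, here one for 50 files and one for the remaining 39.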

[attachment deleted by admin]
« Last Edit: July 28, 2013, 02:25:02 PM by Ferdinand »

ubacher

  • Super Hero
  • ****
  • Posts: 1999
Re: Categories to Hierarchical Keywords Script
« Reply #1 on: September 16, 2013, 10:28:45 PM »
Would it not be easier to convert the (wanted) categories into keywords and let IM5 write it out to the file?
After all you would not want categories and keywords of the same name.
By writing the categories out directly they will get read in by IM5 afterwards and appear then as keywords.


Procedure as I see it:
Import your IM3 categories
Run a script which creates a keyword for each category (brute force i.e. all)
Then manually delete categories and keywords you don't want.

Alternative I just thought of and tried:
Assign all (keyword)categories to an imagefile in IM3 and write the keywords out to the file.
( I use All Purpose IPTC Writer)
Import the Imagefile into IM5 - all the Keywords should be read in and created.
Just tried this - did it in parts and ran into various weird problems but finally got my keywords defined in IM5.

Ferdinand

  • 100 years since I was shot and a war was started
  • Global Moderator
  • *****
  • Posts: 1670
Re: Categories to Hierarchical Keywords Script
« Reply #2 on: September 17, 2013, 08:22:41 AM »
I don't disagree with what you say.  You need to understand the history of this script.  I wrote the IMatch 3.6 version quite some time ago, much earlier in the beta test.  The idea was to allow people to clean up their categories and migrate them to keywords *before* migrating to IMatch 5.  It also had the side advantage that it could be used to make IMatch categories available to Lightroom as hierarchical keywords.

I expect that many people will do as you suggest - to remove those regular categories that they are migrating to keywords after the migration is complete.  But some people may prefer to continue to use categories to manage this information and only write it out as needed.  Yes, you will get duplicates, but they may not see this as a problem. 

So I ported the script to IMatch 5 so that people could perform the migration there if they so wished, rather than in IMatch 3.6, and could use it to write out categories to keywords on demand if that's their preferred workflow.

Sure, you can use other scripts such as APIW to perform the migration.  The advantage of this script is that it gets both the flat and hierarchical keywords filled according to your preferences (path elements, Group, Exclude), whereas the APIW mappings to do this can get quite complex.  Also, APIW won't write to XMP fields, only IPTC.  So at best you're going to fill out flat IPTC keywords.  What this script does is put the hierarchy in XMP:HierarchicalSubject, which is where IMatch 5 looks for and stores hierarchical keywords, and also puts in flat keywords (XMP & IPTC) those parts of the hierarchy that you want included there.

Of course if you only use flat keywords, then life is simpler and you may well find other migration approaches.

cytochrome

  • Hero Member
  • ***
  • Posts: 540
Re: Categories to Hierarchical Keywords Script
« Reply #3 on: September 17, 2013, 03:41:05 PM »
Ferdinand, I have not yet used your script, but will do so for some trials. For now I am happy with categories (normal, formula- or data-driven).

In the (very unlikely) case I would drop IMatch, will this be fit to copy complex category trees to sidecars (yes, for this I would bite the bullet and use sidecars 8) ), even data-driven cats (transformed to normal cats) that are real huge?

Is there no space limit to keyword content and depth of hierarchy?

Francis

Ferdinand

  • 100 years since I was shot and a war was started
  • Global Moderator
  • *****
  • Posts: 1670
Re: Categories to Hierarchical Keywords Script
« Reply #4 on: September 17, 2013, 05:42:45 PM »
Quote
In the (very unlikely) case I would drop IMatch, will this be fit to copy complex category trees to sidecars (yes, for this I would bite the bullet and use sidecars 8) ), even data-driven cats (transformed to normal cats) that are real huge?

A short answer is yes, I think.  Note the options:
For these (comma-separated) filetypes only write to XMP sidecars
For these (comma-separated) filetypes write IPTC to files and XMP to XMP sidecars
but I haven't really tested the "huge" case.

Quote
Is there no space limit to keyword content and depth of hierarchy?

To be honest, I don't know, but I don't know of one.  At least not in sidecars.  I think there is some kind of limit in image file headers.

p.s.  I know that you are not happy about the issue of where IMatch reads from and writes to and what it shows, but I still think that this reflects remaining bugs, which we are here to find, and migration issues.  I haven't seen a metadata issue so far which would concern me in production, i.e. after bugs are gone and migration is finished.  If you want to pursue this, perhaps it's best to do it elsewhere, rather than hijack this script thread.
« Last Edit: September 17, 2013, 05:47:16 PM by Ferdinand »

cytochrome

  • Hero Member
  • ***
  • Posts: 540
Re: Categories to Hierarchical Keywords Script
« Reply #5 on: September 17, 2013, 06:48:50 PM »
This has nothing to do with my likes or not, it is a question of workflow purely. I am a rather old guy, many of my pictures and cataloging are meant for friends, children, grand children, nephew and so on. Also people in the valley where I live now. They may or may not use IMatch, most probably not.

I write metadata to the image files. So this travels with the image, there are viewers, also free ones, capable of reading and displaying it.

The categories are IMatch only. The only way I see to transmit them is via keywords. Viewers with search capabilities could retrieve them. Not as Imatch can, but still, better than nothing. Hence my questions.

And thanks for your kind answer.

Francis

Ferdinand

  • 100 years since I was shot and a war was started
  • Global Moderator
  • *****
  • Posts: 1670
Re: Categories to Hierarchical Keywords Script
« Reply #6 on: September 18, 2013, 10:50:01 AM »
I understand and sympathise, particularly in relation to developed images such as JPG & TIFF.  Which is why I propagate.  But in relation to RAW files, anyone who uses them is most likely going to have to deal with sidecars of one kind or another, and so I don't see the harm in giving them XMP sidecars as well.

But, to get this thread back on-topic, this script can do both.

cytochrome

  • Hero Member
  • ***
  • Posts: 540
Re: Categories to Hierarchical Keywords Script
« Reply #7 on: September 18, 2013, 02:56:25 PM »
Ferdinand,

My question was simply whether the script (in fact the Tiff-EP file structure that Nef and most other raw formats use) could handle big amounts of data. I know xmp sidecars can do it since Adobe writes a lot in his private parts (mon dieu, English is so ambiguous) but I wondered about Tiff based files.

I will test and we'll see.

Francis

NeilR

  • New Members
  • *
  • Posts: 24
Re: Categories to Hierarchical Keywords Script
« Reply #8 on: March 09, 2014, 07:26:47 PM »
I've been thinking about using this script to help keep a Lightroom database's keywords in sync with my iMatch categories.

I have considered something like this for some time, driving categories down into hierarchical keywords XMP.  The seemingly insurmountable problem I run into is how to keep the file/sidecar XMP in sync when category assignments are done randomly.  I can't depend on remembering every image I update (with new/changed categories) nor do I want to have to run this script after each and every category change.  I need some way to tag an image that says the keywords need to be exported.

This issue was brought up by someone else in the old forum, and there it was suggested that it may be easier to do in iMatch 5.  I don't see anything new that would help keep track of images with changed categories. 

I guess it would need something like a "CategoryAssignmentChanged" event in the database script?  If such an event existed then I would just create a boolean image attribute and flip it TRUE when a category is assigned, or assignments change.  Then periodically filter on that attribute and run this script.  I would mod the script to flip the attribute FALSE after writing the XMP.
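
To show the idea, here is a rough Python sketch of that flag-and-export pattern.  Everything in it is hypothetical: there is no such event in IMatch as far as I know, and the class, the method names and the export callback are only stand-ins for the attribute and the script run.

```python
class SyncTracker:
    """Track which files need their keywords exported again."""

    def __init__(self):
        self.needs_export = set()

    def on_category_changed(self, file_id):
        # Stand-in for the hypothetical "CategoryAssignmentChanged" event:
        # flip the flag to TRUE for this file.
        self.needs_export.add(file_id)

    def export_pending(self, export):
        """Run `export` on each flagged file, then flip its flag back to FALSE."""
        done = []
        for file_id in sorted(self.needs_export):
            export(file_id)
            done.append(file_id)
        self.needs_export.difference_update(done)
        return done


tracker = SyncTracker()
tracker.on_category_changed(12)
tracker.on_category_changed(7)
tracker.on_category_changed(12)  # a second change to the same file

written = []
exported = tracker.export_pending(written.append)
```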

Any suggestions?

Mario

  • IMatch Developer
  • Administrator
  • *****
  • Posts: 22281
Re: Categories to Hierarchical Keywords Script
« Reply #9 on: March 09, 2014, 08:09:20 PM »
Quote
I've been thinking about using this script to help keep a Lightroom database's keywords in sync with my iMatch categories.

Lightroom writes hierarchical keywords into the XMP record, IMatch picks these up and automatically mirrors them in the @Keywords category hierarchy. No need to do this manually or with a script???!!!

NeilR

  • New Members
  • *
  • Posts: 24
Re: Categories to Hierarchical Keywords Script
« Reply #10 on: March 09, 2014, 09:05:49 PM »
Mario, I need to go the other way:  iMatch---> Lightroom

Since you probably can't do that, the path of least resistance here is to use @keyword categories and let iMatch do it that way.  So I'm mulling that over.  I'm just getting started on the keywords vs category decision and still investigating the implications for me.

Ferdinand

  • 100 years since I was shot and a war was started
  • Global Moderator
  • *****
  • Posts: 1670
Re: Categories to Hierarchical Keywords Script
« Reply #11 on: March 09, 2014, 11:05:05 PM »
Mario is right (as (almost) always).  My script was written some time ago as a migration tool, not really as something to be used to replicate categories as @keywords, in a workflow similar to one we used in V3.6.  You can use it this way if you like, although metadata templates are a more elegant way to write categories to @keywords. 

However it's not clear what you gain by doing so.  One problem is that the keyword will appear twice in the category list - once as a regular category and once as an @keyword.  Moreover, you'll lose one of the big advantages of categories - which is to be able to reorganise the hierarchy easily, i.e. without writing to files.  You're still going to have to write to files after such a reorganisation, in which case you may as well stick with @keywords for those keywords / categories that you want to have available in LR, once you've done the migration to @keywords.

Mario

  • IMatch Developer
  • Administrator
  • *****
  • Posts: 22281
Re: Categories to Hierarchical Keywords Script
« Reply #12 on: March 10, 2014, 07:46:38 AM »
The @Keywords category has been created to combine XMP hierarchical keywords and IMatch categories. This is the way to go if you want to use categories like keywords and you want to exchange that data seamlessly with other XMP applications which handle hierarchical keywords. Use it. Don't work against it.

Use "regular" categories when you need data-driven categories, formulas, aliases etc. Or when you want to categorize your images by means which should not be visible to other applications as keywords. For all else, use @Keywords. It's automatic and easy.

If you have used categories in IMatch 3 and then somehow stored these in your files as keywords or supplemental categories, import them once into IMatch 5 hierarchical keywords (see the options available for that purpose under Edit > Preferences > Metadata). From then on, use only the features and methods provided by IMatch 5 to work with hierarchical keywords. @Keywords will update automatically, you can use @Keywords and the Keyword panel to work with your keywords, and LR and other software will see your keywords automatically.

lanerellis

  • New Members
  • *
  • Posts: 30
Re: Categories to Hierarchical Keywords Script
« Reply #13 on: April 12, 2014, 05:07:19 AM »
Hi folks,

I'm moving from IMatch 3.6 to 5 and in 3.6 have used only categories, having never written them to IPTC in any way.

My hope has been that IMatch 5 would take my 3.6 categories and allow me to copy or somehow access them in the @Keyword system. Since so many IMatch 3.6 users have used only categories I've been confident that IMatch 5's database converter would pull over my categories, and that then IMatch 5 would let me choose whether I wish to use my imported 3.6 categories under the @Keyword system.

I'm running the IMatch 5 Database Converter right now and while it's converting my 90,000 files and 25,000 categories, I'm reading relevant forum threads (and help topics), and am hoping that I won't need to resort to running a special script such as Ferdinand's. I suppose I'll find out soon.

Surely 3.6 users who've used only categories, who convert to 5, must have a way conveniently built into 5 to take their existing categories and use them in the @Keywords system when wishing to start writing metadata to files?

I'm hoping that when the 3.6 to 5 database converter is complete I'll be able to activate my 25,000 categories under the @Keywords system. It would be quite a disappointment if I had to run a third-party script just to take my 3.6 categories and begin experimenting with using them under the @Keywords system to write metadata to my files -- the main reason I purchased IMatch a number of years ago. I've never written my 3.6 categories to file IPTC, and have anticipated doing it in IMatch 5. My fingers are crossed that all will be well and my concerns will be unfounded once my 3.6 database is converted; it's been roughly an hour so far and I expect that it will take up to several hours to complete.

Cheers,
Lane

Mario

  • IMatch Developer
  • Administrator
  • *****
  • Posts: 22281
Re: Categories to Hierarchical Keywords Script
« Reply #14 on: April 12, 2014, 09:10:23 AM »
The special @Keywords category mirrors the keywords existing in your files. If your files don't have keywords, @Keywords will be empty.

The database converter has no feature to copy all your categories as keywords into your files. I'm not sure that many users would want such a thing, and considering the amount of time required to do this, read-only or off-line files not available at the time of conversion and other factors, it would be a really complicated and error-prone task to do.

If you really want to transfer your 25,000 categories into XMP hierarchical keywords, a metadata template can most likely help with that. Whether and how best to tackle this depends on the circumstances, your category tree and other things.

Richard

  • Guest
Re: Categories to Hierarchical Keywords Script
« Reply #15 on: April 12, 2014, 12:33:07 PM »
Quote
I've never written my 3.6 categories to file IPTC, and have anticipated doing it in IMatch 5.
Hi Lane,

I would not recommend being in a rush to do so at this time. You are accustomed to working with the regular categories but will need to learn before you can make the best use of @Keyword categories. It is good that you have done a lot of reading but nothing beats doing to decide what is best for you.
IMO, a mix is probably what you will want. Not all one way or the other.

DigPeter

  • Super Hero
  • ****
  • Posts: 1102
Re: Categories to Hierarchical Keywords Script
« Reply #16 on: April 12, 2014, 12:39:45 PM »
Quote
I'm hoping that when the 3.6 to 5 database converter is complete I'll be able to activate my 25,000 categories under the @Keywords system. It would be quite a disappointment if I had to run a third-party script just to take my 3.6 categories and begin experimenting with using them under the @Keywords system to write metadata to my files -- the main reason I purchased IMatch a number of years ago. I've never written my 3.6 categories to file IPTC, and have anticipated doing it in IMatch 5. My fingers are crossed that all will be well and my concerns will be unfounded once my 3.6 database is converted; it's been roughly an hour so far and I expect that it will take up to several hours to complete.
Lane - I believe this thread, to which you have added your post, has the answer.  I understand that you want to use @Keywords in IM5 and that you presumably would not want these duplicated in regular categories.  Also that you only have categories in IM3 and that your files do not have any XMP keywords in metadata.  Ferdinand offers a script that in IM3 will: 1)  write your IM3 hierarchical categories to the XMP metadata in your files;  2) create a thesaurus which you can add to IM5.    This is what I have done.  It means that you would not wish to use Mario's excellent conversion facility, which transfers your IM3 categories to regular categories in IM5.  But before you do this, please read and carefully follow Ferdinand's instructions and decide how you want to use and display the keywords in IM5.  Your settings in metadata preferences in IM5 are important.  These and the Thesaurus should be set up in the new IM5 database, before ingesting the images.

Added after reading Richard's last Post   I agree with Richard.  Set up a trial database with a selection of your files and try different methods, before doing the whole thing.
« Last Edit: April 12, 2014, 12:44:22 PM by DigPeter »

Ferdinand

  • 100 years since I was shot and a war was started
  • Global Moderator
  • *****
  • Posts: 1670
Re: Categories to Hierarchical Keywords Script
« Reply #17 on: April 12, 2014, 01:51:47 PM »
Can I make a couple of general comments?

1.  It's worth taking some time to consider which categories you want to migrate to @keywords and which to leave as categories.  This is not a simple question.  Each has advantages and disadvantages.  It's not something you'd want to automate in the converter.  Everyone will want a different way of handling this.  I also agree with Richard that it's not something that I'd rush into.  I've been debating this for years.

2.  In IMatch 5 there's a lot less need for scripts for manipulating metadata.  You can do a lot with metadata templates.

That said, I still think that there are advantages in migrating categories to @keywords in advance of moving to IMatch 5.  One thing that motivated me to write the script was that if your keywords in the files are not perfectly aligned with your IMatch 5 metadata preferences, then there's a good chance that IMatch 5 will trigger a lot of metadata writeback immediately after you open the converted DB.  I.e., you need to have XMP:HierarchicalSubject, XMP:Subject and IPTC:Keywords populated in a way that is consistent with your XMP preferences.  If you're going to use Group or Exclude in the thesaurus you will also need this setup in advance to avoid triggering mass writebacks, which the script can help you with.

If you configure the script correctly, and if you configure the DB you use for the conversion correctly before you convert, then you can open the converted DB in IMatch 5 and not have any pending writebacks.  However I would say that this takes quite a bit of planning and testing to achieve. 

The alternative is to do the migration in IMatch 5 with metadata templates.  I haven't tried this, but I think it's going to be hard to avoid multiple rounds of writebacks.  Perhaps it's possible to set up a template that writes flat and hierarchical at the same time.  A lot of planning and testing will still be required.


lanerellis

  • New Members
  • *
  • Posts: 30
Re: Categories to Hierarchical Keywords Script
« Reply #18 on: April 13, 2014, 01:04:02 AM »
Thanks Mario, Richard, DigPeter, and Ferdinand for your excellent advice.

I'm going to reply in my "Should I write my category data to images in iMatch 5 or iMatch 3?" thread in the General Discussion area ( https://www.photools.com/community/index.php?topic=1593.0 ) so as not to post too much not directly related to Ferdinand's script thread here.

Cheers,
Lane

cytochrome

  • Hero Member
  • ***
  • Posts: 540
Re: Categories to Hierarchical Keywords Script
« Reply #19 on: August 22, 2016, 03:50:48 PM »
Ferdinand, I tried to use the script to copy my WHO and WHAT categories to a series of JPGs. It works fine with either WHO or WHAT but when I add WHAT only one category is selectable and only WHO is copied as keywords to the files.

Am I missing something, or is it because some IMatch internals have changed?

I can use a metadata template but it takes time because of the multiple writebacks that I don't know how to handle.

Francis

Ferdinand, sorry for this. In fact the script works, but it is slow and I was not patient enough :-* And the writeback problem is here too, although it seems to stem from an attempt to write Ratings (I don't use them).
« Last Edit: August 22, 2016, 04:18:39 PM by cytochrome »