Location Data Hierarchical Keywords

Started by jmhdassen, June 24, 2020, 08:04:40 AM


jmhdassen

I am in the process of migrating some 30,000 legacy images into IMatch. These images already have metadata, but it is not very consistent. I built myself a panel app where I can validate image metadata against what is in the Thesaurus, and that helps a lot.
However, location data is still problematic. For me, location data as 'country|state|city|location' is the most important metadata [this four-level location tuple does not work in most countries, but that is unfortunately how the standards were defined].

For every location 4-tuple I want to have a keyword and that is where I run into trouble.
Sometimes City (or State) is missing, for instance when on a boat offshore or along a road in the countryside. My 4-tuple then becomes, for example, 'country|state||Along Road 123'. This is also what I want my hierarchical keyword to look like.
In previous systems I could use '~' as an invisible character in the hierarchical keyword like this: 'country|state|~|Along Road 123'. This would translate to 'country|state||Along Road 123', keeping the levels in place.
But if I try that in IMatch the keyword becomes: 'country|state|Along Road 123', which for me is incorrect. It suggests a City called 'Along Road 123'.
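To make the difference concrete, here is a minimal Python sketch (illustrative only, not IMatch code; the function names are hypothetical) of two ways to build the keyword from the four location fields. Dropping empty fields before joining shifts the leaf up one level, while joining every field keeps the empty level as a double pipe.

```python
def join_dropping_empty(levels):
    """Join only non-empty levels: an empty City silently disappears."""
    return "|".join(level for level in levels if level)


def join_keeping_empty(levels):
    """Join every level: an empty (or None) City becomes an empty segment (||)."""
    return "|".join(level or "" for level in levels)


fields = ["country", "state", "", "Along Road 123"]
print(join_dropping_empty(fields))  # country|state|Along Road 123
print(join_keeping_empty(fields))   # country|state||Along Road 123
```

In the first variant the level of each segment is determined by its position after filtering, which is why 'Along Road 123' lands where the city should be.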
I tried using '_' or '.' as a placeholder for the missing level. That works, but looks very ugly.

Any way to get an invisible level in a hierarchy, without shifting the leaf? Maybe I am missing one of the many settings?

Thanks JD

 

Mario

Quote: In previous systems I could use '~' as an invisible character

I don't know what your previous system was or why it treated ~ as an invisible character, or for which purpose.
IMatch does not do that.
There are no 'invisible' keyword levels, except group levels in the Thesaurus. I don't recall that an invisible keyword feature was ever requested. It does not make much sense to me.

Hierarchical keywords are modeled as a graph or tree, and when you leave out one of the levels, the keyword ends up in a totally different part of the tree.

Location|beach|Daytona is a different keyword than Location|Daytona or beach|Daytona.
There can be no empty level, because there can be no empty keyword in any of the metadata standards.
Leaving out levels will also break the re-mapping of flat keywords to hierarchical keywords during import or re-import of files. Not good.

By leaving out one or more levels, you basically mess up your entire keyword hierarchy. Somehow 'suppressing/hiding' the information that a keyword is missing is not good, either.

You can always use a special "unk", "unspecified", "nn" or even "~"² in your keyword hierarchy to denote a missing level: Country|State|unspecified|road name.
This produces a well-formed keyword with all levels and also carries the information that one level of this keyword is unspecified or unknown.
This is also the established standard for dealing with missing information in many standard workflows.
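This suggestion can be expressed as a small Python sketch, assuming the location fields are available as a dictionary. The field names and the placeholder string are illustrative choices, not IMatch settings: every level is always emitted, and a missing one gets the placeholder instead of disappearing.

```python
PLACEHOLDER = "unspecified"  # any agreed marker works: "unk", "nn", "unknown", ...
LEVELS = ("country", "state", "city", "location")


def location_keyword(record, placeholder=PLACEHOLDER):
    """Build a four-level keyword, substituting the placeholder for missing or blank levels."""
    parts = []
    for level in LEVELS:
        value = (record.get(level) or "").strip()
        parts.append(value if value else placeholder)
    return "|".join(parts)


print(location_keyword({"country": "Country", "state": "State", "location": "road name"}))
# Country|State|unspecified|road name
```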

In the @Keywords Category hierarchy, which is dynamically created from the actual keywords in your files, all files with missing information in your hierarchical location keyword path will show up in the "unspecified" nodes and are thus easy to identify. And of course you can also easily filter or search for files with missing location data.

That all said: storing location data in keywords is usually a side effect of adding GPS-based location data to your files.
From your description I have the impression you add these keywords by hand?

Note that you can configure IMatch (see Geo & Maps) to automatically produce hierarchical keywords from the standard GPS/XMP location data it adds to your files.

Do your files have GPS coordinates?
Do you use the IMatch Map Panel to add GPS data to your files?
Do you use the integrated Reverse Geocoding in IMatch to automatically set location data from the GPS coordinates in your files?


Or, if your files already have location data (country, city, state, location, lat/lon) set by another application, you can use a Metadata Template in IMatch to automatically produce keywords from this information.

If you always shoot at the same location, creating a number of Locations with coordinates and country,state,city,location will improve your workflow and also may help to avoid missing segments in your keywords.
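The idea of predefined Locations can be sketched as a lookup that assigns the full four-level record of the nearest stored location, using the haversine great-circle distance. This is a hypothetical illustration of the concept, not how IMatch matches Locations; the preset name, its approximate coordinates and the 200 m radius are made-up example values.

```python
import math

# Hypothetical preset: name -> (lat, lon, country, state, city, location).
# Coordinates are approximate and only for illustration.
PRESETS = {
    "wtc": (40.7127, -74.0134, "United States", "New York", "New York", "One World Trade Center"),
}
FIELDS = ("country", "state", "city", "location")
EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_preset(lat, lon, presets=PRESETS, max_distance_m=200):
    """Return the location fields of the nearest preset within max_distance_m, else None."""
    best_dist, best_fields = None, None
    for plat, plon, *fields in presets.values():
        d = haversine_m(lat, lon, plat, plon)
        if d <= max_distance_m and (best_dist is None or d < best_dist):
            best_dist, best_fields = d, dict(zip(FIELDS, fields))
    return best_fields


print(match_preset(40.7128, -74.0135)["location"])  # One World Trade Center
print(match_preset(0.0, 0.0))  # None
```

Because every preset carries all four fields, a file matched this way never ends up with a missing segment.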


²
You could even use "~", but note that this character has a special meaning in many applications (in IMatch e.g. for variables, in regular expressions, etc.) and you should not use it in keywords at all.
For interoperability reasons, keywords should be restricted to plain words, without any special characters. Sometimes even regular characters from a non-ASCII code page cause problems, e.g. when you process files on platforms other than Windows (Linux). Not all software out there can deal with UTF-8-encoded keywords, and there are many issues, especially if you still need to use legacy IPTC metadata for exchange.
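A rough check for this interoperability concern could flag keyword segments that contain non-ASCII characters or special characters such as ~. The allowed character set below is an assumption chosen for illustration; adjust it to whatever your exchange partners actually accept.

```python
import re

# Assumed "safe" set: ASCII letters, digits, space, hyphen, apostrophe, period, comma.
SAFE_SEGMENT = re.compile(r"[A-Za-z0-9 \-'.,]+")


def risky_segments(keyword, sep="|"):
    """Return the segments of a hierarchical keyword that fall outside the safe set."""
    return [seg for seg in keyword.split(sep) if not SAFE_SEGMENT.fullmatch(seg)]


print(risky_segments("Spain|Galicia|Lugo|Viveiro|Playa de Covas"))  # []
print(risky_segments("country|state|~|location"))                  # ['~']
print(risky_segments("Côte d'Ivoire|Abidjan"))                     # ["Côte d'Ivoire"]
```

An empty segment (||) is reported as well, because the pattern requires at least one character.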
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

jmhdassen

Mario,

When running into another problem I discovered the Grouping flag in the hierarchical keywords. This caused the shift of the leaf.
So now I get 'country|state|~|location' as the keyword which is already a step closer and usable for what I need.

However I would still like to see (for locations at least) hierarchical location keywords such as country|state||location, allowing for a NIL value. NIL values happen all the time in programming. Using names like 'undefined' is just ugly.

My camera does not have GPS. I have used GPX files before, but although they give me a point on the map, they often do not give location names. All this seems to be designed for well-documented places in Europe and maybe the US. But I now have pictures from 35 countries, many in Africa and Asia. There, that stuff is not helpful (same for GeoNames).

Anyway, for the moment I am not stuck and can work with what there is.

Thanks, Jozef Dassen

Mario

Keywords are plain text. Heck, many popular applications still cannot handle hierarchical keywords or keywords containing spaces...
There can be no NIL/NULL or || segment in the keyword hierarchy. This has nothing to do with programming.
Such a keyword just does not exist, cannot be stored or become part of the keyword hierarchy. None of the relevant standards has a notion for keywords that "do not exist". And I pretty much doubt we really need such a thing.

Location data should go into the official location metadata tags in XMP. That's why they exist.
Adding them as keywords is redundant, but may aid some simpler applications or workflows.
Missing levels in a keyword hierarchy should be the exception, not something that happens regularly. And then using "unk" or "notset" or "unknown" or "?" to make it clear that this part of the keyword is, well, unknown, is the best way to keep well-formed keywords, your hierarchy intact and for clearly communicating what's what across applications and platforms.

Strange proprietary features like treating a ~ as a sort of invisible keyword and hiding it in the UI (?) but storing it in the file (?) are never a good thing when it comes to metadata, which should be standardized and compatible as much as possible.
-- Mario

JohnZeman

Perhaps a non-keywords solution might work?

I use a data-driven location category that accepts metadata null values, here is a simple example of the way it works.

Country: USA
State: Arizona
City:
Location: City Park

Displays in a tree format as
Country: USA
State: Arizona
City: Null Value
Location: City Park

As a bonus I also have a formula category called Incomplete Locations that quickly isolates any files with missing location metadata.
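The effect of such a data-driven category can be approximated in Python. Files are grouped into a nested tree keyed by the four location fields, an empty field is shown as a "Null Value" node instead of being dropped, and a second function plays the role of the Incomplete Locations formula. The file names and the dictionary layout are assumptions for the example; this is not how IMatch stores categories.

```python
LEVELS = ("country", "state", "city", "location")
NULL_LABEL = "Null Value"


def build_tree(files):
    """Nest file names under one node per level; empty or missing fields become NULL_LABEL."""
    tree = {}
    for name, md in files.items():
        node = tree
        for level in LEVELS:
            value = md.get(level) or NULL_LABEL
            node = node.setdefault(f"{level.title()}: {value}", {})
        node.setdefault("files", []).append(name)
    return tree


def incomplete_locations(files):
    """Names of files with at least one empty or missing location field."""
    return sorted(name for name, md in files.items() if not all(md.get(lvl) for lvl in LEVELS))


files = {
    "park.jpg": {"country": "USA", "state": "Arizona", "city": "", "location": "City Park"},
    "capitol.jpg": {"country": "USA", "state": "Arizona", "city": "Phoenix", "location": "State Capitol"},
}
tree = build_tree(files)
print(list(tree["Country: USA"]["State: Arizona"]))  # ['City: Null Value', 'City: Phoenix']
print(incomplete_locations(files))  # ['park.jpg']
```

The empty city keeps its own node at the city level, so "City Park" stays under Location rather than moving up.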

jch2103

I also use a non-keywords approach, with a data-driven category for locations.

Some years back when I was still using GeoSetter (IMatch didn't then have its useful map tools), I had GeoSetter generate keywords (not hierarchical ones, though). After thinking about this for a while and realizing that I hadn't been creating location keywords for more recent images, I've gone back and eliminated location keywords that were just cluttering up my keywords. (I also deleted the IMatch Standard Category for Location because it had only three levels, and set up my own four-level location category with Country|State|City|Location.) I now have a much simpler keywords structure and still have a very useful locations Category hierarchy. Another win for IMatch!
John

jmhdassen


Thanks for the various answers.
I have not looked at Categories much yet. I will explore that as an alternative.

Having said that, as an old computer scientist, I think that 'country|state||location' is a perfectly well-formed expression. The only question is how it is represented in the hierarchy. NIL is a value in most programming languages. It just means 'empty'. It is true that JavaScript handles this very badly with its 'undefined'.
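The NIL argument can be illustrated with a parser that treats the keyword string as a fixed four-slot record: an empty segment becomes None (Python's NIL), so every level keeps its position, and a string with the wrong number of segments is rejected. This is a sketch of the data model being argued for, with hypothetical names; it is not something IMatch supports, since keyword levels in the metadata standards cannot be empty.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    country: Optional[str]
    state: Optional[str]
    city: Optional[str]
    location: Optional[str]


def parse_location(keyword: str) -> Location:
    """Split on '|' keeping empty segments; map each empty segment to None."""
    parts = keyword.split("|")
    if len(parts) != len(Location._fields):
        raise ValueError(f"expected 4 levels, got {len(parts)}: {keyword!r}")
    return Location(*(p or None for p in parts))


def format_location(loc: Location) -> str:
    """Inverse of parse_location: None becomes an empty segment."""
    return "|".join(p or "" for p in loc)


print(parse_location("country|state||location"))
# Location(country='country', state='state', city=None, location='location')
```

The shifted form country|state|location is rejected outright here, which is exactly the ambiguity described in this thread.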

Cheers, JD

Tveloso

I read with interest the discussions in this topic on whether or not to also have Location info as Keywords. 

I myself have gone back and forth on whether or not to do that.  I was initially of the mind that Location Data should be stored only in the Metadata fields intended for that (i.e. in just one place), and that as Mario said:

Quote from: Mario on June 24, 2020, 10:53:03 AM
Location data should go into the official location metadata tags in XMP. That's why they exist.
Adding them as keywords is redundant, but may aid some simpler applications or workflows...

...but I later decided to also have the Keywords.

Perhaps in part because IMatch makes it so easy to do that (via the Create keywords from this expression parameter in Preferences->Geo & Maps, and/or the Keywords in the Edit Locations Dialog, and/or via a Metadata Template). But also because for some locations, I wanted to continue on with a Sublocation in the Keyword (such as a particular room, in the case of photos taken at home). So IMatch would automatically set the location data during indexing, and also automatically add the keyword:

        country|state|city|location

...and I would come along and change the keyword to:

        country|state|city|location|sub-location

...(where sub-location is say, kitchen).

But after having read the discussion here, I think I'm back to wanting to do away with the "Location Keywords", and to use a Data-Driven Category to give me essentially the same thing that branch of @keywords currently does (and keep only those "Sub-Locations" as Keywords).  The only question is where I'll put those Sub-Locations within my Thesaurus (and ultimately what those Keywords will be...perhaps part of indoor/outdoor?)

But back to the subject of the OP's question...I could have sworn that what was being asked for, is how IMatch used to behave...(i.e. that in the absence of a value for one of the location fields, IMatch would indeed return the double pipe to represent that NULL value).  So for an expression like:

        {File.MD.country}|{File.MD.state}|{File.MD.city}|{File.MD.location}

...where the City was not entered, IMatch would return:

        country|state||location

But when I checked, I found that it is in fact as the OP describes...(i.e. the empty node is simply omitted, and the leaf node gets shifted up):

        country|state|location

However, I found that if the variables are formatted with default (and the default value given is the NULL value itself, an empty string):

        {File.MD.country}|{File.MD.state|default:""}|{File.MD.city|default:""}|{File.MD.location|default:""}

...then you do get the Keyword the OP expected (i.e. this appears to "force" the variable to emit the NULL). For example, using the above expression in a Metadata Template:

       

...and applying that Template to a file that has this Location data:

       

...results in the Keyword:

        United States|New York||One World Trade Center
       

Hopefully this helps the OP...but for me, I think that I will go back to using the Location fields only (without repeating that data in Keywords), per the suggestions given here. 

Thank you everyone.

--Tony

jmhdassen

Thanks, Tony, for your interesting reply.
I had not yet looked at these Metadata Templates. IMatch is so full of features that 6 weeks of use is not enough to discover them all and evaluate their usefulness.

What I understand, however, is that this is about creating flat keywords from file metadata. So if one can create a flat keyword here with an empty node (double pipe), I find it all the more surprising that you cannot get the same flat keyword from the hierarchy tree.

I do not dispute with anyone that the location data ultimately should be in the metadata tags. And that is also where I put them. But it is a matter of workflow. I know some people think my workflow is too complex, but I have my reasons.
I use (and have been using for many years) the location hierarchy as a reference and starting point. It is where I "design" my location data. I have used it with other applications, but I find the IMatch hierarchy display especially useful since it is so clear and easy to use. I find it much easier to spot spelling and accent mistakes and transliteration problems in a tree. I have an app which then compares the location tags against this tree. So my workflow for location data is: Hierarchy tree -> XMP tags -> XMP file, which is then used by other applications on Linux. (Linux is, has been and will be my main platform. I grew up in the Unix world.)
[I know now that I can also get a hierarchy view using Categories, but that is after-the-fact, very useful as quality check.]
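The comparison step of this workflow (location tags checked against the reference hierarchy) might look roughly like the following. The actual app is not shown in the thread, so the tree layout, the None-for-empty convention and the function name are all assumptions; the Spain entry arbitrarily uses Galicia as the State level.

```python
# Hypothetical reference tree: nested dicts, one level per location field.
# None marks a level that is deliberately empty (no city, for example).
REFERENCE = {
    "Spain": {"Galicia": {"Viveiro": {"Playa de Covas": {}}}},
    "country": {"state": {None: {"Along Road 123": {}}}},
}
LEVELS = ("country", "state", "city", "location")


def check_against_tree(tags, tree=REFERENCE):
    """Return None if the tag path exists in the tree, else a message naming the first mismatch."""
    node = tree
    for level in LEVELS:
        value = tags.get(level) or None  # "" and missing both mean "empty level"
        if value not in node:
            return f"{level}: {value!r} not found under this path"
        node = node[value]
    return None


print(check_against_tree({"country": "Spain", "state": "Galicia",
                          "city": "Vivero", "location": "Playa de Covas"}))
# city: 'Vivero' not found under this path
```

A misspelling is reported at the first level where it diverges from the tree, which is the kind of check a tree view makes easy to reason about.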

One of my main problems with location data is the standard: country|state|city|loc is ludicrously inadequate. For many countries it does not work, and moreover the structure differs from country to country. Sometimes I would need Country|Region|Province|City|Loc, for others Country|Province|District|Subdistrict|City|Loc. So a compromise needs to be made, mostly in how to use the XMP tag for State. A tree makes this much easier to visualize and standardize (for instance, for "Spain|Galicia|Lugo|Viveiro|Playa de Covas": did I use Galicia or Lugo for XMP::photoshop\State last time? I can see that easily from the tree).
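One way to make that compromise explicit is a per-country rule that says which positions of a longer administrative path go into the four XMP location fields. The rule below is invented for illustration (for Spain it picks Galicia rather than Lugo for State); the point is that the choice is written down once and applied consistently.

```python
# Hypothetical per-country rules: indices into the full path for
# (country, state, city, location).
RULES = {
    "Spain": (0, 1, 3, 4),  # Country|Region|Province|City|Loc -> Region becomes State
}
DEFAULT_RULE = (0, 1, 2, 3)  # already country|state|city|loc
FIELDS = ("country", "state", "city", "location")


def to_xmp_fields(path):
    """Map a variable-depth location path onto the four XMP location fields."""
    parts = path.split("|")
    rule = RULES.get(parts[0], DEFAULT_RULE)
    values = [parts[i] if i < len(parts) else "" for i in rule]
    return dict(zip(FIELDS, values))


print(to_xmp_fields("Spain|Galicia|Lugo|Viveiro|Playa de Covas"))
# {'country': 'Spain', 'state': 'Galicia', 'city': 'Viveiro', 'location': 'Playa de Covas'}
```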
I recently "discovered" the Group level and Exclude flags in the hierarchy editor. Maybe I can use that to enter the full 5 or 6 part keyword and map to flat keywords. I will explore that once I finished my main migration of legacy images.
If I could then also have the hierarchy produce a flat keyword with an empty field (double pipe), it would be perfect. I do not care how the node is labeled in the hierarchy viewer (~, NIL, NULL, "", ()), as long as it produces an empty field in the flat keyword, represented by a double pipe. Consistency is very important. I do not want to see my City node shifted away, even if it is empty.

Thanks, JD



Tveloso

Quote from: jmhdassen on June 29, 2020, 12:31:48 AM
IMatch is so full of features that 6 weeks use is not enough to discover them all and evaluate their usefulness.
Yes indeed...IMatch is extremely feature rich.  I have no doubt that you'll be able to create a setup that will do exactly what you need...(there are usually multiple ways to accomplish something in IMatch).  You should definitely read over the Help Topics on Metadata Templates...they're invaluable.

Quote from: jmhdassen on June 29, 2020, 12:31:48 AM
What I understand however is that this is about creating flat keywords from File metadata. So if one can create a flat keyword here with empty node (double pipe), I find it all the more surprising that you can not get such same flat keyword from the Hierarchy tree.
I'm not sure that I'm understanding this...but that's not unusual.  I have been following posts here in the community for quite some time - to help guide my learning of IMatch.  There have been times when folks have asked a question that I did not quite understand, and it took reading Mario's response, for me to say "Ah, I see", and to then also explore the IMatch feature he suggested.

The primary message in my long-winded post was that you can in fact map the Location Tags into XMP::Lightroom\hierarchicalSubject, while retaining the "empty" levels in the resulting Hierarchical Keyword.

In fact I was excited that by formatting the individual variables that make up the expression, that could be achieved...and I thought you might even be able to have that happen automatically during indexing of your files.  But it occurs to me now that your files probably already contain those Keywords as you need them...(and perhaps that's where you're having the issue?).  This is where I may not be understanding completely...you say Flat Keywords, but (based on other posts I have read here) that's XMP::dc\subject and/or Composite\MWG::Keywords.  IMatch takes care of mapping the Hierarchical Keywords to those Flat Keywords.  And that's what the Exclude in flat keywords flag in the Thesaurus does.  For example, the Hierarchical Keyword:

        Country|United States|New York|New York|One World Trade Center

...results in these three Flat Keywords:

        United States;New York;One World Trade Center

...if the Level Country has its Exclude in flat keywords flag set.  And IMatch does not create duplicate Flat Keywords, so New York occurs only once there, even though it occurs twice in the Hierarchical Keyword.  If a Level has its Group Level flag set in the Thesaurus, then it doesn't get written to the file at all (and exists only in the Thesaurus).
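The flattening described here can be imitated in a few lines to show the logic: walk each hierarchical keyword, skip levels marked as excluded, and keep each remaining word only once, in first-seen order. This is a sketch of the described behavior, not IMatch's implementation; in particular the excluded set matches by label, whereas in IMatch the flag belongs to a specific Thesaurus node.

```python
def flatten(keywords, excluded=frozenset(), sep="|"):
    """Turn hierarchical keywords into flat keywords: drop excluded levels, de-duplicate."""
    flat = []
    for kw in keywords:
        for segment in kw.split(sep):
            if segment and segment not in excluded and segment not in flat:
                flat.append(segment)
    return flat


kw = "Country|United States|New York|New York|One World Trade Center"
print(";".join(flatten([kw], excluded={"Country"})))
# United States;New York;One World Trade Center
```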

Quote from: jmhdassen on June 29, 2020, 12:31:48 AM
One of my main problems with location data is the standard: country|state|city|loc is ludicrously inadequate. For many countries it is not working and moreover, it is different for different countries.
I have seen what you mean here...almost all of my photos are taken in the US, but I have a few from family members that were taken in Europe and China, and I have scratched my head a bit about how exactly to update the Location Tags for some of those.

Anyway, I'm sure that with the help of the very knowledgeable people here (and of course especially Mario), you'll be able to resolve whatever issues you may still have.  Good luck!

--Tony

jmhdassen


Yes I was probably misusing the term flat keyword. I did not mean the dc:Subjects. They are not useful to me.
What I meant by flat is something like: country|state|city|location or country|state||location as opposed to having the keywords tree hierarchy display.
If I knew how to interactively create empty nodes in the tree display that result in country|state||location, I would be happy.

JD