Metadata synchronization and codepage

Started by vasekk, July 13, 2021, 08:26:57 PM

vasekk

I have a problem with metadata. I use IMatch 2020.14.2 with 48,000 photos in the catalog, and about 2,000 of them have metadata problems. I do not use automatic writing of XMP files; instead I create the XMP files manually using Commands -> Metadata Write-back.

The first problem is with the encoding of Czech characters. If I enter Czech characters correctly for a photo, for example "Česká", then after updating the XMP file with the "Metadata Write-back" function the word appears incorrectly as "?eská" in the Metadata Panel.

The second problem is that even if I update the xmp file for photos using the "Metadata Write-back" function, after a while the photo is marked with the "pencil" icon again and iMatch requests the metadata update.

I tried manually deleting the XMP file, then using the "ExifTool Command Processor" to delete all metadata from the photo, and then writing the metadata to the photo again, but the problem recurs. Not all photos are affected, and for some of the affected ones the above procedure does solve the problem.

I have attached a log file and an XMP file for a photo that shows the problem. The NEF file is here: https://www.uschovna.cz/en/zasilka/NJGPHEVUIGDI2H8L-5CZ
Can you please advise me?

Mario

Do your files contain legacy IIM3 IPTC data or EXIF data? You attached only an XMP file, which does not help in this case.
You also did not state which file formats you use.

Check in the ECP. If your files have legacy IPTC data, I recommend removing it. Legacy IPTC data always causes problems with character set encoding.
I have explained this several dozen times here in the community over the past years.

The XMP file correctly shows (as far as I understand):

<Iptc4xmpExt:CountryName>Česká republika</Iptc4xmpExt:CountryName>
<photoshop:Country>Česká republika</photoshop:Country>


So the encoding is correct, right?

Quote: after a while the photo is marked with the "pencil" icon again and iMatch requests the metadata update.

Which tags are listed as pending write-back? Keywords?

Run the Metadata Analyst app on this file and use the GREEN BUTTON to copy the results into your reply.


-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

vasekk

Hi Mario, thank you for your help.
I use RAW (NEF) files, and the file in question is here: https://www.uschovna.cz/en/zasilka/NJGPHEVUIGDI2H8L-5CZ

Here is the Metadata Analyst output:
Metadata Analyst Results. Version 2020.14.2. 7/14/2021 9:37:18 PM
File analyzed: D:\Foto\2017\2017_10_07_Vlk\_VK01441.NEF
Errors: 2
Warnings: 4

Warning: [System] File has unwritten metadata (pending write-back). The metadata loaded from the image and the data in the database may not match.
Warning: [Legacy IPTC] Character Set Encoding: unspecified.
Warning: [XMP] [ExifIFD]:UserComment not mapped to [XMP-dc]:Description (sidecar).
Error: [Keywords] Different keywords in IPTC and XMP (sidecar).
Error: [Keywords] Flattened hierarchical XMP (sidecar) keywords don't match IPTC keywords.
Warning: [Detailed Validation] Non-standard format (undef) for IFD0 0x8649 PhotoshopSettings

Best Regards, Vaclav

Mario

The NEF file contains metadata in a legacy IPTC record, but no character encoding is specified.
By default, ExifTool then assumes that the text has been encoded in the UTF-8 character set, but it has not been.

I've tried all available character sets for legacy IPTC data (see https://exiftool.org/faq.html#Q10), which you can set in IMatch under Edit > Preferences > Metadata, but without success.

This is the reason why the imported IPTC data (after being mapped to XMP) shows up wrong in IMatch.
There seems to be no usable transformation to convert the data in these legacy tags into UTF-8 encoding used by XMP.

In the ExifTool Command Processor, you can try these out using:

-G1
-iptc:all
-a
-charset
filename=UTF8
-charset
iptc=latin1
{Files}


None of the supported character set encodings produced any usable text.
There is no encoding for Czech and there is no encoding for Czech for Windows command line either. Which code page do you use in your country?

Try this out on your PC, maybe your results differ. See the link to the ExifTool page above for a list of character sets to try.

This is a typical problem caused by legacy IPTC data in modern times. There is a reason IIM IPTC was deprecated 20 years ago and replaced by XMP, which has no such issues.
The IPTC in the file has no character encoding set. None of the character sets we can dial in for ExifTool works with this text.

At least I could not get it to work. But such problems were solved 20 years ago, and I don't have to deal with this legacy stuff anymore, except for commercial conversion projects maybe.
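
One possible explanation for why no character set helps, offered as an assumption and not as a diagnosis of this file: a literal "?" in place of "Č" is what a lossy conversion to a Latin-1-style code page produces, because Latin-1 contains "á" but not "Č". If that happened when the data was written, the original character is no longer stored in the file, and no decoder setting can bring it back. A minimal Python sketch of the effect, using only the standard codecs:

```python
# Sketch: how "Česká" can turn into "?eská" (illustrative, not IMatch/ExifTool code).
original = "Česká republika"

# Lossy write: Latin-1 has "á" but no "Č", so "Č" is replaced by a literal "?".
lossy_bytes = original.encode("latin-1", errors="replace")
lossy_text = lossy_bytes.decode("latin-1")

# Lossless write in the Czech Windows code page (CP1250), read back two ways.
cp1250_bytes = original.encode("cp1250")
read_as_cp1250 = cp1250_bytes.decode("cp1250")   # matching decoder: text is intact
read_as_latin1 = cp1250_bytes.decode("latin-1")  # wrong decoder: garbled, but no "?"

print(lossy_text)
print(read_as_cp1250)
print(read_as_latin1)
```

Text that was merely decoded with the wrong character set can be restored by choosing the right one; a stored "?" cannot, which would be consistent with having to correct the text by hand.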

Recommendation:

After importing the files:

1. Delete the legacy IPTC record with the unknown encoding with the ExifTool Command Processor.
Use the "Delete legacy IPTC data" preset.

2. Correct the wrong text in the Metadata Panel.
This works for any number of files, e.g. to fix the country name.

Or, use a Metadata Template or the Metadata Mechanic app to replace incorrect characters using a variable for the tag and the replace variable function.

3. Write back.
This creates a clean XMP sidecar for the NEF, with proper character set encoding.
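
The idea behind the replacement in step 2 can be sketched in Python. This is not IMatch variable syntax, and the helper name `repair` is made up for illustration. Because "?" may stand for several different Czech characters, the sketch maps whole known values (taken from this thread) instead of replacing single characters:

```python
# Known damaged values and their intended text (entries taken from this thread).
FIXES = {
    "?eská republika": "Česká republika",
    "Vyso?ina": "Vysočina",
}


def repair(value):
    """Return the corrected text for a known damaged value, else the value unchanged."""
    return FIXES.get(value, value)


print(repair("?eská republika"))
print(repair("Hlinsko"))
```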

vasekk

Hi Mario, thank you for your help.
In our country we use Latin2 (CP1250).

I tried the following:
- I manually deleted the XMP files in File Explorer
- I ran the ExifTool preset "Delete legacy IPTC (IIM) metadata"
- Now I see no metadata in the Metadata Panel (which is expected, since I deleted the XMP file)
- I entered the correct words (Česká republika, Vysočina)
- The Czech characters are displayed correctly
- I used "Metadata Write-back" and again see incorrect characters (?eská republika, Vyso?ina) :(

Is it possible to reset all settings, or is there any other way to fix this NEF file?

I tested your tip:
-G1
-iptc:all
-a
-charset filename=UTF8
-charset iptc=Latin2
{Files}

and I see this output:
Warning: Tag 'charset' is not defined
Warning: Tag 'charset' is not defined
[IPTC]          City                            : Hlinsko
[IPTC]          Province-State                  : Vyso?ina
[IPTC]          Country-Primary Location Name   : ?eská republika
[IPTC]          Application Record Version      : 4
[IPTC]          Date Created                    : 2017:10:07
[IPTC]          Time Created                    : 08:56:19+02:00
[IPTC]          Digital Creation Date           : 2017:10:07
[IPTC]          Digital Creation Time           : 08:56:19+02:00
[IPTC]          Country-Primary Location Code   : CZE
[IPTC]          Copyright Notice                :
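
A note on the two warnings in this output: ExifTool's argument-file format takes one argument per line, so an option and its value belong on separate lines. A line such as "-charset iptc=Latin2" is read as a single argument, which would explain "Tag 'charset' is not defined" and suggests the character set was never applied in this test. This assumes the ExifTool Command Processor hands its lines to ExifTool in that format. A small Python sketch (the helper name `to_arg_lines` is made up for illustration) that splits such lines:

```python
def to_arg_lines(lines):
    """Split 'option value' lines into one argument per line (hypothetical helper)."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Only options start with "-"; split off a value that follows whitespace.
        if line.startswith("-") and " " in line:
            option, value = line.split(None, 1)
            out.extend([option, value])
        else:
            out.append(line)
    return out


tried = ["-G1", "-iptc:all", "-a",
         "-charset filename=UTF8", "-charset iptc=Latin2", "{Files}"]
print("\n".join(to_arg_lines(tried)))
```

The printed list puts "-charset" and its value on consecutive lines, which is the layout to paste into the command processor.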

Mario

Sorry, I have explained all the options I know of for IMatch and ExifTool.
I don't know which software wrote that data and in which character set. None of the known character encodings can extract any usable data from these records.

vasekk

Hi Mario, thank you for your help!
I found a way out of this situation:
- copy the problem file (and xmp) to another (temp) folder using the file manager
- iMatch, use "Rescan" on folder (now the photo is no longer in iMatch)
- copy file back to original directory
- iMatch, use "Rescan" on folder (now the photo is back in iMatch)
- iMatch use "Delete legacy IPTC (IIM) metadata"


If I use the above procedure, the encoding is fine.

Thank you very much for your help.