Workflow for Scanned Images (from Scanner)

WebEngel · March 02, 2021, 09:41:57 PM

Hi guys,

there are many posts here about workflows for photographers. However, I haven't seen one for scanning images. Here is an attempt with lots of questions and open issues:

1) Scan as PNG or TIFF or Camera Raw

From the flatbed scanner, I would take PNG or TIFF because I want to keep the original without lossy compression. I guess 24-bit PNG is sufficient for most pics and offers more dynamic range than the prints can deliver, so TIFF does not seem to add value for me. For some pics, I may still go with 48-bit TIFF.

For silk pictures, I tend to use my camera because it generates fewer of the silk patterns, but I would follow the same workflow, just TIFF would be ARW.

2) File relations

I could generate one or two 24 bit PNG out of the 48 bit TIFF, most likely with DXO, focusing on white balance, contrast and dynamic range. I could then create a JPG from the PNG with whatever tool seems appropriate, focusing on the silk artifacts mainly.

Master would be PNG/TIFF, buddy and version would be Png or Jpg with the same filename (as long as there is only one derivative)

The Png (secondary) or Jpg (preferred) would be the visual proxy, so I would view the Jpg in the Imatch viewer

Q: If I have multiple Jpgs per TIFF with different processing, how would I indicate different versions and make sure the right one is the visual proxy

Q: I don't have a good idea of how to treat a second scan of the same picture and make sure the two are somehow connected. So I may have the same original picture scanned dry two months ago, and I scanned it again wet yesterday. And each of the scans may have several processing versions. Ultimately, the worse one should be deleted, but it will take me some time to find out which.

Q: Assume I split pictures (the original shows grandma and uncle, and I want one picture with the original content and two portraits of the two persons alone). Then only the full content would be the visual proxy. I fear this will create a complete mess. Anybody doing this already?

3) Rename

Q: What is a good naming schema? For pics from a camera, I use YYYYMMDD of date taken, because that is a constant. For the scanned image, the date is of course unknown. I can guess it, but I may later have to change it. Should I then also change the filename (which will then lead to both the old and new name being stored on the backup)? Or use a completely different filename?

I tend to use YYYYMM00-[Sequence] as a filename, with YYYYMM as the estimated month when the picture was taken
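
A minimal sketch of the YYYYMM00-[Sequence] scheme described above, in case it helps to generate names consistently. The function name, the three-digit zero-padded sequence and the use of 00 for an unknown month are assumptions for illustration, not something prescribed by IMatch.

```python
from typing import Optional


def scan_filename(year: int, month: Optional[int], sequence: int, ext: str) -> str:
    """Build a name like 19550300-001.tif.

    The day is always 00 (unknown); an unknown month is also written as 00.
    The three-digit sequence width is an assumption.
    """
    if month is not None and not 1 <= month <= 12:
        raise ValueError("month must be 1..12 or None")
    mm = f"{month:02d}" if month is not None else "00"
    return f"{year:04d}{mm}00-{sequence:03d}.{ext.lstrip('.')}"


print(scan_filename(1955, 3, 1, "tif"))      # estimated March 1955
print(scan_filename(1955, None, 12, ".png"))  # only the year is known
```

Because the estimated date is part of the name, a later correction of the estimate changes the name too, which is exactly the backup problem raised in the question above.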

4) Fix date manually, in the metadata browser

(was described here already: https://www.photools.com/community/index.php?topic=1129.msg6687#msg6687)

5) Fix orientation/rotation: In Imatch: Commands, Image, orientation

6) Metadata

I would maintain the metadata on the PNG/TIFF, I assume. And I assume, this would end up in XMP sidecar files for PNG and embedded data for TIFF and Jpg versions.

7) Categories and Face recognition:

Q: Run on the master file? What if the master shows 2 persons and there are 2 Jpgs out of it with the persons alone?

Martin

jch2103 · March 02, 2021, 10:34:17 PM

Good questions. I'm sure others will have better answers than I, but here are a few of my thoughts:

- Because you're using a scanner, you may want to consider using VueScan for your scanning software. It works with almost any scanner on the market and has a 'raw' mode that can help in optimizing output images.
- One key issue with scans is that they don't usually have original date/times (the date/time the original document/image was created). The same date/time issue applies if you're capturing images with a camera. You found my post on this issue. There are also discussions here in the Forum on how to deal with approximate dates. I use a relatively simple approach (e.g., approximate year = 1870:01:01; known year/month = 1870:07:01; etc.). There are more sophisticated approaches as well.
- By storing date/times in your metadata, you have more flexibility in choosing file names. You can of course also add additional relevant metadata that makes sense for your image collection (e.g., locations, original image format, etc.). I have to confess I'm inconsistent re file names; for family history photos, I often use the name of one of the people (e.g., Smith, Walter age 12.ext, etc.). It's easy to find/sort these if original data and face recognition data are in the image.
- For second (or third, etc.) scans of the same image, I assume some postfix naming scheme would make sense, e.g., [name]-2-[comment]. This would help group these images together (and if you add original date/time you have an additional grouping/sorting option).
- Regarding recognizing persons, you may find it easier to just use IMatch face recognition (recognizing all the relevant people in the image) for the original image, rather than creating two separate copies. Again, having the face recognition metadata should solve several problems.
- Metadata propagation. I assume you'll want to add metadata (faces, other tags) to the master image and propagate to versions. IMatch can certainly handle this with versioning, etc. You'll want to test to make sure it's working properly w/o complications. And you'll want to adjust your workflow to avoid unnecessary duplication for different versions.
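
The simple approximate-date convention from the list above (unknown month and day become 01) could be sketched like this. The function name is hypothetical, and the output is only a date string in the colon-separated style used in the post; writing it into the metadata is left to IMatch.

```python
from typing import Optional


def approximate_date(year: int, month: Optional[int] = None, day: Optional[int] = None) -> str:
    """Return a YYYY:MM:DD string, filling an unknown month or day with 01."""
    return f"{year:04d}:{(month or 1):02d}:{(day or 1):02d}"


print(approximate_date(1870))      # only the year is known
print(approximate_date(1870, 7))   # year and month are known
```

Note that a placeholder such as 1870:01:01 cannot be told apart from a real 1 January afterwards, so it may be worth recording the precision somewhere else as well, for example in a keyword.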

Let us know if you have questions/issues. Several folks here have dealt with these issues, even if our workflow isn't perfect.

Carlo Didier · March 02, 2021, 10:45:27 PM

Hmm, never seen wide use of PNG for photos. It is more aimed at web use (as a replacement for GIF).

sinus · March 03, 2021, 08:01:09 AM

Quote from: Carlo Didier on March 02, 2021, 10:45:27 PM
Hmm, never seen wide use of PNG for photos. It is more aimed at web use (as a replacement for GIF).

Yep, I agree.
I would go for TIFF if you think JPG is not good enough.

For my scanned pics I have basically the same workflow as for all photos.
I don't see a real difference, except for the date.

We have had some threads here about scanned images, but I don't have time to search for them now, sorry.

WebEngel · March 05, 2021, 09:10:46 PM

Thanks, jch2103

Quote from: jch2103 on March 02, 2021, 10:34:17 PM
- By storing date/times in your metadata, you have more flexibility in choosing file names.

I know that a date belongs in the metadata. It is just that I chose the date as the filename for digital pictures, so that pictures can be sorted by date on any device, and so that people to whom I send pictures, and who may have no idea what metadata is, still know when the pic was taken.

Quote from: jch2103 on March 02, 2021, 10:34:17 PM
- For second (or third, etc.) scans of the same image, I assume some postfix naming scheme would make sense, e.g., [name]-2-[comment]. This would help group these images together (and if you add original date/time you have an additional grouping/sorting option).

How exactly then are they grouped together? You mean some Imatch functionality? Because they are most likely in different folders.

Quote from: jch2103 on March 02, 2021, 10:34:17 PM
- Regarding recognizing persons, you may find it easier to just use IMatch face recognition (recognizing all the relevant people in the image) for the original image, rather than creating two separate copies.

Well these are two goals:
Face recognition -> make sure an image is associated to the person and can be found
Splitting an image into two halves -> Create a portrait style picture without distracting content or other persons

To avoid misunderstandings, I am not talking about two identical copies but a split!

Quote from: jch2103 on March 02, 2021, 10:34:17 PM
Again, having the face recognition metadata should solve several problems.

I don't think Face Recognition makes sense for scans. There are so few images per person and per decade that AI does not add any value--there is more training needed.

Regards
martin

WebEngel · March 05, 2021, 09:14:04 PM

Thanks sinus

Quote from: sinus on March 03, 2021, 08:01:09 AM
For my scanned pics I have basically the same workflow, like for all photos.
I see not a real difference, except the date.

There are some differences though:

Date taken:
is exact for digital, is an estimate for scan
is automatic for digital, manual for scan
is constant for digital, and may change for scan

Instances:
one instance for digital (well unless your cam outputs versions, like for HDR, but this is rare I assume).
Multiple for scans

"cutouts/clippings":
Make hardly any sense for digital. If you want a dedicated headshot, you shoot it
Make sense for scans, as you cannot repeat the shots. So 1 group picture plus several portraits from one scan

File name: Digital: can be from date taken. Scan: If naming schema contains date, file name may change, not a good idea.

That's why it is different for me.

Quote from: sinus on March 03, 2021, 08:01:09 AM
We have had here some threads about scanned images, but have not time to search now, sorry.

I did search the forum for scanning, and all my findings went into my first post. Unfortunately most hits were for folder scanning.

Mario · March 06, 2021, 08:38:21 AM

Quote:
How exactly then are they grouped together? You mean some Imatch functionality? Because they are most likely in different folders.

Do you know about File Relations: Versioning? This allows you to define file relations, making one file the master and other files versions. This automatically groups files together, even if they are in different folders.

Quote:
I don't think Face Recognition makes sense for scans. There are so few images per person and per decade that AI does not add any value--there is more training needed.

This depends. Even if you manually add face annotations, you will benefit from all 'People' features in IMatch, from searching to sorting to the People View.

WebEngel · March 06, 2021, 09:19:13 PM

Quote from: Mario on March 06, 2021, 08:38:21 AM
Quote:
How exactly then are they grouped together? You mean some Imatch functionality? Because they are most likely in different folders.

Do you know about File Relations: Versioning? This allows you to define file relations, making one file the master and other files versions. This automatically groups files together, even if they are in different folders.

I use file relations for my digital pictures, where the Jpg is a version of the Raw. Great feature!

How would you suggest using it for scans? Again, the scenario is as follows: For one paper picture from 1955, I could end up with:
19550000_001_dry.tif (a dry scan)
19550000_001_dry.png (lossless bitmap from above scan, with contrast and wb fixed)
19550000_001_dry.jpg (final jpg with fast Fourier to remove the silk pattern)
19550000_001_photo.raw (photographed)
19550000_001_photo.jpg (jpg from digital photo above)
19550000_001_wet.tif (a wet scan)
19550000_001_wet.jpg (jpg from above scan)
19550000_001_wet_Paul.jpg (excerpt from above, showing just Paul)

Creating the above files is already a challenge, as I don't know the file name 19550000_001 when I do the wet scan, as I did the dry scan two years earlier. Do I search for visually similar files in Imatch?

Once I have the files with these names, which one is the version, which the master, and how can I view all of them side by side? Are there three masters (the first digital version being the master, so we have 3 of them), or is there just one master for the 8 files?
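
To make the "one master or three" question concrete, here is a small sketch that groups the eight example names either by picture (one stack) or by capture (dry, photo, wet: three stacks). It only parses file names under the assumption that they follow DATE_SEQ_CAPTURE[_EXTRA].ext; the helper names are hypothetical and this is not how IMatch itself builds version stacks.

```python
from collections import defaultdict
from pathlib import PurePath

FILES = [
    "19550000_001_dry.tif", "19550000_001_dry.png", "19550000_001_dry.jpg",
    "19550000_001_photo.raw", "19550000_001_photo.jpg",
    "19550000_001_wet.tif", "19550000_001_wet.jpg", "19550000_001_wet_Paul.jpg",
]


def split_name(filename):
    """Split 'DATE_SEQ_CAPTURE[_EXTRA].ext' into (root, capture)."""
    date, seq, capture, *_extra = PurePath(filename).stem.split("_")
    return f"{date}_{seq}", capture


def group(files, per_capture):
    """Group by picture root, or by root plus capture method."""
    stacks = defaultdict(list)
    for name in files:
        root, capture = split_name(name)
        key = f"{root}_{capture}" if per_capture else root
        stacks[key].append(name)
    return dict(stacks)


one_stack = group(FILES, per_capture=False)
three_stacks = group(FILES, per_capture=True)
print(len(one_stack), len(three_stacks))
for key, members in three_stacks.items():
    print(key, members)
```

Either grouping is mechanically possible from these names; which one is more useful depends on whether the dry scan, the wet scan and the photograph should be compared side by side or managed separately.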

By the way, what is a recommended meta data field to indicate processing (i.e. "picture scanned dry on Epson scanner, white balance and contrast done in DXO")

Martin

ubacher · March 07, 2021, 09:21:30 AM

Defining relationships:

You could add _MASTER to your master file. All other files with the same root (w/o _MASTER)
can be specified as versions.

Look at Preferences->File Relations and study how you can define the relationships.
You can remove the _MASTER with the replacement relation. This leaves the file root.
Any files starting with this root and having additional (or specific) characters added can then be versions.
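
The idea above can be illustrated with plain regular expressions: strip _MASTER to get the root, then treat every other file that starts with that root as a version. This is only a Python sketch of the logic with made-up file names; the pattern and function names are assumptions, and the actual expressions have to be entered and tested in Preferences->File Relations.

```python
import re

# A master is "<root>_MASTER.<ext>"; the named group captures the root.
MASTER_RE = re.compile(r"^(?P<root>.+)_MASTER\.[^.]+$")


def find_versions(master, candidates):
    """Return the candidates that share the master's root (without _MASTER)."""
    m = MASTER_RE.match(master)
    if m is None:
        raise ValueError(f"not a master file name: {master}")
    root = m.group("root")
    # Root, optionally followed by "_something", then the extension.
    version_re = re.compile(re.escape(root) + r"(?:_[^.]*)?\.[^.]+$")
    return [
        c for c in candidates
        if c != master and "_MASTER" not in c and version_re.match(c)
    ]


files = [
    "19550000_001_MASTER.tif",
    "19550000_001_dry.jpg",
    "19550000_001_wet_Paul.jpg",
    "19550000_002_dry.jpg",   # different picture, must not match
]
print(find_versions(files[0], files))
```

Requiring an underscore or the extension directly after the root keeps a file such as 19550000_0010_dry.jpg from being picked up as a version of picture 001.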

Mario · March 07, 2021, 11:32:07 AM

Quote:
Do I search for visually similar files in Imatch?

See Finding Duplicate Files

Quote:
which the master, and how can I view all of them side by side?

There can be only one master. You decide which file you consider a master and make all the other files versions.
A master and all its versions form a version stack. You can see the files in the Version Panel, and you can open all the files in the stack in a result window at any time.

Quote:
By the way, what is a recommended meta data field to indicate processing (i.e. "picture scanned dry on Epson scanner, white balance and contrast done in DXO")

There is none. Pick one. Description comes to mind. If you really need to preserve the information which settings you applied.
I would also put Epson and DxO as keywords, which makes filtering easy.

loweskid · March 07, 2021, 01:38:35 PM

Quote from: WebEngel on March 06, 2021, 09:19:13 PM
By the way, what is a recommended meta data field to indicate processing (i.e. "picture scanned dry on Epson scanner, white balance and contrast done in DXO")

Do you really need to put that in the metadata? I use attributes and/or annotations for that sort of 'extra' information.

jch2103 · March 07, 2021, 06:36:23 PM

Quote from: Mario on March 07, 2021, 11:32:07 AM
Quote:
By the way, what is a recommended meta data field to indicate processing (i.e. "picture scanned dry on Epson scanner, white balance and contrast done in DXO")

There is none. Pick one. Description comes to mind. If you really need to preserve the information which settings you applied.
I would also put Epson and DxO as keywords, which makes filtering easy.

There is a 'software' metadata tag (EXIF::Main\305\Software\0) which some software programs use to show that they've created the file. However, it won't do what you want, as there can only be one entry; if you use successive programs to work on the image, the programs will overwrite the prior value.

As Mario says, keywords may be what you want to use in your case. Or, as loweskid said, if you don't want this information in your image metadata you can use attributes and/or annotations.

Mario · March 07, 2021, 06:45:11 PM

Tip: Avoid updating EXIF tags directly. Always identify and update the corresponding XMP metadata tag instead.
Otherwise you risk your EXIF field being overwritten during write-back by the XMP tag.

XMP EXIF has a "file source" tag which can be 1, 2 or 3. See https://exiftool.org/TagNames/XMP.html

jch2103 · March 07, 2021, 07:40:45 PM

Quote from: Mario on March 07, 2021, 06:45:11 PM
Tip: Avoid updating EXIF tags directly. Always identify and update the corresponding XMP metadata tag instead.
Otherwise you risk your EXIF field being overwritten during write-back by the XMP tag.

XMP EXIF has a "file source" tag which can be 1, 2 or 3. See https://exiftool.org/TagNames/XMP.html

Yes.

FileSource	integer	1 = Film Scanner
                        2 = Reflection Print Scanner
                        3 = Digital Camera

But this probably isn't a good choice for end users to modify either, as it seems to be a copy of the EXIF tag (EXIF::\Main\41728\Filesource\0) which cameras seem to write to (scanners I've used don't seem to populate it, though), so subject to conflicts/being overwritten/etc.

Mario · March 07, 2021, 07:50:47 PM

Agreed.
I would always stick to the metadata tags intended for humans. Description, title, caption, keywords etc. Basically what is default in the Metadata Panel and Keyword panel.

IMatch Attributes offer a powerful way to store whatever you like in whatever format you need.
If you have a number of repeating workflows, you can store them in a global Attribute Set and link them to a per-file set. Then you can select the workflow/process (or whatever) from a drop-down list.

IMatch really gives users a lot of options. Pick the simplest thing that does the job.

herman · March 07, 2021, 07:56:10 PM

Quote from: WebEngel on March 06, 2021, 09:19:13 PM
[...]

By the way, what is a recommended meta data field to indicate processing (i.e. "picture scanned dry on Epson scanner, white balance and contrast done in DXO")

I use just the filename for this purpose.

Suppose my master file would be named HP-20210307-0001.dng
HP are the initials of the photographer, who is me in this example.

A version derived by DxO then will be HP-20210307-0001_D.jpg

When the IMatch batch processor produces a web version of the DxO file it becomes HP-20210307-0001_DIw.jpg

In this flow D stands for DxO, I for IMatch and w for web use.
Everything before the underscore _ is the master file name, after the underscore it is the processing workflow.

Similarly, when the file developed by DxO is prepared for an external printservice by QImage it will be named HP-20210307-0001_DQp.jpg
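
A short sketch of how such names could be decoded again, for example to list which tools a version went through. D, I and w are defined in the post above; reading Q as QImage and p as print service is inferred from the last example, and the lookup table and function name are hypothetical.

```python
from pathlib import PurePath

# Workflow codes; Q and p are inferred from the example, not stated explicitly.
STEPS = {
    "D": "DxO",
    "I": "IMatch",
    "Q": "QImage",
    "w": "web use",
    "p": "print service",
}


def describe(filename):
    """Split 'MASTER_codes.ext' into the master name and the decoded steps."""
    stem = PurePath(filename).stem
    master, sep, workflow = stem.partition("_")
    if not sep:
        return master, []  # no underscore: this is the master itself
    return master, [STEPS.get(code, f"unknown ({code})") for code in workflow]


print(describe("HP-20210307-0001_DIw.jpg"))
print(describe("HP-20210307-0001_DQp.jpg"))
print(describe("HP-20210307-0001.dng"))
```

This relies on the master name itself containing no underscore, which holds for the HP-20210307-0001 pattern shown here.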

sinus · March 08, 2021, 09:16:26 AM

Quote from: herman on March 07, 2021, 07:56:10 PM
Quote from: WebEngel on March 06, 2021, 09:19:13 PM
[...]

By the way, what is a recommended meta data field to indicate processing (i.e. "picture scanned dry on Epson scanner, white balance and contrast done in DXO")

I use just the filename for this purpose.

Me too.
You can create some unique abbreviations just for yourself and then use them in the filename.
I think a lot of photographers do something like this.

Furthermore, I see no difference between scans and new digital images as far as a DAM is concerned.
The only thing that really matters is the date, but this can be (or must be) handled somehow.
Maybe at the beginning we only know the date roughly.
Later, relatives may give us better dates, and then we can change the date ... and later change it again.

Basically I am with ubacher's proposal. I do it the same way he described.