IMatch and PDF management: a good match.

Started by rlgreen, May 01, 2021, 02:34:29 PM


rlgreen

As a long-time user of IMatch, I value it as a useful photo manager, but I manage very few photos. Instead, I manage a few thousand personal documents, and no consumer-oriented document manager comes close to matching the approach and capabilities of IMatch. I greatly appreciate Mario's continuing effort to make IMatch a competent PDF manager.

Currently, I use three programs to manage documents: 1) a program that scans and/or performs OCR (I prefer PDF-XChange Pro), 2) a search tool that indexes and queries the text in those PDFs (e.g., the open-source Recoll), and 3) IMatch, which additionally allows me to categorize and search for files with informative descriptors that do not depend on the filename or the OCR text.

Although all are necessary tools, IMatch is indispensable because of its powerful and flexible category features. Thanks, Mario!

Mario

Very good. Thanks for the information about how you use IMatch.
I'm always learning from my users.

OCR is not on my to-do list for IMatch. It's complicated and there are already many good free and paid systems out there.
I believe some of them can fill XMP data like title, headline, keywords and an abstract from the data they gather (?), which would allow you to use that info in IMatch.

You will probably be happy to hear that the Quick View Panel in IMatch 2021 has been enhanced to show live previews for PDF files.
You can view the contents of the PDF file, navigate, zoom in, see annotations, two-page preview, the thumbnails in the index etc.
This is because IMatch now implements PDF preview using a new IMatch app, which utilizes the PDF support in modern web browsers.

THERE IT IS: a first sneak peek at IMatch 2021.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

rienvanham

Perfect Mario,

It was one of my first questions when I started using IMatch!

Thanks!

rlgreen

Because IMatch is developed and maintained only by Mario (with much support from users), I hesitate to suggest new features associated with PDF files. I do think native (in-program) generation of PDF thumbnails is important, as it allows for reliable and rapid identification of documents. For better or worse, we are visual beings! :)

Implementing "search for text within OCR'd files" would be a secondary request, and merely a luxury. Mature, cross-platform open-source solutions exist in the OCR/text search space (Tesseract, Xapian, Recoll), so I see this as subordinate to the (seemingly minor) issue of PDF thumbnailing in IMatch.

IMatch already excels in document management because of its robust category paradigm and its ability to manage PDF metadata.

Mario

IMatch creates PDF thumbnails from the first page, using a third-party library.
It works well with all PDF files I have here. Does it not work for your PDF files? If so, attach one so I can have a look.

The products you mention, which do OCR on PDF files and extract the text, are applications specialized for that, and are most likely all backed by more than one developer.
If they put the extracted text somewhere in the XMP record contained in the PDF, IMatch will use it and allow you to search it. If they store the extracted text somewhere else, IMatch cannot reach it.