5 Nov 2020, 01:34 AM   #3
n5bb
Intergalactic Postmaster
 
Join Date: May 2004
Location: Irving, Texas
Posts: 8,929
Search matches the contents of basic zip files

Quote:
Originally Posted by BritTim
Did you happen to see if packed files (such as zip) were supported? The lack of such support has often been a problem for me in other search domains.
Yes, in general it seems that zip files without passwords are searched. Here is what I found testing a zip of 7 PDF files:
  • A search discovers the filenames contained in the zip.
  • A search discovers text contained in a PDF file contained in the zip.
  • When searching a PDF file (even one inside a zip file), runs of spaces and newlines are ignored, so the search compares tokens (the words separated by spaces, commas, or other delimiters).
So if the PDF file contained:
Code:
usually, a dog
barks
then a search string with embedded spaces and commas, such as the following, still matches, because the spaces, commas, and newline in the document text are ignored:
Code:
usually a         dog, barks
This is true even if the search is enclosed in quotes. In other words, the search matches a sequence of tokens that appears in both the search string and the document, ignoring delimiters such as spaces, commas, and newlines. So it's not an exact-match search, but a search for matching token content. I think this is a good choice, since you usually won't know in advance whether the text you're looking for is split by a newline.
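
For anyone who wants to see the idea in code, here is a minimal Python sketch of the token matching described above. It only illustrates the observed behavior and is not the search engine's actual implementation. The function names `tokenize` and `phrase_matches` are hypothetical. Treating every non-word character as a delimiter is an assumption, and so is comparing case as-is, since case handling was not tested.

```python
import re

def tokenize(text):
    # Assumption: any run of non-word characters (spaces, commas,
    # newlines, other punctuation) acts as a delimiter between tokens.
    return re.findall(r"\w+", text)

def phrase_matches(query, document):
    # True if the query's tokens appear in the document as a
    # contiguous run, in the same order, ignoring all delimiters.
    q = tokenize(query)
    d = tokenize(document)
    if not q:
        return False
    n = len(q)
    # Slide a window of n tokens across the document tokens.
    return any(d[i:i + n] == q for i in range(len(d) - n + 1))

# The example from this post: the document text is split by a comma
# and a newline, and the search string has extra spaces and a comma.
document_text = "usually, a dog\nbarks"
search_string = "usually a         dog, barks"
print(phrase_matches(search_string, document_text))
```

Because both sides are reduced to token lists before comparing, the position of commas, extra spaces, and line breaks has no effect on whether the phrase is found.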

Bill