Google Gmail Forum Discussions related to Google's Gmail service should go here: suggestions, tips, comments, requests for help, tech issues etc. |
#1 |
Cornerstone of the Community
Join Date: Jul 2003
Posts: 821
|
Gmail search for exact word suffers false matches on hyphen
1 Verify that "..." should match an exact word or phrase: https://support.google.com/mail/answer/7190 https://i.imgur.com/ttTBQVD.png
2 Search for "x-com" (including the quotes).
Expected: messages containing exactly x-com
Observed: also messages containing x.com
I guess Google doesn't understand the meaning of the words "word" and "phrase", specifically that they may contain hyphens.
#2 |
Cornerstone of the Community
Join Date: Dec 2017
Location: Scotland
Posts: 565
|
If you look at the support-page example for text in quotes, it's ambiguous. It says that quotes cause a "Search for emails with an exact word or phrase" and gives as an example: "dinner and movie tonight".

But I would be astonished if it didn't find those words irrespective of how many spaces separate them. So it's only sort-of 'exact', because that's what people probably expect. It /might/ find the phrase even if it occurred in a mail with punctuation between some or all of the words.

Bear in mind that since most people post in HTML, the text being searched has to have arbitrarily complex markup elided first; it might not even be all that clear where word boundaries are.

As a programmer I'm used to lots of situations where 'word' just means a sequence of non-blank, or non-whitespace, characters, possibly with some sort of allowance for punctuation, and to having to be precise about which I mean. It's the sort of thing that can make use of regexes ("regular expressions") complicated.

I would guess that Google's algorithm strips punctuation & whitespace from both the search arguments and the text being searched, to reduce the complexity of the search and to make it usually do what most people would expect. In your case "x-com" and "x.com" both have a common punctuation character between the 'words'.

It'd be interesting to test (with suitable test mails to search) whether it will properly find eg "x(com" or "x\com", where special characters might not get elided.
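To make that guess concrete, here is a minimal Python sketch of a matcher that throws away punctuation and whitespace before comparing. It is only an illustration of the guessed behaviour, not Gmail's actual algorithm; the function names `tokenize` and `loose_phrase_match` are made up for this example, and treating only ASCII letters and digits as word characters is an assumption.

```python
import re

def tokenize(text):
    """Split text into lowercase runs of ASCII letters/digits.

    Everything else (hyphens, dots, commas, spaces, quotes) is treated
    as a separator and discarded. This is an assumed rule, not Gmail's.
    """
    return re.findall(r"[a-z0-9]+", text.lower())

def loose_phrase_match(query, body):
    """Return True if the query's tokens occur consecutively in the body."""
    q = tokenize(query)
    b = tokenize(body)
    if not q:
        return False
    # Slide a window of len(q) tokens across the body tokens.
    return any(b[i:i + len(q)] == q for i in range(len(b) - len(q) + 1))

# Under this rule "x-com" and "x.com" both become the tokens x, com.
print(loose_phrase_match('"x-com"', "Please visit x.com today"))
print(loose_phrase_match('"dinner and movie tonight"',
                         "Dinner and   movie, tonight?"))
```

With a rule like this, a quoted search for "x-com" cannot tell a hyphen from a dot, which would explain the false matches reported in post #1. It would also match "x(com", so the test suggested above would help show whether the real rule is this crude.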
#3 | |||||
Cornerstone of the Community
Join Date: Jul 2003
Posts: 821
|
Quote:
Quote:
Quote:
Quote:
JeremyNicoll: As a programmer I'm used to lots of situations where 'word' just means a sequence of non-blank, or non-whitespace characters, possibly with some sort of allowance for punctuation, and having to be precise which I mean.
In this case, I think the user is entitled to rely upon "word" meaning word.
Quote:
And if Gmail can't deliver what it promises then the problem is easily solved. Don't make that promise.
Quote:
And even if it wasn't, it would be a phrase. Gmail promises a search for an exact phrase and on this it fails.
#4 | |||
Cornerstone of the Community
Join Date: Dec 2017
Location: Scotland
Posts: 565
|
It's ambiguous because it could mean "precisely the same", or also "near enough". If (as I think it does) it means the latter, there will always be instances which look wrong to some people.
I don't agree. You think "x-com" is one word, but (I think) that is only because you, knowing the context of what you were searching for, regard it so. How would you expect the algorithm to know? Do you think eg "commonly-held" is one word? How about "brother-in-law"? I think those are two & three words. Quote:
Quote:
Programmers will implement what they're told to, if they can. But what if they're given vague instructions or no instructions? Quote:
Would you /always/ want a search for "dinner and movie tonight" NOT to find an email which contains "dinner and movie, tonight"? If so you'd have a struggle finding all the (pedantically) almost-identical phrases, each of which you'd have to search for one at a time, and which many people would think were the same (or /near enough/).

Does Google let you ask for mails containing "dinner" then "and" then "movie" then "tonight", while letting you say what is & is not allowed, this time, to be between those words?

A further complication is that the rules for different natural languages may or may not matter. For example, what exactly a 'word' is might differ from one to another. If someone searches for "Müller", should that also find "Mueller" (& vice versa)? Should specified accented letters have to match exactly, if they're often omitted (eg when they denote stresses in words rather than distinct sounds)? Which language's rules would you use if a mail contains snippets of text in several?

Last edited by JeremyNicoll : 15 May 2025 at 08:17 AM. Reason: clarify umlauted u
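To show why the accented-letter question has no single right answer, here is a rough Python sketch of one possible folding rule. It assumes German conventions (ü becomes ue, and so on) and then strips any remaining accents; the function name `fold` is hypothetical and nothing here is claimed to be what Gmail does.

```python
import unicodedata

# Assumed German-style transliterations; other languages would need other rules.
GERMAN_EXPANSIONS = (("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss"))

def fold(text):
    """Reduce text to a comparison key under an assumed German-centric rule."""
    # Compose first so "u" + combining diaeresis is seen as the single letter "ü".
    text = unicodedata.normalize("NFC", text).lower()
    for src, dst in GERMAN_EXPANSIONS:
        text = text.replace(src, dst)
    # Decompose and drop any remaining combining marks (eg the accent in "é").
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold("Müller") == fold("Mueller"))  # same key under this rule
print(fold("Müller") == fold("Muller"))   # different key under this rule
```

The catch is that the expansion table is language-specific: applied to Turkish "gül" it would produce "guel", which no Turkish reader would expect, so a mail containing several languages has no obviously correct rule.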
#5 | |||
Cornerstone of the Community
Join Date: Jul 2003
Posts: 821
|
Quote:
Quote:
It is a string of contiguous letters and hyphens.
Quote:
... then it is failing the promise of exact. When I was searching for exact, yes.
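For comparison, the stricter definition given in this post ("a string of contiguous letters and hyphens") is easy to state as a regex. This is just a sketch of that definition, assuming ASCII letters and case-insensitive comparison; `exact_word_match` is a made-up name, not a Gmail feature.

```python
import re

# A word, per the definition above: one or more contiguous letters or hyphens.
# Restricting to ASCII letters is an assumption made for brevity.
WORD = re.compile(r"[A-Za-z-]+")

def exact_word_match(word, body):
    """Return True if `word` appears in `body` as a whole word."""
    target = word.lower()
    return any(w.lower() == target for w in WORD.findall(body))

print(exact_word_match("x-com", "I replayed x-com last night"))
print(exact_word_match("x-com", "I read it on x.com last night"))
```

Under this rule the dot in "x.com" ends the word "x", so only the hyphenated form is found. The cost is the one raised in post #4: "brother-in-law" counts as a single word, so a search for "brother" alone would no longer find it.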