EmailDiscussions.com  

Old 7 May 2025, 01:52 AM   #1
chrisjj
Cornerstone of the Community
 
Join Date: Jul 2003
Posts: 821
Gmail search for exact word suffers false matches on hyphen

1 Verify "..." should match exact word or phrase: https://support.google.com/mail/answer/7190 https://i.imgur.com/ttTBQVD.png

2 Search for "x-com" (including quotes).

Expected: messages containing exactly x-com
Observed: also messages containing x.com

I guess Google doesn't understand the meaning of the words "word" and "phrase" - specifically that they may contain hyphens.

Old 15 May 2025, 05:18 AM   #2
JeremyNicoll
Cornerstone of the Community
 
Join Date: Dec 2017
Location: Scotland
Posts: 565
If you look at the support-page example for stuff in quotes, it's ambiguous. It says that quotes cause a "Search for emails with an exact word or phrase" and gives as an example: "dinner and movie tonight".

But I would be astonished if it didn't find those words irrespective of how many spaces separate them. So .. only sort-of 'exact' ... because that's what people probably expect. It /might/ find the phrase even if it occurred in a mail with punctuation between some/all of the words.

Bear in mind that since most people post in 'html', the text being searched has to have arbitrarily complex markup elided first; it might not even be all that clear where word-boundaries are.


As a programmer I'm used to lots of situations where 'word' just means a sequence of non-blank, or non-whitespace characters, possibly with some sort of allowance for punctuation, and having to be precise which I mean. It's the sort of thing that can make use of regexes ("regular expressions") complicated.

I would guess that Google's algorithm strips punctuation & whitespace from both search arguments and the text being searched, to reduce the complexity of the search - and make it usually do what most people would expect.

In your case "x-com" and "x.com" have common punctuation characters between the 'words'. It'd be interesting to test if (with suitable test mails to search) it will properly find eg "x(com" or "x\com" where special characters might not get elided.
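To make that guess concrete, here is a minimal Python sketch of such a "near enough" phrase match. It is only an illustration of the guess, not Gmail's actual algorithm, which is not documented in this thread. The function names `tokens` and `loose_phrase_match` are made up for the example, and it assumes punctuation is treated as a separator between tokens rather than deleted outright.

```python
import re

def tokens(text):
    # Assumed normalisation: lowercase, then treat every run of
    # non-alphanumeric characters (spaces, hyphens, dots, ...) as a separator.
    return re.findall(r"[a-z0-9]+", text.lower())

def loose_phrase_match(query, body):
    # True if the query's tokens occur consecutively, in order, in the body.
    q, b = tokens(query), tokens(body)
    if not q:
        return False
    return any(b[i:i + len(q)] == q for i in range(len(b) - len(q) + 1))

print(loose_phrase_match("x-com", "see x.com today"))   # True
print(loose_phrase_match("x-com", "see xcom today"))    # False
```

Under this normalisation "x-com", "x.com", "x(com" and "x\com" all become the same two tokens, so the test suggested above would tell you whether the real search behaves differently for the less common characters.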
Old 15 May 2025, 05:30 AM   #3
chrisjj
Cornerstone of the Community
 
Join Date: Jul 2003
Posts: 821
Quote:
Originally Posted by JeremyNicoll View Post
If you look at the support-page example for stuff in quotes, it's ambiguous. It says that quotes cause a "Search for emails with an exact word or phrase" and gives as an example: "dinner and movie tonight".
I see no ambiguity there.

Quote:
Originally Posted by JeremyNicoll View Post
Bear in mind that since most people post in 'html', the text being searched has to have arbitrarily complex markup elided first;
That too is broken. http://www.emaildiscussions.com/showthread.php?t=80936

Quote:
Originally Posted by JeremyNicoll View Post
it might not even be all that clear where word-boundaries are.
The word boundaries in this case are clear.

Quote:
Originally Posted by JeremyNicoll View Post
As a programmer I'm used to lots of situations where 'word' just means a sequence of non-blank, or non-whitespace characters, possibly with some sort of allowance for punctuation, and having to be precise which I mean.

In this case, I think the user is entitled to rely upon "word" meaning word.

Quote:
Originally Posted by JeremyNicoll View Post
I would guess that Google's algorithm strips punctuation & whitespace from both search arguments and the text being searched, to reduce the complexity of the search - and make it usually do what most people would expect.
I think most people expect the search to do what it promises. Why would they expect otherwise? Surely not because they are programmers who think doing what is offered is too difficult.

And if Gmail can't deliver what it promises then the problem is easily solved. Don't make that promise.

Quote:
Originally Posted by JeremyNicoll View Post
In your case "x-com" and "x.com" have common punctuation characters between the 'words'.
x-com doesn't. It is one word.

And even if it wasn't, it would be a phrase. Gmail promises a search for an exact phrase and on this it fails.
Old 15 May 2025, 08:15 AM   #4
JeremyNicoll
Cornerstone of the Community
 
Join Date: Dec 2017
Location: Scotland
Posts: 565
Quote:
Originally Posted by chrisjj View Post
I see no ambiguity there.
It's ambiguous because it could mean "precisely the same", or also "near enough". If (as I think it does) it means the latter, there will always be instances which look wrong to some people.


Quote:
Originally Posted by chrisjj View Post
The word boundaries in this case are clear.
I don't agree. You think "x-com" is one word, but (I think) that is only because you, knowing the context of what you were searching for, regard it so. How would you expect the algorithm to know? Do you think eg "commonly-held" is one word? How about "brother-in-law"? I think those are two & three words.


Quote:
Originally Posted by chrisjj View Post
In this case, I think the user is entitled to rely upon "word" meaning word.
So, is "BBC" a word? How about "P.D.Q.Bach"? Or "WD40" or "W.D.40"? How do YOU decide where the word boundaries are? Which dots are parts of words & which are not?


Quote:
Originally Posted by chrisjj View Post
I think most people expect the search to do what it promises. Why would they expect otherwise? Surely not because they are programmers who think doing what is offered is too difficult.
I think the point is that it's not clear "what it promises". In different circumstances I think most people would expect subtly different behaviours, perhaps even the exact opposite of what they expected in some prior search.

Programmers will implement what they're told to, if they can. But what if they're given vague instructions or no instructions?


Quote:
Originally Posted by chrisjj View Post
x-com doesn't. It is one word. And even if it wasn't, it would be a phrase. Gmail promises a search for an exact phrase and on this it fails.
If it's a phrase, from which punctuation has been stripped, it's identical to similarly-treated "x.com" and "x com".


Would you /always/ want a search for "dinner and movie tonight" NOT to find an email which contains "dinner and movie, tonight"? If so you'd have a struggle finding all the (pedantically) almost-identical phrases - each of which you'd have to search for one at a time - which many people would think were the same (or /near enough/).

Does Google let you ask for mails containing "dinner" then "and" then "movie" then "tonight", while letting you say what is & is not allowed, this time, to be between those words?
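Whether Gmail offers that is the open question here. Outside Gmail, over text you already have to hand, the idea can be sketched with a regex in which the caller states what may sit between the words. This is a hypothetical helper (`phrase_pattern` and its `allowed_between` parameter are invented for the example), not a Gmail feature.

```python
import re

def phrase_pattern(words, allowed_between=r"\s+"):
    # Build a regex matching the words in order, where only the
    # caller-specified separator pattern may appear between them.
    sep = f"(?:{allowed_between})"
    return re.compile(r"\b" + sep.join(re.escape(w) for w in words) + r"\b")

phrase = ["dinner", "and", "movie", "tonight"]

strict = phrase_pattern(phrase)                 # whitespace only
loose = phrase_pattern(phrase, r"[\s,;]+")      # also commas and semicolons

text = "dinner and movie, tonight"
print(strict.search(text) is not None)   # False
print(loose.search(text) is not None)    # True
```

The point of the design is that "exact" and "near enough" become an explicit choice per search instead of a fixed behaviour.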


A further complication is that the rules for different natural languages may or may not matter. For example, what exactly a 'word' is might differ from one to another.

If someone searches for "Müller", should that also find "Mueller" (& vice versa)? Should specified accented letters have to match exactly, if they're often omitted (eg when they denote stresses in words rather than distinct sounds)? Which language's rules would you use if a mail contains snippets of text in several?
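As a small illustration of why the language matters, here are two plausible folding policies for the umlauted 'u', sketched in Python. Both function names are made up for the example and neither is claimed to be what Gmail does; the German-style transliteration in particular is a deliberately tiny assumption covering only this one letter.

```python
import unicodedata

def strip_accents(text):
    # Language-neutral policy: decompose characters (NFD) and drop the
    # combining marks, so an umlauted u becomes a plain u.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def german_fold(text):
    # German-style policy (assumed, only handles u-umlaut): write it as "ue".
    return text.replace("\u00fc", "ue").replace("\u00dc", "Ue")

name = "M\u00fcller"
print(strip_accents(name))   # Muller
print(german_fold(name))     # Mueller
```

The two policies disagree, so a search that folds accents one way will miss text written the other way, whichever rule is chosen.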

Last edited by JeremyNicoll : 15 May 2025 at 08:17 AM. Reason: clarify umlauted u
Old Yesterday, 04:39 AM   #5
chrisjj
Cornerstone of the Community
 
Join Date: Jul 2003
Posts: 821
Quote:
Originally Posted by JeremyNicoll View Post
It's ambiguous because it could mean "precisely the same", or also "near enough".
No. "exact word or phrase" means exact.

Quote:
Originally Posted by JeremyNicoll View Post
I don't agree. You think "x-com" is one word, but (I think) that is only because you, knowing the context of what you were searching for, regard it so.
No. Because it meets the definition of "word".

Quote:
Originally Posted by JeremyNicoll View Post
How would you expect the algorithm to know?
It is a string of contiguous letters and hyphens.

Quote:
Originally Posted by JeremyNicoll View Post
Do you think eg "commonly-held" is one word? How about "brother-in-law"?
Of course. See the definition of word.
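For comparison, the "contiguous letters and hyphens" rule can be sketched in Python as follows. This is one reading of that rule, not a statement of how Gmail tokenises: the names `WORD`, `words` and `exact_word_match` are invented for the example, and the pattern assumes a hyphen only counts when it has letters on both sides.

```python
import re

# Assumed definition: one or more letters, optionally followed by
# hyphen-joined groups of letters. Dots, digits and spaces end a word.
WORD = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")

def words(text):
    return WORD.findall(text)

def exact_word_match(query, body):
    # The query must equal one whole word of the body, hyphens included.
    return query in words(body)

print(words("brother-in-law uses x-com, not x.com"))
# ['brother-in-law', 'uses', 'x-com', 'not', 'x', 'com']
print(exact_word_match("x-com", "visit x.com"))   # False
```

Under this definition "x-com" and "x.com" are no longer confused, but anything containing digits or dots (such as "WD40") would need its own rule.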


Quote:
Originally Posted by JeremyNicoll View Post
If it's a phrase, from which punctuation has been stripped
... then it is failing the promise of exact.

Quote:
Originally Posted by JeremyNicoll View Post
Would you /always/ want a search for "dinner and movie tonight" NOT to find an email which contains "dinner and movie, tonight"?
When I was searching for exact, yes.

All times are GMT +9. The time now is 04:38 PM.

 

Copyright EmailDiscussions.com 1998-2022. All Rights Reserved. Privacy Policy