EmailDiscussions.com  


FastMail Forum All posts relating to FastMail.FM should go here: suggestions, comments, requests for help, complaints, technical issues etc.

28 Mar 2006, 09:59 AM   #1
anotherJeremy
Essential Contributor
 
Join Date: Aug 2004
Location: Japan
Posts: 226
too many false positives

I have a POP account for work which I access via FastMail's POP links. The address is public, so it does get some spam, but it also gets a huge number of false positives. This means I have to go through my junk folder every day to find the ham. It also means that when I respond to a message and remove the spam score from the subject line, my message is not recognized as a response to the incoming message. I can't whitelist the incoming mail because most of it is first-time inquiries from potential customers.

Most of these mails are in Japanese. I mentioned before on this forum that Japanese messages seem to get an extra-high spam score, but someone from FastMail said that wasn't true. I still wonder, because as far as I can tell, there's nothing spammy about them.

Any suggestions for how to solve this problem? Thanks in advance.

Jeremy

28 Mar 2006, 11:20 PM   #2
hadaso
The "e" in e-mail
 
Join Date: Oct 2002
Location: Holon, Israel.
Posts: 4,857
You can look for "X-spam-hits" in the headers and see if there's something common that raises the spam scores. In Hebrew I also get quite a lot of legitimate mail that gets a relatively high spam score.

You can use "advanced" spam filtering, opt not to apply spam filtering automatically, and then set rules that combine the spam score with other headers, so that mail pulled from different POP links or forwarded to different aliases is treated with different spam-score thresholds.

Example: I use
allof( header :value "ge" :comparator "i;ascii-numeric" ["X-Spam-score"] ["7"] , header :contains "X-LinkName" "netvision" )
in the filing rules (with "look in" set to "advanced") to file the email I pull from my Netvision account with a spam score of 7 or above into a spam folder.

Another slightly more complicated rule:
allof( header :value "ge" :comparator "i;ascii-numeric" ["X-Spam-score"] ["7"] , anyof( header :contains "X-Delivered-to" "member", header :contains "X-Sneakemail-Label" "slashdot") )
catches mail with a spam score of 7 or above that was either forwarded by a forwarding service to an alias containing the word "member", or forwarded by Sneakemail after arriving at one of the addresses I published on Slashdot.
It's not a perfect solution, but it can save you time by having most legitimate mail pulled from the pop account in one folder, and all of the spam in another, with a few false positives and a few false negatives.
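For anyone writing a standalone Sieve script instead of pasting a test into the "look in: advanced" field, the two rules above might be combined roughly like this. It is a sketch under assumptions, not FastMail's generated code: it assumes the server supports the `fileinto`, `relational` and `comparator-i;ascii-numeric` extensions, and `INBOX.Spam` is a hypothetical folder name. One caveat worth knowing: under `i;ascii-numeric`, a value that does not begin with a digit (such as a negative score like `-2.5`) compares as positive infinity, and decimals are compared on their leading digits only (`7.9` counts as 7), so the sketch explicitly skips negative scores.

```sieve
require ["fileinto", "relational", "comparator-i;ascii-numeric"];

# Sketch only: "INBOX.Spam" is a hypothetical folder name; use your own.
# Negative scores are skipped because i;ascii-numeric treats a value that
# does not begin with a digit (e.g. "-2.5") as positive infinity.
if not header :matches "X-Spam-score" "-*" {

    # Rule 1: mail pulled through the POP link named "netvision", score >= 7.
    if allof (header :value "ge" :comparator "i;ascii-numeric" "X-Spam-score" "7",
              header :contains "X-LinkName" "netvision") {
        fileinto "INBOX.Spam";
        stop;
    }

    # Rule 2: score >= 7, and either delivered to an alias containing "member"
    # or forwarded by Sneakemail with a label containing "slashdot".
    if allof (header :value "ge" :comparator "i;ascii-numeric" "X-Spam-score" "7",
              anyof (header :contains "X-Delivered-to" "member",
                     header :contains "X-Sneakemail-Label" "slashdot")) {
        fileinto "INBOX.Spam";
        stop;
    }
}
```

Mail with no X-Spam-score header never satisfies the `:value` test, so it falls through to normal delivery. Raising or lowering the "7" per rule is how a different threshold could be applied to each POP link or alias.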

30 Mar 2006, 05:27 PM   #3
anotherJeremy
Essential Contributor
 
Join Date: Aug 2004
Location: Japan
Posts: 226
thank you

Thank you, hadaso. I'll try the advanced spam filtering.

As for checking X-Spam-hits, I glanced at a few and nothing jumped out at me, except that most of them say something about Bayes. What does that mean? And is there anything I can do about it? As individual users, we can't change the way SpamAssassin judges our messages, can we?

If the advanced spam filtering doesn't work, I'm not sure what I'll do. Something's got to change, because as things are, I have to look through my junk folder as often and as carefully as my inbox. All my messages get forwarded to Gmail, and the filter there lets the occasional spam into the inbox but very rarely gives false positives.

31 Mar 2006, 02:55 PM   #4
robmueller
Intergalactic Postmaster
 
Join Date: Oct 2001
Location: Melbourne, Australia
Posts: 6,102

Representative of:
Fastmail.FM
Can you PM me your account name, and leave some messages in the Junk Mail folder that I can look at. I'll see if there's an obvious problem...

Rob

4 Apr 2006, 07:13 AM   #5
anotherJeremy
Essential Contributor
 
Join Date: Aug 2004
Location: Japan
Posts: 226
Rob looked into it

and just for people who may have been following this thread, I quote (with permission) what he said in a PM:

Quote:
Hmmm, the problem mostly is that the emails seem to have been tagged as highly spammy by the dynamic Bayes database, and also that some Japanese Eudora clients are being incorrectly determined as forged.

I'll fix up the Eudora issue, and I'm going to clear the Bayes database. If you still get a lot of false positives in the next couple of days, let me know and I'll dig some more...

Rob
I appreciate Rob's help.

Jeremy

23 Jun 2006, 04:41 PM   #6
anotherJeremy
Essential Contributor
 
Join Date: Aug 2004
Location: Japan
Posts: 226
still too many false positives

Most of them are in Japanese. Most are very short messages. Many come from cell phones. I don't know what else they could have in common that would lead to them being classified as spam.

Rob, could you take another look in my junk folder? I left eight mails in there (the read ones). Just in case you don't have my user ID from before, I'll PM it to you again.

Thanks.
-Jeremy

23 Jun 2006, 05:34 PM   #7
JasonWard
Cornerstone of the Community
 
Join Date: Mar 2004
Location: London, UK
Posts: 834
Re: still too many false positives

Quote:
Originally posted by anotherJeremy
Most of them are in Japanese. Most are very short messages. Many come from cell phones. I don't know what else they could have in common that would lead to them being classified as spam.

Rob, could you take another look in my junk folder? I left eight mails in there (the read ones). Just in case you don't have my user ID from before, I'll PM it to you again.

Thanks.
-Jeremy
I think the problem could be that many people like me will class any message in Japanese as SPAM, so I suspect the Bayes filters just see Japanese content as SPAM regardless.

23 Jun 2006, 06:06 PM   #8
anotherJeremy
Essential Contributor
 
Join Date: Aug 2004
Location: Japan
Posts: 226
Re: Re: still too many false positives

Quote:
Originally posted by JasonWard
I think the problem could be that many people like me will class any message in Japanese as SPAM, so I suspect the Bayes filters just see Japanese content as SPAM regardless.
But there are hundreds of millions of legitimate (ham) Japanese emails being sent every day. I hope SA's Bayesian filters aren't based on spam/ham ratings from English users only.

Or does FastMail have its own Bayes database? That would give non-English messages a hard time getting through, since I assume most FastMail customers use English and would classify messages in other languages as spam.

Jeremy

23 Jun 2006, 09:49 PM   #9
robmueller
Intergalactic Postmaster
 
Join Date: Oct 2001
Location: Melbourne, Australia
Posts: 6,102

Representative of:
Fastmail.FM
The Bayes DB is not currently user-controlled in any way; it's built up automatically from the spam scores of messages. It's not ideal, but it does actually help.

Part of the thing with the new servers is that we'll be freeing up some other servers to become DB servers for logging + a per-user Bayes DB. I've been waiting for that for a while...

Rob

23 Jun 2006, 09:56 PM   #10
anotherJeremy
Essential Contributor
 
Join Date: Aug 2004
Location: Japan
Posts: 226
Quote:
Originally posted by robmueller
The Bayes DB is not currently user-controlled in any way; it's built up automatically from the spam scores of messages. It's not ideal, but it does actually help.

Part of the thing with the new servers is that we'll be freeing up some other servers to become DB servers for logging + a per-user Bayes DB. I've been waiting for that for a while...

Rob
Does that mean that false positives get fed into the database leading to more false positives?

I'm looking forward to the per-user database. Should I be saving spam to train it on once it's in place?

24 Jun 2006, 05:37 AM   #11
hadaso
The "e" in e-mail
 
Join Date: Oct 2002
Location: Holon, Israel.
Posts: 4,857
Re: Re: Re: still too many false positives

Quote:
Originally posted by anotherJeremy
...I hope SA's bayesian filters aren't based on spam/ham ratings from English users only...
They are probably based on spam/ham from mainly English speakers. Some of my Hebrew email gets a spam score high enough to classify it as spam (above five), though most of it is not classified as spam (once, however, I saw a legitimate message with a spam score of about 26). On the other hand, I get quite a lot of English spam that receives a very low spam score (all of it on addresses that get mostly spam, some of which I should really just dump...).

I do consider all the mail I get in Japanese/Chinese/Spanish etc. as spam, but it really is spam: it's quite easy to recognize even if I don't speak the language. I don't get any legitimate email in these languages, because people who know me don't send me email in languages I don't speak. So even if a spam/ham corpus used to train a system contains only Japanese spam, that would really be spam and would not include any ham misclassified as spam. However, if there is no Japanese ham to go with it, a statistical model would just learn to associate Japanese patterns with spam.

When there is a per-user Bayes DB, I wonder how easy it would be to make it work "per destination" (e.g., POP link, alias, email address). Different destinations seem to get different patterns of spam, so taking the destination into account in the statistical model could produce better results.

26 Jun 2006, 10:12 AM   #12
robmueller
Intergalactic Postmaster
 
Join Date: Oct 2001
Location: Melbourne, Australia
Posts: 6,102

Representative of:
Fastmail.FM
Quote:
Originally posted by anotherJeremy
Does that mean that false positives get fed into the database leading to more false positives?

I'm looking forward to the per-user database. Should I be saving spam to train it on once it's in place?
Email that you "report as spam" is already going into your corpus, to be used when we have it set up.

The "train to Bayes DB" threshold is very high, so I think it's currently only messages with scores above 15 or below -10 that are going to the training DB.

Rob

26 Jun 2006, 06:30 PM   #13
anotherJeremy
Essential Contributor
 
Join Date: Aug 2004
Location: Japan
Posts: 226
That's good to hear. Do I have to actually click the "report as spam" button, or does moving a message to the junk folder in my client do the trick? And likewise, will simply moving messages falsely marked as spam out of the junk folder report them as non-spam?

Also, thanks for the suggestions in the PM, Rob. I'll raise my SA score threshold and add addresses to my address book. But I get a lot of inquiries from potential customers, which of course I can't whitelist ahead of time. Also, I usually use a client, so without a way to sync my Thunderbird address book and my FM address book, it's kind of a pain to remember which addresses need to be added to FM.

Jeremy

Edit: Just a thought. Can I whitelist all Japanese encoded email, maybe using sieve? Probably a topic for another post.

27 Jun 2006, 04:53 PM   #14
robmueller
Intergalactic Postmaster
 
Join Date: Oct 2001
Location: Melbourne, Australia
Posts: 6,102

Representative of:
Fastmail.FM
Quote:
Originally posted by anotherJeremy
Edit: Just a thought. Can I whitelist all Japanese encoded email, maybe using sieve? Probably a topic for another post.
Yes. You'd have to use a custom Sieve script and look for the iso-2022-jp or shift-jis charset in the appropriate header. Search the forum for someone trying to block all Chinese emails; it's basically the same idea in reverse.
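A minimal sketch of the whitelist Rob describes, assuming a custom Sieve script in which this block runs before whatever rule files mail into the junk folder. It only inspects the top-level `Content-Type` header, so a multipart message that declares its charset only on an inner part will not match. The charset list (iso-2022-jp, plus Shift_JIS spelled both with an underscore and with a hyphen) is an assumption about what the senders' mail clients and phones use.

```sieve
# Sketch only: treat mail that declares a Japanese charset as "not spam".
# Assumes this block runs before any rule that files mail into the junk folder.
# header :contains uses the case-insensitive i;ascii-casemap comparator by
# default, so "Shift_JIS" and "shift_jis" both match.
if header :contains "Content-Type" ["iso-2022-jp", "shift_jis", "shift-jis"] {
    keep;    # deliver to the inbox as usual
    stop;    # skip the rest of the script, including any spam-filing rules
}
```

Reversed, as Rob suggests, the same test with Chinese charset names (for example gb2312 or big5) followed by a `fileinto` would block that mail instead of whitelisting it.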

Rob

All times are GMT +9.

 

Copyright EmailDiscussions.com 1998-2022. All Rights Reserved. Privacy Policy