Runbox Forum
Everything related to Runbox should go here: suggestions, comments, complaints, questions, technical issues, etc.
4 May 2004, 06:37 AM | #1
Intergalactic Postmaster
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606
Representative of:
Runbox.com
SpamAssassin autolearn=ham?
What does it mean if the SpamAssassin header says "autolearn=ham"? I noticed the SpamAssassin message headers of a spam message that slipped through SpamAssassin said "autolearn=ham".
Regards, Rich
4 May 2004, 07:10 AM | #2
Cornerstone of the Community
Join Date: Jun 2003
Location: New York, NY
Posts: 900
Hi Rich,
Just googled it. Here is what I got from this page: http://spamassassin.rediris.es/doc/s...20for%20ham%20

DEFAULT TAGGING FOR HAM (NON-SPAM) MAILS
X-Spam-Status: header
A string, "No, hits=nn required=nn tests=xxx,xxx autolearn=(ham|spam|no)", is set in this header to reflect the filter status.

Looks like ham is the opposite of spam. Does that answer your question? BKB.
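A minimal sketch of how a mail filter could pull apart the fields described above from an X-Spam-Status value. It assumes the SpamAssassin 2.6x layout quoted in this thread (`hits=` rather than the `score=` used by later versions). The helper name `parse_spam_status` is hypothetical, not part of SpamAssassin.

```python
import re

# Fields named in the doc quoted above; the 2.6x layout ("hits=") is assumed.
STATUS_RE = re.compile(
    r"^(?P<verdict>Yes|No),\s*"
    r"hits=(?P<hits>-?\d+(?:\.\d+)?)\s+"
    r"required=(?P<required>-?\d+(?:\.\d+)?)\s+"
    r"tests=(?P<tests>\S*)\s+"
    r"autolearn=(?P<autolearn>ham|spam|no)\b"
)


def parse_spam_status(value: str) -> dict:
    """Split an X-Spam-Status header value into its fields (hypothetical helper)."""
    # Long test lists may be folded across lines ("A,\n\tB"); rejoin them first.
    value = re.sub(r",\s+", ",", value.strip())
    m = STATUS_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognised X-Spam-Status value: {value!r}")
    return {
        "is_spam": m.group("verdict") == "Yes",
        "hits": float(m.group("hits")),
        "required": float(m.group("required")),
        "tests": [t for t in m.group("tests").split(",") if t and t != "none"],
        "autolearn": m.group("autolearn"),
    }


status = parse_spam_status(
    "No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham version=2.60"
)
print(status["autolearn"], status["hits"], status["tests"])
```

Reading `autolearn` as its own field makes it easy to see that it is reported separately from the Yes/No verdict.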
4 May 2004, 08:41 AM | #3
Intergalactic Postmaster
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606
Representative of:
Runbox.com
Thanks. I guess I'm more curious about why it was "ham" and not "no" like it usually is?
Rich
4 May 2004, 01:04 PM | #4
Essential Contributor
Join Date: Oct 2003
Posts: 455
I vaguely recall that ham is the email you explicitly designate as good, wanted mail. It is the opposite of spam, but more specifically it's the content you use to train a Bayesian filter to recognize the mail that you want.
I wonder if SpamAssassin is saying that this particular message is not just "not spam" but, beyond that, closely matches a message it has been explicitly told is good. Would that make sense? Was this a message that someone somewhere could actually have wanted? Or is it potentially a sign of someone trying to trick SA into thinking this was a good message? --Jason
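To make Jason's description concrete: a Bayesian filter records which words appear in messages labelled ham or spam, then scores new mail by how those words were distributed. Below is a toy naive Bayes classifier written under that assumption. SpamAssassin's real Bayes module uses a different tokenizer and combining method, so the numbers are illustrative only, and `ToyBayes` is a hypothetical name.

```python
import math
from collections import Counter


class ToyBayes:
    """Toy naive Bayes spam classifier (illustrative, not SpamAssassin's code)."""

    def __init__(self) -> None:
        self.word_docs = {"ham": Counter(), "spam": Counter()}  # docs containing each word
        self.n_docs = {"ham": 0, "spam": 0}

    def train(self, text: str, label: str) -> None:
        # "Training" records which words appeared in a ham or spam message.
        self.word_docs[label].update(set(text.lower().split()))
        self.n_docs[label] += 1

    def spam_probability(self, text: str) -> float:
        # Start from the smoothed prior odds of spam vs. ham.
        log_odds = math.log((self.n_docs["spam"] + 1) / (self.n_docs["ham"] + 1))
        for word in set(text.lower().split()):
            # Laplace smoothing keeps unseen words from zeroing out a class.
            p_spam = (self.word_docs["spam"][word] + 1) / (self.n_docs["spam"] + 2)
            p_ham = (self.word_docs["ham"][word] + 1) / (self.n_docs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


bayes = ToyBayes()
bayes.train("lunch meeting tomorrow at noon", "ham")
bayes.train("project meeting notes attached", "ham")
bayes.train("order cheap pills online now", "spam")
bayes.train("cheap pills no prescription", "spam")

print(round(bayes.spam_probability("cheap pills online"), 3))  # spam-like, above 0.5
print(round(bayes.spam_probability("meeting at noon"), 3))     # ham-like, below 0.5
```

A message that closely resembles the ham training set ends up with a low spam probability, which is the situation Jason is speculating about.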
4 May 2004, 03:41 PM | #5
Intergalactic Postmaster
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606
Representative of:
Runbox.com
The message was SPAM that was not flagged as SPAM by SpamAssassin. What I don't understand is why "autolearn" was set to "ham". As far as I know, Runbox isn't using the "autolearn" function yet. That's why it should have said "autolearn=no".
Rich
5 May 2004, 09:48 PM | #6
Intergalactic Postmaster
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606
Representative of:
Runbox.com
Jason, you were right about the autolearn. I found some details at the SpamAssassin Wiki:
Why isn't autolearning working for me? (aka: "autolearn=no")
If it says "autolearn=no", then it means that SpamAssassin has not learned whether the message is spam or ham. If it says "autolearn=ham", then SpamAssassin has been trained to recognize the message as "ham". If it says "autolearn=spam", then SpamAssassin has been trained to recognize the message as "spam".

I have a message that is 100% SPAM, yet SpamAssassin has tagged it with "autolearn=ham", which means it's been trained to recognize the message as "ham". My question for the Runbox crew is: when did you start training SpamAssassin, who is training it, and how do you "retrain" it when it's been trained wrong?
Regards, Rich
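For context, SpamAssassin's autolearn works roughly like this: after scoring a message, it computes a separate learning score that leaves out the Bayes rules, and if that score is low enough it feeds the message to the Bayes database as ham (if high enough, as spam). The sketch below assumes the default values documented for `bayes_auto_learn_threshold_nonspam` (0.1) and `bayes_auto_learn_threshold_spam` (12.0); whether Runbox's 2.60 install used exactly these is an assumption. It also simplifies the real logic, which uses its own score set and extra header/body point checks before learning spam. The function name is hypothetical.

```python
# Assumed defaults for bayes_auto_learn_threshold_nonspam / _spam; a site
# may configure different values.
HAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_decision(rule_scores: dict[str, float]) -> str:
    """Return the autolearn= value a message would get (simplified sketch)."""
    # Bayes rules are left out so the classifier is not trained on its own verdict.
    learning_score = sum(
        score for rule, score in rule_scores.items() if not rule.startswith("BAYES_")
    )
    if learning_score < HAM_THRESHOLD:
        return "ham"
    if learning_score >= SPAM_THRESHOLD:
        return "spam"
    return "no"


# Only a Bayes rule fired, so the learning score is 0, which is under 0.1.
print(autolearn_decision({"BAYES_00": -4.9}))
# Hypothetical rule names: a middling score is not learned either way.
print(autolearn_decision({"RULE_A": 3.0}))
```

Under these assumptions, a spam that trips no rule other than a low Bayes score is learned as ham, which reinforces the mistake.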
7 May 2004, 06:30 PM | #7
Junior Member
Join Date: Jan 2004
Posts: 22
Quote:
The database is system-wide, and kept in memory only (it will have to be rebuilt if the system is rebooted). Today there is no way for you to "correct" any mail learned incorrectly, or to explicitly teach it any mail that had autolearn=no. Of course, this classifier is in no way as effective as an actively maintained per-user Bayesian filter would be. However, it was deemed to be better than nothing when we started using SpamAssassin, even though incorrect learnings such as the one you experienced may occur sometimes. Look for the BAYES_* tests in the spam report to see the effect the classifier has on scoring your incoming mail.

7 May 2004, 09:52 PM | #8
The "e" in e-mail
Join Date: Sep 2001
Location: Oslo, Norway
Posts: 2,938
Representative of:
Runbox.com
We are currently working to expand the spam filter with individual "intelligent" filtering (more specifically CRM-114), allowing a user to tell the filter when it has classified a message erroneously, thus teaching it to catch spam more accurately.
We're hoping to launch this within a couple of weeks. - Geir
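Geir's description, a per-user filter that learns when its user reports a misclassification, is usually called train-on-error. The sketch below shows only that workflow; it does not implement CRM-114's own classifier, which is far more sophisticated. The word-count scoring, the `UserFilter` class, and the mailbox names are all hypothetical.

```python
from collections import Counter, defaultdict


class UserFilter:
    """Toy per-user filter that only learns from reported mistakes."""

    def __init__(self) -> None:
        self.weights = {"spam": Counter(), "ham": Counter()}

    def classify(self, text: str) -> str:
        words = text.lower().split()
        spam = sum(self.weights["spam"][w] for w in words)
        ham = sum(self.weights["ham"][w] for w in words)
        return "spam" if spam > ham else "ham"

    def report(self, text: str, correct_label: str) -> None:
        # Train on error: update only when the current verdict is wrong.
        if self.classify(text) != correct_label:
            self.weights[correct_label].update(text.lower().split())


filters = defaultdict(UserFilter)  # one independent filter per mailbox

msg = "cheap vicodin online"
print(filters["alice"].classify(msg))  # "ham": nothing learned yet
filters["alice"].report(msg, "spam")
print(filters["alice"].classify(msg))  # "spam" after alice's correction
print(filters["bob"].classify(msg))    # "ham": bob's filter is unaffected
```

Keeping one filter per mailbox is what lets each user fix their own mistakes, in contrast to the single system-wide database described earlier in the thread.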
7 May 2004, 10:46 PM | #9 | |||
Intergalactic Postmaster
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606
Representative of:
Runbox.com |
Hi Tore,
Quote:
Quote:
If a message has been learned incorrectly, what do I need to do to fix it? Quote:
X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on gaspode.runbox.com
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham version=2.60
X-Spam-Level:

The message is for Vicodin (the line reads "0,rder V~ic0-din 0nline Anytime"). It's all text (no HTML). About 50% of it is unrelated (part random words, part a joke about a freshman being tossed into a frog pond). I guess the spammers are figuring out how to get past the Bayesian filters.
Rich
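The obfuscated spelling Rich quotes can be illustrated with a simple tokenizer: once letters are swapped for digits and punctuation is inserted, the pieces no longer match the tokens a Bayes filter learned from earlier spam, so they contribute little or no spam evidence. The tokenizer below (lower-cased runs of letters and digits) is an assumption for illustration; SpamAssassin's real tokenizer differs.

```python
import re


def tokens(text: str) -> set[str]:
    # Hypothetical tokenizer: lower-case, then keep runs of letters/digits.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


learned_from_old_spam = tokens("Order Vicodin online anytime")
obfuscated = tokens("0,rder V~ic0-din 0nline Anytime")

print(sorted(obfuscated))
print(learned_from_old_spam & obfuscated)  # only "anytime" still matches
```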
7 May 2004, 11:04 PM | #10
Intergalactic Postmaster
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606
Representative of:
Runbox.com
Hi Geir,
Quote:
The ones that are valid that I don't know usually have other information in the headers I can key off of with filters. I also like the "challenge message" ideas other services use. Rich
9 May 2004, 02:13 AM | #11
Junior Member
Join Date: Jan 2004
Posts: 22
Quote:
Of course, *I* could take the message in question and feed it to sa-learn on all the machines, but once that is necessary to make the Bayesian classifier work, it's better to just disable the whole thing.
9 May 2004, 02:36 AM | #12
Junior Member
Join Date: Jan 2004
Posts: 22
Quote:
A C-R system places the burden on the alleged sender. But as there is absolutely *no* way today to verify the authenticity of an e-mail's sender, a C-R system will direct its challenges to unrelated third parties, thus making them unsolicited, or in other words: spam. It's a very, very asocial, arrogant, and selfish way to deal with one's spam problems, IMNSHO.

That said, C-R *is* very effective at preventing spam. However, it'll also prevent legit e-mail, as I, and many others, refuse to reply to such challenges on principle alone. If you want to keep a whitelist of senders, fine, but don't expect me (and every other e-mail-using individual on the planet) to maintain *your* whitelist *for* you.

Also see Karsten M. Self's thoughts on the subject, most of which I very much agree with. (It is of course not my decision whether or not Runbox should implement such a feature.)
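Tore's point about challenges landing on third parties follows from how a C-R filter picks the challenge recipient: it can only use the sender address claimed in the message, which nothing verifies. The sketch below models that decision; the `Message` type, the addresses, and the in-memory outbox are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    claimed_sender: str  # taken from the message itself; nothing verifies it
    body: str


def challenge_response(msg: Message, whitelist: set[str], outbox: list) -> str:
    if msg.claimed_sender in whitelist:
        return "deliver"
    # Unknown sender: hold the mail and challenge the *claimed* address.
    outbox.append((msg.claimed_sender, "Please confirm that you sent this message."))
    return "held"


whitelist = {"friend@example.org"}
outbox: list = []

friendly = Message("friend@example.org", "Lunch on Friday?")
forged = Message("innocent@example.net", "Order cheap pills")  # spammer forged this sender

print(challenge_response(friendly, whitelist, outbox))  # deliver
print(challenge_response(forged, whitelist, outbox))    # held
print(outbox)  # the challenge went to innocent@example.net, who never wrote
```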
9 May 2004, 07:33 AM | #13
Essential Contributor
Join Date: Apr 2001
Posts: 315
SpamAssassin is failing to identify a lot of SPAM for me, and I'm having problems setting up filters. One of them checks for HTML tags, and when that filter fires, Runbox list emails sometimes end up in a "possibly spam" folder I have set up.
How do the -# filters work? Do those get looked at first?
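The thread doesn't say how Runbox orders its filters, so the following is only a sketch under an explicit assumption: filters run in ascending order of their number (so negative numbers run first) and the first match decides the folder. Under that assumption, a low-numbered filter for list mail would keep it away from an HTML-tags rule. The `Filter` type, folder names, and list address are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Filter:
    order: int                       # assumption: lower numbers are checked first
    matches: Callable[[dict], bool]
    folder: str


def route(message: dict, filters: list[Filter], default: str = "Inbox") -> str:
    for f in sorted(filters, key=lambda f: f.order):
        if f.matches(message):
            return f.folder          # assumption: first match wins
    return default


filters = [
    Filter(10, lambda m: "<html" in m["body"].lower(), "Possibly spam"),
    Filter(-1, lambda m: m["from"].endswith("@lists.example.com"), "Lists"),
]

list_mail = {"from": "announce@lists.example.com", "body": "<HTML>Newsletter</HTML>"}
other_html = {"from": "promo@example.net", "body": "<html>Buy now</html>"}
plain = {"from": "friend@example.org", "body": "See you at noon"}

print(route(list_mail, filters))   # Lists
print(route(other_html, filters))  # Possibly spam
print(route(plain, filters))       # Inbox
```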
9 May 2004, 01:17 PM | #14
Intergalactic Postmaster
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606
Representative of:
Runbox.com
Hi Tore,
OK, you win. Maybe C-R isn't as good as it sounded. Rich
9 May 2004, 01:33 PM | #15
Intergalactic Postmaster
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606
Representative of:
Runbox.com
Quote:
SpamAssassin has been missing more and more spam lately, but I can see why it's being missed. For me, many of these are very small text-only messages with the spam message buried in the middle of random words or full paragraphs of unrelated text. It's hard for a program to flag these as spam, and they apparently haven't been reported to the traditional blacklists (at least SpamAssassin didn't show that in the scoring).
Rich
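The padding Rich describes works against a Bayes filter because ham-like filler words pull the combined score back down. The sketch below uses hypothetical per-token spam probabilities and a plain naive Bayes log-odds sum. SpamAssassin uses a different, chi-squared-based combining method that is meant to resist this better, so the effect here is exaggerated.

```python
import math

# Hypothetical spam probabilities a trained filter might assign to each token.
TOKEN_SPAM_PROB = {
    "order": 0.90, "vicodin": 0.99,
    "freshman": 0.20, "tossed": 0.20, "frog": 0.20, "pond": 0.20, "joke": 0.20,
}


def spam_probability(words: list[str]) -> float:
    # Naive Bayes: add up the log-odds of every known token, ignore unknown ones.
    log_odds = 0.0
    for word in words:
        p = TOKEN_SPAM_PROB.get(word)
        if p is not None:
            log_odds += math.log(p / (1 - p))
    return 1 / (1 + math.exp(-log_odds))


short_spam = ["order", "vicodin"]
padded_spam = short_spam + ["freshman", "tossed", "frog", "pond", "joke"]

print(round(spam_probability(short_spam), 3))   # about 0.999
print(round(spam_probability(padded_spam), 3))  # about 0.465, now below 0.5
```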