SpamAssassin autolearn=ham?
What does it mean if the SpamAssassin header says "autolearn=ham"? I noticed the SpamAssassin message headers of a spam message that slipped through SpamAssassin said "autolearn=ham".
Regards, Rich |
Hi Rich,
Just googled it. Here is what I got from this page: http://spamassassin.rediris.es/doc/s...20for%20ham%20
DEFAULT TAGGING FOR HAM (NON-SPAM) MAILS
X-Spam-Status: header
A string, No, hits=nn required=nn tests=xxx,xxx autolearn=(ham|spam|no) is set in this header to reflect the filter status.
Looks like Ham is the opposite of Spam. Does that answer your question? BKB. |
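For anyone who wants to pull those fields out of a message programmatically, here is a minimal Python sketch. The function name parse_spam_status and the returned dictionary keys are made up for illustration, and the sample value is the X-Spam-Status line Rich posts later in this thread. It only assumes the "verdict, key=value ..." layout described above.

```python
import re

def parse_spam_status(value):
    """Split an X-Spam-Status header value into its verdict and key=value fields."""
    # Undo header folding: continuation lines begin with whitespace.
    value = re.sub(r"\r?\n[ \t]+", " ", value.strip())
    verdict, _, rest = value.partition(",")
    # A folded tests list can pick up spaces after its commas; close them up.
    rest = re.sub(r",\s+", ",", rest)
    fields = dict(re.findall(r"(\w+)=(\S+)", rest))
    # SpamAssassin 2.x writes "hits=", later releases write "score=".
    score = fields.get("hits", fields.get("score"))
    return {
        "is_spam": verdict.strip().lower() == "yes",
        "score": float(score) if score is not None else None,
        "required": float(fields["required"]) if "required" in fields else None,
        "tests": [t for t in fields.get("tests", "").split(",") if t],
        "autolearn": fields.get("autolearn"),
    }

status = parse_spam_status(
    "No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham version=2.60"
)
print(status["autolearn"], status["tests"])
```

The autolearn field is independent of the Yes/No verdict, which is why a message can be "No" (not tagged) and still carry autolearn=ham.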
Thanks. I guess I'm more curious about why it was "ham" and not "no" like it usually is?
Rich |
I vaguely recall that Ham is the email you explicitly designate as being good, wanted mail. It is the opposite of spam, but more specifically it's the content you use to train a Bayesian filter to recognize the mail that you want.
I wonder if the SpamAssassin filter is saying that that particular message is not just "Not Spam" but even above and beyond that it closely matches a message which it's been explicitly told is a good message. Would that make sense? Was this a message that someone somewhere could have actually wanted? Or is it potentially a sign of someone trying to trick SA into thinking this was a good message? --Jason |
The message was SPAM that was not flagged as SPAM by SpamAssassin. What I don't understand is why "autolearn" was set to "ham". As far as I know Runbox isn't using the "autolearn" function yet. That's why it should have said "autolearn=no".
Rich |
Jason, you were right about the autolearn. I found some details at the SpamAssassin Wiki:
Why isn't autolearning working for me? (aka: "autolearn=no") If it says "autolearn=no" then it means that SpamAssassin has not learned whether or not the message is spam or ham. If it says "autolearn=ham", then SpamAssassin has been trained to recognize the message as "ham". If it says "autolearn=spam", then SpamAssassin has been trained to recognize the message as "spam".
I have a message that is 100% SPAM, yet SpamAssassin has tagged it with "autolearn=ham", which means it's been trained to recognize the message as "ham". My question for the Runbox crew is: when did you start training SpamAssassin, who's training it, and how do you "retrain" it when it's been trained wrong? Regards, Rich |
The database is system-wide, and kept in memory only (it will have to be rebuilt if the system is rebooted). Today there is no way for you to "correct" any mail learned wrong or explicitly learn any mail that had autolearn=no. Of course, this classifier is in no way as effective as an actively maintained per-user Bayesian filter would be. However, it was deemed to be better than nothing when we started using SpamAssassin, even though incorrect learnings such as the one you experienced may occur sometimes. Look for the BAYES_* tests in the spam report to see the effect the classifier has on scoring your incoming mails. |
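To make the "incorrect learnings" point concrete, here is a rough Python sketch of how an automatic learn decision of this kind can be made. It is an assumption-laden illustration, not SpamAssassin's code: the two thresholds are the defaults documented for the bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam settings in later releases (the 2.60 install discussed here may use other values), the exact comparison operators are a guess, and the extra conditions SpamAssassin applies before learning a message as spam are left out. The function name autolearn_decision is hypothetical.

```python
# Assumed thresholds (documented defaults in later SpamAssassin releases;
# a given installation may be configured differently).
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0

def autolearn_decision(rule_scores):
    """Return 'ham', 'spam' or 'no' for a dict of {rule name: score}."""
    # The Bayes rules are left out of the sum so that the classifier
    # is not trained on its own earlier verdicts.
    score = sum(
        points for rule, points in rule_scores.items()
        if not rule.startswith("BAYES_")
    )
    if score < NONSPAM_THRESHOLD:
        return "ham"
    if score >= SPAM_THRESHOLD:
        return "spam"
    return "no"

# A message that trips no rule other than BAYES_00 has a non-Bayes score of 0,
# which falls below the non-spam threshold.
print(autolearn_decision({"BAYES_00": -4.9}))
```

Under these assumptions, a spam that manages to avoid every non-Bayes rule is learned as ham, which is one plausible way a wrong learning like the one described above can happen.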
We are currently working to expand the spam filter with individual "intelligent" filtering (more specifically CRM-114), allowing a user to tell the filter when it has classified a message erroneously - thus teaching it to catch spam more accurately.
We're hoping to launch this within a couple of weeks. - Geir |
Hi Tore,
If a message has been learned incorrectly, what do I need to do to fix it?
X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on gaspode.runbox.com
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham version=2.60
X-Spam-Level:
The message is for Vicodin (line reads "0,rder V~ic0-din 0nline Anytime"). It's all text (no HTML). About 50% of it is unrelated (part random words, part a joke about a freshman being tossed into a frog pond). I guess the spammers are figuring out how to get past the Bayesian filters. Rich |
Hi Geir,
The ones that are valid that I don't know usually have other information in the headers I can key off of with filters. I also like the "challenge message" ideas other services use. Rich |
Of course - *I* could take the message in question and feed it to sa-learn on all the machines, but once that is necessary to make the Bayesian classifier work it's better to just disable the whole thing. |
A C-R system places the burden on the alleged sender. But as there is absolutely *no* way today to verify the authenticity of an e-mail's sender, a C-R system will direct its challenges to unrelated third parties - thus making them unsolicited, or in other words: spam. It's a very, very asocial, arrogant, and selfish way to deal with one's spam problems, IMNSHO. That said - C-R *is* very effective at preventing spam. However, it'll also prevent legit e-mail, as I, and many others, refuse to reply to such challenges on principle alone. If you want to keep a whitelist of senders, fine, but don't expect me (and every other e-mail-using individual on the planet) to maintain *your* whitelist *for* you. Also see Karsten M. Self's thoughts on the subject, most of which I very much agree with. (It is of course not my decision whether or not Runbox should implement such a feature.) |
SpamAssassin is failing to identify a lot of SPAM for me. And I'm having problems with setting up filters. One of them has the "Html tags" thing, and when it follows that filter, sometimes Runbox list emails end up in a "possibly spam" folder I have set up.
How do the -# filters work? Do those get looked at first? |
Hi Tore,
OK, you win. Maybe C-R isn't as good as it sounded. Rich |
SpamAssassin has been missing more and more spam lately. But I can see why it is being missed. Many of the messages I get are very small, text-only messages with the spam content buried in the middle of random words or full paragraphs of unrelated text. It's hard for a program to flag these as spam, and they apparently haven't been reported to the traditional blacklists (at least SpamAssassin didn't show that in the scoring). Rich |
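The effect of that padding can be shown with a toy Python example. This is a sketch of the simple naive-Bayes style combination of per-token probabilities, not the combining method SpamAssassin itself uses, and every per-token probability below is invented for illustration. The function name combined_spam_probability is hypothetical.

```python
from math import prod

def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities into one overall probability."""
    spam_evidence = prod(token_probs)
    ham_evidence = prod(1.0 - p for p in token_probs)
    return spam_evidence / (spam_evidence + ham_evidence)

# Invented values: two tokens that look very spammy on their own.
spam_tokens = [0.99, 0.95]
# Invented values: six unrelated words that mostly appear in wanted mail.
padding_tokens = [0.2] * 6

plain = combined_spam_probability(spam_tokens)
padded = combined_spam_probability(spam_tokens + padding_tokens)
print(f"without padding: {plain:.3f}, with padding: {padded:.3f}")
```

With these made-up numbers the padded message scores far lower than the bare one, which matches what the random words and the unrelated joke seem to be doing to the messages described above.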