EmailDiscussions.com  

Old 4 May 2004, 06:37 AM   #1
carverrn
Intergalactic Postmaster
 
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606

Representative of:
Runbox.com
SpamAssassin autolearn=ham?

What does it mean if the SpamAssassin header says "autolearn=ham"? I noticed that the SpamAssassin headers of a spam message that slipped through the filter said "autolearn=ham".

Regards,
Rich

Old 4 May 2004, 07:10 AM   #2
BKB
Cornerstone of the Community
 
Join Date: Jun 2003
Location: New York, NY
Posts: 900
Hi Rich,
Just googled it. Here is what I got from this page:

http://spamassassin.rediris.es/doc/s...20for%20ham%20

DEFAULT TAGGING FOR HAM (NON-SPAM) MAILS
X-Spam-Status: header
A string, No, hits=nn required=nn tests=xxx,xxx autolearn=(ham|spam|no) is set in this header to reflect the filter status.
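For anyone who wants to pull the fields out of that header programmatically, here is a minimal sketch, assuming the value follows the layout quoted above. The function name `parse_spam_status` and the returned dictionary keys are made up for illustration and are not part of SpamAssassin; the sample value is the one quoted later in this thread.

```python
import re

def parse_spam_status(value):
    """Split an X-Spam-Status value into its verdict and key=value fields."""
    # Unfold continuation lines and close up "A, B" so folded test lists survive.
    flat = re.sub(r",\s+", ",", " ".join(value.split()))
    verdict, _, rest = flat.partition(",")
    fields = dict(re.findall(r"(\w+)=(\S+)", rest))
    # This thread's header uses "hits="; later releases label the same field "score=".
    score = fields.get("hits", fields.get("score"))
    return {
        "is_spam": verdict.strip().lower() == "yes",
        "score": float(score),
        "required": float(fields["required"]),
        "tests": fields.get("tests", "").split(","),
        "autolearn": fields.get("autolearn"),
    }

status = parse_spam_status(
    "No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham\n version=2.60"
)
print(status["autolearn"], status["score"], status["tests"])
```

The `autolearn` field is the one this thread is about; the other keys are only there to make the report easier to inspect.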

Looks like Ham is the opposite of Spam. Does that answer your question?

BKB.
Old 4 May 2004, 08:41 AM   #3
carverrn
Intergalactic Postmaster
 
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606

Representative of:
Runbox.com
Thanks. I guess I'm more curious about why it was "ham" and not "no" like it usually is?

Rich
Old 4 May 2004, 01:04 PM   #4
jbs
Essential Contributor
 
Join Date: Oct 2003
Posts: 455
I vaguely recall that Ham is the email you explicitly designate as being good, wanted mail. It is the opposite of spam, but more specifically it's the content you use to train a Bayesian filter to recognize the mail that you want.

I wonder if the SpamAssassin filter is saying that that particular message is not just "Not Spam" but even above and beyond that it closely matches a message which it's been explicitly told is a good message.

Would that make sense? Was this a message that someone somewhere could have actually wanted? Or is it potentially a sign of someone trying to trick SA into thinking this was a good message?

--Jason
Old 4 May 2004, 03:41 PM   #5
carverrn
Intergalactic Postmaster
 
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606

Representative of:
Runbox.com
The message was SPAM that was not flagged as SPAM by SpamAssassin. What I don't understand is why "autolearn" was set to "ham". As far as I know Runbox isn't using the "autolearn" function yet. That's why it should have said "autolearn=no".

Rich
Old 5 May 2004, 09:48 PM   #6
carverrn
Intergalactic Postmaster
 
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606

Representative of:
Runbox.com
Jason, you were right about the autolearn. I found some details at the SpamAssassin Wiki:

Why isn't autolearning working for me? (aka: "autolearn=no")

If it says "autolearn=no" then it means that SpamAssassin has not learned whether or not the message is spam or ham. If it says "autolearn=ham", then SpamAssassin has been trained to recognize the message as "ham". If it says "autolearn=spam", then SpamAssassin has been trained to recognize the message as "spam".

I have a message that is 100% SPAM yet SpamAssassin has tagged it with "autolearn=ham" which means it's been trained to recognize the message as "ham".

My question for the Runbox crew is when did you start training SpamAssassin, who's training it and how do you "retrain" it when it's been trained wrong?

Regards,
Rich
Old 7 May 2004, 06:30 PM   #7
tore
Junior Member
 
Join Date: Jan 2004
Posts: 22
Quote:
Originally posted by carverrn

I have a message that is 100% SPAM yet SpamAssassin has tagged it with "autolearn=ham" which means it's been trained to recognize the message as "ham".

My question for the Runbox crew is when did you start training SpamAssassin, who's training it and how do you "retrain" it when it's been trained wrong?
SpamAssassin is training SpamAssassin. If the mail gets a very low score, it is learned as ham, and if it gets a very high score it is learned as spam. Else, the message is not learned at all. The score inflicted by the built-in Bayesian classifier is ignored when deciding whether or not the mail should be learned.
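To make that rule concrete, here is a simplified sketch of the decision, not SpamAssassin's actual code. The two threshold values are assumptions modelled on the defaults documented for the `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam` settings in later SpamAssassin releases; what Runbox configured is not stated in this thread. The function name and the non-Bayes rule names in the usage lines are hypothetical.

```python
# Assumed thresholds for illustration only.
HAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0

def autolearn_decision(rule_scores):
    """Return 'ham', 'spam' or 'no' for a dict of {test_name: score}."""
    # Leave out the BAYES_* tests so the classifier is not trained on its own verdict.
    score = sum(
        points for name, points in rule_scores.items()
        if not name.startswith("BAYES_")
    )
    if score < HAM_THRESHOLD:
        return "ham"
    if score >= SPAM_THRESHOLD:
        return "spam"
    return "no"

# A message that triggers nothing except a Bayes test scores 0 for learning purposes.
print(autolearn_decision({"BAYES_00": -4.9}))
print(autolearn_decision({"HYPOTHETICAL_RULE_A": 9.0, "HYPOTHETICAL_RULE_B": 4.5}))
print(autolearn_decision({"HYPOTHETICAL_RULE_A": 3.0}))
```

Under these assumptions, a spam that manages to trigger no ordinary rules at all ends up with a learning score of zero and is learned as ham, which is one way the mislearning described in this thread can happen.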

The database is system-wide, and kept in memory only (it will have to be re-built if the system is rebooted). Today there is no way for you to "correct" any mail learned wrong or explicitly learn any mail that had autolearn=no.

Of course, this classifier is in no way as effective as an actively maintained per-user Bayesian filter would be. However, it was deemed to be better than nothing when we started using SpamAssassin, even though incorrect learnings such as the one you experienced may occur sometimes. Look for the BAYES_* tests in the spam report to see the effect the classifier has on scoring your incoming mails.
Old 7 May 2004, 09:52 PM   #8
Geir
The "e" in e-mail
 
Join Date: Sep 2001
Location: Oslo, Norway
Posts: 2,938

Representative of:
Runbox.com
We are currently working to expand the spam filter with individual "intelligent" filtering (more specifically CRM-114), allowing a user to tell the filter when it has classified a message erroneously - thus teaching it to catch spam more accurately.

We're hoping to launch this within a couple of weeks.

- Geir
Old 7 May 2004, 10:46 PM   #9
carverrn
Intergalactic Postmaster
 
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606

Representative of:
Runbox.com
Hi Tore,

Quote:
SpamAssassin is training SpamAssassin. If the mail gets a very low score, it is learned as ham, and if it gets a very high score it is learned as spam. Else, the message is not learned at all.
I kind of figured it was on auto-pilot.

Quote:
Today there is no way for you to "correct" any mail learned wrong or explicitly learn any mail that had autolearn=no.
What about the sa-learn tool mentioned in this SpamAssassin Wiki:

If a message has been learned incorrectly, what do I need to do to fix it?

Quote:
Look for the BAYES_* tests in the spam report to see the effect the classifier has on scoring your incoming mails.
Here's the SA information from the message:
X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on
    gaspode.runbox.com
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham
    version=2.60
X-Spam-Level:

The message is for Vicodin (line reads "0,rder V~ic0-din 0nline Anytime"). It's all text (no HTML). About 50% of it is unrelated (part random words, part a joke about a freshman being tossed into a frog pond).

I guess the spammers are figuring out how to get past the Bayesian filters.
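A rough illustration of why padding a message with unrelated text can work: the sketch below uses the plain naive Bayes combination of per-token spam probabilities. SpamAssassin's own combining method may well differ, and every probability here is invented for the example, so treat it as a toy model of the effect rather than a description of the real filter.

```python
from math import prod

def combined_spam_probability(token_probs):
    """Naive Bayes combination of per-token spam probabilities."""
    spammy = prod(token_probs)
    hammy = prod(1.0 - p for p in token_probs)
    return spammy / (spammy + hammy)

# Hypothetical values: two tokens from the sales pitch that look spammy...
pitch = [0.9, 0.8]
# ...and six everyday words from the unrelated padding that look like ham.
padding = [0.2] * 6

without_padding = combined_spam_probability(pitch)
with_padding = combined_spam_probability(pitch + padding)
print(round(without_padding, 3), round(with_padding, 3))
```

With these made-up numbers the pitch alone scores close to 1, while the padded message scores close to 0, because the many ham-leaning tokens outweigh the few spam-leaning ones.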

Rich
Old 7 May 2004, 11:04 PM   #10
carverrn
Intergalactic Postmaster
 
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606

Representative of:
Runbox.com
Hi Geir,

Quote:
Originally posted by Geir
We are currently working to expand the spam filter with individual "intelligent" filtering (more specifically CRM-114), allowing a user to tell the filter when it has classified a message erroneously - thus teaching it to catch spam more accurately.
I don't know about others but an improved Whitelist would probably work better for me. I would say that 99% of my valid mail comes from people/addresses I know.

The ones that are valid that I don't know usually have other information in the headers I can key off of with filters.

I also like the "challenge message" ideas other services use.

Rich
Old 9 May 2004, 02:13 AM   #11
tore
Junior Member
 
Join Date: Jan 2004
Posts: 22
Quote:
Originally posted by carverrn

What about the sa-learn tool mentioned in this SpamAssassin Wiki:
We're well aware of that too. However, it's not that easy. First off, the web interface doesn't have any functionality to re-learn these messages today. Secondly, the "auto-pilot" Bayesian databases aren't shared between the MXes - there would need to be a system to distribute the sa-learn invocation to all the MXes, too.

Of course - *I* could take the message in question and feed it to sa-learn on all the machines, but once that is necessary to make the Bayesian classifier work it's better to just disable the whole thing.

Quote:

I guess the spammers are figuring out how to get past the Bayesian filters.
Yes. That's why the auto-pilot classifier isn't that efficient any longer and might well be disabled in the near future. The per-user bayesian databases you can correct in case of error should be much more efficient than the automatic one (IFF you do take care to correct it whenever it's made a mistake). That feature is on its way, I think.
Old 9 May 2004, 02:36 AM   #12
tore
Junior Member
 
Join Date: Jan 2004
Posts: 22
Quote:
Originally posted by carverrn


I also like the "challenge message" ideas other services use.
For what it's worth, I detest these. They place the burden of maintaining your white list on everyone *but* you. Have you ever received a stupid, unsolicited junk bounce from some mail system that is "kindly" informing you that you sent some user a virus, even if you're 110% certain you've done no such thing? Well - if C-R systems become as common as moronic virus filters, expect that amount of junk mail to double. Same goes for spam sent from forged addresses.

A C-R system places the burden on the alleged sender. But as there is absolutely *no* way today to verify the authenticity of an e-mail's sender, a C-R system will direct its challenges to unrelated third parties - thus making them unsolicited, or in other words: spam. It's a very, very asocial, arrogant, and selfish way to deal with one's spam problems, IMNSHO.

That said - C-R *is* very effective at preventing spam. However, it'll also prevent legit e-mail, as I, and many others, refuse to reply to such challenges on principle alone. If you want to keep a white-list of senders, fine, but don't expect me (and every other e-mail-using individual on the planet) to maintain *your* whitelist *for* you.

Also see Karsten M. Self's thoughts on the subject, most of which I very much agree with.

(It is of course not my decision whether or not Runbox should implement such a feature.)
Old 9 May 2004, 07:33 AM   #13
jedilizagain
Essential Contributor
 
Join Date: Apr 2001
Posts: 315
SpamAssassin is failing to identify a lot of SPAM for me. And I'm having problems with setting up filters. One of them has the "Html tags" thing and then when it follows that filter, sometimes Runbox list emails end up in a "possibly spam" folder I have set up.

How do the -# filters work? Do those get looked at first?
Old 9 May 2004, 01:17 PM   #14
carverrn
Intergalactic Postmaster
 
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606

Representative of:
Runbox.com
Hi Tore,

OK, you win. Maybe C-R isn't as good as it sounded.

Rich
Old 9 May 2004, 01:33 PM   #15
carverrn
Intergalactic Postmaster
 
Join Date: Jan 2002
Location: Chicago, IL
Posts: 5,606

Representative of:
Runbox.com
Quote:
Originally posted by jedilizagain
SpamAssassin is failing to identify a lot of SPAM for me. And I'm having problems with setting up filters. One of them has the "Html tags" thing and then when it follows that filter, sometimes Runbox list emails end up in a "possibly spam" folder I have set up.

How do the -# filters work? Do those get looked at first?
Runbox filter order values go from -99 to 999, with -99 as the highest/first filters and 999 as the lowest/last filters. If you have multiple filters with the same order value, they are processed in the order the entries were defined. Basically, just look at your filter list and go from the top to the bottom.
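If it helps to see that ordering spelled out, here is a small sketch of the rule as described above. It is not Runbox's code; the filter names are invented, and each filter is represented simply as an (order value, name) pair listed in the sequence it was defined.

```python
def processing_order(filters):
    """Return filters in evaluation order: lowest order value first."""
    # sorted() is stable, so filters sharing an order value keep their definition order.
    return sorted(filters, key=lambda entry: entry[0])

# Hypothetical filter list, in the order the entries were created.
filters = [
    (10, "mailing lists"),
    (-99, "whitelist"),
    (999, "possibly spam"),
    (10, "html tags"),
]

for order_value, name in processing_order(filters):
    print(order_value, name)
```

So a filter given a negative order value such as -99 is looked at before everything else, and the two filters sharing the value 10 run in the order they were defined.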

SpamAssassin has been missing more and more spams lately. But I can see why they are being missed. Many of them, for me, are very small text-only messages with the spam content buried in the middle of random words or full paragraphs of unrelated text. It's hard for a program to flag these as spam, and they apparently haven't been reported to the traditional blacklists (at least SpamAssassin didn't show that in the scoring).

Rich
All times are GMT +9.

 

Copyright EmailDiscussions.com 1998-2022. All Rights Reserved. Privacy Policy