Don't fiddle with your Spam settings - Page 5

JeremyNicoll · 18 Jun 2023, 09:20 PM

Quote:

Originally Posted by xyzzy

That's one of the reasons I said "a lot"!

The other is that if I have to revisit my code in the future the comments help me remember what I was thinking when I originally wrote it.

Absolutely. I'm retired now and have noticed that I put more comments in my code than I did when I coded every day (and when anyone else reading it was also technical). Partly it's also because illness has affected my ability to concentrate, so I can't rely on seeing what once seemed obvious; it needs to be pointed out in the comments. And even if I'm writing code today, if I hit a bad patch it could be weeks before I next tackle the same code.

Quote:

Originally Posted by xyzzy

I think your 9999 idea is a good one. In fact 100.0 is sufficient since spam scores never exceed 99.9.

They're never MEANT to, certainly. But suppose for some reason the decimal point got omitted when the X-Spam-score header is generated... (Yes, I know, one can't code for every eventuality.)
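A minimal Sieve sketch of the kind of defensive check JeremyNicoll has in mind: it flags a message whose X-Spam-score doesn't look like a one-decimal number. The expected format (optional minus sign, one or two digits, a point, one digit) and the header name X-JN-Debug are assumptions for illustration, not anything Fastmail documents.

```sieve
require ["regex", "editheader"];

# Assumed well-formed score: "4.2", "12.7" or, if negative scores occur, "-0.5".
# A score such as "42" (decimal point lost) fails the regex and gets flagged.
if allof(exists "X-Spam-score",
         not header :regex "X-Spam-score" "^-?[0-9]{1,2}\\.[0-9]$") {
  addheader "X-JN-Debug" "Unexpected X-Spam-score format";  # hypothetical header name
}
```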

Quote:

Originally Posted by xyzzy

I am not sure I want to exhaustively check all values; it clutters up the code. Some multi-valued settings insist on specific switch values, but if there is a default I might not insist on the documented value.

I don't care what you do in code for your own use. It's when others use it without a clear understanding that the problems start. Long ago I maintained an open-source product that was distributed as source code, and some users who knew nothing about programming (or not enough) would change it and then complain that MY code did not work. I found I could waste a lot of time diagnosing those problems.

Quote:

Originally Posted by xyzzy

I agree with you about the error message header, but Sieve doesn't leave a lot of degrees of freedom here. If I set the spam and discard thresholds to 100, that should let most messages through (the ones whose handling doesn't depend on the spam score are filtered normally anyway).

Whether it's all bypassed or done the way I suggest, there's still the problem of how to report the error. Short of adding a header, and thinking out loud, I guess I could add a prefix to the subject, just like the spam score. That would show up on the subject line. Not sure that's very good either.
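As a sketch of the subject-prefix idea, assuming a hypothetical CONFIG_ERROR variable that the threshold-validation code would set to "1" when it hits a mis-formatted value (it is forced to "1" here only so the example does something):

```sieve
require ["variables", "editheader"];

set "CONFIG_ERROR" "1";   # hypothetical flag; normally set by the validation code

# Capture the current Subject with :matches, then replace it with a tagged copy
# (delete the old header, add the new one).
if allof(string :is "${CONFIG_ERROR}" "1",
         header :matches "Subject" "*") {
  set "subject" "${1}";
  deleteheader "Subject";
  addheader "Subject" "{SIEVE CONFIG ERROR} ${subject}";
}
```

Only the :matches test sets ${1}; the string :is test in the same allof leaves it alone.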

I think it's an omission from Sieve that there is apparently no way to tell a user that the Sieve script had a fatal error. I think the relevant RFC does mention some sort of (in)action if the script doesn't compile or a "require" isn't met, but I've never seen any documentation that explains what might or should happen then, let alone what a specific implementation actually does. Maybe the server admins get an error message?

Something Sieve could have done, perhaps optionally, is suspend processing of all incoming mail and generate and pass on a single mail describing the problem.

If I were using addheader for debug messages etc., I'd use a header name like "X-JN-Debug", or something more specific (eg X-JN-Debug-CodeSection). If the user is viewing emails in a versatile client, they might be able to configure it to show (and perhaps colour) any "X-JN-" headers in the mail's viewer window (some clients allow a choice, so it's not just Subject, From etc). Or one can colour certain mails in a folder-view window. But when one's using a webmail system that doesn't offer that level of control, it's trickier to find a good solution.
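A short sketch of the private-namespace idea: every debug header starts with "X-JN-", so a single client rule can pick them all out. The suffixes used here (CodeSection, SpamScore) and their values are illustrative assumptions.

```sieve
require ["editheader", "variables"];

# Mark which part of the script handled this message.
addheader "X-JN-Debug-CodeSection" "spam-score-checks";

# Record the score this section saw (${1} is set by the :matches test).
if header :matches "X-Spam-score" "*" {
  addheader "X-JN-Debug-SpamScore" "${1}";
}
```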

qwertz123456 · 18 Jun 2023, 11:06 PM

@xyzzy Thanks so much for clarifying and providing additional code and comments.

I have to admit I still struggle a bit to understand which sections have to be changed.

So I'll first post my current UI settings, and for each piece of code I'll comment on whether and what I changed to basically mimic the spam settings/thresholds I have already set in the UI. Maybe you or others can comment on whether that is correct or whether I need to set an additional number somewhere.

Here are my current UI settings.


Here I changed SPAM_THRESHOLD to 4.2 and set the discard folder (the fileinto target) to INBOX.Spam, as I use this default FM folder as a "review potential spam mails" folder, and it is set to do no spam or non-spam learning of any kind.

Once I have reviewed these emails, I move spam emails to a folder called "Verified Spam", which is set to Spam learning in the FM settings. That way I control FM so that only verified spam emails get trained as such. After a few days I delete these mails; by then FM has gone over them and scanned them to train its filters.

Code:

set "SPAM_THRESHOLD"      "4.2";            # SPAM_DISCARD > spam score >= SPAM_THRESHOLD
set "SPAM_FLAGS"          "";               # fileinto "\\Junk" flags
                                            # \\Seen ==> read, \\Flagged ==> pinned
                                            # "\\Seen" | "\\Flagged" | "\\Seen \\Flagged" | ""
                                            # this spam is always filed in \\Junk (same as UI)
                                            # DISABLE UI's PROTECTION LEVEL's "MOVE"

set "SPAM_DISCARD"        "5.0";            # spam score >= SPAM_DISCARD
set "SPAM_DISCARD_FOLDER" "INBOX.Spam";     # folder for msgs in that range (null ==> discard)
set "SPAM_DISCARD_FLAGS"  "";               # fileinto SPAM_DISCARD_FOLDER flags
                                            # set SPAM_DISCARD to 100.0 to disable this test 
                                            # DISABLE PROTECTION LEVEL's "PERMANENTLY DELETE"

I didn't change anything in this part of the code.

Code:

set "ADD_SCORE" "1";                        # 0 ==> do NOT prefix ANY spam subject lines
                                            # 1 ==> prefix spam that's NOT filed into Spam
                                            # 2 ==> prefix ALL SPAM with "{SPAM xx.x} "

if string :matches "${SPAM_THRESHOLD}" "*.*" {
  set "ST1" "${1}";
  set "ST2" "${2}-9";
} else {            # should never happen if SPAM_THRESHOLD is properly formatted as "x.y"
  set "ST1" "0";    # set so every X-Spam-score will always be greater than this
  set "ST2" "0-0";
  addheader "Debug" "????????? Mis-formatted SPAM_THRESHOLD value! ?????????";
}

if string :matches "${SPAM_DISCARD}" "*.*" {
  set "SD1" "${1}";
  set "SD2" "${2}-9";
} else {            # should never happen if SPAM_DISCARD is properly formatted as "x.y"
  set "SD1" "0";    # set so every X-Spam-score will always be greater than this
  set "SD2" "0-0";
  addheader "Debug" "????????? Mis-formatted SPAM_DISCARD value! ?????????";
}

/* Example uses:

  anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${ST1}",
        header :regex "X-Spam-score" "^${ST1}\\.[${ST2}]$") # X-Spam-score >= SPAM_THRESHOLD

  anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${SD1}",
        header :regex "X-Spam-score" "^${SD1}\\.[${SD2}]$") # X-Spam-score >= SPAM_DISCARD
*/

I didn't change anything in this part of the code.

Code:

if allof(not string :is "${ADD_SCORE}" "0",          # spam score prefix is to be added
         string :is "${known_sender}" "false",       # but only for unknown senders
         anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${ST1}",
               header :regex "X-Spam-Score" "^${ST1}\\.[${ST2}]$"),
         header :matches "Subject" "*") {
  set "subject" "${1}";                              # X-Spam-score >= SPAM_THRESHOLD
  if header :matches "X-Spam-score" "*" {            # prefix Subject with "{SPAM xx.x} "
    set "spam_score" "${1}";
    if header :value "lt" :comparator "i;ascii-numeric" "X-Spam-score" "10" {
      set "spam_score" "0${1}";                      # add leading 0 for spam scores < 10
    }
    deleteheader "Subject";
    addheader "Subject" "{SPAM ${spam_score}} ${subject}";# prefix Subject with "{SPAM xx.x} "
  }
} # ADD_SCORE != 0 && spam from an unknown sender

Did I get everything correct?

xyzzy · 19 Jun 2023, 12:05 PM

Quote:

Originally Posted by qwertz123456

Here I changed SPAM_THRESHOLD to 4.2 and set the discard folder (the fileinto target) to INBOX.Spam, as I use this default FM folder as a "review potential spam mails" folder, and it is set to do no spam or non-spam learning of any kind.

FWIW, be aware that msgs in Spam are not considered spam until deleted (that's documented by FM). I'm always confused about whether moving a msg out of Spam (as opposed to clicking "not spam") marks the msg as spam, the way a delete does.

Quote:

I didn't change anything in this part of the code.

I did!

After the discussion with JeremyNicoll, I've now made the change to the setting of ST1/ST2 that I was thinking of in post #58:

Code:

if string :regex "${SPAM_THRESHOLD}" "^([0-9]{1,2})\\.([0-9])$" {
  set "ST1" "${1}";     # 0.0 <= SPAM_THRESHOLD <= 99.9 (valid SpamAssassin values)
  set "ST2" "${2}-9";
} else {                # incorrect SPAM_THRESHOLD setting (not expected to ever happen)
  set "ST1" "100";      # set so that no msg is ever considered spam
  set "ST2" "0-0";      # note, at the moment there isn't a good way to report this error
  addheader "Debug" "????????? Mis-formatted SPAM_THRESHOLD value! ?????????";
}

if string :regex "${SPAM_DISCARD}" "^([0-9]{1,3})\\.([0-9])$" {
  set "SD1" "${1}";     # 0.0 <= SPAM_DISCARD <= 999.9 (>=100 means never discard)
  set "SD2" "${2}-9";
} else {                # incorrect SPAM_DISCARD setting (not expected to ever happen)
  set "SD1" "100";      # set so that no msg is ever considered for discard
  set "SD2" "0-0";      # note, at the moment there isn't a good way to report this error
  addheader "Debug" "????????? Mis-formatted SPAM_DISCARD value! ?????????";
}

In the original post the changes were shown in red (I didn't color the comments, which were also changed).

Basically it now checks that the syntax and ranges are valid and, if they aren't, recovers by not treating any msg as spam or as a discard candidate based on its score.
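To make the ST1/ST2 trick concrete, here is a worked example with the 4.2 threshold from earlier in the thread. It is a self-contained sketch; the X-Debug-Spam header it adds is only there to give the test a visible effect.

```sieve
require ["variables", "regex", "relational", "comparator-i;ascii-numeric",
         "editheader"];

set "SPAM_THRESHOLD" "4.2";

# Split "4.2" into its integer part and a digit range for the tenths.
if string :regex "${SPAM_THRESHOLD}" "^([0-9]{1,2})\\.([0-9])$" {
  set "ST1" "${1}";       # "4"
  set "ST2" "${2}-9";     # "2-9", used below as the bracket expression [2-9]
}

# i;ascii-numeric compares only the leading digits, so "gt" "4" matches 5.0 and up,
# while the regex matches 4.2 through 4.9. Together: X-Spam-score >= 4.2.
if anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${ST1}",
         header :regex "X-Spam-score" "^${ST1}\\.[${ST2}]$") {
  addheader "X-Debug-Spam" "score >= ${SPAM_THRESHOLD}";
}
```

One caveat worth checking: RFC 4790 defines i;ascii-numeric so that a value that does not start with a digit counts as positive infinity. If X-Spam-score can ever be negative (for example "-0.5"), the "gt" half of the test would treat it as above any threshold.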

It appears the rest is ok, but your spam and discard values bother me a little. You are considering all msgs <4.2 as not spam, msgs 4.2 to 4.9 as spam to go into the Spam folder, and >=5.0 as discard candidates that also go into the Spam folder. If both "spam" and "discards" go into the Spam folder, why not just set the discard value to 100 to disable it? Then all msgs >=4.2 go into the Spam folder. Both methods work, I guess, but I'm just pointing out an alternative.
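For comparison, that alternative amounts to a settings change only. A sketch reusing the variable names from the thread, where the 100.0 value relies on spam scores never exceeding 99.9:

```sieve
require ["variables"];

# No separate discard range: every msg with X-Spam-score >= 4.2 goes to the
# Spam folder (\\Junk) via the SPAM_THRESHOLD branch.
set "SPAM_THRESHOLD"      "4.2";
set "SPAM_DISCARD"        "100.0";        # >= 100 ==> discard test never matches
set "SPAM_DISCARD_FOLDER" "INBOX.Spam";   # kept, but no longer reached
set "SPAM_DISCARD_FLAGS"  "";
```

With "100.0", the validation code sets SD1 to "100", so neither the "gt" comparison nor the regex can match a score of 99.9 or less.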

And as a reminder, make sure the UI's Protection Level "Move" and "Permanently delete" settings are turned off; Protection Level itself must, however, be set to Custom.

Update:
That last sentence reminds me of something I forgot. How do you use these values?

With the UI's Protection Level "Move" and "Permanently delete" settings off, no code will be generated in Sieve section 3 (Sieve generated for spam protection) to do the spam score checks (backscatter, though, can still be specified in the UI). This is what we want, because the replacement code that uses "our" values is to be placed in the Sieve edit section immediately following Section 3.

Here's what the tests should look like:

Code:

if not exists ["X-Spam-hits", "X-Spam-score"] {      # Spam Protection must be enabled
  addheader "Debug" "#### SPAM PROTECTION IS OFF ####";
} elsif string :is "${known_sender}" "false" {       # known senders are NEVER considered spam
  set "mailbox" "";                                  # mailbox for spam - ƒ(kind of spam)
  setflag "flags" "";                                # to hold fileinto flags

  if anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${SD1}",
           header :regex "X-Spam-score" "^${SD1}\\.[${SD2}]$") {
    if string :is "${SPAM_DISCARD_FOLDER}" "" {      # X-Spam-score >= SPAM_DISCARD
      discard;
      stop;
    }
    setflag "flags" "${SPAM_DISCARD_FLAGS}";
    set "mailbox" "${SPAM_DISCARD_FOLDER}";
  } elsif anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${ST1}",
                header :regex "X-Spam-score" "^${ST1}\\.[${ST2}]$") {
    setflag "flags" "${SPAM_FLAGS}";
    set "mailbox" "\\Junk";
  }
    
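  if not string :is "${mailbox}" "" {               # only msgs classified as spam are
    fileinto :flags "${flags}" "${mailbox}";         # filed; the rest fall through to
    stop;                                            # normal processing
  }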
}

This has another use of known_sender and another one of my "funky" error checks (X-Spam-hits and X-Spam-score are not generated if Protection Level is off; sorry, JeremyNicoll).

JeremyNicoll · 19 Jun 2023, 09:39 PM

Quote:

Originally Posted by xyzzy

It appears the rest is ok, but your spam and discard values bother me a little. You are considering all msgs <4.2 as not spam, msgs 4.2 to 4.9 as spam to go into the Spam folder, and >=5.0 as discard candidates

I nearly commented on that too, but from a different point of view.

As I said before, I don't use (ie pay attention to) FM-assigned spam scores (*), but I'd have thought that treating such a narrow set of scores (4.2 through 4.9) as a single category is risky. You're relying on there being no changes in the logic of how Fastmail's backend systems score spam.

I don't know if that's safe?

* - I don't trust any automated spam-scoring system never to throw away one of my mails. I do (elsewhere) see lots of genuine emails (often coming from mail lists) flagged as possible spam. It's clear that none of the automated systems are perfect.

Instead, I use a separate email address for everybody I communicate with (so there are hundreds of those) and hundreds of rules, which generally check (eg for corporate senders) that an incoming mail for my company-specific address X came from the corresponding corporate server. Those likely-to-be-genuine mails are routed to probably-safe folders. Mails that don't satisfy such tests go to a very-likely-dodgy folder. Mails sent to old email addresses that I know have been compromised go to a definitely-dodgy folder (I still sometimes look at them, to see how spam trends are changing).
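A rough Sieve sketch of that per-address routing. All addresses, domains and folder names are made up; the sender check uses the From domain as a simpler stand-in for verifying the corporate server (a stricter version would inspect Received headers or authentication results), and the target folders are assumed to exist.

```sieve
require ["fileinto"];

# Addresses known to be compromised: straight to the definitely-dodgy folder.
if address :is "to" ["old-address@example.org", "leaked@example.org"] {
  fileinto "INBOX.Definitely-Dodgy";
  stop;
}

# One rule per correspondent: the address given only to this company
# should only ever receive mail from that company's domain.
if address :is "to" "acmebank@example.org" {
  if address :domain :is "from" "acmebank.example" {
    fileinto "INBOX.Probably-Safe";
  } else {
    fileinto "INBOX.Very-Likely-Dodgy";
  }
  stop;
}
```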

I don't automatically trust any email even if it goes to a probably-safe folder.

I read nearly every incoming mail only in its plain text form. If I want to investigate its html form I c&p the html into a text editor and look at it there, not in a browser.

qwertz123456 · 20 Jun 2023, 12:12 AM

Thanks again for helping me along.

Quote:

Originally Posted by xyzzy

FWIW, be aware that msgs in Spam are not considered spam until deleted (that's documented by FM). I'm always confused about whether moving a msg out of Spam (as opposed to clicking "not spam") marks the msg as spam, the way a delete does.

Thanks, that's good to know. A bit counter-intuitive from my point of view, but now I know.

BTW, I do not have a "Junk" folder in my account. I believe I deleted it, and I'm only using the Spam and Verified Spam folders. Hope that's not an issue for the code. I did change the one mention of it towards the end of your code.

Quote:

It appears the rest is ok, but your spam and discard values bother me a little. You are considering all msgs <4.2 as not spam, msgs 4.2 to 4.9 as spam to go into the Spam folder, and >=5.0 as discard candidates that also go into the Spam folder. If both "spam" and "discards" go into the Spam folder, why not just set the discard value to 100 to disable it? Then all msgs >=4.2 go into the Spam folder. Both methods work, I guess, but I'm just pointing out an alternative.

Ok, valid point from you and @JeremyNicoll.
At some point I lowered the value because more and more spam with a much lower spam score was creeping in. That helped with the spam. But I have now changed the two values and spread them out a bit.

Now everything >=5 is considered spam and everything >=20 gets moved to the Spam folder to be reviewed. I changed my UI settings accordingly.

So just to make sure my understanding is correct.

I currently have all my Sieve script code in the first text box of the Sieve script editor (whitelisted emails, whitelisted domains, blocked email addresses, blocked domains, blocked header and subject texts etc.)

The complete code below (hope the order is correct, fingers crossed), including the changes you made in post #63, looks like this.

Does all of this code go in the second editable text box, or in the third text block?

Code:

set "SPAM_THRESHOLD"      "5.0";            # SPAM_DISCARD > spam score >= SPAM_THRESHOLD
set "SPAM_FLAGS"          "";               # fileinto "\\Junk" flags
                                            # \\Seen ==> read, \\Flagged ==> pinned
                                            # "\\Seen" | "\\Flagged" | "\\Seen \\Flagged" | ""
                                            # this spam is always filed in \\Junk (same as UI)
                                            # DISABLE UI's PROTECTION LEVEL's "MOVE"

set "SPAM_DISCARD"        "20.0";           # spam score >= SPAM_DISCARD
set "SPAM_DISCARD_FOLDER" "INBOX.Spam";     # folder for msgs in that range (null ==> discard)
set "SPAM_DISCARD_FLAGS"  "";               # fileinto SPAM_DISCARD_FOLDER flags
                                            # set SPAM_DISCARD to 100.0 to disable this test 
                                            # DISABLE PROTECTION LEVEL's "PERMANENTLY DELETE"

set "ADD_SCORE" "1";                        # 0 ==> do NOT prefix ANY spam subject lines
                                            # 1 ==> prefix spam that's NOT filed into Spam
                                            # 2 ==> prefix ALL SPAM with "{SPAM xx.x} "

if string :regex "${SPAM_THRESHOLD}" "^([0-9]{1,2})\\.([0-9])$" {
  set "ST1" "${1}";     # 0.0 <= SPAM_THRESHOLD <= 99.9 (valid SpamAssassin values)
  set "ST2" "${2}-9";
} else {                # incorrect SPAM_THRESHOLD setting (not expected to ever happen)
  set "ST1" "100";      # set so that no msg is ever considered spam
  set "ST2" "0-0";      # note, at the moment there isn't a good way to report this error
  addheader "Debug" "????????? Mis-formatted SPAM_THRESHOLD value! ?????????";
}

if string :regex "${SPAM_DISCARD}" "^([0-9]{1,3})\\.([0-9])$" {
  set "SD1" "${1}";     # 0.0 <= SPAM_DISCARD <= 999.9 (>=100 means never discard)
  set "SD2" "${2}-9";
} else {                # incorrect SPAM_DISCARD setting (not expected to ever happen)
  set "SD1" "100";      # set so that no msg is ever considered for discard
  set "SD2" "0-0";      # note, at the moment there isn't a good way to report this error
  addheader "Debug" "????????? Mis-formatted SPAM_DISCARD value! ?????????";
}

/* Example uses:

  anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${ST1}",
        header :regex "X-Spam-score" "^${ST1}\\.[${ST2}]$") # X-Spam-score >= SPAM_THRESHOLD

  anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${SD1}",
        header :regex "X-Spam-score" "^${SD1}\\.[${SD2}]$") # X-Spam-score >= SPAM_DISCARD
*/

if allof(not string :is "${ADD_SCORE}" "0",          # spam score prefix is to be added
         string :is "${known_sender}" "false",       # but only for unknown senders
         anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${ST1}",
               header :regex "X-Spam-Score" "^${ST1}\\.[${ST2}]$"),
         header :matches "Subject" "*") {
  set "subject" "${1}";                              # X-Spam-score >= SPAM_THRESHOLD
  if header :matches "X-Spam-score" "*" {            # prefix Subject with "{SPAM xx.x} "
    set "spam_score" "${1}";
    if header :value "lt" :comparator "i;ascii-numeric" "X-Spam-score" "10" {
      set "spam_score" "0${1}";                      # add leading 0 for spam scores < 10
    }
    deleteheader "Subject";
    addheader "Subject" "{SPAM ${spam_score}} ${subject}";# prefix Subject with "{SPAM xx.x} "
  }
} # ADD_SCORE != 0 && spam from an unknown sender


if not exists ["X-Spam-hits", "X-Spam-score"] {      # Spam Protection must be enabled
  addheader "Debug" "#### SPAM PROTECTION IS OFF ####";
} elsif string :is "${known_sender}" "false" {       # known senders are NEVER considered spam
  set "mailbox" "";                                  # mailbox for spam - ƒ(kind of spam)
  setflag "flags" "";                                # to hold fileinto flags

  if anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${SD1}",
           header :regex "X-Spam-score" "^${SD1}\\.[${SD2}]$") {
    if string :is "${SPAM_DISCARD_FOLDER}" "" {      # X-Spam-score >= SPAM_DISCARD
      discard;
      stop;
    }
    setflag "flags" "${SPAM_DISCARD_FLAGS}";
    set "mailbox" "${SPAM_DISCARD_FOLDER}";
  } elsif anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${ST1}",
                header :regex "X-Spam-score" "^${ST1}\\.[${ST2}]$") {
    setflag "flags" "${SPAM_FLAGS}";
    set "mailbox" "\\Spam";
  }
    
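  if not string :is "${mailbox}" "" {               # only msgs classified as spam are
    fileinto :flags "${flags}" "${mailbox}";         # filed; the rest fall through to
    stop;                                            # normal processing
  }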
}

BTW, I did find one or two syntax errors around lines 44 and 75 (missing curly brackets), at least according to the FM Sieve tester. I had to use trial and error to figure out where they belong.

xyzzy · 20 Jun 2023, 06:28 AM

Quote:

Originally Posted by qwertz123456

Thanks, that's good to know. A bit counter-intuitive from my point of view, but now I know.

I think that gets everybody at first!

Quote:

BTW, I do not have a "Junk" folder in my account. I believe I deleted it, and I'm only using the Spam and Verified Spam folders.

\\Junk is the spam folder! That's how you reference it in your Sieve code (ignoring "specialuse"; let's not get into that).
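Following that, the threshold branch in the combined script would name the Spam folder by its Sieve reference rather than by "\\Spam". A minimal self-contained sketch, assuming (as stated in the post above) that FM maps \\Junk to the account's Spam folder whatever it is called in the UI; the whole-number threshold of 5 is just for illustration:

```sieve
require ["fileinto", "relational", "comparator-i;ascii-numeric"];

# "\\Junk" in a Sieve string is the text \Junk, FM's reference to the Spam folder.
# "ge" "5" works for a whole-number threshold because i;ascii-numeric compares
# only the leading digits ("4.9" counts as 4, "5.0" as 5).
if header :value "ge" :comparator "i;ascii-numeric" "X-Spam-score" "5" {
  fileinto "\\Junk";
  stop;
}
```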

Quote:

I currently have all my Sieve script code in the first text box of the Sieve script editor (whitelisted emails, whitelisted domains, blocked email addresses, blocked domains, blocked header and subject texts etc.)

I said in post #63 that the code that checks the spam score goes into the Sieve edit box immediately following Section 3 (Sieve generated for spam protection). That would be the second text edit box. The initialization code goes into the first text edit box, following the require statement at the beginning.
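A layout sketch of that placement. FM's generated sections appear only as comments, since they aren't editable; the setting values are the ones from qwertz123456's post, and the "..." comment lines stand for the code posted earlier in the thread.

```sieve
# [First editable box, right after FM's generated require statement]
# (assumes FM's require statement already includes "variables")
# Initialization: settings plus the ST1/ST2 and SD1/SD2 setup.
set "SPAM_THRESHOLD"      "5.0";
set "SPAM_DISCARD"        "20.0";
set "SPAM_DISCARD_FOLDER" "INBOX.Spam";
# ... remaining settings and the "if string :regex ..." validation code ...

# ### 3. Sieve generated for spam protection
#     (FM-generated, not editable; no spam-score tests here when the UI's
#      Protection Level "Move" and "Permanently delete" settings are off)

# [Editable box immediately following Section 3]
# ... the "if not exists [X-Spam-hits, X-Spam-score] ... fileinto ... stop" tests ...
```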

Quote:

BTW, I did find one or two syntax errors around lines 44 and 75 (missing curly brackets), at least according to the FM Sieve tester. I had to use trial and error to figure out where they belong.

I don't know. I thought the little snippet of code for testing the scores was complete, with all its braces matched up. I did, however, construct it from mine, which has additional code there to do stuff not worth mentioning here.

FWIW and FYI, in my posts referencing Sieve code locations, I always describe locations relative to what I call the "Sieve section numbers". Those are the numbered titles you see in the FM-generated Sieve code (the parts you cannot edit), for example "### 3. Sieve generated for spam protection". Those numbered sections make good landmarks. I think that's more accurate than referencing the "first text edit box", "second text edit box", etc. There used to be more text edit sections than there are now, so it avoided the confusion of having to count which text edit box is which. Referencing your line numbers is useless, since they only have meaning in your own script.