EmailDiscussions.com - View Single Post

xyzzy · 18 Jun 2023, 05:16 PM

Quote:

Originally Posted by JeremyNicoll

This is where I think xyzzy may have made an error. He(?) describes code that he didn't show that chops up a threshold value, described as "x.yy" (but shown as an example as 6.2) into two parts. He puts the "x" part into a variable named ST1 and the "yy" part (modified a bit) into ST2.

ST2 is described as then containing "yy-9". This is fine if "yy" was just a single digit, so eg chopping "4.7" up would produce "4" in ST1 and "7-9" in ST2. But if the bit after the dot had 2 or more digits, eg if the threshold value was "4.75", then ST2 would end up as "75-9" ... and I'm not sure what that would do (because it's syntactically invalid in the regex it's later used in, I think).

My post above summarized the way I handle this stuff. With all the recent posts I think I probably overlooked some of the specific questions. Let me address the point in this thread where JeremyNicoll thought I had made an error.

In the "normal" Sieve code that checks X-Spam-score, the spam scores are always in the form xx.y or x.y, i.e., only one decimal place. There cannot be a 2-digit fractional part. Values range from 0.0 to 99.9 (Spam Protection's UI settings enforce this too). Admittedly I am a little lax in the internal error checks. Thinking about this a bit, I should probably change the test that extracts the values from the score to a more rigorous regex, as follows:

Code:

 if string :regex "${SPAM_THRESHOLD}" "^([0-9]{1,2})\\.([0-9])$"

This will only match on x.y and xx.y where all the x's and y's are digits.
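
For anyone who wants to sanity-check that pattern outside of Sieve, here is a minimal Python sketch. It assumes Python's `re` module treats this particular pattern the same way the Sieve regex extension does (it uses only digit classes, a bounded repeat, and anchors). The function name `split_threshold` is made up for illustration and is not part of my script. Note the single backslash before the dot: the doubled `\\` in the Sieve line is only string escaping.

```python
import re

# Same pattern as the Sieve test; a Python raw string needs only one backslash.
THRESHOLD_RE = re.compile(r"^([0-9]{1,2})\.([0-9])$")

def split_threshold(value):
    """Return (integer_part, fraction_digit), or None if value is not x.y or xx.y."""
    # fullmatch is used so that Python's "$" cannot match just before a trailing newline.
    m = THRESHOLD_RE.fullmatch(value)
    return m.groups() if m else None

print(split_threshold("6.2"))   # ('6', '2')
print(split_threshold("4.75"))  # None: two fractional digits are rejected
```

So a value like "4.75" never reaches the code that builds the second half of the test; it simply fails the match.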

SPAM_DISCARD is handled similarly, but it allows a 3-digit integer part (e.g., 100.0). Since spam scores are always <100.0, setting it to 100.0 disables discards.
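
To illustrate why 100.0 works as an "off" value, here is a rough Python sketch. The pattern below is my guess at what the 3-digit variant would look like, not the actual line from the script, and the helper names (`to_tenths`, `can_ever_discard`) are hypothetical. It also assumes a message is discarded when its score is greater than or equal to the setting.

```python
import re

# Assumed variant of the threshold pattern: up to three integer digits.
DISCARD_RE = re.compile(r"([0-9]{1,3})\.([0-9])")

# Highest possible spam score per the post is 99.9, i.e. 999 tenths.
MAX_SCORE_TENTHS = 999

def to_tenths(value):
    """Convert 'x.y', 'xx.y' or 'xxx.y' to an integer number of tenths, else None."""
    m = DISCARD_RE.fullmatch(value)
    if m is None:
        return None
    return int(m.group(1)) * 10 + int(m.group(2))

def can_ever_discard(discard_setting):
    """True if some valid score (0.0 to 99.9) could reach the discard setting."""
    tenths = to_tenths(discard_setting)
    return tenths is not None and tenths <= MAX_SCORE_TENTHS

print(can_ever_discard("8.5"))    # True
print(can_ever_discard("100.0"))  # False: no score can reach 100.0
```

Working in tenths keeps everything as integers, which mirrors the way the Sieve side has to compare the two parts as strings of digits rather than as floating-point numbers.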