FastMail Forum
All posts relating to FastMail.FM should go here: suggestions, comments, requests for help, complaints, technical issues etc.
25 Jan 2022, 12:12 PM | #1 |
Member
Join Date: Jul 2003
Posts: 55
How to mark email as "Ham" (to train Fastmail spam filter)?
Summary
In Fastmail:
1. How do I mark email as "Ham"? (The filter seems less trustworthy without this.)
2. Can I apply large collections of (past) known-Ham and known-Spam emails, "marked as Ham" and "marked as Spam" respectively, to best train Fastmail's spam filters?
Details
In Fastmail I can train its spam engine by moving email to the top-level Spam IMAP folder. How can I train Fastmail to recognize email as "NOT spam", aka "Ham"? Tuffmail.net had this feature: I simply copied any email I wanted to an Auto-Train/Ham IMAP folder (similar to, but the opposite purpose of, Tuffmail's Auto-Train/Spam folder). Does Fastmail provide similar functionality? Without it, I have much less trust that Fastmail's spam filter will do the right thing, i.e., ensure that "Ham" is "not Spam filtered," which to me is just as important as, or maybe more important than, catching and filtering Spam.
Further: I still have all my emails from Tuffmail (~15 years' worth) that I manually marked as Spam or Ham. Thousands of emails. I'd love to apply them to my Fastmail account's filter engine. Is that feasible? What are my options for carrying this out? Are some procedures potentially more efficient than others?
25 Jan 2022, 01:02 PM | #2 |
The "e" in e-mail
Join Date: Jul 2002
Location: VK4
Posts: 3,029
Yes, you can import your Tuffmail emails, but you would be limited to so many per day... Also, surely that would put you over your storage limit.
15 years of emails: for me that would be a nightmare.
25 Jan 2022, 01:44 PM | #3 |
Member
Join Date: Jul 2003
Posts: 55
I have already imported all of my past emails. It’s not an issue and it was easy to do.
25 Jan 2022, 06:46 PM | #4 |
The "e" in e-mail
Join Date: May 2003
Location: mostly in Thailand
Posts: 3,095
The way to quickly train the spam filter (and get your own personal Bayes database) with Fastmail is to set up two folders specifically designated for identifying spam and ham. Once set up correctly, those folders are scanned daily for new messages. Once 200 spam and 200 ham messages have been used to train your Bayes database, that personal database is used in place of the global Bayes database. For details, read https://www.fastmail.help/hc/en-us/a...pam-protection
Note that it is most effective if you can use messages that you know were mischaracterised in the past.
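For bulk-loading an already-imported archive into those two folders, a minimal sketch using Python's standard-library imaplib could copy everything from an archive folder into the designated ham or spam folder. Assumptions: the folder names TrainHam and TrainSpam are hypothetical placeholders for whichever folders you chose in the spam settings, the archive already sits in an IMAP folder, and the daily scan picks up copied messages just as it does moved ones. Copying rather than moving leaves the originals where they are.

```python
import imaplib

# Hypothetical folder names: substitute the folders you designated as the
# spam-learning and ham-learning folders in Fastmail's spam settings.
HAM_FOLDER = "TrainHam"
SPAM_FOLDER = "TrainSpam"


def copy_folder(conn: imaplib.IMAP4, source: str, target: str, batch: int = 500) -> int:
    """Copy every message in `source` into `target`; return the number copied.

    Works on UIDs so message identifiers stay stable, and copies in batches so
    each COPY command stays a manageable length. The source is opened
    read-only (EXAMINE), so nothing in it is changed.
    """
    # Python 3's imaplib sends mailbox names as-is, so quote them here in
    # case they contain spaces.
    typ, _ = conn.select(f'"{source}"', readonly=True)
    if typ != "OK":
        raise RuntimeError(f"cannot open {source!r}")
    typ, data = conn.uid("SEARCH", None, "ALL")
    if typ != "OK":
        raise RuntimeError("UID SEARCH failed")
    uids = data[0].split()  # e.g. [b"101", b"102", ...]
    for start in range(0, len(uids), batch):
        uid_set = b",".join(uids[start:start + batch]).decode()
        typ, _ = conn.uid("COPY", uid_set, f'"{target}"')
        if typ != "OK":
            raise RuntimeError(f"COPY into {target!r} failed")
    return len(uids)
```

Usage, assuming Fastmail's IMAP host is imap.fastmail.com and you log in with an app password: `conn = imaplib.IMAP4_SSL("imap.fastmail.com")`, then `conn.login(user, app_password)`, then `copy_folder(conn, "Tuffmail Ham", HAM_FOLDER)`. Given the 200/200 threshold, a few hundred well-chosen messages per folder may matter more than the whole archive.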
26 Jan 2022, 03:15 PM | #5 | |
Member
Join Date: Jul 2003
Posts: 55
This is just the sort of thing I was seeking. I've configured my Fastmail account to take advantage of this. Thank you BritTim for the reference.
Question: does Fastmail offer any reporting output (like a "report log" emailed or stored somewhere) after it does its "daily batch process" for all the notSpam/Spam folder checks? Quote:
26 Jan 2022, 03:52 PM | #6 | |
Member
Join Date: Jul 2003
Posts: 55
Quote:
Settings->Filters&Rules->Spam_protection->Advanced_settings ...has a "Personal spam filter" section (currently) at the bottom of the page, with the following counters:
Helpful. It does not offer any granular reporting detail, but it's much (much) better than nothing. It helps me see whether Fastmail is actually doing any personal spam/ham processing. (I do not yet see this mentioned in Fastmail.help's docs; it would be good if it could be mentioned there in the future.)
Last edited by hydrostarr : 26 Jan 2022 at 04:00 PM.
26 Jan 2022, 03:54 PM | #7 | ||
The "e" in e-mail
Join Date: May 2003
Location: mostly in Thailand
Posts: 3,095
Quote:
If you really want to investigate how any specific email was handled on receipt, in terms of spam filtering, several of the message headers (available with Actions->Show raw message) can help. As an example, these might include: Quote:
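To look at those headers for a particular message, a saved raw message (for example, one copied from Actions->Show raw message) can be scanned with Python's standard email package. This is only a sketch: it collects every header whose name starts with a given prefix, and the default prefix X-Spam is an assumption about how Fastmail names its spam-related headers, not a confirmed list.

```python
from email import message_from_string


def spam_headers(raw: str, prefix: str = "X-Spam") -> dict[str, list[str]]:
    """Collect headers whose names start with `prefix` (case-insensitive).

    The prefix is an assumption; adjust it to whatever headers you actually
    see in the raw message. Folded (multi-line) values are joined onto one
    line with single spaces.
    """
    msg = message_from_string(raw)
    found: dict[str, list[str]] = {}
    for name, value in msg.items():
        if name.lower().startswith(prefix.lower()):
            found.setdefault(name, []).append(" ".join(str(value).split()))
    return found
```

Usage: paste the raw message into a file, read it with `open(path, encoding="utf-8").read()`, and print `spam_headers(text)`; a header can appear more than once, hence the list per name.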
26 Jan 2022, 03:58 PM | #8 | ||
Member
Join Date: Jul 2003
Posts: 55
Quote:
Quote:
Thanks, BritTim, for all your help and feedback; super appreciated.
30 Jan 2022, 06:50 AM | #9 |
Member
Join Date: Jul 2003
Posts: 55
Spam/non-Spam processing is WAY too slow for 120k+ emails
Update: the Spam/non-Spam processing is going WAY too slow.
After 4 days the spam counters show 6k emails have been processed. That's a rate of ~1.5k/day, and the counters suggest the daily rate may be _slowing down_. I currently have 120k emails marked as spam and not-spam in the queue to process... and this will most likely grow every day (possibly dramatically) as I mass-add emails to my "Ham / non-Spam" folder. This will take months at the current rate.
I have created a ticket with Fastmail on this. I doubt they'll have good answers once they finally send me a meaningful reply. (I'm not yet having great Fastmail tech-support experiences, fwiw.)
My current idea: run a Bayesian-database-generating gizmo on one of my own machines/servers, give them the data, and have them insert that spam-Bayes data into whatever mechanism they have. I'll work to generate compatible data for the "import." However... I'll be surprised if I can get them to do this.
Fwiw, I have a Fastmail "Professional" membership (the biggest/baddest/most-expensive one they have). On every tech-support topic I ask about, I keep asking them whether there are ways I can give them more money for more service, features, or performance. They have yet to take me up on the offer. Eventually I may come back to EmailDiscussions.com to see who might give me direct access to a smart/capable/authoritatively-enabled Fastmail development/tech-support/operations manager. Until then, I'll work the process a little more to see where I can get.
Last edited by hydrostarr : 30 Jan 2022 at 07:39 AM.
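The "months" estimate can be made explicit. A small Python sketch using the figures from this post (about 6k messages processed in 4 days, 120k queued), plus an optional daily growth in the queue whose value is purely hypothetical:

```python
import math
from typing import Optional


def days_to_drain(backlog: int, processed: int, elapsed_days: float,
                  growth_per_day: float = 0.0) -> Optional[int]:
    """Days until `backlog` is cleared at the observed processing rate.

    `growth_per_day` models new messages added to the training folders each
    day; if it matches or exceeds the processing rate, the queue never drains
    and None is returned.
    """
    rate = processed / elapsed_days   # messages processed per day
    net = rate - growth_per_day       # net reduction of the queue per day
    if net <= 0:
        return None
    return math.ceil(backlog / net)


print(days_to_drain(120_000, 6_000, 4))                      # 80
print(days_to_drain(120_000, 6_000, 4, growth_per_day=500))  # 120
```

So roughly 80 days (about two and a half months) at a steady ~1.5k/day, and about four months if the queue also grows by 500 a day; any further slowdown stretches that.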
30 Jan 2022, 07:13 AM | #10 | |
Member
Join Date: Jul 2003
Posts: 55
Building our own, temporary SpamAssassin server to generate the bayesian data?
fwiw: my team already has an automated-deployment rig set up for email servers (Ubuntu + Postfix + Dovecot) that we routinely use to rebuild email servers for our teams internally (with NO external SMTP/MX gateway in or out... which means it's an email server that serves internal-to-VPN-only connections... so we do not have to deal with security attacks, spam, and the like).
Given this, we figure it's not terribly hard to add Apache SpamAssassin to a test server, run my above emails through it (to train it on a) spam and b) not-spam), export the resulting database info somehow into a Fastmail-compatible "data package"... and try to get Fastmail to "import" this into their stuff (the last part I anticipate being the most difficult obstacle). (We run lots of server apps that are new to us as development+test systems. It's part of our regular work projects. So we're not deterred, daunted, or "scared" by the prospect of this, especially for a one-off effort that we have no intention of running in production, and only for this one particular task.)
QUESTION: Other than getting Fastmail to cooperate: does anyone see a problem with this line of thinking? And/or can anyone offer a better way/path to solve this overall issue (whether or not we generate our own Bayesian spam/non-spam data)? Quote:
Last edited by hydrostarr : 30 Jan 2022 at 07:20 AM.
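On the temporary SpamAssassin server, the training step itself would be SpamAssassin's sa-learn tool: --ham and --spam teach the Bayes database, --mbox says the inputs are mbox files rather than individual message files or Maildir directories, --dbpath overrides the bayes_path setting so a throwaway database is used, and --dump magic reports how many spam and ham messages have been learned. The sketch below only builds (and optionally runs) those commands from Python; the paths are hypothetical. sa-learn --backup can also write the database out as text, but whether Fastmail could import anything like that is exactly the open question in this post.

```python
import subprocess
from typing import Sequence


def sa_learn_commands(ham: Sequence[str], spam: Sequence[str], dbpath: str,
                      mbox: bool = False) -> list[list[str]]:
    """Build sa-learn invocations that train a private Bayes DB at `dbpath`."""
    base = ["sa-learn", f"--dbpath={dbpath}"]  # override bayes_path
    fmt = ["--mbox"] if mbox else []           # inputs are mbox files
    cmds = [base + ["--ham"] + fmt + [src] for src in ham]
    cmds += [base + ["--spam"] + fmt + [src] for src in spam]
    cmds.append(base + ["--dump", "magic"])    # report learned spam/ham counts
    return cmds


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them and stop at the first failure."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Usage with hypothetical Maildir paths: `run(sa_learn_commands(["/srv/archive/ham"], ["/srv/archive/spam"], "/srv/sa-train/bayes"))` prints the plan; pass `dry_run=False` once the paths look right.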
30 Jan 2022, 11:13 AM | #11 | |
The "e" in e-mail
Join Date: May 2003
Location: mostly in Thailand
Posts: 3,095
Quote:
I hope Bill comes by and adds his own thoughts. He has tuned his own account so that he can safely discard virtually all spam (with no false positives) while allowing almost no spam to reach the Inbox. I agree that it is important to spend some time getting this right, but careful selection is more important than throwing massive amounts of data at the problem.
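One mechanical way to act on "use messages that were mischaracterised" without hand-picking from a huge archive is to rank by how wrong the old filter was: ham with the highest recorded spam score first, spam with the lowest first. This Python sketch assumes each archived message still carries a spam score from the previous provider's filter; the Labeled record and its fields are hypothetical, and your own spam/ham verdict is the training label.

```python
from typing import NamedTuple


class Labeled(NamedTuple):
    msg_id: str
    is_spam: bool   # your own verdict: the training label
    score: float    # spam score recorded at delivery (assumed to be available)


def pick_training_set(msgs: list[Labeled], per_class: int = 1000
                      ) -> tuple[list[Labeled], list[Labeled]]:
    """Pick the `per_class` hardest ham and spam examples.

    Hardest ham = highest recorded spam score (false positives, near misses).
    Hardest spam = lowest recorded spam score (false negatives, near misses).
    """
    ham = sorted((m for m in msgs if not m.is_spam), key=lambda m: m.score, reverse=True)
    spam = sorted((m for m in msgs if m.is_spam), key=lambda m: m.score)
    return ham[:per_class], spam[:per_class]
```

Setting `per_class=200` would match the minimum mentioned earlier for switching to a personal Bayes database; the selected message IDs could then be copied into the training folders.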
30 Jan 2022, 12:05 PM | #12 | |||
Member
Join Date: Jul 2003
Posts: 55
Effort of "selecting emails" might be much worse than a temp SpamAssassin server
Thanks BritTim for your continued excellent feedback!
First, a disclaimer about my long note below: I'm getting wordy here for one main reason: I asked Fastmail tech support to read and stay updated on this EMD thread. I have confidence that BritTim and others at EMD already get where I'm coming from without my having to detail everything. I do not yet have that confidence with Fastmail tech support or their systems.
Second, fyi: I switched my team's email domains from Tuffmail.net to Fastmail.com in Dec 2021.
Comments on BritTim's excellent points: Quote:
What I do have is a *massive* number of emails (over ~14 years or so) that I categorized over the years (many of them to undo a false positive/negative) while Tuffmail.net hosted my email domains and service.
Further: I have little desire to take the time to figure out which email sets (from these 120k+ Tuffmail-spam-and-nonspam-trained emails) represent a better selection ("well-selected emails"), if that's what you mean. That's a huge effort (selecting 1k ham and 1k spam emails from 120k+ total ham-and-spam emails), or so it seems (maybe I'm missing something? pls advise if I am). It seems much easier for me to whip up my own temporary SpamAssassin server, process the already-categorized emails, and hand the resulting database over to Fastmail (if Fastmail is willing to do this). Further, I can rerun this approach whenever I get a large new influx of email characterizations (mostly to mark large sets of existing email folders from my Tuffmail days as "ham"/not-spam)... again, if Fastmail is willing to play ball, or simply to speed up its spam processing a bit. Quote:
I'm not sure if it was Bayes-related or something else, but there's been a potentially big problem I've had with existing spam classifications (on Fastmail) and/or email-delivery delays... or something. It seems ambiguous: Fastmail tech thinks they have it under control; I do _not_ think that. This problem also happened almost immediately when I started testing my Fastmail-served domains (the first few emails I tested broke things and it's still not been "fixed"--it's been a baaaaaad experience). More on this later if the problems/symptoms remain relevant. There may be good explanations for this... or not. It depends. I've not yet decided. It's a deeper topic, and I don't have enough time to properly introduce and detail it right now. (I have interacted with "level 2 tech support" at Fastmail on this. I'm not yet satisfied. They're doing their best to assist me, I'm sure.)
The point: this Fastmail experience has put a big, fat question mark in my mind over the trustworthiness of the Fastmail MX/spam/whatever-is-going-on filters. And since 2nd-level Fastmail tech support failed to tell me -precisely- what was going on, I do not trust their explanation. Their answer seemed flippant, possibly with a tone suggesting I was an inexperienced user. And while I'm confident it was not their intent, I felt like they blew me off (subjective assessment, granted); this came after I waited over a week for a response from their "senior tech."
I truly appreciate that they are working to do their job the best that they can. Each tech handles hundreds to possibly thousands of these inquiries a week; they do not want to linger or spend any more time on any point than is needed. Instead, what I ask is that some manager at Fastmail recognize that I'm a special user and get me on the phone with the smartest tech-operations/developer person they've got. Please enable me to blast past all the bureaucracy and red tape.
This would solve the issue with maximum efficiency and minimal fuss. I'm happy to pay whatever extra fees this incurs, within reason. (I've already maxed out the user account with 3 years of "Professional.") I've already offered these "extra payments." Granted, I do recognize I'm a VERY hard-to-please customer with respect to these issues. I'm not Fastmail's average user. But I'm picky for what I think is a darn good reason: I want my email-communication systems to WORK and be reliable, or else business and projects can fail. And I do not like having to constantly revisit the question of "can I trust my email service provider not to throw my good email away?" I want to kill the problem dead, once, and be done with it.
In my teams' computing worlds, there's no such thing as "mostly working." In high-level practical terms, it works or it fails. Digital computing systems can be treated this way if you design, test, and implement them correctly. I say this with confidence given decades of experience with all manner of implementations, whether the core technology was designed by my team or by others. And we've designed some of the most complex and impactful technology ever built. Please do not "hand wave" over important points and details when asking me to trust computing systems you provide that might effectively be "eating my data" without my knowing it. (Again, I'm talking to you, Fastmail.) </rant #1>
<rant #2 = comparing Tuffmail vs Fastmail filtering configurability... granted, not a fair comparison>
With Tuffmail.net, I had confidence that I knew exactly what was happening when John and Derek were running Tuffmail. E.g., I knew _exactly_ which MX filter was running for every domain/email address, because the Tuffmail management interface allowed me to program that entire configuration. I could look at the daily report log for _every single MX filter action_ and easily spot problems. I also managed our own inbound Sieve scripts--it wasn't hard.
(Fastmail's Sieve stuff seems harder; there's a more complex existing configuration, and it's less clear to me where I should, or shouldn't, input my Sieve programming. Or maybe it's just "new" and I don't want to take the effort to figure out Fastmail's Sieve base config. ;-) ) Tuffmail also allowed more granular control of the Bayesian spam filters (separate from the MX-level filters). Sieve, spam-filter, MX configuration and logging, several other config options: all of this gave me tremendous confidence in Tuffmail's system behavior. I do not yet have that confidence with Fastmail.
The only filter-configuration control I seem to have is the "selected folder marked as spam or ham/non-spam" stuff on top of the "zero/small/medium/large"-ish spam-control radio buttons. Add this to the big, unexplained "ghost" of a problem mentioned in my rant #1 (above)... and... I'm hammering on this spam config (since it seems to be the only thing I can control with respect to filtering at Fastmail) to at least get it to the point where I'm more comfortable with its Bayesian spam filtering, and can thus start trusting my email service once again (now that I've switched from Tuffmail to Fastmail as of December). </rant #2>
In short, the bottom line: I'm not yet trusting the global Bayes data running at Fastmail. Quote:
Last edited by hydrostarr : 30 Jan 2022 at 01:39 PM.
30 Jan 2022, 10:02 PM | #13 | ||
Essential Contributor
Join Date: Dec 2017
Location: Scotland
Posts: 490
Quote:
I see quite a lot of mailing-list mails where earlier mails in a thread were flagged as spam, so that eg Subject-line tags remain in all following replies; for these it's always clear that someone else's system misclassified an earlier mail. Quote:
But their generator is pretty good ... provided that when setting conditions up, you click the "switch to no-preview rules" option.
I find it irritating that one can't insert a new rule where one wants; instead one lets their system add it, then you have to find it and drag it to where you want it. And although I asked for this ages ago, they still don't allow one to clone an existing rule, which (especially if the clone got defined initially right next to the one it's based on) would save a colossal amount of time when one wants to set up several very similar rules. Maybe creative use of rule export/import would help for this, but I see that they export as JSON, so I'd need to be certain I could manipulate those files correctly.
I have a lot of rules, set up in groups of related types of conditions, and between those groups I define dummy rules as a way of inserting comments in the list. Eg before the rules for incoming mails from mailing lists hosted by groups.io, I have a rule: if mailing list id is exactly "-------------------------------------------------------- LISTS (groups.io - others)" then move mail to Inbox. That is of course extremely unlikely ever to happen, but the point is that the literal "-------------------------------------------------------- LISTS (groups.io - others)" stands out visually in the list of rules as one scrolls through it.
FM did also add a feature which I requested a while ago: nicknames for rules. IIRC I asked for this because, especially with regex rules, it's impossible to glance at them and know what they do, and for rules with lots of conditions their scrolling list of rules only shows the left-hand part of the whole rule. The "nickname" can be a whole line of text, so it too can contain a sensible comment. One could give one's "nicknames" structured/meaningful contents. For example, one of my "nicknames" is: ID posts with no URLs in plain text part, ie 'added' etc preceded by someone's name (not end-span tag). That whole line of explanation is displayed in the scrolling list of rules.
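On cloning rules via export/import: a minimal Python sketch, assuming the exported file is a top-level JSON array of rule objects (the real Fastmail export format may well differ, and the "name" field used here is hypothetical). It deep-copies one rule, applies your changes, inserts the copy right after the original, and writes a new file rather than overwriting the export.

```python
import copy
import json
from typing import Any


def clone_rule(rules: list[dict[str, Any]], index: int, **changes: Any) -> list[dict[str, Any]]:
    """Return a new list with a modified deep copy of rules[index] inserted after it."""
    clone = copy.deepcopy(rules[index])  # nested conditions are copied too
    clone.update(changes)                # e.g. name="..." (hypothetical field)
    return rules[:index + 1] + [clone] + rules[index + 1:]


def clone_rule_in_file(src: str, dst: str, index: int, **changes: Any) -> None:
    """Read an exported rules file, clone one rule, and write the result to `dst`."""
    with open(src, encoding="utf-8") as f:
        rules = json.load(f)
    if not isinstance(rules, list):
        raise ValueError("expected a top-level JSON array of rules")
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(clone_rule(rules, index, **changes), f, indent=2, ensure_ascii=False)
```

Because the original list is never mutated and the output goes to a separate file, a bad edit can be discarded before anything is re-imported.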
31 Jan 2022, 02:47 AM | #14 |
Member
Join Date: Jul 2003
Posts: 55
@JeremyNicoll - I find your above comments incredibly useful. I'm digesting them carefully. Thank you for taking the time to write up your thoughts.
31 Jan 2022, 05:50 AM | #15 | |
The "e" in e-mail
Join Date: Oct 2002
Location: Holon, Israel.
Posts: 4,856
Quote:
I have a huge spam collection, some of it dating back almost 20 years, but I don't think it's worth anything as training material for a spam filter. My spam collection is as useful as my wife's stamp collection. Perhaps in the future some of it will be worth something... I have some rare spam from exotic senders...