Our Anti-spam is Ready! - Accuspam

General Network security, firewalls, port filtering/forwarding, wireless security, anti-spyware, as well as spam control and privacy discussions.
JawZ
Posts: 21941
Joined: Fri Feb 23, 2001 12:00 am

Post by JawZ »

Yeah man...been plugging your site when I can to those who need it. If you haven't already, I would introduce yourself and your very fine product to the people at http://www.eff.org

Even though I'm not using your product at this time, I do feel it is important for us all to work together to fight spam. ;)
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

UOD wrote:...I would introduce yourself and your very fine product to the people at http://www.eff.org...
Thanks but I personally do not have time to contact the EFF.

Apparently AccuSpam meets the EFF's desired anti-spam criteria 100%, and better than any other existing method. Perhaps you can contact them on our behalf and let them know that Bayesian filtering and Spam Assassin, which they are prominently linking to (http://www.eff.org/Spam_cybersquatting_abuse/Spam/), do not meet their own criteria, but AccuSpam does:

http://www.eff.org/Spam_cybersquatting_ ... _email.php

"...any measure for stopping spam must ensure that all non-spam messages reach their intended recipients. Proposed solutions that do not fulfill these minimal goals are themselves a form of Internet abuse..."

"...we would like to see the development of better filtration software on servers, something that could work interactively with the mail recipient in defining what he or she regards as spam using pattern recognition. That is, every time somebody gets a message of a sort he or she does not want, s/he could send it to the filter, thereby making that filter smarter over time, as well as giving it the ability to "learn" as spam techniques develop..."
JawZ
Posts: 21941
Joined: Fri Feb 23, 2001 12:00 am

Post by JawZ »

accuspam wrote:Thanks but I personally do not have time to contact the EFF.

Well, I'll see what I can do. I am a member of the EFF and contribute yearly with monetary donations.
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

UOD wrote:Well, I'll see what I can do. I am a member of the EFF and contribute yearly with monetary donations.
Thanks! Any such prominent links to AccuSpam.com will help accelerate the snowball downhill effect of ramping up the statistics AccuSpam uses to detect spammers.

As well, I am working on a Bayesian content filter that uses the statistics of all AccuSpam users, and will work essentially the same as the domain blocking. In essence, the domain blocking hypothesis is that some domains send 99.9+% spam and < 0.1% non-spam. The naive Bayesian content filtering espoused by Paul Graham (and afaik used by all current Bayesian anti-spam, e.g. Spam Assassin, Spam Bayes, etc.) attempts to correlate spam features which have much less than 99% probability (especially if measured globally for all correlated users), thus it needs to balance the probabilities with "good words". Whereas, I am working on a Bayesian method that works the same as the domain blocking hypothesis and looks only for the features of spam content which are in 99.9+% of spam and in < 0.1% non-spam. This improved form of Bayesian content analysis will have advantages over Paul Graham's Bayesian content filter:

1. Fundamentally it is not correlating the content of spam and non-spam (which is inherently noisy), but the volume of spam and non-spam. What makes spam is its volume, not its message. So this Bayesian does not try to decide what is bad content and good content as Paul Graham's (http://www.paulgraham.com) Bayesian does; it instead just tries to find the features of spam sent in bulk that are unique from the features of non-spam on the whole.

2. The risk of a false positive (even in future) will be near 0, e.g. 1 in a million (same as for domain blocking), because it takes into account the patterns of many correlated users and the many permutations of legitimate email.

3. No way for spammers to corrupt the "good words" (words in non-spam) probability, because my approach does not use the probability of the "good words", only the probability of very "bad words" (words always in spam and never in non-spam).

4. The effort to identify and train on patterns is shared (divided) amongst all users, so it is many orders of magnitude less effort per user than with per-user (Paul Graham) Bayesian.

5. The only way for spammers to corrupt the very "bad words" is to fight with other spammers by adding more spam weight to the very "bad words" of other spammers. This is the same as for the domain blocking. The only way for one spammer to defeat AccuSpam for his domain(s) is to correlate well by disapproving the domains of the other spammers. If all spammers are fighting each other, then they actually cancel their attempts to defeat AccuSpam, and aid AccuSpam in detecting them. For example, say there are 1000 spammers; then 999 are against each 1 of them, so they add disapproval data at 999 to 1 over approval data. If a spammer joins AccuSpam and does not disapprove his fellow spammers, then his votes are ignored because they won't correlate well to the AccuSpam users who are disapproving the spammers. It is like a dog chasing its tail; they have no way to catch it.

6. Since most single words occur both in spam and non-spam, my improved global Bayesian will look at n-grams of word combinations, since it is usually not the word "sexy" but the context of the use of "sexy" in a phrase that can uniquely identify a spam.
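
To make the idea in the list above concrete, here is a minimal sketch of selecting only the very "bad" n-grams from pooled user reports. It is an illustration and not AccuSpam's code: the function names, the `min_count` floor, and the reading of "99.9+% spam" as the share of messages containing the n-gram that were reported as spam are all assumptions.

```python
from collections import Counter

def ngrams(text, n=2):
    """Return the set of lowercase word n-grams found in one message."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def select_spam_features(spam_msgs, ham_msgs, n=2, min_spam_ratio=0.999, min_count=3):
    """Keep only n-grams seen almost exclusively in spam across all users.

    min_count is a hypothetical floor so that a single report cannot
    create a feature; "good word" probabilities are never used.
    """
    spam_counts = Counter(g for m in spam_msgs for g in ngrams(m, n))
    ham_counts = Counter(g for m in ham_msgs for g in ngrams(m, n))
    features = set()
    for gram, spam_seen in spam_counts.items():
        ham_seen = ham_counts.get(gram, 0)
        if spam_seen >= min_count and spam_seen / (spam_seen + ham_seen) >= min_spam_ratio:
            features.add(gram)
    return features

def is_spam(message, features, n=2):
    """Flag a message if it contains any of the selected spam-only n-grams."""
    return not ngrams(message, n).isdisjoint(features)
```

Because only near-certain features survive, a message with none of them is simply left undecided rather than scored as "good", which is the difference from the per-user filter described above.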
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

I have thought of an easier way that spammers can defeat the Bayesian used in afaik all existing (Paul Graham) Bayesian anti-spam, easier than what I wrote before:

http://forums.speedguide.net/showpost.p ... stcount=69

For each spam run, they add 4 or 5 random letters (chosen from a-z and A-Z) to the end of each word that is often used in spam (e.g. ViagraAgtU). Do not insert HTML, space, punctuation or anything between the random letters and the spam word. Simple. Done. All existing (Paul Graham) Bayesian defeated 100%.

The reason is that given 26 letters times 2 for capitals, the number of random 4-letter combinations is (26*2) ^ 4 = 7.3 million. Thus it will take at least 7.3 million spam runs before, on average, a Bayesian filter will see the same spam word in more than one spam. Given that a Bayesian filter needs to see a word many times before giving it a high spam probability, probably a billion spam runs will still not be detected. Spammers do not have to ask the other spammers not to use their combinations, because all spammers choose randomly.

Once the 4 letter combinations start getting caught by Bayesian, then just switch to 5 letters and that is about 380 million. Then 6 letters is about 20 billion, i.e. trillions of spam runs before detection by Bayesian.
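
The arithmetic can be checked with a few lines. This is only a worked calculation of the number of possible suffixes over a-z and A-Z; the function name is made up for the example.

```python
ALPHABET_SIZE = 26 * 2  # a-z plus A-Z

def suffix_combinations(length):
    """Number of distinct random suffixes of the given length."""
    return ALPHABET_SIZE ** length

for length in (4, 5, 6):
    print(length, f"{suffix_combinations(length):,}")
# 4 7,311,616
# 5 380,204,032
# 6 19,770,609,664
```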

Since my improved Bayesian correlates all users, using the same combinations for all spams in a spam run will be detected by my improved Bayesian. The way for spammers to avoid detection with my improved Bayesian is to randomize the letters for each spam in a spam run:

http://forums.speedguide.net/showpost.p ... tcount=125

Anti-spam could attempt to identify word stems which end (or start) with randomized letter combinations, but analyzing the last letters for randomness could create false positives. Some legitimate non-spam emails contain unlikely letter combinations, e.g. hexadecimal numbers.

Anti-spam could attempt to use a dictionary of words to extract the beginning word stem, and ignore words with stems not found in the dictionary, but the dictionary would have to contain all possible spellings of each spam word stem, otherwise the anti-spam would miss unknown and misspelled spam words (e.g. Viaqra).

So to be most clever, spammers should combine misspellings with random letter appendages (e.g. ViaqraAgtU) and avoid using anything (no html) but letters a-z and A-Z in their emails. They can defeat all Bayesian that way, even my improved Bayesian if they randomize each spam of a spam run.

Spammers would also have to randomize any urls they insert in their spams. They probably should do it more intelligently than just adding random "?xxxx" to the end, as this is easy for anti-spam to ignore. Instead they must randomize the domain (or the portion after a non-spam domain). It is much more costly for spammers to randomize their domains and urls. As long as spammers have a correlatable url or reply email address in their spam, then BrightMail and Bayesian can correlate them, but my improved Bayesian can correlate it much faster (since it has many users' data, and spammers can change urls frequently compared to only one user's data).

And spammers could combine this with my previous ideas to insert normal prose to help defeat and pollute the good word probabilities of Paul Graham type Bayesian:

http://forums.speedguide.net/showpost.p ... tcount=117

Here is some interesting analysis and examples from another person who believes Bayesian content filtering can and will be defeated:

http://www.jerf.org/writings/bayesReport.html
JawZ
Posts: 21941
Joined: Fri Feb 23, 2001 12:00 am

Post by JawZ »

What are your thoughts on email encryption/digital signatures? Any problems in how encrypted email interfaces with your service?
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

UOD wrote:...Any problems in how encrypted email interfaces with your service?
As far as I know, no conflicts in terms of the sender address statistical blocking. AccuSpam does not care about what you put in the email, as long as the normal headers exist.

However, for the global Bayesian content blocking we are considering, if the body of the email is encrypted, then that aspect of spam detection would be defeated. However, I think you are referring to a signature which identifies a sender, not the encryption of the email content. In that case, I see no conflict with AccuSpam.

Note that AccuSpam does not currently propagate all the headers (only for non-Approved Senders in free version, or all senders in the yet unreleased paid version), so any special headers (that normal email does not need) would be lost. This is in my medium term To Do list to fix.

UOD wrote:What are your thoughts on email encryption/digital signatures?...
If you are referring to encrypting the content of an email using public/private key (e.g. PGP) so that only the sender and recipient can decrypt, then I think that is really not needed or practical for vast majority of users.

What we really need is secure transport (e.g. SMTP and POP over SSL), so that the email can not be sniffed during transmission, which is especially important now with wireless transmission. Minimally every user needs to demand their ISP support APOP or POP over SSL (it is amazing how many major ISPs do not!), and then set their email program accordingly, to prevent the sending of their email passwords in clear text. My ISP Earthlink.net supports APOP, but my Host (which is also the Host of AccuSpam) Pair.com still does not support APOP (even after 2 years of me asking them to), and it was a source of irritation when a college student walked up to me in a coffee shop where I was connected via wireless and showed me my email password. Since then, I always change my email password before doing a wireless session, and then change it back afterwards. Note that other than this, Pair.com is a very secure and excellent Host. They do support most other major secure connection mechanisms, such as SSH (an encrypted replacement for telnet), SFTP (file transfer over SSH), etc.
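
For readers wondering why APOP avoids the clear-text password problem: as defined in RFC 1939, the client sends an MD5 digest of the server's greeting timestamp followed by the password, so the password itself never crosses the wire. A minimal sketch, with a made-up timestamp and password in the usage below:

```python
import hashlib

def apop_digest(server_timestamp, password):
    """Digest a POP3 client sends with the APOP command.

    server_timestamp is the <...> token from the server greeting,
    including the angle brackets. The result is 32 hex characters.
    """
    return hashlib.md5((server_timestamp + password).encode("ascii")).hexdigest()

# Hypothetical values for illustration only.
digest = apop_digest("<1234.5678@pop.example.net>", "not-my-real-password")
print("APOP someuser", digest)
```

Since the timestamp changes with every connection, a sniffed digest can not simply be replayed. MD5 is considered weak today, so POP over SSL is the stronger of the two options mentioned above.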

If you are instead referring to the use of a digital signature to identify that an email really came from you, then we think this is so important that it is actually part of the way AccuSpam will detect email forgery. Soon there will be a new feature on AccuSpam, where you insert a value in your signature so that all AccuSpam users can receive your email. If you don't sign up for it, AccuSpam users will still get your email, but your email address can be forged by a spammer. Initially we expect major corporations to sign up for this once we have many AccuSpam users, so that they can stop spammers from doing phishing scams using their corporate email addresses. This will also be a free service available for instant signup to individual users.

-Shelby Moore
http://AccuSpam.com
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

Improved the correlation of AccuSpam users by only correlating to the target user on domains the target user thinks are spammers. This was done to ensure that any attempt to approve a spammer by joining AccuSpam to pollute global stats would be ineffective, because they would also have to disapprove a greater number of other spammers in order to correlate to other users.

An unexpected benefit is that it increased the number of correlations by 50%! So we are 50% closer to critical mass. In hindsight, this makes sense (thinking to myself "why didn't I realize that!" :) ). Many users will disagree on the % of spam received from non-spam domains, with estimates ranging from 0% up to somewhat under 80 or 90%. But most users will agree the spammer domains are sending more than 90% spam.

Some users may see an instant and significant decrease in the length of their Daily Summaries from this simple improvement.
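
A minimal sketch of what "correlating only on domains the target user thinks are spammers" could look like. The data layout (a dict of domain to True for disapproved, False for approved) and the function name are assumptions made for illustration, not AccuSpam's implementation.

```python
def correlation(target_votes, other_votes):
    """Agreement of another user with the target user.

    Only domains the target has marked as spam are compared, so
    disagreements about ordinary (non-spam) domains are ignored.
    Returns None when the two users share no such domain.
    """
    spam_domains = [d for d, is_spam in target_votes.items() if is_spam]
    shared = [d for d in spam_domains if d in other_votes]
    if not shared:
        return None
    agree = sum(1 for d in shared if other_votes[d])
    return agree / len(shared)

target = {"pills.example": True, "casino.example": True, "isp.example": False}
honest = {"pills.example": True, "casino.example": True, "isp.example": True}
shill = {"pills.example": False, "casino.example": True}

print(correlation(target, honest))  # 1.0, the isp.example disagreement is ignored
print(correlation(target, shill))   # 0.5, approving one spammer costs correlation
```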
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

I am currently having doubts whether I will implement the "improved Bayesian content filtering" I outlined in previous post:

http://forums.speedguide.net/showpost.p ... tcount=125

I have realized any Bayesian filter which recognizes urls and domains could be effectively used by a spammer to blacklist any less frequently seen domain on the web, by sending out a lot of spam containing that domain. Chalk that up to yet another hole that could be exploited by spammers against Bayesian.

I could ignore domains and urls in content, and may do that as a defense against current day spam until our critical mass builds for statistical sender blocking, but then as outlined previously, defeating all Bayesian content filters is fairly trivial for spammers if the Bayesian is not considering the urls and domains in the content:

http://forums.speedguide.net/showpost.p ... tcount=126

AccuSpam's statistical sender blocking can NOT be polluted so easily by spammers because we can detect forgery of sender. We have no corresponding way to detect forgery of content.
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

Another major improvement has been made to AccuSpam.

The Daily Summary now has emails ranked by order of greatest chance to be a non-spam first.

And the chance of being a non-spam is listed below each email summary in the Daily Summary.

Thus the AccuSpam user can decide how far down to browse the Daily Summary based on his/her desired false positive risk.

Now there is no excuse not to reply to the Daily Summary. There are some users who are not replying to the Daily Summary, and they will have no one to blame but themselves if they lose a legitimate email. We can not hold their quarantine indefinitely. We will probably automatically purge emails from the quarantine which are 14 days old and have less than 1 in 1000 chance to be a non-spam. Or something reasonable like that.
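
A rough sketch of the ranking and the tentative purge rule described above. The entry fields (`p_nonspam`, `received`) and function names are hypothetical, and the 14 day and 1 in 1000 values are the ones floated in this post, not a final policy.

```python
from datetime import datetime, timedelta

def rank_summary(entries):
    """Order quarantined emails so the most likely non-spam comes first."""
    return sorted(entries, key=lambda e: e["p_nonspam"], reverse=True)

def should_purge(entry, now, max_age_days=14, threshold=1 / 1000):
    """Purge only when the email is old AND very unlikely to be non-spam."""
    old_enough = now - entry["received"] >= timedelta(days=max_age_days)
    return old_enough and entry["p_nonspam"] < threshold

now = datetime(2004, 8, 31)
quarantine = [
    {"id": 1, "p_nonspam": 0.0005, "received": datetime(2004, 8, 1)},
    {"id": 2, "p_nonspam": 0.40, "received": datetime(2004, 8, 30)},
    {"id": 3, "p_nonspam": 0.0005, "received": datetime(2004, 8, 25)},
    {"id": 4, "p_nonspam": 0.02, "received": datetime(2004, 8, 1)},
]
print([e["id"] for e in rank_summary(quarantine)])
print([e["id"] for e in quarantine if should_purge(e, now)])
```

In this example only email 1 would be purged: email 3 is too recent and email 4 still has too high a chance of being legitimate.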

Additionally the statistical domain blocking algorithm is run again for previously processed emails for a user just before sending the Daily Summary, so that any global data that accumulated since first processing has another chance to detect the spam (as having < 1 in a million chance to be non-spam) and not include it in the Daily Summary.

I already noticed this has reduced the lengths of some users' Daily Summaries.
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

We made an error in the improvement we made this morning:

http://forums.speedguide.net/showpost.p ... tcount=131

which caused the Daily Summaries to contain blank entries.

This has been fixed and replacement Daily Summaries have been emailed to all users.

Do not worry. No email was lost. It was merely an error in the display of the information in the database. No information in the database was affected.
cigamkcalb
New Member
Posts: 16
Joined: Tue Aug 10, 2004 1:09 am

Post by cigamkcalb »

sounds dangerous
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

cigamkcalb wrote:sounds dangerous
Absolutely not dangerous.

Before sending the Daily Summaries, the data from the database is copied into an array in memory. The error was that we were not reading correctly from that array when writing the values into the text of the Daily Summary email. No manipulations are performed on the database when composing the Daily Summary, because it is purely a display operation. That is why it wasn't as crucial to test it exhaustively before release. Be confident that any code that changes the database is tested exhaustively both before and during release and continually monitored.

Besides, the database is backed up frequently.
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

I am very happy to report that the backlog in some users' quarantines is being automatically reduced by the improvement I made to apply the statistical blocking again before sending Daily Summary.

We had a 20% increase in enabled AccuSpam users overnight!

The global statistical blocking among correlated AccuSpam users is starting to catch up with the rate that spammers use new domains.

I am confident we will see the Daily Summaries reduce from here.

The remaining major work for me is to figure out how to deal with UNSPOOFED spam from domains which do not send 100% spam, e.g. major ISP domains. We already delete the spoofed spam in most cases. Luckily this UNSPOOFED spam from non-spammer domains is a small % of the spam being received because ISPs have incentive to stop spam coming from their networks. I will probably have to apply some sort of "safe" Bayesian to UNSPOOFED spam from non-spammer domains. And may be able to apply "safe" reverse DNS on free email that are exclusively Webmail oriented. As well, the statistical blocking by sender email address (not just domain which only needs < 1000 users) will kick in once we have 10,000 AccuSpam users.

As always, in no case should you receive spam in your Inbox with (paid version of) AccuSpam (and only minute amount if free version used correctly as detailed in the AccuSpam.com website FAQ).
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

We had the first user complain angrily about AccuSpam, and I feel it is important to explain the scenario where AccuSpam will absolutely not work.

(not counting the old version of AccuSpam in 2003, which was a totally different product and algorithm).

We did not bother to ask the user why s/he wanted to disable AccuSpam, but s/he was asking for instructions to disable and they seemed angry about not receiving any of their email. We simply pointed them to the instructions that are already on the AccuSpam.com website for disabling.

We realized that the user could not have been receiving the Daily Summaries or was not properly replying to the Daily Summaries. That is the only way they could have received none of their email.

So I realized that the user is probably running another anti-spam (probably at their ISP, possibly even without the user's knowledge) and that anti-spam is erroneously blocking the Daily Summary emails from AccuSpam to the user.

It could very well happen that some people (possibly other anti-spam companies) who feel competition to AccuSpam will try to hurt us by blacklisting our IP address.

So if you are using AccuSpam and you do not receive the Daily Summaries, then complain to your ISP that they are erroneously blocking your legitimate email.

AccuSpam does not delete legitimate email. Sadly most other anti-spam does. So do not blame AccuSpam if you run another anti-spam that blocks AccuSpam. Users would be much wiser to run one anti-spam at a time.

UPDATE:

Apparently we were wrong and the problem was the user was not replying to the Daily Summaries.

Here is copy of our email response to her/his further explanation of the problem they were having. Note that no personal information has been disclosed. This is merely to answer a problem that other users may run into:


======AccuSpam wrote to AccuSpam user========
Thanks for explaining your problem further, especially since we did not even ask you to. That is appreciated. We had incorrectly assumed you were not receiving the Daily Summaries from us:

http://forums.speedguide.net/showpost.p ... tcount=136

Sounds to me like you are trying to type into the Daily Summary email you received from us.

You must click "Reply" first to create a new reply email. Make sure your email program is configured to include the sender's email at the bottom when you reply to a sender. Else you need to copy and paste the email from AccuSpam into the reply.

Then you can type into the [ ] boxes in the reply email.

NOTE: you do NOT need to type a space for each message you wanted deleted. The spaces are already inserted by default in the Daily Summary AccuSpam sends to you.

At 08:37 PM 8/15/2004 -0400, AccuSpam user wrote:
>
>I am trying to place letters (A & R) in the Message ID brackets. I am not
>able to type anything in the spaces ... nor can I put a space in the Message
>Brackets for the messages that I want deleted.
>
>WHAT IS GOING ON?
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

Improved the Daily Summary instructions so users understand that they do not have to manually type an empty space for each email they wish to delete. They merely reply. They only need to type the A and the R.

Added:

Code: Select all

   Note the [ ] below already have an empty space by default.

To bottom of:

Code: Select all

-  Place empty space in Message ID brackets [ ] for messages you want deleted
   permanently and to permanently block sender.
   Future emails from sender will be deleted.
   Use empty space if sure is spam.
   Note the [ ] below already have an empty space by default.
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

As I predicted, the spammers are getting more astute at attacking the popular Bayesian content filtering used by most anti-spam (not used by AccuSpam).

The following content is an attempt at normal prose, but I think it is still too non-random, and as I said the urls are what can still be correlated by Bayesian (if you do not mind your legit email getting blocked by Bayesian when spammers insert non-spam urls in their spams):
Subject: joke inside

<DIV><FONT face=Arial size=2><A
href="http://nicepharmacy.com/?partid=arlenders">Three blondes were taking a walk in the country when they came upon a line of tracks. The first blonde said, "Those must be deer tracks!" The second blonde said, "No, stupid, anyone can tell those are rabbit tracks!" The third blondie said, "No, you idiots, those are horse tracks!" They where still arguing ten minutes later when a train hit them.</A></FONT></DIV>
<DIV><FONT face=Arial size=2><A
href="http://nicepharmacy.com/?partid=arlenders"><IMG alt="" hspace=0
src="http://222.233.52.28/d1.gif" align=baseline border=0></A></FONT></DIV>
<DIV><FONT face=Arial size=2><A
href="http://nicepharmacy.com/?partid=arlenders">A blonde got a dent in her car and took it in to the repair shop. The repairman, noticing that the woman was a blonde, decided to have a wee bit of fun. So he told her all she had to do was take it home and blow in the tailpipe until the dent popped itself out. After 15 minutes of this, the blonde's blonde friend came over and asked what she was doing. "I'm trying to pop out this dent, but it's not really working." "Duh. You have to roll up the windows first!"</A></FONT></DIV>
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

Request for help.

To further improve AccuSpam, I need a list of the mail domain (e.g. msn.com, earthlink.net, etc.) for subscribers of major and stable ISPs all over the world. Specifically ones that we know have a bonafide non-spam subscriber base of size that would be worth a spammer attacking.

And for each one, I need a copy of the email headers (specifically the "Received" header lines) from an email sent from a subscriber to that ISP.

The reason I need this information is to compile a database of ISP domains that support "Reverse DNS" (e.g. PTR records in DNS for their IPs) and also a list of the nameservers for each ISP (e.g. NS records). I can look up this information from DNS given the email headers.

It seems that we can delete a lot of the spam that AccuSpam is currently summarizing in the Daily Summaries simply by looking for forged Reverse DNS records! This is different than how most anti-spam use Reverse DNS. I have noticed that many spammers set a Reverse DNS record for their IP to match the lie they give in the email headers, but that then of course the nameservers do not match the major ISP they are pretending to be sending from.

For spams from IPs which do not have a Reverse DNS record, we will not delete this (as some anti-spam do), as this would cause false positives, but we can assign a probability to this which when combined with other metrics can help detect the spam.
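
A minimal sketch of the forged Reverse DNS idea described above, assuming the PTR hostname and the nameservers of the reverse zone have already been looked up (e.g. with dig). The table of ISP nameserver domains, the result labels, and the two-label domain guess are all assumptions for illustration.

```python
def domain_of(hostname):
    """Naive guess at the registered domain: the last two labels.

    This is an assumption that breaks for suffixes such as co.uk.
    """
    return ".".join(hostname.rstrip(".").lower().split(".")[-2:])

def classify_reverse_dns(ptr_host, zone_nameservers, isp_nameserver_domains):
    """Compare what the PTR record claims with who actually serves the zone.

    ptr_host: hostname from the PTR record, or None if there is none.
    zone_nameservers: NS hostnames for the IP's in-addr.arpa zone.
    isp_nameserver_domains: hypothetical table mapping an ISP's mail
        domain to the set of domains its nameservers live under.
    """
    if ptr_host is None:
        return "no_ptr"        # not deleted, only weighted as more suspicious
    expected = isp_nameserver_domains.get(domain_of(ptr_host))
    if expected is None:
        return "unknown_isp"   # PTR does not claim an ISP in the table
    ns_domains = {domain_of(ns) for ns in zone_nameservers}
    if ns_domains & expected:
        return "consistent"
    return "forged_ptr"        # PTR claims the ISP, nameservers say otherwise

table = {"bigisp.example": {"bigisp-dns.example"}}
print(classify_reverse_dns("mail1.bigisp.example.", ["ns1.bigisp-dns.example."], table))
print(classify_reverse_dns("mail1.bigisp.example.", ["e0.ns.spamhost.example."], table))
```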

Start here for lists of major ISPs:

http://navigators.com/isp.html

http://www.thelist.com/

Any contributions can be emailed to me at:

shelby@coolpage.com

If you subscribe to one of those ISPs above, simply send me an email with subject "Here is an ISP header you requested".

Thanks,
Shelby Moore
http://AccuSpam.com
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

Implemented the pseudo-"Reverse DNS" test (somewhat different than the way other anti-spam use "Reverse DNS"), and have populated it to detect and delete sender email address forgeries from yahoo.com and hotmail.com. I can see many of these forgeries now being deleted:

http://forums.speedguide.net/showpost.p ... tcount=139

You should see much less (if any) spam in your Daily Summaries from senders that have yahoo.com and hotmail.com in their email address.

For major non-webmail ISPs, it will not delete because we can not be sure legitimate email will pass "Reverse DNS" in that case, but it will place a higher probability of spam on those that fail "Reverse DNS". Most importantly it will delete those that forge the "Reverse DNS" of major ISPs.

FYI, this may seem non-intuitive, but it is MORE important to block forgery of free email domains than paid email domains, because AccuSpam deletes spam from non-existent senders, and thus it is much more costly for spammers to obtain paid email accounts (or to use their mailing list as the senders), or obtain their own domains, than to obtain free email accounts and forge them. The reason the spammer must forge the free email account they created is because they can not send huge volumes of emails through the webmail interface of the free email provider.
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

Forged spam from hotmail and yahoo is now eliminated from the Daily Summary. The only way to get a spam from hotmail and yahoo in your Daily Summary is if the spammer actually sent the spam from the webmail of hotmail or yahoo, directly or via a program which interfaces to the webmail, e.g. Outlook, Hot Popper, Yahoo Pops, etc. (and I think yahoo and hotmail prevent sending a huge volume of email from their webmail).

Here is an example that AccuSpam detected and deleted (with "xxxxxxxx" used to obscure private AccuSpam user data):

Code: Select all

Return-path: <cheappharmz569@hotmail.com>
Received: from defapp04.gatewaydefender.com (unverified [209.153.138.124]) by
buckeye-express.com
 (Rockliffe SMTPRA 5.3.11) with ESMTP id <B0067639950@mpmail1.accesstoledo.com>;
 Tue, 17 Aug 2004 10:17:14 -0400
Received: from YahooBB218124004160.bbtec.net (Not Verified[218.124.4.160]) by
xxxxxxxxxxxxxxx with DEFSCAN (v3)
        id <BH0cca5226>; Tue, 17 Aug 2004 10:17:12 -0400
Message-ID: <86981256473.88020@cheappharmz569@hotmail.com>
Reply-To: "Katina Thornton" <cheappharmz569@hotmail.com>
From: "Katina Thornton" <cheappharmz569@hotmail.com>
To: xxxxxxxxxxxxxxxxx
Subject: V.iagra on s.ale, save moolah ;    bdlkvdpkgogsc 
Date: Tue, 17 Aug 2004 12:12:53 -0300
MIME-Version: 1.0 (produced by decimatesportsman 1.7)
Content-Type: multipart/alternative;
        boundary="--945509168745767"

Here is the analysis AccuSpam did, where "38k" means it deleted the email as forgery. You can see the spammer was actually sending from 209.153.138.124, which is "defapp04.gatewaydefender.com" probably on "voyager.net" network:

Code: Select all

27: [email]cheappharmz569@hotmail.com[/email]
28: V.iagra on s.ale, save moolah ;  bdlkvdpkgogsc
28a: hotmail.com
39
40a
38a1: hotmail.com
38a2: 209.153.138.124
RESULT:

; <<>> DiG 9.2.3rc4 <<>> -x209.153.138.124
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10170
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 1

;; QUESTION SECTION:
;124.138.153.209.in-addr.arpa.        IN        PTR

;; ANSWER SECTION:
124.138.153.209.in-addr.arpa. 86400 IN        PTR        defapp04.gatewaydefender.com.

;; AUTHORITY SECTION:
138.153.209.in-addr.arpa. 86400        IN        NS        e0.ns.voyager.net.
138.153.209.in-addr.arpa. 86400        IN        NS        e1.ns.voyager.net.
138.153.209.in-addr.arpa. 86400        IN        NS        e2.ns.voyager.net.

;; ADDITIONAL SECTION:
e2.ns.voyager.net.        164766        IN        A        207.90.100.25

;; Query time: 25 msec
;; SERVER: 209.68.2.239#53(209.68.2.239)
;; WHEN: Tue Aug 17 10:19:51 2004
;; MSG SIZE  rcvd: 169


38a3
38a4
38a9
38k
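
For anyone who wants to reproduce the lookup above by hand, here is a small sketch that pulls the bracketed IP addresses out of the "Received" lines and builds the in-addr.arpa name that dig queries for the PTR record. The function names are made up for this example, and it only handles IPv4 addresses written in square brackets as in the headers shown.

```python
import ipaddress
import re
from email import message_from_string

RECEIVED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def received_ips(raw_headers):
    """Bracketed IPv4 addresses from Received: headers, most recent hop first."""
    msg = message_from_string(raw_headers)
    ips = []
    for value in msg.get_all("Received", []):
        ips.extend(RECEIVED_IP.findall(value))
    return ips

def ptr_query_name(ip):
    """Name queried for a reverse lookup, e.g. by dig -x."""
    return ipaddress.ip_address(ip).reverse_pointer

headers = (
    "Return-path: <cheappharmz569@hotmail.com>\n"
    "Received: from defapp04.gatewaydefender.com (unverified [209.153.138.124]) by\n"
    " buckeye-express.com with ESMTP; Tue, 17 Aug 2004 10:17:14 -0400\n"
    "Received: from YahooBB218124004160.bbtec.net (Not Verified[218.124.4.160]) by\n"
    " xxxxxxxxxxxxxxx with DEFSCAN (v3); Tue, 17 Aug 2004 10:17:12 -0400\n"
    "Subject: example\n"
    "\n"
)
ips = received_ips(headers)
print(ips)                     # ['209.153.138.124', '218.124.4.160']
print(ptr_query_name(ips[0]))  # 124.138.153.209.in-addr.arpa
```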
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

Added an interesting link of example phishing scams to previous posts below. The link was provided by Tony.

http://forums.speedguide.net/showpost.p ... tcount=116

http://forums.speedguide.net/showpost.p ... tcount=128
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

Here is an explanation I gave to Tony, which I think is relevant to share with anyone using or interested in using AccuSpam.


To: tonyt@coolpagehelp.com
Subject: Re: Is this from a spammer?
Cc:


Yes it is a spam.

You received it because it had arrived in your mailbox within 3 minutes before you POPed email from your mailbox.

If using the paid version, it would be impossible for you to receive these.

The spammer sent an email from zstom@coolpage.com to contactform@coolpage.com. AccuSpam sent a confirmation to zstom@coolpage.com, because there is no way for AccuSpam to know that zstom@coolpage.com is an alias for same POP mailbox as contactform@coolpage.com. For example, AccuSpam would not know if first@msn.com is same mailbox as second@msn.com.

But AccuSpam has a way to find out. When AccuSpam finds the confirmation email in the POP mailbox, it checks its database and realizes that it received the confirmation in the same POP mailbox it sent it from. AccuSpam then deletes both the confirmation and the original spam.

The only reason AccuSpam did not do this, is because you downloaded the confirmation email below before AccuSpam had a chance to check the mailbox again. So AccuSpam checked the POP mailbox, sent the confirmation, and then waited 3 minutes to check the POP mailbox again. While waiting for 3 minutes, the confirmation came back to same POP mailbox (because zstom@coolpage.com and contactform@coolpage.com are aliases for same POP mailbox). You downloaded the confirmation email below during that 3 minute wait. That is why this form of spam can only be received in the free version and only in the rare case that you happen to hit that 3 minute wait window.

AccuSpam must wait 3 minutes between inspecting your POP mailbox, because if it opened your POP mailbox more frequently than that, then you would be unable to open your own POP mailbox, because it can only be open to one client at a time.

The paid version solves this by using two mailboxes, one that AccuSpam inspects and the other that you POP from. This is called a "proxy". There are other ways we could attempt to do a proxy, but in our analysis they were all inferior to what we chose. For example, putting a proxy on the client computer of the user would not work well because it would not work with WebMail or when user uses other computer, so we chose the dual mailbox proxy for paid version instead.
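The alias check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not AccuSpam's actual code: the `Message` type, the `sent_confirmations` mapping and the `find_looped_confirmations` helper are hypothetical, and it assumes every confirmation carries a unique ID that can be looked up in the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: str           # unique ID of this message within the mailbox
    confirmation_id: str  # ID embedded in a confirmation email, "" if none


def find_looped_confirmations(mailbox, sent_confirmations):
    """Return IDs of messages to delete from one POP mailbox.

    sent_confirmations maps a confirmation ID to the msg_id of the original
    email that triggered it. If a confirmation we sent shows up in the very
    mailbox we are protecting, the "sender" was an alias of that mailbox,
    so both the confirmation and the original are deleted.
    """
    present = {m.msg_id for m in mailbox}
    to_delete = set()
    for m in mailbox:
        original = sent_confirmations.get(m.confirmation_id)
        if original is not None:
            to_delete.add(m.msg_id)
            if original in present:
                to_delete.add(original)
    return to_delete


mailbox = [
    Message("spam-1", ""),
    Message("conf-1", "cnfm_77741"),
    Message("legit-1", ""),
]
sent = {"cnfm_77741": "spam-1"}
doomed = find_looped_confirmations(mailbox, sent)
```

In the free version this check can only run on the next 3 minute poll, which is why a user who POPs inside that window still sees the confirmation.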


At 09:01 PM 8/17/2004 -0400, you wrote:
>Return-Path: <cnfm_77741_HZuJwmIsI9PTY6Y5@accuspam.com>
>Delivered-To: coolpage-3dize:com-support@3dize.com
>X-Envelope-To: support@3dize.com
>Received: (qmail 98099 invoked by uid 3052); 18 Aug 2004 00:17:50 -0000
>Delivered-To: coolpage-coolpage:com-zstom@coolpage.com
>Received: (qmail 98096 invoked by uid 3052); 18 Aug 2004 00:17:50 -0000
>Date: 18 Aug 2004 00:17:50 -0000
>Message-ID: <20040818001750.98095.qmail@qs662.pair.com>
>To: zstom@coolpage.com
>Subject: Received your email: [BuddyN]
>From: "contactform@coolpage.com" <cnfm_77741_HZuJwmIsI9PTY6Y5@accuspam.com>
>Reply-To: cnfm_77741_HZuJwmIsI9PTY6Y5@accuspam.com
>
>
>I [contactform@coolpage.com] received the email from you [zstom@coolpage.com],
>containing the subject above.
>
>If you need me to reply more urgently, simply click Reply
>and send back this entire confirmation email.
>
>
>If you sent the email to [contactform@coolpage.com], the following
>does not apply to you.
>If you did NOT send an email to [contactform@coolpage.com],
>http://AccuSpam.com can help you stop forgery spam.
>
>=============================
>Join free http://AccuSpam.com
>100% spam blocked. 0% of non-spam blocked.
User avatar
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

Added "web.de", so it is now impossible to get forged emails from a web.de email address, the same as was done for hotmail and yahoo.

It is as easy as follows to add forgery blocking to AccuSpam for each free email provider:

Code: Select all

dig -x217.72.192.221

; <<>> DiG 9.2.3rc4 <<>> -x217.72.192.221
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15696
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0

;; QUESTION SECTION:
;221.192.72.217.in-addr.arpa.   IN      PTR

;; ANSWER SECTION:
221.192.72.217.in-addr.arpa. 3353 IN    PTR     fmmailgate01.web.de.

;; AUTHORITY SECTION:
192.72.217.in-addr.arpa. 3353   IN      NS      nsx2.cinetic.de.
192.72.217.in-addr.arpa. 3353   IN      NS      nsx1.cinetic.de.

;; Query time: 2 msec
;; SERVER: 209.68.2.239#53(209.68.2.239)
;; WHEN: Wed Aug 18 09:20:27 2004
;; MSG SIZE  rcvd: 124

mysql> show create table dns;
+-------+--------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                             |
+-------+--------------------------------------------------------------------------------------------------------------------------+
| dns   | CREATE TABLE `dns` (
  `Tld` varchar(127) NOT NULL default '',
  `MajorISP` tinyint(3) unsigned NOT NULL default '1',
  `PTRSupported` tinyint(3) unsigned NOT NULL default '2',
  `PTRRequired` tinyint(3) unsigned NOT NULL default '0',
  `TldNSMatches` varchar(127) NOT NULL default '',
  PRIMARY KEY  (`Tld`)
) TYPE=MyISAM |
+-------+--------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> insert into dns values ('web.de','1','1','1','cinetic.de');
Query OK, 1 row affected (0.00 sec)

I was prompted to prioritize adding "web.de" (in advance of a planned comprehensive addition of all known (10,000+) free email providers), as I noticed the following forged email from a "web.de" address was NOT blocked by our best competitor, http://BrightMail.com:

Code: Select all

Return-Path: <qwceilqxodjbd@online.sh.cn>
Received: from 207.217.125.20 ([211.191.62.186])
	by robin (EarthLink SMTP Server) with SMTP id 1bX4fK48O3NZFjX0
	Tue, 17 Aug 2004 06:43:47 -0700 (PDT)
Received: from dns3.web.de (dns3.web.de [73.212.13.183]) by 211.191.62.186 with SMTP id d7AJB51Jv7;
	 Tue, 17 Aug 2004 18:38:44 +0400
From: "Carmen Shepard" <kzwjxg@web.de>
Reply-To: "Carmen Shepard" <kzwjxg@web.de>
Subject: of 9 but
To: lrtimmons@earthlink.net
Cc: paulzaccardi@earthlink.net, aurora51@earthlink.net, cyndi6@earthlink.net, coolpage@earthlink.net, jusnjodi@earthlink.net
Message-ID: <B84EE85692174DC@web.de>
X-Mailer: crank case 62 curses
Date: Tue, 17 Aug 2004 20:43:44 +0600
Organization: philosopher 870 brides
Mime-Version: 1.0
Content-Type: multipart/alternative;
	boundary="=====250893080900=_"
X-ELNK-AV: 0

Dewey Blair,%RND_SYB ,cretin ,strengthen .%RND_SY Under ground C D !Check Your spouse and staff, Investigates anyone own cREDIT-HISTORY, Govenment don't want me to sell. hacking someone P C !Get a new passport! Disappear in your city very easy! http://acadu.bettersites.info/amite/CD3/ insomniac ,hypothetic , ,pouch ,din ,formidable ,adolphus . kinky ,lack .

Note that a reverse DNS query on the IP address in the first Received: header of the above spam does not return the "web.de" domain or the "cinetic.de" nameserver. This indicates it was not sent through the "web.de" webmail service and is thus (with very high probability) a forged email:

Code: Select all

dig -x211.191.62.186

; <<>> DiG 9.2.3rc4 <<>> -x211.191.62.186
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 54402
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;186.62.191.211.in-addr.arpa.   IN      PTR

;; Query time: 229 msec
;; SERVER: 209.68.2.239#53(209.68.2.239)
;; WHEN: Wed Aug 18 08:48:54 2004
;; MSG SIZE  rcvd: 45

Whereas looking at following email I sent from a "web.de" account I created, notice I used the IP address in first Received: header in the query I used to configure our database (as shown in first Code section above).

Code: Select all

Return-Path: <shelby_moore@web.de>
Received: from fmmailgate01.web.de ([217.72.192.221])
	by sparrow (EarthLink SMTP Server) with ESMTP id 1bXqku7yV3NZFjV0
	for <coolpage@earthlink.net>; Wed, 18 Aug 2004 06:18:14 -0700 (PDT)
Received: by fmmailgate01.web.de (8.12.6/8.12.6/webde Linux 0.7) with SMTP id i7IDHq1d016358; Wed, 18 Aug 2004 15:18:12 +0200
Received: from 203.168.2.77 by freemailng2002.web.de with HTTP;
	Wed, 18 Aug 2004 15:18:07 +0200
Date: Wed, 18 Aug 2004 15:18:07 +0200
Message-Id: <30890719@web.de>
MIME-Version: 1.0
From: "Shelby Moore" <shelby_moore@web.de>
To: coolpage@earthlink.net, shelby@coolpage.com
Subject: coolpage@earthlink.net, shelby@coolpage.com
Precedence: fm-user
Organization: http://freemail.web.de/
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-ELNK-AV: 0

Test
____________________________________________________
Aufnehmen, abschicken, nah sein - So einfach ist 
WEB.DE Video-Mail: http://freemail.web.de/?mc=021200

Enforcing reverse dns on free (webmail exclusive) email providers deletes forged spam that apparently http://BrightMail.com does not block.
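To make the rule concrete, here is a minimal Python sketch of the decision, assuming the PTR hostname and the nameservers from the AUTHORITY section have already been looked up. The function names and the `rules` dictionary are hypothetical; the dictionary only mirrors the Tld and TldNSMatches columns of the dns table shown earlier, and the PTRSupported and PTRRequired flags are left out for brevity.

```python
def domain_matches(hostname, domain):
    """True if hostname equals domain or is a subdomain of it."""
    hostname = hostname.rstrip(".").lower()
    domain = domain.lower()
    return hostname == domain or hostname.endswith("." + domain)


def looks_forged(sender_domain, ptr_hostname, ns_hostnames, rules):
    """Apply a PTR-required rule for a webmail-only provider.

    rules maps a sender domain to the nameserver domain expected for its
    outbound servers. ptr_hostname is None when the reverse lookup
    returned NXDOMAIN.
    """
    expected_ns = rules.get(sender_domain.lower())
    if expected_ns is None:
        return False  # no rule for this domain, so no verdict here
    if ptr_hostname is None or not domain_matches(ptr_hostname, sender_domain):
        return True
    return not any(domain_matches(ns, expected_ns) for ns in ns_hostnames)


rules = {"web.de": "cinetic.de"}
genuine = looks_forged("web.de", "fmmailgate01.web.de.",
                       ["nsx1.cinetic.de.", "nsx2.cinetic.de."], rules)
forged = looks_forged("web.de", None, [], rules)
```

With the two examples from this post, the test email sent through fmmailgate01.web.de passes, while the spam relayed from 211.191.62.186 (NXDOMAIN) is treated as forged.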

I understand it is possible, though non-standard and complex, for some (1 in 10,000?) users of free email to configure an email client (see "Method 2: How to Set Up a New Account that Sends Messages by Using an SMTP Server") so that it does not send over the free email provider's network. But my opinion is that it simply isn't worth receiving all that forged spam from free email domains to insure against that rare chance (1 in a million overall for all email received?). Those rare cases are easily handled by adding those rare users to the Approved Senders list. My assumption rests on the nature of free email providers: they entice users who want webmail and an easy, free solution (not a complex one that requires paid password access to a non-open SMTP relay).
User avatar
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

At 11:55 AM 8/17/2004 -0400, you wrote:
Sirs:

Thank you very much for your free AccuSpam. I do not think I will need anything further as I am not a heavy user of email. I was just so annoyed at the spam mail and resulting pop-ups.

I just need to know if my existing blockers will interfere with your service?

As long as you are receiving the "Twice Daily Summary" emails from "AccuSpam Robot" then you are probably okay.

But if you lose a legit email, you will have to suspect your existing blockers. I noticed you are also using Spam Assassin (or am I mistaken?), which is known to delete legit email sometimes (severity depending on the Spam Assassin threshold set).

I would suggest turning off your existing blockers and seeing if AccuSpam can sufficiently block the spam. If not, then assuming you are receiving the "Twice Daily Summary" emails from "AccuSpam Robot", turn your existing blockers back on. Repeat this test every couple of months, until you are satisfied that AccuSpam is sufficient without your existing blockers. Then leave the existing blockers off.
User avatar
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

Added that free version of AccuSpam is not compatible if you use forwarding to the protected email address:

AccuSpam FAQ Requirements

This is because the reverse dns anti-forgery I added recently must have access to the original Received: headers of the email, which are normally deleted by most forwarding methods.

Most users do not use email forwarding.

Internal ISP forwarding that retains the Received: headers is compatible.

Failure to follow this Requirement can lead to lost email.
User avatar
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

AccuSpam has announced its superior proposal for anti-forgery, called SenderKeys(tm):

This directly competes with DomainKeys from Yahoo, SenderID (aka CallerID) from Microsoft, and SPF from Pobox.com.

AccuSpam had promised this long ago in the debate about .mail TLD at ICANN.
User avatar
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

The rough snapshot estimate of current AccuSpam performance from Tony's account:

Approximately 1300 emails avg. per day processed by AccuSpam over the last month.

Only 40 emails per day in Twice Daily Summary with no probability to be spam, thus (1300 - 80) / 1300 = 94% spam deletion if Bayesian level false positive risk accepted.

Only 66 emails per day in Twice Daily Summary with greater than 99 in 100 probability to be spam, thus (1300 - 122) / 1300 = 91% spam deletion with medium false positive risk.

Only 103 emails per day in Twice Daily Summary total is (1300 - 206) / 1300 = 84% spam deletion with 0% (< 1 in a million) false positive risk.

This shows about 1% improvement from where we were last week. 10 other AccuSpam users are now correlated with Tony, compared with 8 last week. We only have about 100 AccuSpam users. We really need about 1000 for the spam deletion rate in the Daily Summaries to hit Bayesian level without the Bayesian level risk of false positive.
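The percentages above all come from the same formula, which can be checked with a few lines of Python. The function name `deletion_rate` is made up for this sketch; the inputs are the figures already used in the formulas above.

```python
def deletion_rate(processed_per_day, shown_per_day):
    """Percentage of processed email that never reaches the summaries."""
    return 100.0 * (processed_per_day - shown_per_day) / processed_per_day


processed = 1300
# Per-day totals used in the formulas above, one per risk level.
rates = [round(deletion_rate(processed, shown)) for shown in (80, 122, 206)]
```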

-Shelby Moore
http://AccuSpam.com
User avatar
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

We are in the process of implementing anti-spam "honeypots" (aka "spam probes") to reduce the length of Daily Summaries without having to wait for more AccuSpam users.

This should be completed within August hopefully.
User avatar
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

SenderKeys Anti-Forgery proposal drastically improved and now has discussion list:

http://www.accuspam.com/senderkeys.php

It can now optionally be implemented entirely at the MTA (mail server) level without requiring MUA upgrades!

It now depends on (any one of) the 3 major anti-forgery proposals, so it will be seen as less of a threat to them and more complementary.
User avatar
Rainbow
Senior Member
Posts: 2936
Joined: Sun Dec 02, 2001 10:02 am
Location: Pittsburgh

Post by Rainbow »

My Blocked senders list keeps getting bigger and bigger. I think it's time to give AccuSpam a try :)
User avatar
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Example input we are receiving from satisfied AccuSpam users.

Post by accuspam »

Example input we are receiving from satisfied AccuSpam users.

To: "The Old Map Company" <postmaster@oldmapxxxxxx>
Subject: Re: AccuSpam Comments
Cc:


Thanks.

Hope you do not mind if we post your comments to our Forum so others can be aware of the benefits you initially got with AccuSpam.

Actually you will find that AccuSpam will improve over time (we are still improving the algorithms), and eventually you will only get a Daily Summary if you have legit e-mail from previously unknown sender.

The Cool Page button was linked, but we added a link in the text based on your feedback.

Yes you can be sure with AccuSpam that you will never receive any spam that you did not specifically request, except as per the caveats in our FAQ if you are using the free version. If ever you need an absolute 100% insurance, you can upgrade to our paid version when it is available.


At 09:55 AM 9/7/2004 +0100, Steve wrote:
>Hello
>
>Trialing for a few days now and this all looks very promising. There was
>always the odd spam mail that demanded a quick look and the Newsletters
>(with the ad links that also demanded a look) one should have un-subscribed
>from, but never got around to hitting the button. Also I can now let my
>family have access to my mail box in the knowledge they will not be exposed
>to anything unpleasant. AccuSpam must be saving me an hour a day! That's
>more than two weeks a year, or placing a conservative value on my time as
>£10 per hour - £3,650 (US$6,500!) Congratulations.
>
>Steve Robxxxxx
>http://www.rag-dollxxxxxx
>
>PS You have a link missing on your site - To quickly design cool, creative
>web sites, we recommend ?
>Trust it's Coolpage!
User avatar
Paft
SG Elite
Posts: 5785
Joined: Tue Feb 20, 2001 12:00 am
Location: Richmond VA

Post by Paft »

accuspam wrote:SenderKeys Anti-Forgery proposal drastically improved and now has discussion list:

http://www.accuspam.com/senderkeys.php

It can now optionally be implemented entirely at the MTA (mail server) level without requiring MUA upgrades!

It now depends on (any one of) the 3 major anti-forgery proposals, so it will be seen as less of a threat to them and more complementary.
This seems like the private/public key model that PGP and GPG use already for encryption, tied in with email.

COOL.
So trade that typical for something colorful, and if it's crazy live a little crazy!
shikaza

shikaza

Post by shikaza »

shikaza is her :irate:
User avatar
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Major Algorithm Update

Post by accuspam »

A superior underlying algorithm for AccuSpam will be released probably today. Nothing will need to change in the user interface of AccuSpam at this time.

The new algorithm correlates (among all users in a safe manner) on highly recursive content fragments instead of domain of sender, making it less susceptible to error from excessive email forgery of a domain, and more accurate against ISPs (domains) which send both spam and non-spam.

This algorithm also effectively increases the statistical reach of AccuSpam's user count, because spam content fragments cross-correlate more often than domain of sender of spam.

Unlike the very popular Bayesian statistics for anti-spam (e.g. used in Spam Assassin, which is used by many ISPs), this algorithm continually re-trains itself. It will not generate a false positive (delete non-spam) or false negative (fail to block spam) when YOUR current non-spam or spam suddenly has a shift in content that (in terms of Bayesian statistics) resembles YOUR past spam or non-spam respectively. The risks of Bayesian were detailed further in past posts:

http://forums.speedguide.net/showpost.p ... tcount=111

http://forums.speedguide.net/showpost.p ... tcount=126
User avatar
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Post by accuspam »

A reply we sent to a customer today:
Hi,
1
Your promo on the home page says you just sign up and carry on as before. How can that be when an approved sender list must be compiled?

A single Daily Summary email is sent to you (automatically by our robot) with a COMPACT list of temporarily blocked emails (those that we were not sure if spam or not) and you can reply to that email back to our robot with an "A" in the [ ] box next to each sender you want to receive email from.

Once a sender is added to your Approved Senders list (by any method, i.e. directly or replying to Daily Summary, other method in future by login, etc..), then you always get email from that sender immediately in your Inbox and the following does not apply to that sender any more.
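As a rough illustration of how a replied Daily Summary could be read by a robot, here is a Python sketch. The line layout (a bracketed box followed by the sender address) is an assumption made for this example, since the real summary format is not shown in this thread, and `approved_from_reply` is a hypothetical helper.

```python
import re

# Hypothetical summary line: a one-character box, then the sender address.
_LINE = re.compile(r"^\s*>?\s*\[\s*([A-Za-z]?)\s*\]\s+(\S+@\S+)")


def approved_from_reply(reply_text):
    """Return senders whose box was marked with an "A" in a replied summary."""
    approved = []
    for line in reply_text.splitlines():
        match = _LINE.match(line)
        if match and match.group(1).upper() == "A":
            approved.append(match.group(2).lower())
    return approved


reply = """\
> [A] friend@example.org   Lunch on Friday?
> [ ] deals@example.net    Save moolah now
> [a] Client@Example.com   Question about your product
"""
senders = approved_from_reply(reply)
```

Senders returned here would be added to the Approved Senders list; everything left unmarked stays blocked.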

2
a) A friend of mine gives my address to another friend who emails me.

b) A person sees my address on a business card and emails me showing an interest in my product.

c) A person sees my address on a business card and emails me with info about his product, relevant to my industry.

All three above are unsolicited, all from addresses not seen before by the anti-spam software, and yet are welcome.

They will all appear in your Daily Summary email. And as our usership grows, less and less spam appears in the Daily Summary. Sometime this year, all you will see in the Daily Summary are new senders. In that future scenario (where our underlying statistical spam detection is 99.99%), on days when you get no new senders you will not get a Daily Summary email.

See our announcement of superior statistical algorithm yesterday:

http://forums.speedguide.net/showpost.p ... tcount=154

Without intervention by the user, no antispam system could possibly know which of the above emails are welcome and which are not.

Not true. Our underlying statistical algorithm is able to know this; we just do not have enough users yet (we have 1184 users as of today) to detect 99.99% of spam statistically. Once we have 10,000+ users, there will be an option for paid users to turn off the Daily Summary and allow new senders directly into the Inbox. However, note that we need to know the Approved Senders in order to drive our statistical algorithm. In that future scenario, we will be able to auto-populate the Approved Senders by seeing that you have received email from a new sender more than once and have not chosen to block the sender. In that scenario, without a Daily Summary, you will log in to AccuSpam.com to report any spam received in your Inbox. But we won't enable such an option until our underlying spam detection is 99.99%.
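The auto-populate rule mentioned above could look something like the following Python sketch. It describes a planned feature, so everything here is an assumption: the `auto_approve` helper is hypothetical, and the threshold of two emails is just one reading of "more than once".

```python
from collections import Counter


def auto_approve(received_from, blocked, approved, min_count=2):
    """Return new senders seen at least min_count times and never blocked.

    min_count=2 reflects "more than once" in the text; the exact threshold
    is an assumption.
    """
    counts = Counter(s.lower() for s in received_from)
    return {
        sender for sender, n in counts.items()
        if n >= min_count and sender not in blocked and sender not in approved
    }


inbox = ["a@example.com", "b@example.com", "a@example.com",
         "c@example.com", "c@example.com"]
new_approved = auto_approve(inbox, blocked={"c@example.com"}, approved=set())
```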

Right now AccuSpam is 100% because it blocks everything that is not from an Approved Sender and then compiles it into a Daily Summary. About 50% of incoming spam is detected (40+% by detecting nonexistent sender and 10% by statistics) and not included in the Daily Summary. The nonexistent sender and statistical algorithm is mathematically certain to never delete a non-spam more frequently than once in a million spams. In other words, the false positive accuracy is always 99.9999%. The 99.99% accuracy we are aiming for is to increase the statistical spam detection from current 10% to 99.99%.

If this point is agreed with then how would Accuspam be different to, say, Mailwasher where mail received from an address not yet seen by Mailwasher must be viewed by the user before being manually blacklisted.

1. AccuSpam is currently deleting 50% of spam automatically before the Daily Summary with 99.9999% accuracy. 10% is being done by correlating spam and non-spam content among all users, and this detection will increase to 99.99% this year as our usership grows. The exact algorithm is secret and includes some "magic" (math) which will hopefully be patented soon. It is quite different from Bayesian algorithm that many anti-spam products use, with some distinct advantages.

(Note that the previous statistical algorithm, which correlated sender domain, was not achieving the desired 99.9999% accuracy because many ISPs' domains are used to send spam as well as non-spam. This wasn't a big problem, because the statistical sender domain correlation was only affecting (detecting as spam) 10% or less of incoming email and still with a very high accuracy. However, we have fixed this with the announcement mentioned above. It would have become a bigger problem as our usership increased, and now we have a very accurate statistical algorithm to build on as usership grows.)

2. 100% spam protection is guaranteed by the Daily Summary email, which is a much more COMPACT and SAFE way of reviewing suspect email not caught by the underlying algorithm than MailWasher, which downloads all the spam and viruses to your computer BEFORE you are shown them and make a choice whether to blacklist or receive them.

3. MailWasher is not correlating spam statistics with other users and has no underlying statistical way to detect spam automatically. Some products (maybe Mailwasher) will attempt to correlate only YOUR spam stats to detect spam (Bayesian), and the drawbacks of Bayesian are discussed in the link to the AccuSpam Forum I gave above.

4. MailWasher only protects email you download to your computer. AccuSpam protects your mailbox, no matter where or how you access it, e.g. using WebMail or from other computer.

5. You have to download and install MailWasher (and learn to use with each mail program you use) to every computer you want to protect. With AccuSpam, just signup your mailbox online in 1 minute and you are done.

6. There are sure to be technical compatibility issues for some computers and some mail programs when using a program such as MailWasher, which runs on the computer you are using. AccuSpam runs on our server and communicates with your mailbox on your ISP's server via the standard POP3 protocol, so compatibility issues are very rare and any that exist are discovered when you attempt to sign up. If your ISP's POP3 mailbox server is not compatible, you won't be able to sign up for AccuSpam (we do numerous POP3 compatibility checks at signup). You won't have nasty problems later. And such incompatibility is extremely rare, because POP3 is a nearly universal standard for email mailbox delivery.

7. If your computer crashes or gets a virus, your anti-spam does not crash or get compromised with AccuSpam.

And a promotional email may be sent out in bulk to subscribers, with a few extra to relevant (as above) industry participants, so if software scrubs it just because a lot of others like it have also been sent out, it will fail the user again.

Our new underlying statistical algorithm will not scrub a desirable bulk email (newsletter, etc.), because some of our users will have the sender of the desirable bulk email on their Approved Senders list and this will tell our algorithm that the content of that bulk email is not spam.

Again that is why I said we need the Approved Senders list to feed our statistical correlation algorithm. And again I said we can eventually get rid of the Daily Summary, once we reach critical mass of usership. In the meantime, it works very well, which is why we have a growing usership.
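The exact algorithm is secret, so the following Python sketch is NOT it. It is only a toy illustration of the general idea described in this reply: content fragments that reach many users count as bulk, and a fragment is cleared if any user received it from one of their Approved Senders. The fragment size, the threshold and both function names are assumptions made up for this example.

```python
from collections import defaultdict


def fragments(text, size=4):
    """Split text into overlapping word runs (shingles) of `size` words."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def bulk_unsolicited(messages, min_recipients=3):
    """Return fragments seen by many recipients and vouched for by none.

    messages is an iterable of (recipient, sender_is_approved, body).
    A fragment is cleared as soon as any recipient received it from a
    sender on that recipient's Approved Senders list.
    """
    seen_by = defaultdict(set)
    vouched = set()
    for recipient, sender_is_approved, body in messages:
        for frag in fragments(body):
            seen_by[frag].add(recipient)
            if sender_is_approved:
                vouched.add(frag)
    return {f for f, who in seen_by.items()
            if len(who) >= min_recipients and f not in vouched}


spam = "save moolah on cheap pills today"
news = "your weekly gardening newsletter has arrived"
messages = [
    ("u1", False, spam), ("u2", False, spam), ("u3", False, spam),
    ("u1", True, news), ("u2", False, news), ("u3", False, news),
]
flagged = bulk_unsolicited(messages)
```

In this toy data the newsletter reaches just as many users as the spam, but one approval from u1 is enough to keep its fragments from being flagged.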

We will post our reply to our Forum for the benefit of the public knowledge.
User avatar
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Major Algorithm Update

Post by accuspam »

The following descriptions of AccuSpam's algorithms are not a license, nor a public grant of any rights. AccuSpam reserves all rights. A patent will be filed on these algorithms.

The Major Algorithm Update is working as expected. The amount of spam summarized in the Daily Summary is drastically reduced, because most spam is being recognized and deleted (safely, with less than a 1 in a million (0.0001%) risk of losing non-spam) by this new statistical algorithm, which we will call "Chunk".

The "Chunk" algorithm has many benefits as compared to the per-user Bayesian algorithm used by most all other anti-spam (e.g. Spam Assassin uses Bayesian and is used by many ISPs):


(1) Analyzes data from all users

(2) Automatically trains itself in real-time

(3) Only needs to be told what *some* non-spam is (does not need to have every incoming email trained on). We get the non-spam data from users' whitelists.

(4) Automatically recognizes new strains of spam in real-time (does not have to be trained on new spam), e.g. "Viagra" changed to "Ciali$". Can not be fooled by changes in spam (randomization) between spam runs.

(5) Automatically recognizes new strains of non-spam (new to one user, but not new to all users). In other words, it doesn't get confused if you contact an insurance company for a quote, but you have classified insurance emails as spam in the past.

(6) Detects much higher rate of spam, with a much lower rate of false positives. The false positive rate can be set in the probability calculations of the algorithm (e.g. 1 in million is 0.0001%), compared to 0.03% (1 in 3333) for Bayesian. So Bayesian will lose a legit email every 3333 emails received, whereas AccuSpam will never (1 in million) lose non-spam:

http://www.paulgraham.com/better.html

http://citeseer.nj.nec.com/androutsopou ... rning.html
(See Page 9 of the PDF linked at top)


(7) 100% immune to users who misclassify non-spam as spam.

(8) Is more immune than Bayesian to users who misclassify spam as non-spam. Also, we monitor users to discover spammers who sign up for AccuSpam to approve their own spam. Besides, getting spam past the statistical algorithm in AccuSpam is useless, because it goes into the Daily Summary, where it is still blocked and the body of the email is never read by users.

(9) Uses an *EXACT* probability calculation. Bayesian counts statistical evidence, then uses an ad hoc (inexact/guess) calculation to sum/weight those evidence probabilities. Bayesian's ad hoc calculation makes many assumptions (e.g. that spam and non-spam never share the same corpus and are mutually independent), and when these assumptions fail, Bayesian makes mistakes (false positives). Fundamentally, Bayesian samples only one user, and more importantly it samples spam over history while spam changes in real-time, so it suffers aliasing (sampling below Nyquist of some event) errors. Some heuristics (guesses) have been developed to combat Bayesian's fundamental aliasing, but guesses+guesses is still not exact or 100% reliable. Bayesian works reasonably well for users whose non-spam and spam rarely change. Our "Chunk" works for all users, all the time.

(10) To understand more the strategic differences between the very popular Bayesian anti-spam algorithm and AccuSpam's new "Chunk" algorithm ("spam genetic correlator"), look at the following links to ideas our developer wrote years ago:

Semantics of spam is UBE

Spam that learns to not be statistically identified?

http://www.mail-archive.com/ietf@ietf.org/msg12814.html

Bayesian does not measure the real-time bulkness of spam. The "Chunk" algorithm was born from a conceptual paradigm that starts with the semantics of spam. Spam is UBE (unsolicited bulk email), which means the vast majority of bulk email. So filter on what is statistically sent in bulk, then the receiver can white list those few sources of BE (bulk email) that is not unsolicited. The current "Chunk" algorithm improves this further by using the data from many receivers to determine which BE is definitely unsolicited and should be deleted without showing in Daily Summary. Thus legit BE is not deleted.

By filtering spam at the semantic level (BE), spam has to change semantically to get through. But if spam stops being BE, then it isn't spam any more. Contrast this with the Bayesian algorithm. Bayesian filters spam by assuming the semantics of spam is the word distribution of the message. That is not what makes spam a problem. Most of us get a lot of non-spam which has annoying word distributions. :rotfl:

Bayesian encourages spam content to change in one of three ways: a) its annoying words change often, e.g. "Viagra" changes to "Ciali$", which is why spam is becoming more annoying, as it isn't even readable most of the time; b) its word distributions become very similar to normal email, and we do now see some spams which are not quickly discernible as being spam; or c) it uses fewer words (and often more images instead), and we are seeing an increasing number of these.

So although "Chunk" is also correlating on spam content, the semantics it correlates are the UBE factor of an email. This is a big difference from Bayesian which correlates the word distributions of email histories.

(11) Paul Graham, the "creator" of Bayesian for anti-spam, asks a question which he does not adequately answer, "So why did we get such different numbers?" when comparing his horrible false positive rate (0.03% is 1 in 3333 non-spams lost) to Pantel and Lin's 1.16% (is 1 in 86 non-spams lost).

http://www.paulgraham.com/better.html

The answer is that Bayesian is only sampling the side effects of spam, and not sampling the semantics of spam, which is that spam is BE. Bayesian can not measure the bulkness in real-time.

The differing error rates are a well known phenomenon in science called "aliasing". It occurs when the sample rate (frequency) is too low, especially when it is lower than 2x the frequency of the signal (Nyquist).

So different people will get totally different error rates with Bayesian, and it will be very unpredictable. As stated, if a recipient's non-spam and spam do not change much over time, then performance of Bayesian is adequate (because then it is approximating a bulk measurement). But in reality, non-spam and spam change quite frequently! For some people the change rate may be 1 in 3333 and others it may be 1 in 86 or even worse! Our "Chunk" algorithm does not have this variable aliasing error rate problem!
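For reference, the "1 in N" figures quoted in this post follow directly from the percentages. A small Python check (the `one_in` helper is made up for this sketch):

```python
def one_in(percent):
    """Convert a false positive rate in percent to "1 in N" odds."""
    return round(100.0 / percent)


# The three rates quoted above: the "Chunk" setting, Graham, Pantel and Lin.
odds = [one_in(p) for p in (0.0001, 0.03, 1.16)]
```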
accuspam
Member
Posts: 95
Joined: Mon Mar 08, 2004 4:32 am

Emails Sent to Paul Graham (aka "creator" of Bayesian anti-spam)

Post by accuspam »

Subject: Semantics of Bayesian leads to varied false positive rates

Hi Paul Graham (aka "creator" of Bayesian),

You might remember me as perhaps the first to warn you about possible bad effects of Bayesian in Dec. 2002:

http://ixazon.dynip.com/pipermail/nilsi ... 00041.html

In item #11 at the web page linked below, I have answered the more important question you posed but did not adequately answer on your web page, http://www.paulgraham.com/better.html:

http://forums.speedguide.net/showpost.p ... tcount=156



==================================================
Subject: Spammers can subvert your Filters Fight Back idea

Hi Paul Graham (aka "creator" of Bayesian),

Spammers can subvert your Filters Fight Back idea:

http://www.paulgraham.com/ffb.html

They simply customize the link with a randomly chosen password (of sufficient length) for each recipient and ignore invalid links. They can then detect which recipients are hammering them and ignore their http requests (but not necessarily stop sending them spam).

And one of the links in the email could be an unsubscribe link that causes HTTP requests to be ignored, but does not unsubscribe the recipient from the mailing list.

I can see this incorporated into popular spamware programs that spammers use to create and send spam.

Not to mention that protecting against the bad side effects of this idea would be extremely complex in the real world.

Also, correlating the content of the linked web sites as you suggest is unnecessary overhead (not needed in our "Chunk" algorithm).



==================================================
Subject: You do not account for the training cost of Bayesian

Hi Paul Graham (aka "creator" of Bayesian),

In your web page below, "So Far So Good", you fail to mention that if spammers conceal their bad tokens by changing them frequently, then these spams do get past the Bayesian filter and impose a continual training cost on the recipient. And we observe that spammers are doing that all the time now. Eventually spammers could learn how to randomize their bad tokens (including URLs, email addresses, and headers) sufficiently to avoid Bayesian all the time. But IMO they won't have to, because too many recipients are not satisfied with the false positive rates of Bayesian and effectively turn off the filter often (even if just by browsing their spam folder in the normal course of using their Bayesian anti-spam).

http://www.paulgraham.com/sofar.html

Conversely, changes in good tokens have to be trained as well, especially when they change to overlap pre-existing bad tokens.

These are unavoidable consequences of the incorrect semantics of Bayesian for spam filtering.

Kind Regards,
-Shelby Moore
http://AccuSpam.com