Depending on how a filter tests and scores a message, the filter might be able to determine the type of spam. By ordering and weighting the tests, you can place more or less importance on certain message characteristics. Early in the evaluation, tests might check for HTML formatting, whether the sender is on a blacklist, and whether any URLs match those in a known spam database. The cumulative score of these tests might be enough to classify the message as spam. Further tests (e.g., looking for vulgar words) could then classify the message as offensive spam.
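As a rough illustration of ordered, weighted tests, the following Python sketch adds up a cumulative score from a few early checks and runs a further test only on messages that already qualify as spam. Every name, address, URL, weight, and threshold here is a hypothetical value chosen for the example; real filters use their own tests and scoring schemes.

```python
import re
from dataclasses import dataclass

# All addresses, URLs, words, weights, and the threshold below are
# hypothetical values chosen for illustration only.
BLACKLIST = {"bulk@mailer.example"}
KNOWN_SPAM_URLS = {"http://deals.example/buy-now"}
VULGAR_WORDS = {"vulgarword"}  # placeholder for a real word list
SPAM_THRESHOLD = 5.0


@dataclass
class Message:
    sender: str
    body: str
    is_html: bool = False


def uses_html(msg):
    return msg.is_html


def sender_blacklisted(msg):
    return msg.sender.lower() in BLACKLIST


def has_known_spam_url(msg):
    return any(url in msg.body for url in KNOWN_SPAM_URLS)


def has_vulgar_words(msg):
    words = set(re.findall(r"[a-z']+", msg.body.lower()))
    return bool(words & VULGAR_WORDS)


# Early tests in evaluation order; a larger weight gives a
# characteristic more importance in the cumulative score.
EARLY_TESTS = [
    (uses_html, 1.0),
    (sender_blacklisted, 5.0),
    (has_known_spam_url, 4.0),
]


def classify(msg):
    """Return (score, label) for a message."""
    score = sum(weight for test, weight in EARLY_TESTS if test(msg))
    if score < SPAM_THRESHOLD:
        return score, "not spam"
    # Further tests refine the type of spam.
    if has_vulgar_words(msg):
        return score, "offensive spam"
    return score, "spam"


if __name__ == "__main__":
    msg = Message("someone@unknown.example",
                  "Visit http://deals.example/buy-now today", is_html=True)
    print(classify(msg))
```

In this sketch, HTML formatting alone is not enough to reach the threshold, but HTML combined with a known spam URL is. Changing the weights changes which combinations of characteristics cross the line.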
No filter can completely eliminate false positives because some legitimate messages will have enough spamlike attributes to earn a spam classification. You can mitigate the risk of false positives by tuning the filter rules to account for your organization's message profiles. For example, a pharmaceutical company might need to configure tests so that the filter doesn't look at drug names or to ensure that drug names don't contribute significantly to the overall spam score.
Another false-positive concern is the desire to receive messages from specific groups. Some organizations might be partnered with companies that use direct-email marketing, and these organizations might want or even need to receive messages that the rest of the world might consider spam. In these cases, you can create a whitelist of approved sender and system addresses. A whitelist tells the spam filter to pass the message unchecked because you don't consider the sender a spammer.
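A whitelist check of this kind might look like the following Python sketch. The addresses, the domain entry, the threshold, and the function names are all assumptions made for illustration; matching on a whole domain is one possible way to cover "system addresses" and is not something every product supports.

```python
# Hypothetical whitelist entries; a real deployment would load these
# from the filter's configuration.
WHITELIST_ADDRESSES = {"offers@partner.example"}
WHITELIST_DOMAINS = {"newsletters.example"}
SPAM_THRESHOLD = 5.0  # assumed score at which a message is tagged


def is_whitelisted(sender):
    """Return True if the sender address or its domain is approved."""
    sender = sender.strip().lower()
    domain = sender.rpartition("@")[2]
    return sender in WHITELIST_ADDRESSES or domain in WHITELIST_DOMAINS


def disposition(sender, run_spam_tests):
    """Decide what to do with a message.

    run_spam_tests is a callable that returns the message's spam score.
    It is never called for whitelisted senders, so those messages pass
    through unchecked.
    """
    if is_whitelisted(sender):
        return "deliver"
    score = run_spam_tests()
    return "tag" if score >= SPAM_THRESHOLD else "deliver"


if __name__ == "__main__":
    # Even with a high score, the whitelisted partner is delivered.
    print(disposition("offers@partner.example", lambda: 9.0))
    print(disposition("stranger@unknown.example", lambda: 9.0))
```

Because the whitelist check runs first, an approved sender's message is delivered no matter how spamlike its content is, which is exactly why whitelist entries deserve careful review.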
Organizations that implement antispam mechanisms need to use a combination of whitelists and filter tuning to reduce the number of false positives that they see. In my experience, most of the messages that are incorrectly classified as spam are newsletters, bulletins, or newsgroup posts. These messages often end up incorrectly classified because they have attributes such as HTML formatting or advertisements, and in some cases, are sent by the same software that spammers use. However, you can easily identify the sources of these messages and add them to a whitelist.
Another misconception about false positives is that the spam filter deletes these messages. All spam-filtering software that merits use provides you with at least three disposition options, as Table 1 shows. Except for messages that rate a high spam score or those that are rejected because of a blacklist, most organizations don't delete email (at least not initially).
As with any new system, you should conduct a pilot implementation before you make a production deployment. During the pilot, perform tuning and build most of your whitelist entries. After the pilot, use a tag-and-deliver option for your production deployment. This method doesn't eliminate the spam from users' mailboxes, but depending on your implementation, it can make identifying spam easier.
Tagging typically adds a text prefix, such as "SPAM:", to a message's subject line (e.g., a subject of "Very good news" becomes "SPAM: Very good news"). Then, users can use Outlook to configure rules to perform some action when a tagged message is found in the Inbox. As I explain in the sidebar "Using Rules to Handle Spam," the most common action is to move the message to a separate folder. Tagging the subject line also lets you easily see which messages are spam even if you're not using a rule to move them into another folder, which is a significant plus if you're using a client or system that doesn't have rules capability.
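To make the tagging step concrete, here is a minimal Python sketch that uses the standard library's email package to prefix a subject line. The function name and the exact tag text are assumptions for illustration; an actual antispam product applies its tag inside the mail transport, not in a script like this.

```python
from email.message import EmailMessage

TAG = "SPAM: "  # assumed tag text; products let you configure this


def tag_subject(msg, tag=TAG):
    """Prefix the subject with the tag unless it is already tagged."""
    subject = msg.get("Subject", "")
    if not subject.startswith(tag):
        # Remove the old header first so the message keeps a single
        # Subject header.
        del msg["Subject"]
        msg["Subject"] = tag + subject
    return msg


if __name__ == "__main__":
    msg = EmailMessage()
    msg["Subject"] = "Very good news"
    tag_subject(msg)
    print(msg["Subject"])  # prints "SPAM: Very good news"
```

The check for an existing tag keeps a message that passes through the filter twice from ending up with a doubled prefix.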
Although the tag-and-deliver option essentially defeats what you expect a spam filter to do (get the spam out of your users' mailboxes), at least users can easily move the suspect messages out of their Inbox to a junk mail folder. When moved to a junk mail folder, the messages are no longer intermingled with other email, which helps reduce the risk that users will miss an important message. Don't forget, though, that everyone needs to review (and empty) their junk mail folders periodically to make sure the folders contain no false positives. Later, when people become comfortable that the filters aren't flagging vital messages as spam, they can configure rules to delete tagged messages instead of moving them into a junk mail folder. When the organization as a whole becomes comfortable with the effects and benefits of spam filtering, IT administrators usually receive permission to start deleting email with a high spam score instead of delivering it.
4. Putting SPAM on our subject line causes too many problems.
Some people are concerned about using filters to prefix certain messages' subject lines with the word SPAM. The primary concern is that the filter will tag a legitimate message, and someone will reply or forward the message with the prefix intact. Leaving the word SPAM in the subject can have two negative consequences. First, if you add SPAM to the subject and reply to the message sender, that person might be offended that you flagged the message as spam. Second, in a reply or forwarded message, the recipient might see the prefix and treat the message as spam, possibly deleting it.
Regarding the first concern, my opinion is that senders probably want to know that their messages are getting flagged as spam so that they can take steps to correct whatever is triggering the filter. If this is a concern in your situation, you have other options. First, you can use a different prefix, such as _SUSPECT, _ADVERT, or _ADVERTISEMENT. These words don't evoke the same feeling as SPAM and might be easier on the egos of the senders while still conveying that the message needs some extra care. The underscores prefixing the words help to ensure that tagged subject lines are distinguished from legitimate subject lines that happen to contain these words. For example, a spam-handling rule would ignore the subject line "Suspect seen at 10:30" but would process a message with the subject line "_SUSPECT: Guaranteed Millions".
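The distinction between the two subject lines can be sketched as a simple rule in Python. The folder names and the trailing colon on each prefix are assumptions for illustration; a mail client's rules engine would express the same test in its own way.

```python
# Assumed tag prefixes, each followed by a colon as in the example
# subject line above.
TAG_PREFIXES = ("_SUSPECT:", "_ADVERT:", "_ADVERTISEMENT:")


def target_folder(subject):
    """Return the folder a message should land in, based on its subject.

    The match is case-sensitive and anchored at the start of the
    subject, so ordinary subjects that merely contain the word
    "suspect" are left alone.
    """
    if subject.startswith(TAG_PREFIXES):
        return "Junk Mail"
    return "Inbox"


if __name__ == "__main__":
    print(target_folder("Suspect seen at 10:30"))          # prints "Inbox"
    print(target_folder("_SUSPECT: Guaranteed Millions"))  # prints "Junk Mail"
```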
As for the second concern, yes, you might lose legitimate messages because of the word SPAM in the subject line. One way around this dilemma is to tag spam by inserting a reference into the message's SMTP header instead of prefixing the message subject. Some spam-filtering packages can insert a header extension field or X-header tag into a message as the message moves through an SMTP transport.
Figure 1 shows an example of some X-header tags that you might find in your messages even if you aren't running an antispam package. This example shows two X-header tags that Exchange 2000 Server uses to request a return receipt or to flag a message as important. Some antispam packages use X-headers in a similar way. When such a program scores a message, the program can write a notation in the header. For example, it might insert an X-Spam-79% tag or an X-Spam-Offensive tag to specify the message's spam score or classification. Rules can evaluate these header tags and act on the message. With this method of tagging, rules can handle spam, but the risk of a spam tag propagating through forwarded messages is eliminated.
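The following Python sketch shows the general idea of header tagging and of a rule that reads the tag. The header names X-Spam-Score and X-Spam-Offensive, the percentage format, the threshold, and the folder names are assumptions for illustration; each antispam package defines its own headers, so check your product's documentation for the real names.

```python
from email.message import EmailMessage


def tag_headers(msg, score, offensive=False):
    """Record the spam score (and classification) in X-headers.

    The subject line is left untouched.
    """
    msg["X-Spam-Score"] = f"{score}%"  # hypothetical header name
    if offensive:
        msg["X-Spam-Offensive"] = "yes"  # hypothetical header name
    return msg


def rule_folder(msg, threshold=75):
    """Pick a folder by reading the X-headers, as a client rule might."""
    if msg.get("X-Spam-Offensive") is not None:
        return "Junk Mail/Offensive"
    raw = msg.get("X-Spam-Score")
    if raw is None:
        return "Inbox"  # message was never scored
    score = int(str(raw).strip().rstrip("%"))
    return "Junk Mail" if score >= threshold else "Inbox"


if __name__ == "__main__":
    msg = EmailMessage()
    msg["Subject"] = "Very good news"
    tag_headers(msg, 79)
    print(rule_folder(msg))  # prints "Junk Mail"
    print(msg["Subject"])    # prints "Very good news"
```

Because the subject is unchanged, a reply or forward carries no visible spam marker to the next recipient.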
Header tagging can also be useful when you're piloting antispam packages. Because message headers are usually hidden from view, only pilot participants would know about the tags and be able to use them to process spam. After the pilot project is over, you can let everyone know that header tags exist so that they too can start using the tags in their rules.
5. Spam isn't a threat.
Many people don't view spam in the same way they do virus-infected email; they consider spam simply a nuisance and don't realize its harmful side effects. Spammers are constantly changing tactics to deliver their junk mail. One such tactic is called a dictionary attack. The idea is that a spammer picks a domain for which he or she has an idea of the naming convention (e.g., firstname.lastname). The spammer uses a dictionary of first and last names to build email addresses using every combination of names and then sends messages to those addresses. The mail server will reject most of the combinations, but will accept some. This type of spamming places a high burden on Message Transfer Agents (MTAs) and directories. The receiving server must accept the mail, check the directory, and generate the nondelivery reports (NDRs) for the nonexistent accounts. Depending on how many thousands of messages the server receives, this processing can cause a significant delay in sending or receiving legitimate email. In addition, spammers use these dictionary attacks to build a directory of the names in your organization (aka directory harvesting). The messages arrive as spam, but the spammer keeps track of which messages are returned as undeliverable. The spammer can use this information later for social engineering (i.e., using a piece of information to convince someone that you're connected in some way to an organization) or other attacks against your systems.
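To get a feel for the load involved, the following Python sketch estimates how many directory lookups and NDRs a dictionary attack would cause against a small, made-up directory. The names, domain, and directory contents are hypothetical; the point is only that the number of candidate addresses grows as the product of the two name lists.

```python
from itertools import product


def candidate_addresses(first_names, last_names, domain):
    """Build every firstname.lastname combination for a domain."""
    return [f"{first}.{last}@{domain}"
            for first, last in product(first_names, last_names)]


def estimate_load(candidates, directory):
    """Return (lookups, accepted, ndrs) for a batch of candidates.

    Each candidate costs the server one directory lookup; each miss
    also costs one nondelivery report.
    """
    accepted = sum(1 for address in candidates if address in directory)
    return len(candidates), accepted, len(candidates) - accepted


if __name__ == "__main__":
    # Hypothetical name lists and directory for illustration.
    firsts = ["ann", "bob", "carol"]
    lasts = ["jones", "lee", "smith"]
    directory = {"ann.lee@corp.example", "bob.smith@corp.example"}

    candidates = candidate_addresses(firsts, lasts, "corp.example")
    print(estimate_load(candidates, directory))  # prints (9, 2, 7)
```

With realistic lists of, say, a thousand first names and a thousand last names, the same arithmetic yields a million lookups, nearly all of which end in an NDR.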
Another threat from spam is its connection to viruses. One of the intended results of the recent SoBig worm, for example, was a platform that people could use to send spam. In addition, many spam messages contain links to Web sites for products or links to unsubscribe, and these links can provide an entry point for a virus.
And if the technology reasons aren't enough to convince organizations to implement an antispam solution, they should consider the lawsuits and hostile-workplace complaints users are filing because they're receiving offensive spam. Some people feel that companies aren't doing enough to stop these messages from arriving and feel compelled to file a complaint. But it isn't just the individual complaints that you need to think about. You need to consider how these suits might affect your organization's reputation.
The Days of Simple Email Are Gone
Today, email isn't just for casual communication; it's a vital part of most organizations' day-to-day operations and business processes. The people who make decisions for an organization know this and need to take it into consideration when addressing the spam problem. But these people also need real facts about spam and antispam solutions. Antispam technology has its risks, but you can mitigate those risks by providing information and best practices. The payoff will be worth the effort because spam is definitely more than just a nuisance.