Assessing Email Harvester Countermeasures

Every solution has trade-offs. Some solutions have trade-offs so negative that they remove themselves from contention.

Think of it like this:

Step 1: What assets are you trying to protect?

  • The e-mail inboxes of people in my community

Step 2: What are the risks to these assets?

  • Spam in the inboxes.

Methods spammers use to gather addresses:

  1. Harvesting e-mail addresses from web pages
  2. Harvesting from search engines
  3. Harvesting from whois databases
  4. Harvesting from newsgroups and bulletin boards
  5. Web forms (like formmail.cgi)
  6. LDAP siphoning
  7. List purchasing from unscrupulous web sites
  8. Lists of leads generated from their own sites
  9. Dynamically generated addresses from a dictionary or based on an organizational nomenclature
  10. etc, etc, etc.

Step 3: How well does the countermeasure mitigate the risks to the assets?

For the purpose of this discussion I will focus on the web-based attack vectors for gathering addresses and the capability of harvesting applications.

Harvesting applications can read pages and gather e-mail addresses whether they appear as plain text or in mailto links. They can also undo common obfuscations such as larryNOSPAM@example.com. High-end harvesters can handle JavaScript, which allows them to harvest from the rendered page. Many sophisticated spammers (not necessarily harvesters) can guess addresses from a listing of people's names within an organization. For instance, if the nomenclature of the organization is first initial and last name, then it would be simple to deduce that Bilbo Baggins's address would be bbaggins@example.com. This attack vector is beyond the scope of this message.
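
To get a feel for what even a simple harvester sees, you can audit your own pages. Below is a minimal Python sketch, not a model of any particular harvesting tool. The regular expression is deliberately loose, and the list of munging tokens and the name find_harvestable are my own assumptions for illustration.

```python
import re
from typing import Set

# Loose pattern for strings that look like e-mail addresses. Real
# harvesters are probably more thorough; this is only an illustration.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hand-inserted "munging" tokens that are trivial to strip (assumed list).
MUNGE_TOKENS = ("NOSPAM", "REMOVETHIS")


def find_harvestable(html: str) -> Set[str]:
    """Return the addresses a simple harvester could pull from raw HTML."""
    found = set()
    for match in ADDRESS_RE.findall(html):
        cleaned = match
        for token in MUNGE_TOKENS:
            cleaned = cleaned.replace(token, "")
        found.add(cleaned.lower())
    return found


page = """
<p>Contact <a href="mailto:frodo@example.com">Frodo</a> or
write to larryNOSPAM@example.com.</p>
"""
print(sorted(find_harvestable(page)))
```

Both the mailto link and the munged plain-text address are recovered, which is why hand munging rates so poorly as a countermeasure.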

  • Keep mailto links and implement NO filtering at the e-mail server level
     
  • Keep mailto links and implement filtering countermeasures on our e-mail servers
     
  • Complete removal of all e-mail addresses from the web site (including ones that are not in mailto links)
    NOW: excellent
    LATER: excellent
     
  • JavaScript encoding solutions
    NOW: good (except for high-end harvesters which support JavaScript)
    LATER: poor, as JavaScript support moves down into standard harvesting applications
     
  • Web forms that allow recipient e-mail to be entered (formmail)
    NOW: poor, it can be used as a relay through which to send spam to everyone in our community.
    LATER: poor
     
  • Enhanced web forms (recipient is looked up in a database upon submission, see my previous post)
    NOW: excellent
    LATER: excellent
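
The difference between the last two items in the list above is where the recipient address comes from. Here is a minimal Python sketch of the enhanced approach, assuming a hypothetical in-memory table in place of the real database and a hypothetical /contact URL. None of these names come from an existing system.

```python
from urllib.parse import quote

# Hypothetical directory mapping opaque recipient IDs to real addresses.
# In practice this would be a database that is never exposed to the web.
RECIPIENTS = {
    "webmaster": "webmaster@example.com",
    "admissions": "admissions@example.com",
}


def resolve_recipient(recipient_id: str) -> str:
    """Look up the real address for an ID submitted with the form.

    Unknown IDs are rejected, so the form cannot be used to relay mail
    to arbitrary addresses the way a formmail-style script can.
    """
    try:
        return RECIPIENTS[recipient_id]
    except KeyError:
        raise ValueError(f"unknown recipient id: {recipient_id!r}") from None


def contact_link(recipient_id: str) -> str:
    """Build the link that appears on the page; it contains no address."""
    resolve_recipient(recipient_id)  # fail early on a mistyped ID
    return "/contact?to=" + quote(recipient_id)


print(contact_link("admissions"))
```

Because the page only ever carries the ID, a harvester has nothing to collect, and a submission naming an outside address simply fails the lookup.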

Step 4: What other risks does the countermeasure cause?

  • Keep mailto links without filtering
    spammers can gather e-mail addresses
     
  • Keep mailto links and implement filtering
    spammers can gather e-mail addresses
     
  • Complete removal
    none
     
  • JavaScript
    false sense of security once JavaScript support becomes a standard feature on low-end harvesters
     
  • Web forms that allow recipient e-mail to be entered
    provides a new attack vector
     
  • Enhanced web forms
    none

Step 5: What trade-offs does the security solution require?

  • Keep mailto links without filtering
    receive 100% of the spam sent to us from spammers who have harvested our addresses from our websites
     
  • Keep mailto links and implement filtering
    receive less than 100% of the spam sent to us from spammers who have harvested our addresses from our websites
     
  • Complete removal
    removal of the ability to communicate via e-mail with someone from a web page, probably too negative to be considered.
     
  • JavaScript
    all browsers must have JavaScript turned on
    all content developers will need a fool-proof method of generating the JavaScript
    we will likely have to replace this with a better solution once the majority of spammers have support for JavaScript
    many screen readers don't support JavaScript
     
  • Web forms that allow recipient e-mail to be entered
    provides a new method for spammers to relay mail using our organization, too negative to be considered.
     
  • Enhanced web forms
    all content developers will need a fool-proof method of generating the link to the form system
    screen readers will not work
    a unified interface to the system allows changes to the system without changing the entire website; this makes the system flexible enough to counteract new attack vectors.
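
For the JavaScript option, the "fool-proof method of generating the JavaScript" could be a small helper that content developers call instead of writing the script by hand. The following Python sketch is one possible shape for such a helper, not a recommended encoding. The function name, the character-code scheme, and the noscript fallback text are all assumptions of mine.

```python
import html
import json


def obfuscated_mailto(user: str, domain: str, label: str) -> str:
    """Return an HTML snippet whose address is assembled only by JavaScript."""
    address = f"{user}@{domain}"
    if not address.isascii():
        raise ValueError("this sketch only handles ASCII addresses")
    # Character codes keep the literal address out of the page source.
    codes = ", ".join(str(ord(ch)) for ch in address)
    safe_label = html.escape(label)
    js_label = json.dumps(safe_label)
    return (
        "<script>\n"
        f"  var addr = String.fromCharCode({codes});\n"
        f"  document.write('<a href=\"mailto:' + addr + '\">' + {js_label} + '</a>');\n"
        "</script>\n"
        f"<noscript>{safe_label} (address requires JavaScript)</noscript>"
    )


print(obfuscated_mailto("bbaggins", "example.com", "Bilbo Baggins"))
```

The noscript fallback shows the cost listed above: a visitor without JavaScript gets a name but no way to send mail, and a harvester that executes scripts still sees the rendered link.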

So, while JavaScript solutions can provide temporary protection, we are likely to have to change our approach in the future, which will require a significant amount of work each time. If we use an enhanced web form system, we are less likely to have to make wholesale changes to the web site once it is implemented. But if our filtering technologies are good enough (greylisting, DNSBLs, Bayesian filters, DCC, Razor, etc.), we may be able to reduce, or at least tag, enough incoming spam that no changes are needed on the web site.
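
Of the filtering technologies mentioned, greylisting is the simplest to illustrate: the server temporarily rejects mail from any sender/recipient/client combination it has not seen before and accepts it when a legitimate server retries later. Here is a minimal Python sketch of that decision logic. The class name, the 300-second delay, and the return strings are assumptions, and a real deployment would also expire old entries and whitelist known senders.

```python
import time
from typing import Dict, Optional, Tuple

# (client IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]


class Greylist:
    """Toy greylisting policy keyed on the usual triplet."""

    def __init__(self, min_delay: float = 300.0) -> None:
        self.min_delay = min_delay  # seconds a new triplet must wait (assumed value)
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        """Return "tempfail" (answer with an SMTP 4xx code) or "accept"."""
        now = time.time() if now is None else now
        key = (ip, sender.lower(), recipient.lower())
        # Record the first attempt; later attempts keep the original time.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"


policy = Greylist()
print(policy.check("192.0.2.1", "someone@example.org", "bbaggins@example.com"))
```

The first attempt is deferred. A sending server that queues and retries after the delay gets through, while software that fires once and moves on never does.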

As you can see from the question in step one, e-mail harvester countermeasures are just one component of a larger process of reducing the amount of spam reaching an organization. I am of the opinion that my organization should implement filtering at the server level first and then re-evaluate the need for any further steps. If we determine that we must provide a countermeasure against mail harvesting, then we should choose the one that fares best once the trade-offs are considered.

Hopefully, you can use the above method to decide what your institution should do.

Thanks to Bruce Schneier for teaching me his five-step approach.