
Obfuscators Comparison Essay

In “Win the SPAM Arms Race” (A List Apart, May 2002), Dan Benjamin talked about the importance of hiding e-mail addresses on our websites from vicious e-mail-address-harvesting bots, or spam bots, as they are more often called. Dan pioneered a JavaScript-based solution for bypassing the indexing mechanisms that spam bots use. Here’s a quote from the article:


It’s hard to believe, but it’s been more than five years since Dan wrote these words. So, did we win the SPAM Arms Race? As you may have noticed by looking at your own inbox recently, not exactly. The Messaging Anti-Abuse Working Group (MAAWG) estimates that 90 billion spam messages are sent every day, and 80–85% of all incoming mail is abusive.

A shared responsibility

Many web users don’t understand the inevitable consequences of exposing their e-mail address on the web. Experienced web developers and website owners, however, do. Thousands of spam bots tirelessly crawl the web to collect e-mail addresses exposed on websites, in blog comments, and elsewhere. These addresses end up in databases sold to unsavory marketers, who bombard the owners’ inboxes with unsolicited mail.

Of course, spam is an increasingly complicated problem that can never be solved by the efforts of web developers alone. But don’t underestimate your own powers.

An unpleasant surprise

I work for a large non-profit organization that provides social services for the blind and visually impaired. After Wim, our system administrator, complained about the massive amounts of spam our mail server had to process, we started a small investigation. It turned out that 90% of all spam was sent to a mere 5% of the e-mail addresses we own, and guess what? They were exactly the addresses that had been published on our website.

Although most of the damage had been done by then (remember Dan’s quote), I promised Wim I would come up with an effective way to protect the addresses on our upcoming portal, on which we intend to publish even more addresses.

My solution would need to defeat spam and be accessible. We work intensely with and for people who have (mostly visual) disabilities. Accessibility is not an optional add-on.

A few months ago, Wim very unexpectedly passed away (we miss you, Wim!). Since then, I have spent a lot of time thinking about a way to fight spam bots. In this article, I’ll share my ideas on the subject and leave you with a working script to build on or to use in your own projects right away.

The problem with current techniques

Wikipedia has an excellent overview of anti-spam techniques. Their article also includes interesting links to articles about e-mail obfuscation. (Google the subject for more). Over the years, I’ve tried more than a dozen of these techniques. Although most seem effective, I can’t use them in my projects, as every one fails to meet one or more essential requirements. My requirements are:

1. No hassle, please

You’ve certainly seen e-mail links that look like “” or “”. If you’re like me, you probably don’t like to correct a deliberately misspelled e-mail address after you click on it. Moreover, users who don’t notice what’s wrong with the address will end up frustrated, because their message cannot be sent or delivered. Similar techniques require users to re-type a (correctly spelled) address that’s rendered as an image—which isn’t any better, of course.

Although they don’t require JavaScript, these methods of e-mail obfuscation add an unpleasant barrier to a task as trivial as sending an e-mail. Clearly, this is not the right way to treat visitors or (potential) customers. I want real, clickable e-mail links that work just as expected, but—at the same time—are immune to spam bots.

2. Graceful degradation

JavaScript-based techniques such as Dan’s offer the seamless user experience I’m looking for. They’re all based on the simple fact that spam harvesters are incapable of parsing JavaScript or understanding DOM changes initiated by JavaScript events. Instead, spam harvesters try to extract e-mail addresses from raw HTML by using brute-force algorithms; even Googlebot chokes on most of the JavaScript it comes upon. Only real browsers know how to handle JavaScript and can undo the obfuscation, either by stitching together fragments of the address or by using a more advanced, unobtrusive, event-based approach.

An important downside is that such solutions are not bulletproof. Visitors who surf the web without JavaScript support, whether by choice or not, are out of luck, because they’re treated as spam bots. These visitors include people using text browsers, old or less capable screen readers, or mobile devices with limited capabilities. Other users have JavaScript turned off for security reasons or because of company policy. W3Schools estimates that 6% of internet users had no access to JavaScript as of January 2007. If you believe that’s not enough to really care about, then maybe it’s time to reconsider why you strive to make your markup and CSS accommodate the 1.5% of IE 5.x users or the 1.3% of Safari users (again, according to W3Schools).

3. Install and forget

Most e-mail obfuscation techniques I’ve tried tend to be bothersome and time-consuming to implement because they have to be applied to each and every e-mail address that you want to protect. Most require you to use lengthy inline elements and inline event handlers. They may also invalidate your markup.

I wanted a transparent and fully automated solution that I can set up once and never worry about again. That’s the only way I can guarantee that all addresses that appear on our website are safe—even the ones that show up in blog comments.

Putting it together

Enough talking. Let’s get our hands dirty.

The ingredients

You’ll need Apache 2 and PHP 4 or later. On the web server, Apache’s URL-rewriting module must be enabled, and you should be able to set Apache directives through per-directory configuration files. Most web hosts have this enabled by default, so you probably don’t have to worry about it. For help on these Apache-specific features, check out the Apache documentation.

Put on your masks

Setting up Graceful E-Mail Obfuscation (GEO) involves a few steps. The key is to replace all occurrences of mailto links with innocent-looking URLs. Take this e-mail link as an example:

<a href="mailto:sales@yourcompany.com"> E-mail our sales department </a>

After the server-side treatment (I’ll get to that in a minute), that same link will look like this:

<a href="contact/sales+yourcompany+com" rel="nofollow"> E-mail our sales department </a>

Let’s just take this one step further and apply some basic ROT13 to it.

<a href="contact/fnyrf+lbhepbzcnal+pbz" rel="nofollow"> E-mail our sales department </a>

From the results of web exposure tests I did with freshly created addresses, the ROT13 encryption did not seem to be necessary for the technique to be effective. However, it does add an interesting level of obfuscation that certainly won’t do any harm either. If you’re not familiar with ROT13, I should note that it doesn’t add real cryptographic security. Wikipedia offers an accurate description of what ROT13 does:
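ROT13 simply rotates each letter 13 places through the alphabet. The following is a minimal JavaScript sketch of that idea, our own illustration rather than the code in geo.js.php; the function name rot13 is an assumption. Because 13 + 13 = 26, the same function both encodes and decodes.

```javascript
// ROT13: rotate each ASCII letter 13 places through the alphabet and leave
// every other character ("+", ".", digits, ...) untouched. Because
// 13 + 13 = 26, applying the function twice returns the original text.
function rot13(text) {
  return text.replace(/[A-Za-z]/g, function (ch) {
    // Code point of "A" (65) or "a" (97), depending on the letter's case
    var base = ch <= "Z" ? 65 : 97;
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

console.log(rot13("sales+yourcompany+com")); // fnyrf+lbhepbzcnal+pbz
console.log(rot13("fnyrf+lbhepbzcnal+pbz")); // sales+yourcompany+com
```

Note that the “+” separators survive the rotation unchanged, which is why the encoded URL keeps the same shape as the unencoded one.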

There are a couple of other things to note here:

  • I chose “contact” as a faux folder name for this example, but you can choose anything you like. To substitute the “@” and the dot in the address, I opted for a “+”. Although the e-mail standard does allow a “+” before the “@”, it rarely appears in published addresses, and it doesn’t have to be URL-encoded in a URL path, which will come in handy later on.
  • The rel="nofollow" part is added to instruct search engines not to follow these links and index subsequent pages. Read more about rel="nofollow" on Microformats.org.

Away with the mailto links! We’re left with plain old hyperlinks. Well, except that they’re broken, maybe; but we’ll fix that soon enough. As you can imagine, there’s very little chance that a spam bot will identify these links as e-mail links, because… they’re not.

The script

To replace each occurrence of a mailto link in a given webpage with a regular URL, I’ll use a PHP search-and-replace regular expression. The URL notation reuses parts of the original e-mail address so that it can be reconstructed later on. For this, we’ll take the entire HTML page as the subject of a PHP function:

function encrypt_mailto($buffer) {
    // "mailto:name@domain.tld" becomes "contact/name+domain+tld"
    return preg_replace(
        '/"mailto:([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\.([A-Za-z]{2,4})"/',
        '"contact/$1+$2+$3" rel="nofollow"',
        $buffer
    );
}

With ROT13 enabled, the function looks quite a bit longer, as you’ll see in the finalized PHP class that you can download at the end of the article.
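To make the transformation concrete without a PHP setup, here is a hypothetical JavaScript mirror of the server-side step, assuming ROT13 is applied to the whole “name+domain+tld” string once the “@” and the last dot have been replaced. The names encodeMailtos and rot13 are ours; in GEO itself this work happens in PHP on the server, never in the browser.

```javascript
// Hypothetical JavaScript mirror of the server-side encoding step.
function rot13(text) {
  return text.replace(/[A-Za-z]/g, function (ch) {
    var base = ch <= "Z" ? 65 : 97;
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

// Same capture groups as the PHP pattern: local part, domain, top-level domain.
var MAILTO_PATTERN =
  /"mailto:([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\.([A-Za-z]{2,4})"/g;

// Rewrite every "mailto:name@domain.tld" attribute value in an HTML string
// into "contact/name+domain+tld" (optionally ROT13-encoded) plus rel="nofollow".
function encodeMailtos(html, useRot13) {
  return html.replace(MAILTO_PATTERN, function (match, name, domain, tld) {
    var path = name + "+" + domain + "+" + tld;
    if (useRot13) {
      path = rot13(path);
    }
    return '"contact/' + path + '" rel="nofollow"';
  });
}

var link = '<a href="mailto:sales@yourcompany.com">E-mail our sales department</a>';
console.log(encodeMailtos(link, true));
// <a href="contact/fnyrf+lbhepbzcnal+pbz" rel="nofollow">E-mail our sales department</a>
```

Links that are not mailto links never match the pattern, so the rest of the page passes through unchanged.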

Now I want the script to intercept and parse all HTML pages before they’re sent to the browser. I’ll use PHP’s output buffering mechanism for that. In its simplest form, output buffering is activated by passing the name of a callback function to ob_start():

ob_start("encrypt_mailto");

Using a per-directory Apache configuration file plus PHP’s little-known but powerful auto_prepend_file directive, we can now automate this process for an entire website or for specific folders only. If you add the following line to that configuration file, prepend.inc.php will be automatically included at the top of every PHP document that Apache serves.

php_value auto_prepend_file /yourpath/prepend.inc.php

The prepended file itself starts output buffering and runs the entire contents of the served pages through the encoding function.

Also note that for this prepending to work properly, you must make sure that plain HTML documents (without the .php extension) are parsed by PHP as well. Add this line to the same configuration file:

AddType application/x-httpd-php .php .htm .html

This might demand a bit more processing power from our web server, but it’s the easiest way to make sure that all our web pages get the server-side special treatment we need. If you’re using a CMS or some sort of application framework, you could opt to cache the server-side encryption.

Fixing the links

Now that we’ve effectively disguised our mailto links, let’s see what happens when someone clicks one of these funny “contact/” links. Well, apart from an Error 404 page: not much.

In the end, visitors shouldn’t notice anything unusual about our e-mail links. A few lines of JavaScript will help us restore these links to their original shape. But wait: what about the 6% who have no JavaScript support? When JavaScript is not available, our “contact/” URLs will not be “decrypted” on the client side, resulting in a 404 error. Apache to the rescue!

Let’s configure Apache so that its rewrite module intercepts all URL requests that match the pattern we defined earlier. Apache will then derive the segments that make up the e-mail address from the URL and pass them quietly to an intervening PHP script that undoes the ROT13 encryption and prepares the address for further processing. This is what the Apache rewrite rule looks like (the literal “+” separators are escaped so that they aren’t read as regex quantifiers):

RewriteRule ^.*contact/([A-Za-z0-9._%-]+)\+([A-Za-z0-9._%-]+)\+([A-Za-z.]{2,4})$ /yourpath/mail.php?n=$1&d=$2&t=$3 [L]

You can also download an example configuration file at the end of the article.
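The rewrite rule hands the three captured segments to mail.php as the n, d and t query parameters. As a sketch of that mapping, and of the ROT13 reversal the PHP script performs, here is a JavaScript version; parseContactPath and addressFromParams are hypothetical names, and the real fallback script is written in PHP.

```javascript
function rot13(text) {
  return text.replace(/[A-Za-z]/g, function (ch) {
    var base = ch <= "Z" ? 65 : 97;
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

// Mirrors the RewriteRule: everything up to "contact/", then name, domain
// and top-level domain separated by literal "+" signs.
var CONTACT_PATTERN =
  /^.*contact\/([A-Za-z0-9._%-]+)\+([A-Za-z0-9._%-]+)\+([A-Za-z.]{2,4})$/;

// Returns the n, d and t values the rule would pass to mail.php, or null.
function parseContactPath(path) {
  var m = CONTACT_PATTERN.exec(path);
  if (m === null) {
    return null; // not a masked e-mail link
  }
  return { n: m[1], d: m[2], t: m[3] };
}

// Undo ROT13 on each segment and put the "@" and the dot back.
function addressFromParams(params) {
  return rot13(params.n) + "@" + rot13(params.d) + "." + rot13(params.t);
}

console.log(addressFromParams(parseContactPath("/contact/fnyrf+lbhepbzcnal+pbz")));
// sales@yourcompany.com
```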

Providing an elegant fallback solution

Here comes the fun part! Coming up with a safe, elegant and easy-to-use (or “graceful”) alternative that lets visitors send an e-mail when JavaScript is unavailable is where your own imagination comes into play. How you do it depends on the type of website you’re building, but I don’t recommend a visual captcha for this purpose: people who get to see this non-JavaScript page are quite likely unable to see the captcha image either, because they’re using a screen reader to compensate for a visual impairment or because they’re using a text browser.

One solution would be to offer users a simple contact form that allows them to send a message without giving away the actual address. And if your website already uses a contact form, you could choose to redirect the undecoded “contact/” links to that page.

In most cases, however, people do want the actual address. So, for this example, I decided to prompt the user with a question that’s hard for a spam bot to answer but easy enough for humans. If the right answer is given, the script can safely assume that it’s not dealing with a spam bot and reveal the actual e-mail address.
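The question-and-answer check boils down to comparing the visitor’s reply against a short list of accepted answers. The sketch below assumes a hypothetical text-only question such as “What is two plus three?”, which screen readers and text browsers can present without trouble; the demo’s actual question is not given here, and in practice this check belongs in the server-side fallback script rather than in client-side code.

```javascript
// Hypothetical accepted answers to a text-only question such as
// "What is two plus three?" (readable by screen readers and text browsers).
var ACCEPTED_ANSWERS = ["5", "five"];

// Ignore surrounding whitespace and letter case when comparing.
function isHumanAnswer(answer) {
  var cleaned = String(answer).trim().toLowerCase();
  return ACCEPTED_ANSWERS.indexOf(cleaned) !== -1;
}

// Reveal the decoded address only after a correct answer; otherwise null.
function revealAddress(answer, address) {
  return isHumanAnswer(answer) ? address : null;
}

console.log(revealAddress(" Five ", "sales@yourcompany.com")); // sales@yourcompany.com
console.log(revealAddress("6", "sales@yourcompany.com"));      // null
```

Accepting both the digit and the spelled-out word keeps the barrier low for humans while still requiring an actual answer.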

To see how this works, take a look at the demo page I put together. Be sure to turn off JavaScript to see the degradation in action; the Web Developer Toolbar for Firefox, for example, lets you do this from its Disable menu.

JavaScript for the rest of us

Now that we’ve implemented a non-JavaScript fallback, let’s make sure that the other 94% of users won’t notice anything “funny” about our carefully masked e-mail addresses. So, let’s revert the page’s DOM to what it looked like before the page’s source code was modified by the PHP script.

First, we need a JavaScript search-and-replace regex that does exactly the opposite of what our PHP regex did. I wrote a function around it that looks like this:

function geo_decode(anchor) {
  var href = anchor.getAttribute('href');
  // "contact/name+domain+tld" becomes "name@domain.tld"
  var address = href.replace(/.*contact\/([a-z0-9._%-]+)\+([a-z0-9._%-]+)\+([a-z.]+)/i, '$1@$2.$3');
  // Only touch links that matched the masked pattern
  if (href != address) {
    anchor.setAttribute('href', 'mailto:' + address);
  }
}

Next, we must loop through all anchors on the page and tie the function to each anchor’s onclick handler. Because the anchors only exist once the page has loaded, the loop itself is attached to the window object’s onload event:

window.onload = function () {
  var links = document.getElementsByTagName('a');
  for (var l = 0; l < links.length; l++) {
    // restore the real mailto: address when the link is clicked
    links[l].onclick = function () { geo_decode(this); };
  }
};

To make things run smoothly, a little more code is involved. Take a look at geo.js.php to see how I implemented the ROT13 “decryption.” If you read through geo.phpclass.php, you’ll see that the link to geo.js.php (the file that restores your mailto links) is auto-inserted right before the closing </body> tag with the help of PHP’s output buffering. This means that you don’t have to add a single line of code to your existing documents to make the script work.
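With ROT13 enabled, the decoder also has to rotate each captured segment back before rebuilding the address. Here is a minimal sketch of how that could be combined with the onclick wiring; it is not the contents of geo.js.php, and decodeHref and rot13 are hypothetical names. The browser part only runs when a window and document exist, so the decoding function can also be exercised on its own.

```javascript
function rot13(text) {
  return text.replace(/[A-Za-z]/g, function (ch) {
    var base = ch <= "Z" ? 65 : 97;
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

// Turn a masked href ("contact/name+domain+tld", ROT13-encoded) back into a
// mailto: URL. Returns null for links that don't match the pattern.
function decodeHref(href) {
  var m = /contact\/([a-z0-9._%-]+)\+([a-z0-9._%-]+)\+([a-z.]+)$/i.exec(href);
  if (m === null) {
    return null;
  }
  return "mailto:" + rot13(m[1]) + "@" + rot13(m[2]) + "." + rot13(m[3]);
}

// Browser-only wiring (assuming onclick, like the article's loop): restore
// the real address when a masked link is clicked.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  window.onload = function () {
    var links = document.getElementsByTagName("a");
    for (var i = 0; i < links.length; i++) {
      links[i].onclick = function () {
        var mailto = decodeHref(this.getAttribute("href") || "");
        if (mailto !== null) {
          this.setAttribute("href", mailto);
        }
      };
    }
  };
}
```

Keeping the decoding in a pure function that takes and returns strings makes it easy to test outside the browser.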

Try it yourself

I’ve set up a demo page for you to experiment with, and you can also play around with the source files:

  • The Apache configuration file contains the directives that prepend prepend.inc.php and redirect page requests using the rewrite module.
  • prepend.inc.php instantiates the PHP class and sets some custom properties.
  • geo.phpclass.php contains the PHP class that does the “encoding” and inserts a script tag before the closing </body> element that loads geo.js.php.
  • geo.js.php contains the JavaScript that’s responsible for the “decoding.”
  • mail.php contains an example of a usable fallback script for when JavaScript is unavailable.

...or download the ZIP archive (8 kB).

The script works in all major browsers, including Internet Explorer 5.01.

A solution. For now.

Alas, no e-mail address that appears online is entirely safe. Until all spam is banned from this world, we have to try our best not to make it too easy for spam harvesters to steal our addresses (and make money out of them). Now you can protect your addresses in a fully automated way while at the same time being gracious to all users, so you can focus on what’s really important: getting your content out.

This is only an interim solution. We should all be planning for the day when spam bots get smarter, and outwit them when they do. We should not pretend that legislation alone will be the silver bullet to address the world’s spam problem, so web developers will have to continue to come up with creative solutions to fight the problem—and masking your addresses is one of them. I look forward to reading your comments and suggestions.



Your last assignment is to write an evaluation. I want you to decide whether this course is worth taking and to explain why or why not.

Some things to know about evaluation. Evaluation presupposes some standard of judgment. Standards vary and may not be explicit; nevertheless, there has to be a standard. If I dislike Bon Jovi's music, it's probably because I prefer Bruce Springsteen. Springsteen is my standard of judgment. Obviously, the next question is "What about Springsteen's music do you like?" This implies that my standard is not really Springsteen but that Springsteen performs to my standards. I like a melody I can follow. I like lyrics that go beyond mindless repetition for 3 minutes of a 4-line chorus. Thus, I prefer "The End of the World As We Know It" to "This One Goes Out To The One I Love," both by REM.

Standards are usually relative and not absolute. While most people believe it is wrong to kill, most people also believe killing is a necessary part of war.

In any case, your standards for evaluating this course must be stated in the paper. You needn't be as blatant as "I prefer classes where the teacher grades on the curve but has no mathematical skill." But you should be neither as obscure as "leave us eschew obfuscation" nor as vague as "I love this course."

You should also know there are at least 3 (three) types of evaluation:

First, PRIMARY EVALUATION, evaluation of a state of affairs or of past action. E.g., is it okay to bomb tiny foreign nations into the stone age in order to accomplish our foreign policy? Or, should we have started Social Security merely because some poor and old people were starving in the 1930's?

Second is, unsurprisingly, SECONDARY EVALUATION, which is an evaluation of someone else's judgment (or of someone else's evaluation). For example, book reviews and judgments of paintings, poetry, movies. Or, my critiques of your essays. Here, the evaluation involves 2 (two) sets of standards:

  • those of the creator of the art work and
  • those of the critic.

To make a secondary evaluation, you would ask three questions:

  1. What did the author intend to do?
  2. How well did the author succeed?
  3. Was it worth doing?

Obviously, Student, the answers to the first 2 questions are in terms of the author's standards. The third question depends on the critic's standards.

Third is SELF-EVALUATION. This term is self-explanatory. Obviously, this kind of evaluation has 2 (two) major questions. 1) Am I doing what I intended, according to my standards? 2) Are my standards the right/best/most reasonable ones for me? The issue isn't really #1, which is easily answerable. The issue is always #2, because we don't always know where our standards come from.

For example, to take this course in order to score well on the Advanced Placement Test in English Language and Composition is perfectly reasonable. But, if your standards are to become Shakespeare in one year, you've got a problem. Your standards are unreasonable. First, this course doesn't teach how to write drama, especially not verse drama in iambic pentameter. Second, writing poetry isn't the purpose of the AP Test in Language and Composition. Third, by definition there can only be one Shakespeare.

Having explained the major points of evaluation and of standards of judgment, let's get specific. You'll need to discuss

  1. The purpose of this course
  2. How well the course measures up to its purpose.
  3. Whether the purpose and the course are worth it.

To further complicate the issue, you have to remember that three different answers are possible, depending on which/what standard you choose: CTY's, mine, and yours. If you include your parents, you have four possible standards. You have to consider as many of these standards as possible, especially in terms of (1) and (2) above.

DON'T worry about making me mad. The ulterior motive for this assignment is to inform me of your thoughts about the quality and value of this course.

BUT, I want to read your essay as if it is addressed to someone else. Don't address it to me. You might want to make it a letter to a friend or to parents, or a petition to the CTY office, or an editorial for the school newspaper. The form and the audience are up to you, but name the audience in parentheses beneath your title.

Please put this assignment in the mail by the due date on your schedule.

I will, of course, send a critique of your evaluation (which is to say "my evaluation of your evaluation") two weeks later.

All the Best,
Instructor

The student sent the following note with her essay.

Dear Instructor,

I hope this is what you mean by a piece of persuasion. If you think I got carried away with my strawberry/whipped cream simile (metaphor?), I can see that.

I couldn't look up anything in Strunk and White (that might show) because I'm on vacation across the country in Washington. I know my last essay could have used some work, but I'm not going to say anything about my reasons that could be held against me. Anyhow, hope this one's better.

See you soon,
Student

(Addressed to other talented students)

Strawberries can be sour. "Oh, no," you say, "That's worse than claiming that lemons are sweet and honey is salty." It is true, though.

Take the sweetest strawberry you have ever tasted, including the one you snitched from your grandpa's patch last summer, and douse it in whipped cream. The whipped cream is so loaded with artificial sugar that the natural sugar of the strawberry pales to tartness. However, they still manage to taste delicious together. When you first start a writing course, the juicy strawberry of your writing looks perfect and beautiful, and tastes sweet because you have no higher standards to compare yourself to than your classmates' average writings. But just let it meet the whipped cream of truly fine writers--Twain and Shakespeare and all the others--and your strawberry is no longer the sweetest thing you have tasted. It is surpassed, for the first time you can remember, and becomes so tart it hurts. This hurt is not assuaged by your teacher's ever-so-tactful but firm critiques of your sour strawberries.

After a while, you become inured to the fact that all you will ever write are sour strawberry essays. Then, once you have retraced your pride and reconciled yourself, your writing gets better. You even produce a few (so many few!) blobs of thin whipped cream. Then you must change all over again while you become accustomed to your new writing image.

All the while you are adjusting yourself, you keep trying to write, to meet your deadlines with essays about subjects that stir up thoughts which are sometimes very difficult to admit to yourself, much less to another person, though they remain unseen.

The emotional turbulence and mental storms that are part of the process of growing up, being a writer, and facing yourself, show up in your writing. Your teacher will write back saying, "You seem to be strongly against this" or "Looks like you think it's Hell either way," and you'll remember, "Yeah, I was pretty mad that day, but I thought I'd controlled it better in my writing." Feelings always show through, but that is good. They teach you about yourself and tell you your style of writing in different moods.

If by now you've gotten the feeling that this course isn't worth it, you're probably right. It probably wouldn't do you any good, because you're either too high in whipped cream or too low in unchallenged, falsely sweet strawberries to profit from it. But if you're not quite turned off yet, let me list the benefits.

Your essays greatly improve, though you may not notice. Although there is always a slight danger of relapse into Strawberry-Land, you can always rebuild your bridges and return to the cream of good writing. You can experiment with different styles and obtain an outside opinion of whether they suit you or not. In trying to write a decent essay on a tough subject, you begin to delve into yourself and start to find recesses you may have never suspected were there. Teachers at school love this course, because along with learning to write better essays, you learn the good old mechanics and the correct forms of writing. And, for those who are interested in language or writing careers, it looks great on your resume if you receive favorable comments.

However, this last reason usually fades into the background.

By the middle of the course, you become totally engrossed in why you write the way you do and what can cause your style to change.

You finally emerge from the course feeling humbled and enriched, but learning what to expect from yourself. Make your own decision and keep your own counsel, but if you decide to take this course, and actually get through it, you will at the very least be glad you tried, and may even get a glimpse of your ship coming in, sailing high on many billows of whipped cream.

Dear Student,

Here's your last essay. Yes, I think you worked the strawberries and whipped cream metaphor a bit hard. At first, I thought you were going to argue/persuade by use of analogy, which logicians will tell you is a no-no, but you didn't. You really use the metaphor for comparison and for explanation, a good move.

I think it works generally well, though it's flawed by a diction slip. When you say "artificial sugars," you seem to mean Cool Whip and other non-dairy toppings that look like whipped cream, but aren't. I think you intend "artificial" to mean something like "added" or "stronger."

Your points are well made. The comparison within your metaphor (artificial vs. natural sugars) is familiar, so we understand the whole metaphor's application right from the start.

(Note my check marks in the margin: especially good stuff there.)

You are going on a bit in paragraph 2, working your way into verbosity. I have suggested deletions. Student, I admire your work and your sensitivity to words more than you seem to admit. I think you're a strong writer. I also KNOW that none of us is perfect, or we wouldn't need courses.

Paragraphs 3, 4 and 5 show us the sweet and sour aspects of the course. "Retracted..." and "reconciled..." is good parallelism. I assume "so many few" is a typo.

Don't forget that to persuade, you need examples to support your generalizations. See the readings I sent on generalizations and specific evidence. Thus, the 4th paragraph wants an example of the "stirred up thoughts." Consonant with your persuasive intentions, the example has to be unpleasant but intriguing. Farther on, you're smart to discuss the course by showing some typical remarks. (Note also some ambiguous "they's" floating around that paragraph.)

As you know, conscious artistry is how one becomes good. Athletes worry about where they place their feet, how they bend their elbows, which muscles flex when. Plumbers worry about how to get the solder smoothly all the way around a pipe joint. They THINK about what they're doing, just as writers do. Actually, we think about what we're doing until, like typing, it becomes so second nature that we don't think about it. Your point about why you write the way you write is well made, important, worth remembering.

The conclusion is nice. The clash of metaphors (ship & cream) is strained but not unsightly. It edges toward cute without crossing the border. It has the virtue of reprising the original comparison (and just a bit earlier, "enriched" harkens back to the richness of strawberries and whipped cream).

This is good work, especially considering that you were on vacation at the time. I think you write very well; you show great promise.

Thanks,
Your Instructor
