March 3, 2005

A short monograph on the theme of blog comment spam

I mentioned recently that I’ve been experimenting with ways to trap comment spam, the scourge of bloggers everywhere. This will get a bit technical later, and I’ll lose a lot of you, so let’s start with my conclusion: it’s easy. If you’re plagued by comment spam, it can be prevented - all of it (okay… most of it) - with almost zero effort. And I’m going to tell you how.

Personally, since I moved from MovableType to WordPress, I haven’t been getting a great deal of spam. Nevertheless, it was a minor annoyance, so I decided to implement one or two snares to filter it out. By ‘one or two snares’ I mean a whole minefield of, er, mines, which would explode under the feet of any spammer who tried to cross. And to satisfy my curiosity, I set it up to email me each time a spam comment fell foul of one of my explosives so I could see which tricks were proving the most effective. But like I say, I wasn’t getting much spam, so the results wouldn’t have much statistical significance.

This is where serendipity played its part, for the next day, Mort mentioned to me that she and MM were receiving a lot of comment spam. Naturally, I implemented the same tricks on their blogs as I had on mine, with the rejected comments arriving in my email (as well as the unrejected ones, so I could make sure nothing was getting through that shouldn’t). Suddenly my dataset had increased manyfold, so I decided to collect data for the duration of February and then write a short monograph on the subject. This is it. If you’re not interested in such matters, this will be boring. Don’t read it. Really.

As luck would have it, Mort, Mort’s Mom and I all use WordPress for our blogs. This was handy because it meant I could use the exact same code for all of them. The only downside is that they perhaps aren’t as representative of all blogs everywhere as they could be, though the techniques I used would work for any blogging system, and I imagine the results would be much the same. But I suppose you’re all screaming to hear just what these cunning things I did are. You are, right? Yes, I thought you were.

There are, in fact, all sorts of things you can do to prevent comment spam. My solution could justifiably be regarded as overkill, but there are hundreds - nay, thousands - of other tricks I could have employed if I wanted to. Some of the most popular, such as kitten’s spaminator, approach the problem by analysing the comments once they’ve been submitted and throwing away those that look like spam based on various telling signs. That’s a good approach, and I’ve found kitten’s spaminator to be very effective, but I wanted to see if I could catch them earlier than that, preventing spam from being submitted in the first place. There are well known ways of doing that too, such as CAPTCHAs, those annoying images with numbers in them that you have to type in a box, often so distorted to prevent computers from reading them that even humans have trouble. CAPTCHAs are far from ideal - besides being an annoyance to users, they have serious accessibility issues (you’re stuffed if you’re blind), and they can be gotten around if you’re determined enough. The basic idea of CAPTCHAs, though, is a good one. It’s a Turing test (sort of) - you present something which is easy for humans but hard for computers. The holy grail of spam prevention is a Turing test that’s so easy for humans, they don’t even realise they’re doing one. I haven’t come up with a way of achieving that yet.

So, there are lots of ways to stymie the spambots. Why, then, am I about to tell you my way? Wouldn’t it be better if I encouraged you to go off and come up with a technique of your own? Surely if everyone used a different method, it would be harder for the spammers to get round them all? That is true, but I’m not sure that’s necessarily a good thing. I want the spammers to get round my traps. When they do, I’ll add some more. It’s an arms race, and it’s in the interest of those of us who despise spam that the race moves forward as quickly as possible, because we’re guaranteed to win it. We have two big advantages over the spammers: 1) it’s very hard to write a program that can pass a Turing test, but very easy to make a Turing test; and 2) no matter how smart they get, it’s simply impossible to make spam comments indistinguishable from real comments because, when it comes right down to it, there is a difference. If there wasn’t, they wouldn’t be spam. It might be that when the difference becomes subtle enough, only advanced AI techniques are able to detect it, and perhaps if the arms race goes too quickly we’ll reach that point before such techniques exist, but I don’t think that will be a problem. My message to the spammers, then, is a simple one: Bring! It! On!

Let’s get down to the details. For those of you not of a technical bent, this would be a good time to go and put the kettle on. Alternatively, here are some pictures of kittens. In fact, unless you’re a codey type person with an unhealthy interest in HTML, I seriously advise you not to read on. Go and look at the kittens instead.

Right, those of you who are still with me, the first thing we do is eliminate all trackback spam by turning off trackbacks and deleting the file that handles them (wp-trackback.php in WordPress). If you like trackbacks then you might not want to go with such a drastic solution, in which case you’re on your own. I just find them annoying.

To detect proper comment spam, I did the following (if you care about the details of these, look at my source code. But don’t look too closely, I know it’s horrific. I didn’t know what the hell I was doing when I wrote this site.):

1. Renamed the page that handles form submissions to stymie any bots that just assume it’s in the default location.

2. Preceded the form where you enter comments with two dummy forms - an empty one (for really stupid bots) followed by one that looks identical to the real one. Both these forms submit their info to the wrong page. They’re hidden from real people using the magic of CSS.

3. Did the same thing after the form, in reverse order, in case any bots start at the bottom of the page and work their way up.

4. Added a hidden field to the form which gets sent along with the other stuff. When a comment’s submitted, the handler checks that this field has been sent, and that it has the correct value. The value is based on the current date, so it changes every day. To get this far, then, bots would have to parse the HTML to locate the correct form (and not be thrown off by the dummy forms surrounding it), and extract the names and values of all the fields. But - and here’s the evil part - the value of the hidden field in the HTML is wrong. JavaScript replaces it with the correct value after the page has loaded.

5. Turned the Submit button into an image, which means the x-y position where it was clicked is logged. If no x-y position is given, we take that to mean it was submitted by a bot. This is flawed because a real person can tab to the Submit button and press return, submitting the form without actually clicking. This happened once and a legitimate comment was rejected, so I switched off this test but continued to monitor it. It turned out that this trap never caught any spam.

6. Logged the number of keypresses made when entering comments. Any comment with fewer than two is rejected.
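As a rough illustration of the server side of traps 1 to 4 and 6, here is a minimal Python sketch that sorts an incoming submission into the trap it tripped. It is not the code from this site: the paths, field names, secret and token scheme are all made up for the example, since the list above only says that the hidden value is based on the current date. Trap 5 is left out because it was switched off.

```python
import hashlib
from datetime import date
from typing import Mapping, Optional

# Every path, field name and the secret below is a made-up placeholder.
REAL_HANDLER = "/comment-handler-renamed.php"   # trap 1: the renamed handler
DEFAULT_HANDLER = "/wp-comments-post.php"        # WordPress default, linked from nowhere
DUMMY_HANDLERS = {
    "/trap-a.php": "skeletal dummy form (top)",
    "/trap-b.php": "full dummy form (above real form)",
    "/trap-c.php": "full dummy form (below real form)",
    "/trap-d.php": "skeletal dummy form (bottom)",
}
SECRET = "change-me"


def daily_token(day: date) -> str:
    """Value the page's script would write into the hidden field (assumed scheme)."""
    return hashlib.sha256(f"{SECRET}:{day.isoformat()}".encode()).hexdigest()[:16]


def classify(path: str, form: Mapping[str, str], today: date) -> Optional[str]:
    """Return the name of the trap a submission fell into, or None if it looks genuine."""
    if path == DEFAULT_HANDLER:
        return "default handler"
    if path in DUMMY_HANDLERS:
        return DUMMY_HANDLERS[path]
    if path != REAL_HANDLER:
        return "unknown handler"

    token = form.get("token")
    if token is None:
        return "hidden field missing"
    if token != daily_token(today):
        return "hidden field wrong"

    try:
        keypresses = int(form.get("keypresses", "0"))
    except ValueError:
        keypresses = 0  # treat a garbled count as no keypresses at all
    if keypresses < 2:
        return "fewer than two keypresses"
    return None
```

A caller would reject (and, for curiosity's sake, log) anything for which `classify` returns a trap name, and pass the rest on to the normal comment code.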

In total, 2079 spam comments were left and 287 genuine comments. All spams were caught, with one false positive (caused by number 5 in the list, which didn’t catch any real spam, so can be disabled with no negative impact). Which traps caught the most spam varied between the three blogs, which isn’t surprising because presumably they’re all in the databases of different spambots.

So which of these methods was successful at snaring spam? Ooh, let’s have some statistics!

Number of spams sent to the default WordPress comment handling page (which nothing in the HTML mentions):
My blog: 0
Mort’s blog: 186
Mort’s Mom’s blog: 407

Number of spams sent to the page which the skeletal dummy form above all the other forms points at:
My blog: 55
Mort’s blog: 0
Mort’s Mom’s blog: 0

Number of spams sent to the page which the not-so-skeletal dummy form immediately above the real form points at:
My blog: 0
Mort’s blog: 825
Mort’s Mom’s blog: 551

Number of spams sent to the page which the not-so-skeletal dummy form below the real form points at:
My blog: 9
Mort’s blog: 0
Mort’s Mom’s blog: 39

Number of spams sent to the page which the skeletal dummy form at the bottom points at:
0

Number of spams sent to the correct form but without a value for the hidden field:
0

Number of spams sent to the correct form with an incorrect value for the hidden field:
0

Number of comments left with no x-y value for the submit button:
My blog: 2 (both legitimate comments - see number 5 above)
Mort’s blog: 1 (legitimate comment - ditto)
Mort’s Mom’s blog: 0

Number of spams sent to the correct form, with zero keypresses logged in the comment field:
My blog: 3
Mort’s blog: 0
Mort’s Mom’s blog: 0

Number of spams sent to the correct form, with one keypress logged in the comment field:
My blog: 1
Mort’s blog: 0
Mort’s Mom’s blog: 0

Those last two categories are particularly interesting - I can only attribute them to seriously desperate spammers who don’t have software to do their job, and actually did it manually, pasting in the comment from the clipboard (no keypresses if they used the context menu, one if they did Ctrl+V). Considering the number of blogs you must need to spam before seeing any benefit, these people have really got their work cut out.
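For anyone who wants the per-trap and per-blog totals, the counts listed above can be tallied with a few lines of Python. This is only arithmetic on the figures already given; the short labels are shorthand for the categories in the listing, and the categories that caught nothing are left out.

```python
BLOGS = ("My blog", "Mort's blog", "Mort's Mom's blog")

# Counts copied from the listing above, in the order of BLOGS.
CAUGHT = {
    "default WordPress handler":        (0, 186, 407),
    "skeletal dummy form (top)":        (55, 0, 0),
    "full dummy form (above real one)": (0, 825, 551),
    "full dummy form (below real one)": (9, 0, 39),
    "zero keypresses":                  (3, 0, 0),
    "one keypress":                     (1, 0, 0),
}

per_trap = {trap: sum(row) for trap, row in CAUGHT.items()}
per_blog = {
    blog: sum(row[i] for row in CAUGHT.values())
    for i, blog in enumerate(BLOGS)
}
total = sum(per_trap.values())

# Print the traps from most to least productive, with their share of the total.
for trap, n in sorted(per_trap.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{trap:34s} {n:5d}  {100 * n / total:5.1f}%")
print(per_blog, total)
```

On these figures, the look-alike dummy form just above the real one and the abandoned default handler together account for 1969 of the listed spams. The listed categories add up to 2076, three fewer than the 2079 quoted earlier; the listing does not account for the difference.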

In conclusion, then: trapping comment spam is easy. Renaming the page that handles comments (wp-comments-post.php in WordPress) and the bit in the comments form that references it (wp-comments.php), and sandwiching the real form between identical, hidden forms which point to pages that don’t exist will catch all spam save that entered manually by the truly desperate. That can be detected with a bit of JavaScript that counts keypresses. Do all that, and comment spam will be a thing of the past - at least until the spammers update their software accordingly, at which point it’s time for the next round. Bring! It! On!

Comments

I can honestly say I’ve never had any problem with Comment Spam, but if I did this wouldn’t help all that much, what with me being on Blogger.

Ah well… if the day comes, I’m sure me and the rest of the Blogger-users’ll pester you for a solution.

Comment by Nick the Greek — March 3, 2005 at 1:05 AM

options 2 and 3 (bogus forms hidden via css) are obviously going to pose accessibility problems (not all screenreaders - particularly older versions - ignore hidden page elements; also, users may be browsing via something like text-only browsers). option 6 presumably relies on javascript (?), so this may again bring up issues for users without javascript (either unavailable or disabled).

Comment by patrick h. lauke — March 3, 2005 at 1:22 AM

You know what works really well? Add an additional input field–something like “Are you a real person?” Add a requirement that this be set appropriately in the comment post code, or else the comment is rejected. Add it to the cookie so people only have to set it once.

It’s much less annoying to users than registration or a code image, and quite reliable. I haven’t seen a single comment spam since we implemented it.

Comment by Dave — March 3, 2005 at 1:27 AM

I still get trackback spam on occasion, but I don’t recall having any comment spam since I started forcing users to preview their comments before submitting them.

The bogus forms would be fine as long as they were preceded by a warning along the lines of “If you can read this message, please use the LAST form on this page” (in the hidden part, of course).

And the Javascript solution isn’t exactly the greatest… I’ve actually read that some workplaces disallow Javascript in the system policy (and naturally prevent installation of alternate browsers).

Comment by codeman38 — March 3, 2005 at 2:43 AM

Just re-read the part about the fake forms, this time noticing that there were some after the real one as well. Change my message in the above post accordingly. I suppose a link to an anchor located before the real form would work as well; spambots still couldn’t decipher it, probably, while screen readers have no problem with links to locations within a page.

Maybe this time it will submit when I actually CLICK the button rather than hitting Enter on it…

Comment by codeman38 — March 3, 2005 at 2:49 AM

Also, any statistic for number of legitimate posts with zero keypresses or without a value for the hidden field? That’d answer the JavaScript question.

Comment by codeman38 — March 3, 2005 at 2:51 AM

you want to get a good compluter like what mine is and go through them great people at aol.
you are just storing up trouble for yourself and making everyone unhappy.
and spam is good for catching fish.

Comment by henry the thirst — March 3, 2005 at 2:55 AM

V1AGR4!

Just kidding! :)

Comment by Let’s gamble! — March 3, 2005 at 4:20 AM

Great work… very interesting.

To be honest I don’t think spammers need to hit many blogs to get results. Why? We had some comment spam a few weeks ago that was up on the site for less than 8 hours. But we’re still getting hits from Google for the terms they were spamming.

Comment by bruce — March 3, 2005 at 5:11 AM

I’ve had zero comment spam on MT since implementing a couple of plugins - one of which allows me to turn off comments after a certain period of time. The other method I employ is to not have an email field (as you’d previously suggested), but I have a feeling this may now be a Bad Thing since the up-and-comingness of Gravatars. How the heck does Blogger manage it? I’ve never yet seen any comment spam on a Blogger blog.

Comment by Carol — March 3, 2005 at 5:26 AM

Hey, maybe someone’s decided they can outsource manual comment spamming to India. I mean, you can pay people in India US$0.60/hr to sit and cut-and-paste comment spams and they’re probably making good money doing it in local currency.

What a brilliant business idea. I wonder if I can raise VC funding to start up an “advertising and site promotion business offshore” to do this. :-)

Comment by Dossy — March 3, 2005 at 6:16 AM

I don’t have a problem with comment spam. pLog, which is the blog software that I use has bayesian spam filter that handles all of the spam comments.

I wrote a plugin to handle spam trackback posts. If someone wants to send me a trackback ping, I require that the page that the url points at has a trackback url.

Comment by Paul Westbrook — March 3, 2005 at 6:45 AM

I laughed, I cried, uh, yeah. Anyway, this post rocks. I never really realized that comment spam could be such a thing of sublime delight. I’ve found my true meaning in life…to outwit, outlast, outplay….or something

Comment by Simon Dvorak — March 3, 2005 at 7:21 AM

Nick - as Carol says, spam doesn’t seem to be a problem in blogger. I have no idea how, I’m sure people must try. I’m half tempted to set up a blogger blog and then try to write a bot that spams it just to see what happens :)

codeman38 - statistic for number of legitimate posts with zero keypresses or without a value for the hidden field: zero. I agree that in principle there could be problems with some of these methods, but in practice, there weren’t. But then I’d wager that none of these blogs has any blind readers, so I could probably have got away with a CAPTCHA too.

Carol - what the heck are gravatars?

Comment by SimonG — March 3, 2005 at 7:55 AM

Ooh - lots of people I’ve never seen before.

I’ve never had any spam comments in blogger, nor in haloscan when I used that.

Comment by Lisa — March 3, 2005 at 8:07 AM

Hey, saw this via BoingBoing, and I must say that it is a work of genius. Oh, and anyone want some VIIC0D3N? Just kidding. Oh, and I’ve never heard of Gravatars either (thought it was something from Star Trek!)

Comment by Tavor — March 3, 2005 at 8:32 AM

Gravatars are Globally Recognised Avatars - see http://www.gravatar.com for more info. Haloscan (which incidentally doesn’t suffer from spam comments either) has implemented this as a feature, so I guess it’ll probably be coming to a blog near you soon (see the comments in the blogs of JG and Hutters for little weird blue boxes). The trouble is that these seem to need a valid email address to work properly, which is a bit of a pain really.

Comment by Carol — March 3, 2005 at 8:47 AM

OOh all those strangers. Isn’t it exciting! I have Gravatars? Wow. That’s technermological, isn’t it?

Comment by LordHutton — March 3, 2005 at 9:35 AM

OI. MONGERS. NO!

That’s not a Turing test!

http://cogsci.ucsd.edu/~asaygin/tt/ttest.html#intro

um… *goes back to reading otherwise very interesting monograph*

Comment by sweavo — March 3, 2005 at 9:59 AM

Hmm, it seems Haloscan have vanishificated the Gravatar business. I think it’s now an option which needs turning on as opposed to being the default which I think it was last week.

Comment by Carol — March 3, 2005 at 10:04 AM

Ooh look! A shiny bunny!

Comment by JG — March 3, 2005 at 10:05 AM

Came here via boingboing also. Interesting approach, though I wouldn’t like to depend on javascript, so suggestion nr. 4 might be a bit of a problem. In my experience, most spam happens with old entries, so closing comments after a week or two solves much of that.

Comment by Ingmar — March 3, 2005 at 10:08 AM

None of these solve the real problem I’m having with comment spam, which is the drain on my server that the spambots are causing.

I had a real problem with comment spam on my MT blog so I installed MT Blacklist. Even with Blacklist installed I had 40 to 50 spams a week getting through on unrecognised URLs that I’d have to add to the list. However, the big problem was that the spambot was still accessing the form and submitting the comment before MT Blacklist could decide whether it was spam. During the months of December & January Blacklist reported over 13,500 spam attempts blocked but those attempts add up to some serious bandwidth.

I’ve implemented a CAPTCHA which has reduced the load because the comment doesn’t have to be processed, but the attempted access still causes bandwidth issues and has forced me to change my hosting plan to a dedicated server to account for it.

Comment by theaardvark — March 3, 2005 at 10:09 AM

sweavo - ah, yes, but, er… *scurries off to try and find a broader definition of ‘Turing test’, and fails* …oh.

Though that is what the T in CAPTCHA stands for.

Comment by SimonG — March 3, 2005 at 10:09 AM

Very cool. My own approach (possibly borrowed off my bro) was to remove the URL field and throw out any comments posted with a URL attached. The only comment spam that’s got through was so amusing I’ve left it attached to my blog. It was talking about crocheting squares and didn’t provide any kind of link to anywhere else, so utterly failed in its conception as a planted link.

However, trackback spam has been ramping up ever since.

Comment by sweavo — March 3, 2005 at 10:14 AM

Nice, but isn’t it sad that it has gotten to this point?

Comment by Steve — March 3, 2005 at 12:08 PM

Really interesting read, and a few good ideas there.

I’d like to point out that closing comments is a *bad* idea in my opinion: a lot of visitors arrive late via a Google query, especially on posts with a ‘technical’ flavor (such as bits of code, for example). Closing comments is just a rude way of not allowing them to give feedback or ask questions. A much better solution is to auto-moderate comments after XX days, rather than closing them.

Another point worth mentioning: don’t put too much trust in simple custom solutions such as just adding a field like “Are you human?”. Spammers sometimes take the time to build a bot for *your* site only, and so take your special field into account. I run a small community site with a home-made Perl engine, which I’m the only one in the world to run: it has been spammed, and someone made a bot just for it.

Comment by Ozh — March 3, 2005 at 1:52 PM

It’s very interesting to see an approach at defeating the spam software from posting in the first place. 99% of the approaches I’ve seen all revolve around processing the post data that is sent to the server.

Spammers are clearly targeting places where standard applications (like WordPress, Drupal, MovableType, etc.) are used. They can then write code once, and run it against a database of sites. It’s impressive to see how little work is potentially required for an INDIVIDUAL webmaster to make a spammer have to roll custom code to hit their site.

Serious automated spammers aren’t really going to bother, but your metrics showing that people may well be doing some of this work manually are concerning.

It’s also interesting to see that the spam bots that have hit you appear to have some form of intelligence to do so.

Perhaps we could add to the technique list writing the form into the document with JavaScript. But I suspect their system already copes with this. Perhaps they are using web test automation tools, replaying behaviour into an Internet Explorer window; that might explain it.

Anyway, this is clearly a very individual approach. It’s also important that effort continues to be poured into catching comment spam at the post-submission stage, as that is something that can be implemented by all systems, on all sites.

Comment by THEMike — March 3, 2005 at 2:16 PM

One really great thing you could add to the post would be some actual code examples from your modified files!

I’ve found the CAPTCHA works pretty well for me… cut my spam by about 3/4. But it definitely raises accessibility issues.

Thanks for the tips!

Comment by Chris — March 3, 2005 at 3:50 PM

Very interesting. However, I do occasionally compose long comments in a text editor and then paste them into the comment field, mostly so I can save them intermittently in case something crashes while I’m writing the comment. People using assistive software might also do this, if it’s easier to write in another program. So filtering out low-keystroke comments, while a fascinating idea, may not be the best criterion.

Comment by metahacker — March 3, 2005 at 3:51 PM

I wish I understood this as someone has just started spamming my works website, for which I am responsible. Unfortunately, none of the above means diddley squat to my non-technermerlogical brain

Comment by Aoj — March 3, 2005 at 3:56 PM

While I appreciate the accessibility and corporate environment (non-javascript) issues, it’s this blanket “lowest-common-denominator” approach that leaves the biggest holes spammers are driving their trucks through. Everything has SOME system requirements. I don’t think it’s unreasonable to expect people to upgrade their software and keep pace.

Comment by elvix — March 3, 2005 at 4:47 PM

The accessibility problem has been exacerbated by the DDA in the UK (Disability Discrimination Act). At first it was about penalising companies that don’t put ramps or other reasonable measures in place for access to their premises for “the disabled”, but it now applies also to web presences.

The reason it’s a problem is that it’s all a bit new and nobody’s sure how it should really be interpreted. In my opinion there is scope for screen reader software to be improved but while it isn’t improved it could be argued that you are dismissing certain portions of the population.

Another question is whether personal sites come under this law. I don’t rate my content as being important though I might give it a review if I actually knew someone partially-sighted who was interested. If I were a company this wouldn’t be good enough I’m sure, and I’d be wide open for a suit under the DDA.

Thus, if you are looking for a general solution, accessibility is potentially a very serious issue.

Comment by sweavo — March 3, 2005 at 6:27 PM

This article:
http://www.candygenius.com/spampop

Is very interesting. Worth a look. Instant kill of 99% of spam, allegedly.

Comment by THEMike — March 3, 2005 at 7:28 PM

I’ve got two tins of it in my cupboard.

Comment by toxic trousers — March 3, 2005 at 8:33 PM

My techniques: 1) Wrote my own comment script rather than using MT’s built-in one. (Did this before I switched to MT from a homebrew blogging tool, and already had comments in it, so I worked out a way to keep it.) 2) I have a dummy field (not an entire form) hidden by CSS. If this form has anything in it, I throw away the comment. 3) I strictly check the lengths of the name, URL, and e-mail address fields. If these are longer than are allowed in the MAXLENGTH, I throw out the comment. 4) The “real” comment field’s name is not fixed, but is generated from an MD5 hash of the requester’s IP address, the current date, and the thread ID. Basically, each comment form retrieved is good ONLY for one day, for a specific thread, from a specific IP address.

Some spammers seem to be spamming “manually” – not really manually, but with bookmarklets of some sort. #2 and #3 catch those magnificently. #4 (and the fact that it’s not MT) catches the dumb bots.

I also have a filter list that allows me to flag comments for manual moderation based on certain criteria such as message content or IP address. Anyone who gets a spam through the above gauntlet gets his subnet put on this filter list so I can catch him next time.
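Techniques 2 to 4 above lend themselves to a short sketch. The Python below is an illustration only, not the script described in this comment: the honeypot field name and the length limits are invented, and the way the IP address, date and thread ID are joined before hashing is an assumption (the comment does not say, nor whether a secret is mixed in).

```python
import hashlib
from datetime import date
from typing import Mapping

# Hypothetical values: the comment gives neither the limits nor the field names.
MAX_LENGTHS = {"name": 50, "url": 100, "email": 100}
HONEYPOT_FIELD = "homepage2"   # dummy field hidden from humans by CSS


def comment_field_name(ip: str, day: date, thread_id: int) -> str:
    """Name of the real comment field: good for one IP, one thread, one day."""
    raw = f"{ip}|{day.isoformat()}|{thread_id}"
    return hashlib.md5(raw.encode()).hexdigest()


def accept(form: Mapping[str, str], ip: str, day: date, thread_id: int) -> bool:
    # Technique 2: a human never sees the honeypot, so it must be empty.
    if form.get(HONEYPOT_FIELD):
        return False
    # Technique 3: a browser enforces MAXLENGTH, so longer values were not typed into the form.
    for field, limit in MAX_LENGTHS.items():
        if len(form.get(field, "")) > limit:
            return False
    # Technique 4: the comment must arrive under today's name for this IP and thread.
    return comment_field_name(ip, day, thread_id) in form
```

MD5 is used here because the comment names it; it only serves to make the field name hard to guess from outside, not as a security measure in its own right.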

Comment by Jerry Kindall — March 3, 2005 at 9:59 PM

So… I’m NOT very good with anything better than html, but I have major comment spam problems. I have a lot of ideas for solving them, but I can’t test them myself and don’t understand the full implications… Shouldn’t it be possible to just hide one of the standard fields like “e-mail” from the user using CSS and then screen any comments that fill in the field anyway?

Comment by Jack Phelps — March 3, 2005 at 11:26 PM

You make it sound so easy to win the spam war. Simplistic techniques that are right out in the open (i.e. client side) will be the easiest to abuse once spammers catch on. What will you do when spammers figure out they can just add $_POST['keystrokecount'] = 3? Or better yet, add a JavaScript interpreter to their software?

Bragging about stumping spammers will only further attract their attention. I’m still 100% spam free, too, and I didn’t have to resort to using JavaScript.

Comment by c. s. — March 4, 2005 at 2:14 AM

The keypress-counter would typically trip me up. I type in an editor with spell-checker (cause I need it) then cut ‘n paste it in.

Comment by sjon — March 4, 2005 at 8:17 AM

One could use some timing tricks. There is an induction period before a real human enters a keystroke, because he must first recognize where to enter his comment, and since even a fast typist can only type 60 wpm one can easily calculate from the length of the message the minimum time needed to enter it. Add a little more time for mental composition and you could filter out robots easily. If a robot does get through, then it won’t be a very effective one as it could only achieve a low rate of comments per hour.
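The arithmetic behind this idea is simple enough to sketch in Python. The example takes the figure of 60 as words per minute; the five characters per word and the two seconds of thinking time are assumptions added for illustration, as is the idea that the server knows how long ago the form was served.

```python
def min_typing_seconds(message: str, wpm: float = 60,
                       chars_per_word: float = 5,
                       thinking_seconds: float = 2.0) -> float:
    """Shortest plausible time for a human to find the box and type the message."""
    chars_per_second = wpm * chars_per_word / 60
    return thinking_seconds + len(message) / chars_per_second


def too_fast(message: str, elapsed_seconds: float) -> bool:
    """True if the comment arrived sooner than a typist could have produced it."""
    return elapsed_seconds < min_typing_seconds(message)
```

With these defaults a 100-character comment needs at least 22 seconds. Like the keypress counter, this would also reject genuine comments that were pasted in from elsewhere.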

Comment by shaitan — March 4, 2005 at 3:19 PM

For a spammer, it would just be a matter of capturing the POST content that is sent when a real comment is submitted, then replicate it. So the dummy forms would prove useless, so would the keypresses, the field with the date, and the submit/image button…

there you go. all it would take is a minute coming here, filling out this form, capturing the POST (easily done with a Firefox plugin), then submitting at will to your blog within that day.

Comment by Micah Goulart — March 4, 2005 at 4:23 PM

I stumbled upon this rather cute Wordpress plugin that asks you to match a picture before you can submit a comment.

http://www.cannedmonkey.com/wordpress/index.php?p=4#comments

You can even put kitten pictures in it.

Comment by mare — March 4, 2005 at 6:14 PM

I think shaitan is really on to something there. Especially because the raw data could all be collected client-side and then all the processing could be done server-side (thus making *valid* post data more elusive).

Another way to counter Micah’s point would be to perform something like a character count via JavaScript and submit that along with the form, and then do a character count on the server side and compare the two. Then, when someone just grabs the POST data and resubmits it with their URL and SPAM message, assuming they don’t look far enough into your script to find out what it is, the ‘mystery number’ will rarely, if ever match the character count of their new script-submitted message.
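A sketch of the server half of that comparison, in Python, under a couple of assumptions: the page's script sends its count in a field of its own, and it counts with JavaScript's `String.length`. Two details matter for the numbers to agree. Browsers submit line breaks in a textarea as CRLF while the script sees a single LF, and `String.length` counts UTF-16 code units rather than characters.

```python
def js_length(text: str) -> int:
    """Length as JavaScript's String.length reports it (UTF-16 code units)."""
    return len(text.encode("utf-16-le")) // 2


def count_matches(comment: str, claimed_count: str) -> bool:
    """Compare the count sent by the page's script with a count made on the server."""
    # Undo the CRLF normalisation applied to textarea content on submission.
    as_seen_by_script = comment.replace("\r\n", "\n")
    try:
        return int(claimed_count) == js_length(as_seen_by_script)
    except (TypeError, ValueError):
        return False  # missing or non-numeric count
```

A replayed POST that swaps in a new message but keeps the old count will only pass if the new message happens to be the same length.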

Of course, as with a lot of these things, as soon as they are popularized, they will be rendered ineffective.

Something else to consider would be incorporating XMLHTTPRequest technology to report something dynamic (like mouse movement or key presses). Scripted submissions would not have any of these things.

This is a fun topic. Thanks for the great ideas.

Comment by Nathan Logan — March 5, 2005 at 12:56 AM

I just munged my MT comments cgi so that it doesn’t accept any comments that include the string “http://”. Hacky, but it’s worked so far.

www.pickabar.com/blog/archives/2005/02/fighting_commen.html

Comment by gerrard — March 5, 2005 at 3:49 PM

One of the things I recently added is an exponential time limit: I keep a small database of comments/posts that just has ip address, number of posts, and the time of the post. It gets cleared if the time is 24 hours old, but if not, the cube of the number of posts is multiplied by 15 seconds and you can’t post again if there hasn’t been that much elapsed time.

So, the second post requires waiting 2 * 2 * 2 * 15 seconds, or 2 minutes. The third is almost 7 minutes, the fourth around 16 minutes, etc. Of course you could multiply by 5 seconds if you don’t want to be that aggressive or just square the posts if you want to be even less so. Or multiply by 60 seconds or whatever.
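The waiting rule is easy to express in code. This Python sketch follows the worked numbers above (the cube of the post number times 15 seconds); treating the first post as needing no wait is an assumption, and the database of IP addresses and timestamps is left out entirely.

```python
def required_wait_seconds(post_number: int, base_seconds: int = 15, power: int = 3) -> int:
    """Seconds that must have passed since the previous post before post number N is allowed."""
    if post_number <= 1:
        return 0  # assumption: the first post in a 24-hour window is never delayed
    return post_number ** power * base_seconds


def may_post(post_number: int, seconds_since_last_post: float) -> bool:
    return seconds_since_last_post >= required_wait_seconds(post_number)
```

Strictly speaking the delay grows as a cube rather than exponentially. Passing `power=2` or a different `base_seconds` gives the gentler or harsher variants mentioned above.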

I found that very, very few legitimate folks ever make more than one post per session and if they do it’s usually later.

I also add the ip to a total ban database if someone keeps trying over a small number of times without success, which is of course unfortunately punishing stupidity too - or maybe that isn’t unfortunate. I add ip’s to that manually too if someone does make inappropriate posts.

Comment by Tony Lawrence — March 5, 2005 at 7:01 PM

too bad this is a fix for wordpress. my blog is hand coded and i sometimes get comment spam. i don’t have any methods to eliminate spam but i have found when i do get it simply imposing a ban on the offending party eliminates the problem for months.

Comment by eric — March 7, 2005 at 1:52 AM

I’m getting this error running the MTKeystrokes plugin. Any ideas on help?

***************

An error occurred:

Undefined subroutine &MT::Plugin::Keystrokes::tag called at plugins/keystrokes.pl line 111.

***************

Comment by Craig P J — March 9, 2005 at 2:24 AM

Thank you!

Comment by Walter — March 16, 2005 at 9:54 PM

I’ve started getting spammed by a porno site. Has anyone found ways of foiling the spammers on blogs hosted by blogger.com or blogspot.com (owned by Google)?

Comment by Wayne Leman — May 21, 2005 at 5:58 PM

I think that “code image” is the best solution, easy and clean.
I think that the rel="nofollow" tag is not a solution, it generates confusion.

Comment by Francesco — July 11, 2005 at 5:15 PM

As a web programmer, I’m currently required to write a password vault type tool for users of a client’s site. They want it to store logins to websites so that users can 1-click login to sites in new windows. So far I’ve had about an 80% success rate with creating these logins - which are basically posting to a form automatically - in this case to log them in. Comments from doing this:
1) Timestamp fields can be a hassle, but only if modified by some unknown formula.
2) Server-side scripting is the only way to really get around someone posting to your form handler without using your input form from your site, which is how most ‘spammers’ work, I guess, but with a little php/asp/whatever, you can easily stop this from happening (those methods represent the 20% of sites I could not get through to)
As a suggestion for stopping automated attacks, you could simply add a series of checkboxes, hiding all but 1 of them with css, and require only that 1 be checked - would slow down most automated systems that don’t parse for css-hidden content.
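
The server-side half of that checkbox idea might look something like this sketch. The field names are hypothetical, and it relies on the fact that browsers only submit a checkbox field when the box is checked.

```python
VISIBLE_BOX = "confirm_c"  # hypothetical name of the one checkbox left visible
HIDDEN_BOXES = {"confirm_a", "confirm_b", "confirm_d"}  # hypothetical names, hidden with CSS


def passes_checkbox_trap(form: dict) -> bool:
    """Accept a submission only if the visible box is checked and no hidden one is.

    `form` maps submitted field names to values, as parsed from the POST body.
    """
    if any(name in form for name in HIDDEN_BOXES):
        return False  # something ticked a box that a human could not see
    return VISIBLE_BOX in form


print(passes_checkbox_trap({"comment": "Nice post", "confirm_c": "on"}))
print(passes_checkbox_trap({"comment": "Buy now", "confirm_a": "on", "confirm_c": "on"}))
```

The first call is accepted and the second rejected, because a script that ticks every checkbox it finds gives itself away.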

Comment by Dash — July 11, 2005 at 7:18 PM

Well, nice try, but what about spam made by hand, don’t you get hit by that?

Comment by Here we go — February 1, 2006 at 4:10 PM

It’s a shame that the blogging world is becoming increasingly subject to the same shameless promotional spam that has caused so many problems with emails at companies both large and small. Thank you for sharing your efforts at reducing comment spam!

Comment by Geoff Brown, Deephaven MN — April 5, 2006 at 7:09 PM

Interesting article. To spam blogs, guestbooks and so on is bad of course but only if it is done with maps and very often. To promote some sites by your own hands in a link field is absolutely normal :)
Registration, captcha, word filter - good solutions to fight bots

Comment by Chester — April 6, 2006 at 8:05 AM

I was recently blogging on a site that has the preview comment feature, so you can see what you’re typing as you type it. I accidentally cut and pasted very personal information (credit card number, etc.) into the comment box and it showed up in the preview as well. My question is: can the blog operator see what I put in the comment box as I’m typing it? Or will they have access to the info I pasted in the comment box before I hit “send”? Thanks.

Comment by art warner — April 18, 2006 at 8:44 AM

Hi Warner!
Nobody can see what you type before you click the send button. Only a hacker could do this, by watching what you type with a trojan ;)

Comment by dennis — April 23, 2006 at 1:44 AM

Great thinking. I particularly like the unused invisible forms :)
I’ve been receiving about 22,000 comment spam per month for a while now.
It really started to get out of control about 9 months ago.
Other tactics I’ve tried (but not measured the success of):
Creating the comment form via DOM manipulation, never having any actual HTML form.
Checking to see if the from @domain exists, using an AJAX lookup.
Requiring users to log in in order to comment (but giving them a long-lasting login cookie, and the ability to ‘join’ in the same post as when they comment).
Allowing non-registered users to comment, but putting their comment through an automated filter (Akismet). If it is flagged as spam, send the user an email with a link asking them to confirm that they are human. If they’re not human, or if the email address didn’t exist, their comment is never approved. If they do click the link, then I’m notified that a potentially-spam comment has been contested, and I manually approve or deny that one.
I’m currently implementing this last one, and I hope it’ll make the most difference.
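
The last tactic is essentially a small state machine, which could be sketched as below. This does not call Akismet: the spam check is passed in as a plain function, and the status names and function names are invented for this example.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    PUBLISHED = "published"
    AWAITING_CONFIRMATION = "awaiting_confirmation"  # confirmation email sent
    AWAITING_MODERATION = "awaiting_moderation"      # blog owner decides by hand


def triage(comment_text: str, looks_like_spam: Callable[[str], bool]) -> Status:
    """Decide what happens to a new comment from a non-registered user."""
    if looks_like_spam(comment_text):
        # Flagged: email the commenter a link asking them to confirm they are human.
        return Status.AWAITING_CONFIRMATION
    return Status.PUBLISHED


def on_confirmation_click(status: Status) -> Status:
    """The commenter clicked the emailed link, contesting the spam flag."""
    if status is Status.AWAITING_CONFIRMATION:
        return Status.AWAITING_MODERATION
    return status
```

A flagged comment whose link is never clicked simply stays in AWAITING_CONFIRMATION, which matches "their comment is never approved".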

Keep rockin’ tha Turing test.

Derek

Comment by Derek Martin — August 4, 2006 at 12:35 AM

I think the days of blog spam where spam spiders entered data are almost gone. Although I must say, I like your idea with the hidden HTML form.

Spammers do not sleep! I heard a rumor that spammers hire people in countries like Nigeria and China to write blog spam.

Comment by anonymous email — September 1, 2006 at 2:48 PM

I think your ideas are great. Still, it won’t work for long; spammers will find other methods to spread their information. What is true is that we are fed up with them and try to use every known method to get rid of them.
Anyway I wish good luck for everyone!

Comment by Vincent Dallas — October 2, 2006 at 10:22 AM

One of the tactics I used in creating the blacklist was to include periods after the words so I have spamword. instead of spamword.

Comment by john beck — November 7, 2006 at 5:33 AM

The best thing I know of is the numeric images where you must enter a code before posting!!

Comment by Uschi — November 10, 2006 at 10:32 AM

My spam problem had progressively increased. Thee main problem seems to be these damn chinese sites that are difficult to understand and infiltrate with a response. This forced me to begin developing an active spam guard to trace the sourdes and fuck up the offending server.
keep up the good spam fight
de

Comment by famousde — November 12, 2006 at 1:45 PM

Pardon the typos - not drunk at the moment, just coding all night and fingers are rubbery. I meant to type that BDBP has started to implement measures that strike back. There is no incentive for a spammer to stop infiltrating your processes unless the spammer incurs damage by doing so.

Comment by famousde — November 12, 2006 at 2:05 PM

Kudos on an elegant approach to stopping spam. I especially like the keystroke trick.

I *think* this is related to what you’re trying to do: I thought it might be interesting to see if I could write a program for detecting spam-pings (spings) in some of the big ping-beacons, like weblogs.com, blogsearch.google.com, and blogrolling.com. My comment name link is the link to the site. If you check out the “services”, you’ll be able to download a filtered changes.xml containing weblogs/pings that have been flagged as suspect spings. Anyways, I wonder if there’s any value in cross-referencing those filtered change lists that identify spings (and spam blogs too?) with your data.

I’m not fishing for ad clicks or impressions (I have Google AdSense on the site). Just interested in collaborating and discussing…

Comment by whateverdood — January 12, 2007 at 3:17 AM

I’m wondering if anyone knows how I can stop spam when creating forms or discussion webs with FrontPage?

Comment by Holly — June 18, 2007 at 3:51 AM

Fuck me, Goodway. That’s a good set of comments to wake you from Blogsleep!

Comment by lordhutton — September 30, 2007 at 12:12 AM
