About Spam, Servers & Downtime

Over the weekend, Noscope has been taking a nap. An unwanted nap, but a nap nonetheless.

I have yet to discover the cause of this naptime, but I have a few theories. Up until the crash, I had been receiving an increasing amount of comment and referrer spam. Additionally, my 404 logs showed a huge increase in traffic for the nonexistent file mt-comments.cgi, which is of course the Movable Type comment file.

On the other hand, I have made a few mistakes with infinite redirects before. Ahem.

So I ask the tech-savvy of you:

  • What’s a good way to take the strain off the server, whether it’s a spammer, 404 traffic, or just plain traffic?
  • Where can one find a good 404 detection script that writes missing files to a .txt file? Dave Child came to the rescue!
  • How can spammers be blocked, if at all?
  • Supposedly there’s a bug in PHP that “forgets to tell apache to stop writing”. How to fix this? Not applicable.

Known Tips

  • WordPress users: Use Staticize Reloaded to cut back on SQL usage.
  • WordPress users: Rename wp-comments-post.php to something random, and update this in your theme comments.php.
  • MT users: Rename mt-comments.cgi to something random, and update templates that link to this.

20 thoughts on “About Spam, Servers & Downtime”

  1. Nik says:

    Shame about the recent hassles.

    I’ll be watching the progress of this thread tho’ as I’ve been hit by comment/referrer spam for a while now and have recently decided to take my blog offline (No one read it and I only ever redesigned it anyway!) 😀

  2. Dave Child says:

    • What’s a good way to take the strain off the server, whether it’s a spammer, 404 traffic, or just plain traffic?

    Spammers end up requesting a whole page when they try to submit rubbish, increasing server load fairly dramatically if you are a target. Best bet is to detect as many of the spammers as possible and send them a very small basic HTML file. Authentic users that get the HTML file by accident will find it contains instructions on proceeding and a little info about why they’re seeing an “access denied”-like page. Spammers won’t see the page. Server load drops.

    • Where can one find a good 404 detection script that writes missing files to a .txt file?

    The chances are this is already happening. Do you have access to your log files? All the info you need will be in there.

    Alternatively, you could get a programmer to put one together for you. It wouldn’t take long at all (I’d be happy to knock one together for you if you like). I don’t know if someone’s written one already, but http://php.resourceindex.com would be a likely place to find something like that.

    • How can spammers be blocked, if at all?

    At the moment, not very easily. Referrer spammers can be blocked with relatively simple keyword-based blacklists. Comment spammers can be stopped, to a degree, by placing the comment entry form on a separate page from the entry and changing your script names so your comment forms can’t be found as easily (if indeed at all) on the major search engines.

    • Supposedly there’s a bug in PHP that “forgets to tell apache to stop writing”. How to fix this?

    I’m not sure exactly what you mean. You can set a timeout on requests in your php.ini file (usually this is set to 30 seconds or a minute by default) which means an infinite loop in a script will be cut off after a while. If there’s a known bug in PHP to do with how it communicates with Apache, I imagine that it will have been fixed in the most recent versions, so upgrading PHP might be wise.

    Alternatively, manually adding a call to “exit()” or “die()” at the end of every script would likely solve the problem, as that explicitly tells PHP to stop processing and send everything to the user.
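
A minimal sketch of the keyword-based referrer blacklist and the "very small basic HTML file" response described in this comment, written in PHP to match the rest of the thread. The keyword list, the function name is_spam_referrer, and the wording of the denial page are illustrative assumptions, not Dave Child's actual script:

```php
<?php
// Hypothetical keyword list; replace it with terms seen in your own logs.
$spam_keywords = ['poker', 'casino', 'pills'];

/**
 * Return true when the referrer contains any blacklisted keyword
 * (case-insensitive substring match). Empty keywords are ignored.
 */
function is_spam_referrer(string $referrer, array $keywords): bool
{
    foreach ($keywords as $keyword) {
        if ($keyword !== '' && stripos($referrer, $keyword) !== false) {
            return true;
        }
    }
    return false;
}

// Usage near the top of a page: answer suspected spammers with a tiny page
// and stop before any templates or database queries run.
$referrer = $_SERVER['HTTP_REFERER'] ?? '';
if (is_spam_referrer($referrer, $spam_keywords)) {
    header('HTTP/1.1 403 Forbidden');
    echo '<html><body><p>Access denied. If you are a real visitor, '
       . 'type the address directly or follow a bookmark.</p></body></html>';
    exit;
}
```

Placed before the blog software loads, a check like this would let flagged requests exit early, which is where the load saving would come from. Substring matching is crude and can catch legitimate referrers, hence the short explanation in the denial page.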

  3. Joen says:

    Dave,

    Great feedback! Thanks. I’m sure it’ll be helpful, not only to me, but to readers of this.

    Authentic users that get the HTML file by accident will find it contains instructions on proceeding and a little info about why they’re seeing an “access denied”-like page. Spammers won’t see the page. Server load drops.

    I know you’re not a WordPress user, but I’m gonna say it anyway: This smells like a plugin!

    The chances are this is already happening. Do you have access to your log files? All the info you need will be in there.

    I have limited access, i.e. I can download the server log / error log, but I don’t have access to install Webalizer or the likes. I’m stuck with web-based stuff for the moment.

    Alternatively, you could get a programmer to put one together for you. It wouldn’t take long at all (I’d be happy to knock one together for you if you like). I don’t know if someone’s written one already, but http://php.resourceindex.com would be a likely place to find something like that.

    I will definitely check that link out, thanks. I used a script for my old host, but for some reason that script doesn’t work on my current host.

    Comment spammers can be stopped, to a degree, by placing the comment entry form on a separate page from the entry and changing your script names so your comment forms can’t be found as easily (if indeed at all) on the major search engines.

    That’s a tricky one. I’m unsure as to what files in my system refer to the WP comment file, so renaming it would be somewhat of a hassle.

    Could a plugin be made that had all the contents of wp-comments-post.php and instead relied on mod_rewrite rules to create a random filename? Something’s gotta be possible.

    If there’s a known bug in PHP to do with how it communicates with Apache, I imagine that it will have been fixed in the most recent versions, so upgrading PHP might be wise.

    I mention this because our sysadmin at Titoonic brought it up. Apparently, this timeout doesn’t work with so-called “dead clients”, i.e. if there’s a huge number of dead clients, the server will crash (because the timeouts won’t time out). He also told me that this bug is still classified as “Open” in the PHP 5 bugzilla. I’m not sure whether this can even be what’s happening to Noscope.

  4. Dave Child says:

    Could a plugin be made that had all the contents of wp-comments-post.php and instead relied on mod_rewrite rules to create a random filename? Something’s gotta be possible.

    A random filename wouldn’t necessarily be necessary. More important is that it’s not the default file name – anyone searching for blogs to spam will search for a set list of things, for example “powered by wordpress” or “wp-comments-post.php”, to filter blogs from sites with no comment facility. Once they’ve got a list of blogs from that, it’s pretty easy to spam them manually or automatically.

    I know you’re not a WordPress user, but I’m gonna say it anyway: This smells like a plugin!

    I can’t see any reason why a plugin wouldn’t be possible. On my site, I’ve done it with a .htaccess file and a simple PHP script (there’s an article about it on there explaining how it’s done on my site – should be just the same no matter what blog system you’re using).

    I’m stuck with web-based stuff for the moment.

    The first thing to do is create a file named “404.php”. Add whatever HTML you want in there to display your 404 message etc. Then you can add code to the top to do whatever you like. For example, this will email you every time a 404 is displayed:

    <?php
    // Email a report for every 404 except the ever-present favicon request.
    if ($_SERVER["REQUEST_URI"] != "/favicon.ico") {
        $body = "http://" . $_SERVER["HTTP_HOST"] . $_SERVER["REQUEST_URI"] . " appears to be missing."
            . "\n\nPossibly referred from " . ($_SERVER["HTTP_REFERER"] ?? "")
            . "\n\nUser agent: " . ($_SERVER["HTTP_USER_AGENT"] ?? "")
            . "\n\nIP: " . $_SERVER["REMOTE_ADDR"];
        mail("me@example.com", "404 Error", $body, "From: me@example.com");
    }
    ?>
    

    While this will write 404s to a text file:

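    <?php
    // Append the URL of every missing page to 404.txt, skipping favicon requests.
    if (($_SERVER["REQUEST_URI"] != "/favicon.ico") and ($fp = fopen('404.txt', 'a'))) {
        fwrite($fp, "http://" . $_SERVER["HTTP_HOST"] . $_SERVER["REQUEST_URI"] . "\n");
        fclose($fp);
    }
    ?>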
    
  5. Joen says:

    Don’t worry about it, it’s my problem: I don’t have proper ways of posting code anyway.

    I think the best way would have been to wrap it in a pre tag, though the code tag should also be available.

    I’ll update my “allowed tags” page asap!

    And THANKS for your feedback!

    More important is that it’s not the default file name

    Currently I’m pretty sure renaming the file would be a hassle. Either some core files need changing, or maybe a plugin could be made to allow other filenames.

    there’s an article about it on there explaining how it’s done on my site – should be just the same no matter what blog system you’re using

    I just noticed that you have a bonanza of articles on that very topic! Now start pinging blo.gs (http://blo.gs) and I’ll link you up 😉

    For now I’m going to try your code! Thanks!

  6. Dave Child says:

    Oopsy – just noticed a mistake in the second piece of code I posted earlier. After I finish this comment, no more typing for me today. I clearly can’t control my own fingers :).

    “and (!$fp = fopen” should read “and ($fp = fopen”. No exclamation mark. Otherwise it’ll be an access log, not an error log!

  7. Joen says:

    Thanks Dave, I’m trying it now. I modified the code in your original post.

  8. Jeff Minard says:

    Use Staticize Reloaded to cut back on SQL usage.

    And PHP processing time. (Eliminates both since it’s pulling the whole page from a text file.)

    If you have access to your log files, you can download them and run them through the trial version of Urchin and see some good statistics (including 404s).

    As for spammers, I have found the wp-hashcash plugin to be extremely effective. Protects against comment spam from a real form submission as well as direct wp-comments-post.php submissions.
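
For readers who, like Joen, can only download the raw server log, a rough PHP sketch of pulling the 404 statistics out by hand. It assumes the log is in Apache's Common or Combined Log Format; the function name count_404s and the file name access.log are hypothetical:

```php
<?php
/**
 * Count 404 responses per requested path in Common/Combined Log Format lines.
 * Returns [path => hits], most-requested path first.
 */
function count_404s(iterable $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Matches the request and status fields, e.g. "GET /path HTTP/1.1" 404
        if (preg_match('/"[A-Z]+ (\S+) [^"]*" 404 /', $line, $m)) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
        }
    }
    arsort($counts); // highest hit count first, keys preserved
    return $counts;
}

// Usage with a downloaded log (assumed file name):
//   $report = count_404s(file('access.log', FILE_IGNORE_NEW_LINES));
//   foreach ($report as $path => $hits) {
//       echo $hits, "\t", $path, "\n";
//   }
```

A spike for a single path such as mt-comments.cgi would show up at the top of the report. Lines that do not match the assumed format are skipped silently, so an unusual custom log format would yield an empty result rather than an error.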

  9. Chris says:

    Joen, fantastic stuff. I swear you have some of the best commenters. Talk about harnessing the web. 🙂

  10. Jonas Rabbe says:

    Currently I’m pretty sure renaming the file name would be a hassle.

    I haven’t looked at how it’s done in WordPress, but in Movable Type you can rename your comment script, and set the new name of your comment script in the setup script. I’m actually going to be a bit surprised if that’s not possible in WordPress.

  11. Chris says:

    I did some poking around real quick in my sandbox. You can change the wp-comments-post.php file then just change the reference to it in comments.php in your theme. Breaks nothing.

    All I did was grep the WP install for wp-comments-post, and comments.php is the only file that references it.

  12. Chris says:

    Apologies for the double post. I just did the same on my live site. Works fine. Now to see if it slows down those damn poker bots.

    mv wp-comments-post.php (clever word).php

    Then edit comments.php and change the form action="" line to reflect the new name.

  13. Joen says:

    Chris, yep, indeed the commenters here make the contents. It’s fantastic. For that reason alone, I’ve been spending a lot of time thinking about how commenting in WordPress can be improved. Among other things, the ideas include threaded comments.

    As for the WP comments, I really thought it was referenced several more places! I’m going to try that out right now!

  14. Chris says:

    As for the WP comments, I really thought it was referenced several more places! I’m going to try that out right now!

    I thought that it would be as well. But apparently the comments.php file from your theme is the one that gets referenced, while the actual comment-posting script is only referenced that one time. So, unless the spammers are parsing your comments.php (doubtful for now), this should stop a bit of it.

    If not, the comments.php file is only itself referenced in about 5 other locations. I’m not certain on that number as a grep for comments.php also returns edit-comments.php.

  15. Dave Child says:

    Now to see if it slows down those damn poker bots.

    By itself, it probably won’t make the world of difference. As part of a larger strategy, it should. Think of phrases and file names they might be looking for to find your site in the SERPs – that’s what needs changing. Renaming “wp-comments-post.php” by itself will make no difference if they’re finding your blog by searching for “Leave a Comment” or “subscribe to comments” or “trackback”. The best way to stop comment spammers is not to identify their spam, but to stop them identifying your site as a blog.

    For comment spam to be profitable, it would need to be on a reasonably large scale. If I were comment spamming, I’d want to find a large list of blogs whose topics were closely related to the topic of the site I was trying to promote. I’d only be interested in blogs because such a high percentage of them allow comments and such a low percentage of other sites do. So I’d search for things like the above, plus “web design blog”, “web development blog”, “php blog movable type”. If I was writing an automated tool, I’d search for the default comment entry script name for the most widely-used blogging systems.

  16. Chris says:

    By itself, it probably won’t make the world of difference.

    Point taken. But, it’s at least a first step. Notes regarding “leave a comment” and their ilk taken under advisement.

    I don’t suppose you could just come by my house and look over my shoulder while I work on my site? 🙂

  17. Joen says:

    Thanks all for your great feedback, I’ll get a better read through later tonight.

    Dave,

    I found a bugzilla link for the PHP bug I mentioned.

  18. Dave Child says:

    Having a look at that bugzilla link, Joen, I’d really not worry about it. The chances of it actually happening are very very small, and you’d need millions of visitors for it to become any sort of problem. I’ve got several sites on the same setup as you, and haven’t had a problem as a result of anything like a PHP/Apache timeout bug.

    A server crash like you describe is more likely down to an overloaded server. I don’t know anything about your hosting, but most sites are on shared hosting, and often if the traffic on one goes up significantly, the server can’t handle it (usually because hosts stuff sites on a server like there’s no tomorrow, leaving the server unable to cope with a spike in traffic).

  19. Joen says:

    Thanks Dave, for all your great feedback and help here.

    The 404 script is working great, and I’ve renamed the comment post file. Strangely enough I’m not getting “killed comments” email notifications any more, like I did prior to this.

    I will look in to the other suggestions, and eventually do a writeup.

  20. For the record, never, ever email yourself on a production website. You would be amazed at how much email a 404 can generate on a busy site, bringing down systems and causing exceedingly massive headaches for your hosting provider.

    Please, don’t do it. Great for dev. B-A-D for production.

    Jonathan
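
One possible middle ground between the emailing 404.php posted earlier in the thread and this warning is to keep logging every miss to the text file but send mail at most once per interval. The PHP sketch below is an assumption-laden illustration, not something any commenter proposed: the function name should_notify, the stamp file, and the ten-minute interval are all hypothetical.

```php
<?php
/**
 * Decide whether enough time has passed since the last notification.
 * The Unix timestamp of the last send is stored in $stampFile.
 * Returns true (and records $now) when a new email may be sent.
 */
function should_notify(string $stampFile, int $minInterval, int $now): bool
{
    $last = is_file($stampFile) ? (int) file_get_contents($stampFile) : 0;
    if ($now - $last < $minInterval) {
        return false; // too soon: skip the email, rely on the log file
    }
    file_put_contents($stampFile, (string) $now, LOCK_EX);
    return true;
}

// Usage inside 404.php (assumed paths and address):
//   if (should_notify('/tmp/404-mail.stamp', 600, time())) {
//       mail('me@example.com', '404 Error', 'New 404s logged; see 404.txt.');
//   }
```

Two requests arriving at the same instant could both pass the check, so this only bounds the email volume approximately. On a busy production site, leaving mail out entirely and reading the log remains the safer choice.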
