This week, Amnesty International’s irrepressible.info campaign website (developed by the folks at Soda) was simultaneously Slashdotted and Observed (the Observer being a popular UK paper) and lived to tell the tale. The irrepressible.info campaign is all about human rights and freedom on the net.
Also, some news that I forgot to blog a couple weeks ago. Diggdot.us recently appeared in MacWorld as one of the go-to sites for geeks.

It’s great to see these successful sites with TurboGears under the hood.

14 Responses to “irrepressible.info TurboGears site slashdotted”
  1. Graham says:

    When you say “slashdotted”, do you mean they brought the site down? Or did they manage to live through the onslaught?

  2. tazzzzz says:

    I just added “and lived to tell the tale” to my posting. Yes, the site did just fine with the slashdotting.

  3. Florian says:

    Hm, that’s kind of useless info in itself ya know?

    How about:

    1) a graph showing hits/second over the few hours it peaked, together with a response time and cpu/io usage.

    2) a description of what measures have been taken to ensure high performance.

  4. Joseph says:

    Yeah, this is not much more than hot-air marketing…and does the pledge site actually do anything other than replicating functionality that basic HTML pages could do? Doesn’t appear to be anything dynamic about it, other than the very basic pledge-signing form.

    I would HOPE a Slashdotting of those pages would have no effect!

  5. tazzzzz says:

    Dudes, lighten up. If you had an open source project and a blog, and users of your project were getting some good press, would you not blog about it?

    Sure, irrepressible.info isn’t gmail. But, it’s still a nice site, a good cause, slashdotted and TG-powered. I don’t know any of the details of their implementation or their load during the slashdotting. There is one additional bit of dynamic behavior which is the feature to display “repressed” information on other sites, and I’m not sure how widespread that is yet.

  6. Florian says:

    Actually, there’s far too little consideration for load scenarios among Python web programmers.

    Hopefully you know before a slashdotting that your site’ll hold up. The way to know that is to prepare for it.

    Sure, idle pride is all pretty narcissism, but contributing to a body of knowledge is better.

    Here’s how I make things scale.

    * Caching Proxy
    * Load balancer to a server farm
    * one app process handles about 16 full dynamic requests per second
    * per machine I can spawn about 3 such processes before performance degrades
    * per database I can spawn about 5 such machines before the db slows down (oracle)

    ==> ~240 fully dynamically rendered requests per second. However, the caching proxy handles thousands of requests per second, and most don’t hit through to the app at all.

    Max response time per thread in this scenario is 6 seconds (response time drops dramatically per thread when the load is relieved a little).
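
    The arithmetic behind that ~240 figure can be sketched as follows. The per-process, per-machine and per-database numbers are the ones quoted above; the function names and the 10% cache-miss ratio are made up purely for illustration.

```python
def dynamic_capacity(req_per_process, processes_per_machine, machines_per_db):
    """Upper bound on fully dynamic requests/second behind one database."""
    return req_per_process * processes_per_machine * machines_per_db


def total_capacity(dynamic_rps, miss_ratio):
    """Requests/second the whole stack can absorb if only `miss_ratio`
    of the traffic gets past the caching proxy (hypothetical ratio)."""
    return dynamic_rps / miss_ratio


dynamic = dynamic_capacity(16, 3, 5)   # figures quoted in the comment
print(dynamic)
print(total_capacity(dynamic, 0.1))    # assumes 1 request in 10 reaches the app
```

    The second function is only a back-of-the-envelope model: it ignores the proxy's own limits and assumes the miss ratio stays constant under load.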

    So how about you? Did you just get lucky you survived slashdotting?

  7. David says:

    In the interest of providing some numbers I can say that a slashdotting of my site resulted in about 100,000 hits during the first hour, which is an average of about 30 hits per second.
    Florian, I’m not sure which site you’re serving that receives thousands of requests per second (~10 million hits per hour), but I think a slashdotting is the least of your worries as it would barely be a blip on your radar.
    At any rate, I congratulate the TurboGears guys and the CherryPy guys for writing great software. I hope the use of TurboGears by these mainstream sites helps to dissuade any naysayers.
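
    For anyone wanting to redo the conversion between hits per hour and hits per second, a tiny sketch. The helper names are invented here, and 3,000 requests per second is just one reading of “thousands”.

```python
SECONDS_PER_HOUR = 3600


def per_second(hits_per_hour):
    """Average hits per second for a given hourly total."""
    return hits_per_hour / SECONDS_PER_HOUR


def per_hour(hits_per_second):
    """Hourly total for a sustained per-second rate."""
    return hits_per_second * SECONDS_PER_HOUR


print(round(per_second(100_000), 1))  # the slashdotting described above
print(per_hour(3_000))                # assumed value for "thousands per second"
```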

  8. Richard Jones says:

    Florian, I disagree. The Django project talks a fair bit about coping with load.

  9. Florian says:

    @David,

    let’s just say it has to do with a big country (India) and a lot of devices (mobile). :D

    Though of course I don’t have thousands of requests per second, I’ve seen peaks of one or two hundred, and when I test the system I max it out to thousands, of course (always be ahead of the load, ya know).

    @Richard Jones, didn’t mean to imply Django wouldn’t. In fact I know Django is faster and more thought out in the “How to host a high-load website” department than TurboGears. What I mean to imply is that apart from Django and, say, asyncore, pretty much nobody seems to bother with high load.

  10. tazzzzz says:

    Florian: Sorry for dismissing your original comment, which was something that I’m certain that some people would be curious about. I don’t think I would have had it not been for Joseph’s comment which was a bit more flippant.

    While I do care about raw throughput to an extent (and there are a couple of efforts underway that will doubtless improve TurboGears’ raw throughput), if that was the primary concern I’d be coding in Java or C.

    To me, a far more interesting question than “how was it set up” would be “how did you predict how much traffic you’d get?”

    Ability to scale has so much more to do with application design than raw throughput. If you manage to have a reasonable idea of what kind of traffic you’re going to get, you can choose an application design that makes sense. If you’re writing an app that runs on an intranet for 10 people, you can pretty much write the code however you want. If you’re going to be slashdotted in the first couple of days, you should design your data access, caching and user session handling intelligently. Partitioning of data, breaking pieces out into services and otherwise being able to shift load between boxes is the key.

    DHH said it succinctly here: “It’s boring to scale with Ruby on Rails” (or TG or Django):
    http://www.loudthinking.com/arc/000479.html

    There’s even a book about scaling LAMP:
    http://tinyurl.com/nj6r3

    Since I wasn’t involved in the creation of this site, I can’t say what steps they took. As Joseph points out, there’s not a ton of dynamic stuff going on here, so making this site scale would not be very difficult.

    Diggdot.us is more dynamic and does more work. I know that they’ve handled their traffic and spikes through caching and I believe they have a couple of boxes running.

  11. Florian says:

    @Tazzzzz

    Sure, being able to scale is more important than raw throughput. I wouldn’t dismiss raw throughput though, because to a degree it means you need to do less to scale, which is always nice.

    I strongly disagree that I’d take C or Java even if my primary concern were raw throughput. In fact I’ve a little tale to tell that might be interesting.

    Once there was a popular game launch (BF2). Turns out the game had online statistics. The BF2 crowd is pretty much forum based, so people began putting their medals on their forum sigs.

    There was a programmer (me) who thought, “neat, but how about if I do a sig service?”.

    So I took asyncore and hacked together, to the best of my ability, a small app that did the following:
    * fetch sig data every once in a while for existing sigs
    * fetch sig data for new sigs
    * render the images
    * store the image bytes in memory
    * interpret a subset of HTTP 1.1 and various caching/expires directive flavors
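
    The “store the image bytes in memory” and “refresh every once in a while” steps might look roughly like the sketch below. This is not the original code and does not use asyncore; `SigCache`, the `render` callable and the 600-second lifetime are all hypothetical names and values chosen for illustration.

```python
import time


class SigCache:
    """Keep rendered image bytes in memory and re-render them when stale."""

    def __init__(self, render, max_age=600, clock=time.time):
        self._render = render      # hypothetical callable: sig_id -> bytes
        self._max_age = max_age    # seconds before a sig is re-rendered
        self._clock = clock        # injectable so the cache can be tested
        self._entries = {}         # sig_id -> (rendered_at, image_bytes)

    def get(self, sig_id):
        """Return image bytes, rendering only for new or stale sigs."""
        now = self._clock()
        entry = self._entries.get(sig_id)
        if entry is None or now - entry[0] >= self._max_age:
            entry = (now, self._render(sig_id))
            self._entries[sig_id] = entry
        return entry[1]

    def headers(self, sig_id):
        """Cache-Control header telling clients how long their copy is fresh."""
        self.get(sig_id)  # make sure an up-to-date entry exists
        rendered_at, _ = self._entries[sig_id]
        remaining = max(0, int(self._max_age - (self._clock() - rendered_at)))
        return {"Cache-Control": "max-age=%d" % remaining}
```

    Sending a max-age that matches the server-side refresh interval lets browsers and forum proxies skip requests entirely until the image can actually have changed.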

    The limiting factor of this application? Well, bandwidth it turns out.
    After a short while I was serving ~5000 different sigs. With this little hack I managed to clog up my whole 100mbit internet connection, and had my hoster telling me that at peak times all his other websites went down, and that no, 150GB transfer in two weeks is NOT GOOD.

    So I stopped the service, and I kid you not, a month after I turned it off I still had 5GB of network traffic for the http part of 404 not found…

    So what resources were required to run that? It was a single unthreaded process running Python on a moderate 1800 MHz AMD box with 2 GB of RAM.

    If you really care about raw throughput, Python doesn’t stand in your way.

  12. Tim and Garry from Soda says:

    Just coming in a bit late on this discussion… but just FYI, here are the main components of the design:

    (and by the way, I’d agree with some of the previous commentary,
    that there are loads of sites that involve much higher load than
    this one, and are much more technically involved and brilliant…
    we’re not trying to claim we invented the world. But having done
    another site in Turbogears, we did manage to turn this site around
    in a month from client briefing to launch, which we felt was good
    (maybe only for us…, but hey, “personal bests” are the best
    goals!)).

    The designed project actually involves three separate sites:

    1) the main irrepressible.info site, which allows people to sign
    the pledge (supporting the amnesty campaign) and also take a
    “fragment” banner on their own site or blog. (The fragment banner
    effectively works very like a google adwords space)

    2) an admin site where the client can add/edit the fragments of
    “repressed” sites (the ones that “someone doesn’t want you to
    read”). We were interested in making this site a catalogue of
    (politically) suppressed content, with an interface for people to
    be able to submit new pieces of suppressed content, and thus
    having this “admin” (or cms) partially open to the public.
    Unfortunately, like so many real world projects turned around to
    tight deadlines, clear client goals and slim budgets, many
    non-core features had to be (initially) pared back…

    3) a “fragments” site where the distributed “fragment banners”
    would pull their content from.

    The main concern for us in building the site, interestingly, was
    not the slashdotting, because the main irrepressible.info site,
    as pointed out, isn’t very process intensive. Rather, we were
    concerned about the fragments site: if the campaign was successful
    (meaning many people with popular sites put the fragment JavaScript
    on their webpages), then there would be many, many requests on the
    fragments server (intrinsically uncached, so that the fragments
    were always changing on the client sites). Effectively a kind of
    DDoS attack on ourselves! So we decided to make the “publishing” of
    the fragments (a necessary step from the client’s brief in any
    case, as they wanted to sign off any fragments) a scripted export
    of pages from a dynamic site (the admin site) to a largely static
    site (fragments).

    Keeping these two sites on separate virtual hosts allowed us to use
    different technologies as appropriate (TurboGears for the CMS and
    the main dynamic site, and largely pure Apache for the fragments
    site). It also enables us to move the fragments site onto a
    different process or server if needed: given that we’re using
    largely vanilla Apache for this, redeploying to a new server is
    largely a matter of copying some files and repointing the DNS.
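
    A scripted export from a dynamic site to static files could be sketched roughly as below. This is an illustration of the general idea only, not Soda's actual script: `publish_fragments`, the slug-to-HTML dictionary and the `.html` naming are assumptions.

```python
import os
import tempfile


def publish_fragments(fragments, out_dir):
    """Write each approved fragment to a static file a web server can serve.

    `fragments` is a hypothetical mapping of slug -> rendered HTML, standing
    in for whatever the admin site would pull from its database.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for slug, html in sorted(fragments.items()):
        final_path = os.path.join(out_dir, slug + ".html")
        # Write to a temporary file in the same directory, then rename it
        # into place, so the web server never serves a half-written page.
        fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(html)
        os.replace(tmp_path, final_path)
        written.append(final_path)
    return written
```

    Because the output is just a directory of files, moving it to another server is a copy, which matches the redeployment story described above.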

    Ok - it’s a fairly simple technological design underlying it (sorry, we never
    claimed it was going to be clever…)… but maybe that’s a good
    (or even clever) thing… And yes, we did some load-testing, and
    no, sorry we’re not posting any graphs at present.

  13. tazzzzz says:

    Tim and Garry: thanks for the detail! Good to see the approach you took. There’s nothing like static files when you can get away with them :)
    Florian: sorry for not responding sooner, I’ve been meaning to but have been buried in a project.

    I think you’re actually making my point for me. I wasn’t contending that raw throughput was critical. I use Python because it’s “fast enough”, and I can make things faster in optimization if I need to. All you’re saying is that for that specific need, Python was fast enough. Python is not faster than Java or C. If your app needed to do more work, you may eventually have reached the point where it wasn’t fast enough on that box. Then you have the choice: try to optimize your software in Python (a reasonable first choice), get another box (not a bad choice, for many apps), or reimplement all or part in a faster language.

    Generally, I think it’s best to not “prematurely optimize”. Write the app in the ideal way, figure out what your performance targets are, and then measure to see if you can meet those targets. If not, choose which path you want to go to meet the performance target. (I’d start with a profiler and fix the Python before anything else…)
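
    As a minimal sketch of the “start with a profiler” step, using the standard library's cProfile and pstats modules. `slow_handler` is a made-up stand-in for real request-handling code, not anything from TurboGears.

```python
import cProfile
import io
import pstats


def slow_handler(n=20000):
    """Hypothetical stand-in for a request handler worth measuring."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and show the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_handler)
    print(report)
```

    Sorting by cumulative time surfaces the call paths that dominate a request, which is usually the quickest way to decide what to fix first.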

    150GB in two weeks is actually not very much. I’m somewhat surprised that was an issue…

    (DreamHost claims you can use 1TB a month for $10/month.)
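
    To put 150GB in two weeks next to a monthly quota, a quick sketch of the conversion. It assumes decimal gigabytes and a 30-day month; the function names are invented for this example.

```python
SECONDS_PER_DAY = 86_400


def average_mbit_per_s(gigabytes, days):
    """Average transfer rate in Mbit/s for a volume moved over `days`."""
    bits = gigabytes * 1e9 * 8            # assumes 1 GB = 10**9 bytes
    return bits / (days * SECONDS_PER_DAY) / 1e6


def monthly_gigabytes(gigabytes, days, days_per_month=30):
    """Scale a transfer volume to an assumed 30-day month."""
    return gigabytes / days * days_per_month


print(round(average_mbit_per_s(150, 14), 2))
print(round(monthly_gigabytes(150, 14)))
```

    These are averages only. Traffic that averages about 1 Mbit/s can still saturate a shared link at peak times, which is consistent with what the hoster reported.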
