BUT, that's via the firehose, which doesn't include direct messages or twitterers who've locked their feeds. The number also doesn't reflect what has to happen with each of those 200.
Sure it's not impressive if all they have to do is append a 140 char message to a flat file or DB table. It's all the other manipulation that's interesting.
What kind of 'manipulation'? They're basically routing messages, with a protocol change in the middle when the recipient's designated protocol differs (e.g. IM to text message). This can't be all that computationally expensive when the system is engineered right.
Knowing how little activity Twitter actually has makes them look incompetent in light of the service outages they experienced over the last year. Either Ruby on Rails performs really poorly or the Twitter code monkeys got the system architecture wrong.
summary: "Rails and Ruby haven’t been stumbling blocks... The performance boosts associated with a “faster” language would give us a 10-20% improvement, but thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is 10000% faster than it was in January"
They're dealing with duplication, consistency, searchability, etc. across distributed storage systems and a variety of service mechanisms (more than just protocols). A user with 1,000,000 followers has every message duplicated and cached at multiple layers, up to 1,000,000 times. The message can show up on the web, API, or across any of the protocols, and it has to persist.
They're not incompetent code monkeys, they just guessed way low when they designed the architecture. It looks like the big move was from a CMS model (not at all unreasonable for a "microblogging" service) to a messaging model. In hindsight, targeting messaging to begin with would've saved them some downtime, but wouldn't have been practical in the short term.
What you say is echoed by: "Every incorrect assumption in this post seems to think that 1 tweet on twitter = 1 database row = 'So easy!'. You've left out the user fanout! One Obama Tweet = 1M database rows, someplace"
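To make the fanout point concrete, here's a minimal sketch of fan-out on write, where one tweet gets copied into every follower's timeline. The in-memory dicts and the names `follow` and `post_tweet` are made up for illustration; nothing in the thread says this is how Twitter actually stores things.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the follower graph and per-user timelines.
followers = defaultdict(set)   # author id -> set of follower ids
timelines = defaultdict(list)  # follower id -> list of (author id, text)


def follow(follower_id, author_id):
    """Record that follower_id follows author_id."""
    followers[author_id].add(follower_id)


def post_tweet(author_id, text):
    """Fan out on write: copy the tweet into every follower's timeline.

    Returns the number of timeline entries written for this one tweet.
    """
    writes = 0
    for follower_id in followers[author_id]:
        timelines[follower_id].append((author_id, text))
        writes += 1
    return writes


# One author with 1,000 followers (scaled down from the 1M in the comment).
for fid in range(1000):
    follow(fid, "obama")

rows_written = post_tweet("obama", "hello")
print(rows_written)  # 1000
```

One incoming tweet turns into as many writes as the author has followers, which is the whole point of "1 tweet != 1 database row".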
Assuming that is true, which I have trouble believing, it sounds like they need some help with normalization. I understand the tradeoffs, but it just seems crazy.
select tweets.* from tweets inner join tweeters on tweeters.id = tweets.tweeterid inner join followers on followers.followerid = loggedinuserid and followers.followee = tweets.tweeterid
Yeah, okay, I know there are performance problems with joins, but there are performance problems with 1,000,000 inserts as well. You could cache the list of followees and do an "in" statement such as: select * from tweets where tweeterid in (cached_comma_separated_list_of_followee_ids)
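For anyone who wants to try the two read-time approaches side by side, here's a rough sketch using Python's built-in sqlite3. The table and column names come from the queries above; the sample rows, the index, and the helper names `timeline_join` and `timeline_in` are assumptions for illustration, and the `tweeters` table is left out because neither query needs its columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (id INTEGER PRIMARY KEY, tweeterid INTEGER, body TEXT);
    CREATE TABLE followers (followerid INTEGER, followee INTEGER);
    CREATE INDEX tweets_by_author ON tweets (tweeterid);
""")
conn.executemany(
    "INSERT INTO tweets (tweeterid, body) VALUES (?, ?)",
    [(1, "from user 1"), (2, "from user 2"), (3, "from user 3")],
)
# Hypothetical data: user 10 follows users 1 and 3.
conn.executemany("INSERT INTO followers VALUES (?, ?)", [(10, 1), (10, 3)])


def timeline_join(user_id):
    """Build the timeline at read time by joining tweets to the follower list."""
    return conn.execute(
        "SELECT tweets.body FROM tweets "
        "INNER JOIN followers ON followers.followee = tweets.tweeterid "
        "WHERE followers.followerid = ? ORDER BY tweets.id",
        (user_id,),
    ).fetchall()


def timeline_in(followee_ids):
    """Build the timeline from an already-cached list of followee ids."""
    if not followee_ids:
        return []
    placeholders = ",".join("?" * len(followee_ids))
    return conn.execute(
        f"SELECT body FROM tweets WHERE tweeterid IN ({placeholders}) ORDER BY id",
        list(followee_ids),
    ).fetchall()


print(timeline_join(10))    # [('from user 1',), ('from user 3',)]
print(timeline_in([1, 3]))  # [('from user 1',), ('from user 3',)]
```

Both versions store each tweet once and pay the cost on every read instead of on every write. Whether that trade is better depends on the read/write ratio, which this toy example says nothing about.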
I don't see the value in Scoble's argument, if he has one. I don't see the point of comparing the number of tweets/sec with the rate of Google queries. They are just very different things in their essence, their purpose, and how they're used.
It would be more interesting (maybe) to compare the number of Twitter search queries with Google's; at least we would be talking about similar operations.
From the slope, we saw about 200 tweets/second, sometimes peaking at 250 as well. The right axis in this chart is the number of tweets per minute. You can see it rising and falling through the day.
Hold on a minute. Are we talking about peak load or average load?
Saying Twitter averages 200 tweets/second is vastly different than peaking at 200 tweets a second.
Am I just being slow?
Edit: Never mind, Robert is saying that the average is 200 tweets/second during periods of peak load. The headline here makes it seem like the maximum capacity is 200 tweets/second, which is misleading.
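A quick way to see why the peak/average distinction matters, given a chart in tweets per minute: the numbers below are invented for illustration and are not real Twitter data.

```python
# Hypothetical per-minute tweet counts sampled through a day; not real data.
tweets_per_minute = [6000, 9000, 12000, 15000, 12000, 6000]

# Average rate over the whole window, converted to tweets per second.
average_per_second = sum(tweets_per_minute) / len(tweets_per_minute) / 60

# Rate during the single busiest minute, converted to tweets per second.
peak_per_second = max(tweets_per_minute) / 60

print(round(average_per_second, 1))  # 166.7
print(peak_per_second)               # 250.0
```

Same data, two quite different headline numbers, so it matters which one "200 tweets/second" refers to.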
I think what the data shows is that the average sort of wobbles around a bit. Check out the update to my post for a new image link to see it varying throughout the day.
I've always assumed that most people don't get every tweet SMS'd to them. So most of the time, a tweet isn't sent to anybody, it's just returned when people request the tweets that are relevant to them.
The MySQL load is much higher than 200 Hz. Depending on how they do it, what you want to count as dependent load, and how you want to measure it all, you have a scalar for the average number of followers to update per tweet, work to update indices, etc.
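As a back-of-envelope sketch of that scalar: only the 200 tweets/second comes from the thread. The average follower count and the index updates per row are placeholder guesses, and `estimated_db_writes_per_second` is a made-up helper, so treat the result as an illustration of the multiplication, not a measurement.

```python
def estimated_db_writes_per_second(tweets_per_second, avg_followers, index_updates_per_row):
    """Rough write load if every tweet is copied once per follower.

    Each copied row costs one row write plus index_updates_per_row index writes.
    """
    writes_per_row = 1 + index_updates_per_row
    return tweets_per_second * avg_followers * writes_per_row


TWEETS_PER_SECOND = 200        # figure quoted in the thread
ASSUMED_AVG_FOLLOWERS = 100    # hypothetical, not a real statistic
ASSUMED_INDEX_UPDATES = 2      # hypothetical, depends entirely on the schema

load = estimated_db_writes_per_second(
    TWEETS_PER_SECOND, ASSUMED_AVG_FOLLOWERS, ASSUMED_INDEX_UPDATES
)
print(load)  # 60000
```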