Twitter provide good APIs that allow users to access their data. There's a whole bunch of interesting applications built on top of this data; most of them are gimmicks, but a few have serious potential. One of the ways of accessing Twitter data is via the Twitter Firehose, which is supposed to be the entire stream of "tweets" (a tweet is a message on Twitter). Unfortunately, the firehose is still not open to all users, which is a real pity.
Bill de hÓra has a very interesting post on this topic. There are two interesting aspects to Bill's post:
1. The story behind the firehose and some of the reactions to the fact that Twitter still haven’t given open access to this data.
2. The speculation that the reason for the delay is not so much a business reason but a technical limitation on Twitter’s ability to scale.
If the reason for the delay is that Twitter is protecting the data so they can exploit its value themselves, I wouldn't be too surprised. To be fair, the data is arguably the second most valuable thing they have; their most valuable asset is the fact that they have everyone's attention.
Bill seems to think the issues are technical, and Roy Fielding suggests as much in his post on Inverse Economies of Scale with PubSub systems. Roy's solution to people demanding so much data, or so many events, that they cripple your service is to charge for the service. That may well be what Twitter announces next year.

The technical problems that Roy and Bill discuss are caused by Event and/or Data Gluttons: users or machines that oversubscribe to event or data services. Oversubscribing is human nature, and it presents developers with a challenge when building distributed software. Luckily, today's developers have a lot of useful patterns and tools for solving these problems. At WestGlobal we face the challenge of Event Gluttons, and we handle it with a flexible architecture that can be optimised for each situation. In general, our approach is a distributed Event Processing Network (EPN) with flexible deployment options. As such, Vantify deployments are layered as follows:
1. The first layer of consumers is our Event Processors, which sit closest to the action. These event gluttons are deployed on a network and want to know everything that is happening in your business. They process events as they happen, using CEP and other techniques. The good news for the network is that you don't need many of these event processors, so oversubscription isn't a problem. They can be distributed almost anywhere on your network, and if there is a high volume of data we can limit the chatter by deploying the processors closer to the event sources or by splitting up the event streams. Although these are very greedy consumers, we can satisfy their appetite.
2. The other place where we need to handle event gluttons is the top layer of our product, where real users want to view reports and dashboards about their business. Here we do need to worry about supporting a large number of users. However, most of these users are using a browser and polling our data feeds, which scales very well because each poll is an independent request that the server can answer (or cache) without tracking or pushing to every client. A limited number of users want subscriptions so that they can be notified. This is where we need to worry about scale, but it is typically a small group.
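To make the two layers a little more concrete, here is a minimal Python sketch that assumes a single process standing in for a real network. The names `Event`, `EventProcessor`, `route` and `Dashboard` are hypothetical and are not Vantify's API. Splitting the event stream is illustrated with a stable hash of the event source, which is just one common partitioning choice, and dashboard users poll a pre-aggregated snapshot instead of subscribing to raw events.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical names for illustration only; not the Vantify API.


@dataclass(frozen=True)
class Event:
    source: str   # hypothetical: which part of the business emitted the event
    kind: str     # hypothetical: e.g. "order" or "refund"
    amount: float


class EventProcessor:
    """Layer 1: a greedy consumer that sees every event in its partition."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    def process(self, event: Event) -> None:
        # Aggregate as events arrive, so readers never touch the raw stream.
        self.totals[event.kind] += event.amount


def route(event: Event, processors: List[EventProcessor]) -> EventProcessor:
    # Split the stream: a stable hash of the source picks one processor,
    # so each processor receives only a fraction of the total volume.
    index = zlib.crc32(event.source.encode("utf-8")) % len(processors)
    return processors[index]


class Dashboard:
    """Layer 2: a read-only view that browser clients poll."""

    def __init__(self, processors: List[EventProcessor]) -> None:
        self.processors = processors

    def poll(self) -> Dict[str, float]:
        # Each poll merges the pre-aggregated totals; its cost depends on the
        # number of processors and event kinds, not on how many events arrived.
        merged: Dict[str, float] = defaultdict(float)
        for processor in self.processors:
            for kind, total in processor.totals.items():
                merged[kind] += total
        return dict(sorted(merged.items()))


if __name__ == "__main__":
    processors = [EventProcessor() for _ in range(3)]
    events = [
        Event("shop-eu", "order", 20.0),
        Event("shop-us", "order", 35.0),
        Event("shop-eu", "refund", 5.0),
        Event("warehouse", "order", 10.0),
    ]
    for event in events:
        route(event, processors).process(event)

    dashboard = Dashboard(processors)
    # Any number of users can poll the same snapshot.
    print(dashboard.poll())  # {'order': 65.0, 'refund': 5.0}
```

The point of the sketch is where the load lands: more event volume is absorbed by adding processors to the partition, while more dashboard users only add cheap reads of the aggregates.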
So I think the world is big enough to accommodate Event Gluttons. Thanks to people like Roy and Bill, developers have a lot of tools and techniques available for handling scale. The only good reason for restricting data or events should be a business reason. I certainly hope Twitter open up their data, and like many others I'll be watching with interest when they reveal all.