#LeadersDebate demo retrospective

30 Apr 2010

Following on from the first #LeadersDebate demo blog post, we thought it would be good to look back on how we built the demo, how it performed on the two debate evenings, and what worked and what did not.

We'd hoped to be able to do more to improve the performance of the #LeadersDebate demo between the second and final debate, but time constraints meant we only managed to make one improvement: removing retweets (RTs) from the demo. This meant that the tweet mention count better reflected unique tweets, and it reduced the amount of data being pushed through the Kwwika service and, more importantly, to each web browser viewing the demo.
The latter is where the demo really fell down on each of the debate evenings. During last night's one-and-a-half-hour event, around 120,000 tweets were made containing the #LeadersDebate hashtag. On average, that is over 1,300 tweets per minute, or more than 20 tweets per second.
How much data?

The amount of data we're talking about probably doesn't sound like too much for a web page to handle, until you drill into the contents of each message and look at what the page is doing with it.
Only a small amount of the tweet data being sent to the browser was actually being displayed. Here's a sample message:
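Something roughly along these lines; the exact payload shape is an assumption, and apart from ScreenName, Text, UserProfileImageUrl, TotalTweets and the reply-based fields, the field names and all values are illustrative:

```javascript
// Hypothetical shape of a single tweet update as pushed to the browser.
var sampleUpdate = {
  ScreenName: "example_user",
  Text: "Strong answer from Clegg there #LeadersDebate",
  UserProfileImageUrl: "http://example.com/avatar.png",
  TotalTweets: 136690,
  // Reply-based fields, only sometimes needed by the demo:
  InReplyToScreenName: null,
  InReplyToStatusId: null,
  // Everything else in the payload went unused by the page:
  CreatedAt: "Thu, 29 Apr 2010 20:15:00 +0000",
  Source: "web"
};
```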

In the case of the #LeadersDebate demo we only really needed ScreenName, Text, UserProfileImageUrl, TotalTweets and sometimes the reply-based fields. If we had only sent through the data we actually needed, it would have made a small but useful dent in the amount of data the web browser had to process.
What did we do with the data?


Once we had received the data we did a few calculations. We worked out the total number of tweets received for all the leaders combined. For example, TotalTweets could be 136,690, with BrownTweets at 31,020, CameronTweets at 38,313 and CleggTweets at 28,704, giving a total of 98,037 leader tweets (31,020 + 38,313 + 28,704).

On top of this we also converted these figures into percentages: Brown on 32% (31,020 / 98,037), Cameron on 39% (38,313 / 98,037) and Clegg on 29% (28,704 / 98,037).

Although these are simple, small calculations, they were performed on every tweet update: 20 times a second.
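The per-update arithmetic amounted to something like this sketch; the function name and field names are ours, not the demo's actual code:

```javascript
// Recompute the leader total and per-leader percentages.
// At 20+ updates per second, even this small amount of work adds up.
function calculateStats(counts) {
  var leaderTotal = counts.BrownTweets + counts.CameronTweets + counts.CleggTweets;
  return {
    leaderTotal: leaderTotal,
    brownPercent: Math.round((counts.BrownTweets / leaderTotal) * 100),
    cameronPercent: Math.round((counts.CameronTweets / leaderTotal) * 100),
    cleggPercent: Math.round((counts.CleggTweets / leaderTotal) * 100)
  };
}

var stats = calculateStats({ BrownTweets: 31020, CameronTweets: 38313, CleggTweets: 28704 });
// leaderTotal: 98037, brownPercent: 32, cameronPercent: 39, cleggPercent: 29
```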
Since we were receiving all #LeadersDebate updates, we wanted to make sure that each tweet was analysed so that it went into the appropriate leader's column. So, as each tweet came in we performed a textual match on the tweet message (mUpdate.Text). This was done once per column for each tweet, and there were multiple filter terms to check against; for David Cameron, for example, we filtered on David, Cameron, DC, Dave, Tories, Tory, Conservatives and a few others.
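The matching could be sketched like this; the Cameron terms come from the list above, while the Brown and Clegg terms are assumed equivalents we've made up along the same lines:

```javascript
// Illustrative per-leader keyword filters. Plain substring search keeps
// this close to the "textual match" described above; a production matcher
// would probably want word-boundary matching to avoid false positives.
var filters = {
  brown:   ["gordon", "brown", "labour"],
  cameron: ["david", "cameron", "dc", "dave", "tories", "tory", "conservatives"],
  clegg:   ["nick", "clegg", "libdem", "liberal"]
};

// Return the list of leader columns a tweet should appear in.
function matchLeaders(text) {
  var lower = text.toLowerCase();
  var matched = [];
  for (var leader in filters) {
    for (var i = 0; i < filters[leader].length; i++) {
      if (lower.indexOf(filters[leader][i]) !== -1) {
        matched.push(leader);
        break; // one matching term per column is enough
      }
    }
  }
  return matched;
}
```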
We also added the ability to live-filter the far left column, using some jQuery selectors. If you tried this while the debate was happening, it is more than likely that your browser crashed.
The main thing to learn here is that it's much better to do intensive work like this in the Kwwika.NET application that consumes all the data in real time from the Twitter Streaming API. We were publishing all #LeadersDebate tweets on the /KWWIKA/TWITTER/HASHTAGS/LEADERSDEBATE topic, when instead we should have published to different topics so that the web browser client could concentrate on just displaying the data. A set of topics like this might have worked:

  • Count and percentage data: /KWWIKA/TWITTER/HASHTAGS/LEADERSDEBATE/COUNTS

  • Tweets mentioning Gordon Brown: /KWWIKA/TWITTER/HASHTAGS/LEADERSDEBATE/BROWN

  • Tweets mentioning David Cameron: /KWWIKA/TWITTER/HASHTAGS/LEADERSDEBATE/CAMERON

  • Tweets mentioning Nick Clegg: /KWWIKA/TWITTER/HASHTAGS/LEADERSDEBATE/CLEGG

  • Tweets that didn't mention any leader but still contained #LeadersDebate: /KWWIKA/TWITTER/HASHTAGS/LEADERSDEBATE/OTHER

Using this approach, all count and percentage calculations could have been done within the Kwwika.NET application, which was running on a web server, and pushed to the /COUNTS topic; all tweets about specific or multiple leaders could have gone to the appropriate leader topics, /BROWN, /CAMERON and /CLEGG, and all others to /OTHER. The Brown column would display all /BROWN tweets, the Cameron column all /CAMERON tweets, the Clegg column all /CLEGG tweets, and the all column would combine /OTHER, /BROWN, /CAMERON and /CLEGG.
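The routing described above could be sketched as follows. The real application was written in .NET, so this JavaScript is illustrative only, and publish() is a stand-in callback rather than the Kwwika API's real signature:

```javascript
var BASE = "/KWWIKA/TWITTER/HASHTAGS/LEADERSDEBATE";
var counts = { brown: 0, cameron: 0, clegg: 0, total: 0 };

// Analyse each tweet once on the server, route it to per-leader topics,
// and push aggregates separately so the browser never has to recount.
function routeTweet(tweet, leaders, publish) {
  counts.total++;
  if (leaders.length === 0) {
    publish(BASE + "/OTHER", tweet);
  } else {
    for (var i = 0; i < leaders.length; i++) {
      counts[leaders[i]]++;
      publish(BASE + "/" + leaders[i].toUpperCase(), tweet);
    }
  }
  // Counts go out on their own topic; the client just displays them.
  publish(BASE + "/COUNTS", counts);
}
```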
It's potentially a bit more complex this way, since the Kwwika.NET application has to analyse tweets and publish them to different topics, but it certainly takes the strain off the client.
How was the data displayed?

As mentioned in the first #LeadersDebate blog post, we decided on four columns: one for each leader and one to display all the tweets. The columns were built from a Kwwika Twitter widget that we'd previously created, enhanced to make it a bit more configurable. You can see the widget standalone on the About Kwwika page. We also decided to use two Google interactive charts to provide a visual representation of the data as it arrived in real time.
One of the enhancements we added to the Kwwika Twitter widget was some jQuery effects to let new tweets slide down into view. At low update rates this is a really nice effect; at higher update rates your browser is unlikely to cope.
The charting components also look really nice at lower update rates, but as soon as the rate increases they don't cope either.

Yep, we admit it. There was a bug in our code. If you viewed the demo and things appeared to be working fine but then you stopped getting updates, you were probably bitten by it. We'll fix this in our core library very soon.

What did we prove?
We've proven that our Kwwika technology is clearly the bee's knees. We had absolutely no problems within the Kwwika service or within our Kwwika.NET application. We also think the JavaScript Kwwika API held up well, with the exception of the bug we've found.
The main thing for us is that it was very clear during the 20+ tweets per second periods that there was no way any human being could actually read the tweets. The statistics were very interesting, so maybe the demo should have concentrated on those rather than the individual tweets. If we had done this, and done the tweet analysis in the Kwwika.NET application running on a server, then the web browser would have had absolutely no problem dealing with the update rate.
It's potentially a little unfair to suggest a best browser, since our bug meant using some other browsers was pretty tricky. However, for us, Chrome seemed to stand up to the update rate and work around the bug far better than Firefox or Internet Explorer.
By considering the following we would certainly be able to improve the #LeadersDebate performance and user experience:

  1. Do as little calculation and analysis in the client as possible when you are dealing with high-volume data.
  2. Organise the data so that the client doesn't have to.
  3. Only send the data that is needed to the web browser. Don't send unused fields.
  4. Consider the update rate. If the target consumer is a human being will they be able to read the data?
  5. At high data rates, visual effects and charting components don't cope well in a web browser. For charting, consider batching the data or only making an update at set intervals.
  6. We could certainly dig even deeper into the causes of the browser slow-down by using tools to analyse bottlenecks and potential memory leaks within the browser.
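Point 5 can be handled by buffering the latest values as they stream in and only touching the chart on a timer. A minimal sketch, where makeThrottledRenderer is our own name rather than part of any library:

```javascript
// Record the newest stats cheaply on every update; only redraw on flush().
function makeThrottledRenderer(redraw) {
  var latest = null;
  var dirty = false;
  return {
    // Cheap: call this on every incoming update.
    update: function (stats) { latest = stats; dirty = true; },
    // Expensive: drive this from a timer rather than the update stream,
    // e.g. setInterval(renderer.flush, 1000).
    flush: function () {
      if (dirty) { dirty = false; redraw(latest); }
    }
  };
}
```

However fast tweets arrive, the chart is redrawn at most once per interval, which is about as often as a human viewer can take in the change anyway.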

We'll give you the code

We built this as a demo, not a product. So, if you would like to build something similar and want a starting point, we're happy to have a chat and give you the code. Get in touch.

