Between 5am on Thursday 14 October 2010 and 5am on Friday 15 October, Greater Manchester Police dealt with 3,205 incidents and posted details of every single one on Twitter. At the Hacks and Hackers Hack Day Manchester (#hhhmcr), Enrico Zini and I developed a web-based search system that makes it easier to view and make sense of these thousands of tweets.
The prototype has a simple and clean user interface that enables easy and fast search.
Type in a search word and the system gives you both quantitative (statistics, tag clouds) and qualitative (the original tweets) results. For example, if you search for “theft or thefts”, you can see that there were 369 results over 3257 documents, or 11.33% of the total incidents reported. The tag clouds tell you which items were stolen most often (e.g., car, bike, handbag) and where they were stolen.
And here, can you imagine animals (cats, dogs, horses) being one of the main nuisances the police faced?
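To make the idea concrete, here is a minimal sketch of how an "or" search with a percentage and a tag cloud could work. It is not the code we wrote at the hack day: the sample tweets, the stopword list and the names `tokenize` and `search` are all assumptions invented for illustration.

```python
import re
from collections import Counter

# Hypothetical sample tweets, invented for illustration (not real GMP data).
TWEETS = [
    "Call 101 report of theft of bike in Didsbury",
    "Call 102 report of stray dog on road in Bolton",
    "Call 103 thefts from car in Bolton",
    "Call 104 youths throwing stones at bus in Wigan",
]

# Assumed stopword list: words too common to be useful in a tag cloud.
STOPWORDS = {"call", "report", "of", "in", "on", "from", "at", "the", "a"}


def tokenize(text):
    """Lower-case the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def search(tweets, query):
    """Match tweets containing any term of a query like 'theft or thefts'.

    Returns the matching tweets, their share of all tweets (in percent),
    and a word count over the matches that can feed a tag cloud.
    """
    terms = {t for t in tokenize(query) if t != "or"}
    matches = [t for t in tweets if terms & set(tokenize(t))]
    share = 100.0 * len(matches) / len(tweets) if tweets else 0.0
    cloud = Counter(
        word
        for tweet in matches
        for word in tokenize(tweet)
        if word not in STOPWORDS and word not in terms
    )
    return matches, share, cloud


matches, share, cloud = search(TWEETS, "theft or thefts")
print(len(matches), f"{share:.2f}%")  # 2 50.00%
print(cloud.most_common(3))
```

The percentage is simply the number of matching tweets divided by the number of documents, so the figure quoted above comes out as 369 / 3257 ≈ 11.33%.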
The data were also geo-coded. Both manually produced codes and Wikipedia data were used to infer geographical references in the tweets. For example, when searching for Didsbury, you can see all the incidents reported in Didsbury, get their percentage of the total number of incidents, and see what those incidents were about (see Fig. 4 below).
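One simple way to infer places from tweet text is a gazetteer lookup: keep a table of known place names and check which ones appear in each tweet. The sketch below illustrates that idea only; the `GAZETTEER` table, its approximate coordinates, the sample tweets and the function names are assumptions for illustration, not the codes or Wikipedia data we actually used.

```python
import re
from collections import Counter

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
# Coordinates are rough values for illustration only.
GAZETTEER = {
    "Didsbury": (53.417, -2.231),
    "Bolton": (53.578, -2.429),
    "Wigan": (53.545, -2.632),
}

# Hypothetical sample tweets, invented for illustration (not real GMP data).
SAMPLE = [
    "Report of theft of bike in Didsbury",
    "Stray dog on road in Bolton",
    "Thefts from car in Bolton",
    "Noise complaint, no location given",
]


def geocode(text, gazetteer=GAZETTEER):
    """Return (place, coordinates) for every gazetteer name found in text."""
    lowered = text.lower()
    found = []
    for place, coords in gazetteer.items():
        # Whole-word match, so a name is not found inside a longer word.
        if re.search(r"\b" + re.escape(place.lower()) + r"\b", lowered):
            found.append((place, coords))
    return found


def incidents_by_place(tweets):
    """Count how many tweets mention each known place."""
    counts = Counter()
    for tweet in tweets:
        for place, _coords in geocode(tweet):
            counts[place] += 1
    return counts


counts = incidents_by_place(SAMPLE)
for place, n in counts.most_common():
    print(place, n, f"{100.0 * n / len(SAMPLE):.1f}%")
```

A plain name lookup like this misses misspellings and ambiguous names, which is one reason to combine it with hand-made codes.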
The geo-located tweets enable comparative studies of incidents reported in different areas. For example, searching for Bolton and Didsbury in turn immediately shows that Bolton had more incidents than Didsbury (see the search on Bolton in Fig. 5 below). If information on residents’ lifestyles and more administrative data were available, we could explore further questions relating neighbourhoods, lifestyles and incidents.
The system can also be used to check claims about current affairs. Anti-social behaviour among teenagers is said to be increasingly problematic. But is it really that bad? According to the GMP’s tweets (search “youth or youths or teenager or teenagers”), reports of youths’ anti-social behaviour were indeed high: there were 497 results over 3257 documents, a good 15.26%. The tag cloud at the top of the search results page also shows the kinds of incidents teenagers were involved in, such as throwing stones or fireworks (see Fig. 6).
Apart from the statistics and geo-locations, the data can also be analysed by social category (such as occupations: “taxi drivers” or “landlords”) and social place (such as restaurants, pubs, hotels and toilets). Looking up the category “social places”, you can see that 7 incidents took place in hotels. The tag clouds showed that the hotels were in Manchester City Centre, Wigan and Ashton-under-Lyne, and that the incidents were related to drunk, women, abuse, assault and theft.
In the social groups category, the data show that men / males (search for “man or men or boy or boys or male or males”, 16.64%) were more than twice as likely to be involved in trouble as women / females (search for “woman or women or girls or female or females”, 7.09%).
These tweets provided snapshots of 24 hours of police work. The number of tweets is large, which shows how busy the police were. The system is very extensible: if the GMP tweets again, the software can easily be applied to analyse vast amounts of data. Its easy-to-use user interface can attract residents of all age groups in Greater Manchester.
Unfortunately we do not have enough resources to maintain this search site, so it was taken offline after we won first prize at the hhhmcr. However, we have released the source code under the GPL licence and you can find the instructions and source code here. With this, you’ll be able to duplicate the search system within a very short period of time. And since it is free software, you’ll also be able to fix bugs, modify and improve the code, and distribute it further. Have fun.