Any self-respecting Internet researcher will already be aware that the Association of Internet Researchers (AoIR) is the place to keep track of what’s cutting edge in the field. On the AoIR mailing-list, there’s been an interesting discussion over the last few days about the available tools for tracking, capturing, and analysing Twitter data – and since nobody had yet mentioned yourTwapperkeeper as a useful solution for many standard tasks, I’ve just posted an overview of how we’ve been approaching such research.
As I haven’t had much time recently to post many updates on this Website about our methodological work, I thought I’d cross-post my message to the mailing-list here. For those who’ve come to Mapping Online Publics only recently, perhaps it also provides a useful summary of where we’re at so far.
Erica Ciszek asked:
> I was wondering if anyone can suggest particular tools for aggregating and
> analyzing Twitter content.
Maybe I’m old-school on this, but I’m surprised no-one’s mentioned yourTwapperkeeper yet – in my experience, it’s very straightforward to set up (all you need is a standard LAMP server to run it on), and fine for most standard Twitter capture tasks (e.g. tracking hashtags, keywords, specific users, etc.). It’s open source and available here:
We’ve made some modifications to more easily export datasets in CSV/TSV format – see details here:
Personally, I don’t trust most out-of-the-box Twitter analytics tools, and prefer to roll my own – for processing CSV/TSV datasets containing Twapperkeeper-format data, I’ve been using the scriptable command-line tool Gawk with great success. A collection of Gawk scripts for standard Twapperkeeper data processing tasks is available under a Creative Commons licence here:
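To give a flavour of this approach, here’s a minimal sketch of the kind of one-off Gawk processing I mean – counting tweets per sender in a TSV export. The column layout here is purely hypothetical (real Twapperkeeper exports have more fields), so adjust the `col` variable to match your own dataset; I’m using `awk` in the example, but gawk behaves identically:

```shell
# Hypothetical three-column TSV export: tweet id, tweet text, sender.
printf 'id1\ttext one\talice\nid2\ttext two\tbob\nid3\ttext three\talice\n' > tweets.tsv

# Tally tweets per user; -v col=3 names the column holding the sender.
awk -F '\t' -v col=3 '
  { count[$col]++ }                      # one entry per distinct sender
  END {
    for (user in count)
      printf "%s\t%d\n", user, count[user]
  }
' tweets.tsv | sort
# → alice	2
#   bob	1
```

The same pattern – accumulate in an associative array, print in the `END` block – covers a surprising share of routine Twitter-dataset questions.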
Additionally, my ‘Swiss army knife’ Gawk script for extracting activity metrics from a Twapperkeeper dataset is here:
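By way of illustration (not the script itself), a stripped-down version of one such activity metric – splitting a dataset into original tweets, @replies, and old-style retweets by pattern-matching the tweet text – might look like this. Column positions are again assumptions to be adjusted for your export:

```shell
# Hypothetical TSV: tweet id, tweet text, sender (text assumed in column 2).
printf 'id1\tjust a tweet\talice\nid2\t@alice hi\tbob\nid3\tRT @bob: wow\tcarol\n' > tweets.tsv

# Classify each tweet by the form of its text, then report totals.
awk -F '\t' -v col=2 '
  $col ~ /^RT @/ { rt++; next }          # old-style retweet
  $col ~ /^@/    { reply++; next }       # @reply
                 { original++ }          # everything else
  END { printf "original=%d replies=%d retweets=%d\n", original, reply, rt }
' tweets.tsv
# → original=1 replies=1 retweets=1
```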
The question of developing standard, case-independent metrics for the description of Twitter activity patterns is something Stefan Stieglitz and I are taking up in two forthcoming papers (happy to share drafts – email me off-list). The keynote which Jean Burgess and I presented at the recent Conference on Science and the Internet foreshadows some of this discussion, though:
Axel Bruns and Jean Burgess. “Notes towards the Scientific Study of Public Communication on Twitter.” Keynote presented at the Conference on Science and the Internet, Düsseldorf, 4 Aug. 2012. (The slides and video of the presentation are here: http://snurb.info/node/1678.)
Detailed notes on how we use these scripts to process Twitter data, and additional processing tools, are also on our Website – see http://mappingonlinepublics.net/category/twitter/ for more.
For network visualisation, I recommend the open source software Gephi. My article in Information, Communication & Society describes how I’ve used yourTwapperkeeper, Gawk and Gephi to create dynamic visualisations of Twitter conversation networks:
Axel Bruns. “How Long Is a Tweet? Mapping Dynamic Conversation Networks on Twitter Using Gawk and Gephi.” Information, Communication & Society, 17 Nov. 2011. http://dx.doi.org/10.1080/1369118X.2011.635214
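The bridge between the two tools is simply an edge list: Gephi happily imports a CSV of Source,Target pairs. As a hedged sketch of that step (the column layout is again a hypothetical stand-in for a real export), this extracts every @mention from the tweet text and emits one edge per mention:

```shell
# Hypothetical TSV: tweet id, tweet text, sender.
printf 'id1\t@bob hello\talice\nid2\tRT @alice: hi\tbob\n' > tweets.tsv

# Emit a Source,Target edge for each @mention found in the tweet text;
# match() sets RSTART/RLENGTH, so we can walk through repeated mentions.
awk -F '\t' '
  BEGIN { print "Source,Target" }
  {
    text = $2
    while (match(text, /@[A-Za-z0-9_]+/)) {
      target = substr(text, RSTART + 1, RLENGTH - 1)
      print $3 "," target
      text = substr(text, RSTART + RLENGTH)
    }
  }
' tweets.tsv
# → Source,Target
#   alice,bob
#   bob,alice
```

Redirect the output to a .csv file and import it into Gephi’s data laboratory as an edge table; adding a timestamp column is what makes the dynamic, time-sliced visualisations possible.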
For more sophisticated, ‘big data’ research (i.e. upwards of a few million tweets per dataset), the yourTwapperkeeper approach is less useful (the LAMP framework just isn’t built for big data), and you’ll probably need to build your own customised solution. Eugene Liang and I discuss the pros and cons of both approaches in a recent article in First Monday (while we frame this in a crisis communication context, the discussion applies well beyond this):
Axel Bruns and Yuxian Eugene Liang. “Tools and Methods for Capturing Twitter Data during Natural Disasters.” First Monday 17.4 (2012).
Hope that helps.