Our work with Twitter data builds on a number of tools. Many posts on the blog describe how we’re using them. Here are our key tools:
Data Gathering
yourTwapperkeeper – an open source platform building on the popular Twapperkeeper Web service. Both capture tweets containing particular hashtags or keywords.
We have made further extensions and modifications to the yourTwapperkeeper (yTK) platform to ensure compatibility between Twapperkeeper (TK) and yTK datasets and to enable data export in comma- and tab-separated formats. These modifications are described here; the modified yTK PHP scripts are available here:
yTK-modifications-v1.0.zip (9.4 kB) – v1.0, released 20 June 2011
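As a rough illustration of what comma- and tab-separated export involves, here is a minimal Python sketch. It is not the yTK PHP code: the sample rows, the column names `from_user` and `text`, and the helper functions `export` and `parse` are all assumptions made for this example. It shows that when fields are quoted on export, tweets containing commas, tabs or quotation marks survive a round trip in either format.

```python
import csv
import io

# Hypothetical sample rows; the column names are assumptions for illustration.
ROWS = [
    {"from_user": "alice", "text": "Commas, semi-colons; and | pipes all occur in tweets"},
    {"from_user": "bob", "text": 'He said "hello"\tand left'},
]

FIELDS = ["from_user", "text"]


def export(rows, delimiter):
    """Serialise rows as delimited text, quoting any field that needs it."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def parse(data, delimiter):
    """Read delimited text back into a list of dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(data), delimiter=delimiter)]


# Round-trip the sample through both formats.
for delim in (",", "\t"):
    assert parse(export(ROWS, delim), delim) == ROWS
```

Note that quoted fields are only safe if the tool reading the file understands the same quoting rules; a tool that simply splits each line on the delimiter character will still break on such tweets.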
Data Processing
Gawk – an open source, multiplatform, programmable command-line tool for processing CSV/TSV documents; essential for manipulating the datasets produced by our gathering tools.
We have developed a number of Gawk scripts for processing Twitter datasets in Twapperkeeper format. Many of the individual scripts are discussed on the blog; the current collection can be downloaded here:
Gawk-Twitter-scripts-v1.0.zip (23.7 kB) – v1.0, released 22 June 2011
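To give a sense of the kind of processing involved, the following is a minimal sketch in Python rather than Gawk, and it is not taken from the script collection above. It assumes a tab-separated export with a header row containing `from_user` and `text` columns (an assumption about the dataset layout); the function name `count_senders_and_mentions` and the sample data are invented for this example. It counts tweets per sender and @mentions per recipient.

```python
import csv
import io
import re
from collections import Counter

# Matches @username mentions; a simplification of Twitter's real username rules.
MENTION = re.compile(r"@(\w+)")


def count_senders_and_mentions(tsv_file):
    """Count tweets per sender and @mentions per mentioned user.

    Assumes a tab-separated file whose header row includes the
    columns 'from_user' and 'text' (hypothetical layout).
    """
    senders, mentions = Counter(), Counter()
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        senders[row["from_user"].lower()] += 1
        for name in MENTION.findall(row["text"]):
            mentions[name.lower()] += 1
    return senders, mentions


# Invented sample data standing in for a real export file.
SAMPLE = (
    "from_user\ttext\n"
    "alice\tHello @Bob and @carol\n"
    "bob\t@alice hi\n"
    "alice\tno mentions here\n"
)

senders, mentions = count_senders_and_mentions(io.StringIO(SAMPLE))
print(senders.most_common(3))
print(mentions.most_common(3))
```

For a real dataset, the `io.StringIO(SAMPLE)` argument would be replaced with an open file handle, for example `open(path, newline="", encoding="utf-8")`.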
Textual Analysis
Leximancer – commercial, multiplatform textual analysis tool: extracts key concepts from large corpora of text, examines and visualises concept co-occurrence
WordStat – commercial, PC-only textual analysis tool; part of a larger text statistics package: similar to, but more powerful than, Leximancer; generates concept co-occurrence data that can be exported in standard formats for subsequent visualisation
Visualisation
Gephi – open source, multiplatform network visualisation tool: wide range of visualisation options, extensible plugin system, exports maps as PDF or SVG
Wordle – simple word cloud visualisation tool
Seadragon – handy tool for embedding large-scale images on a Web page; handles images, PDFs, SVGs, even URLs for Web pages…
Just a hint – you might be aware of this – but using the pipe sign (|) as the delimiter when exporting from TK might be the best choice, since commas, semicolons, etc. might be used in the tweets themselves…
I am playing with early data from a social network experiment in Wikipedia and exploring the dynamic network possibilities of Gephi. Would you mind sharing the GEXF dataset you used for the Twitter visualization?