Tools

TWITTER

Our work with Twitter data builds on a number of tools. Many posts on the blog describe how we’re using them. Here are our key tools:

Data Gathering

yourTwapperkeeper – an open source platform building on the popular Twapperkeeper Web service. Both capture tweets containing particular hashtags or keywords.
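How yourTwapperkeeper decides which tweets to capture depends on the Twitter APIs it queries, not on local code like this. The Python sketch below is only a hypothetical local filter that shows the difference between tracking a hashtag and tracking a keyword. The function name and its matching rules (hashtags as whole tokens, keywords as whole words, case-insensitive) are assumptions made for illustration.

```python
import re

def matches_archive(tweet_text, hashtags=(), keywords=()):
    """Hypothetical filter: True if a tweet contains one of the tracked
    hashtags or keywords (case-insensitive)."""
    text = tweet_text.lower()
    # Hashtags are compared as whole tokens, so "#election" does not
    # match "#election2010".
    tags_in_tweet = {t.lower() for t in re.findall(r"#\w+", tweet_text)}
    if any(("#" + h.lstrip("#")).lower() in tags_in_tweet for h in hashtags):
        return True
    # Keywords are matched as whole words anywhere in the tweet text.
    return any(re.search(r"\b" + re.escape(k.lower()) + r"\b", text)
               for k in keywords)
```

For example, `matches_archive("Voting today #Election", hashtags=("election",))` is true, while `matches_archive("Open the floodgates", keywords=("flood",))` is false because "flood" does not appear as a separate word.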

We have made further extensions and modifications to the yourTwapperkeeper platform to ensure compatibility between Twapperkeeper (TK) and yourTwapperkeeper (yTK) datasets, and to enable data export in comma- and tab-separated formats. These modifications are described here; the modified yTK PHP scripts are available here:

yTK-modifications-v1.0.zip (9.4 kB) – v1.0, released 20 June 2011
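The modified scripts themselves are PHP and are not reproduced here. As a rough illustration of why the export format matters, here is a minimal Python sketch that assumes a hypothetical four-column subset of Twapperkeeper-style fields (`id`, `from_user`, `text`, `time`); real exports contain more columns. Because tweet text can itself contain commas and tabs, fields are quoted where necessary rather than simply joined with the delimiter.

```python
import csv
import io

# Hypothetical subset of Twapperkeeper-style columns (assumption).
FIELDS = ["id", "from_user", "text", "time"]

def export_tweets(tweets, delimiter=","):
    """Serialise tweet dicts as CSV (delimiter=",") or TSV (delimiter="\t").
    Fields containing the delimiter, quotes or newlines are quoted, so
    commas or tabs inside tweet text cannot shift the columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter=delimiter,
                            lineterminator="\n")
    writer.writeheader()
    for tweet in tweets:
        writer.writerow({k: tweet.get(k, "") for k in FIELDS})
    return buf.getvalue()

def read_tweets(data, delimiter=","):
    """Parse an export produced by export_tweets back into dicts."""
    return list(csv.DictReader(io.StringIO(data), delimiter=delimiter))
```

Passing `delimiter="\t"` produces the tab-separated variant. In both cases `read_tweets` recovers the original fields, whereas naively splitting a line on commas would break a tweet such as "Rain, hail, and wind" into several columns.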

Data Processing

Gawk (GNU Awk) – an open source, multiplatform, programmable command-line tool for processing text files such as CSV/TSV documents; essential for manipulating the datasets produced by our gathering tools.

We have developed a number of Gawk scripts for processing Twitter datasets in Twapperkeeper format. Many of the individual scripts are discussed on the blog; the current collection can be downloaded here:

Gawk-Twitter-scripts-v1.0.zip (23.7 kB) – v1.0, released 22 June 2011
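Our scripts are written in Gawk. For readers working in other environments, the following Python sketch shows the kind of processing such scripts perform on a tab-separated archive: counting how many tweets each user sent and how many @mentions each user received. It is an illustrative analogue, not a port of any script in the collection. The `from_user` and `text` column names are assumptions, and the `@username` pattern only approximates Twitter's username rules.

```python
import csv
import io
import re
from collections import Counter

# "@name" not preceded by a word character, so e-mail addresses are skipped.
MENTION = re.compile(r"(?<!\w)@(\w+)")

def user_activity(tsv_data):
    """Count tweets sent per user and @mentions received per user in a
    TSV archive with (assumed) 'from_user' and 'text' columns.
    Usernames are lower-cased because Twitter treats them case-insensitively."""
    sent, mentioned = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(tsv_data), delimiter="\t"):
        sent[row["from_user"].lower()] += 1
        for name in MENTION.findall(row["text"]):
            mentioned[name.lower()] += 1
    return sent, mentioned
```

`sent.most_common(10)` then lists the ten most active users, and `mentioned.most_common(10)` the ten most frequently mentioned.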

Textual Analysis

Leximancer – commercial, multiplatform textual analysis tool: extracts key concepts from large text corpora, and examines and visualises their co-occurrence

WordStat – commercial, PC-only textual analysis tool, part of a larger text statistics package: similar to, but more powerful than, Leximancer; generates concept co-occurrence data that can be exported in standard formats for subsequent visualisation
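Leximancer and WordStat build considerably richer concept models than simple word counts, and their algorithms are not reproduced here. As a much simpler stand-in, the Python sketch below counts term co-occurrence, meaning how often two terms appear in the same tweet. The tokenisation rule, minimum term length and short stopword list are assumptions chosen for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stopword list (assumption); real analyses use much longer lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "with", "rt"}

def cooccurrence(texts, min_len=3):
    """Count how often two terms appear in the same tweet. Each unordered
    term pair is counted at most once per tweet and stored in sorted order."""
    pairs = Counter()
    for text in texts:
        terms = {w for w in re.findall(r"[#@]?\w+", text.lower())
                 if len(w) >= min_len and w not in STOPWORDS}
        for a, b in combinations(sorted(terms), 2):
            pairs[(a, b)] += 1
    return pairs
```

The resulting pair counts can be written out as a weighted edge list (term, term, count) for a network visualisation tool.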

Visualisation

Gephi – open source, multiplatform network visualisation tool: wide range of visualisation options, extensible plugin system, exports maps as PDF or SVG
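Gephi opens network files in several formats, including its native GEXF. The Python sketch below writes a minimal directed, weighted GEXF 1.2 document from a list of (source, target) pairs, such as @reply relationships extracted from a Twitter archive. The function name and the choice of reply edges are illustrative assumptions; real exports would typically add node and edge attributes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

GEXF_NS = "http://www.gexf.net/1.2draft"

def to_gexf(edges):
    """Build a minimal directed, weighted GEXF 1.2 document from
    (source, target) pairs. Repeated pairs are merged into one edge
    whose weight is the number of occurrences."""
    weights = Counter(edges)
    nodes = sorted({n for pair in weights for n in pair})
    root = ET.Element("gexf", xmlns=GEXF_NS, version="1.2")
    graph = ET.SubElement(root, "graph", mode="static",
                          defaultedgetype="directed")
    node_el = ET.SubElement(graph, "nodes")
    for n in nodes:
        ET.SubElement(node_el, "node", id=n, label=n)
    edge_el = ET.SubElement(graph, "edges")
    for i, ((src, tgt), w) in enumerate(sorted(weights.items())):
        ET.SubElement(edge_el, "edge", id=str(i), source=src,
                      target=tgt, weight=str(w))
    return ET.tostring(root, encoding="unicode")
```

Saving the returned string to a file with a `.gexf` extension gives a network that can be opened in Gephi and laid out there.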

Wordle – simple word cloud visualisation tool
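Wordle does its own word counting when given raw text. To show what a word cloud actually encodes, the following Python sketch builds the underlying frequency table from a set of tweets. Removing URLs and @usernames first, and the small stopword list, are assumptions about sensible cleaning for Twitter data rather than anything Wordle requires.

```python
import re
from collections import Counter

# Illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "rt"}

def word_frequencies(texts, top_n=50):
    """Return the top_n (word, count) pairs across all tweets,
    keeping hashtags and ignoring words of two letters or fewer."""
    counts = Counter()
    for text in texts:
        # Drop URLs and @usernames before counting words.
        cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        counts.update(w for w in re.findall(r"#?\w+", cleaned)
                      if len(w) > 2 and w not in STOPWORDS)
    return counts.most_common(top_n)
```

The word with the highest count would be drawn largest in the resulting cloud.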

Seadragon – handy tool for embedding large-scale images on a Web page; handles images, PDFs, SVGs, even URLs for Web pages…

Join the Conversation

2 Comments

Anders Larsson says:

24 August 2010 at 00:47

Just a hint – you might be aware of this – but using the pipe sign (|) as the delimiter when exporting from TK might be the best choice, seeing as commas, semicolons etc. might be used in the tweets themselves…

Dario says:

9 July 2011 at 04:09

I am playing with early data from a social network experiment in Wikipedia and exploring the dynamic network possibilities of Gephi. Would you mind sharing the GEXF dataset you used for the Twitter visualization?
