Taking a quick break from the AoIR 2015 liveblogging at snurb.info: today’s presentation by Fabio Giglietto, Luca Rossi, and Jiyoung Kim got me thinking. They built on a paper by Stefan Stieglitz and me, which compared some basic properties of a large number of hashtag datasets (and some keyword-based datasets, too) and used these to classify different hashtag uses (mainly distinguishing between crisis events and media audiencing).
Back then, we looked at the percentage of tweets containing URLs, and the percentage of tweets that were retweets, as well as the total number of tweets in each dataset:
From: Axel Bruns and Stefan Stieglitz. “Quantitative Approaches to Comparing Communication Patterns on Twitter.” In Klaus Bredl, Julia Hünniger, and Jakob Linaa Jensen, eds., Methods for Analyzing Social Media. 20-44.
I’m keen to update that study with new data from more recent hashtags, and we’ve already started to work through our own archived datasets to generate further metrics. But our datasets are limited to the research interests we’ve pursued over time, and to Australian and international topics.
So, I’m wondering whether we could build this up to a much larger collection by taking a collaborative, crowdsourced approach: if anyone else out there has Twitter datasets from the past few years, could you run a handful of quick analyses over your archives and share the results? For each dataset, we’d need:
- Hashtag(s) or keyword(s) used to capture the dataset
- Timeframe of capture (from/to date)
- Total number of tweets
- Total number of tweets containing URLs – using the regular expression /http/
- Total number of tweets that are retweets – using the regular expression /(\"@|RT @|MT @|via @)[A-Za-z0-9_]+/
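If it helps, here is a minimal Python sketch of how those counts could be generated. It assumes your tweets are available as a simple list of text strings; the function name `dataset_metrics` and the output keys are only illustrative, and I'm assuming the quotation mark in the retweet pattern is a plain straight quote. Both patterns are applied case-sensitively, exactly as written above.

```python
import re

# The two patterns from the list above, applied case-sensitively as written.
URL_PATTERN = re.compile(r"http")
# Assumption: the leading quote in the retweet pattern is a straight double quote.
RETWEET_PATTERN = re.compile(r"(\"@|RT @|MT @|via @)[A-Za-z0-9_]+")


def dataset_metrics(tweets):
    """Count tweets, tweets with URLs, and retweets in an iterable of tweet texts.

    Hypothetical helper: `tweets` is assumed to be an iterable of strings,
    one per tweet.
    """
    tweets = list(tweets)
    total = len(tweets)
    with_urls = sum(1 for text in tweets if URL_PATTERN.search(text))
    retweets = sum(1 for text in tweets if RETWEET_PATTERN.search(text))

    def percent(count):
        # Avoid dividing by zero for an empty dataset.
        return 100.0 * count / total if total else 0.0

    return {
        "total_tweets": total,
        "tweets_with_urls": with_urls,
        "retweets": retweets,
        "percent_urls": percent(with_urls),
        "percent_retweets": percent(retweets),
    }


if __name__ == "__main__":
    sample = [
        "RT @alice: look at this http://t.co/x",
        "just a plain tweet",
        "nice read via @bob",
        "see https://example.org",
        '"@carol: hi" agreed',
    ]
    print(dataset_metrics(sample))
```

How you load the tweet texts will depend on your capture tool, so that step is left out here. The raw counts are what matter most; the percentages are included only for convenience.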
You could leave those details in the comments attached to this post, or email them to me at a.bruns(at)qut.edu.au.
This is an experiment, in the spirit of AoIR collegiality. Would anyone be interested in sharing the metrics for their datasets? In return, I’d be very happy to include you as a contributing author in the paper we’ll eventually develop from this. Thanks in advance!