Capture Tools Twitter — Snurb, 9 January 2012
Twapperkeeper and Beyond: A Reminder

Those of you who have followed our adventures in Twitter research for some time will know that we’ve relied to a significant extent on Joe John O’Brien III’s excellent Twapperkeeper as a tool for capturing tweets. However, Twapperkeeper (as a stand-alone, free Web-based service) no longer exists in its original form, though some of its functionality for creating Twitter archives appears to have been subsumed into the for-pay premium services available from Hootsuite. As a result, we’ve been getting the occasional inquiry about what to do now.

Some months ago, I published a quick post outlining how we’ve transitioned from Twapperkeeper(.com) to the open-source solution yourTwapperkeeper, which offers comparable functionality as a Web package that users can install on their own servers. The start of a new year seems like a good point to reiterate this, and to add a few further pointers. So:

  • yourTwapperkeeper does pretty much exactly what Twapperkeeper did, and provides data in almost the same format. For the purposes of using the Gawk scripts which much of our work is built on, though, we need CSV or TSV files in the original Twapperkeeper format, and I’ve made available a small modification for yourTwapperkeeper which generates them. More details here.
  • Your Web server must be running 24/7 if you want to capture comprehensive datasets from Twitter. If there’s any chance that it may go down at some point (e.g. due to regular scheduled maintenance), you need to make sure that yourTwapperkeeper (yTK) is restarted as soon as the server is back up again. Some information on how to do so is here.
  • Most importantly: recent changes at Twitter now require API requests to use the https (rather than plain http) protocol. This means that, if you have an existing install of yTK, you need to make some minor changes to the yourTwapperkeeper code (or use the latest version from GitHub, which has the changes already built in). Without these changes, you may receive data only from the search API and not from the (more important) streaming API, or you may receive no data at all. Details on how to make these changes are here.
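
To illustrate the last point: the change boils down to making sure that every Twitter API URL starts with https rather than http. The following is a minimal sketch in Python, not the actual yourTwapperkeeper patch (yTK itself is written in PHP). The host names in TWITTER_HOSTS and the function name force_https are assumptions made for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed Twitter API host names, for illustration only; check your own
# installation for the URLs it actually requests.
TWITTER_HOSTS = {"api.twitter.com", "search.twitter.com", "stream.twitter.com"}


def force_https(url: str) -> str:
    """Return the URL with its scheme switched to https if it is a plain-http
    URL on one of the assumed Twitter hosts; otherwise return it unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in TWITTER_HOSTS:
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    for url in (
        "http://stream.twitter.com/1/statuses/filter.json",
        "http://search.twitter.com/search.json?q=%23auspol",
        "http://example.org/unrelated",
    ):
        print(url, "->", force_https(url))
```

In an existing install, the equivalent step is simply to find the hard-coded http:// Twitter URLs in the code and change them to https://, as described in the linked details.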

Hope this helps. Happy Twapperkeeping!

About the Author

Dr Axel Bruns leads the QUT Social Media Research Group. He is an ARC Future Fellow and Professor in the Creative Industries Faculty at Queensland University of Technology in Brisbane, Australia. Bruns is the author of Blogs, Wikipedia, Second Life and Beyond: From Production to Produsage (2008) and Gatewatching: Collaborative Online News Production (2005), and a co-editor of Twitter and Society, A Companion to New Media Dynamics and Uses of Blogs (2006). He is a Chief Investigator in the ARC Centre of Excellence for Creative Industries and Innovation. His research Website is at, and he tweets as @snurb_dot_info.
