Processing Twitter — Jean Burgess, 18 February 2011

This is just a quick post to share another new script. This one takes a list of tweets with pre-resolved URLs and keeps only the tweets that link to known image-hosting services. I whipped it up as part of our ongoing efforts to go deeper into the dynamics of communication at various phases of the Queensland Floods disaster, prompted in part by my observations on the link data, which showed a very high prevalence of user-uploaded images being posted and retweeted. Besides that, our project aims to investigate not only text-based public communication but also the role of image- and video-sharing, as well as the communities that have emerged around these practices, particularly on Flickr and YouTube. I’m partway through drafting a substantial post that takes a closer look at image sharing (and communication around images) on both Twitter and Flickr during the floods, but for now here is the script and the instructions.

Please note that this script won’t work unless the urlextract.awk and urlresolve.awk scripts have been run on the archive first.


# extractimages.awk - extract tweets containing links to images
#
# This script takes a preprocessed CSV of tweets based on the Twapperkeeper
# format, looks at the longurl field, and removes any lines that do not
# contain a link to a known image hosting service.
# The urlextract.awk and urlresolve.awk scripts should be run prior to
# running this script.
#
# Expected data format:
# longurl,url,text,[other columns]
#
# Note: the script does not set a field separator itself. Pass one on the
# command line (for example -F ,) so that $1 is the longurl column.
#
# Released under Creative Commons (BY, NC, SA) by Jean Burgess - je.burgess@qut.edu.au and Axel Bruns - a.bruns@qut.edu.au
# Project website: http://mappingonlinepublics.net

# Pass the header row through unchanged.
NR == 1 {
	print
	next
}

# Add more services below as you find them.
$1 ~ /(twitpic\.com|flickr\.com|yfrog\.com|plixi\.com|instagr\.am|photobucket\.com|occip\.it|picasaweb\.google|sphotos\.ak\.fbcdn\.net|facebook\.com\/photo|imgur\.com)/ {
	print
}
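
To show the filter in action, here is a minimal sketch that runs the same kind of pattern match over a few made-up rows. The sample tweets, the short URLs and the shortened service list are all assumptions for illustration only; the real script uses the full list above and reads an actual archive.

```shell
#!/bin/sh
# Hypothetical sample rows in the expected column order: longurl,url,text
sample='longurl,url,text
http://twitpic.com/abc123,http://bit.ly/x1,Flood waters rising
http://example.org/news/story,http://bit.ly/x2,Latest update
http://yfrog.com/h4xyz,http://bit.ly/x3,Photo from the bridge'

# -F , tells awk to split each line on commas, so $1 is the longurl column.
# The pattern is a shortened stand-in for the full list in extractimages.awk.
filtered=$(printf '%s\n' "$sample" | awk -F , '
NR == 1 { print; next }                               # keep the header row
$1 ~ /(twitpic\.com|yfrog\.com|flickr\.com)/ { print } # keep image links only
')

printf '%s\n' "$filtered"
```

Against a real archive, the equivalent call would be something like `awk -F , -f extractimages.awk input.csv > output.csv`, where `input.csv` and `output.csv` are placeholder file names. Because longurl is the first column, commas inside the tweet text do not affect the match on `$1`.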

About the Author

Jean Burgess is a Professor of Digital Media and Director of the Digital Media Research Centre (DMRC) at Queensland University of Technology. She is @jeanburgess on Twitter.
