{"id":560,"date":"2011-02-18T14:09:08","date_gmt":"2011-02-18T04:09:08","guid":{"rendered":"http:\/\/www.mappingonlinepublics.net\/dev\/?p=560"},"modified":"2012-04-10T14:47:38","modified_gmt":"2012-04-10T04:47:38","slug":"extracting-images-from-twapperkeeper-archives","status":"publish","type":"post","link":"https:\/\/mappingonlinepublics.net\/dev\/2011\/02\/18\/extracting-images-from-twapperkeeper-archives\/","title":{"rendered":"Extracting images from Twapperkeeper archives"},"content":{"rendered":"<p>This is just a quick post to share another new script &#8211; this one takes a list of tweets with pre-resolved URLs, and filters the list for known image-hosting services. I whipped this up as part of our ongoing efforts to go deeper into the dynamics of communication at various phases of the Queensland Floods disaster &#8211; prompted in part by <a href=\"http:\/\/www.mappingonlinepublics.net\/dev\/2011\/01\/22\/media-use-in-the-qldfloods\/\">the observations I made on the link data<\/a>, which showed a very high prevalence of user-uploaded images being posted and retweeted. Besides that, our project aims to investigate not only text-based public communication, but also the role of image- and video-sharing (as well as the communities that have emerged around these practices, particularly on the Flickr and YouTube platforms). 
I&#8217;m partway through drafting a substantial post taking a closer look at the role of image sharing (and communication around images) on both Twitter and Flickr during the floods, but for now here are the script and the instructions for using it.<\/p>\n<p>Please note that this script won&#8217;t work unless the <a href=\"http:\/\/www.mappingonlinepublics.net\/dev\/2010\/08\/02\/using-gawk-to-resolve-url-shorteners\/\">urlextract.awk and urlresolve.awk scripts<\/a> have been run on the archive first.<\/p>\n<div class=\"wlWriterEditableSmartContent\" id=\"scid:887EC618-8FBE-49a5-A908-2339AF2EC720:90c54806-39d2-4ca2-80e1-0820d53412ee\" style=\"padding-right: 0px; display: inline; padding-left: 0px; float: none; padding-bottom: 0px; margin: 0px; padding-top: 0px\">\n<pre><code>\r\n\r\n# extractimages.awk - extract tweets containing links to images\r\n#\r\n# this script takes a preprocessed CSV of tweets based on the Twapperkeeper format,\r\n# looks at the longurl field, and removes any lines that do not contain a link\r\n# to a known image hosting service\r\n# the urlextract.awk and urlresolve.awk scripts should be run prior to running this script\r\n# expected data format:\r\n# longurl,url,text,[other columns]\r\n#\r\n# note: $1 is only the longurl column if the field separator is a comma (e.g. gawk -F ,)\r\n#\r\n# Released under Creative Commons (BY, NC, SA) by Jean Burgess - je.burgess@qut.edu.au and Axel Bruns - a.bruns@qut.edu.au\r\n# Project website: http:\/\/mappingonlinepublics.net\r\n\r\n# read the header row and pass it through unchanged\r\nBEGIN {\r\n\tgetline\r\n\tprint $0\r\n}\r\n\r\n# add more services below as you find them\r\n$1 ~ \/(twitpic\\.com|flickr\\.com|yfrog\\.com|plixi\\.com|instagr\\.am|photobucket\\.com|occip\\.it|picasaweb\\.google|sphotos\\.ak\\.fbcdn\\.net|facebook\\.com\\\/photo|imgur\\.com)\/ {\r\n\tprint $0\r\n}\r\n\r\n<\/code><\/pre>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>This is just a quick post to share another new script &#8211; 
this one takes a list of tweets with pre-resolved URLs, and filters the list for known image-hosting services. I whipped this up as part of our ongoing efforts to go deeper into the dynamics of communication at various phases of the Queensland Floods &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/mappingonlinepublics.net\/dev\/2011\/02\/18\/extracting-images-from-twapperkeeper-archives\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Extracting images from Twapperkeeper archives&#8221;<\/span><\/a><\/p>\n<p><!-- AddThis Advanced Settings generic via filter on get_the_excerpt --><!-- AddThis Share Buttons generic via filter on get_the_excerpt --><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[176,8],"tags":[73,7,46,84,297,9,298],"class_list":["post-560","post","type-post","status-publish","format-standard","hentry","category-processing","category-twitter","tag-qldfloods","tag-gawk","tag-hyperlinks","tag-images","tag-methods","tag-twapperkeeper","tag-twitter","entry"],"_links":{"self":[{"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/posts\/560","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/comments?post=560"}],"version-history":[{"count":0,"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/posts\/560\/revisions"}],"wp:attachment":[{"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/media?parent=560"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/
mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/categories?post=560"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/tags?post=560"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}