{"id":733,"date":"2011-06-22T23:27:00","date_gmt":"2011-06-22T13:27:00","guid":{"rendered":"http:\/\/www.mappingonlinepublics.net\/dev\/2011\/06\/22\/gawk-scripts-for-processing-twitter-data-vol-1\/"},"modified":"2012-04-10T14:00:00","modified_gmt":"2012-04-10T04:00:00","slug":"gawk-scripts-for-processing-twitter-data-vol-1","status":"publish","type":"post","link":"https:\/\/mappingonlinepublics.net\/dev\/2011\/06\/22\/gawk-scripts-for-processing-twitter-data-vol-1\/","title":{"rendered":"Gawk Scripts for Processing Twitter Data, Vol. 1"},"content":{"rendered":"<p>Well, getting stuck in Melbourne for a day and being unable to participate in day one of <a href=\"http:\/\/www.mappingonlinepublics.net\/dev\/2011\/06\/20\/project-update-and-press-release\/\">our ATN-DAAD workshop with Cornelius Puschmann and Katrin Weller from the University of D\u00fcsseldorf<\/a> has at least enabled me to put the finishing touches on something I&#8217;ve been meaning to do for some time: to collect and share the various Gawk scripts for processing <em>Twitter<\/em> data collected by <em>Twapperkeeper<\/em> or our modified <em>yourTwapperkeeper<\/em>. 
<a href=\"http:\/\/mappingonlinepublics.net\/resources\/\">A ZIP file of all our (half-way decent) scripts is now available in the Tools section of our site<\/a>.<\/p>\n<p>These scripts enable the processing of comma- or tab-separated value files containing tweets related to specific hashtags or keywords, as <em>Twapperkeeper<\/em> used to produce them, and as <em>yourTwapperkeeper<\/em> does once you&#8217;ve installed the modified export functions <a href=\"http:\/\/www.mappingonlinepublics.net\/dev\/2011\/06\/21\/switching-from-twapperkeeper-to-yourtwapperkeeper\/\">which I shared in a previous post<\/a>.<\/p>\n<p><!--more--><\/p>\n<p>There are too many scripts here to provide individual instructions for each; I have already discussed some of them in previous posts over the past year or so, however, and all of them contain usage instructions in their headers. To see those instructions, simply open the scripts in a text editor of your choice. I have also included with the ZIP file a brief list of all the scripts made available here, with a short description of what each one does. Eventually, I may write up some more detailed instructions, but certainly not now.<\/p>\n<p>The latest version of this scripts package will always be available <a href=\"http:\/\/mappingonlinepublics.net\/resources\/\">from the Tools section of this Website<\/a>.<\/p>\n<p>The Readme file included with the ZIP also provides some general instructions &#8211; here are the most important ones:<\/p>\n<h2>Installation<\/h2>\n<p>All scripts should be placed in a central directory which is easily accessible from the command line interface. 
The urlresolve.awk script requires the open source tool wget to be installed and in the command path; it also needs to be modified so that the &#8216;path&#8217; variable points to the directory containing the scripts (relative or absolute paths are acceptable; relative paths must be relative to the directory from which the scripts will be executed). Paths must conform to standard PC, Mac, or Linux notation as appropriate; special characters (e.g. backslash) need to be escaped.<\/p>\n<p>Additionally, of course, it is assumed that Gawk is installed and in the command path.<\/p>\n<h2>Usage<\/h2>\n<p>A brief overview of the scripts and their respective functions is provided in the Quick Guide file included in this archive.<\/p>\n<p>Generally, scripts should be called as follows:<\/p>\n<p>gawk -F , -f [script].awk [argument]=&quot;[parameters]&quot; input.csv &gt;output.csv<br \/>(for Twapperkeeper files in comma-separated value format)<\/p>\n<p>or<\/p>\n<p>gawk -F \\t -f [script].awk [argument]=&quot;[parameters]&quot; input.tsv &gt;output.tsv<br \/>(for Twapperkeeper files in tab-separated value format)<\/p>\n<p>Some scripts do not take any command line arguments; some may take multiple. All scripts are able to process both comma- and tab-separated value formats (CSV\/TSV), and will usually return their results in the same format.<\/p>\n<p>The exceptions to this rule are atextractfromtoonly.awk, preparegexfattimeintervals.awk, and gexfattimeintervals.awk: the first two output CSV only, while the last generates a GEXF file. This is necessary since the network visualisation tool <a href=\"http:\/\/gephi.org\/\">Gephi<\/a> only processes CSV or GEXF formats.<\/p>\n<h2>Known Issues<\/h2>\n<p>The Mac version of Gawk has not implemented the &#8216;switch&#8217; statement <em>(sigh)<\/em>; atreplycount.awk and multicount.awk will therefore not work with it. 
Mac Gawk can be recompiled to include &#8216;switch&#8217;; search the Web for instructions on how to do so. There will be a workaround in a future revision of these scripts, replacing &#8216;switch&#8217; with &#8216;if\/then&#8217; constructions. <\/p>\n<p>If used with the &#8216;stats&#8217; command line argument, atreplycount.awk will produce a division by zero error if any of the usernames specified with the &#8216;search&#8217; command line argument did not tweet and\/or receive @replies\/RTs in the dataset being processed. Remove these usernames from the command line argument if the problem occurs. A fix will be made available in a future revision of these scripts.<\/p>\n<h2>Licence and Acknowledgement<\/h2>\n<p>All scripts are provided as is, with no guarantee of accuracy or reliability. While all efforts have been made to ensure the reliability of these scripts, no warranty is given or implied. <\/p>\n<p>If you publish research which was conducted using these scripts, please acknowledge this. The script package can be cited as follows (modify for other bibliographic referencing schemes as required): <\/p>\n<p>Axel Bruns and Jean Burgess. &quot;Gawk Scripts for Twitter Processing.&quot; v1.0. <em>Mapping Online Publics<\/em>, 22 June 2011. 
&lt;<a href=\"http:\/\/mappingonlinepublics.net\/resources\/\">http:\/\/mappingonlinepublics.net\/resources\/<\/a>&gt;.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Well, getting stuck in Melbourne for a day and being unable to participate in day one of our ATN-DAAD workshop with Cornelius Puschmann and Katrin Weller from the University of D\u00fcsseldorf has at least enabled me to put the finishing touches on something I&#8217;ve been meaning to do for some time: to collect and share &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/mappingonlinepublics.net\/dev\/2011\/06\/22\/gawk-scripts-for-processing-twitter-data-vol-1\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Gawk Scripts for Processing Twitter Data, Vol. 
1&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,176,113,8],"tags":[7,42,37,6,9,298,111],"class_list":["post-733","post","type-post","status-publish","format-standard","hentry","category-methods","category-processing","category-tools-2","category-twitter","tag-gawk","tag-mapping","tag-research","tag-tools","tag-twapperkeeper","tag-twitter","tag-yourtwapperkeeper","entry"],"_links":{"self":[{"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/posts\/733","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/comments?post=733"}],"version-history":[{"count":0,"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/posts\/733\/revisions"}],"wp:attachment":[{"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/media?parent=733"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/categories?post=733"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mappingonlinepublics.net\/dev\/wp-json\/wp\/v2\/tags?post=733"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}