Two Headlines is important to various user groups, as pointed out in my Statement of Significance. As a Twitter bot, it is made up of its code and its tweets. The commentary and news articles written about it would also be important to preserve for context.
From code to tweet
The bot’s code is freely available on GitHub under the MIT license, which places almost no restrictions on future use of the software beyond keeping the copyright and license notice with copies, so it can be preserved with or without the creator’s permission. However, only one version of the code appears to be available. If the code has been revised, there would be no way to preserve older versions of it.
To provide future users with a working version of the bot and not just the code, node.js and npm will also be preserved. According to its website, node.js is an “event driven framework [that] is designed to build scalable network applications”. The site also says that node.js relies on npm, which allows developers to share and reuse “packages of code”. For preservation, only one installer needs to be acquired, as it provides both programs.
The ‘packages of code’ that are used by Two Headlines are called ‘cheerio’, ‘request’, ‘twit’, and ‘underscore.deferred’. Cheerio is described as a “tiny, fast, and elegant implementation of core jQuery designed specifically for the server.” Request is a “Simplified HTTP request client”. Twit is a “Twitter API client for node (REST & Streaming).” Underscore.deferred provides “jQuery style Deferreds”. I have no idea what any of that means, but the webpages of documentation for the individual packages can be preserved fairly easily. The packages themselves can be installed on a computer through npm, which comes with node.js.
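To know exactly which packages need to be archived alongside the bot, the dependency list can be read from a project's package.json file. The sketch below is only an illustration: the manifest text, the project name, and the `*` version ranges are placeholders invented for the example, not values copied from the Two Headlines repository, and `listDependencies` is a hypothetical helper.

```javascript
// Placeholder manifest: the name and the "*" version ranges are assumptions
// for illustration, not the real contents of the bot's package.json.
const manifestText = JSON.stringify({
  name: "example-bot",
  dependencies: {
    twit: "*",
    cheerio: "*",
    "underscore.deferred": "*",
    request: "*",
  },
});

// Return the declared dependencies as a sorted list of { name, range } pairs.
function listDependencies(packageJsonText) {
  const manifest = JSON.parse(packageJsonText);
  const deps = manifest.dependencies || {};
  return Object.keys(deps)
    .sort()
    .map((name) => ({ name, range: deps[name] }));
}

const toArchive = listDependencies(manifestText);

// Print one "name@range" line per package that should be archived.
console.log(toArchive.map((d) => `${d.name}@${d.range}`).join("\n"));
```

A list like this could be saved with the collection so that a future user knows which packages, and which versions, the preserved code expects.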
After all the code is run, the bot can finally post to Twitter. These tweets will need to be acquired for preservation. The good news is that collecting tweets poses few legal or ethical challenges. They are posted in a publicly accessible medium, which negates any expectation of privacy, and they are probably not copyrightable because they are too short to say anything original. Twitter’s Terms of Service allows for collections of tweets to be preserved as long as they aren’t made available online. In addition, Twitter’s Copyright Policy mentions some types of media that might be covered under copyright, such as photos, videos, and links to allegedly infringing materials, but not tweets.
Unfortunately, there are many programs that can collect tweets, and they will need to be investigated to determine which one performs the needed tasks best. The chosen program will need to collect the tweets and their timestamps. As all the posts will be by the bot, there is no need to record the poster’s username, except for replies to a post. It would also be useful to have the bot’s followers, the accounts it follows, and who likes which posts. The number of times each tweet is retweeted would be nice to have, if that information can be collected. All of this information would need to be in a format that is easy to access, search, and manipulate.
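As one possible reading of "easy to access, search, and manipulate", the collected tweets could be stored as a plain CSV file that opens in any spreadsheet or text editor. The sketch below assumes the tweets have already been collected by some tool. The sample records and the field names (`createdAt`, `text`, `retweets`, `likes`) are invented for illustration and are not Twitter's own schema.

```javascript
// Invented sample records; field names are assumptions, not Twitter's schema.
const tweets = [
  {
    createdAt: "2016-03-01T12:00:00Z",
    text: 'Example headline with "quotes", and a comma',
    retweets: 3,
    likes: 5,
  },
  {
    createdAt: "2016-03-01T13:00:00Z",
    text: "Second example headline",
    retweets: 0,
    likes: 1,
  },
];

// Quote a value for CSV: wrap it in double quotes and double any embedded quotes.
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// Build CSV text with a header row followed by one row per record.
function toCsv(records, columns) {
  const header = columns.map(csvField).join(",");
  const rows = records.map((record) =>
    columns.map((column) => csvField(record[column])).join(",")
  );
  return [header, ...rows].join("\n");
}

const csv = toCsv(tweets, ["createdAt", "text", "retweets", "likes"]);
console.log(csv);
```

Quoting every field keeps commas and quotation marks inside a headline from breaking the columns, which matters for text that is built from news headlines.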
While the bot is creating a tweet, it checks that the tweet follows certain guidelines, including gender agreement between the subjects and whether a second headline that matches the first could be found. Any tweet that does not meet the guidelines is rejected and some other combination of headlines is put together. Unfortunately, there is no record of these rejected tweets, so they are impossible to collect, even though they would shed some light on the bot’s tweet creation process.
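If a future copy of the bot were modified to keep its rejects, the change could be as small as recording each failed candidate before moving on to the next one. The sketch below is not the bot's actual code: `checkCandidate`, `pickTweet`, the candidate objects, and the two placeholder rules are all hypothetical stand-ins for whatever checks the real program performs.

```javascript
// Hypothetical stand-in for the bot's real checks, which are not reproduced here.
// Returns a reason string when the candidate fails, or null when it passes.
function checkCandidate(candidate) {
  if (candidate.text.trim() === "") return "empty text";
  if (candidate.text === candidate.original) return "no replacement made";
  return null;
}

// Return the first acceptable candidate, logging every rejected one on the way.
function pickTweet(candidates, check, log) {
  for (const candidate of candidates) {
    const reason = check(candidate);
    if (reason === null) return candidate;
    log.push({
      text: candidate.text,
      reason: reason,
      rejectedAt: new Date().toISOString(),
    });
  }
  return null; // every candidate was rejected
}

// Invented candidates for illustration only.
const candidates = [
  { original: "Placeholder headline one", text: "Placeholder headline one" },
  { original: "Placeholder headline one", text: "Placeholder headline two" },
];

const rejected = [];
const chosen = pickTweet(candidates, checkCandidate, rejected);

console.log(chosen.text);
console.log(JSON.stringify(rejected, null, 2));
```

The rejected list could then be written to a file next to the posted tweets, giving researchers both sides of the bot's decision process.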
How important is Twitter?
The posts on Twitter are how the majority of the world sees the bot. The program used to save the tweets will not also save the experience of using Twitter. Even though the interface is probably in several preservation collections, preserving the original experience through a few screenshots would not be difficult. Preserving the Twitter experience is not vital to preserving Two Headlines but it will provide the users with the context in which the bot was originally encountered. Creating a few screenshots will probably fall under the fair use exemption in copyright.
As the bot reads Google News to get the headlines that it mashes up, preserving those headlines would be an interesting addition to the collection. However, it would be outside the scope of the current project and would require too much additional work. The day’s news will be archived to some degree by other institutions and the internet, so the events behind a tweet can still be determined and understood.
Any thoughts on the matter?
Two Headlines has had an impact on the bot creator community, in part because it is used as a teaching tool. It would be nice to preserve the many news articles and blog posts that have been written about the bot, along with the comments of the creator, Darius Kazemi, and other bot makers, to provide some context around the importance of the bot. However, the effort required may make it unfeasible to collect more than a handful. Bot makers who have interacted with Two Headlines would need to be contacted and asked about their views on, and connection to, the bot. Surveys typically have poor response rates. Any response received could be added to the collection, but there is no way to estimate how many people will be contacted or how many responses will be received.
The news articles would be trickier to preserve, as there are copyright permissions that would need to be acquired. It would also be impossible to say that all information available about the bot was found, as there are many websites on the internet and only so much time to find them.
- Preserve as close to a working version of the code as possible, using the code available on GitHub and the installation files for node.js, cheerio, request, twit, and underscore.deferred, in a location that users can access and use.
- Create screenshots of Twitter.
- Contact Darius Kazemi and any other bot creator for statements or interviews about Two Headlines and its importance.
- Contact the copyright holders of any discovered news articles and web posts for permission to preserve their articles.
3 Replies to “Preserving Two Headlines”
The rejected tweet idea is very interesting; there’s almost a proto-AI to it since it has to check for noun/pronoun agreement. It has to know/learn what gender a person is or identifies with and how that person reflects that choice in his/her name. Taylor Swift is female, but “Taylor” was a boy’s name until the late 80’s. Caitlyn Jenner was famous before openly selecting a new gender. And the bot has to understand these social norms in a media construct – headlines – famous for being flippant about appropriate forms of address (literally I was reading a book from the 30s with a short rant about how news headlines were never giving people their full titles in their headlines; there’s a similar bit in a Sherlock Holmes story about this too).
The question is, if the code is available to anybody, would there be a way for someone with adequate coding ability to alter it to collect the rejected tweets for the interested computer scientist?
You’ve done a great job poking around and figuring out many of the components of Two Headlines. It’s great that you are planning to grab many of those dependent libraries too so that a future user can pick apart how the bot itself worked as well as how it used and depended on other open source tools. Given that the bot depends on the Google News API and the Twitter API it might also be good to go and grab some of the documentation for how those both worked. One could even imagine grabbing some chunk of the Google News API’s data so that a future user could try and mimic that as a source to feed into it and experiment with how it functioned. I would also suggest that getting at least some batch of the tweets it produced would be useful too. That is, to go beyond just getting screenshots. If you didn’t want to mess around with the API, you could even just scroll back through a bunch of the tweets on the Twitter account page and then either use Firefox’s “Save this Page” button, or just copy and paste a bunch of them out of there into a text document. If Darius was up for it, you could also ask him if he could export an archived copy of the tweets that the account has generated, since Twitter itself has a way to let a user export an archive of their own tweets. In any event, you are well on your way to a great project and approach here.
It’s really conscientious to contact news sites before preserving their stories, but I wonder if responses wouldn’t be knee-jerk denial. What if a news organization didn’t quite understand what you were up to, or what Two Headlines is about, and denied permission in order to buy time? Or do you think being open about the project and its intentions would instead build goodwill and support? Wondering if this is a good time to capture first and answer questions later, bringing fair use into the conversation.