PressForward and Ethical Content Scraping

What is PressForward?

Roughly speaking, PressForward is a back-end WordPress plugin developed by the Center for History and New Media that allows users to aggregate, curate, and redistribute web content pulled from RSS or Atom feeds, or through the use of PressForward’s Bookmarklet tool. Once a site-runner has added their desired feeds to the plugin, or has marked content for rehosting through the Bookmarklet tool, they can review specific pieces, add metadata, format them for WordPress, add any categories or tags they wish, and finally publish the content on their blog.

Screenshot of the PressForward Dashboard, taken from the PressForward User Manual.

Aggregate

There are a few ways to start collecting content for rehosting through PressForward, but let’s start with web feeds. RSS (Really Simple Syndication, or alternatively Rich Site Summary) is a format for machine-readable text files that websites publish so users can “subscribe” to them; loaded into a feed reader program like Feedly or The Old Reader, these files let users build their own feeds of automatically aggregated content. So instead of visiting a bunch of blogs individually, you could just have posts from all of them pulled into the feed reader to create your own newsfeed. Atom, on the other hand, is a more recently created alternative format to RSS. Linking these feeds to PressForward creates a feed of content within your WordPress site (visible only to you), from which you can begin to select specific content for rehosting.
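
To make this concrete, here is a minimal sketch of the kind of aggregation a feed reader (and, behind the scenes, PressForward) performs, written in Python with the feedparser library. The feed URLs are placeholders, not real subscriptions:

```python
import feedparser  # pip install feedparser

# Placeholder URLs standing in for the blogs you want to "subscribe" to.
FEEDS = [
    "https://example.com/blog-one/feed/",     # an RSS feed
    "https://example.org/blog-two/atom.xml",  # an Atom feed
]

def aggregate(feed_urls):
    """Pull entries from several RSS/Atom feeds into one combined list."""
    items = []
    for url in feed_urls:
        parsed = feedparser.parse(url)  # feedparser handles RSS and Atom alike
        for entry in parsed.entries:
            items.append({
                "source": parsed.feed.get("title", url),
                "title": entry.get("title", "(untitled)"),
                "link": entry.get("link", ""),
            })
    return items

# Your own "newsfeed": every post from every subscribed blog in one place.
for item in aggregate(FEEDS):
    print(f"{item['source']}: {item['title']} ({item['link']})")
```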

The second key way to collect content is through PressForward’s “Nominate This” bookmarklet. While RSS feeds pull content from designated sites as it is published, “Nominate This” allows for a more intentional selection of specific content from specific sites. Say you found a cool blog post on a site you have not incorporated into your RSS feed, or for which RSS is not available. In this case you can just click the “Nominate This” button on your browser’s toolbar and send the selected content to your WordPress Drafts section manually. If the site does have an RSS feed you are not yet subscribed to, the tool also offers you the option to subscribe.

Nominate This bookmarklet in action. In this instance the old homepage of the Center for History and New Media website has been pulled for republication. Image taken from the “Installing and Using the Nominate This Bookmarklet” section of the PressForward User Manual.

Curate

Once you’ve got your feeds set up and articles from other sources nominated, it’s time to curate. At this stage you can start picking out content from your feeds for republication on your blog. There are two key panels to use here: the “All Content” panel and the “Nominated” panel. The former contains all of the content pulled from your RSS feeds that is pending review and nomination; the latter contains content that you have marked for republication. At either stage you can use the Reader View option to open the content and check for readability and any errors in the text or formatting before sending it over to either the Nominated panel or to a WordPress draft.

Redistribute

Now that you’ve sent content over to the drafts section, all that remains is formatting/editing the post and publishing it to your blog like any other post. Which brings us to the overarching goal of this plugin: to disseminate scholarship, blogs, digital projects, etc. to a wider audience by allowing bloggers and site runners to curate their own informal journals, so to speak. Unlike content scrapers, which have a less than stellar reputation among digital content creators, PressForward is not intended to be a platform by which people can collect and republish content in an unethical drive to increase their own site traffic (and ad revenue) by rehosting others’ unattributed work. Yet when you get down to brass tacks, I don’t really think it’s all that far off from such tools.

PressForward does a few things to encourage responsible aggregation and republication: it “offers the option to auto-redirect back to the original source,” it “retains detailed metadata about each aggregated post,” and “the original author’s name will appear with a republished post if you use WordPress default themes such as Twenty Fourteen.” The FAQs also emphasize that author consent should be sought before republishing. Reading through the plugin’s manual and FAQs, I noticed that there are a lot of “ifs” involved when it comes to the display of metadata. If users want to display more metadata, they have to use Custom Fields. If users have the overwrite-author option enabled (it is by default, but can be shut off), the author of the original post will be displayed on your rehosted site. Links to the original post are contained in the new draft post, but can be deleted if the user chooses to do so. None of these options seems to impose a strict requirement that users include metadata in their final posts. If a user does not “use default themes,” will the metadata still appear?

I don’t mean to be overly critical of PressForward in this respect, especially as there are far easier ways to go about plagiarism, and chances are digital humanities scholars aren’t the same level of target for content scrapers as, say, artists or tech reviewers. But I do think the conversation surrounding the ethics of content scraping and rehosting is an interesting one to have, especially if we are talking about shifts in the landscape of scholarly publication. While scholars may not be producing their content for ad revenue as other types of digital producers may be, is it ethical for a “big” blog like Digital Humanities Now (which does actually publish a full list of the feeds they are subscribed to) to pull content (and views) away from their pages? Is rehosting really all that different from linking to a blog post as a form of citation? (I think it is.) While it could certainly be argued that there are philosophical differences between the motivations behind publishing a scholarly article and a swing cover of Nirvana’s “Smells Like Teen Spirit,” shouldn’t scholars still have a right to their labors?

The undead scholarly monograph and avoiding obsolescence

In Kathleen Fitzpatrick’s first book, The Anxiety of Obsolescence: The American Novel in the Age of Television, she put forward the argument that the claims people and groups make about the obsolescence of certain cultural forms ultimately reflect more on the people making the claim than on the reality of the situation, and that these claims typically serve certain political and ideological goals. After going through a traditional peer review process and revising her manuscript, Fitzpatrick submitted it to the scholarly press that she had been working with—and they informed her that they would not be publishing the book. It was through no fault of hers, they explained, just that the marketing department didn’t think they could sell enough copies even to recoup the book’s costs.

Her reflections on this experience appear throughout Planned Obsolescence: Publishing, Technology, and the Future of the Academy, in which she details the changes necessary in academic publishing to adapt to new technology and to avoid obsolescence. Fitzpatrick refers to the scholarly monograph as being “undead”: it’s no longer a viable mode of communication from a financial standpoint, but it’s still the gold standard in the academy, the most important form of output by which a scholar’s contributions are judged. The changes needed to keep the field of academic publishing alive, she argues, are not only technological, but also institutional, social, and intellectual, and she details what she thinks needs to happen and why it will work in the rest of the book.

Reforming peer review

Fitzpatrick jumps right in with urging change in the peer review system, which she acknowledges is seen as one of the most important institutions within the field of academic publishing. She takes issue with several facets of the current system and proposes far-reaching changes, mostly involving a new system of “peer-to-peer review.” In this system, as Fitzpatrick models it with various case studies, authors put their manuscripts online and make them available for open commenting. In contrasting the current peer review system with this open review system, she addresses a variety of issues, including:

  • The anonymity of reviewers. Fitzpatrick acknowledges that this anonymity is supposed to enable reviewers to share their true thoughts on a manuscript without hesitation, but points out that when a name is attached to a comment, you have a better sense of what that comment is worth to you. If it’s coming from a peer whose advice on a topic you specifically value, it’s enormously helpful to know that.
  • The number of reviewers. In the traditional peer review system that Fitzpatrick describes, you receive two or three anonymous reviews. In the case studies that she details of authors putting their manuscripts online for open review and commenting (including when she did this on her own blog, with the manuscript for Planned Obsolescence), she finds that these manuscripts receive a much greater breadth of reviewers. This means that the author hears more opinions and can get a better sense of whether something seriously doesn’t work or whether it just rubbed one person the wrong way.
  • The opportunity to respond. As Fitzpatrick describes it, the traditional peer review process is not a conversation—it’s closed and compartmentalized. With open review, the author has the opportunity to act on people’s feedback more directly, responding to them and discussing it with them. This allows for a more collaborative process (and it’s this emphasis on collaboration that leads Fitzpatrick to spend her second chapter destabilizing the importance of individual authorship in the academy) and for more meaningful feedback.

Community focus

Something that comes up repeatedly throughout Fitzpatrick’s book is the importance of establishing communities for academic publishing. Her open review system depends on the establishment of a community of scholars willing to comment on each other’s manuscripts. Her arguments about challenging the notion of individual authorship in favor of supporting a wider scholarly network promote community and conversation over the idea of the scholarly monograph as something that enters the world as a finished product. Her arguments about digital preservation, about staving off the physical question of obsolescence, center around establishing a community that sets standards, stores metadata, and ensures continued accessibility of texts.

Indeed, Fitzpatrick’s response to the looming threat of obsolescence for academic publishing is that the academy as a whole needs a substantial overhaul. She’s challenging a culture that she calls “We Have Never Done It That Way Before,” both by historicizing such venerated ideas as peer review and individual authorship and showing that things have not always been the way they are now and by proposing profound changes to the system. One of the common threads between these proposals is that they all involve working together and building a stronger community. When discussing preservation, Fitzpatrick emphasizes the importance of forming these social systems, encouraging scholars to “take advantage of the number of individuals and institutions facing the same challenges and seeking the same goals.” This advice would seem to apply to academic publishing as a whole, given Fitzpatrick’s proposals.

Considering the future

Fitzpatrick’s recommendations are myriad and diffuse, but the broader strokes of her argument can be summed up as follows:

  • Adapt the current system of closed peer reviews, utilizing an open, peer-to-peer review system instead that allows for more meaningful dialogue and collaboration
  • Revise our understanding of individual authorship to acknowledge that texts arise from conversation and collaboration
  • Change existing publishing structures to reposition texts as jumping-off points for further conversation and collaboration, rather than as solitary works
  • Cultivate communities to distribute and preserve these texts
  • Reevaluate the system of university presses—how they work with their institutions, how they work with their institutions’ faculty, and what their ultimate aims are, putting aside the question of financial success

Obviously, these are pretty big goals, and there are plenty of questions to be asked about how any of these can be achieved. (Fitzpatrick seems to think that changes will occur as they must when the system simply can’t hold itself together anymore.) She shares a number of “looming unanswered questions” at the end of the book, but the one that I’d like to discuss here relates back to her continued emphasis on building communities:

“How can we get scholars to accept and participate in these new publishing and review processes?”

How, if we are to build these communities, can we secure buy-in from a field so committed to its culture of We Have Never Done It That Way Before? How can we convince scholars, for instance, to share their unfinished works online for peer-to-peer review, in light of the omnipresent fear of “scooping,” in light of the rigidly upheld standards of individual authorship? On what level do we begin to reform the system? Fitzpatrick has worked to establish mechanisms that can introduce these ideas as the co-editor of MediaCommons, but how can we get scholars to accept them as valid? Is it possible? Surely it is, right?

The Programming Historian

The Programming Historian is a website which publishes “novice-friendly, peer-reviewed tutorials” designed to help teach historians “digital tools, techniques, and workflows.” It is aimed at helping historians who identify as “technologically illiterate” to become programming historians. If you’re a historian and you want to know how to set up an Omeka site, or edit an oral history using Audacity, then The Programming Historian is a place to learn how and where to get started.

Over half of the lessons have been translated into Spanish. If you speak French, you’re out of luck at the moment.

Clicking on the English-language portal presents us with three options: we can Learn, we can Teach, or we can Contribute. Learn takes us to the lessons and Contribute provides links to pages with information for those interested in writing a lesson or becoming one of the reviewers. Teach has little beyond a link to provide feedback on ways to make the lessons better suited to being used as teaching tools. We’re going to Learn today.

Clicking on Learn brings up all the lessons that you can access. There are 78 lessons available in English, which is quite a few to browse through.

The Programming Historian provides a few ways to organize the lessons to make it easier to find what you’re looking for. At the top, you can click on buttons to display all the tutorials that are tagged with one of five categories: Acquire, Transform, Analyze, Present, and Sustain. 30 lessons fall under the category of Transform, making that the largest of the five categories.

The next way to sort the lessons is by more specific criteria: for example, you can click to see all the lessons tagged with “Web Scraping” (only 6) or lessons that have to do with the programming language Python (19 lessons – second only to “Data Management”).
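
For a sense of what those “Web Scraping” lessons cover, here is a minimal sketch of the basic task in Python, using the requests and BeautifulSoup libraries that lessons of this kind typically rely on. The URL is a placeholder, and this is an illustration of the general technique rather than an excerpt from any lesson:

```python
import requests
from bs4 import BeautifulSoup  # pip install requests beautifulsoup4

# Placeholder URL; swap in whatever page you are studying.
URL = "https://example.com/archive/"

response = requests.get(URL, timeout=10)
response.raise_for_status()  # stop early if the page didn't load

soup = BeautifulSoup(response.text, "html.parser")

# Pull the text and destination of every link on the page.
for anchor in soup.find_all("a", href=True):
    print(anchor.get_text(strip=True), "->", anchor["href"])
```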

Finally, you can sort the lessons by their publication date or by their difficulty. Lessons are given a difficulty rating – Low, Medium, or High. These ratings appear to be assigned based on the difficulty of the subject matter covered by the lesson, not the difficulty of using the lesson to learn the programming tool.

Here are half of the lessons tagged with “Digital Publishing”:

Let’s click on the lesson “Up and Running with Omeka.net”. This is a lesson designed to help historians set up their own content on Omeka.net.

The lesson is all text and images – no video or audio. The lesson reads like a longer version of one of our digital tool reviews, featuring walkthroughs of how to use the digital tool. When I say “longer,” I do mean significantly longer – here is the table of contents for the Omeka.net lesson:

And here is what the content of the lesson looks like:

The lessons all seem well-written and informative. However, they are not infallible: several lessons carry notices that reviewers have caught inaccurate information. Rectifying these errors depends on the website administrators contacting the authors and then having the authors correct the mistakes in their lessons.

Overall, The Programming Historian seems to be a very helpful resource for any historian looking to expand their technical skills.

Data visualization and scholarly publishing: Scalar

Scalar is a University of Southern California and Alliance for Networking Visual Culture project that aims to provide a platform for creating and managing scholarly documents in a visually appealing and modular way. The main draw of Scalar is its multimedia, multiplatform approach to data visualization.

A good example of born-digital publishing, Scalar provides scholars with the tools to create unique and personalized presentations of their work in a format that is both from the web and for the web.

The basics

As a scholarly platform, Scalar is naturally intended for members of institutions like universities and museums. To prevent overloading the system, or general abuse by people who are merely curious rather than working on scholarship, a registration code is required. As a result, it wasn’t possible for me to test the platform myself; registration requests don’t seem to be monitored as closely as one would hope. Still, someday my registration code will come.

That said, as a born-digital platform, Scalar does have online tutorials and overviews of its features.

A registered user who has received their registration code in a timely manner would be able to perform some fairly robust data presentation feats with Scalar thanks to its scholar-oriented design. The most important of these are the ability to import media from YouTube, Vimeo, and the like, and support for multiple file formats such as Adobe Illustrator files and PDFs; in general, the formats and programs that scholars use frequently.

Scalar is also compatible with a wide range of web standards, utilizing HTML5 where needed and presenting information in a portable, intelligible form that can be carried over to other platforms that use the same standards. This carries over to the front end and the API, both of which forego proprietary, walled-off code in favor of a functional and forgiving interface: one that allows live interaction with other platforms and produces code that users of those platforms can understand.

The API, advertised as an Open API, provides the user with a means of presenting their work on their own website and of sharing it in a way that can be modified and posted on other platforms and websites. This is rare in the world of scholarly platforms, most of which are walled off and opaque when it comes to code and interfacing with other platforms. You know who I’m talking about.
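
As a rough illustration, here is how one might pull a Scalar book’s pages as JSON in Python. The endpoint pattern follows Scalar’s documented RDF API, but treat the exact path and response shape as assumptions to verify against the current documentation; the book slug comes from the Rogoff example mentioned below:

```python
import requests

# Endpoint pattern per Scalar's RDF API documentation; verify the exact
# path and response shape against the current docs before relying on it.
BOOK = "http://scalar.usc.edu/anvc/the-nature-of-dreams"

response = requests.get(f"{BOOK}/rdf/instancesof/page",
                        params={"format": "json"}, timeout=10)
response.raise_for_status()

# Assumed RDF-JSON shape: a dict keyed by resource URI, with each value
# mapping predicate URIs to lists of {"value": ..., "type": ...} objects.
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
for uri, properties in response.json().items():
    for title in properties.get(DCTERMS_TITLE, []):
        print(title.get("value"), "->", uri)
```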

As far as crafting your work goes, Scalar has clearly made an effort to open up the system to the kind of visualization and functionality that suits the visually appealing, born-digital work that has become increasingly common. Prior to Scalar, the norm was hand-coding, using a proprietary, closed-source platform, or settling for a low-tech, low-functionality one. In any case, scholars were stuck.

Here, in Scalar, the user is given the ability to index reams of data and feed them into visualizations. Data can be attached to a specific page or made global. A page can draw from both its native data and the global data, allowing for uncomplicated management of contextually sensitive data that needs to stay anchored to its own page to remain well-organized. The visualization from there is simple plug-and-play, with multiple types of visualizations available, including charts, webs, and tables.

A scholar can thus create a multiple-page, visually robust document, which can then be presented for one of the most important aspects of scholarly writing: review.

Scalar is different from a lot of platforms in that it gives readers and reviewers the ability to comment and offer feedback in numerous ways and on numerous pages with little fuss. The feedback capability is a core function and can be disabled for release of the final, reviewed and edited product. 

Impressions

As I only had access to the online documentation, it is hard to make an honest assessment of the functionality within the system. On the other hand, the documentation provides a fine selection of presentations that use Scalar, including Seth Rogoff’s The Nature of Dreams (http://scalar.usc.edu/anvc/the-nature-of-dreams/index).

As a platform for scholars, Scalar gives a lot of deference to the needs of scholars, and as such it models a good deal of important best practices for this kind of platform and interface. The most important of these, data and narrative organization, is a core consideration. An open API and portability are vital, and scholars should endeavor to make sure they present their information in a way that can be moved and molded to numerous platforms: most scholars are not limiting themselves to a single platform, so platforms like Scalar should play fair with other platforms (even if those platforms don’t play well with others). Scholars should look for—and demand—open APIs and intelligible code that can interface with other platforms.

As far as data and visualization go, Scalar gives the scholar the ability to sculpt sprawling data landscapes that are otherwise hard to condense. Here, Scalar models another best practice: giving the scholar multiple levels of visualization and portability within the platform. It also helps break the scholar free of the classic digital visualization standard of simple, siloed data presentation. There is no need here to force all data into one bucket; there are multiple approaches to data management, and multiple data buckets are available, along with a global one.

Finally, Scalar opens up the scholarly world to a truth that’s necessary to acknowledge: publishing and data presentation are increasingly going to be “born digital” and it is vital to learn how to create such functional documents. Instead of relying on just classic presentation or proprietary platforms, Scalar gives scholars the ability to functionally interface with the classic and the digital, all starting from a digital base.

Criticisms of the platform are few for me, chief among them being the difficulty of access. For a platform that endeavors not to wall off data and presentation, access should be much easier. Again, waiting for that registration code. Another criticism is the subtle suggestion that classic methods of publishing will be supplanted by this type of platform. Both digital and classic articles will be the norm for at least another generation, and within years most current digital standards will be obsolete. It is better, then, to work from the belief that you are modeling best practices while offering the best available platform in the current digital era.

Otherwise, Scalar is a solid 7/10 for me.

MLA Core

What is MLA Core?

The landing page for MLA Core gives this description: “CORE is a full-text, interdisciplinary, non-profit social repository designed to increase the impact of work in the Humanities.”

So what does that mean?

Core stands for Commons Open Repository Exchange. Funded by the National Endowment for the Humanities, MLA Core is a collaboration between the Modern Language Association and the Center for Digital Research and Scholarship at Columbia University. Core, which is currently a beta release, is basically a repository of open-access scholarship housed on MLA Commons, the scholarly network for MLA members.

Through this initiative, members can:

  • Upload a variety of objects and formats
  • Insert metadata for objects
  • Add additional authors
  • Assert CC copyright
  • Get a DOI or insert publisher’s DOI if published
  • Associate object with MLA Group
  • Comment on and discuss others’ uploads

Visitors to the site (aka people who aren’t members of MLA Commons) can:

  • Browse deposited material
  • Perform full search & faceted browse of deposits
  • View author’s Commons profile
  • Download deposited material

What is special about Core?

Here’s what they have to say:

Not just articles and monographs: Core includes course materials, white papers, conference papers, code, and digital projects

Community notifications: Send instant notifications about the work you’ve shared to members of your Humanities Commons groups.

Citation and attribution: All items uploaded to CORE get a DOI, or digital object identifier, that serves as a permalink, citation source, and assertion of authorship all in one.

Licensing: Select the Creative Commons license that best meets your needs.

Archiving for the future: Files deposited in CORE are stored in the Columbia University Libraries long-term digital preservation storage system.

Open-access, open-source, open to all: Anyone can read and download your work for free (no registration required)
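
The DOI point is worth pausing on: because DOIs resolve through the standard doi.org proxy, a deposit’s permalink can be checked programmatically. A minimal sketch in Python, using a made-up placeholder DOI rather than a real CORE deposit:

```python
import requests

# Placeholder DOI, not a real CORE deposit.
doi = "10.12345/example-deposit"

# doi.org answers a registered DOI with a redirect to its landing page.
response = requests.get(f"https://doi.org/{doi}",
                        allow_redirects=False, timeout=10)

print(response.status_code)              # 302 for a registered DOI
print(response.headers.get("Location"))  # the deposit's landing page
```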

The great thing about the concept of CORE is that you can use it to upload peer-reviewed journal articles, dissertations and theses, works in progress, conference papers, syllabi, abstracts, data sets, presentations, translations, book reviews, maps, charts, and more, and you remain the owner of any work deposited there. This allows for a database of diverse scholarship, all of it open access. The collaborative aspect, which allows users to comment on and discuss others’ uploads, also helps to bridge gaps in scholarly communication.

So how does it work in practice?

I decided to give the database a try. To upload scholarship, you must become a member of MLA Commons, either by being an MLA member or by joining the open Humanities Commons network. Membership in the MLA costs money based on your salary (anywhere from $26 to $359; $26 if you’re a graduate student). If you create a free account through Humanities Commons, you have access to CORE, but not as much as full MLA members. I created an account through the open network:

At first glance, the form to upload things seems pretty simple!

I decided to try and upload my research paper from my Civil War and Reconstruction class.

It took a total of 5 minutes to upload my paper – super easy! It looks like there is a review process as well.

Now that I’ve uploaded my paper, I can find it in my deposits:

Overall, the process of uploading scholarship seems super easy. I wonder how visible it will be to other people? A database is only as good as its search function, so I am going to test that out next.

Searching for scholarship on MLA Core

When you click on “find open access materials” you are brought to this page:

It automatically sorts deposits with the newest ones at the top. As you can see, the top three most recent are already fairly different topic-wise, which is a testament to all the different academic fields that are using the Core.

Keeping with the theme, I typed “civil war” into the search bar. It came up with 459 results, all of which (besides mine) seemed only tangentially related to civil wars.

I couldn’t seem to find an option to do an advanced search, other than the sidebar, which allows you to narrow results by date, item type, or subject. There was also no option to sort the results by relevance, only by most recent and alphabetically. I tried searching again using Boolean phrases, hoping to narrow my results. I typed “civil war” AND “united states” into the search bar. It returned no results, suggesting the search may not be able to process Boolean phrases (or no one else has uploaded papers about the American Civil War, which I doubt).

So, it seems as though the search function for Core is a little lackluster. Nonetheless, there are some other cool features. You can join different groups based on your areas of interest.


Whenever you upload something to Core, groups you are a member of will be notified by email (a setting you can turn off) and through the group’s activity feed. I joined the Digital Humanists group:

You can also search for Core members’ personal websites, as well as create your own using WordPress:

Overall, MLA/Humanities Core works as a sort of social network for scholars of nearly any discipline. It offers an easy way to communicate with people in your field as well as people outside it, working to open the lines of scholarly communication. While the Core depository’s search function doesn’t seem great, the platform is still in beta, and the website even offers a roadmap of what’s to come. So, despite this minor flaw, this type of transparency combined with the overall concept of an academic social network results in what could become a highly effective platform for scholarly communication.