One Reply to “Draft: Topic Modeling Foreign Relations: Implications for U.S. Policy in the Middle East”

  1. Kudos again on going for it with topic modeling! I reviewed the spreadsheets you sent over, and I think I’ve sorted out what was going on with interpreting the second sheet. Once you convert all the fields for each document over to percentages, it becomes much more straightforward to see how the columns report the percentage of each document associated with each of the topics. I’m not 100% sure that’s the right reading, but what does seem clear is that you are getting a percentage fit reported for each individual topic (the rows) in relation to each individual document (the columns). Be in touch if you would like any more help on that; there’s a small sketch of the percentage conversion just below.
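
    In case it helps, here is a minimal sketch of that percentage conversion in Python. I’m assuming the second sheet can be exported as a CSV called doc_topics.csv with topics as rows and documents as columns; both the filename and that layout are guesses on my part about your export, not something I know for sure.

    ```python
    import pandas as pd

    # Hypothetical export of the second sheet: one row per topic, one
    # column per document, cells holding the raw topic weights.
    df = pd.read_csv("doc_topics.csv", index_col=0)

    # Divide each column by its total so every document's weights sum to
    # 100 -- each cell then reads as "percent of this document associated
    # with this topic."
    percentages = df.div(df.sum(axis=0), axis=1) * 100

    # Spot check: every column should now sum to roughly 100.
    print(percentages.sum(axis=0).head())
    print(percentages.round(1).head())
    ```

    If every column comes out summing to roughly 100 after the conversion, that would support the reading above.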

    Some notes on specifics in your paper. Instead of opening with “I argue that topic modeling allows historians to generate an index of a vast corpus of digital texts, locating important topics and themes without having to read through all the material,” think about framing your paper as much more exploratory. That is, scholars have not been using topic modeling as a method in diplomatic history, and your paper is about exploring how this technology might be of use in dealing with the scale of documentation that is now available. What’s great about this kind of framing is that it significantly lowers the bar for the work: you don’t need to show that topic modeling is great or useful, you just need to explore how it is or isn’t and what kinds of new questions using it might raise.

    Your literature review section on computational humanities work and topic modeling is in good shape. I think you nicely set up how and why these techniques would be useful for the kinds of diplomatic history resources you are studying.

    It would be good to provide some more context on why you chose to run a 40-topic analysis, and on how and why the results would likely have turned out differently if you had gone with a different number of topics. There’s a rough sketch below of one way to compare runs at different topic counts.
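
    I’m using gensim in this sketch purely for illustration, since I don’t know which tool you actually ran (MALLET, gensim, or something else), and tokenized_docs is a stand-in for however you preprocessed the corpus.

    ```python
    from gensim import corpora, models

    # Placeholder: tokenized_docs should be your preprocessed corpus, a
    # list of documents where each document is a list of word strings.
    tokenized_docs = [["example", "tokens", "for", "one", "document"]]

    dictionary = corpora.Dictionary(tokenized_docs)
    bow_corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]

    # Train a model at a few different topic counts and compare how the
    # top words shift as the number of topics changes.
    for k in (20, 40, 60):
        lda = models.LdaModel(bow_corpus, num_topics=k, id2word=dictionary,
                              passes=10, random_state=42)
        print(f"--- {k} topics ---")
        for topic_id, words in lda.print_topics(num_topics=5, num_words=8):
            print(topic_id, words)
    ```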

    The analysis section is coming together. I think it’s going to be really helpful for you to start moving back and forth between the close and distant reading parts of this by looking at some of the individual documents that fit strongly with the various topics; see the sketch after this paragraph for one way to pull those out.
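
    One concrete way to do that is to pull the handful of documents with the highest percentage for a given topic and read those closely. Here is a sketch, again assuming the hypothetical doc_topics.csv layout from the earlier snippet (topics as rows, documents as columns); the topic number is just an example.

    ```python
    import pandas as pd

    # Same hypothetical layout as above: topics as rows, documents as columns.
    df = pd.read_csv("doc_topics.csv", index_col=0)
    percentages = df.div(df.sum(axis=0), axis=1) * 100

    topic_of_interest = 17  # example number -- use whichever topic you're writing about

    # Sort the documents by how strongly they fit this topic and list the
    # top five candidates for close reading.
    top_docs = percentages.loc[topic_of_interest].sort_values(ascending=False).head(5)
    print(top_docs)
    ```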

    It is great to see you drawing attention to what you found surprising about the topics in the corpus. It’s similarly interesting to see how you are identifying things you thought you would see that aren’t showing up as topics. For those absent things, you might want to do some full-text searches of the corpus and see how frequently those terms appear outside the models. It seems possible that there are batches of texts that get into issues around religion but just don’t hang together with a set of co-occurring words tightly enough to show up as a topic. A quick term-count sketch follows below.
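
    A simple way to run those checks is a plain term count across the raw text files. This sketch assumes the corpus sits in a folder of plain-text files at corpus/*.txt, and the search terms are only examples; swap in whatever you expected to see.

    ```python
    from pathlib import Path

    # Example terms you expected to see surface as a topic but didn't --
    # swap in your own. Note that plain substring counts will also match
    # longer words (e.g. "islam" matches "islamic").
    terms = ["religion", "religious", "islam", "church"]

    total_hits = {term: 0 for term in terms}
    docs_with_term = {term: 0 for term in terms}

    # Assumes the corpus is a folder of plain-text files at corpus/*.txt.
    for path in Path("corpus").glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        for term in terms:
            hits = text.count(term)
            total_hits[term] += hits
            if hits:
                docs_with_term[term] += 1

    for term in terms:
        print(f"{term}: {total_hits[term]} occurrences across {docs_with_term[term]} documents")
    ```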

    Overall, this is great work. You’ve taken a challenging new analytic tool, applied it to a novel corpus of primary sources, and started to puzzle through what you can make of the results. I think if you keep at this you could end up with a piece you could ultimately publish as a journal article.
