Blog Post #5

When it came to creating my own tidy data from the American Philosophical Society’s “Curator’s Record of Donations to the Cabinet, 1769-1818 Volume 1: 3 February 1769-20 1818,” I took great care to review Hadley Wickham’s article, “Tidy Data,” Journal of Statistical Software 59, no. 10 (August 2014), especially the three principles he uses to define the concept. Moving from understanding Dr. Wickham’s principles in theory to putting them to use was a bit of a learning curve once I dug into the data myself, only because I had never transcribed information from an archive on my own before. I had the opportunity to organize data into a tidy data set in a classroom setting with a small group, but it’s certainly a much different beast alone! I found myself entering columns and rows as I went through the record, then realizing that I needed to add another column whenever I came across new bits of information.

Although the task was daunting, the standardization set forth by Dr. Wickham helped greatly. I wanted to create a repository for data whose structure could be understood at a glance and from which information could be extracted with ease. Treating a primary source as data in this way gives the source a structure that allows for analysis.
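To make the idea concrete, here is a minimal sketch in Python of what a tidy table and two simple queries might look like. It follows Dr. Wickham’s three principles as I understand them: each variable is a column, each observation is a row, and each type of observational unit gets its own table. The column names (`date`, `donor`, `item`, `quantity`) and all of the rows are invented placeholders for illustration, not entries transcribed from the Curator’s Record.

```python
from collections import Counter

# Hypothetical placeholder rows, NOT transcribed from the Curator's Record.
# Each dictionary is one observation (one donation); each key is one variable.
donations = [
    {"date": "1770-01-05", "donor": "Donor A", "item": "fossil", "quantity": 2},
    {"date": "1770-03-12", "donor": "Donor B", "item": "mineral sample", "quantity": 1},
    {"date": "1771-06-30", "donor": "Donor A", "item": "shell", "quantity": 4},
]


def items_per_donor(rows):
    """Sum the quantity column for each donor."""
    totals = Counter()
    for row in rows:
        totals[row["donor"]] += row["quantity"]
    return dict(totals)


def donations_in_year(rows, year):
    """Return the rows whose ISO-formatted date falls in the given year."""
    return [row for row in rows if row["date"].startswith(str(year))]


print(items_per_donor(donations))
print(len(donations_in_year(donations, 1770)))
```

Because every row has the same columns, questions like “who gave the most?” or “what arrived in a given year?” turn into a few lines rather than a re-reading of the whole record. A spreadsheet filter would do the same job; the point is only that the tidy layout is what makes either approach easy.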

Handling a primary source and turning it into data is a very time-consuming but rewarding task. I saw firsthand how cumbersome transcribing handwritten information could be; I spent at least 5 minutes agonizing over the spelling of one person’s name (which I eventually resolved by looking them up in an online search engine). Another laborious exercise was trying to infer what the red and blue check marks on the Curator’s Record meant, and deciding whether I should include them, whether they were meaningful, and whether excluding them would create one of the silences described by Dr. Trouillot, and what the implications of that would be! In the end I left the “checkmark” question out of my table because I didn’t know how to quantify it and tie it into the meaning of my table. As for how primary sources can be turned into meaningful data, I can see how a tidy data set, whether it appears in an Excel document, a map, a chart, a pie graph, or as evidence in a journal, allows the audience to grasp something fundamental about the source beyond endless and un-contextualized information. Organizing information into data sets provides a means to extrapolate, so it follows that organizing information into tidy data sets provides a faster and more coherent means to extrapolate.
