Blog Post #3

The three principles of tidy data, as defined by Hadley Wickham in his August 2014 article in the Journal of Statistical Software, are that “each variable forms a column, each observation forms a row, and each type of observational unit forms a table.”  This outline of how to organize data creates a standard for presenting information in the most comprehensible way, both visually (in its physical layout) and meaningfully (so that conclusions or analysis can be extracted from the data).  As the author correctly points out, most data is unorganized when first gleaned from a source.  Learning how to disentangle, codify, re-group (without altering the information), and display data will be extremely difficult but rewarding.  The author lists several possible pitfalls and potential solutions to them, and when I start engaging with datasets I can see where a handy understanding of this section will come into play.  During the small group breakout last Tuesday it became apparent that creating tidy datasets would be a time-intensive exercise, but that once the dataset is created it pays dividends when trying to extract conclusions.  For example, the data could answer questions such as: what were the main demographics of the people who perished in the Capitol collapse?
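To make the payoff concrete, here is a minimal sketch of how a demographic question could be answered once the records are tidy. The names, values, and column names below are placeholders I made up for illustration, not real data from our group's dataset.

```python
from collections import Counter

# Hypothetical tidy records: one row per person, one column per variable.
# Every name and value here is a placeholder, not real historical data.
records = [
    {"name": "Person A", "sex": "M", "age": "34", "occupation": "clerk"},
    {"name": "Person B", "sex": "M", "age": "51", "occupation": "lawyer"},
    {"name": "Person C", "sex": "F", "age": "n/a", "occupation": "n/a"},
]


def tally(rows, variable):
    """Count how often each value of one variable appears across the rows."""
    return Counter(row[variable] for row in rows)


sex_counts = tally(records, "sex")
print(sex_counts)
```

Because each variable sits in its own column, the same `tally` helper would work for any other column, such as occupation, without reshaping the data first.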

In organizing this information our group took a straightforward approach, listing all the fillable sections as columns, starting with “sex” and so on, with each row being one person, identified by “Name of the Deceased, in full.”  If a field could not be filled because it was left blank, we entered “n/a.”  Capturing the full breadth of information will be helpful later, since the dataset can then be tailored: it is better to have a dataset that is too big and can be reduced than one that does not capture the full picture.
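The approach described above could be sketched roughly as follows. This is only an illustration under my own assumptions: apart from “sex” and the full name, the column names are hypothetical, and the sample forms are placeholders rather than real records.

```python
import csv
import io

# Hypothetical column names; only "sex" and the full name come from our forms.
COLUMNS = ["name_of_deceased_in_full", "sex", "age", "occupation"]


def tidy_row(form, columns=COLUMNS, missing="n/a"):
    """Return a row with every column present; blank or absent fields become `missing`."""
    row = {}
    for col in columns:
        value = (form.get(col) or "").strip()
        row[col] = value if value else missing
    return row


# Placeholder forms standing in for transcribed records.
forms = [
    {"name_of_deceased_in_full": "Person A", "sex": "M", "age": "34"},
    {"name_of_deceased_in_full": "Person B", "sex": "", "occupation": "clerk"},
]

rows = [tidy_row(form) for form in forms]

# Write the rows as CSV text: one header line, then one line per person.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping every column for every person, even when the value is “n/a,” means the table can be trimmed down later without going back to the original forms.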

As far as using this method goes, I’m already using it at my job, where I have to analyze huge sets of data to draw conclusions. However, I can see its usefulness for my degree path as well. In my research as a digital historian and scholar, this method will help me work with datasets that are more lucid than what I would get by merely slapping information into a pie graph or an Excel document. Understanding Hadley Wickham’s definition and principles of tidy data, along with the potential pitfalls, will help me be mindful when I make my own “tidy” datasets.
