If a friend shared a news story with you, like the popular Dollars for Docs series, would you be the type of person to skip downloading the accompanying data and skim the highlights of the news report instead? I was completely that type of person for the longest while, and I’m still mostly that kind of person today. Mostly. I’m starting to understand that there will be times when I’m interested in a topic but no comprehensive, neatly prepared highlights exist to brief me on it. For this reason, I’m forcing myself to slowly change.
These days, when I come across an interesting story accompanied by a data file, I make an attempt to check that file out, even if I’m just skimming it. This is meant to condition me for instances when I want certain information, but all I get is a file. So far, changing my media consumption in this one way hasn’t been so bad. If you’d like to change with me, I have a new exercise for you. “Who’s Afraid of a .CSV File?” is a Kaggle-hosted exercise where you’ll grab a data file and do a little more than skim it. Specifically, you’ll examine a .csv file containing the responses to the Fiscal Year (FY) 2021 Public Libraries Survey (PLS) administered by the Institute of Museum and Library Services (IMLS). You’ll get a general sense of public library characteristics across the United States, and you’ll also see some examples of ways to use the survey data.
The exercise might be a little easier if you’ve started the “Fact-checking Me on Hoopla” exercise, which focuses on data cleaning with R, but that doesn’t mean newcomers can’t follow along. In fact, the exercise is meant to demonstrate the usefulness of rudimentary statistics more than it’s meant to be an exercise in R programming. The code is not the focus. And you don’t need to know statistics either; you’ll pick up some concepts as you move through the exercise. I think you should try it.