In a previous post about the value of public libraries, I claimed that Hoopla partners with 2,918 libraries (public and otherwise) in the US, but I didn’t say where I got that number from. The number is a count derived from map data, which was provided by Hoopla’s support staff. I say that my count was “derived,” because the map web page provides its own count that I couldn’t use. The page currently shows 3,310 Hoopla library partners worldwide, visible both as points on the map and as names in a list.
The first problem with the face-value 3,310 count is that I only wanted a count of US libraries for the blog post. Second, without having to scroll very far down the list, I could see a lot of identical-looking names. Some locations, sitting states away from each other (and possibly unaware of each other), simply happen to share a name, as is the case with the “Anderson Public Library” locations. Other locations seem to be duplicate entries on the map, as is the case with the “Adams Public Library System” locations, “Allen County Public Library” locations, and many others. Ignoring the presence of duplicates didn’t feel like the right thing to do.
And the list of libraries on the page is only a list of names. The details needed to tell libraries apart, like physical addresses and individual library website links, are only visible on the map. Clicking and counting map points one-by-one was obviously out of the question, but I knew there was something else I could try.
So I resolved to get a decent-looking count just for you, reader. I think I got it done with the help of a programming language called R, and I’ll show you how. The process that I applied to Hoopla’s map data is called data cleaning, and I’ve turned the steps of my process into an exercise for you. “Fact-checking Me on Hoopla” is a multi-part exercise organized into Kaggle-hosted notebooks, which will allow you to run R code and watch the map data get cleaned. Using R, you can follow my steps and evaluate the rigor of my process to arrive at a 2,918 count. Decide whether you think I cleaned enough, overdid it, or underdid it. You don’t need to have any prior knowledge of R to do this, but you do need to create a free Kaggle account. While logged into Kaggle, use the Copy & Edit button to create your own workable copies of the original notebooks.
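To give a rough sense of what “cleaning” means here, below is a minimal sketch of the idea. It is written in Python rather than R, and it is not the code from the notebooks. The field names (`name`, `address`, `state`, `country`), the `normalize` and `count_unique_us_libraries` helpers, and every address and state label are made up for illustration; only the two library names come from the list on the map page. The sketch keeps US entries only, then treats two entries as the same library when their name, address, and state all match after ignoring capitalization and stray spaces.

```python
# Hypothetical records: the field names, addresses, and state labels are
# invented for illustration and are NOT taken from Hoopla's map data.
records = [
    {"name": "Anderson Public Library", "address": "1 Example St", "state": "State A", "country": "US"},
    {"name": "Anderson Public Library", "address": "2 Sample Ave", "state": "State B", "country": "US"},
    {"name": "Allen County Public Library", "address": "3 Placeholder Rd", "state": "State A", "country": "US"},
    {"name": "Allen County Public Library", "address": "3 placeholder rd ", "state": "State A", "country": "US"},
    {"name": "Example City Library", "address": "4 Demo Blvd", "state": "State C", "country": "CA"},
]


def normalize(text):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(text.lower().split())


def count_unique_us_libraries(rows):
    """Count US entries, treating rows with the same name, address, and state as one."""
    seen = set()
    for row in rows:
        if row["country"] != "US":
            continue  # the blog post only needed US libraries
        key = (normalize(row["name"]), normalize(row["address"]), normalize(row["state"]))
        seen.add(key)
    return len(seen)


print(count_unique_us_libraries(records))  # prints 3
```

In this toy data, the two “Anderson Public Library” rows survive as separate libraries because their addresses differ, while the two “Allen County Public Library” rows collapse into one. How strict to make that matching rule is exactly the kind of judgment call the exercise asks you to evaluate.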
I’ll try to make “Fact-checking Me” as straightforward as possible, but I don’t want to give the impression that data cleaning is practical for everyday web consumption. If you’re not a specialist in some field of work somewhere, doing something highly technical with data, it shouldn’t even have to cross your mind. For better or worse, though, web interfaces will only get more fanciful over time, which means that web content will become more difficult to parse, and I wonder about accessibility. Maps can be interesting to look at, but depending on what you’re trying to do online, looking might not meet the full extent of your needs. Someone with no investigative or analytical background might still need access to a count of something like those Hoopla libraries.
I personally somewhat resent having gone through an elaborate process just to get a count of some libraries from a web page. At the same time, I realize this is possibly the way things will always be. You and I might as well pick up new tricks and keep up with all of this progress going on around us. Right? 🙃
Try to have fun with the exercise. Click here to start.