This is the latest in a series of blog posts from the teacher cohort.
By Lee Cristofano
Over coffee one Saturday morning, I was struck by an interesting article of data journalism from the newspaper: What is the most common name for a licensed Allegheny County dog in 2016? (Before reading further, I invite you to guess…)
I was intrigued by this innocuous question. After all, did the newspaper task some reporter (or, more likely, intern) to survey pet stores looking for dog food customers to ask for their dog’s name? How does one get insight regarding this question?
Fortunately, we are living in times where data such as this is becoming readily available to the public, thanks to the so-called Open Data revolution. Government agencies, in their efforts to become more open and transparent, are making datasets freely available to the average citizen (and to data nerds like us!). It’s now possible for nearly anyone to answer the question above. (Did you guess “Bella”?)
And so, we asked our high school class that very same question: What do you think the #1 dog name in Allegheny County was last year? We had many good suggestions – Buddy was a class favorite – but as the teachers, we were less interested in their answer and more interested in how the class might answer the question.
Thus began our journey into the process: finding the dataset; downloading it so it could be analyzed, sorted and counted; creating a meaningful and engaging data visualization; and ultimately telling the story the data was trying to tell us. Along the way, our students learned how to use many software tools and data visualization programs to present their stories. Of course, the best stories always end with more questions.
So for our example, students used an Excel spreadsheet to build a pivot table that counted how many times each dog name appeared, sorted the list to find the top ten, and visualized the results in various charts, graphs, infographics and the like. Winner: “Bella”.
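For readers who would rather script it than click through a spreadsheet, here is a minimal sketch of the same count-and-sort step in Python. It is not what our class used (we worked in Excel), and the sample rows and the column names (`DogName`, `Breed`, `OwnerZip`) are made up for illustration; a real license file may label its columns differently.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample standing in for the downloaded license file.
# The column names (DogName, Breed, OwnerZip) are assumptions for illustration.
SAMPLE_CSV = """DogName,Breed,OwnerZip
Bella,LABRADOR RETRIEVER,15102
buddy,GER SHEPHERD,15102
BELLA,GER SHEPHERD,15234
Max,BEAGLE,15102
Bella ,BEAGLE,15241
Buddy,LABRADOR RETRIEVER,15234
"""


def top_names(csv_file, n=10, column="DogName"):
    """Count how often each name appears and return the n most common."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Normalize spacing and capitalization so "BELLA" and "Bella " match.
        name = (row.get(column) or "").strip().title()
        if name:
            counts[name] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    for name, count in top_names(io.StringIO(SAMPLE_CSV)):
        print(f"{name}: {count}")
```

To try it on a real download, open the file with `open("your_file.csv", newline="")` and pass that to `top_names` in place of the sample. The cleanup line matters more than it looks: without it, “BELLA” and “Bella” would be tallied as two different names.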
But the elephant in the room: Why that name, why Bella? For how many years was it in the top ten? In what ZIP Code would we find the largest number of Bellas? How many Bellas were German Shepherds, and how many were Labradors? (Obviously, more study was required to answer those questions, and down the rabbit hole we went…)
By all accounts it was a fun and entertaining project, and the students enjoyed working with the data and reaching their conclusions. But if our metric is this project’s impact on society or on a student’s life, we rate it low.
But what if the dataset instead dealt with accidental overdoses, or car crashes, or the incidence of crime in our neighborhoods? What about air quality, obesity rates or census data? Are there stories waiting to be told in those datasets? Can we find the patterns and trends lurking in this data, and will we know what to do with them?
What about data that students themselves generate from, say, an air or water quality monitor, or a home radon kit? Can high school students use their talents and skills to obtain, analyze and visualize this data? Can students really discover trends? Can teenagers really reach informed conclusions, make decisions and advocate for change based on their analysis of these datasets?
We believe so. Our goal is to equip our students with the tools and techniques needed to explore the world of data and to see how it can be an agent of change in their lives and their community.
*Lee Cristofano co-teaches an elective class in Data Analytics with Emily Smoller at Bethel Park High School.*