Note: This is a much older post of mine that contains an approach to coding I no longer use. I keep it up here as a reminder of the importance of growing coding skills. One of the key pieces of feedback I received from this post was to look into looping and interation. As a result, I learned about a package called {purrr}, which was a great solution for this recommendation.
This evening, the Ed Trust is hosting a discussion on twitter at 8PM ET on school funding inequity, so I thought I would do a code-through of a visualization I made this past summer while researching the legacy of a famous school finance Supreme Court case that ended up determining that Americans had no fundamental right to an education. In short, it shows the shares of funding over time for each of the three main sources of funding for public education in America: federal funding, state funding, and local funding (mostly property taxes).
Why Visualize School Finance?
In recent years, much has been debated about school funding, or the lack thereof. Notably, recent studies coming out of places like Texas A&M and the US Commission on Civil Rights have highlighted funding deficits throughout America that not only threaten the futures of currently enrolled public school students, but our future economies throughout America.
You may not have children, but school finance impacts you regardless. And as one news executive said recently in a visit to one of my public policy classes, “school finance is the thing people should be talking about but will likely never get talked about.”
So here’s to helping people understand with clean visualizations. Alright, let’s see the map and get to the code-through.
The Map
The Code Through
While we’ll visualize the three sources of funding, this dataset will organize a few other data points of use. While I’ll do my best to address what each one is, I may overlook a few variables. If I do, you can reach out to me or reach out to the folks at the Census’ Educational Finance Branch.
While you already have the data, the US Census has been collecting data from every school district in the country since the 1960s. Using their FTP database, you can download the files here and other various datasets related to public finance. For our purposes, we’ll use their elementary and secondary school finance survey files and start from 1993 for a couple of reasons explained by clicking on the tooltips here12.
To access this raw data, you can go to the US Census FTP database where the school finance datasets are stored and download them into a folder there. First, go to the FTP database. Second, choose “Allow” when prompted and select guest when asked for authentication.
Once you do this, you should see a dialogue showing your connection status that looks like this.
After the data downloads into the remote disk, find the files with the following labels highlighted in purple below (i.e., "elsec.93.xls") and download them into a folder called “data”. Once you do that, save a .Rproj file into the same location and you’ll be able to run the code below without much trouble (I hope).
One note about my code. By no means am I an expert in R and when I produced this code, I knew way less about the tidyverse than I do now. If any other R folks have feedback, I’m 100% open to ways I can make this code cleaner and more efficient.
At this part, we’re limiting the raw data to the revenue sources, along with a few other identifiers and more granular funding sources that will get accounted into the larger buckets of Federal, State, & Local revenues but are interesting to see in our table nonetheless.
Truth be told, many databases use strange identifiers to identify what those variables represent. In the census dataset, they used some names that are easy to interpret at first glance. For instance, TOTALREV equals “Total Revenue”. However, some variables aren’t easily interpretable. For my purposes, I wanted to create more readable names for some of the variables I pulled in earlier. This code does that.
Here, we’re combining the inflation data from the BLS Education Inflation Index alongside their annual counterparts from the census files so we can multiply the index value by each year’s recorded funding amount. Once we do that, we’ll have our inflation adjusted figures and be able to visualize them.
Code
#Add Inflation Indexinflation <- inflation %>%as_tibble() %>%mutate(Annual =NULL)t1993 <-left_join(t1993, inflation, by ="YEAR")t1994 <-left_join(t1994, inflation, by ="YEAR")t1995 <-left_join(t1995, inflation, by ="YEAR")t1996 <-left_join(t1996, inflation, by ="YEAR")t1997 <-left_join(t1997, inflation, by ="YEAR")t1998 <-left_join(t1998, inflation, by ="YEAR")t1999 <-left_join(t1999, inflation, by ="YEAR")t2000 <-left_join(t2000, inflation, by ="YEAR")t2001 <-left_join(t2001, inflation, by ="YEAR")t2002 <-left_join(t2002, inflation, by ="YEAR")t2003 <-left_join(t2003, inflation, by ="YEAR")t2004 <-left_join(t2004, inflation, by ="YEAR")t2005 <-left_join(t2005, inflation, by ="YEAR")t2006 <-left_join(t2006, inflation, by ="YEAR")t2007 <-left_join(t2007, inflation, by ="YEAR")t2008 <-left_join(t2008, inflation, by ="YEAR")t2009 <-left_join(t2009, inflation, by ="YEAR")t2010 <-left_join(t2010, inflation, by ="YEAR")t2011 <-left_join(t2011, inflation, by ="YEAR")t2012 <-left_join(t2012, inflation, by ="YEAR")t2013 <-left_join(t2013, inflation, by ="YEAR")t2014 <-left_join(t2014, inflation, by ="YEAR")t2015 <-left_join(t2015, inflation, by ="YEAR")
At this point, once each of our individual datasets that are adjusted for inflation are compiled, we can merge them into one single dataset that is one step closer to being visualized.
To take full advantage of the geofacet package, I used FIPS code identifiers to help identify each state. For more on what FIPS is, check this page out.
Code
#Import the FIPS state codesfipscode <-read_csv("data/FIPSStates.csv", col_types =cols(`census id`=col_character()))#Remove Unnecessary Columnsfipscode <- fipscode %>%as_tibble() %>%mutate("Official USPS Code"=NULL)#Rename the FIPS id column to 'STATE', which we'll use to replace our current column identifying each state. fipscode <-rename(fipscode, STATE ="census id")#Add the New State ID to our Uniform Datasetfullest <-left_join(full25, fipscode, by ="STATE")#Remove All Columns Except our Major Revenue Columnsfull25a <- dplyr::select(fullest, one_of(c("YEAR","Name", "TOTALREV", "TFEDREV", "TSTREV", "TLOCREV")))#Change our YEAR column to a Date Formatfull25a <-rename(full25a, State = Name) %>%mutate(YEAR=as.Date(YEAR, format ="%Y"))#Begin Summarizing our Data by Year & Statefull25b <- full25a %>%group_by(YEAR,State) %>%summarise_each(funs(sum))#Now that we have summarized data, calculate the share of revenue by diving each of the three revenue source amounts by the total amount of revenue in that state each year.full25b <- full25b %>%as_tibble() %>%mutate(PctFED = (TFEDREV/TOTALREV)*100,PctSTA = (TSTREV/TOTALREV)*100,PctLOC = (TLOCREV/TOTALREV)*100)#Remove the Total Revenue Columnfull25c <- dplyr::select(full25b, one_of(c("YEAR", "State", "PctFED", "PctSTA", "PctLOC")))#Rename Columns with Printable Identifiersfull25c <-rename(full25c, "Federal Revenue"= PctFED,"State Revenue"= PctSTA,"Local Revenue"= PctLOC)#Transform the Data Longways for Visualization Purposesfullest25e <-gather(full25c, "Source", "Percent", 3:5)yearly_avg <- fullest25e %>%group_by(YEAR,Source) %>%summarise(Annual_Avg =mean(Percent))
Plot The School Finance Map
At this point, we’ll use the geofacet and ggplot2 packages to visualize our data. For style purposes, I’ve used a theme from the ggthemes package to make the visualization a bit cleaner, along with a few custom theme stylings. Once you run this, you should be able to reproduce the visualization I shared at the top of this post.
Code
showtext::showtext_auto()gg <-ggplot(arrange(fullest25e,Source,YEAR), aes(x = YEAR, y =Percent, fill = Source, group = Source)) +geom_area(alpha =0.8, position ='stack') +scale_fill_manual(values =c("#ef8a62", "#d1e5f0", "#67a9cf")) +facet_geo(~ State, grid ="us_state_grid2") +scale_x_date(expand=c(0,0), date_breaks="7 year", date_labels ="%Y", limits =c(as.Date(c("1993-02-27", "2016-06-01")))) +scale_y_continuous(expand =c(0, 0)) +labs(title ="How Schools Were Paid For Over The Past 25 Years, 1993-2015",subtitle="The chart shows annual school funding shares in each state by federal, state, and local sources.",caption ="Data Source: US Census, Public Finance Survey \nNote: Figures are adjusted for inflation using the BLS Education Index.",x ="Year",y ="Percentage of Overall Funding") +theme_hc() +theme(text =element_text(color ="#4D4D4D"),plot.title=element_text(face="bold", size =40),plot.subtitle=element_text(size=24),plot.caption=element_text(size=14),legend.title=element_text(size =18, face="bold"),legend.text =element_text(size =18),legend.position =c(.85, 0.2),legend.box.just ="right",legend.key.size =unit(1, "cm"),axis.title.y =element_text(color ="#4D4D4D", face='bold'),axis.title.x =element_text(size =14, color ="#4D4D4D", face='bold'),strip.text.x =element_text(size =11.5, color ="#4D4D4D", face='bold'),strip.text.y =element_text(size =14, color ="#4D4D4D", face='bold'),strip.background =element_rect(fill ="#ffffff", color =NA)) +guides(fill=guide_legend(direction ="vertical", title.position ="top", label.position ="right", byrow=TRUE, title="Source of Funding"))# Save The Plot as a .png fileggsave("SFNationalMapOT.png", plot = gg, device ="png", dpi=300, width =20, height =12)
Bonus School Finance Map
While I had limited time to produce the code-through for the first map, here’s an extra map that shows how much each state’s per pupil funding compares to the national average since 1992. One note about this one is that it uses the All-Consumer series inflation index from the Bureau of Labor & Statistics for 1992, while the 1993 data onward uses the BLS Education Index.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Footnotes
Before 1992, the method of collection for the census didn’t disaggregate federal revenues from state revenues because federal dollars were technically implemented through the state. However, since 1992, the census shifted their reporting protocols for states and began requesting that each state delineate between their contributions versus what the federal government provided.↩︎