Proving Ground was my final, and arguably most important, project at the Center for Education Policy Research, and it inspired me to pursue a doctorate in Information Systems. The overarching goal of the project is to help school districts and charter management organizations evaluate the programs they are implementing in their schools. Given the wealth of data school districts hold, they should be able to know whether the programs they invest in are working for their students. Though seemingly simple, this goal faces major methodological and technical barriers. My team is innovating in the areas of research design, data visualization, and data cleaning to make this vision a reality.
In the first stage of the project, we evaluated multiple educational software products in three large urban school districts and ten charter management organizations. One purpose of the pilot was to test how well different research designs perform in different contexts. Because education programs, and educational software in particular, are not assigned to students at random, it is difficult to discern whether the effect of a program is due to the program itself or to other observable or unobservable factors. For example, students who use certain software products may also have more effective teachers; if so, it is harder to know whether increases in student test scores are due to the software or to the teachers. I was responsible for testing how different matching methods performed under various treatment assignment scenarios across agencies (see this paper by Gary King for similar research).
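To make the selection problem concrete, here is a hypothetical sketch of one matching method, 1:1 nearest-neighbor matching on a prior test score. It is written in Python rather than the statistical packages we actually used, and all data and effect sizes are simulated purely for illustration:

```python
import math
import random

random.seed(0)

# Toy data: students with higher prior scores are (non-randomly) more likely
# to use the software, so a naive comparison of outcomes is confounded.
# The simulated "true" effect of the software is 0.3.
students = []
for _ in range(500):
    prior = random.gauss(0, 1)
    uses_software = random.random() < 1 / (1 + math.exp(-prior))
    outcome = prior + 0.3 * uses_software + random.gauss(0, 0.5)
    students.append({"prior": prior, "treated": uses_software, "outcome": outcome})

treated = [s for s in students if s["treated"]]
control = [s for s in students if not s["treated"]]

# Naive difference in means is biased upward, because treated students
# started out ahead of control students.
naive = (sum(s["outcome"] for s in treated) / len(treated)
         - sum(s["outcome"] for s in control) / len(control))

# 1:1 nearest-neighbor matching on the prior score (with replacement):
# compare each treated student to the most similar control student.
diffs = [t["outcome"] - min(control, key=lambda c: abs(c["prior"] - t["prior"]))["outcome"]
         for t in treated]
matched = sum(diffs) / len(diffs)

print(f"naive estimate:   {naive:.2f}")   # inflated by selection
print(f"matched estimate: {matched:.2f}")  # close to the true 0.3
```

In the simulation the naive estimate roughly triples the true effect, while the matched estimate recovers it; real matching work is far subtler (overlap, multiple covariates, matching with versus without replacement), which is exactly what the pilot tested.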
I also led the effort to develop data dashboards for our partners to monitor the implementation of their programs. When researchers find education programs to be ineffective, questions often remain about whether a program would have been effective had it been implemented at a greater dosage or more uniformly across classrooms. For example, to benefit from educational software, students must use it for a reasonable amount of time each week and make satisfactory progress through the curriculum. Our dashboards help district leaders and principals view software usage at the school and classroom level so they can intervene when teachers are not using the software adequately. I developed prototypes for the data visualization in R Shiny (see also code) and will lead a development team to complete the work for the partners.
Finally, my team and I have put a great deal of effort into cleaning and standardizing the data from our school district partners. Though data cleaning is not the sexiest of topics, having clean data is vital to a successful quantitative research project. Education data is uniquely challenging in that it is usually housed in disparate databases, and it is up to savvy researchers to clean and connect it. Data on teachers from a human resources database, for example, may be structured completely differently from data on students from a student information system. In typical education research projects, researchers clean and standardize data from one district at a time; my team has completed thirteen agencies so far and may expand to dozens more. We have made this possible through our many years of combined experience with education data, an eye for detail, and a love of programming.
Proving Ground is a five-year project funded by a grant from the Bill and Melinda Gates Foundation. My team aims to make the project self-sustaining: school districts will pay a modest fee to access our web-based tools and keep the project going. To read more about Proving Ground or become a partner, please visit our website.
I was one of the lead developers on the OpenSDP project, which is an open source website and code repository created to foster collaboration among education analysts and researchers. You can learn more about OpenSDP by visiting our website or by watching the plenary conference session below. (Note: I begin speaking at 13:45.)
I am the lead instructor of the college-going strand at the SDP Institute for Leadership in Analytics (SILA). In this weeklong professional development workshop, education analysts from around the country strengthen their skills in programming, statistical methods, and problem solving. I teach the hands-on lab session, where my students learn to use the statistical software Stata to explore and analyze education datasets. SILA is a once-in-a-lifetime opportunity for analysts to network and grow their skills; many of my students have called it a transformative experience. To learn more about SILA or to sign up, visit our website.
In the 2016-17 school year, I worked as a consultant with Helix Learning Partners on a data visualization project that helps principals collaborate and learn from their data. These principals are part of the Turnaround Principal Project, a program that develops school leadership at high-needs schools in Connecticut. My role was to design and implement a data dashboard that allowed principals to view their school’s key metrics over time and compare them to those of other schools in the program. In collaboration with Sarah Birkeland, I also helped develop innovative data collection methods to obtain monthly data from principals on student and teacher absences, teacher feedback visits, and student disciplinary actions.
In 2015, the Center for Education Policy Research surveyed teachers in five states about how they had changed their teaching in response to the Common Core. We then linked these responses to administrative data from the states to see how teacher practices correlated with student achievement. I used value-added modeling to analyze the association between teacher practices and test score gains on the Common Core assessments. The project also involved a massive amount of data cleaning and management to obtain consistent data across the five states. I led the effort to standardize data cleaning across states and personally cleaned the administrative data for Massachusetts. Read the research report here.
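The core idea behind value-added modeling can be sketched in a few lines: a teacher's estimated effect is the average amount their students out- or under-perform a prediction based on prior scores. This is a deliberately stripped-down illustration in Python; the actual models included many more controls, and all teacher labels and effect sizes below are invented:

```python
import random
import statistics

random.seed(1)

# Simulated data: three hypothetical teachers with known "true" effects.
true_effects = {"T1": 0.4, "T2": 0.0, "T3": -0.4}
records = []  # (teacher, prior score, current score)
for teacher, effect in true_effects.items():
    for _ in range(200):
        prior = random.gauss(0, 1)
        current = 0.8 * prior + effect + random.gauss(0, 0.5)
        records.append((teacher, prior, current))

# Pooled OLS of current score on prior score (slope = cov / var).
priors = [p for _, p, _ in records]
currents = [c for _, _, c in records]
mp, mc = statistics.mean(priors), statistics.mean(currents)
slope = (sum((p - mp) * (c - mc) for p, c in zip(priors, currents))
         / sum((p - mp) ** 2 for p in priors))
intercept = mc - slope * mp

# Value-added = a teacher's mean residual (actual minus predicted score).
value_added = {
    teacher: statistics.mean(c - (intercept + slope * p)
                             for t, p, c in records if t == teacher)
    for teacher in true_effects
}
print(value_added)
```

With enough students per teacher, the mean residuals recover the simulated effects; the hard part in practice is choosing controls and handling the small samples and measurement error that real classrooms present.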
I worked on the research for the Strategic Data Project (SDP) from October 2013 to March 2016. Though the research portion of the project has ended, I remain involved as an instructor in the teaching and learning program (see SILA). The goal of the project was to provide school districts with actionable, high-impact analyses in the areas of college readiness, high school graduation, college enrollment, college persistence, teacher retention, teacher turnover, and teacher evaluation. Below I highlight some of my work on research questions in these areas.
Historically, the Strategic Data Project’s work on human capital has focused on teacher retention, teacher turnover, and teacher placement. Pittsburgh Public Schools was interested in teacher placement and whether all students have access to effective teachers. Specifically, the district wanted to know if there is a link between effective teaching and college enrollment outcomes. This human capital project also intersected heavily with our research strands in college-going and teacher evaluation.
Previous research has shown that effective teachers create positive outcomes for students later in life. Chetty et al. (2014) find that having a teacher who successfully raises student test score growth (value-added) in grades 3-8 positively impacts college attendance and earnings. My research with Rodney Hughes corroborates and extends this finding by examining the outcomes associated with highly effective teachers as measured by evaluation measures other than value-added. We found that high school students taught by more teachers who scored highly on their classroom observations were more likely to enroll in college, a relationship that persisted after accounting for a rich set of student and class characteristics. My coauthors and I received positive reviews for this work at a research conference and have submitted our paper for academic publication.
In addition to contributing to the literature on this topic, our paper is highly relevant to education policy practitioners. We were able to assess the impact of effective teaching using data that is readily available to many school districts. By showing the association between teacher evaluation measures and college-going outcomes, our methodology gives districts and states a way to gather external-validity evidence for their teacher evaluation measures.
The goal of SDP’s work on college-going is to give our partner school districts a complete descriptive picture of which students are achieving postsecondary success. One strand of research that has particularly resonated with our district partners examines the college choices of well-prepared students of different racial/ethnic backgrounds. Below, I show a basic visual representation of the results from two of the partners we worked with. Among high school graduates who scored in the top quartile on the state assessment in eighth grade, African American students enrolled in college at the same or higher rates as their peers in both the state of Kentucky and the Cleveland Metropolitan School District; Latino students who scored similarly attended college at lower rates.
It is important to remember, however, that African American and Latino students are disproportionately represented in the lower quartiles of achievement. Much more work is necessary to ensure that all students, but particularly minority students, are prepared for postsecondary success. Nonetheless, many of our state and district partners are surprised by the relatively high rates of college attendance among well-prepared African American students.
Teacher observations are typically a substantial component of a teacher evaluation system. Teachers are observed by a peer, principal, or outside observer (depending on the district and state) and assigned a score using a rubric. The evaluator may hold biases based on race, gender, or other characteristics that cause teachers in different demographic groups to receive systematically lower scores. In two states we worked with, we found that male teachers and teachers who were not White received lower observation scores. However, these teachers also tended to work with more difficult, lower-achieving student populations. In a descriptive analysis, I controlled for each teacher’s value-added measure (based on test scores), classroom characteristics, and school to see how much these factors accounted for the variation in observation scores. Controlling for these influences substantially shrank the gap between groups, though the differences remained statistically significant.
Ultimately, my team and I did not feel comfortable concluding that there were significant biases in the teacher evaluation system. After all, there could be differences between teachers that we could not observe in our data. However, these findings are certainly puzzling and will hopefully be explored in future research.
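The logic of the adjustment can be shown with a toy simulation. Suppose one group of teachers is concentrated in schools where observers score everyone lower; the raw group gap then overstates the gap measured within schools. Everything below is invented for illustration (the simulated "true" group difference is set to -0.1), and the within-school comparison is only a crude stand-in for the regression controls we actually used:

```python
import random
import statistics

random.seed(2)

# Three hypothetical schools, labeled by their baseline observation score.
# Group B teachers are concentrated in the lower-scored schools.
TRUE_GAP = -0.1
rows = []  # (school baseline, group, observation score)
for school, share_b in [(0.5, 0.2), (0.0, 0.5), (-0.5, 0.8)]:
    for _ in range(300):
        group = "B" if random.random() < share_b else "A"
        score = school + (TRUE_GAP if group == "B" else 0.0) + random.gauss(0, 0.3)
        rows.append((school, group, score))

def gap(subset):
    """Mean score of group B minus group A within the given rows."""
    a = [s for _, g, s in subset if g == "A"]
    b = [s for _, g, s in subset if g == "B"]
    return statistics.mean(b) - statistics.mean(a)

raw_gap = gap(rows)  # inflated by the sorting of group B into low-scored schools

# "Control for school" crudely: compare groups within each school, then
# average the within-school gaps (a rough fixed-effects adjustment).
schools = sorted({sc for sc, _, _ in rows})
within_gap = statistics.mean(gap([r for r in rows if r[0] == sc]) for sc in schools)

print(f"raw gap:           {raw_gap:.2f}")
print(f"within-school gap: {within_gap:.2f}")
```

The raw gap comes out several times larger than the within-school gap, mirroring how the observed score differences shrank, without disappearing, once school and classroom context were accounted for.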
Teacher evaluation systems are typically made up of multiple measures or components, including but not limited to one or more observations and a growth or value-added measure based on test scores. Over the course of our research into evaluation systems in different school districts and states, we have observed a pattern in the distribution of scores: observation measures tend to have a much narrower distribution than growth measures. If an evaluation system has a 1-5 scale for both growth and observation, for example, the distribution of growth scores typically resembles a bell curve, with the median teacher receiving a score of three. In contrast, observation scores tend to be heavily skewed, with most teachers receiving the highest two ratings and almost none receiving the lowest. Additionally, in most teacher evaluation systems, observation scores account for a greater share of the overall evaluation score; for example, the growth score may count for 25% of the total and the observation score for 75%.
Paradoxically, however, because there is more variation in the growth score, the growth score nonetheless accounts for a much larger share of the variation in the overall score. Take the simplest example: say all teachers received the exact same observation score but different scores on their growth measure. Even with the nominal weights from the previous example (25% growth, 75% observation), the growth score would account for all of the variation between teachers. Put another way, whether a teacher ranked in the 99th percentile (top) or the 1st percentile (bottom) of overall effectiveness would depend entirely on the growth score, even though it carries the smaller nominal weight. We refer to the share of variation between teachers explained by a component measure as that measure’s “effective weight.”
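The arithmetic can be sketched directly. If the components are independent, the composite variance is Var(C) = w_g²·Var(G) + w_o²·Var(O), and each component's effective weight is its share of that variance. The weights and variances below are illustrative, not taken from any actual evaluation system:

```python
# Hypothetical "effective weight" calculation, assuming independent
# components so that Var(C) = w_g^2 * Var(G) + w_o^2 * Var(O).

def effective_weights(var_growth, var_obs, w_growth=0.25, w_obs=0.75):
    """Return each component's share of the composite score's variance."""
    parts = (w_growth ** 2 * var_growth, w_obs ** 2 * var_obs)
    total = sum(parts)
    return tuple(p / total for p in parts)

# Extreme case from the text: every teacher gets the same observation score,
# so growth drives 100% of the variation despite its 25% nominal weight.
print(effective_weights(var_growth=1.0, var_obs=0.0))  # (1.0, 0.0)

# Compressed observation scores (small variance) versus a bell-shaped growth
# distribution: growth's effective weight exceeds its 25% nominal weight.
eff_growth, eff_obs = effective_weights(var_growth=1.0, var_obs=0.15)
print(f"growth: {eff_growth:.0%}, observation: {eff_obs:.0%}")
```

Real composites complicate this with correlated components and discretized scores, but the basic point survives: nominal weights and effective weights can diverge sharply.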
My colleague Kate Klenk and I expanded the literature in this area under the supervision of Andrew Ho in his psychometrics course at the Harvard Graduate School of Education. We were interested in the impact of discretizing the measures on their effective weights. Using reported distributions of measures from four states, we find that discretizing the component measures has a much greater impact on effective weights than discretizing the composite score.