This year two major education-related levies expire and will likely be put up for renewal: the Families and Education Levy, and the Seattle Preschool Levy. The preschool levy was passed in 2014 to fund a four-year “demonstration program” offering preschool to 3- and 4-year-old children in Seattle, with the hope of proving its viability and discovering what it would take to scale it up.
This morning, the City Council started its process of looking at the two levies in anticipation of crafting renewals for the fall ballots. It may choose to combine them into a single levy, or opt to keep them separate if there is a risk that one would drag the other down — or if the complexities of trying to split the funds between City Hall-run programs and the politically separate Seattle Public School District become overwhelming. City Hall runs the Seattle Preschool Program through its Department of Education and Early Learning, independent of the school district (though in some cases utilizing classrooms at public schools).
Earlier this week, KUOW published an article raising questions about the quality of the Seattle Preschool Program, based on the second annual outside evaluation of the program. The article reports that the program is seeing “mixed results.” Let’s dive into the study report and see what it says.
The multi-year study is performed jointly by Rutgers University and the University of Washington. In 2016 it released a “year one” report on the 2015-2016 academic year, and this past November it released its “year two” report with updated numbers. We’re going to focus on the second report, as it includes a fair amount of comparison with the first-year numbers.
The study attempts to assess three things:
- the quality of the preschool classrooms, on a number of factors detailed below;
- how well children in the program are learning;
- how children in the program fare compared to similar children not participating in the program.
Here’s how the ECERS-3 test is described:
The ECERS-3 is an observation and rating instrument for preschool classrooms serving children aged three to five. The total ECERS-3 score represents an average of the scores on the 35 items under 6 domains. A rating scale between 1 and 7 is used, where a rating of 1 indicates inadequate quality, a rating of 3 indicates minimal quality, a rating of 5 indicates good quality, and a rating of 7 indicates excellent quality.
Here are its component parts:
CLASS, on the other hand, is focused on assessing specific classroom practices. Here’s the description:
The CLASS is an observational system that assesses classroom practices in preschool and kindergarten by measuring the interactions between students and adults. Observations consist of four-to-five 20-minute cycles, with 10-minute coding periods between each cycle.
And here are CLASS’s component parts:
To assess students’ learning, they implemented separate assessments for vocabulary, literacy, math, and executive function.
For the third part, comparing SPP students to a “control” group not participating in the program turned out to be challenging. The study organizers acknowledged those challenges, and conceded that the group they constructed was a poor comparison. It would have been useful to compare SPP students both to students in other preschools and to children with similar demographics who were not attending preschool; instead, the group they ended up with was a mix of both, with different demographics. I spent a fair amount of time poring over this section of the report looking for useful insights, but in the end I concluded that the poor construction of the comparison group invalidated any comparisons, and chose to simply ignore this part of the report. So let’s focus on the first two parts: the classroom assessment and the assessment of learning.
Between Year 1 and Year 2, the program scaled up considerably, from 228 students in 15 classrooms to 627 students in 33 classrooms. We can imagine the challenges this could present: opening up 18 new classrooms and hiring new staff. The 15 classrooms continuing on, and the returning students, may in fact be doing better than the newer ones, but they all get mixed in together in the study results. However, we could also expect that the leadership of the SPP applied key learnings from its first year to smooth out some of the wrinkles in opening up new classrooms, and has probably improved its administrative processes for all classrooms and students. This will all color the assessment results.
So let’s look at the classroom assessment results. Here are the ECERS numbers for both years, represented as a rating on a scale of 1 to 7 (7 is best):
(SD is “standard deviation.” As a reminder, for a bell-curve distribution this means that about 68% of the classrooms were rated within one standard deviation of the mean rating, and about 95% were within two standard deviations.)
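The 68%/95% rule of thumb falls straight out of the normal distribution, and Python’s standard library can verify it. This sketch is purely illustrative; it uses a standard normal curve, not the study’s actual classroom data:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(round(share_within(1), 3))  # 0.683 -> about 68% within one SD
print(round(share_within(2), 3))  # 0.954 -> about 95% within two SDs
```

So a classroom mean of, say, 4.5 with an SD of 0.5 tells us roughly two-thirds of classrooms scored between 4.0 and 5.0.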
Overall we can see a year-over-year improvement. Interaction and program structure are relatively strong; learning activities and personal care routines are relatively weak. Overall, nothing stands out as outstanding or deeply deficient. When you look at the distribution of individual classroom scores, you see a good sign: as a group they moved up incrementally.
Here are the CLASS numbers, also on a scale of 1-7:
Here we see clearer strengths and weaknesses: emotional support is a strong practice, and instructional support is weak — though both improved incrementally year-over-year. Classroom organization dropped a bit, though it’s probably not statistically significant given the jump from 15 to 33 classrooms and the small number of classrooms altogether.
But it’s hard to know whether these numbers are good or bad until we compare them to other preschools. Fortunately, the research team did that, presenting comparable ECERS assessments for Georgia, a UW statewide study, Pennsylvania, and New Jersey. The New Jersey preschool program is well-established and mature; the others are not, which in some ways makes them better apples-to-apples comparisons for Seattle’s nascent program. The NJ program, meanwhile, sets an aspirational goal and speaks to what the theoretical ceiling might be for the quality of a preschool program operating at scale.
The CLASS assessment is compared to studies in a number of different cities and states:
The big insight here is that the Seattle Preschool Program’s ECERS-3 and CLASS assessments are on par with peer programs — including the low ratings for personal care routines, learning activities and instructional support. That speaks more to the challenges of implementing those aspects of preschool programs than to particular successes or failures in Seattle; all preschools struggle with the same issues. On the flip side, Seattle’s program strengths don’t particularly distinguish it from its peers either, although it seems to be off to a stronger start for interaction and program structure. The Seattle Preschool Program is squarely in the middle of the pack.
Does class size matter? Not really.
How about student demographics? Nope.
And which agency is running the classroom? (SPP contracts out classrooms to external providers.) Here we actually see some differences, though all of the providers are strong in delivering emotional support to the children. The bigger differences are in instructional support and, to a lesser extent, classroom organization.
Now let’s look at the four student learning outcomes assessed: vocabulary, literacy, math, and executive function.
For vocabulary, literacy, and math, students are assessed and then their raw score is normalized to compare to children of the same age (as measured in months). A score of 100 is the mean, and the scores are distributed with a standard deviation of 15 (i.e. about 68% of students have scores between 85 and 115). The research team assessed each student twice, at the beginning and end of the academic year. However, since the children aged over the course of the year, the goalposts shift: a score of 100 at the end of the year represents a larger vocabulary than a 100 at the beginning. What that means is that if the SPP scores for a classroom or a demographic group go up over the course of the academic year, then learning happened faster than is typical; or if the number went down, learning happened more slowly. We might also see students catch up to their peers, or lose ground.
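The mechanics of an age-normed standard score are easy to sketch. All the raw scores and age-group norms below are invented for illustration; only the 100/15 scale comes from the report:

```python
def standard_score(raw, norm_mean, norm_sd):
    """Convert a raw score to an age-normed standard score
    (population mean 100, standard deviation 15)."""
    return 100 + 15 * (raw - norm_mean) / norm_sd

# Hypothetical child: same test in fall and spring, but the norms
# move because the comparison group is now a year-of-months older.
fall = standard_score(raw=40, norm_mean=40, norm_sd=8)    # exactly average
spring = standard_score(raw=52, norm_mean=48, norm_sd=8)  # ahead of the older norm

print(fall, spring)  # 100.0 107.5
```

The child’s raw score rose by 12 points, but the norm only rose by 8, so the standard score climbed from 100 to 107.5: learning happened faster than is typical for that age.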
With that context, let’s look at the numbers for the 2016-2017 academic year. The vocabulary stats, broken out by demographic groups and by how well classrooms did on the ECERS and CLASS assessments:
And the literacy stats:
There’s good news here: numbers went up almost universally, and groups that started out behind started to catch up. Let’s look closer at the size of the changes over the course of the year:
There are some really interesting things in here. Asian and bilingual students, 3 year olds, and students living below the federal poverty line saw the biggest gains. And surprisingly, the worst-rated classrooms saw the biggest gains.
The literacy results are similar: 3 year olds, black and bilingual students, and those below the poverty line saw the biggest gains. Classes with lower emotional support saw bigger gains. Compared to the previous year, white students saw much smaller gains. It’s important to note that no demographic did poorly; none fell back. Rather, the statistics suggest a particular emphasis on helping some historically underserved groups. And the classroom assessments don’t seem to correlate with how well preschool students learn vocabulary and literacy.
Here are the math assessments for the 2016-2017 academic year:
A lot of very strong results, especially for 3 year olds and students of color. But it’s a truly dramatic turnaround from year 1, when students fell behind across the board:
White students lost momentum, but they started out well above the mean (113.1, almost a full standard deviation) and still ended up well above (110.3). Again, intuition fails us: the classrooms with better ECERS and CLASS assessments didn’t deliver better results. Smaller classes did, as one might expect.
That brings us to executive function. The research team used two different tests: DCCS, an “attention shifting test;” and “Peg tapping,” which is a test of being able to manage two different rules simultaneously. DCCS isn’t normalized with age; it’s just a raw score. The mean score for the average age of study participants is 1.42 at the beginning of the school year, and 1.62 at the end — an average gain of .20 over the year.
Similarly for peg tapping, the mean raw score is 6.02 for the average age SPP student at the beginning of the year, and 8.80 at the end, for an average gain of 2.78.
In both cases, we see steady improvement across the board. The DCCS gains hover right around the mean (.20). The peg tapping gains are generally a bit below the mean (2.78).
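Since these raw scores aren’t age-normalized, the way to judge them is to subtract the age-expected gain from the observed gain. The expected gains below come from the report’s norms; the observed classroom gains are hypothetical, just to show the arithmetic:

```python
# Age-expected gains over the year, from the report's norms:
dccs_expected = 1.62 - 1.42   # 0.20 expected DCCS gain
peg_expected = 8.80 - 6.02    # 2.78 expected peg-tapping gain

def vs_norm(observed, expected):
    """Positive means faster-than-typical learning; negative means slower."""
    return observed - expected

# Hypothetical observed classroom gains:
print(round(vs_norm(0.22, dccs_expected), 2))  # 0.02 -> slightly above the norm
print(round(vs_norm(2.50, peg_expected), 2))   # -0.28 -> a bit below the norm
```

That matches the pattern in the report: DCCS gains hovering right around the expected 0.20, and peg-tapping gains running somewhat below the expected 2.78.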
The two executive function tests have a couple of strange anomalies: ECERS rating signaled bigger gains on DCCS, but smaller ones on peg tapping. And Hispanic children and those below the poverty line saw smaller gains on peg tapping, but not on DCCS. But there are no red flags here; perhaps a “yellow flag” for the peg-tapping test (which is hardly a comprehensive assessment of executive function on its own).
One last note across all of the student assessments: of the two preschool curricula adopted by SPP classrooms, Creative Curriculum outperformed HighScope on literacy and math assessments and matched it on the others. If that trend continues, it suggests a clear preference between the two.
So what are the big takeaways from this study? First, the program is hardly failing, or even delivering “mixed results” as KUOW suggests. The assessments suggest that it’s on par with peer programs, neither significantly better nor significantly worse. Students who start out behind their peers are catching up; those who are in the middle of the pack are keeping pace. The leadership of SPP has clearly decided to focus on emotional support for the students, and on extra attention to historically underserved populations. After a dicey first year, students now seem to be learning at a typical pace across vocabulary, literacy, math, and executive function. There are clearly identified areas of improvement for classrooms; that said, there is little evidence so far that the metrics measured by the standard classroom assessments actually correlate with better student performance, so investing in those improvements may or may not yield results.
In short: the program is working as advertised, it’s helping to address some inequities in our society, and it looks to be worth renewing the levy funding so the city can continue to scale it up.
Keeping this blog going takes an enormous amount of time and effort — not to mention out-of-pocket costs. If you find these posts valuable, please consider supporting my work by making a contribution on my Patreon site. Thanks!