Teacher evaluation pilot shows promise

When using a new, more detailed teacher evaluation process, principals mostly got it right as their ratings lined up with value-added student test scores, according to a University of Chicago Consortium on School Research study released Tuesday.

While researchers say this is evidence that the new evaluation works, the Chicago Teachers Union’s Carol Caref says the results prove that test scores don’t need to be factored into teacher evaluations in order to make them accurate.

“If there is a strong correlation, then why not rely on teacher practice and principal observation?” she says.

The use of test scores, or growth in test scores, as part of a teacher’s job evaluation is a hot issue in the education world. But in Illinois and other states, it’s an issue that has been decided. The Performance Evaluation Reform Act, passed in 2010, requires that by 2013, test scores count for a significant portion of a teacher’s evaluation. The bill also requires a new evaluation process.

Researchers did not give teachers an overall mark. But using a formula from another urban district, about a third of the teachers in the pilot would be rated unsatisfactory, 42 percent would be satisfactory and a quarter would be excellent.

These percentages would be a big change from the current CPS teacher evaluation, in which fewer than 1 percent of teachers have been rated unsatisfactory and more than 90 percent rated superior or excellent.

The Consortium researchers studied the use of the Charlotte Danielson framework, a well-regarded evaluation tool, in about 100 schools over a two-year period. Last year, CPS also tested the Teaching and Learning Framework, an evaluation tool that is modeled on a tool used in Washington, D.C. public schools and was developed in the district by principals, teachers and CPS officials.

CEO Jean-Claude Brizard has said that he likes the Danielson framework and insiders say that may be the way his administration is leaning.

What ratings show

The Danielson framework is more detailed than the state’s current checklist format, and includes paragraphs describing what teachers rated in each category should or should not be doing.

For example, in the area of classroom management, a distinguished teacher (the highest rating) would involve students in determining the behavior standards for the classroom and in monitoring them. A proficient teacher (the second-highest rating) would make behavior standards clear and monitor them. An unsatisfactory teacher (the lowest rating) would establish no standards and respond to bad behavior by shaming the student.

“We are seeing a paradigm shift from a checklist to describing practice in a specific way,” says Sara Ray Stoelinga, who was one of the study’s authors, speaking at an Education Writers Association conference on Saturday.

Caref, who was surprised when the district began piloting its own framework, notes that the Danielson framework has been around for more than a decade and is well-vetted.

Consortium researchers did not give teachers in the pilot an overall rating, but instead focused on the various aspects of instruction and classroom management in which teachers were rated and analyzed whether their value-added test scores matched up.

Across almost all of the Danielson Framework components, teachers with the lowest ratings had the lowest value-added test scores and those scores increased as the teacher’s rating increased, according to the report.

But this pattern didn’t hold true in all instances. For reading and math scores, the pattern did not hold in the area of “creating an environment of respect and rapport.” For math scores, the pattern did not hold true for “managing classroom procedures” and “organizing physical space.” (Researchers note that few principals gave teachers low marks in these areas.)

Also, researchers found that the ratings and value-added test scores were more likely to align in the area of instruction than with classroom management.

The proof that the framework can identify good teaching is helpful for another reason: Students aren’t tested in subjects such as social studies and art, and these teachers don’t have classroom test scores on which they are to be measured.

Because the framework has been shown to be valid, the district can count on it to pinpoint who the good teachers are in these subjects rather than create new tests, said University of Chicago Urban Education Institute Director Tim Knowles.

Another question researchers sought to answer is whether principals have the ability to properly evaluate teachers. The key finding is that principals reliably identified weak and mid-level teachers, but were more likely to rate teachers as distinguished when outside observers gave those teachers only proficient marks.

Almost 30 percent of principals were either more severe or more lenient than outside observers. Yet the researchers suggest that the principals were more likely to be right than the outside observers: teachers with distinguished ratings had higher value-added scores than those with proficient ratings. “Maybe the principals know something the outside observers don’t,” Stoelinga said.

Still, when CPS adopts a new framework, some checks will have to be built in so teachers are not adversely affected by principals who are more severe than others.

“If principals are too severe, this could affect whether a teacher gets tenure or not,” she said.

The Consortium report includes an example of a principal who did not embrace the framework and it suggests that if principals aren’t trained properly, the framework would become too subjective.

While the new evaluation will play a big role in whether teachers get tenure and keep their jobs, researchers and experts note that its best use will be to help teachers improve.

“It gives teachers and principals a shared language about good practice,” Stoelinga said.

But whether principals will be able to help teachers improve with the evaluation depends on the training and support they are given. The researchers found that the conversations principals and teachers had using the new framework were more reflective and based on concrete evidence. However, they note that the principals did most of the talking and that, for the most part, they didn’t ask teachers hard questions.

Sarah Karp