Mistakes made with teacher evaluation scores, CPS admits

Citing a computer coding error, district officials have acknowledged that they miscalculated last year’s REACH performance task scores for one out of every five educators.

Only a tiny fraction of the 4,574 errors were significant enough to result in ratings changes, however. A total of 166 teachers were given corrected ratings earlier this year, and most moved up a category, CPS officials say. Teachers whose ratings dropped won’t be penalized.

The coding error involved matching student rosters with scores on performance tasks, the subject- and grade-specific assessments that were developed by committees of CPS teachers.

Though the problem was not extensive, the number of mistakes – and the possibility that there are still others – has renewed criticisms about the use of such a complex system to evaluate educators and put jobs on the line.

On the front end, the REACH (Recognizing Educators Advancing Chicago’s Students) evaluation system relies on teachers and principals to do classroom observations, administer tests and input data into a computer system. On the back end, REACH relies on a web of CPS departments, custom-designed software and third-party vendors to calculate student improvement on assessments, verify class rosters and tabulate final ratings.

“There needs to be an effort to reduce the complexity so teachers are getting fair, accurate, timely feedback,” says Jen Johnson, the CTU’s special coordinator for teacher evaluations. “This is serious stuff that leads to problems with potential job security.”

CPS Accountability Chief John Barker, whose office wrote the computer code that contained the error, says it’s now been corrected. He repeatedly stressed that the impact of the incorrect scores was minimal.

“Yes, that is something we are absolutely interested in as a district in getting it right. And getting it right means changing some scores,” Barker says. “What is important to know is that those are very, very minor issues. The number of changes in those scores that resulted in a teacher moving out of one category into another category is very, very small.”

It’s unclear, however, whether all of the errors have been corrected. Two teachers who spoke with Catalyst say their newly downgraded ratings no longer take all of their students’ growth on the performance task into account.

In one case, an elementary school teacher said her new score only counted half of the students who took the performance task at the beginning and end of the year. The other students — who she says were counted the first time around — are now missing from her data.

Similarly, an arts teacher at another school doesn’t understand why her updated summative evaluation report now says that none of her students’ performance task scores were counted toward her rating calculation. The “student growth” portion of her evaluation is now based entirely on her school’s NWEA metrics – and not the performance task growth.

“Now we’re about to issue the second round of REACH Performance Tasks for this school year,” she said. “And I ask myself, ‘Why am I even doing this? Who is to say it’s not going to be lost next year?’”

Similar problems elsewhere

Chicago isn’t the first school district that’s discovered coding typos or miscalculations after releasing new teacher evaluations based in part on student growth on test scores. Last year, for example, bad data submitted by school districts in New Mexico led the state to issue incorrect ratings for hundreds of teachers.

A year earlier in Washington, D.C., district officials changed the ratings for 44 teachers after it was learned that the third-party vendor hired to calculate value-added scores for teachers (how much “value” teachers added to their students’ improvement on tests) made a typo in a long string of computer code. One of those teachers was mistakenly fired as a result, though district officials later reinstated that teacher.

In 2010, Illinois lawmakers passed a state law that requires student growth to count toward evaluations for principals and teachers. Other states passed similar laws, under pressure from the Obama Administration’s Race to the Top education initiative.

Jessica Handy, the Illinois governmental affairs director for Stand for Children, a supporter of the law, says “it’s totally reasonable that there are going to be some issues when implementing any new big system.”

Handy, who was a policy analyst for the state’s Senate committee on education when the new law was being drafted, says the rating system is complex because it includes multiple measures of growth. “Part of it is a way of doing checks and balances,” she says. “That way, we’re not over-relying on just one test.”

Most CPS teachers will be evaluated under REACH this year. In its first two years, only untenured and some tenured teachers were evaluated.

The CTU first suspected there might be problems with the 2013-2014 ratings last fall, when several teachers complained that their performance task scores did not seem accurate. Johnson says the CTU notified the district of the problems in the fall during routine meetings on REACH.

In February, she says, CPS officials acknowledged the coding mistake; in March, teachers were notified of evaluation changes.

“When we checked that code after the conversations with CTU, we realized there was a very minor correction that needed to be made,” says Barker, who described the coding error as “incredibly technical.”

Essentially, the CPS computer code that was supposed to calculate how performance task scores affect teacher ratings skipped over some students who took the assessment. Teachers are supposed to verify who is in their class through a third-party roster verification system run by Battelle for Kids, a national non-profit group that is one of many vendors that contribute to the REACH system.

On certain occasions, “what the code didn’t do is pick up exactly what had been entered by that roster verification process,” Barker said.
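To illustrate the kind of mismatch Barker describes, here is a minimal Python sketch. It is not CPS's code: the function names, roster lists, student IDs and scores are all hypothetical, invented only to show how a calculation that reads an out-of-date roster can silently drop students and shift a teacher's average growth.

```python
# Hypothetical illustration only; not CPS's actual code or data.

def growth_by_student(roster, scores):
    """Return end-minus-start growth for each rostered student who has scores."""
    growth = {}
    for student_id in roster:
        if student_id in scores:
            start, end = scores[student_id]
            growth[student_id] = end - start
    return growth


def mean_growth(roster, scores):
    """Average growth across the students the roster lets through."""
    growth = growth_by_student(roster, scores)
    if not growth:
        return None  # no students counted at all
    return sum(growth.values()) / len(growth)


# Invented data: (beginning-of-year score, end-of-year score) per student.
scores = {"s1": (2, 3), "s2": (1, 3), "s3": (2, 2), "s4": (3, 4)}

# The roster the teacher verified, versus a stale copy missing two students.
verified_roster = ["s1", "s2", "s3", "s4"]
stale_roster = ["s1", "s2"]

print(mean_growth(verified_roster, scores))  # all four students counted
print(mean_growth(stale_roster, scores))     # only two students counted
```

In this made-up example, the stale roster counts only half the students, and the average changes even though no individual score was altered.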

The same roster verification system is also linked to student scores on standardized tests to calculate the “value-added” portion of teacher evaluations. No similar problems have been reported with that calculation. CPS contracts with the University of Wisconsin to calculate the value-added metric.

Asking for clarity

Catalyst spoke with two teachers whose ratings changed after the glitch was discovered. They expressed frustration about the matter, but asked not to be identified because of the sensitivity of the subject.

The elementary school arts teacher, whose rating dropped from “proficient” to “developing,” said she asked her principal for help, emailed the district and called the IT Services Help Desk phone number listed in the initial CPS email notifying her of the change. It was only after reaching out to her union rep – and eventually connecting with Johnson from the CTU – that she finally was able to speak with someone at the district about her case.

But she still doesn’t know if the rating is correct.

“Never was CPS apologetic, flexible or clear about what happened,” she says. “I don’t think it’s fair to change my score without explaining anything.”

Another elementary school teacher says she’s worried about the school year ending without a resolution to her case. “It’s coming to the end of the school year. I don’t want to spend my summer chasing this down,” she said.

The teacher wishes that someone from the district would sit down with her, go through the hard copies she kept from last year’s performance tasks and explain the math behind her score.

“Maybe I’m wrong, but I would love for the district to sit down with me politely and explain where the mistakes were made,” she says. “This is my career we’re talking about, my profession, what I love to do.”

Melissa Sanchez