In Chicago, the status of every public elementary school is determined during a 40-minute period in early May. That’s when children in 3rd through 8th grade take a multiple-choice test of reading comprehension. Depending on their grade level, they read up to seven passages ranging from a few sentences to a full page and answer 36 to 49 questions.
When time is up, answer sheets are collected and hand-delivered to the central office. There, staff work around the clock: stacking sheets, feeding them through scanners, recording the results on tapes that are run through a mainframe computer, downloading the data to a high-powered workstation in the research and evaluation department, and tabulating the results by student, by classroom and by school.
For schools, the crucial calculation is the percent of children scoring at or above the national norm, or average. On the basis of that calculation alone, 84 elementary schools (and 39 high schools) have been put on academic probation, rendering their principals and local school councils subject to dismissal.
In March, the independent Consortium on Chicago School Research issued a report that, in effect, put the accountability program on probation. Scores on the Iowa Tests of Basic Skills (ITBS), the nationally standardized test Chicago uses, are “crude and sometimes seriously biased indicators” for judging the productivity of individual schools, the Consortium argued.
The Consortium agreed that the test legitimately identified the city’s worst schools, those in which less than 15 percent of the students scored at or above norms. “We looked at the probation schools, and by almost any imaginable cut, they looked pretty bad,” says John Easton, who resigned as the board’s research director in 1997 to become deputy director of the Consortium. “Even the growth, even the gains, even the productivity is pretty low.”
The Consortium also agreed that probation has forced schools to pay more attention to how and what they teach. “In the short term, that’s been productive,” says Consortium Director Anthony Bryk.
Beyond that, though, the ITBS and the Reform Board’s current use of it are seriously inadequate for measuring schools’ progress and, worse, can propel schools to adopt practices that shortchange their students, the Consortium says.
“Now that you’ve got [schools’] attention,” says Bryk, “it becomes important that you get good signals and indicators in place. You want tests that test what you want children to know and be able to do.”
Schools chief Paul Vallas has zigzagged on the ITBS and its high school counterpart, the Tests of Achievement and Proficiency (TAP). In January 1996 he told CATALYST the board would develop new academic standards that would be accompanied by a “new assessment system, to replace the Iowa Tests.”
The next month, Chief Education Officer Lynn St. James, now retired, made similar remarks at a news conference. The Chicago Tribune, for one, pounced. In a front-page story the next day, it accused the board of “conceding that Chicago public students are so far behind, the national tests can no longer be used as a measuring stick” and suggested that local tests “could be manipulated to artificially inflate achievement.”
Vallas insists that he never made a decision to drop the ITBS, that it was merely a possibility. The board will keep the ITBS, he now says, “because it makes sense,” not “because it’s politically correct.”
The test gives the system not only a historical reference point for tracking improvement, but also the ability to compare Chicago students against their peers nationally, he reasons.
That’s the popular perception; however, neither the ITBS nor any test like it can easily accomplish both goals. While the ITBS is a nationally standardized test, it is not given to every student in the nation. Rather, it is first given to a group of students the test publisher deems to be representative of the nation—a group that includes students from different states, cities, and small towns, and from a cross-section of racial and economic backgrounds. It is this group that forms the basis for the national average, or norm, or grade level. The norm group that Chicago uses as a basis for comparison dates back to 1988.
Chicago’s ITBS reading and math scores have both risen since then, meaning they have moved closer to the norm established in 1988. However, another test, the National Assessment of Educational Progress (NAEP), indicates that Chicago is not alone in its progress in math; the rest of the country is doing better, too. Thus, if Chicago switched to a more recent ITBS norm, one that reflected those national gains, its math scores would likely drop, because its standing relative to the rest of the nation had not changed. A drop is less likely in reading because, according to NAEP, reading achievement has changed little since 1988.
A 1987 study by a West Virginia physician and education activist cast a bright spotlight on this form of grade inflation. The study by Dr. John Jacob Cannell found that most school districts and all 50 states were reporting above-average scores on norm-referenced achievement tests. Later research discovered that outdated norms had inflated test scores. This phenomenon came to be known as “the Lake Wobegon effect,” for Garrison Keillor’s fictional town in Minnesota “where all the children are above average.”
The School Reform Board has begun developing tests for primary-grade and high school students that are based on local standards. In April, it approved a $500,000 contract with the National Center for Research on Evaluation, Standards, and Student Testing, based at the University of California at Los Angeles, to help it along; the money came from the John D. and Catherine T. MacArthur Foundation. Tests for grades 3 to 8 would be in four subject areas: math, language arts, science and social studies.
Vallas says the new system will be in place by 2001, at which time “we will move away from such a heavy reliance on the Iowas.”
The Consortium’s Bryk says this direction “sounds right. That is basically what we’ve recommended.” In the meantime, he adds, the board still needs to rethink the way it looks at test scores.
For one, judging schools by the percent of students scoring at or above national norms can be grossly misleading, the Consortium points out. For example, zeroing in on a test’s midpoint would fail to credit a school for substantial growth among students who, nevertheless, still fall short of the midpoint. Similarly, it would fail to fault a school for substantial slippage among students who, nevertheless, stayed above the midpoint.
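To make the Consortium’s point concrete, here is a small Python sketch built entirely on invented numbers: a hypothetical norm at a scale score of 100 and two hypothetical four-student schools. It compares the percent-at-or-above-norm measure with a plain school average, the alternative the Consortium recommends. It illustrates the arithmetic only and is not how the board or the Consortium actually computes scores.

```python
# Hypothetical illustration: the norm of 100 and every score below are
# invented for the example; none of this is Chicago data.
NORM = 100


def percent_at_or_above(scores, norm=NORM):
    """Share of students scoring at or above the norm, as a percentage."""
    return 100 * sum(1 for s in scores if s >= norm) / len(scores)


def average(scores):
    """Mean score across all tested students."""
    return sum(scores) / len(scores)


# School A: big gains among students who still fall short of the norm.
school_a_before = [60, 70, 80, 110]
school_a_after = [80, 90, 95, 110]

# School B: heavy slippage among students who stay at or above the norm.
school_b_before = [110, 120, 130, 90]
school_b_after = [101, 102, 103, 90]

for name, before, after in [
    ("School A", school_a_before, school_a_after),
    ("School B", school_b_before, school_b_after),
]:
    print(f"{name}: percent at/above norm "
          f"{percent_at_or_above(before):.0f}% -> {percent_at_or_above(after):.0f}%, "
          f"average {average(before):.2f} -> {average(after):.2f}")
```

School A stays at 25 percent at or above the norm even though its average rises by nearly 14 points, and School B stays at 75 percent even though its average falls from 112.5 to 99; only the average registers either change.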
Bryk says that the national-norms standard also promotes “educational triage,” where schools expend the most effort on kids who are close to the norm and ignore the rest.
Probation partners and others who work in schools say they’ve seen this happen.
“You take a group of kids near the cutoff, and you spend all your time and efforts trying to get them up a notch. That’s a game that many principals are playing,” says Michael Klonsky of the Small Schools Workshop at the University of Illinois at Chicago.
Donald Moore of the advocacy group Designs for Change says he observed an elementary school reassign Reading Recovery tutors from its lowest readers—whom the program is intended to serve—to kids just below the norm.
The Consortium recommends that accountability be tied, instead, to the school’s average score, which encompasses the achievement levels of all children in the school.
The Consortium also recommends that accountability be tied to the average annual growth of individual students in a school, not just the school’s absolute score. Otherwise, “automatically your middle-class schools are called good, and your poor schools are called bad,” notes Easton, referring to the strong link between income levels and test scores. “You want to judge schools on how much they teach kids, not just the population of students they enroll.”
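A growth measure compares each student with his or her own score from the year before rather than with a national cutoff. The Python sketch below, again with invented student IDs and scores, averages one-year gains for students tested in both years. It assumes the two years’ scores sit on a single comparable scale, an assumption the Consortium warns the ITBS does not always meet from form to form and level to level.

```python
# Hypothetical sketch: IDs and scores are invented, and both years are
# assumed to be reported on one comparable scale.
from statistics import mean


def average_student_growth(last_year, this_year):
    """Mean one-year gain among students who have a score in both years.

    last_year and this_year map a student ID to a scale score. Students
    missing from either year (new arrivals, leavers) are skipped.
    """
    matched = last_year.keys() & this_year.keys()
    if not matched:
        raise ValueError("no students were tested in both years")
    return mean(this_year[sid] - last_year[sid] for sid in matched)


# A low-scoring school whose students gain a lot in one year.
low_last = {"s1": 60, "s2": 70, "s3": 80}
low_now = {"s1": 72, "s2": 81, "s3": 93, "s4": 50}  # s4 is new, so not counted

# A high-scoring school whose students gain little.
high_last = {"t1": 115, "t2": 120, "t3": 125}
high_now = {"t1": 118, "t2": 123, "t3": 128}

print("low-scoring school: average", mean(low_now.values()),
      "growth", average_student_growth(low_last, low_now))
print("high-scoring school: average", mean(high_now.values()),
      "growth", average_student_growth(high_last, high_now))
```

Judged by level, the high-scoring school looks far better; judged by growth, the low-scoring school is the one whose students gained more over the year.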
The Consortium goes on to report, however, that the ITBS is not designed to measure year-to-year progress. Its analysis shows that individual students, grade levels or schools—indeed, the entire system—may register gains or losses due to nothing more than changes in the form, or version, of the test being used and the grade level being measured.
For security reasons, Chicago uses a different form of the test each year. Also, students take a different level of the test as they progress from grade to grade. But the forms and levels are not comparable, the Consortium found.
The researchers gave a sample of 3rd-graders the math and reading tests from both 1990 and 1991; students were far more likely to score better on the 1990 test. Further, 3rd-graders were seven times more likely to get a higher “grade”—that is, score better relative to the norm—on the 1991 3rd-grade test than on the 1990 2nd-grade test.
H.D. Hoover, senior author of the ITBS, points out that forms and levels of the test are comparable from year to year for a national sample, but he agrees they’re not comparable for a regional sample. The reason, he explains, is that curricula differ from place to place and a particular test form may favor one curriculum over another. The Consortium’s findings don’t surprise him.
The Consortium says the School Board can achieve both test security and year-to-year comparability by developing tests that have some questions in common each year, and excluding those questions from the scoring.
Finally, the Consortium recommends that when calculating school scores, the board count only students who have been enrolled in the school for a certain number of days. That way, a school doesn’t get penalized by a low-scoring student who transfers in a month before testing.
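The enrollment rule amounts to a filter applied before school scores are calculated. The cutoff the Consortium had in mind is not given here, so in the Python sketch below the 100-day threshold (`MIN_DAYS`), the test date and the student records are all hypothetical, and days are assumed to be counted as calendar days since enrollment.

```python
# Hypothetical sketch: MIN_DAYS, TEST_DATE, and all student records are invented.
from dataclasses import dataclass
from datetime import date
from statistics import mean

MIN_DAYS = 100               # placeholder enrollment cutoff, in calendar days
TEST_DATE = date(2000, 5, 1)  # invented test date


@dataclass
class Student:
    student_id: str
    enrolled_on: date        # first day enrolled at this school
    score: float


def scores_for_accountability(students, test_date=TEST_DATE, min_days=MIN_DAYS):
    """Keep only scores of students enrolled at least min_days before testing."""
    return [
        s.score
        for s in students
        if (test_date - s.enrolled_on).days >= min_days
    ]


roster = [
    Student("a", date(1999, 9, 7), 95.0),
    Student("b", date(1999, 9, 7), 105.0),
    Student("c", date(2000, 4, 3), 40.0),  # transferred in about a month before the test
]

print("all students:", mean(s.score for s in roster))
print("enrolled long enough:", mean(scores_for_accountability(roster)))
```

Counting everyone gives the school an average of 80; leaving out the late transfer raises it to 100, so the school is judged on students it actually had time to teach.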
Beyond these important technicalities, the Consortium and other critics of the board’s accountability program say that it tends to force schools to focus on the relatively few skills that are tested and give short shrift to the rest of the curriculum. For example, the board’s own language arts standards for 3rd-graders call on them to give effective oral presentations, conduct research projects and write multi-paragraph essays—none of which is tested by the ITBS.
“If the system is going to have a high-stakes test that doesn’t match the standards they say are important—you’re just confusing people,” says Easton. “What you’re really doing is saying the standards don’t matter.”
The ITBS’s Hoover agrees that too much emphasis on test preparation can lead teachers to narrow the curriculum, possibly leaving out, for example, the reading of novels.
“We can’t in a 40-minute period [measure] things that kids only get from reading an entire book: personal interactions with the text, the things that they think, the things that they feel,” he says. “These are important aspects of reading that the test can’t measure.”
Tests geared to local standards—which flesh out more general state goals—would provide a more well-rounded picture of the learning taking place in Chicago classrooms, the Consortium and many school reformers believe.
A number of school reform activists have rallied around the Consortium’s report. “It backs up what we’ve been saying all along,” says Sheila Castillo of the Chicago Association of Local School Councils. “The Iowas are so bad, we need to find a better way to measure.”
On May 2, the eve of the Iowas, seven reform groups and the Consortium sponsored an open meeting on student testing, and more are planned. “I don’t know what the answers are,” says Castillo. “But I think the system needs to be focused on [finding] what the answers are.”