People want tests to do it all: To measure student performance against a standard and against other students. To measure a few skills in depth as well as the entire curriculum. To pass judgment on schools and to help them improve teaching. Tests can do all these things, say the experts, but no one or two tests can do them all. To avoid non-stop testing, school districts need to set priorities and accept trade-offs. Here are some of the considerations.

Multiple choice or performance assessment?

Performance assessments, which use activities such as writing an essay or conducting a science experiment, are better than multiple-choice tests for measuring the depth of student learning, many educators feel.

“Certain questions are hard to ask in a multiple-choice format,” notes Carole Perlman, the Reform Board’s director of testing and evaluation. She cites, for example, asking students to contrast the viewpoints of two authors or compare characters in a story. Public speaking and library research skills also need a performance approach, she says.

Performance assessments have other advantages, advocates say: They can model worthwhile classroom activities that make “teaching to the test” an appropriate endeavor. In learning how to score the assessments, teachers also gain skills they can use in the classroom.

Critics call performance assessments unreliable—meaning that scores vary so greatly from one judge to the next that they can’t be used with confidence. Giving and scoring performance assessments clearly take more time and, thus, more money. Political pressure has led a number of states that experimented with performance assessments to drop them or scale them back.

In 1991, Kentucky introduced a new assessment system that included writing portfolios, short-answer questions, essays and individual and group activities. A 1996 RAND report found the tests had mainly positive effects on instruction. For instance, teachers reported that writing portfolios prompted them to become more innovative. This April, however, the Kentucky Legislature voted to return the emphasis to multiple-choice questions. Legislators said the 1991 assessment was unreliable and too time-consuming.

“A lot of alternative assessments have come under attack because efforts to design them were done too quickly or not done with enough attention to the technical quality,” says David Niemi of the University of California at Los Angeles, who will help the Chicago School Reform Board develop a new testing system.

Recent research has led to better scoring systems, he says. “The problem is no longer scoring reliability,” he says. “The key is training people appropriately.”

Get information on individual students or on how well the system is teaching its curriculum?

Research has found that when low test scores carry sanctions for schools, teachers tend to focus on the objectives that are tested. One way to guard against narrowing the curriculum is to ask different questions of different students, which allows more of the curriculum to be tested without increasing the time devoted to testing. This practice, called sampling, also allows for more complex performance assessments to be given.
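To make the idea concrete, here is a minimal sketch of how such sampling might assign test material, with invented item blocks and students (nothing here reflects an actual test form):

```python
import random

# Sampling sketch: split the full item pool into blocks and give each
# student only one block, so the whole curriculum gets covered without
# lengthening any individual student's test. Blocks and names are invented.

ITEM_BLOCKS = {
    "block_A": ["fractions", "decimals", "percents"],
    "block_B": ["geometry", "measurement", "graphing"],
    "block_C": ["essay", "author_comparison", "library_research"],
}

def assign_blocks(students):
    """Randomly assign each student one block of items."""
    block_names = list(ITEM_BLOCKS)
    return {student: random.choice(block_names) for student in students}

roster = [f"student_{i}" for i in range(1, 7)]
for student, block in assign_blocks(roster).items():
    print(student, "takes", block, "->", ITEM_BLOCKS[block])
```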

On the downside, sampling doesn’t provide a comprehensive picture of the strengths and weaknesses of individual students. And because different students would be tested on different skills, scores likely would fluctuate more widely, making it harder to measure school progress from year to year.

In addition, students typically exert less effort on tests that don’t count toward grades or promotion.

Test yearly or less frequently?

To hold schools accountable for student gains, students need to be tested every year, says the Consortium on Chicago School Research. For example, a 4th-grader’s score would be compared to his 3rd-grade score; the “gain” scores for all the students in the school would then be averaged to come up with a school score.
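A minimal sketch of that gain-score arithmetic, using invented scores (the Consortium’s actual method is more involved):

```python
# Gain-score sketch: subtract each student's score last year from this
# year's score, then average the gains to produce a school score.
# All names and numbers are invented.

scores_grade3 = {"Ana": 42, "Ben": 55, "Cara": 61, "Dev": 48}  # last year
scores_grade4 = {"Ana": 50, "Ben": 58, "Cara": 60, "Dev": 57}  # this year

gains = [scores_grade4[s] - scores_grade3[s] for s in scores_grade4]
school_score = sum(gains) / len(gains)
print(f"average gain: {school_score:.2f} points")  # 4.75 points
```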

However, if the Chicago School Reform Board follows through on plans to develop its own tests in the four core subjects and retain the Iowa Tests of Basic Skills, students at some grade levels could take nine standardized achievement tests a year, including the Illinois Goal Assessment Program (IGAP) tests.

How should success be judged?

Chicago looks at absolute test scores: the percentage of students scoring at or above the national norm in reading on the ITBS. Other school systems, like the State of Tennessee, look at whether individual students make a year’s gain in a year of instruction. Kentucky looks at whether a school achieves preset improvement goals.

While Chicago looks at school performance each year, some other systems take a longer view. Tennessee averages gains over three years; Kentucky judges school progress every two years.

Until this year, the Reform Board’s accountability program encompassed only the lowest-scoring schools. Now, all schools are being judged on the basis of year-to-year comparisons of test scores, which lead to designations of A, B or C. For example, schools that register an increase or no significant decrease from the previous year are put on the A list, which this year included 319 schools.
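The board has spelled out only the A-list rule. As a rough illustration, a designation routine might look like the sketch below; the B and C cutoffs are invented, not the board’s:

```python
# List-designation sketch. Only the A-list rule (an increase, or no
# significant decrease, from the previous year) comes from the article;
# the thresholds below are assumed for illustration.

SIGNIFICANT_DROP = 2.0   # percentage points; an assumed cutoff
LARGE_DROP = 5.0         # an assumed cutoff for the C list

def designate(last_year, this_year):
    change = this_year - last_year
    if change >= -SIGNIFICANT_DROP:
        return "A"
    if change >= -LARGE_DROP:
        return "B"
    return "C"

print(designate(41.0, 43.5))  # A: scores rose
print(designate(50.0, 46.0))  # B: dropped a few points from a high start
print(designate(38.0, 30.0))  # C: large drop
```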

Principals say the board is making too much of year-to-year fluctuations. “Scores are fickle things,” says Karen Morris, principal of Saucedo Scholastic Academy in South Lawndale, an A list school. “There are some schools on the B list that I think are superior. But they started out with high scores and then dropped a few percentage points.”

One school landed on the C list after a program for gifted students moved to another location, throwing the school’s scores into a nosedive. The school, Burbank Elementary, gets regular visits from a staffer in the board’s accountability office. “He’s been out here several times with a checklist,” says Principal Hiram Broyls, who doesn’t mind the attention. “He gave the school a once-over and said it should be cleaner. There wasn’t anything else he found wrong.”

Tennessee has a more sophisticated system for holding schools accountable, initiated in 1992. The Tennessee Value-Added Assessment System (TVAAS) uses a complicated statistical analysis to graph each student’s academic progress from grades 2 through 8, based on standardized test scores in five subject areas. School gains are also averaged over three years, “to factor out anything that might have happened by chance,” says William Sanders of the University of Tennessee, who invented the system. Further, a school’s annual score is based only on students who have been enrolled for at least 150 days.
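A simplified sketch of those two safeguards, the 150-day enrollment filter and the three-year average, using invented data (the real TVAAS model is a far more elaborate statistical analysis):

```python
# TVAAS-style sketch: include only students enrolled at least 150 days,
# then average the school's annual gains over three years to smooth out
# swings that might have happened by chance. All numbers are invented.

MIN_DAYS = 150

def annual_school_gain(students):
    """Average gain for students enrolled at least MIN_DAYS."""
    eligible = [s["gain"] for s in students if s["days_enrolled"] >= MIN_DAYS]
    return sum(eligible) / len(eligible)

year_1 = [{"gain": 1.1, "days_enrolled": 170}, {"gain": 0.4, "days_enrolled": 120},
          {"gain": 0.9, "days_enrolled": 160}]
year_2 = [{"gain": 0.7, "days_enrolled": 155}, {"gain": 1.0, "days_enrolled": 180}]
year_3 = [{"gain": 1.2, "days_enrolled": 165}, {"gain": 0.8, "days_enrolled": 150}]

annual = [annual_school_gain(y) for y in (year_1, year_2, year_3)]
print(f"three-year average gain: {sum(annual) / 3:.2f}")
```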

Sanders says he has found that the schools earning the highest ratings are those that adjust their teaching to match each student’s ability level. “The teacher in 4th grade needs to know where kids left off in 3rd grade; otherwise gains will be poor,” he says. “[TVAAS] is forcing more communication across grade levels than ever before.”

TVAAS also reports gains for students in each school by economic status, achievement level, race and gender, so that schools can be held accountable for the progress of all students. San Francisco and Dallas are among city school systems that break down school data in a similar way.
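Breaking gains down by subgroup is straightforward in principle; a toy sketch with invented data, grouping here by economic status only:

```python
from collections import defaultdict

# Subgroup-reporting sketch: average gains separately for each group so
# the progress of every group stays visible. All data is invented.

students = [
    {"gain": 1.2, "low_income": True},
    {"gain": 0.6, "low_income": True},
    {"gain": 1.0, "low_income": False},
    {"gain": 0.4, "low_income": False},
]

by_group = defaultdict(list)
for s in students:
    key = "low_income" if s["low_income"] else "other"
    by_group[key].append(s["gain"])

for group, gains in by_group.items():
    print(f"{group}: average gain {sum(gains) / len(gains):.2f}")
```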

TVAAS has shown that teachers tend to direct most of their attention to the lowest performers while above-average students slip back a little each year. “Some of the kids that are getting hammered the hardest are the above-average African-American kids in inner-city schools,” says Sanders. “They started out above average and by the time they get to 6th grade, they are below average because their gains have been retarded.”

On the other hand, a school with disadvantaged students occasionally will average more than a year’s gain annually and catch up to the pack. “It’s rare, but you certainly can observe it,” he says.

Do we need more than a test?

School systems use a variety of indicators to judge school success. Chicago opts for simplicity with a single test score. Other districts, including Philadelphia and Seattle, consider other numeric data such as attendance, promotion rates and dropout rates. A handful, including San Francisco and Minneapolis, use surveys or site visits to incorporate parent satisfaction or school climate.

“In order for communities and LSCs to hold principals and the school accountable, data has to be simple enough for people to understand it,” says Philip Hansen, the board’s chief accountability officer. Focusing on too many indicators takes attention away from the bottom line, he says: student achievement.

Anthony Bryk, a University of Chicago professor who heads up the Consortium, would hold schools accountable not just for student achievement but also for serious efforts to improve achievement. With the sole focus on test scores, he says, schools opt for quick fixes like practice testing rather than substantial investments that typically have a slower rate of return, like professional development for teachers. Some schools also push out low achievers to boost scores, Bryk believes.

His solution: Get schools to focus on the long term. In addition to test scores, hold elementary schools accountable for professional development, the number of students held back and how well their graduates succeed in high school.

Prof. Paul Hill, director of the Center on Reinventing Public Education at the University of Washington in Seattle, cautions against using “soft data,” such as the quality of professional development or teacher collaboration. “There’s more room to argue,” he says, and more chance schools will put on a false front for a central office inspector.
