Tuesday, November 17, 2009

New ED Research Agenda Taking Shape

We’ve heard administration officials say that the stimulus programs provide a laboratory for ideas that can be built into the ESEA (aka NCLB) reauthorization, as well as into the reauthorization of the Education Sciences Act. So we are studying closely the RFAs, draft RFAs, and other guidance from the US Department of Education for stimulus programs such as Race to the Top (R2T), Investing in Innovation (i3), Enhancing Education Through Technology (EETT), and the State Longitudinal Data Systems (SLDS) looking for clues about the new research agenda. They are not hard to find.

As a general trend, there is no doubt that the new administration is seriously committed to evidence-based policy. Peter Orszag of the White House budget office has recently called for (and made the case for, in his blog) systematic evaluations of federal policies consistent with the president’s promise to “restore science to its rightful place.” But how does this play out with ED?

First, we are seeing a major shift from a static notion of “scientifically based research” (SBR) to a much more dynamic approach to continuous improvement. In NCLB there was constant reference to SBR as a necessary precondition for spending ESEA funds on products, programs, or services. In some cases, it meant that the product’s developers had to have consulted rigorous research. In other cases, it was interpreted as there having to be rigorous research showing the product itself was effective. But in either case, the SBR had to precede the purchase.

Evidence of a more dynamic approach is found in all of the competition-based stimulus programs. Take for example the discussion of “instructional improvement systems.” While this term usually refers to classroom-based systems for formative testing with feedback to the teacher allowing differentiation of instruction, it is used in a broader sense in the current RFAs and guidance documents. The definition provided in the R2T draft RFA reads as follows (bullets and highlights added for clarity):

“Instructional improvement systems means technology-based tools and other strategies that provide

* teachers,
* principals,
* and administrators

with meaningful support and actionable data to systemically manage continuous instructional improvement, including activities such as:

* instructional planning;
* gathering information (e.g., through formative assessments (as defined in this notice), interim assessments (as defined in this notice), summative assessments, and looking at student work and other student data);
* analyzing information with the support of rapid-time (as defined in this notice) reporting;
* using this information to inform decisions on appropriate next steps;
* and evaluating the effectiveness of the actions taken.”

It is important to notice, first of all, that tools are provided to administrators, not just to teachers. Moreover, the final activity in the cycle is to evaluate the effectiveness of the actions. (Joanne Weiss, who heads up the R2T program, used the same language, including effectiveness evaluation by district administrators, in a recent speech.)

We have pointed out in a previous entry that the same cycle of needs analysis, action, and evaluation that works for teachers in the classroom also works for district-level administrators. The same assessments that help teachers differentiate instruction can, in many cases, be aggregated up to the school and district level where broader actions, programs, and policies can be implemented and evaluated based on initial identification of the needs. An important difference exists between these parallel activities at the classroom and central office level. At the district level, where larger datasets extend over a longer period, evaluation design and statistical analysis are called for. In fact this level of information calls for scientifically based research.
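To make the aggregation idea concrete, here is a minimal sketch of rolling the same assessment records up from classroom to school to district. The records, school names, and the `mean_by` helper are all hypothetical and purely illustrative; real district data would need far more care (missing scores, comparable scales, student mobility).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical assessment records: (district, school, classroom, score).
records = [
    ("D1", "Lincoln", "Rm 101", 62),
    ("D1", "Lincoln", "Rm 101", 70),
    ("D1", "Lincoln", "Rm 102", 81),
    ("D1", "Grant", "Rm 201", 55),
    ("D1", "Grant", "Rm 201", 59),
]

def mean_by(records, level):
    """Average scores grouped by the first `level` fields of each record.

    level=3 groups by classroom, level=2 by school, level=1 by district.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[:level]].append(rec[-1])
    return {key: mean(scores) for key, scores in groups.items()}

classroom_means = mean_by(records, 3)  # what a teacher would look at
school_means = mean_by(records, 2)     # what a principal would look at
district_means = mean_by(records, 1)   # what the central office would look at

for key, value in school_means.items():
    print(key, value)
```

The same records serve all three levels; only the grouping key changes. The district-level view is where the larger samples, and therefore the evaluation design and statistics discussed above, come into play.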

Research is now viewed as integral to the cycle of continuous improvement. Research may be carried out by the district’s or state’s own research department or data may be made available to outside researchers as called for in the SLDS and other RFAs. The fundamental difference now is that the research conducted and published before federal funds are used is not the only relevant research. Of course, ED strongly prefers (and at the highest level of funding in i3 requires) that programs have prior evidence. But now the further gathering of evidence is required both in the sense of a separate evaluation and in the sense that funding is to be put toward continuous improvement systems that build research into the innovation itself.

Our recent news item about the i3 program takes note of other important ideas about the research agenda we can expect to influence the reauthorization of ESEA. It is worth noting that the methods called for in i3 are also those most appropriate and practical for local district evaluations of programs. We welcome this new perspective on research considered as a part of the cycle of continuous instructional improvement. — DN

Tuesday, September 22, 2009

Research as Innovation

Many of us heard Jim Shelton, the ED Assistant Deputy Secretary for Innovation and Improvement, speak to the education publishing industry last week about the $650 million fund now called “Investing in Innovation” (i3). Through i3, Shelton wants to fund the scaling up of innovations having some evidence that they’re worth investing in. These i3 grants could be as large as $50 million.

With that amount at stake, it makes sense for government funders to look for some track record of scientifically documented success. The frequent references in ED documents to processes of “continuous improvement” as part of innovations suggest that proposers would do well to supplement the limited evidence for their innovation by showing how scientific evidence can be generated as an ongoing part of a funded project, that is, how in-course corrections and improvements can be made to the innovation as it is being put into place in a school system.

In his speech to the education industry, Shelton complained about the low quality of the evidence currently being put forward. Although some publishers have taken the initiative and done serious tests of their products, there has never been a strong push for them to produce evidence of effectiveness.

School systems usually haven’t demanded such evidence, partly because there are often more salient decision criteria and partly because little qualified evidence exists, even for programs that are effective. Moreover, district decision makers may find studies of a product conducted in schools that are different from their schools to have marginal relevance, regardless of how “rigorously” the studies were conducted.

ED appears to recognize that it will be counterproductive for grant programs such as i3 to depend entirely on the pre-existing scientific evidence. An alternative research model based on continuous improvement may help states and districts to succeed with their i3 proposals, and with their projects once funded.

Now that improved state and district data systems are increasing the ability of school systems to quickly reference several years of data on students and teachers, i3 can start looking at how rigorous research is built into the innovations they fund—not just the one-time evaluation typically built into federal grant proposals.

This kind of research for continuous improvement is an innovation in itself—an innovation that may start with the “data-driven decision making” mode in which data are explored to identify an area of weakness or a worrisome trend. But the real innovation in research will consist of states and districts building their own capacity to evaluate whether the intervention they decided to implement actually strengthened the area of weakness or arrested the worrisome trend they identified and chose to address. Perhaps it did so for some schools but not others, or maybe it caught on with some teachers but not with all. The ability of educators to look at this progress in relation to the initial goals completes the cycle of continuous improvement and sets the stage for refocusing, tweaking, or fully redesigning the intervention under study.

We predict that i3 reviewers, rather than depending solely on strong existing evidence, will look for proposals that also include a plan for continuous improvement that can be part of how the innovation assures its success. In this model, research need not be limited to the activity of an “external evaluator” that absorbs 10% of the grant. Instead, routine use of research processes can be an innovation that builds the internal capacity of states and districts for continuous improvement.
-DN

Friday, September 11, 2009

Easton Sets a New Agenda for IES

John Easton, now officially confirmed as the director of the Institute of Education Sciences, gave a brief talk July 24th to explain his agenda to the directors and staff of the Regional Education Labs. This is a particularly pivotal time, not only because the Obama administration is setting an aggressive direction for changes in the K-12 schools, but also because Easton is starting his six-year term just as IES is preparing the RFP for the re-competition for the 10 RELs. (The budget for the RELs accounts for about 11% of the approximately $600 million IES budget.)

Easton made five points.

First, he is not retreating from the methodological rigor that was the hallmark of his predecessor, Russ Whitehurst. This simply means that IES will not fund poorly designed research that lacks the controls needed to support the conclusions the researcher wants to assert. Randomized control is still the strongest design for effectiveness studies, although weaker designs are recognized as having value.

Second, there has to be more emphasis on relevance and usability for practitioners. IES can’t ignore how decisions are made and what kind of evidence can usefully inform them. He sees this as requiring a new focus on school systems as learning organizations. This becomes a topic for research and development.

Third, although randomized experiments will still be conducted, there needs to be a stronger tie to what is then done with the findings. In a research and development process, rigorous evaluation should be built in from the start and should relate more specifically to the needs of the practitioners who are part of the R&D process. In this sense, the R&D process should be linked more directly to the needs of the practitioners.

Fourth, IES will move away from the top-down dissemination model in which researchers seem to complete a study and then throw the findings over the wall to practitioners. Instead, researchers should engage practitioners in the use of evidence, understanding that the value of research findings comes in their application, not simply in being released or published. IES will take on the role of facilitating the use of evidence.

Fifth, IES will take on a stronger role in building capacity to conduct research at the local level and within state education agencies. There’s a huge opportunity presented by the investment (also through IES) in state longitudinal data systems. The combination of state systems and the local district systems makes gathering the data to answer policy questions and questions about program effectiveness much easier. The education agencies, however, often need help in framing their questions, applying an appropriate design, and deploying the necessary and appropriate statistics to turn the data into evidence.

These five points form a coherent picture of a research agency that will work more closely through all phases of the research process with practitioners, who will be engaged in developing the findings and putting them into practice. This suggests new roles for the Regional Education Labs in helping their constituencies to answer questions pertinent to their local needs, in engaging them more deeply in using the evidence found, and in building their local capacity to answer their own questions. The quality of work will be maintained, and the usability and local relevance will be greatly increased.
— DN

Thursday, July 9, 2009

The Problem with National Experiments

We welcome the statement of the director of the Office of Management and Budget (OMB), Peter R. Orszag, issued as a blog entry, calling for the use of evidence.

“I am trying to put much more emphasis on evidence-based policy decisions here at OMB. Wherever possible, we should design new initiatives to build rigorous data about what works and then act on evidence that emerges — expanding the approaches that work best, fine-tuning the ones that get mixed results, and shutting down those that are failing.”

This suggests a continuous process of improving programs based on evaluations built into the fabric of program implementations, which sounds very valuable. Our concern, however, at least in the domain of education, is that Congress or the Department of Education will contract for a national experiment to prove a program or policy effective. In contrast, we advocate a more localized and distributed approach based on the argument Donald Campbell made in the early 70s in his classic paper “The Experimenting Society” (updated in 1988). He observes that “the U.S. Congress is apt to mandate an immediate, nationwide evaluation of a new program to be done by a single evaluator, once and for all, subsequent implementations to go without evaluation.” Instead, he describes a “contagious cross-validation model for local programs” and recommends a much more distributed approach that would “support adoptions that included locally designed cross-validating evaluations, including funds for appropriate comparison groups not receiving the treatment.” Using such a model, he predicts that “After five years we might have 100 locally interpretable experiments.” (p.303)

Dr. Orszag’s adoption of the “top tier” language from the Coalition for Evidence-Based Policy buys into the idea that an educational program can be proven effective in a single large-scale randomized experiment. There are several weaknesses in this approach.

First, the education domain is extremely diverse and, without the “100 locally interpretable experiments,” it is unlikely that educators would have an opportunity to see a program at work in a sufficient number of contexts to begin to build up generalizations. Moreover, as local educators and program developers improve their programs, additional rounds of testing are called for (and even the “top tier” programs should engage in continuous improvement).

Second, the information value of local experiments is much higher for the decision-maker who will always be concerned with performance in his or her school or district. National experiments generate average impact estimates, while giving little information about any particular locale. Because concern with achievement gaps between specific populations differs across communities, it follows that, in a local experiment, reducing a specific gap—not the overall average effect—may well be the effect of primary interest.

Third, local experiments are vastly less expensive than nationally contracted experiments, even while obtaining comparable statistical power. Local experiments can easily be one-tenth the cost of national experiments, so conducting 100 of them is quite feasible. (We say more about the reasons for the cost differential in a separate policy brief.) Better yet, local experiments can be completed in a more timely manner; it need not take five years to accumulate a wealth of evidence. Ironically, one factor making national experiments expensive, as well as slow, is the review process required by OMB!
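The second point above, that a pooled average says little about any particular locale, can be illustrated with a small sketch. The three districts and their group means are invented for illustration only; they are not results from any study.

```python
from statistics import mean

# Hypothetical site-level results: mean outcome for the treatment group and
# for the comparison group in each local experiment.
sites = {
    "District A": {"treatment": 54.0, "comparison": 50.0},
    "District B": {"treatment": 49.0, "comparison": 50.0},
    "District C": {"treatment": 56.0, "comparison": 50.0},
}

def site_effects(sites):
    """Return the treatment-minus-comparison difference for each site."""
    return {name: g["treatment"] - g["comparison"] for name, g in sites.items()}

effects = site_effects(sites)

# The single number a pooled, national-style report would emphasize
# (unweighted here, for simplicity).
average_effect = mean(list(effects.values()))

print("average effect:", average_effect)
for name, effect in effects.items():
    print(name, effect)
```

In this made-up case the average effect is positive, yet a decision maker in District B would see a small negative difference locally. A real analysis would also attach uncertainty to each estimate, which this sketch omits.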

So while we applaud Dr. Orszag’s leadership in promoting evidence-based policy decisions, we will continue to be interested in how this impacts state and local agencies. We hope that, instead of contracting for national experiments, the OMB and other federal agencies can help state and local agencies to build evaluation for continuous improvement into the implementation of federally funded programs. If nothing else, it helps to have OMB publicly making evidence-based decisions. —DN

Campbell, D. T. (1988). The Experimenting Society. In E. S. Overman (Ed.), Methodology and epistemology for social science: Selected papers (p. 303). Chicago: University of Chicago Press.

Tuesday, June 9, 2009

It’s Not the Money, It’s What You Spend It On

Our neighbor from the Hoover Institution, Eric (Rick) Hanushek, who also currently chairs the National Board for Education Sciences, has just published a very interesting book (with co-author Alfred Lindseth) on the financing of schools [1]. It provides a very readable narrative of the last couple of decades’ court decisions about how much money it should take to provide an equitable and adequate K-12 education. The authors’ basic thesis is that the amounts of money schools spend are generally unrelated to increases in achievement, unless one considers what the money is spent on. Clearly, if spending were focused on policies and programs that lead to achievement gains and to decreases in the achievement gaps between populations, things would improve. But court-ordered increases in education spending have seldom used credible estimates of the likely impact of various programs, even though the programs’ costs were used in calculating how much an equitable or adequate education will cost. The authors document in fascinating detail the irrationality of the process of producing these cost estimates.

Hanushek and Lindseth propose that, where administrators and teachers are accountable and rewarded for results, they will consider the trade-offs in efficiency of spending money one way or another. For example, smaller class size may lead to better results but, if the same money were spent to increase teacher quality, the results may be much more substantial. This proposal, of course, depends on there being sufficient evidence that various programs, policies, or approaches have a measurable impact. And they further acknowledge that getting this information is not a matter of running one-time experimental evaluations. The wide variation of populations, resources, and standards in US school systems means that a large number of smaller scale evaluations are called for. If states and school districts were to get into the habit of routinely pilot testing programs locally (and collecting and analyzing the data systematically) before scaling up within the district or state, the gains in efficiency could be substantial.

Hanushek and Lindseth do not address the question of how local evaluations of sufficient quality and quantity can be paid for. If one depends solely on the Institute of Education Sciences for grants and contracts, the process will be slow and the resources inadequate. Setting aside a certain percentage of federal grants to states and districts for evaluations is often unproductive because the evaluations are not designed or timed to provide feedback for continuous improvement. Too often, educators and administrators treat the evaluation as a requirement that takes money from the program. We have argued elsewhere that integrating research into program implementation at the local level calls for building local school district capacity for rigorous evaluations. It also calls for a reform agenda that changes how decisions are connected both to explorations of district data and to locally generated evidence as to whether programs and policies are having the desired impact. This is different from contracting with an evaluator once the program is under way, because the plan for the evaluation is part of the plan for implementation. Directing a good portion of the program funds to a process of continuous improvement will make the program more efficient and provide educators with the hundreds of studies that will begin to accumulate the kind of evidence that they need to make a rational choice about what programs are worth trying out in their own locale.

Educators, especially those who spend their days engaged with children in a classroom, may find the rational economic model on which the authors’ proposals are based unsatisfying and perhaps simplistic. Most people don’t go into education because they are maximizing their economic return. Nonetheless, it is hard to find a rationale for retaining teachers who are demonstrably ineffective beyond the traditional practice of union solidarity that militates against differentiation of skills among its members. The authors' arguments are thought provoking in that they demonstrate in rich narrative detail the obvious irrationality of considering only the amount of money put into schools and not considering the effectiveness of the programs, policies, and approaches that the money is spent on. —DN

[1] Hanushek, E.A. & Lindseth, A.A. (2009). Schoolhouses, courthouses and statehouses: Solving the funding-achievement puzzle in America’s schools. Princeton NJ: Princeton University Press.

Wednesday, May 13, 2009

Compliance Anxiety

Stimulus funds are beginning to flow, but not as quickly as needed to provide a boost to the economy. One source of hesitation might be called “compliance anxiety.” People in school systems know that the Department of Education is looking for bold innovations and progress toward lasting reforms of the schools (see, for example, the recently published suggestions), but are not sure exactly what is going to be asked of them in terms of accounting for the funds they spend. The third guiding principle of ARRA calls for K-12 districts to “ensure transparency, reporting, and accountability.” This is meant to prevent fraud and abuse, to support the most effective uses of ARRA funds, and to accurately measure and track results.

Over the past few weeks, in webinars and similar venues, educators have been asking what this means. Many are hesitant to commit funds without knowing what evidence of compliance will be called for. The following quotes were compiled by Jennifer House, Ph.D., Founder of Redrock Reports:

Superintendent of a large suburban district: “We just need to know what kind of data needs to be collected for the accountability portion of ARRA—especially funds in the State Fiscal Stabilization Fund.”

Superintendent of an urban district: “When is the Department of Education going to tell us what data they need for the accountability and reporting requirements of ARRA?”

Title I director of a major urban district: “I know what I need to do for Title I reporting. Is there any other data I need to collect to report on the use and impact of the ARRA funds?”

IDEA director of a large suburban district: “What other data is needed about ARRA funds?”

Paraphrase of seven questions from a single MDR webinar: “When will we hear what the accountability requirements are for ARRA?”

CIO of a large suburban district: “We need to accommodate the data that needs to be collected in our system for ARRA. When do we get the word?”

These educators need to know what is meant by “accurately measure and track results.” Will this information just be used to audit who was paid for what? Or will ED be calling for a measure of results in terms of impact on schools, teachers, and student achievement?

State Education Agencies are asked for “baseline data that demonstrates the State’s current status in each of the four education reform areas.” Will the states and districts be asked for subsequent data showing an improvement over baseline?

Educators have heard that, in the near future, ED will describe specific data metrics that states will use to make transparent their status in the four education reform areas for the purpose of “showing how schools are performing and helping schools improve.” They expect that this will not be a one-time data collection; instead, they expect an element of tracking to help them with continuous improvement.

ED has a one-time opportunity to move education toward an evidence-based enterprise on a massive scale by calling for evidence of outcomes—not just the starting baseline. Conditions are ripe for quickly and easily promoting a major reform in how districts measure their own results. Educators already expect this. A simple time series design is all that is needed. Training and support for this can be readily supplied through existing IES funding mechanisms.
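As one possible reading of a "simple time series design," the sketch below fits a straight-line trend to a district's baseline years and compares later observations with what that trend would have projected. The yearly figures and the `fit_line` helper are hypothetical; this is an illustration under those assumptions, not a prescribed ARRA reporting method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical percent-proficient figures for one district.
baseline = {2005: 40.0, 2006: 41.0, 2007: 42.0, 2008: 43.0}  # before funding
followup = {2009: 46.0, 2010: 48.0}                          # after funding

slope, intercept = fit_line(list(baseline.keys()), list(baseline.values()))

# Departure of each follow-up year from the projected baseline trend.
departures = {
    year: observed - (intercept + slope * year)
    for year, observed in followup.items()
}

for year, gap in departures.items():
    print(year, "observed minus projected:", gap)
```

A departure from the baseline trend does not by itself show that the funded program caused the change, since other things may have changed in the same years. It does give a district a routine, low-cost way to track outcomes against its own starting point, which is the kind of reporting beyond the baseline that the paragraph above calls for.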

Monday, April 6, 2009

The Advantages of Research on Local Problems

The nomination of John Q. Easton as the new head of IES highlights a debate that has been going on for quite a long time. As Donald Campbell noted in the early 70s in his classic paper “The Experimenting Society” (updated in 1988), “The U.S. Congress is apt to mandate an immediate, nationwide evaluation of a new program to be done by a single evaluator, once and for all, subsequent implementations to go without evaluation.” In contrast, he describes a “contagious cross-validation model for local programs” and recommends a much more distributed approach that would “support adoptions that included locally designed cross-validating evaluations, including funds for appropriate comparison groups not receiving the treatment.” Using such a model, he predicts that “After five years we might have 100 locally interpretable experiments.” (p.303) The work of the Consortium on Chicago School Research, which Easton has led, has a local focus on Chicago schools consistent with the idea that experiments should be locally interpretable. Elsewhere, we have argued that local experiments can also be vastly less expensive; thus having 100 of them is quite feasible. These experiments also can be completed in a more timely manner—it need not take five years to accumulate a wealth of evidence. We welcome a change in orientation at IES from organizing single large national experiments to the more useful, efficient, and practical model of supporting many local rigorous experiments. –DN

Campbell, D. T. (1988). The Experimenting Society. In E. S. Overman (Ed.), Methodology and epistemology for social science: Selected papers (p. 303). Chicago: University of Chicago Press.