Americans are more polarized than at any point in recent history. On issue after issue—abortion, the Affordable Care Act, or just about anything else—Democrats stand on one side and Republicans stand on the other. It can be difficult for leaders to build consensus around policy when the two sides each have their own base of support. But is the public so divided over school issues?
Last year, Education Next conducted a poll asking Americans about 17 education issues. On eight of these issues, there is no evidence that the parties differ. Democrats are no more or less supportive than Republicans on issues including universal vouchers, vouchers for students in failing schools, tax credits for donations to scholarship programs for private schools, higher pay for teachers in hard-to-staff subjects, higher pay for teachers in hard-to-staff schools, and awarding tenure on the basis of student performance.
There are differences on other issues—increasing spending, raising teacher pay, government-funded universal preschool, government-funded preschool for low-income families, charter schools, vouchers for low-income families, merit pay, tenure, and Common Core—but these differences hardly pit the parties in opposing corners of the ring. In only one case does the majority of one party oppose the majority of the other: nearly three-fourths of Democrats favor more spending on public schools, while 54 percent of Republicans oppose it.
More typically, one of the parties musters a majority while the other splits. For example, 66 percent of Republicans oppose government-funded vouchers for students from low-income families, but Democrats divide, with 42 percent opposed and 45 percent in support. On two issues, majorities of both parties are on the same side despite a gap in support. Sixty-two percent of Republicans favor merit pay, but so do 54 percent of Democrats. Seventy-one percent of Democrats want to raise teacher pay; so do 52 percent of Republicans.
Without partisan division, surely the door is open for bipartisan school reform? Not exactly. The lack of polarization on school issues probably has more to do with confusion than consensus.
The opinions that most Americans express on school issues are not well informed, not organized in any coherent way, and not consistent over time. The 2014 survey contained factual questions about Common Core. Nearly two-thirds of respondents had either never heard of the standards or answered “don’t know” to the factual questions. In 2013, another survey asked Americans factual questions about charter schools. Half of the respondents said they did not know the answers, while another 20 to 30 percent gave wrong answers. Other surveys have shown that Americans consistently underestimate per-pupil spending and teacher salaries.
If a coherent belief system underlay the opinions expressed on the 2014 survey, we could expect that a person would take similar positions on similar issues. They do not. Knowing where someone stands on charter schools does not reveal much about where they stand on vouchers or merit pay, much less tenure, testing, or spending. Responses across these issues are weakly correlated (the average pairwise correlation is 0.16 and all but a handful fall below 0.25).
Finally, many individuals change their opinions quickly. Each year, the Education Next surveys include a sample of respondents from the previous survey. With one important exception (Common Core), aggregate opinion is relatively stable. Yet this aggregate stability masks flux at the individual level. For example, on merit pay and charter schools, just 60 percent and 57 percent of respondents, respectively, come down on the same side in 2014 as they did in 2013. Only 51 percent take the same side on vouchers for students in low-income families in both years. These changes appear to be random. People are not changing their minds so much as changing their responses without giving the issue much “mind” in the first place.
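The wave-to-wave stability figures above are simple percent-agreement computations. A sketch with simulated panel data (the sample size and the roughly 40 percent flip rate are assumptions chosen for illustration, not survey values):

```python
import numpy as np

# Hypothetical panel: positions (1 = support, 0 = oppose) for the same
# respondents in two survey waves. Illustrative data only.
rng = np.random.default_rng(1)
wave_2013 = rng.integers(0, 2, size=300)

# Suppose roughly 40% of respondents flip sides at random between waves.
flips = rng.random(300) < 0.4
wave_2014 = np.where(flips, 1 - wave_2013, wave_2013)

# Share taking the same side in both years -- the statistic reported
# for merit pay (60%), charters (57%), and vouchers (51%).
agreement = float((wave_2013 == wave_2014).mean())
print(round(agreement, 2))
```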
These are the hallmarks of what public opinion scholars call “nonattitudes”: uninformed and haphazard responses without any real underlying opinion. Nonattitudes arise when the public has not given an issue much attention. Americans may value education, but as an issue it is not at the forefront of their minds. When asked what they think is the most important issue facing the nation, only about five percent say education.
This murky ground of confusion is unlikely to make a solid foundation for consensus. Typically, when the public starts paying attention to an issue, people look to their party leaders and fall in line accordingly. As they learn about the debate, confusion turns into polarization. It is unsurprising that the biggest partisan gap here concerns spending, an issue that easily taps into a familiar broader debate between the parties. We are now seeing the parties polarize over Common Core as well. If issues such as testing, charters, or preschool seize the public mind, they may soon follow the same path.
Many contemporary education reform efforts attempt to leverage teacher evaluation policy to improve teacher quality—for example, by making the evaluation process more rigorous or by tying results more directly to student learning outcomes. By increasing the demand for high-quality teaching and teachers, these reforms have had some success. However, insufficient attention to the supply of teachers may be preventing many teacher quality and evaluation reforms from realizing their full potential.
To be clear, there is preliminary evidence suggesting that contemporary evaluation reforms may, in at least some cases, have the desired effects. For example, raising the rigor and stakes of teacher evaluations in New York City and Washington, DC, seems to have improved teacher quality in both locations, whether through teacher improvement or the selective attrition of weaker teachers.
At the same time, however, many other contentious efforts to reform teacher evaluation have resulted in little change to teacher evaluation outcomes. Recent statewide efforts to make evaluations more rigorous and meaningful in New Jersey, New Mexico, New York, Florida, Indiana, Rhode Island, Maryland, and Hawaii, for example, have resulted in the vast majority of teachers—often well over 90 percent—continuing to receive ratings of “effective” or “highly effective.” These reforms may have been valuable, but they have disappointed reformers who are skeptical that the results accurately reflect the quality of the teaching force in schools where large numbers of students are not academically proficient.
If we are surprised by the often muted effects of teacher evaluation reform, that is perhaps because we are insufficiently sensitive to the forces that contribute to seemingly inflated teacher evaluations. There are many reasons why managers might—and often do—evaluate their employees highly, including aversion to interpersonal conflict, genuine beliefs about the quality of the employees they’ve hired, and the maintenance of workplace morale.
Additionally, many teacher evaluation reform efforts may be focused too heavily on the demand side of teacher evaluation. That is, many reform efforts tend to assume that principals are overly generous with their evaluations because they lack either the motivation or the information to demand better performance from their teachers. There may be something to this, but it is important not to ignore the supply side of the teacher quality problem. After all, the extent to which a principal is willing to dismiss (or give a poor evaluation to) a teacher will likely depend in part upon her beliefs about the probability of finding a superior replacement in a reasonable period of time.
The extent to which principals today are constrained in their evaluation and dismissal decisions by the quality and size of the teacher labor supply is not obvious and probably varies by grade level, content area, and geographic location. There are, however, reasons to suspect that teacher supply constraints are real and may be getting worse.
As an example, consider my own state of California. A number of states have seen steep drops in enrollment in teacher preparation programs in recent years, and the declines are particularly stark in California. The number of new teaching credentials issued in the Golden State has fallen for ten consecutive years, for a total drop of 53 percent from the peak in 2004. (Total K-12 public school enrollment in California is up nine percent since 1998 and has declined one percent since 2004.)
The vast majority of this decline is attributable to a sharp drop in the number of multiple-subject credentials (usually granted to elementary school teachers), which may suggest that elementary teachers have historically been overproduced in California. However, single-subject credentials—generally required for middle and high school teachers—have fallen 41 percent since 2004 as well.
There is anecdotal evidence that these declines in new certifications are putting pressure on districts and administrators, with many schools reporting difficulty finding qualified applicants for some open teaching positions. California has also seen an increase in temporary teaching permits and waivers of normal credentialing requirements, possibly an indication of teacher supply constraints.
Ultimately, the extent to which the decline in new credentials is impacting districts’ hiring, principals’ evaluation processes, or students’ learning in California is not clear. It is nevertheless plausible that a shrinking supply of teachers increases principals’ uncertainty about their prospects for finding superior replacements for unsatisfactory members of their staffs, especially for certain harder-to-staff teaching positions. As a result, principals may be less inclined to dismiss their weak teachers or even to risk offending them with low evaluation scores.
Because the causes of our shrinking teacher supply are not entirely clear, it is difficult to know how best to respond. Still, some general—if mostly untested—principles suggest themselves.
First, stricter evaluation requirements should probably be coupled with more aggressive teacher recruitment and retention policies. Higher salaries or more pleasant working conditions could go a long way toward growing the supply of teachers and toward making sure that good teachers aren’t leaving voluntarily. There is evidence that teacher turnover is harmful to student achievement, and a principal faced with replacing several voluntarily departing teachers may be less inclined to evaluate the remaining teachers harshly.
Second, policymakers should consider systems of differential compensation and evaluation that allow districts and administrators to be more generous or flexible with harder-to-staff teaching positions. This may mean higher salaries for teachers in certain schools, subject areas, or grade levels. It could also mean somewhat looser evaluation requirements for teachers who, realistically, would be more difficult to replace. Allowing administrators to waive subsequent observations and evaluations for math teachers who perform well early in the year, for example, could both gratify math teachers and free principals to focus their evaluation efforts where they may be more useful.
The fact that the supply of teachers can change dramatically suggests that we should perhaps focus more on the supply side of teacher quality issues. Researchers should examine more carefully the extent to which teacher supply can be manipulated and the ways in which it impacts (or doesn’t impact) student learning and human resources for schools. Reformers and policymakers, meanwhile, should be mindful of whether their preferred reforms depend on—or have implications for— the quantity and quality of the teacher supply.
Welcome to the new home of the Brown Center Chalkboard, a trusted source of evidence-based research and analysis related to U.S. education policy and practice.
The Chalkboard began in 2013 as a weekly publication and quickly became a go-to destination for those seeking empirical discussions of education policy. Today, we are pleased to announce the expansion of the Chalkboard to provide more frequent, timely, and diverse content. Now a part of the Brookings Institution’s roster of policy blogs, the Chalkboard will be home to original analyses and commentary by Brown Center scholars and expert analysts.
Chalkboard contributors will continue to bring evidence to bear on the pressing policy questions facing all levels and facets of American education. We hope you check back often to read what’s new.
This is part two of my analysis of instruction and Common Core’s implementation. I dubbed the three-part examination of instruction “The Good, the Bad, and the Ugly.” Having discussed the “good” in part one, I now turn to the “bad.” One particular aspect of the Common Core math standards—the treatment of standard algorithms in whole number arithmetic—will lead some teachers to waste instructional time.
In 1963, psychologist John B. Carroll published a short essay, “A Model of School Learning,” in Teachers College Record. Carroll proposed a parsimonious model of learning that expressed the degree of learning (or what today is commonly called achievement) as a function of the ratio of time spent on learning to the time needed to learn.
The numerator, time spent learning, has also been given the term opportunity to learn. The denominator, time needed to learn, is synonymous with student aptitude. By expressing aptitude as time needed to learn, Carroll refreshingly broke through his era’s debate about the origins of intelligence (nature vs. nurture) and the vocabulary that labels students as having more or less intelligence. He also spoke directly to a primary challenge of teaching: how to effectively produce learning in classrooms populated by students needing vastly different amounts of time to learn the exact same content.[i]
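Carroll's ratio can be stated compactly. The sketch below is one common reading of the model; the cap at full mastery is an interpretive assumption of mine (Carroll's essay presents the idea verbally, not as executable code):

```python
def degree_of_learning(time_spent: float, time_needed: float) -> float:
    """Carroll's model: the degree of learning as a function of the
    ratio of time spent learning to time needed to learn.

    Capping at 1.0 encodes the assumption that time beyond what a
    student needs adds no further learning -- only idle time.
    """
    if time_needed <= 0:
        raise ValueError("time needed to learn must be positive")
    return min(time_spent / time_needed, 1.0)

# A student needing 10 hours who receives only 6 learns partially;
# one receiving 15 hours is already at mastery after 10.
print(degree_of_learning(6, 10))   # 0.6
print(degree_of_learning(15, 10))  # 1.0
```

The second call illustrates the problem discussed below: once the numerator exceeds the denominator, additional instructional time is wasted.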
The source of that variation is largely irrelevant to the constraints placed on instructional decisions. Teachers obviously have limited control over the denominator of the ratio (they must take kids as they are) and less than one might think over the numerator. Teachers allot time to instruction only after educational authorities have decided the number of hours in the school day, the number of days in the school year, the number of minutes in class periods in middle and high schools, and the amount of time set aside for lunch, recess, passing periods, various pullout programs, pep rallies, and the like. There are also announcements over the PA system, stray dogs that may wander into the classroom, and other unscheduled encroachments on instructional time.
The model has had a profound influence on educational thought. As of July 5, 2015, Google Scholar reported 2,931 citations of Carroll’s article. Benjamin Bloom’s “mastery learning” was deeply influenced by Carroll. It is predicated on the idea that optimal learning occurs when time spent on learning—rather than content—is allowed to vary, providing to each student the individual amount of time he or she needs to learn a common curriculum. This is often referred to as “students working at their own pace,” and progress is measured by mastery of content rather than seat time. David C. Berliner’s 1990 discussion of time includes an analysis of mediating variables in the numerator of Carroll’s model, including the amount of time students are willing to spend on learning. Carroll called this persistence, and Berliner links the construct to student engagement and time on task—topics of keen interest to researchers today. Berliner notes that although both are typically described in terms of motivation, they can be measured empirically in increments of time.
Most applications of Carroll’s model have been interested in what happens when insufficient time is provided for learning—in other words, when the numerator of the ratio is significantly less than the denominator. When that happens, students don’t have an adequate opportunity to learn. They need more time.
As applied to Common Core and instruction, one should also be aware of problems that arise from the inefficient distribution of time. Time is a limited resource that teachers deploy in the production of learning. Below I discuss instances when the Common Core State Standards in Mathematics (CCSSM) may lead to the numerator in Carroll’s model being significantly larger than the denominator—when teachers spend more time teaching a concept or skill than is necessary. Because time is limited and fixed, wasted time on one topic will shorten the amount of time available to teach other topics. Excessive instructional time may also negatively affect student engagement. Students who have fully learned content that continues to be taught may become bored; they must endure instruction that they do not need.
Jason Zimba, one of the lead authors of the Common Core math standards, and Barry Garelick, a critic of the standards, had a recent, interesting exchange about when standard algorithms are called for in the CCSSM. A standard algorithm is a series of steps designed to compute accurately and quickly. In the U.S., students are typically taught the standard algorithms of addition, subtraction, multiplication, and division with whole numbers. Most readers of this post will recognize the standard algorithm for addition. It involves lining up two or more multi-digit numbers according to place value, with one number written over the other, and adding the columns from right to left with “carrying” (or regrouping) as needed.
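The column-by-column procedure just described can be written out directly. This is a sketch for illustration only; the function name and the string representation of the numbers are my own choices, not anything specified by CCSSM:

```python
def standard_addition(a: str, b: str) -> str:
    """Add two whole numbers digit by digit, right to left,
    carrying into the next column as needed."""
    # Line the numbers up by place value: pad the shorter one
    # with leading zeros, as if writing one over the other.
    width = max(len(a), len(b))
    a, b = a.rjust(width, "0"), b.rjust(width, "0")
    carry = 0
    digits = []
    for i in range(width - 1, -1, -1):  # rightmost column first
        total = int(a[i]) + int(b[i]) + carry
        digits.append(str(total % 10))  # digit written in this column
        carry = total // 10             # "carried" to the next column
    if carry:
        digits.append(str(carry))       # leftover carry becomes a new digit
    return "".join(reversed(digits))

print(standard_addition("487", "365"))  # 852
```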
The standard algorithm is the only algorithm required for students to learn, although others are mentioned beginning with the first grade standards. Curiously, though, CCSSM doesn’t require students to know the standard algorithms for addition and subtraction until fourth grade. This opens the door for a lot of wasted time. Garelick questioned the wisdom of teaching several alternative strategies for addition. He asked whether, under the Common Core, only the standard algorithm could be taught—or at least, could it be taught first. As he explains:
Delaying teaching of the standard algorithm until fourth grade and relying on place value “strategies” and drawings to add numbers is thought to provide students with the conceptual understanding of adding and subtracting multi-digit numbers. What happens, instead, is that the means to help learn, explain or memorize the procedure become a procedure unto itself and students are required to use inefficient cumbersome methods for two years. This is done in the belief that the alternative approaches confer understanding, so are superior to the standard algorithm. To teach the standard algorithm first would in reformers’ minds be rote learning. Reformers believe that by having students using strategies in lieu of the standard algorithm, students are still learning “skills” (albeit inefficient and confusing ones), and these skills support understanding of the standard algorithm. Students are left with a panoply of methods (praised as a good thing because students should have more than one way to solve problems), that confuse more than enlighten.
Zimba responded that the standard algorithm could, indeed, be the only method taught because it meets a crucial test: reinforcing knowledge of place value and the properties of operations. He goes on to say that other algorithms also may be taught that are consistent with the standards, but that the decision to do so is left in the hands of local educators and curriculum designers:
In short, the Common Core requires the standard algorithm; additional algorithms aren’t named, and they aren’t required…Standards can’t settle every disagreement—nor should they. As this discussion of just a single slice of the math curriculum illustrates, teachers and curriculum authors following the standards still may, and still must, make an enormous range of decisions.
Zimba defends delaying mastery of the standard algorithm until fourth grade, referring to it as a “culminating” standard that he would, if he were teaching, introduce in earlier grades. Zimba illustrates the curricular progression he would employ in a table, showing that he would introduce the standard algorithm for addition late in first grade (with twodigit addends) and then extend the complexity of its use and provide practice towards fluency until reaching the culminating standard in fourth grade. Zimba would introduce the subtraction algorithm in second grade and similarly ramp up its complexity until fourth grade.
It is important to note that in CCSSM the word “algorithm” appears for the first time (in plural form) in the third grade standards:
3.NBT.2 Fluently add and subtract within 1000 using strategies and algorithms based on place value, properties of operations, and/or the relationship between addition and subtraction.
The term “strategies and algorithms” is curious. Zimba explains, “It is true that the word ‘algorithms’ here is plural, but that could be read as simply leaving more choice in the hands of the teacher about which algorithm(s) to teach—not as a requirement for each student to learn two or more general algorithms for each operation!”
I have described before the “dog whistles” embedded in the Common Core, signals to educational progressives—in this case, math reformers—that despite these being standards, the CCSSM will allow them great latitude. Using the plural “algorithms” in this third grade standard and not specifying the standard algorithm until fourth grade is a perfect example of such a dog whistle.
It appears that the Common Core authors wanted to reach a political compromise on standard algorithms.
Standard algorithms were a key point of contention in the “Math Wars” of the 1990s. The 1997 California Framework for Mathematics required that students know the standard algorithms for all four operations—addition, subtraction, multiplication, and division—by the end of fourth grade.^{[ii]} The 2000 Massachusetts Mathematics Curriculum Framework called for learning the standard algorithms for addition and subtraction by the end of second grade and for multiplication and division by the end of fourth grade. These two frameworks were heavily influenced by mathematicians (from Stanford in California and Harvard in Massachusetts) and quickly became favorites of math traditionalists. In both states’ frameworks, the standard algorithm requirements were in direct opposition to the reformoriented frameworks that preceded them—in which standard algorithms were barely mentioned and alternative algorithms or “strategies” were encouraged.
Now that the CCSSM has replaced these two frameworks, the requirement for knowing the standard algorithms in California and Massachusetts slips from third or fourth grade all the way to sixth grade. That’s what reformers get in the compromise. They are given a green light to continue teaching alternative algorithms, as long as the algorithms are consistent with teaching place value and properties of arithmetic. But the standard algorithm is the only one students are required to learn. And that exclusivity is intended to please the traditionalists.
I agree with Garelick that the compromise leads to problems. In a 2013 Chalkboard post, I described a first grade math program in which parents were explicitly requested not to teach the standard algorithm for addition when helping their children at home. The students were being taught how to represent addition with drawings that clustered objects into groups of ten. The exercises were both time consuming and tedious. When the parents met with the school principal to discuss the matter, the principal told them that the math program was following the Common Core by promoting deeper learning. The parents withdrew their child from the school and enrolled him in private school.
The value of standard algorithms is that they are efficient and packed with mathematics. Once students have mastered singledigit operations and the meaning of place value, the standard algorithms reveal to students that they can take procedures that they already know work well with one and twodigit numbers, and by applying them over and over again, solve problems with large numbers. Traditionalists and reformers have different goals. Reformers believe exposure to several algorithms encourages flexible thinking and the ability to draw on multiple strategies for solving problems. Traditionalists believe that a bigger problem than students learning too few algorithms is that too few students learn even one algorithm.
I have been a critic of the math reform movement since I taught in the 1980s. But some of their complaints have merit. All too often, instruction on standard algorithms has left out meaning. As Karen C. Fuson and Sybilla Beckmann point out, “an unfortunate dichotomy” emerged in math instruction: teachers taught “strategies” that implied understanding and “algorithms” that implied procedural steps that were to be memorized. Michael Battista’s research has provided many instances of students clinging to algorithms without understanding. He gives an example of a student who has not quite mastered the standard algorithm for addition and makes numerous errors on a worksheet. On one item, for example, the student forgets to carry and calculates that 19 + 6 = 15. In a postworksheet interview, the student counts 6 units from 19 and arrives at 25. Despite the obvious discrepancy—(25 is not 15, the student agrees)—he declares that his answers on the worksheet must be correct because the algorithm he used “always works.”^{[iii]}^{ }
Math reformers rightfully argue that blind faith in procedure has no place in a thinking mathematical classroom. Who can disagree with that? Students should be able to evaluate the validity of answers, regardless of the procedures used, and propose alternative solutions. Standard algorithms are tools to help them do that, but students must be able to apply them, not in a robotic way, but with understanding.
Let’s return to Carroll’s model of time and learning. I conclude by making two points—one about curriculum and instruction, the other about implementation.
In the study of numbers, a coherent K12 math curriculum, similar to that of the previous California and Massachusetts frameworks, can be sketched in a few short sentences. Addition with whole numbers (including the standard algorithm) is taught in first grade, subtraction in second grade, multiplication in third grade, and division in fourth grade. Thus, the study of whole number arithmetic is completed by the end of fourth grade. Grades five through seven focus on rational numbers (fractions, decimals, percentages), and grades eight through twelve study advanced mathematics. Proficiency is sought along three dimensions: 1) fluency with calculations, 2) conceptual understanding, 3) ability to solve problems.
Placing the CCSSM standard for knowing the standard algorithms of addition and subtraction in fourth grade delays this progression by two years. Placing the standard for the division algorithm in sixth grade continues the twoyear delay. For many fourth graders, time spent working on addition and subtraction will be wasted time. They already have a firm understanding of addition and subtraction. The same thing for many sixth graders—time devoted to the division algorithm will be wasted time that should be devoted to the study of rational numbers. The numerator in Carroll’s instructional time model will be greater than the denominator, indicating the inefficient allocation of time to instruction.
As Jason Zimba points out, not everyone agrees on when the standard algorithms should be taught, the alternative algorithms that should be taught, the manner in which any algorithm should be taught, or the amount of instructional time that should be spent on computational procedures. Such decisions are made by local educators. Variation in these decisions will introduce variation in the implementation of the math standards. It is true that standards, any standards, cannot control implementation, especially the twists and turns in how they are interpreted by educators and brought to life in classroom instruction. But in this case, the standards themselves are responsible for the myriad approaches, many unproductive, that we are sure to see as schools teach various algorithms under the Common Core.
[i] Tracking, ability grouping, differentiated learning, programmed learning, individualized instruction, and personalized learning (including today’s flipped classrooms) are all attempts to solve the challenge of student heterogeneity.
[ii] An earlier version of this post incorrectly stated that the California framework required that students know the standard algorithms for all four operations by the end of third grade. I regret the error.
[iii] Michael T. Battista (2001). “Research and Reform in Mathematics Education,” pp. 3284 in The Great Curriculum Debate: How Should We Teach Reading and Math? (T. Loveless, ed., Brookings Instiution Press).
This is part two of my analysis of instruction and Common Core’s implementation. I dubbed the three-part examination of instruction “The Good, The Bad, and the Ugly.” Having discussed “the good” in part one, I now turn to “the bad.” One particular aspect of the Common Core math standards—the treatment of standard algorithms in whole number arithmetic—will lead some teachers to waste instructional time.
In 1963, psychologist John B. Carroll published a short essay, “A Model of School Learning” in Teachers College Record. Carroll proposed a parsimonious model of learning that expressed the degree of learning (or what today is commonly called achievement) as a function of the ratio of time spent on learning to the time needed to learn.
The numerator, time spent learning, has also been given the term opportunity to learn. The denominator, time needed to learn, is synonymous with student aptitude. By expressing aptitude as time needed to learn, Carroll refreshingly broke through his era’s debate about the origins of intelligence (nature vs. nurture) and the vocabulary that labels students as having more or less intelligence. He also spoke directly to a primary challenge of teaching: how to effectively produce learning in classrooms populated by students needing vastly different amounts of time to learn the exact same content.^{[i]}^{ }
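Carroll's ratio lends itself to a one-line sketch. The function below is my own illustrative rendering, not Carroll's notation; in particular, the cap at 1.0 is an assumption of mine, reading the model as saying that extra time beyond what a student needs cannot raise learning above complete mastery.

```python
def degree_of_learning(time_spent: float, time_needed: float) -> float:
    """Carroll's model: degree of learning as the ratio of time spent
    on learning to time needed to learn.

    The cap at 1.0 is an illustrative assumption: once a student has
    had all the time he or she needs, additional time cannot raise
    the degree of learning any further.
    """
    if time_needed <= 0:
        raise ValueError("time_needed must be positive")
    return min(time_spent / time_needed, 1.0)

# A student who needs 10 hours but receives 6 reaches 0.6 of full learning.
print(degree_of_learning(6, 10))   # 0.6
# A student who needs 4 hours but sits through 10 is done after 4;
# the remaining 6 hours are wasted instructional time.
print(degree_of_learning(10, 4))   # 1.0
```

The second case, where the numerator exceeds the denominator, is the situation this post is concerned with.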
The source of that variation is largely irrelevant to the constraints placed on instructional decisions. Teachers obviously have limited control over the denominator of the ratio (they must take kids as they are) and less than one might think over the numerator. Teachers allot time to instruction only after educational authorities have decided the number of hours in the school day, the number of days in the school year, the number of minutes in class periods in middle and high schools, and the amount of time set aside for lunch, recess, passing periods, various pullout programs, pep rallies, and the like. There are also announcements over the PA system, stray dogs that may wander into the classroom, and other unscheduled encroachments on instructional time.
The model has had a profound influence on educational thought. As of July 5, 2015, Google Scholar reported 2,931 citations of Carroll’s article. Benjamin Bloom’s “mastery learning” was deeply influenced by Carroll. It is predicated on the idea that optimal learning occurs when time spent on learning—rather than content—is allowed to vary, providing to each student the individual amount of time he or she needs to learn a common curriculum. This is often referred to as “students working at their own pace,” and progress is measured by mastery of content rather than seat time. David C. Berliner’s 1990 discussion of time includes an analysis of mediating variables in the numerator of Carroll’s model, including the amount of time students are willing to spend on learning. Carroll called this persistence, and Berliner links the construct to student engagement and time on task—topics of keen interest to researchers today. Berliner notes that although both are typically described in terms of motivation, they can be measured empirically in increments of time.
Most applications of Carroll’s model have been interested in what happens when insufficient time is provided for learning—in other words, when the numerator of the ratio is significantly less than the denominator. When that happens, students don’t have an adequate opportunity to learn. They need more time.
As applied to Common Core and instruction, one should also be aware of problems that arise from the inefficient distribution of time. Time is a limited resource that teachers deploy in the production of learning. Below I discuss instances when the CCSSM may lead to the numerator in Carroll’s model being significantly larger than the denominator—when teachers spend more time teaching a concept or skill than is necessary. Because time is limited and fixed, wasted time on one topic will shorten the amount of time available to teach other topics. Excessive instructional time may also negatively affect student engagement. Students who have fully learned content that continues to be taught may become bored; they must endure instruction that they do not need.
Jason Zimba, one of the lead authors of the Common Core Math standards, and Barry Garelick, a critic of the standards, had a recent, interesting exchange about when standard algorithms are called for in the CCSSM. A standard algorithm is a series of steps designed to compute accurately and quickly. In the U.S., students are typically taught the standard algorithms of addition, subtraction, multiplication, and division with whole numbers. Most readers of this post will recognize the standard algorithm for addition. It involves lining up two or more multi-digit numbers according to place value, with one number written over the other, and adding the columns from right to left with “carrying” (or regrouping) as needed.
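Those paper-and-pencil steps translate directly into code. The sketch below is my own minimal rendering of the familiar right-to-left, carry-as-needed procedure, offered only to make the steps concrete; it is not anything prescribed by CCSSM.

```python
def standard_addition(a: int, b: int) -> int:
    """A minimal sketch of the standard U.S. addition algorithm.

    Each number is split into digits (ones digit first), and the
    columns are added right to left, with a carry (regrouping)
    whenever a column sums to 10 or more -- the same steps a student
    performs on paper with the numbers stacked by place value.
    """
    digits_a = [int(d) for d in str(a)][::-1]
    digits_b = [int(d) for d in str(b)][::-1]
    result, carry = [], 0
    for i in range(max(len(digits_a), len(digits_b))):
        da = digits_a[i] if i < len(digits_a) else 0
        db = digits_b[i] if i < len(digits_b) else 0
        column = da + db + carry
        result.append(column % 10)  # the digit written below the column
        carry = column // 10        # the digit "carried" to the next column
    if carry:
        result.append(carry)
    return int("".join(str(d) for d in reversed(result)))

print(standard_addition(478, 356))  # 834
print(standard_addition(999, 1))    # 1000
```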
The standard algorithm is the only algorithm required for students to learn, although others are mentioned beginning with the first grade standards. Curiously, though, CCSSM doesn’t require students to know the standard algorithms for addition and subtraction until fourth grade. This opens the door for a lot of wasted time. Garelick questioned the wisdom of teaching several alternative strategies for addition. He asked whether, under the Common Core, only the standard algorithm could be taught—or, at least, whether it could be taught first. As he explains:
Delaying teaching of the standard algorithm until fourth grade and relying on place value “strategies” and drawings to add numbers is thought to provide students with the conceptual understanding of adding and subtracting multi-digit numbers. What happens, instead, is that the means to help learn, explain or memorize the procedure become a procedure unto itself and students are required to use inefficient cumbersome methods for two years. This is done in the belief that the alternative approaches confer understanding, so are superior to the standard algorithm. To teach the standard algorithm first would in reformers’ minds be rote learning. Reformers believe that by having students using strategies in lieu of the standard algorithm, students are still learning “skills” (albeit inefficient and confusing ones), and these skills support understanding of the standard algorithm. Students are left with a panoply of methods (praised as a good thing because students should have more than one way to solve problems), that confuse more than enlighten.
Zimba responded that the standard algorithm could, indeed, be the only method taught because it meets a crucial test: reinforcing knowledge of place value and the properties of operations. He goes on to say that other algorithms also may be taught that are consistent with the standards, but that the decision to do so is left in the hands of local educators and curriculum designers:
In short, the Common Core requires the standard algorithm; additional algorithms aren’t named, and they aren’t required…Standards can’t settle every disagreement—nor should they. As this discussion of just a single slice of the math curriculum illustrates, teachers and curriculum authors following the standards still may, and still must, make an enormous range of decisions.
Zimba defends delaying mastery of the standard algorithm until fourth grade, referring to it as a “culminating” standard that he would, if he were teaching, introduce in earlier grades. Zimba illustrates the curricular progression he would employ in a table, showing that he would introduce the standard algorithm for addition late in first grade (with two-digit addends) and then extend the complexity of its use and provide practice towards fluency until reaching the culminating standard in fourth grade. Zimba would introduce the subtraction algorithm in second grade and similarly ramp up its complexity until fourth grade.
It is important to note that in CCSSM the word “algorithm” appears for the first time (in plural form) in the third grade standards:
3.NBT.2 Fluently add and subtract within 1000 using strategies and algorithms based on place value, properties of operations, and/or the relationship between addition and subtraction.
The term “strategies and algorithms” is curious. Zimba explains, “It is true that the word ‘algorithms’ here is plural, but that could be read as simply leaving more choice in the hands of the teacher about which algorithm(s) to teach—not as a requirement for each student to learn two or more general algorithms for each operation!”
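For contrast, here is a sketch of one place-value “strategy” a teacher might choose under 3.NBT.2: decompose each addend by place, add the parts, then recombine. This is a hypothetical illustration of the kind of latitude Zimba describes, not a method the standards name.

```python
def add_by_place_value(a: int, b: int) -> int:
    """One place-value "strategy": decompose each addend by place,
    add the parts, then recombine.

    For example, 478 + 356 = (400 + 300) + (70 + 50) + (8 + 6)
                           = 700 + 120 + 14 = 834.
    Hypothetical illustration only; CCSSM names no particular strategy.
    """
    total, place = 0, 1
    while a > 0 or b > 0:
        total += (a % 10 + b % 10) * place  # add the parts for this place
        a, b = a // 10, b // 10
        place *= 10
    return total

print(add_by_place_value(478, 356))  # 834
print(add_by_place_value(19, 6))     # 25
```

The result matches the standard algorithm, of course; the dispute is over which procedure students should spend their limited time practicing.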
I have described before the “dog whistles” embedded in the Common Core, signals to educational progressives—in this case, math reformers—that despite these being standards, the CCSSM will allow them great latitude. Using the plural “algorithms” in this third grade standard and not specifying the standard algorithm until fourth grade is a perfect example of such a dog whistle.
It appears that the Common Core authors wanted to reach a political compromise on standard algorithms.
Standard algorithms were a key point of contention in the “Math Wars” of the 1990s. The 1997 California Framework for Mathematics required that students know the standard algorithms for all four operations—addition, subtraction, multiplication, and division—by the end of fourth grade.^{[ii]} The 2000 Massachusetts Mathematics Curriculum Framework called for learning the standard algorithms for addition and subtraction by the end of second grade and for multiplication and division by the end of fourth grade. These two frameworks were heavily influenced by mathematicians (from Stanford in California and Harvard in Massachusetts) and quickly became favorites of math traditionalists. In both states’ frameworks, the standard algorithm requirements were in direct opposition to the reform-oriented frameworks that preceded them—in which standard algorithms were barely mentioned and alternative algorithms or “strategies” were encouraged.
Now that the CCSSM has replaced these two frameworks, the requirement for knowing the standard algorithms in California and Massachusetts slips from third or fourth grade all the way to sixth grade. That’s what reformers get in the compromise. They are given a green light to continue teaching alternative algorithms, as long as the algorithms are consistent with teaching place value and properties of arithmetic. But the standard algorithm is the only one students are required to learn. And that exclusivity is intended to please the traditionalists.
I agree with Garelick that the compromise leads to problems. In a 2013 Chalkboard post, I described a first grade math program in which parents were explicitly requested not to teach the standard algorithm for addition when helping their children at home. The students were being taught how to represent addition with drawings that clustered objects into groups of ten. The exercises were both time-consuming and tedious. When the parents met with the school principal to discuss the matter, the principal told them that the math program was following the Common Core by promoting deeper learning. The parents withdrew their child from the school and enrolled him in private school.
The value of standard algorithms is that they are efficient and packed with mathematics. Once students have mastered single-digit operations and the meaning of place value, the standard algorithms reveal to students that they can take procedures that they already know work well with one- and two-digit numbers, and by applying them over and over again, solve problems with large numbers. Traditionalists and reformers have different goals. Reformers believe exposure to several algorithms encourages flexible thinking and the ability to draw on multiple strategies for solving problems. Traditionalists believe that a bigger problem than students learning too few algorithms is that too few students learn even one algorithm.
I have been a critic of the math reform movement since I taught in the 1980s. But some of their complaints have merit. All too often, instruction on standard algorithms has left out meaning. As Karen C. Fuson and Sybilla Beckmann point out, “an unfortunate dichotomy” emerged in math instruction: teachers taught “strategies” that implied understanding and “algorithms” that implied procedural steps that were to be memorized. Michael Battista’s research has provided many instances of students clinging to algorithms without understanding. He gives an example of a student who has not quite mastered the standard algorithm for addition and makes numerous errors on a worksheet. On one item, for example, the student forgets to carry and calculates that 19 + 6 = 15. In a post-worksheet interview, the student counts 6 units from 19 and arrives at 25. Despite the obvious discrepancy (25 is not 15, the student agrees), he declares that his answers on the worksheet must be correct because the algorithm he used “always works.”^{[iii]}
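Battista's example can even be written down as a “buggy algorithm.” The sketch below is my own, purely illustrative model of the student's mistake: each column is summed, but the carry is dropped, reproducing 19 + 6 = 15.

```python
def addition_forgetting_carry(a: int, b: int) -> int:
    """A model of the student's error: each column is summed, but the
    carry is dropped. For 19 + 6, the ones column gives 9 + 6 = 15,
    the 5 is kept and the 1 forgotten, yielding 15 instead of 25.
    Purely illustrative; this models a mistake, not any curriculum.
    """
    result, place = 0, 1
    while a > 0 or b > 0:
        column = a % 10 + b % 10
        result += (column % 10) * place  # keep the digit, forget the carry
        a, b = a // 10, b // 10
        place *= 10
    return result

print(addition_forgetting_carry(19, 6))  # 15, not 25
```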
Math reformers rightfully argue that blind faith in procedure has no place in a thinking mathematical classroom. Who can disagree with that? Students should be able to evaluate the validity of answers, regardless of the procedures used, and propose alternative solutions. Standard algorithms are tools to help them do that, but students must be able to apply them, not in a robotic way, but with understanding.
Let’s return to Carroll’s model of time and learning. I conclude by making two points—one about curriculum and instruction, the other about implementation.
In the study of numbers, a coherent K-12 math curriculum, similar to that of the previous California and Massachusetts frameworks, can be sketched in a few short sentences. Addition with whole numbers (including the standard algorithm) is taught in first grade, subtraction in second grade, multiplication in third grade, and division in fourth grade. Thus, the study of whole number arithmetic is completed by the end of fourth grade. Grades five through seven focus on rational numbers (fractions, decimals, percentages), and grades eight through twelve study advanced mathematics. Proficiency is sought along three dimensions: 1) fluency with calculations, 2) conceptual understanding, and 3) the ability to solve problems.
Placing the CCSSM standard for knowing the standard algorithms of addition and subtraction in fourth grade delays this progression by two years. Placing the standard for the division algorithm in sixth grade continues the two-year delay. For many fourth graders, time spent working on addition and subtraction will be wasted time. They already have a firm understanding of addition and subtraction. The same is true for many sixth graders—time devoted to the division algorithm will be wasted time that should be devoted to the study of rational numbers. The numerator in Carroll’s instructional time model will be greater than the denominator, indicating the inefficient allocation of time to instruction.
As Jason Zimba points out, not everyone agrees on when the standard algorithms should be taught, the alternative algorithms that should be taught, the manner in which any algorithm should be taught, or the amount of instructional time that should be spent on computational procedures. Such decisions are made by local educators. Variation in these decisions will introduce variation in the implementation of the math standards. It is true that standards, any standards, cannot control implementation, especially the twists and turns in how they are interpreted by educators and brought to life in classroom instruction. But in this case, the standards themselves are responsible for the myriad approaches, many unproductive, that we are sure to see as schools teach various algorithms under the Common Core.
[i] Tracking, ability grouping, differentiated learning, programmed learning, individualized instruction, and personalized learning (including today’s flipped classrooms) are all attempts to solve the challenge of student heterogeneity.
[ii] An earlier version of this post incorrectly stated that the California framework required that students know the standard algorithms for all four operations by the end of third grade. I regret the error.
[iii] Michael T. Battista (2001). “Research and Reform in Mathematics Education,” pp. 32–84 in The Great Curriculum Debate: How Should We Teach Reading and Math? (T. Loveless, ed., Brookings Institution Press).
Efforts are moving ahead to reauthorize the Elementary and Secondary Education Act (ESEA), and its broad parameters are becoming clearer. The bill is likely to keep an annual testing requirement, but change accountability structures created under No Child Left Behind (NCLB), allowing states to create their own. But, like NCLB, the new bill will shortchange older students and those at risk of dropping out of school. The Alliance for Excellent Education has called this phenomenon the “missing middle.” Federal education spending is U-shaped, with large amounts spent on programs for young students and large amounts spent on college students in the form of student financial aid. The middle is missing, especially for students in secondary schools.
Recently, the press has heralded increases in the rate of high school students graduating on time; it’s now above 80 percent. The rising on-time graduation rate is welcome news, and perhaps NCLB played a role. But 20 percent of students still do not graduate on time. Imagine if one in five new cell phones didn’t work. Policymakers would be deluged by complaints, and critics would be decrying the cavalier way the industry treats its customers.
Some students who do not graduate on time may ultimately graduate, or receive a General Educational Development (GED) certificate. But many never graduate. The National Center for Education Statistics reports that there were about 14.5 million high school students in 2012, and that four percent of them stopped attending high school that year. Four percent seems small, but the base is large; together they imply that about 580,000 students stopped attending school. They dropped out. To put this number in perspective, there were 584,000 deaths from cancer last year in the U.S.
Dropping out is a stubborn problem. Even as the number of teen mothers has declined sharply, and juvenile arrests likewise have declined, the dropout rate—the percent of students who stop attending school in a year—has only fallen gradually from six percent to four percent in the last forty years. And the dropout rate is not equal for all students. In 2009, it was two percent for white students, five percent for black students, and six percent for Hispanic students. It was 1.4 percent for students from highincome households and 7.4 percent—five times larger—for students from lowincome households.
The American economy over the past three decades has moved toward high-skill jobs. There is no sign that this trend is abating. Yet more than half a million young people will stop attending high school this year without a diploma. They are not ready for college or a career and face a rough road ahead for jobs and earnings. Based on research by Cecilia Rouse, the National Center for Education Statistics estimated that dropouts will earn $630,000 less than graduates over their working lives. So, every year, the economy loses more than $300 billion in foregone earnings. Even in a big country, this is a big number.
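The rough arithmetic behind that figure, using the numbers above (about 580,000 dropouts a year and a $630,000 lifetime earnings gap per dropout):

```python
# Back-of-the-envelope check of the foregone-earnings figure.
dropouts_per_year = 580_000          # estimated from the NCES figures above
earnings_gap_per_dropout = 630_000   # NCES estimate based on Rouse's research

forgone_earnings = dropouts_per_year * earnings_gap_per_dropout
print(f"${forgone_earnings / 1e9:.0f} billion")  # $365 billion
```

This is a lifetime loss attributed to each annual cohort of dropouts, not an annual cash flow, but it puts the scale of the problem in perspective.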
And dropouts are more likely to receive public assistance from the alphabet soup of programs: TANF, SNAP, WIC, CHIP, and so on. They also are more likely to end up in jail or prison, which is hugely expensive. Texas alone spent more than $3 billion on prisons in 2010.
The one-two combination of lower earnings and more public assistance means it is very much in the public interest to reduce dropping out. The federal government began actively studying programs to lower the dropout rate starting with the Hawkins-Stafford bill in 1988. Those research efforts wrapped up in 1998 with the release of two reports prepared by Mathematica Policy Research for the U.S. Department of Education (here and here). The reports summarized findings from rigorous studies in 21 school districts (I was the principal investigator for these studies). Not many programs improved outcomes. It was a starting point, and clearly there was more to learn.
The federal government followed that research with, well, almost nothing. The What Works Clearinghouse did release a practice guide about dropout prevention in 2008, but most of the research that the guide reviewed was from the nineties. The WWC has reviewed research for 30 dropout-prevention programs. Many had been studied as part of the same federal evaluation, and for the others, the reviews found that there is not enough research meeting standards to know whether the programs were effective.
More recently, the National Center for Education Statistics surveyed school districts about their dropout prevention programs. It was the first survey NCES had conducted of dropout prevention programs. (The Government Accountability Office surveyed programs in 1987.) A lot was learned from the NCES survey, including that nearly every school district was providing services and programs for its students at risk of dropping out. For example, 99 percent of large districts (ones that enroll more than 10,000 students), offered alternative high school programs.
Though not reported in the NCES survey, it’s evident that districts are funding dropout prevention services with their own resources or state resources. Not much ESEA Title I money reaches secondary schools—only 32 percent of secondary schools get any Title I funds, and they get about $500 per low-income student. Direct federal spending on dropout prevention is only about $5 million a year. That’s less than ten dollars per dropout.
“Early intervention” is the idea that preventing problems can be more effective than treating them. Following that logic might suggest focusing on pre-K and elementary schools so that fewer students struggle in middle and high schools, which can lead to dropping out. But early intervention spends money on students who may never develop problems. Waiting until issues develop can be cost-effective because it becomes evident which students need help. Not many first graders are experimenting with drugs or being arrested for delinquency, but when it happens with an eighth grader, the problem is clear.
Spending more money on preK or elementary schools may be desirable for other reasons, but if reducing the dropout rate is the focus, middle schools are the key. Students who enter high school poorly prepared and with bad habits—missing a lot of school, acting out, failing subjects—are known to have a high likelihood of dropping out. The Bush Institute currently is spearheading an initiative to promote middle school reform with the University of Texas. The Johns Hopkins University continues to refine its ‘Talent Development’ model for middle schools, but, overall, there are not many efforts related to improving middle schools in order to reduce dropping out. The National Dropout Prevention Network at Clemson University has accumulated other resources and links on what states and districts are doing to reduce the dropout rate.
The lack of federal attention to dropping out is highlighted by the “here today, gone tomorrow” nature of early warning systems. Early warning systems are databases that flag students showing signs of difficulty in school. Using early warning systems allows resources to be targeted to students who need help. In 2008, drafts of a reauthorized ESEA included support for districts to develop earlywarning systems. Recent drafts of ESEA do not mention them at all. Where did they go? The federal government already has granted more than $600 million to states to support their efforts to develop student databases. It would be straightforward for it to provide additional impetus to refine those systems so that they provide early warning capabilities.
Graduating from high school is a winwin opportunity to improve earnings and reduce public assistance. There are ample reasons for the federal government to play a role and not leave it up to states and localities to help students graduate or to study ways to promote graduation. With $300 billion a year on the table, even small reductions in the dropout rate are likely to pay handsome returns.
Of course, there are priorities in policymaking, but reducing dropping out should be one of them. Senator Warren introduced an amendment to ESEA that would direct additional resources to high schools with very high dropout rates. Focusing on middle schools will be a better use of funds.
Efforts are moving ahead to reauthorize the Elementary and Secondary Education Act (ESEA), and its broad parameters are becoming clearer. The bill is likely to keep an annual testing requirement but change the accountability structures created under No Child Left Behind (NCLB), allowing states to create their own. But, like NCLB, the new bill will shortchange older students and those at risk of dropping out of school. The Alliance for Excellent Education has called this phenomenon the “missing middle.” Federal education spending is U-shaped, with large amounts spent on programs for young students and large amounts spent on college students in the form of student financial aid. The middle is missing, especially for students in secondary schools.
Recently, the press has heralded increases in the rate of high school students graduating on time; it’s now above 80 percent. The rising on-time graduation rate is welcome news, and perhaps NCLB played a role. But 20 percent of students still do not graduate on time. Imagine if one in five new cell phones didn’t work. Policymakers would be deluged by complaints, and critics would decry the cavalier way the industry treats its customers.
Some students who do not graduate on time may ultimately graduate or receive a General Educational Development (GED) certificate. But many never graduate. The National Center for Education Statistics reports that there were about 14.5 million high school students in 2012 and that four percent of them stopped attending high school that year. Four percent seems small, but the base is large; together they imply that about 580,000 students stopped attending school. They dropped out. To put this number in perspective, there were 584,000 deaths from cancer last year in the U.S.
Dropping out is a stubborn problem. Even as the number of teen mothers has declined sharply, and juvenile arrests likewise have declined, the dropout rate—the percent of students who stop attending school in a year—has fallen only gradually, from six percent to four percent, over the last forty years. And the dropout rate is not equal for all students. In 2009, it was two percent for white students, five percent for black students, and six percent for Hispanic students. It was 1.4 percent for students from high-income households and 7.4 percent—more than five times as large—for students from low-income households.
The American economy over the past three decades has moved toward high-skill jobs. There is no sign that this trend is abating. Yet more than half a million young people will stop attending high school this year without a diploma. They are not ready for college or a career and face a rough road ahead for jobs and earnings. Based on research by Cecilia Rouse, the National Center for Education Statistics estimated that dropouts will earn $630,000 less than graduates over their working lives. So, every year, the economy loses more than $300 billion in foregone earnings. Even in a big country, this is a big number.
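The magnitudes here follow from simple arithmetic on the figures reported above. A quick back-of-the-envelope sketch (not NCES’s actual computation):

```python
# Back-of-the-envelope check of the figures in the text.
students = 14_500_000        # high school students, 2012 (NCES)
dropout_rate = 0.04          # share who stopped attending that year

dropouts = students * dropout_rate        # about 580,000 students
lifetime_gap = 630_000                    # earnings gap per dropout (USD)
annual_loss = dropouts * lifetime_gap     # about $365 billion per cohort
```

Roughly 580,000 dropouts times a $630,000 lifetime earnings gap yields about $365 billion, consistent with the “more than $300 billion” figure.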
And dropouts are more likely to receive public assistance from the alphabet soup of programs: TANF, SNAP, WIC, CHIP, and so on. They also are more likely to end up in jail or prison, which is hugely expensive. Texas alone spent more than $3 billion on prisons in 2010.
The one-two combination of lower earnings and more public assistance means it is very much in the public interest to reduce dropping out. The federal government began actively studying programs to lower the dropout rate with the Hawkins-Stafford bill in 1988. Those research efforts wrapped up in 1998 with the release of two reports prepared by Mathematica Policy Research for the U.S. Department of Education (here and here). The reports summarized findings from rigorous studies in 21 school districts (I was the principal investigator for these studies). Not many programs improved outcomes. It was a starting point, and clearly there was more to learn.
The federal government followed that research with, well, almost nothing. The What Works Clearinghouse did release a practice guide about dropout prevention in 2008, but most of the research the guide reviewed was from the nineties. And the WWC has reviewed research for 30 dropout-prevention programs. Many had been studied as part of the same federal evaluation, and other reviews found that there is not enough research meeting standards to know whether the programs were effective.
More recently, the National Center for Education Statistics surveyed school districts about their dropout prevention programs. It was the first such survey NCES had conducted. (The Government Accountability Office surveyed programs in 1987.) A lot was learned from the NCES survey, including that nearly every school district was providing services and programs for its students at risk of dropping out. For example, 99 percent of large districts (ones that enroll more than 10,000 students) offered alternative high school programs.
Though not reported in the NCES survey, it’s evident that districts are funding dropout prevention services with their own resources or state resources. Not much ESEA Title I money reaches secondary schools—only 32 percent of secondary schools get any Title I funds, and they get about $500 per low-income student. Direct federal spending on dropout prevention is only about $5 million a year. That’s less than ten dollars per dropout.
“Early intervention” is the idea that preventing problems can be more effective than treating them. Following that logic might suggest focusing on pre-K and elementary schools so that fewer students struggle in middle and high school, where struggles can lead to dropping out. But early intervention spends money on students who may never develop problems. Waiting until issues emerge can be cost-effective because it becomes evident which students need help. Not many first graders are experimenting with drugs or being arrested for delinquency, but when it happens with an eighth grader, the problem is clear.
Spending more money on pre-K or elementary schools may be desirable for other reasons, but if reducing the dropout rate is the focus, middle schools are the key. Students who enter high school poorly prepared and with bad habits—missing a lot of school, acting out, failing subjects—are known to have a high likelihood of dropping out. The Bush Institute currently is spearheading an initiative with the University of Texas to promote middle school reform. The Johns Hopkins University continues to refine its ‘Talent Development’ model for middle schools, but, overall, there are not many efforts to improve middle schools as a way to reduce dropping out. The National Dropout Prevention Network at Clemson University has accumulated other resources and links on what states and districts are doing to reduce the dropout rate.
The lack of federal attention to dropping out is highlighted by the “here today, gone tomorrow” nature of early-warning systems. Early-warning systems are databases that flag students showing signs of difficulty in school, allowing resources to be targeted to students who need help. In 2008, drafts of a reauthorized ESEA included support for districts to develop early-warning systems. Recent drafts of ESEA do not mention them at all. Where did they go? The federal government already has granted more than $600 million to states to support their efforts to develop student databases. It would be straightforward to provide additional impetus to refine those systems so that they offer early-warning capabilities.
Graduating from high school is a win-win opportunity: it improves earnings and reduces public assistance. There are ample reasons for the federal government to play a role rather than leave it to states and localities to help students graduate or to study ways to promote graduation. With $300 billion a year on the table, even small reductions in the dropout rate are likely to pay handsome returns.
Of course, there are priorities in policymaking, but reducing dropping out should be one of them. Senator Warren introduced an amendment to ESEA that would direct additional resources to high schools with very high dropout rates. Focusing on middle schools would be a better use of the funds.
Student refusals to take standardized tests surged in New York State this spring, fueled by support from both parent activists and the state teachers’ union. According to the New York Times, the opt-out movement more than doubled the number of students who did not take federally mandated math and English Language Arts (ELA) tests, with 165,000 kids—about one in six—not taking at least one of the tests.
These totals mask enormous variation across communities. According to the Times analysis, barely any students opted out in some school districts, while in other districts a majority of students refused the tests. Are these differences in opt-out rates random, or are they associated with the characteristics of the community served by each district? Do opt-outs tend to be concentrated in relatively affluent districts, or are they most common in schools that have historically performed poorly on state tests?
The data best suited to answer these questions are the student-level test-score and demographic records collected by the New York State Department of Education. They will allow the state education department to conduct fine-grained analyses, such as comparing the characteristics of students who took the tests to those who did not. But while we wait for the state to publish summary information based on those data, some light can be shed on these questions using publicly available data.
My primary data source for the following analysis is a table indicating the number of students who opted out of the math and ELA tests in each school district this spring. The data were compiled by United to Counter the Core, an opt-out advocacy organization, from a combination of media stories, freedom-of-information requests, and reports by administrators, teachers, and parents. Data are available for most but not all districts, and there is likely some misreporting.^{[1]} However, to my knowledge, it is the most comprehensive opt-out dataset currently publicly available.
I merged the opt-out information with the most recent available enrollment and demographic data from the U.S. Department of Education’s Common Core of Data.^{[2]} I use total enrollment in grades 3-8 to estimate the percentage of students who opted out (i.e., the number of opt-outs, which are presumably for tests in grades 3-8, divided by the number of students enrolled in those grades).^{[3]} I also calculate the percentage of students in all grades who were eligible for the federal free or reduced-price lunch program, an indicator of socioeconomic disadvantage. Finally, I obtained average student scores on the New York State tests last spring (2014), before opting out became a widespread phenomenon.^{[4]}
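The rate construction described here (and in notes [2] and [3]) can be sketched in a few lines. The district names, counts, and enrollments below are invented purely for illustration:

```python
# Sketch of the opt-out rate construction. All values are hypothetical.
optout_counts = {
    "District A": {"math": 450, "ela": 460},
    "District B": {"math": 80, "ela": None},   # ELA count not reported
}
enrollment_3_8 = {"District A": 1800, "District B": 2500}  # grades 3-8

def optout_rate(counts, enrolled):
    """Average the math and ELA counts when both are reported; otherwise
    use the single reported subject. Divide by grade 3-8 enrollment."""
    reported = [c for c in counts.values() if c is not None]
    return 100.0 * (sum(reported) / len(reported)) / enrolled

rates = {d: optout_rate(c, enrollment_3_8[d]) for d, c in optout_counts.items()}
# District A: (450 + 460) / 2 / 1800 -> about 25.3%
# District B: 80 / 2500 -> 3.2%
```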
The 648 districts with complete data had an average opt-out rate of 28 percent (rates are averaged across the math and ELA tests). But weighting each district by its enrollment shows that an estimated 21 percent of all students in these districts opted out. The gap between these two numbers implies that larger districts tend to have lower opt-out rates.
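A toy example with made-up numbers shows why a simple district average can exceed the enrollment-weighted rate when larger districts opt out less:

```python
# Three hypothetical districts: the largest has the lowest opt-out rate.
rates = [40.0, 30.0, 10.0]    # opt-out rate per district (%)
sizes = [1000, 2000, 7000]    # grade 3-8 enrollment per district

unweighted = sum(rates) / len(rates)                              # 26.7%
weighted = sum(r * s for r, s in zip(rates, sizes)) / sum(sizes)  # 17.0%
```

The unweighted average treats every district equally; the weighted average counts every student equally, so the big low-opt-out district pulls it down.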
The table below confirms that opt-out rates vary widely across districts: 19 percent of districts had an opt-out rate below 10 percent, 30 percent were in the 10-25 percent range, 38 percent were in the 25-50 percent range, and 13 percent saw a majority of students opt out. Districts with higher opt-out rates tend to serve fewer disadvantaged students and have somewhat higher test scores (which is not surprising, given the correlation between family income and test scores). District size is similar across three of the opt-out rate categories, but districts with the lowest opt-out rates tend to be substantially larger, on average.
District Characteristics, by Opt-Out Rate
Notes: Average test scores are reported in student-level standard deviations.
But no single student characteristic is a perfect predictor of opt-out rates. The figure below plots the opt-out rate and free/reduced lunch percentage of every district. There is a clear association, with more disadvantaged districts having lower opt-out rates, on average, but also a large amount of variation in the opt-out rate among districts with similar shares of students eligible for the subsidized lunch program. Could variation in other factors, such as test-score performance, explain some of those remaining differences?
Relationship between Opting Out and Percent Free/Reduced Lunch
Notes: Line is based on a Lowess smoother.
I address this question using a regression analysis that measures the degree to which percent free/reduced lunch and average test scores are associated with opting out, controlling for the other factor. To make the two relationships comparable, I report the predicted change in opt-out rates associated with a one-standard-deviation change in percent free/reduced lunch or in average test scores. Among the districts in the data, one standard deviation in percent free/reduced lunch is 21 percentage points, and one standard deviation in average test scores is 0.35 student-level standard deviations.
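The standardization step can be illustrated with simulated district data. The data below are generated with the article’s reported magnitudes baked in; this is not the actual New York data, only a sketch of how per-SD coefficients are obtained:

```python
import numpy as np

# Simulated district-level data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 648
frl = rng.normal(50, 21, n)        # percent free/reduced lunch
score = rng.normal(0, 0.35, n)     # average test score, student-level SDs
optout = (28
          - 11 * (frl - frl.mean()) / frl.std()
          - 7 * (score - score.mean()) / score.std()
          + rng.normal(0, 5, n))   # opt-out rate (%), with noise

# Standardize each predictor so its coefficient reads as "percentage-point
# change in the opt-out rate per one-SD change in that predictor."
X = np.column_stack([np.ones(n),
                     (frl - frl.mean()) / frl.std(),
                     (score - score.mean()) / score.std()])
beta, *_ = np.linalg.lstsq(X, optout, rcond=None)
# beta[1]: per-SD effect of the FRL share; beta[2]: per-SD effect of scores
```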
Perhaps the most surprising result of the analysis, reported in the figure below, is that the modest positive correlation between test scores and opting out seen in the table above becomes negative once free/reduced lunch is taken into account.^{[5]} The results indicate that a one-standard-deviation increase in test scores is associated with a seven-percentage-point decline in the opt-out rate. That is a large change, given that the average opt-out rate is 28 percent. Another way to describe the same relationship is that districts with lower scores have higher opt-out rates.
This analysis confirms that districts serving more disadvantaged students have lower opt-out rates, even after test scores are taken into account. A one-standard-deviation increase in the share of students eligible for free/reduced lunch is associated with an 11-percentage-point decrease in the opt-out rate. These relationships are even stronger when districts are weighted in proportion to their enrollment, as shown in the right pair of bars in the figure below. This may be because the relationships are stronger in larger districts, or because opting out is measured with less error in larger districts.
Correlation between District Characteristics and Opt-Out Rates
These findings should be interpreted with two caveats in mind. First, the data are incomplete, preliminary, and likely subject to some reporting errors. Second, the data are at the district level, limiting the ability to make inferences about individual students. For example, just because lower-scoring districts have higher opt-out rates (controlling for free/reduced lunch) does not mean that lower-scoring students are more likely to opt out. It could be the higher-scoring students in those districts who are doing the opting out.
Student-level administrative data collected by the state will ultimately provide conclusive evidence not subject to these limitations, but the district-level data currently available yield two preliminary conclusions. First, relatively affluent districts tend to have higher opt-out rates, with opting out less common in the disadvantaged districts that are often the target of reform efforts. Second, districts with lower test scores have higher opt-out rates after taking socioeconomic status into account. Potential explanations for this pattern include district administrators encouraging opt-outs in order to cover up poor performance, districts focusing on non-tested subjects to satisfy parents who care less about standardized tests, and parents becoming more skeptical of the value of tests when their children do not score well. Rigorous testing of these and alternative explanations for opt-out behavior awaits better data.
[1] The number of opt-outs for math or ELA is available for 649 of the 714 school districts on the United to Counter the Core list, which is listed as current as of May 14, 2015.
[2] Specifically, I use school-level data from 2011-12 and aggregate them to the district level. Note that these data are three years old relative to the current school year, which will introduce some measurement error into the analysis to the extent that district enrollment and demographics have changed over the past three years.
[3] The numbers of math and ELA opt-outs are highly correlated (r = 0.99), so I average them when data for both subjects are present (for districts where opting out is reported for only one subject, I use that data point).
[4] I convert scale scores to student-level standard deviation units using the mean of 300 and standard deviation of 35 reported in the 2013 technical report. I then average the scores across all grades in the district, using the number of students tested in each grade as weights. Math and ELA scores are highly correlated at the district level (r = 0.94), so I average them.
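The conversion in note [4] amounts to a z-score transformation followed by a weighted average. In this sketch the per-grade means and tested counts are invented; only the statewide mean (300) and standard deviation (35) come from the text:

```python
import numpy as np

# Hypothetical per-grade mean scale scores and tested counts for one district.
grade_means = np.array([310.0, 305.0, 298.0])   # mean scale score by grade
n_tested = np.array([120, 115, 130])            # students tested by grade

# Convert to student-level SD units using the statewide mean (300) and
# SD (35), then take the enrollment-weighted average across grades.
z = (grade_means - 300.0) / 35.0
district_score = np.average(z, weights=n_tested)  # about 0.12 SDs
```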
The Best Foot Forward project at the Center for Education Policy Research at Harvard has been investigating the use of digital video to make classroom observations more helpful and fair to teachers and less burdensome for supervisors. In a randomized field trial involving 347 teachers and 108 administrators in Delaware, Georgia, Colorado and Los Angeles, teachers were given a special video camera and invited to collect multiple lessons. They could then choose a subset of their lesson videos to submit for their classroom observations. A secure software platform allowed administrators as well as external observers (selected for their expertise in a teacher’s discipline) to watch the videos and provide timestamped comments aligned to specific moments in the videos.
In addition to giving teachers a reason and an opportunity to watch multiple instances of their own teaching, the videos served as the basis for one-on-one discussions between teachers and administrators and between teachers and the external content experts. The comparison teachers and schools continued to do in-person classroom observations. Although we’re awaiting data from a second year of implementation, we can report five preliminary findings so far:
In sum, the use of teacher-collected video in classroom observations did seem to improve the observation process along a number of dimensions: it boosted teachers’ perception of the fairness of classroom observations, reduced teacher defensiveness during post-observation conferences, led to greater self-perception of the need for behavior change, and allowed administrators to time-shift observation duties to quieter times of the day or week. In coming months, we will provide evidence on whether these apparent improvements in the observation process were sufficient to generate improvements in student achievement.
For more on the project, please see: http://cepr.harvard.edu/bestfootforwardproject.
June 10, 2015
2:00 PM – 3:00 PM EDT
Saul/Zilkha Rooms
Brookings Institution
1775 Massachusetts Avenue NW
Washington, DC 20036
The two most significant laws governing education in this nation, the Elementary and Secondary Education Act (ESEA) and the Higher Education Act (HEA), are both well past due for a congressional update. ESEA was last reauthorized in 2002, with the passage of the No Child Left Behind Act, and the latest iteration of HEA was authorized in 2008. Despite widespread consensus that these laws need to be rewritten, every attempt at reauthorization in recent years has been met with failure. Recent progress suggests that the current Congress may be poised to succeed where others have failed. But can Americans really expect new education laws this year?
On Wednesday, June 10, the Brown Center on Education Policy at Brookings took an insiders’ look at the efforts to reauthorize ESEA and HEA. Expert panelists who occupied key positions as legislative staffers during previous reauthorization attempts provided an honest examination of the process, what led to success or failure in the past, and what’s most likely to lead to success this time around.
Join the conversation on Twitter at #EdBills
This post continues a series begun in 2014 on implementing the Common Core State Standards (CCSS). The first installment introduced an analytical scheme investigating CCSS implementation along four dimensions: curriculum, instruction, assessment, and accountability. Three posts focused on curriculum. This post turns to instruction. Although the impact of CCSS on how teachers teach is discussed, the post is also concerned with the inverse relationship: how decisions that teachers make about instruction shape the implementation of CCSS.
A couple of points before we get started. The previous posts on curriculum led readers from the upper levels of the educational system—federal and state policies—down to curricular decisions made “in the trenches”—in districts, schools, and classrooms. Standards emanate from the top of the system and are produced by politicians, policymakers, and experts. Curricular decisions are shared across education’s systemic levels. Instruction, on the other hand, is dominated by practitioners. The daily decisions that teachers make about how to teach under CCSS—and not the idealizations of instruction embraced by upper-level authorities—will ultimately determine what “CCSS instruction” really means.
I ended the last post on CCSS by describing how curriculum and instruction can be so closely intertwined that the boundary between them is blurred. Sometimes stating a precise curricular objective dictates, or at least constrains, the range of instructional strategies that teachers may consider. That post focused on English Language Arts. The current post focuses on mathematics in the elementary grades and describes examples of how CCSS will shape math instruction. As a former elementary school teacher, I offer my own opinion on these effects.
Certain aspects of the Common Core, when implemented, are likely to have a positive impact on the instruction of mathematics. For example, Common Core stresses that students recognize fractions as numbers on a number line. The emphasis begins in third grade:
CCSS.MATH.CONTENT.3.NF.A.2
Understand a fraction as a number on the number line; represent fractions on a number line diagram.
CCSS.MATH.CONTENT.3.NF.A.2.A
Represent a fraction 1/b on a number line diagram by defining the interval from 0 to 1 as the whole and partitioning it into b equal parts. Recognize that each part has size 1/b and that the endpoint of the part based at 0 locates the number 1/b on the number line.
CCSS.MATH.CONTENT.3.NF.A.2.B
Represent a fraction a/b on a number line diagram by marking off a lengths 1/b from 0. Recognize that the resulting interval has size a/b and that its endpoint locates the number a/b on the number line.
When I first read this section of the Common Core standards, I stood up and cheered. Berkeley mathematician Hung-Hsi Wu has been working with teachers for years to get them to understand the importance of using number lines in teaching fractions.^{[1]} American textbooks rely heavily on part-whole representations to introduce fractions. Typically, students see pizzas, apples, and other objects—usually foods or money—that are divided up into equal parts. Such models are limited. They work well enough for simple addition and subtraction. Common denominators present a bit of a challenge, but ½ of a pizza can be shown to also be 2/4, a half dollar equal to two quarters, and so on.
With multiplication and division, all the little tricks students learned with whole-number arithmetic suddenly go haywire. Students are accustomed to the fact that multiplying two whole numbers yields a product that is larger than either number being multiplied: 4 × 5 = 20, and 20 is larger than both 4 and 5.^{[2]} How in the world can ¼ × 1/5 = 1/20, a number much smaller than either 1/4 or 1/5? The part-whole representation has convinced many students that fractions are not numbers at all. Instead, they are seen as strange expressions comprising two numbers separated by a small horizontal bar.
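The arithmetic behind the surprise is easy to verify with exact rational numbers. A quick sketch, using Python’s standard `fractions` module (the variable names are mine, not from the post):

```python
from fractions import Fraction

# With whole numbers, a product is at least as large as either factor
# (0 and 1 aside): 4 x 5 = 20.
assert 4 * 5 == 20

# With fractions between 0 and 1, the same operation shrinks the result.
product = Fraction(1, 4) * Fraction(1, 5)
assert product == Fraction(1, 20)                            # 1/4 x 1/5 = 1/20
assert product < Fraction(1, 4) and product < Fraction(1, 5)
```

The exact-rational check is the point: a fraction behaves as a single number under arithmetic, not as a pair of whole numbers.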
I taught sixth grade but occasionally visited my colleagues’ classes in the lower grades. I recall one exchange with second or third graders that went something like this:
“Give me a number between seven and nine.” Giggles.
“Eight!” they shouted.
“Give me a number between two and three.” Giggles.
“There isn’t one!” they shouted.
“Really?” I’d ask and draw a number line. After spending some time placing whole numbers on the number line, I’d observe, “There’s a lot of space between two and three. Is it just empty?”
Silence. Puzzled little faces. Then a quiet voice. “Two and a half?”
You have no idea how many children never make the transition to understanding fractions as numbers and, because of stumbling at this crucial stage, spend the rest of their careers as students of mathematics convinced that fractions are an impenetrable mystery. And that’s not true of just students. California adopted a test for teachers in the 1980s, the California Basic Educational Skills Test (CBEST). Beginning in 1982, even teachers already in the classroom had to pass it. I made a nice after-school and summer income tutoring colleagues who didn’t know fractions from Fermat’s Last Theorem. To be fair, primary teachers, teaching kindergarten or grades 1-2, would not teach fractions as part of their math curriculum and probably hadn’t worked with a fraction in decades. So they are no different from non-literary types who think Hamlet is just a play about a young guy who can’t make up his mind, has a weird relationship with his mother, and winds up dying at the end.
Division is the most difficult operation to grasp for those arrested at the part-whole stage of understanding fractions. A problem that Liping Ma posed to teachers is now legendary.^{[3]}
She asked small groups of American and Chinese elementary teachers to divide 1 ¾ by ½ and to create a word problem that illustrates the calculation. All 72 Chinese teachers gave the correct answer and 65 developed an appropriate word problem. Only nine of the 23 American teachers solved the problem correctly. A single American teacher was able to devise an appropriate word problem. Granted, the American sample was not selected to be representative of American teachers as a whole, but the stark findings of the exercise did not shock anyone who has worked closely with elementary teachers in the U.S. They are often weak at math. Many of the teachers in Ma’s study had vague ideas of an “invert and multiply” rule but lacked a conceptual understanding of why it worked.
A linguistic convention exacerbates the difficulty. Students may cling to the mistaken notion that “dividing in half” means “dividing by one-half.” It does not. Dividing in half means dividing by two. The number line can help clear up such confusion. Consider a basic whole-number division problem for which third graders will already know the answer: 8 divided by 2 equals 4. It is evident that a segment 8 units in length (measured from 0 to 8) is divided by a segment 2 units in length (measured from 0 to 2) exactly 4 times. Modeling 12 divided by 2 and other basic facts with 2 as a divisor will convince students that whole-number division works quite well on a number line.
Now consider the number ½ as a divisor. It will become clear to students that 8 divided by ½ equals 16, and they can illustrate that fact on a number line by showing how a segment ½ units in length divides a segment 8 units in length exactly 16 times; it divides a segment 12 units in length 24 times; and so on. Students will be relieved to discover that on a number line division with fractions works the same as division with whole numbers.
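The number-line argument in these two paragraphs amounts to counting how many copies of the divisor segment, laid end to end from 0, cover the dividend segment. A minimal sketch using Python’s standard `fractions` module (the helper name `segments_fit` is my own, introduced only for illustration):

```python
from fractions import Fraction

def segments_fit(dividend, divisor):
    """Count how many divisor-length segments, laid end to end
    from 0, cover a segment of the given length."""
    return Fraction(dividend) / Fraction(divisor)

# Whole-number facts third graders already know:
assert segments_fit(8, 2) == 4     # a 2-unit segment fits into 8 exactly 4 times
assert segments_fit(12, 2) == 6

# The same picture with 1/2 as the divisor:
assert segments_fit(8, Fraction(1, 2)) == 16
assert segments_fit(12, Fraction(1, 2)) == 24
```

The point of the sketch is that nothing changes when the divisor becomes a fraction; the counting interpretation of division carries over intact.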
Now, let’s return to Liping Ma’s problem: 1 ¾ divided by ½. This problem would not be presented in third grade, but it might be in fifth or sixth grade. Students who have been working with fractions on a number line for two or three years will have little trouble solving it. They will see that the problem simply asks them to divide a line segment of 1 ¾ units by a segment of ½ unit. The answer is 3 ½. Some students might estimate that the solution is between 3 and 4 because 1 ¾ lies between 1 ½ and 2, which on the number line are the points at which the ½-unit segment, laid end on end, falls exactly three and four times. Other students will have learned about reciprocals and that multiplication and division are inverse operations. They will immediately grasp that dividing by ½ is the same as multiplying by 2—and since 1 ¾ × 2 = 3 ½, that is the answer. Creating a word problem involving string or rope or some other linearly measured object is also surely within their grasp.
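For readers who want the reciprocal argument written out step by step, the calculation is:

```latex
1\tfrac{3}{4} \div \tfrac{1}{2}
  \;=\; \tfrac{7}{4} \times \tfrac{2}{1}
  \;=\; \tfrac{14}{4}
  \;=\; \tfrac{7}{2}
  \;=\; 3\tfrac{1}{2}
```

This is exactly the “invert and multiply” rule, but grounded in the fact that dividing by a number and multiplying by its reciprocal are the same operation.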
I applaud the CCSS for introducing number lines and fractions in third grade. I believe it will instill in children an important idea: fractions are numbers. That foundational understanding will aid them as they work with more abstract representations of fractions in later grades. Fractions are a monumental barrier for kids who struggle with math, so the significance of this contribution should not be underestimated.
I mentioned above that instruction and curriculum are often intertwined. I began this series of posts by defining curriculum as the “stuff” of learning—the content of what is taught in school, especially as embodied in the materials used in instruction. Instruction refers to the “how” of teaching—how teachers organize, present, and explain those materials. It’s each teacher’s repertoire of instructional strategies and techniques that differentiates one teacher from another even as they teach the same content. Choosing to use a number line to teach fractions is obviously an instructional decision, but it also involves curriculum. The number line is mathematical content, not just a teaching tool.
Guiding third grade teachers toward using a number line does not guarantee effective instruction. In fact, it is reasonable to expect variation in how teachers will implement the CCSS standards listed above. A small body of research exists to guide practice. One of the best resources for teachers to consult is a practice guide published by the What Works Clearinghouse: Developing Effective Fractions Instruction for Kindergarten Through Eighth Grade (see full disclosure below).^{[4]} The guide’s second recommendation is the use of number lines, but it also states that the evidence supporting the effectiveness of number lines in teaching fractions is inferred from studies involving whole numbers and decimals. We need much more research on how and when number lines should be used in teaching fractions.
Professor Wu states: “The shift of emphasis from models of a fraction in the initial stage to an almost exclusive model of a fraction as a point on the number line can be done gradually and gracefully beginning somewhere in grade four. This shift is implicit in the Common Core Standards.”^{[5]} I agree, but the shift is also subtle. The CCSS include the use of other representations—fraction strips, fraction bars, rectangles (which are excellent for showing multiplication of two fractions), and other graphical means of modeling fractions. Some teachers will manage the shift to number lines adroitly—and others will not. As a consequence, the quality of implementation will vary from classroom to classroom based on the instructional decisions that teachers make.
The current post has focused on what I believe to be a positive aspect of CCSS based on the implementation of the standards through instruction. Future posts in the series—covering the “bad” and the “ugly”—will describe aspects of instruction on which I am less optimistic.
[1] See H. Wu (2014). “Teaching Fractions According to the Common Core Standards,” https://math.berkeley.edu/~wu/CCSSFractions_1.pdf. Also see "What's Sophisticated about Elementary Mathematics?" http://www.aft.org/sites/default/files/periodicals/wu_0.pdf
[2] Students learn that 0 and 1 are exceptions and have their own special rules in multiplication.
[3] Liping Ma, Knowing and Teaching Elementary Mathematics.
[4] The practice guide can be found at: http://ies.ed.gov/ncee/wwc/pdf/practice_guides/fractions_pg_093010.pdf I serve as a content expert in elementary mathematics for the What Works Clearinghouse. I had nothing to do, however, with the publication cited.
[5] Wu, page 3.
Current drafts of the reauthorized Elementary and Secondary Education Act (ESEA) fall short of a commitment to use research to improve education. The bills—the “Student Success Act” in the House and the “Every Child Achieves Act” in the Senate—no doubt represent compromises and tradeoffs as any major legislation would. But who is arguing for less research and innovation in education?
Much of what is debated about No Child Left Behind is its accountability structure—annual tests, “adequate yearly progress,” and the goal of moving every student to proficiency by 2014. But another important theme in NCLB was using “scientifically based research.” Its steady drumbeat of “use research, use evidence, use scientific methods” represented an embrace of education research and especially the practice of using causal methods to study program effectiveness. NCLB did not go as far as requiring research evidence as a basis for program funding, and in 2002 not much evidence would have met that standard. Related legislation that year created the Institute of Education Sciences, which followed through on the vision of building and using evidence to improve education.
Evidence from studies showed that some of what had been assumed did not prove to be true. For example, after-school programs did not improve outcomes; using education software to support teaching did not raise test scores; teacher professional development did not raise test scores; voucher programs did not raise test scores; and a range of programs to promote social and emotional learning had little effect on outcomes. It seems like a list of negatives, but evidence is useful one way or the other. And, as Tom Kane has argued, more than 80 percent of clinical trials fail to show effectiveness. Why would education be different? Some things that were not known also became evident: parents and students attending charter schools were more satisfied with the schools, but the schools themselves ranged widely in their effectiveness; math textbooks can affect math skills differently; and NCLB itself raised test scores.
Research is included in the current ESEA drafts, but to no greater extent than it was in NCLB, and in some ways to a lesser extent. The House bill substitutes a new term, “evidence based,” for NCLB’s “scientifically based.” The substitution seems innocuous but could prove problematic because the bill does not further define “evidence.” (NCLB defined “scientifically based” in Title IX.) If a state conducts a survey and finds that many students participating in a program think it is effective, is that evidence the program is effective? Under some definitions, yes. Under others, not so much. Opinions about effects are not the same as measures of effects.
Adding a definition of evidence would clarify what meets it and what does not. There is language in the House bill saying that programs receiving funding under Title II to prepare new teachers need to “reflect evidence-based research, or in the absence of a strong research base, reflect effective strategies in the field, that provide evidence that the program or activity will improve student academic achievement.” So what are “effective strategies in the field”? How is it determined that they improve academic achievement? Would it not be “evidence-based research” that shows the practices led to improvement?
The House and Senate bills also call for evaluations. The Senate bill calls for a national evaluation of its new literacy program, an evaluation of a program that serves students in foster care, and a demonstration program of innovative assessment systems that states can pilot. That demonstration is likely to be evaluated, so it is mentioned here. The House bill calls for an evaluation of the charter school program and the magnet schools program, neither of which is new. It would be the third evaluation of magnet schools.
Of course, ESEA does not have to be directive about what should be studied. It can set aside money that can be used for studies and allow their topics and focus to emerge elsewhere. Both bills include the key clause that funds research and evaluation. It states that the Secretary of Education can set aside up to 0.5 percent of funds, for all titles except the first, to use for evaluation. Using 2014 appropriations, the set-aside amount is roughly $35 million. IES also receives funding to carry out the National Assessment of Educational Progress, to support states’ development of their databases, and for other purposes such as studies of special education. The set-aside is to support studies that relate to ESEA.
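To make the set-aside arithmetic concrete, one can back-solve the funding base those figures imply. A rough sketch (the roughly $7 billion base is implied arithmetic, not a figure taken from the bills):

```python
# Back-solve the funding base implied by the 0.5 percent evaluation set-aside.
set_aside = 35_000_000     # roughly $35 million under 2014 appropriations

# 0.5 percent is 1/200, so the covered funding base is 200x the set-aside.
implied_base = set_aside * 200
assert implied_base == 7_000_000_000   # about $7 billion in covered funds
```

Against a base of that size, the evaluation set-aside is a small fraction of what the programs themselves spend.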
This is not a lot of money for research, for three reasons. The first is that research is a uniquely federal responsibility, not just for education but generally. Fiscal federalism assigns the federal government responsibility for research because states and localities have incentives to underinvest in it: its costs accrue to them while its benefits accrue to everybody. When the federal government invests in research, costs and benefits align.
A second perspective for viewing the educationresearch investment as paltry is to compare it with the federal investment in the National Institutes of Health. In 2014, that investment was about $30 billion. It’s hard to argue that there is “too much” investment in health research. It’s vital to the nation’s population. But it’s also hard to argue that it is hundreds of times more important to invest in health research than education research for America’s lowincome students. Education is vital to the nation’s population too. There is much more federal spending on health than on education, through the Medicare and Medicaid programs, particularly, but that is not a basis for why federal spending for health research is so much larger than education research. The federal role in supporting research is primary regardless of which level of government spends on services.
A third perspective on research funding also points to enlarging it. On its own, the K12 public education system is static. It wants to do the same thing. Taxes flow in, students and teachers come to school in the morning, there are classes and graduation ceremonies and sports events, and it is repeated next year. The system wants to be in equilibrium, and when it is pushed out of equilibrium, it wants to get back to it.
For example, in the past decade, how states and districts evaluate the performance of their teachers has seen rapid changes. Many teachers are now being evaluated partly based on how their students score on tests. But has anything really changed? The new systems replaced previous systems believed to rate teachers too positively. Nearly all teachers were “effective.” After putting new systems in place, Rhode Island reported that 98 percent of its teachers were effective; Florida, 97 percent, New York, 96 percent. The system returned to where it was.
The point is not about teacher evaluation per se. It is that research has the potential to put energy into the system. What is thought to be best practice for teaching reading, or math, or any subject, might change if research shows that new methods improve on current methods. Or a program of social and emotional development might prove effective in reducing student behavior problems. Or an approach to teaching English to nonEnglish speakers might prove effective in promoting their language acquisition and academic achievement. The list is nearly endless.
Of course, research needs to be conducted and disseminated, and its timeframes can seem slow to policymakers. But here is a margin for bringing innovation to a system that does not have much incentive to innovate. It seems at least as useful to push on this margin as it is to study how education is delivered in Finland and Singapore, which happened after those two countries had the top scores on the Programme for International Student Assessment. Their systems might have some attractive features, but generalizing them to a vast country with a heterogeneous population and a highly decentralized education system is problematic. (Tom Loveless has warned of the perils of “edutourism.”)
America is entering a new phase in which the majority of its public school students are from lowincome households. That’s about 25 million students. Suppose that for each of these students, ESEA set aside $10 a year—a dollar for each month they are in school—for federal research to improve education. That’s $250 million. It sounds like a lot of money, but the scale of the K12 enterprise is vast and seemingly large numbers can be misleading. Comparing it to the more than $600 billion we are spending each year for K12 education, it’s fourtenths of a percent.
The draft ESEA legislation will be modified as it moves to the floor of the respective chambers and then to conference. Increasing funds for research could be done two ways. One would be simply to have the setaside apply to all spending under the bill. It then would include Title I, which is larger than all the other titles combined. For the president’s 2015 budget request, the change would increase the amount set aside for research to $90 million. Or the set aside itself could be increased, to, say, three percent. It’s not getting to the onedollaramonth setaside, but it’s something.
In the 13 years since NCLB was passed, we’ve seen more clearly that research is essential to improve education, just as clinical trials are essential to improve health care. A commitment to equity lies at the heart of ESEA, and spending $10 a year on research for each of America’s lowincome students will help meet that commitment.
Current drafts of the reauthorized Elementary and Secondary Education Act (ESEA) fall short of a commitment to use research to improve education. The bills—the “Student Success Act” in the House and the “Every Child Achieves Act” in the Senate—no doubt represent compromises and tradeoffs, as any major legislation would. But who is arguing for less research and innovation in education?
Much of what is debated about No Child Left Behind is its accountability structure—annual tests, “adequate yearly progress,” and the goal of moving every student to proficiency by 2014. But another important theme in NCLB was using “scientifically based research.” Its steady drumbeat of “use research, use evidence, use scientific methods” represented an embrace of education research and especially the practice of using causal methods to study program effectiveness. NCLB did not go as far as requiring research evidence as a basis for program funding, and in 2002 not much evidence would have met that standard. Related legislation that year created the Institute of Education Sciences, which followed through on the vision of building and using evidence to improve education.
Evidence from studies showed that some of what had been thought did not prove to be true. For example, after-school programs did not improve outcomes; using education software to support teaching did not raise test scores; teacher professional development did not raise test scores; voucher programs did not raise test scores; a range of programs to promote social and emotional learning had little effect on outcomes. It seems like a list of negatives, but evidence is useful one way or the other. And, as Tom Kane has argued, more than 80 percent of clinical trials fail to show effectiveness. Why would education be different? And some things that were not known became evident: for example, parents and students attending charter schools were more satisfied with the schools, but the schools themselves ranged widely in their effectiveness; math textbooks can affect math skills differently; and NCLB itself raised test scores.
Research is included in the current ESEA drafts, but to no greater extent than it was for NCLB, and in some ways, it’s to a lesser extent. The House bill substitutes a new term, “evidence-based,” for NCLB’s “scientifically based.” The substitution seems innocuous but could prove problematic because the bill does not further define “evidence.” (NCLB defined “scientifically based” in Title IX.) If a state conducts a survey and finds that many students participating in a program think it is effective, is that evidence the program is effective? Under some definitions, yes. Under others, not so much. Opinions about effects are not the same as measures of effects.
Adding a definition of evidence would clarify what meets it and what does not. There is language in the House bill saying that programs that receive funding under Title II to prepare new teachers need to “reflect evidence-based research, or in the absence of a strong research base, reflect effective strategies in the field, that provide evidence that the program or activity will improve student academic achievement.” So, what are “effective strategies in the field”? How is it determined that they improve academic achievement? Would it not be “evidence-based research” that shows the practices led to improvement?
The House and Senate bills also call for evaluations. The Senate bill calls for a national evaluation of its new literacy program, an evaluation of a program that serves students in foster care, and a demonstration program of innovative assessment systems that states can pilot. That demonstration is likely to be evaluated, so it is mentioned here. The House bill calls for an evaluation of the charter school program and the magnet schools program, neither of which is new. It would be the third evaluation of magnet schools.
Of course, ESEA does not have to be directive about what should be studied. It can set aside money that can be used for studies and allow their topics and focus to emerge elsewhere. Both bills include the key clause that funds research and evaluation. It states that the Secretary of Education can set aside for evaluation up to 0.5 percent of funds under all titles except the first. Using 2014 appropriations, the set-aside amount is roughly $35 million. IES also receives funding to carry out the National Assessment of Educational Progress, to support state development of their databases, and for other purposes such as studies of special education. The $35 million set-aside is what supports studies that relate to ESEA.
This is not a lot of money for research, for three reasons. One is that research is a uniquely federal responsibility, not just for education but generally. Fiscal federalism assigns the federal government responsibility for research because states and localities have incentives to underinvest in it: the costs accrue to the jurisdiction that pays for the research, while the benefits accrue to everybody. When the federal government invests in research, costs and benefits align.
A second perspective for viewing the education-research investment as paltry is to compare it with the federal investment in the National Institutes of Health. In 2014, that investment was about $30 billion. It’s hard to argue that there is “too much” investment in health research; it’s vital to the nation’s population. But it’s also hard to argue that it is hundreds of times more important to invest in health research than in education research for America’s low-income students. Education is vital to the nation’s population too. There is much more federal spending on health than on education, particularly through the Medicare and Medicaid programs, but that is not a basis for why federal spending on health research is so much larger than on education research. The federal role in supporting research is primary regardless of which level of government spends on services.
A third perspective on research funding also points to enlarging it. On its own, the K-12 public education system is static. It wants to do the same thing. Taxes flow in, students and teachers come to school in the morning, there are classes and graduation ceremonies and sports events, and it is repeated next year. The system wants to be in equilibrium, and when it is pushed out of equilibrium, it wants to get back to it.
For example, the past decade has seen rapid changes in how states and districts evaluate the performance of their teachers. Many teachers are now being evaluated partly based on how their students score on tests. But has anything really changed? The new systems replaced previous systems believed to rate teachers too positively: nearly all teachers were “effective.” After putting new systems in place, Rhode Island reported that 98 percent of its teachers were effective; Florida, 97 percent; New York, 96 percent. The system returned to where it was.
The point is not about teacher evaluation per se. It is that research has the potential to put energy into the system. What is thought to be best practice for teaching reading, or math, or any subject, might change if research shows that new methods improve on current methods. Or a program of social and emotional development might prove effective in reducing student behavior problems. Or an approach to teaching English to non-English speakers might prove effective in promoting their language acquisition and academic achievement. The list is nearly endless.
Of course, research needs to be conducted and disseminated, and its time frames can seem slow to policymakers. But here is a margin for bringing innovation to a system that does not have much incentive to innovate. It seems at least as useful to push on this margin as it is to study how education is delivered in Finland and Singapore, which happened after those two countries had the top scores on the Programme for International Student Assessment. Their systems might have some attractive features, but generalizing them to a vast country with a heterogeneous population and a highly decentralized education system is problematic. (Tom Loveless has warned of the perils of “edutourism.”)
America is entering a new phase in which the majority of its public school students are from low-income households. That’s about 25 million students. Suppose that for each of these students, ESEA set aside $10 a year—a dollar for each month they are in school—for federal research to improve education. That’s $250 million. It sounds like a lot of money, but the scale of the K-12 enterprise is vast and seemingly large numbers can be misleading. Compared with the more than $600 billion we are spending each year for K-12 education, it’s about four-hundredths of a percent.
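The arithmetic behind the one-dollar-a-month proposal can be checked directly; a minimal sketch using the article’s round figures (not official budget data):

```python
# Back-of-the-envelope check of the proposed research set-aside.
# All figures are the article's round numbers, not official budget data.
students = 25_000_000   # public school students from low-income households
per_student = 10        # dollars set aside per student per year
k12_spending = 600e9    # annual U.S. K-12 spending, in dollars

set_aside = students * per_student
share = set_aside / k12_spending

print(f"${set_aside / 1e6:.0f} million per year")  # $250 million per year
print(f"{share:.2%} of K-12 spending")             # 0.04% of K-12 spending
```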
The draft ESEA legislation will be modified as it moves to the floor of the respective chambers and then to conference. Increasing funds for research could be done in two ways. One would be simply to have the set-aside apply to all spending under the bill. It then would include Title I, which is larger than all the other titles combined. Under the president’s 2015 budget request, the change would increase the amount set aside for research to $90 million. Or the set-aside rate itself could be increased to, say, three percent. That’s not getting to the one-dollar-a-month set-aside, but it’s something.
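The magnitudes implied by these two options can be sketched from the figures above; note that the funding bases here are inferred from the article’s round numbers, not taken from appropriations law:

```python
# Implied funding bases for the two options discussed above. These are
# inferences from the article's round figures, not official appropriations.
setaside_rate = 0.005                      # the current 0.5 percent set-aside
current_setaside = 35e6                    # ~$35 million, Title I excluded
base_without_title1 = current_setaside / setaside_rate   # implies ~$7 billion

setaside_with_title1 = 90e6                # $90 million if Title I is included
base_with_title1 = setaside_with_title1 / setaside_rate  # implies ~$18 billion

# Raising the rate to 3 percent on the current (non-Title-I) base:
raised = 0.03 * base_without_title1
print(f"${raised / 1e6:.0f} million")      # still short of the $250 million goal
```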
In the 13 years since NCLB was passed, we’ve seen more clearly that research is essential to improve education, just as clinical trials are essential to improve health care. A commitment to equity lies at the heart of ESEA, and spending $10 a year on research for each of America’s lowincome students will help meet that commitment.
Since 2009, more than 40 states have rewritten their teacher evaluation policies. Given that school systems have neglected to manage classroom instruction for decades, it was inevitable that many schools would struggle to implement them. New York Governor Andrew Cuomo reignited the controversy by including a second round of teacher evaluation reforms in his budget this year. Below, I describe the most promising opportunities in the new law. Hopefully, New York will provide a blueprint for other states as they tweak their own systems in the coming years.
Traditionally, principals have used much too low a standard when granting tenure, viewing the probationary period merely as an opportunity to weed out the worst malpractice. Under the new law in New York, the probationary period will be lengthened from three to four years, and no teacher rated “ineffective” in their fourth year would be able to earn tenure.
Therefore, much depends on what it means to be designated “ineffective.” As New York learned last year when 96 percent of teachers were rated “effective” or “highly effective,” a vague standard is equivalent to no standard. The department should specify that a probationary teacher is “ineffective” during their fourth year of teaching if: (i) the teacher’s average student achievement gain during their second through fourth years of teaching falls below that of the average first-year teacher in their district, or (ii) the classroom observation scores given by external observers during their second through fourth years of teaching fall below those of the average first-year teacher.^{[1] }
Most teachers improve their practice during their initial years of teaching. However, if, by their fourth year of teaching, a probationary teacher has not moved beyond the performance of the average novice in their district in terms of student achievement growth and measured classroom practice, students would be better off on average if the district were to commit to filling that teacher’s assignment with a novice teacher every year instead. A fourth-year probationary teacher who has been no more effective than a novice should not receive the long-term commitment that accompanies tenure.
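The proposed standard amounts to a simple either-or test against the district’s average novice. A hypothetical sketch (the function, its inputs, and the sample values are illustrative, not part of the law):

```python
# Illustrative sketch of the proposed tenure standard: a fourth-year
# probationary teacher is "ineffective" if either measure falls below
# the district's average first-year teacher. All names are hypothetical.
def is_ineffective(avg_gain_y2_y4, avg_obs_y2_y4,
                   novice_avg_gain, novice_avg_obs):
    """True if the teacher fails either benchmark against the average novice."""
    return (avg_gain_y2_y4 < novice_avg_gain or
            avg_obs_y2_y4 < novice_avg_obs)

# A teacher who matches novices on observations but trails them on student
# achievement gains would be denied tenure under this rule.
print(is_ineffective(0.10, 2.8, 0.12, 2.8))  # True
```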
Such a standard would have a number of advantages. First, it reminds principals that a promotion decision involves a choice (albeit usually an implicit one) between two teachers—the probationary teacher and an anonymous novice. Would an NFL coach forgo 25 years of future draft picks in order to sign a mediocre player to a long-term contract? No. Yet principals in New York and elsewhere have done so every spring. Linking the standard for tenure to the effectiveness of the average first-year teacher would remind everyone of the opportunity cost involved in every tenure decision.
Second, it would be a self-adjusting standard: if classroom observation scores become inflated, or if the quality of those willing to enter teaching were to decline (or rise), the threshold for tenure would adjust accordingly.
Third, by relying on the scores given by external observers, the tenure decision would no longer be at the sole discretion of the local principal. Because a tenure decision involves thousands of future students as well as future colleagues and supervisors at other schools in a district where a teacher might work, it makes no sense to leave the decision in the hands of their current supervisor alone.
If tenure protections were reserved only for accomplished teachers, just imagine how different our schools would be.
Rather than focus solely on a teacher’s performance during the most recent academic year, the teacher evaluation system should allow tenured teachers to accumulate a longer-term track record of excellence.^{[2] }
After the tenure decision, a teacher’s evaluation each year should depend on four parts: 40 percent of the weight should be placed on student achievement gains in all available prior school years, 40 percent on prior classroom observations, and the remaining 20 percent should be split between the teacher’s student achievement gains and classroom observations in the most recent year.
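A minimal sketch of that weighting, assuming all measures have been put on a common scale (the function name, inputs, and sample values are illustrative):

```python
# Illustrative 40/40/20 weighting: the prior-year track record carries
# 80 percent of the weight; the most recent year carries the remaining
# 20 percent, split evenly between gains and observations.
def evaluation_score(prior_gains_avg, prior_obs_avg,
                     recent_gain, recent_obs):
    """prior_* are averages over all available prior school years."""
    return (0.40 * prior_gains_avg
            + 0.40 * prior_obs_avg
            + 0.10 * recent_gain
            + 0.10 * recent_obs)

# A strong multi-year record outweighs a single weak recent year.
print(round(evaluation_score(0.8, 0.9, 0.2, 0.3), 2))  # 0.73
```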
As in many professions (including higher education), a past history of success signals that a teacher has the talent and accumulated skill to be successful in the future. The only reason to place greater than proportional weight on the most recent performance is to preserve teachers’ incentive to maintain effort, and not simply to rest on their laurels. Only in professions such as sales, where it is more important to incentivize current effort than to retain talent, is it necessary to ask, “What have you done for us lately?” Therefore, it does not make sense to limit evaluations to the current (or most recent) year.
Aside from recognizing the importance of talent and accumulated skill, another advantage of a longer-term perspective is that it frees up teachers with a strong track record to separate their own interests from those of their weakest colleagues. Reform advocates mistakenly believe that the vast majority of teachers have nothing to fear from efforts to root out “grossly ineffective” teachers. They say, “Only the weakest one or two percent of teachers have anything to fear from new teacher evaluations.” However, they forget that the absence of any meaningful differentiation in the past has meant that many teachers do not know where they stand. When a majority of teachers think they could be in the bottom two percent under an unfamiliar and unspecified system, they will resist change. However, as teachers develop a track record and become less vulnerable to a single bad year, they will be more supportive of efforts to police their own ranks.
Children will not succeed until all teachers — both tenured and untenured — adjust what and how they teach. Therefore, a successful teacher evaluation system must also support adult behavior change, and we must not underestimate how difficult that will be.
No one would launch a Weight Watchers club without any bathroom scales or mirrors. Student achievement gains are the bathroom scale, but classroom observations must be the mirror.
Under the new law in New York, one of a teacher’s observers must be drawn from outside the teacher’s school—someone with no personal axe to grind, whose only role is to comment on teaching. A few other districts—such as Washington, D.C., and Hillsborough County, Florida—have been incorporating outside observers in recent years. However, New York is the first state to require outside observers.
No school community can change the way it teaches without starting an honest conversation about its own instruction. When 96 percent of teachers are rated effective or better despite high student failure rates, it is a sure sign that principals have not been honest. An external perspective will make it easier for longtime colleagues to have a frank conversation about each other’s instruction.
Yet, as valuable as they might be, external observations will also present significant logistical challenges. A lot of time could be wasted as observers drive from school to school. One alternative would be to allow teachers to submit videos to an external observer in lieu of in-person classroom observations. (For similar practical reasons, the National Board for Professional Teaching Standards has been allowing teachers to submit videos for more than 20 years.)
Doing so would have a number of advantages. For instance, teachers usually struggle because of the cues they are not noticing, or because they lose track of time. It is difficult for such teachers to recognize their mistakes by reading an observer’s written notes after class. In fact, it’s biologically impossible for someone to recall cues they did not notice in the moment.
Giving teachers control of a camera, the opportunity to watch themselves teach, and the chance to discuss their videos with external observers, peers, and supervisors will provide a more effective mirror than any observer’s written notes.
There would be other advantages as well. Harried principals could do their observations during quieter times of the day or week. And when principals do not have sufficient content expertise, they could solicit the views of content experts.
Finally, video evidence would level the playing field if a teacher ever has to defend their teaching at a dismissal hearing—a teacher’s video vs. an observer’s written notes. Video is now widely used to coach improvements in activities such as athletics, dance, and public speaking. The state department of education should encourage districts to use technology to meet the external observer requirement.
New York has not been the only state to struggle with the implementation of teacher evaluation systems. Many systems are still failing to set a high standard for teaching. Despite the controversy, let’s hope that Andrew Cuomo is not the only governor with the courage to revisit the issue. Students will not achieve at higher levels until teachers teach at higher levels—and that’s simply not going to happen without quality feedback and evaluation.
References:
Kane, Thomas J., and Douglas O. Staiger. “Improving School Accountability Systems.” Working paper, May 2002. http://www.dartmouth.edu/~dstaiger/Papers/WP/2002/KaneStaiger_2002.pdf
Kane, Thomas J., and Douglas O. Staiger. “The Promise and Pitfalls of Using Imprecise School Accountability Measures.” Journal of Economic Perspectives 16, no. 4 (Fall 2002): 91–114.
Gibbons, Robert, and Kevin J. Murphy. “Optimal Incentive Contracts in the Presence of Career Concerns: Theory and Evidence.” Journal of Political Economy 100, no. 3 (June 1992): 468–505.
[1] If the student growth data from the fourth year are not available in time (given the 60-day notification required in a tenure denial), then the average from their second and third years of teaching should be used.
[2] Doug Staiger and I discuss the idea of basing school effectiveness ratings on a combination of long-term and short-term track records in Kane and Staiger (2002a) and Kane and Staiger (2002b). We drew upon earlier work by Gibbons and Murphy (1992) related to CEO compensation.
Since 2009, more than 40 states have rewritten their teacher evaluation policies. Given that school systems have neglected to manage classroom instruction for decades, it was inevitable that many schools would struggle to implement them. New York Governor Andrew Cuomo reignited the controversy by including a second round of teacher evaluation reforms in his budget this year. Below, I describe the most promising opportunities in the new law. Hopefully, New York will provide a blueprint for other states as they tweak their own systems in the coming years.
Traditionally, principals have used much too low a standard when granting tenure, viewing the probationary period merely as an opportunity to weed out the worst malpractice. Under the new law in New York, the length of the probationary period will be lengthened from three to four years and no teacher rated “ineffective” in their fourth year would be able to earn tenure.
Therefore, much depends on what it means to be designated “ineffective.” As New York learned last year when 96 percent of teachers were rated “effective” or “highly effective”, a vague standard is equivalent to no standard. The department should specify that a probationary teacher is “ineffective” during their fourth year of teaching if: (i) a teacher’s average student achievement gain during their second through fourth year of teaching falls below that of the average firstyear teacher in their district or (ii) the classroom observations done by external observers during their second through fourth year of teaching falls below that of the average firstyear teacher.^{[1] }
Most teachers improve their practice during their initial years of teaching. However, if, by their fourth year of teaching, a probationary teacher has not moved beyond the performance of the average novice in their district in terms of student achievement growth and measured classroom practice, students would be better off on average if the district were to commit to fill that teacher’s assignment with a novice teacher every year instead. A fourth year probationary teacher who has been no more effective than a novice teacher should not receive the longterm commitment which accompanies tenure.
Such a standard would have a number of advantages: First, it reminds principals that a promotion decision involves a choice (albeit usually implicit) between two teachers—the probationary teacher and an anonymous novice. Would an NFL coach forego 25 years of future draft picks in order to sign a mediocre player to a longterm contract? No. Yet principals in New York and elsewhere have done so every spring. Linking the standard for tenure to the effectiveness of the average firstyear teacher would remind everyone of the opportunity cost involved in every tenure decision.
Second, it would be a selfadjusting standard: if classroom observation scores become inflated or if the quality of those willing to enter teaching were to decline (or rise), the threshold for tenure would adjust accordingly.
Third, by relying on the scores given by external observers, the tenure decision would no longer be at the sole discretion of the local principal. Because a tenure decision involves thousands of future students as well as future colleagues and supervisors at other schools in a district where a teacher might work, it makes no sense to leave the decision in the hands of their current supervisor alone.
If tenure protections were reserved only for accomplished teachers, just imagine how different our schools would be.
Rather than focus solely on a teacher’s performance during the most recent academic year, the teacher evaluation system should allow tenured teachers to accumulate a longerterm track record of excellence.^{[2] }
After the tenure decision, a teacher’s evaluation each year should depend on four parts: 40 percent of the weight should be placed on student achievement gains in all available prior school years, 40 percent should be placed on prior classroom observations and the remaining 20 percent should be split between their student achievement gains and classroom observations in the most recent year.
As in many professions (including higher education), a past history of success signals that a teacher has the talent and accumulated skill to be successful in the future. The only reason to place greater than proportional weight on the most recent performance is to preserve teachers’ incentive to maintain effort, and not simply to rest on their laurels. Only in professions such as sales, where it is more important to incentivize current effort than to retain talent, is it necessary to ask, “What have you done for us lately?” Therefore, it does not make sense to limit evaluations to the current (or most recent) year.
Aside from recognizing the importance of talent and accumulated skill, another advantage of a longer term perspective is that it frees up teachers with a strong track record to separate their own interests from those of their weakest colleagues. Reform advocates mistakenly believe that the vast majority of teachers have nothing to fear from efforts to root out “grossly ineffective” teachers. They say, “Only the weakest one or two percent of teachers have anything to fear from new teacher evaluations.” However, they forget that the absence of any meaningful differentiation in the past has meant that many teachers do not know where they stand. When a majority of teachers think they could be in the bottom two percent under an unfamiliar and unspecified system, they will resist change. However, as teachers develop a track record and become less vulnerable to a single bad year, they will be more supportive of efforts to police their own ranks.
Children will not succeed until all teachers — both tenured and untenured — adjust what and how they teach. Therefore, a successful teacher evaluation system must also support adult behavior change, and we must not underestimate how difficult that will be.
No one would launch a Weight Watchers club without any bathroom scales or mirrors. Student achievement gains are the bathroom scale, but classroom observations must be the mirror.
Under the new law in New York, one of a teacher’s observers must be drawn from outside a teacher’s school — someone with no personal axe to grind, whose only role is to comment on teaching. A few other districts—such as Washington, DC and Hillsborough County Florida—have been incorporating outside observers in recent years. However, New York is the first state to require outside observers.
No school community can change the way they teach without starting an honest conversation about their own instruction. When 96% of teachers are rated effective or better despite high student failure rates, it is a sure sign that principals have not been honest. An external perspective will make it easier for longtime colleagues to have a frank conversation about each other’s instruction.
Yet, as valuable as they might be, external observations will also present significant logistical challenges. A lot of time could be wasted as observers drive from school to school. One alternative would be to allow teachers to submit videos to an external observer in lieu of inperson classroom observations. (For similar practical reasons, the National Board for Professional Teaching Standards has been allowing teachers to submit videos for more than 20 years.)
Doing so would have a number of advantages. For instance, teachers usually struggle because of the clues they are not noticing, or because they lose track of time. It is difficult for such teachers to recognize their mistakes by reading an observer’s written notes after class. In fact, it’s biologically impossible for someone to recall cues they did not notice in the moment.
Giving teachers control of a camera, the opportunity to watch themselves teach, and the chance to discuss their videos with external observers, peers, and supervisors would provide a more effective mirror than any observer’s written notes.
There would be other advantages as well. Harried principals could do their observations during quieter times of the day or week. And when principals do not have sufficient content expertise, they could solicit the views of content experts.
Finally, video evidence would level the playing field if a teacher ever has to defend her teaching at a dismissal hearing: a teacher’s video vs. an observer’s written notes. Video is now widely used to coach improvements in activities such as athletics, dance, and public speaking. The state department of education should encourage districts to use technology to meet the external observer requirement.
New York has not been the only state to struggle with the implementation of teacher evaluation systems. Many systems are still failing to set a high standard for teaching. Despite the controversy, let’s hope that Andrew Cuomo is not the only governor with the courage to revisit the issue. Students will not achieve at higher levels until teachers teach at higher levels—and that’s simply not going to happen without quality feedback and evaluation.
References:
Thomas J. Kane and Douglas O. Staiger, “Improving School Accountability Systems,” Working Paper, May 2002. http://www.dartmouth.edu/~dstaiger/Papers/WP/2002/KaneStaiger_2002.pdf
Thomas J. Kane and Douglas O. Staiger, “The Promise and Pitfalls of Using Imprecise School Accountability Measures,” Journal of Economic Perspectives, Vol. 16, No. 4 (Fall 2002), pp. 91–114.
Robert Gibbons and Kevin Murphy, “Optimal Incentive Contracts in the Presence of Career Concerns: Theory and Evidence,” Journal of Political Economy, Vol. 100, No. 3 (June 1992), pp. 468–505.
[1] If the student growth data from the fourth year are not available in time (given the 60-day notification required in a tenure denial), then the average from the second and third years of teaching should be used.
[2] Doug Staiger and I discuss the idea of basing school effectiveness ratings on a combination of long-term and short-term track records in Kane and Staiger (2002a) and Kane and Staiger (2002b). We drew upon earlier work by Gibbons and Murphy (1992) related to CEO compensation.
Over the past year, the Common Core State Standards have risen from a topic of interest mainly to educators and school reformers to a top-tier issue in national politics. With likely Republican presidential candidates already staking out divergent positions, the standards and the federal government’s role in promoting them show strong signs of emerging as a key point of contention in the Republican primaries. How might this growing salience shape public opinion on the standards? A comparison of polls we’ve conducted nationally and in the state of Louisiana is instructive—and discouraging.
Several states have endured political battles over the Common Core State Standards, but none has matched the drama and intensity of Louisiana’s. Like most states, Louisiana adopted Common Core in 2010 without fanfare or controversy. The state began introducing the standards the following year with an eye toward full implementation this school year. At the time, the standards had the full-throated support of Republican Governor Bobby Jindal. Since then, Governor Jindal has positioned himself against the Common Core and his former ally, State Superintendent John White. In last year’s legislative session, Governor Jindal pushed bills to pull Louisiana out of the Common Core. When those efforts failed, largely due to the state’s powerful business lobby, the governor issued an executive order to pull the state out of a consortium of states using a Common Core-aligned test. After Superintendent White balked, arguing the governor lacked the legal authority to withdraw the state, Jindal joined a lawsuit against the state board of education. The fray has been marked with recriminations and personal attacks befitting the political traditions of the Pelican State. Although a state court recently dismissed his lawsuit, Governor Jindal has announced plans to go after Common Core again in this year’s session.
Louisiana’s flare-up over Common Core is set upon a broader, national kindling of rising public discontent. Last summer’s Education Next Poll of a representative sample of American adults showed that although a majority of the public supported the standards, that support had seriously eroded. In 2013, 65 percent of the general public favored the standards, but that portion fell to 53 percent in 2014 (see Figure 1).
Figure 1. Percent Supporting the Common Core State Standards, Overall and by Party, 2013 and 2014
Note. The question read as follows: As you may know, in the last few years states have been deciding whether or not to use the Common Core, which are standards for reading and math that are the same across states. In the states that have these standards, they will be used to hold public schools accountable for their performance. Do you support or oppose the use of the Common Core in your state?
The Education Next poll revealed partisan polarization and widespread misperception, as well. In the 2013 Education Next Poll, Common Core gathered backers from across the political spectrum. By 2014, support among Republicans fell from 57 percent to 43 percent, even as support among Democrats remained nearly unchanged (64 percent in 2013 and 63 percent in 2014). That year, the majority of respondents were misinformed on several important elements of the Common Core State Standards Initiative. Barely more than one third said it was false that the federal government requires all states to use the Common Core standards (it does not), just 15 percent said it was false that the federal government will receive detailed data on the test performance of individual students in participating states (it will not), and fewer than half said it was true that states and local school districts can decide which textbooks to use under Common Core (they can).
How do public opinion and understanding change when Common Core becomes a highly visible focal point, as it has been in Louisiana? The extra attention given to the issue by political leaders and the press does not appear to resolve debates into consensus or clear up public confusion. If anything, quite the opposite.
To examine how the heated rhetoric now attached to Common Core influences opinion, the 2015 Louisiana Survey, an annual survey of the state’s adult residents sponsored by Louisiana State University’s Reilly Center for Media and Public Affairs, used an experiment featuring two versions of a question assessing support for common educational standards.
When the Common Core label is dropped from the question, support for the concept leaps from 39 percent to 67 percent. With the label, a majority of the public (51 percent) opposes Common Core, but without the label a majority (67 percent) supports common math and reading standards (see Figure 2).
Figure 2. Percent Supporting the “Common Core” and “standards that are the same,” Louisiana (2015) and National Public (2014)
The “Common Core” label also evokes significant partisan polarization in Louisiana. There is no difference in support between Democrats and Republicans when the program is not labeled as “Common Core”: 72 percent and 71 percent, respectively. When the phrase is used, however, a majority of Democrats (57 percent) support it while a majority of Republicans (62 percent) oppose it; only 22 percent of Republicans support the standards under that label.
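As a rough check that a gap as large as the 39 percent vs. 67 percent split could not plausibly be sampling noise, one can run a pooled two-proportion z-test. The arm sizes below are illustrative assumptions (roughly 500 respondents per question version), not the Louisiana Survey’s actual split-sample counts.

```python
import math

def two_prop_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF (via the error function)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical arms of ~500 respondents each; support rates from the survey
z, p = two_prop_ztest(int(0.67 * 500), 500, int(0.39 * 500), 500)
print(f"z = {z:.1f}, significant at p < .001: {p < 0.001}")
```

Under these assumed sample sizes the z-statistic is far beyond conventional significance thresholds, so the label effect is not an artifact of a split sample of this size.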
This labeling effect occurs despite the fact that people in Louisiana claim to be more informed about Common Core. In 2014 nearly half of the public (49 percent) described themselves as “very familiar” or “somewhat familiar” with Common Core. This year, 62 percent of Louisiana residents say they are “very familiar” or “somewhat familiar.” (By way of comparison, just 43 percent of Americans surveyed by Education Next in 2014 had even heard of Common Core; among those who had, a majority said they were “not very knowledgeable about it.”)
An examination of Louisianans’ perceptions about Common Core reveals continued confusion. For example, some respondents were asked whether the following statement is true or false: “Under the Common Core, the state of Louisiana and its local school districts decide which textbooks and educational materials to use in their schools.” One fifth said this statement is false, and 43 percent said it is true. However, another set of respondents were asked whether the opposite statement is true or false: “Under the Common Core, the federal government decides which textbooks and educational materials to use in schools.” If 43 percent believe it is true that states and local districts retain this authority, then it is reasonable to expect that at least as many would say the statement about federal authority is false. However, that was not the case. Just 12 percent said the second statement is false. Indeed, 59 percent said that it is true that the federal government makes these decisions. When differences in how a question is phrased produce significant inconsistencies—as they do here—that suggests many responses may be based on little more than guesswork.
The toxic effect of the “Common Core” name is not solely a Louisiana phenomenon. The survey experiment in the Louisiana Survey is based upon a similar experiment in the 2014 Education Next Poll. When the “Common Core” label was dropped from the question in the national survey, support for the concept among the general public rose from 53 percent to 68 percent (see Figure 2). Additionally, the pronounced partisan polarization evoked by the label “Common Core” disappeared when the question did not include those seemingly toxic words. In other words, a broad consensus remains with respect to common standards, despite the fact that public debate over Common Core polarizes the public.
Could the choice of the name “Common Core” specifically explain these patterns? Conservative columnist Peggy Noonan wrote that it “sounds common—except for the part that sounds soviet.” It is possible that Republicans share her sensitivities. But we suspect that something deeper is at work. A similar dynamic was evident in public opinion on No Child Left Behind in the waning years of the George W. Bush administration. Support for the law dropped markedly in questions that referred to it by name, as compared to questions that used otherwise identical language to describe its key provisions. These effects were observed mainly among Democrats, suggesting that the law, despite its bipartisan roots, had come to be closely associated in the public mind with the Republican president. The consistency of patterns highlights a key tension facing education advocates seeking to use federal policy to advance their goals: Any benefits from federal involvement may come at the cost of heightened partisan polarization.
The Education Next and Louisiana Survey polls differ in timing, target population, and mode. Nevertheless, the pairing is suggestive. In Louisiana, where the fight over Common Core has been particularly salient, the effect of the “Common Core” label was even more negative than in the American public as a whole, and the impact on polarization was greater. As the conflict heats up nationwide, the American public may move further from a latent consensus about common standards toward ever more confused divisions over “Common Core.”
A nasty political fight between New York Governor Andrew Cuomo and the state teachers’ union has embroiled the state since Cuomo announced a set of education policy proposals in January that led the state’s union president to declare that Cuomo had “declared war on the public schools.” In the end, Cuomo got much (but not all) of what he wanted, including changes to teacher evaluation and tenure policies, which the State Senate and Assembly approved last month.
The state teachers’ union has not taken defeat lying down, responding in part by encouraging parents to “opt out” their children from the standardized exams that are used as a factor in the evaluation system. Union president Karen Magee argued that a large number of opt-outs could sabotage the evaluation system: “Statistically, if you take out enough, it has no merit or value whatsoever.”
Is Magee’s statistical argument correct? Taken to enough of an extreme, it surely is. If all students opted out, there would be no test data to include in teacher evaluations. But what would be the likely impact of some, or even many, but not all students refusing to take the tests? My colleague Katharine Lindquist and I used statewide data from North Carolina to simulate the impact of opt-out on test-score-based measures of teacher performance.^{[1]} We ran two sets of simulations: one in which students opt out randomly, and another in which opt-out occurs among the highest-performing students in each classroom (as measured by their prior test scores).
Opting out adds noise to the data, which increases the amount of variability in the teacher performance measures because each teacher’s score is based on fewer students. A teacher faces a higher risk of being labeled low-performing (or high-performing) as the number of opt-outs in her classroom increases. But the effect of opt-out is quite small unless a large number of students do so.
A teacher in New York State is considered to be ineffective based on her students’ test score growth if her value-added score is more than 1.5 standard deviations below average (i.e., in the bottom seven percent of teachers). If a handful of students opt out, little changes. The risk of getting the lowest score barely changes even if five students in the class opt out—more than 20 percent of the typical classroom.
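The equivalence between “more than 1.5 standard deviations below average” and “the bottom seven percent” follows from the normal curve, assuming value-added scores are approximately normally distributed; a quick check:

```python
import math

def norm_cdf(x):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Share of a normal distribution falling more than 1.5 SD below the mean
print(f"{norm_cdf(-1.5):.1%}")  # prints 6.7%
```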
But if enough of a teacher’s students opt out, her risk of getting a bad score increases.^{[2]} For example, imagine a teacher who strongly encourages her students to opt out of the tests and succeeds in getting 15 students—a majority of the class—to opt out. That teacher would have a significantly elevated risk of getting an ineffective score: 11 to 13 percent, depending on the simulation.
Percent of Teachers Rated Ineffective, by Number of Students Opting Out Per Classroom
A large number of opt-outs in a classroom also increases the teacher’s chance of getting a high score, as shown in the simulated ratings using the New York scoring system in the figure below.^{[3]} However, in New York, the punishments for a low score are more significant than the reward for a high score. New teachers must now receive good scores in three out of their first four years in order to be eligible for tenure, and teachers with tenure can now be terminated after two years of low scores.
Distribution of Teacher Ratings, with and without 15 opt-outs
Reducing the number of students who contribute to a teacher’s value-added score not only changes the chance that a teacher will receive a particular rating; it also increases the likelihood that she will receive the wrong rating. One way to assess the potential impact on the fairness of the resulting teacher ratings is to calculate the correlation between teachers’ value-added scores with and without opt-out. If only one student in the class opts out, value-added scores barely change at all—the correlation is 0.99 (on a scale from 0 to 1). With five and ten opt-outs per class, the correlation remains high—0.97 and 0.91, respectively. The correlation eventually starts to break down if a large number of students opt out—it is 0.77 if 15 students do so—indicating a measurably less fair evaluation system than if all students take the standardized tests.^{[4]}
Governor Cuomo has sharply criticized the evaluation system as “baloney” for classifying 99 percent of teachers as effective. The strange irony is that teachers who convince many of their students to opt out are likely to help achieve Cuomo’s goal of increasing the share of teachers judged to be low-performing. These teachers would also receive evaluation scores that are less fair than the ones produced without opt-out in a system their union already believes is unfair.
But in the majority of classrooms, where opt-out appears likely to remain at low levels, the data strongly suggest that students sitting out of standardized testing will have only a trivial impact on the ratings received by their teachers. The broader lesson is that while opt-out may have some success as a political strategy, it is unlikely to have much of a direct, broad-based impact on the teacher evaluation system in New York or any other state.
Note: After publication of this post, it was brought to my attention that New York State does not report growth ratings for teachers with fewer than 16 students with test scores. The future impact of opt-out in conjunction with this rule on teacher evaluations in New York will depend on whether the rule remains part of the newly revised evaluation system and on the performance measures used for teachers without growth ratings. To the extent that these measures are more lenient than growth ratings (i.e., less likely to produce a low score), opt-out could produce higher ratings for some teachers who have enough opt-outs to push them below the reporting threshold.
[1] Specifically, we analyzed data on the math achievement of fourth- and fifth-grade students in 2009–10 who were in classrooms of 16–30 students. Our value-added measure of teacher performance was estimated as the average residuals from a regression of math scores on prior scores in both math and reading (including squared and cubed terms) and indicators of free/reduced-price lunch eligibility, limited English proficiency status, and disabilities. This model is conceptually similar to, but much less complicated than, New York State’s model.
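The footnote’s residual-based estimator can be sketched as follows, using fabricated data and only a cubic in prior scores (no demographic controls, unlike the actual model); the sample sizes and noise levels are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated example data: prior scores, teacher assignments, current scores
n, n_teachers = 3000, 100
prior = rng.normal(0, 1, n)
teacher = rng.integers(0, n_teachers, n)
true_effect = rng.normal(0, 0.2, n_teachers)
score = 0.7 * prior + true_effect[teacher] + rng.normal(0, 0.5, n)

# Regress current scores on a cubic in prior scores (plus an intercept)
X = np.column_stack([np.ones(n), prior, prior**2, prior**3])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
residual = score - X @ beta

# Teacher value-added = mean residual of the teacher's students
va = np.array([residual[teacher == t].mean() for t in range(n_teachers)])
r = np.corrcoef(va, true_effect)[0, 1]
print(f"correlation with true effects: {r:.2f}")
```

With roughly 30 students per teacher and these noise levels, the average-residual estimates recover the simulated true teacher effects fairly well, which is why shrinking the per-teacher sample (as opt-out does) is the key source of added error.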
[2] This analysis assumes that opt-out is relatively localized, so that while it affects an individual teacher’s estimate it does not affect the statewide distribution against which the teacher is being compared. If opt-out occurred uniformly across the state, then it would have no impact on the share of teachers classified as low-performing because it would shift the distribution for the entire state. Of course it would still increase the mismeasurement of teacher performance, just not the percent in a given category (e.g., more than 1.5 standard deviations below the mean).
[3] The simulation makes an important simplification by using only the teacher’s estimated value-added score and not the confidence range of that estimate (see details on page 6 of this document). This simplification was made for computational reasons; incorporating the confidence intervals into the analysis would likely weaken the simulated impact of opt-out on the share of teachers rated in the highest and lowest categories (because more opt-outs would increase the size of the confidence intervals).
[4] These correlations are from the simulation in which students opt out randomly. The correlations for the simulation in which higher-performing students opt out are 1.00, 0.97, 0.90, and 0.72 for one, five, 10, and 15 opt-outs, respectively.
A nasty political fight between New York Governor Andrew Cuomo and the state teachers’ union has embroiled the state since Cuomo announced a set of education policy proposals in January that led the state’s union president to declare that Cuomo had “declared war on the public schools.” In the end, Cuomo got much (but not all) of what he wanted, including changes to teacher evaluation and tenure policies, which the State Senate and Assembly approved last month.
The state teachers’ union has not taken defeat lying down, responding in part by encouraging parents to “optout” their children from the standardized exams that are used as a factor in the evaluation system. Union president Karen Magee argued that a large number of optouts could sabotage the evaluation system: “Statistically, if you take out enough, it has no merit or value whatsoever.”
Is Magee’s statistical argument correct? Taken to enough of an extreme, it surely is. If all students opted out, there would be no test data to include in teacher evaluations. But what would be the likely impact of some, or even many, but not all students refusing to take the tests? My colleague Katharine Lindquist and I used statewide data from North Carolina to simulate the impact of optout on testscorebased measures of teacher performance.^{[1]} We ran two sets of simulations: one where students optout randomly, and another in which optout occurs among the highestperforming students in each classroom (as measured by their prior test scores).
Opting out adds noise to the data, which increases the amount of variability in the teacher performance measures because each teacher’s score is based on fewer students. A teacher faces a higher risk of being labelled lowperforming (or highperforming) as the number of optouts in her classroom increases. But the effect of optout is quite small unless a large number of students do so.
A teacher in New York State is considered to be ineffective based on her students’ test score growth if her valueadded score is more than 1.5 standard deviations below average (i.e., in the bottom seven percent of teachers). If a handful of students opt out, little changes. The risk of getting the lowest score barely changes even if five students in the class opt out—more than 20 percent of the typical classroom.
But if enough of a teacher’s students opt out, her risk of getting a bad score increases.^{[2]} For example, imagine a teacher who strongly encourages her students to opt out of the tests and succeeds in getting 15 students—a majority of the class—to opt out. That teacher would have a significantly elevated risk of getting an ineffective score: 1113 percent depending on the simulation.
Percent of Teachers Rated Ineffective, by Number of Students Opting Out Per Classroom
A large number of optouts in a classroom also increases the teacher’s chance of getting a high score, as shown in the simulated ratings using the New York scoring system in the figure below.^{[3]} However, in New York, the punishments for a low score are more significant than the reward for a high score. New teachers must now receive good scores in three out of their first four years in order to be eligible for tenure, and teachers with tenure can now be terminated after two years of low scores.
Distribution of Teacher Ratings, with and without 15 opt-outs
Reducing the number of students who contribute to a teacher’s value-added score not only changes the chance that a teacher will receive a particular rating; it also increases the likelihood that she will receive the wrong rating. One way to assess the potential impact on the fairness of the resulting teacher ratings is to calculate the correlation between teachers’ value-added scores with and without opt-out. If only one student in the class opts out, value-added scores barely change at all: the correlation is 0.99 (where 1 would indicate identical scores). With five and ten opt-outs per class, the correlation remains high, at 0.97 and 0.91, respectively. The correlation eventually starts to break down if a large number of students opt out—it falls to 0.77 with 15 opt-outs—indicating a measurably less fair evaluation system than if all students took the standardized tests.^{[4]}
Governor Cuomo has sharply criticized the evaluation system as “baloney” for classifying 99 percent of teachers as effective. The strange irony is that teachers who convince many of their students to opt out are likely to help achieve Cuomo’s goal of increasing the share of teachers judged to be low-performing. These teachers would also receive evaluation scores that are less fair than the ones produced without opt-out, in a system their union already believes is unfair.
But in the majority of classrooms, where opt-out appears likely to remain at low levels, the data strongly suggest that students sitting out of standardized testing will have only a trivial impact on the ratings received by their teachers. The broader lesson is that while opt-out may have some success as a political strategy, it is unlikely to have much of a direct, broad-based impact on the teacher evaluation system in New York or any other state.
Note: After publication of this post, it was brought to my attention that New York State does not report growth ratings for teachers with fewer than 16 students with test scores. The impact that opt-out, in conjunction with this rule, has on teacher evaluations in New York in the future will depend on whether the rule remains part of the newly revised evaluation system and on the specifications of the performance measures used for teachers without growth ratings. To the extent that these measures are more lenient than growth ratings (i.e., the non-growth measures are less likely to produce a low score than the growth ratings), opt-out could produce higher ratings for some teachers who have enough opt-outs to push them below the reporting threshold.
[1] Specifically, we analyzed data on the math achievement of fourth- and fifth-grade students in 2009–10 who were in classrooms of 16–30 students. Our value-added measure of teacher performance was estimated as the average residuals from a regression of math scores on prior scores in both math and reading (including squared and cubed terms) and indicators of free/reduced-price lunch eligibility, limited English proficiency status, and disabilities. This model is conceptually similar to, but much less complicated than, New York State’s model.
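For readers unfamiliar with this class of model, here is a minimal sketch of "average residuals from a regression" on fabricated data. All names, coefficients, and sample sizes are illustrative, not the actual North Carolina specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated student records standing in for the NC data (illustrative only).
n_students, n_teachers = 6_000, 300
teacher = rng.integers(0, n_teachers, n_students)   # classroom assignment
effect = rng.normal(0, 0.15, n_teachers)            # true teacher effects
prior_math = rng.normal(0, 1, n_students)
prior_read = rng.normal(0, 1, n_students)
frl = rng.integers(0, 2, n_students)                # free/reduced-price lunch
score = (0.7 * prior_math + 0.2 * prior_read - 0.1 * frl
         + effect[teacher] + rng.normal(0, 0.5, n_students))

# Regress current scores on prior scores (with squared and cubed terms)
# plus the demographic indicator, then take residuals.
X = np.column_stack([
    np.ones(n_students),
    prior_math, prior_math**2, prior_math**3,
    prior_read, prior_read**2, prior_read**3,
    frl,
])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
resid = score - X @ beta

# A teacher's value-added estimate is her students' average residual.
counts = np.bincount(teacher, minlength=n_teachers)
value_added = np.bincount(teacher, weights=resid, minlength=n_teachers) / counts
print(np.corrcoef(value_added, effect)[0, 1])  # positive: estimates track truth
```

The residual is the part of each student's score not explained by prior achievement and demographics; averaging those residuals by classroom attributes the remainder to the teacher, which is why fewer tested students per class means a noisier estimate.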
[2] This analysis assumes that opt-out is relatively localized, so that while it affects an individual teacher’s estimate it does not affect the statewide distribution against which the teacher is being compared. If opt-out occurred uniformly across the state, it would have no impact on the share of teachers classified as low-performing because it would shift the distribution for the entire state. Of course, it would still increase the mismeasurement of teacher performance, just not the percent in a given category (e.g., more than 1.5 standard deviations below the mean).
[3] The simulation makes an important simplification by using only the teacher’s estimated value-added score and not the confidence range of that estimate (see details on page 6 of this document). This simplification was made for computational reasons; incorporating the confidence intervals into the analysis would likely weaken the simulated impact of opt-out on the share of teachers rated in the highest and lowest categories (because more opt-outs would increase the size of the confidence intervals).
[4] These correlations are from the simulation in which students opt out randomly. The correlations for the simulation in which higher-performing students opt out are 1.00, 0.97, 0.90, and 0.72 for one, five, 10, and 15 opt-outs, respectively.
March 26, 2015
2:00 PM – 2:30 PM EDT
Online Only
Live Webcast
And more from the Brown Center Report on American Education
Girls outscore boys on practically every reading test given to a large population. And they have for a long time. A 1942 Iowa study found girls performing better than boys on tests of reading comprehension, vocabulary, and basic language skills, and girls have outscored boys on every reading test ever given by the National Assessment of Educational Progress (NAEP). This gap is not confined to the U.S. Reading tests administered as part of the Progress in International Reading Literacy Study (PIRLS) and the Program for International Student Assessment (PISA) reveal that the gender gap is a worldwide phenomenon.
On March 26, join Brown Center experts Tom Loveless and Matthew Chingos as they discuss the latest Brown Center Report on American Education, which examines this phenomenon. Hear what Loveless's analysis revealed about where the gender gap stands today and how it has trended over the past several decades, in the U.S. and around the world.
Tune in below or via Spreecast where you can submit questions.
The Brown Center on Education Policy at Brookings has published the 14th report in its series, "The Brown Center Report on American Education: How Well Are American Students Learning?" In this 2015 edition, author Tom Loveless, a senior fellow, examines three subjects:
On Thursday, March 26, at 2:00 p.m., join Brown Center experts Loveless and Matt Chingos for a Brookings Live discussion on Spreecast about the gender gap in reading:
Part II of the 2015 Brown Center Report on American Education
Over the next several years, policy analysts will evaluate the impact of the Common Core State Standards (CCSS) on U.S. education. The task promises to be challenging. The question most analysts will focus on is whether the CCSS is good or bad policy. This section of the Brown Center Report (BCR) tackles a set of questions that seem innocuous compared to the hot-button question of whether Common Core is wise or foolish. The questions all have to do with when Common Core actually started, or more precisely, when the Common Core started having an effect on student learning. And if it hasn’t yet had an effect, how will we know that CCSS has started to influence student achievement?
The analysis below probes this issue empirically, hopefully persuading readers that deciding when a policy begins is elemental to evaluating its effects. The question of a policy’s starting point is not always easy to answer. Yet the answer has consequences. You can’t figure out whether a policy worked or not unless you know when it began.^{[i]}
The analysis uses surveys of state implementation to model different CCSS starting points for states and produces a second early report card on how CCSS is doing. The first report card, focusing on math, was presented in last year’s BCR. The current study updates state implementation ratings that were presented in that report and extends the analysis to achievement in reading. The goal is not only to estimate CCSS’s early impact, but also to lay out a fair approach for establishing when the Common Core’s impact began—and to do it now before data are generated that either critics or supporters can use to bolster their arguments. The experience of No Child Left Behind (NCLB) illustrates this necessity.
After the 2008 National Assessment of Educational Progress (NAEP) scores were released, former Secretary of Education Margaret Spellings claimed that the new scores showed “we are on the right track.”^{[ii]} She pointed out that NAEP gains in the previous decade, 1999–2009, were much larger than in prior decades. Mark Schneider of the American Institutes for Research (and a former commissioner of the National Center for Education Statistics [NCES]) reached a different conclusion. He compared NAEP gains from 1996–2003 to 2003–2009 and declared NCLB’s impact disappointing: “The pre-NCLB gains were greater than the post-NCLB gains.”^{[iii]} It is important to highlight that Schneider used the 2003 NAEP scores as the starting point for assessing NCLB. A report from FairTest on the tenth anniversary of NCLB used the same demarcation for pre- and post-NCLB time frames.^{[iv]} FairTest is an advocacy group critical of high-stakes testing—and harshly critical of NCLB—but if the 2003 starting point for NAEP is accepted, its conclusion is indisputable: “NAEP score improvement slowed or stopped in both reading and math after NCLB was implemented.”
Choosing 2003 as NCLB’s starting date is intuitively appealing. The law was introduced, debated, and passed by Congress in 2001. President Bush signed NCLB into law on January 8, 2002. It takes time to implement any law. The 2003 NAEP is arguably the first chance that the assessment had to register NCLB’s effects.
Selecting 2003 is consequential, however. Some of the largest gains in NAEP’s history were registered between 2000 and 2003. Once 2003 is established as a starting point (or baseline), pre-2003 gains become “pre-NCLB.” But what if the 2003 NAEP scores were influenced by NCLB? Experiments evaluating the effects of new drugs collect baseline data from subjects before treatment begins, not after. Similarly, evaluating the effects of public policies requires baseline data that are not influenced by the policies under evaluation.
Avoiding such problems is particularly difficult when state or local policies are adopted nationally. The federal effort to establish a speed limit of 55 miles per hour in the 1970s is a good example. Several states already had speed limits of 55 mph or lower prior to the federal law’s enactment. Moreover, a few states lowered speed limits in anticipation of the federal limit while the bill was debated in Congress. On the day President Nixon signed the bill into law—January 2, 1974—the Associated Press reported that only 29 states would be required to lower speed limits. Evaluating the effects of the 1974 law with national data but neglecting to adjust for what states were already doing would obviously yield tainted baseline data.
There are comparable reasons for questioning 2003 as a good baseline for evaluating NCLB’s effects. The key components of NCLB’s accountability provisions—testing students, publicizing the results, and holding schools accountable for results—were already in place in nearly half the states. In some states they had been in place for several years. The 1999 iteration of Quality Counts, Education Week’s annual report on state-level efforts to improve public education, entitled Rewarding Results, Punishing Failure, was devoted to state accountability systems and the assessments underpinning them. Testing and accountability are especially important because they have drawn fire from critics of NCLB, a law that wasn’t passed until years later.
The Congressional debate of NCLB legislation took all of 2001, allowing states to pass anticipatory policies. Derek Neal and Diane Whitmore Schanzenbach reported that “with the passage of NCLB lurking on the horizon,” Illinois placed hundreds of schools on a watch list and declared that future state testing would be high stakes.^{[v]} In the summer and fall of 2002, with NCLB now the law of the land, state after state released lists of schools falling short of NCLB’s requirements. Then the 2002–2003 school year began, during which the 2003 NAEP was administered. Using 2003 as a NAEP baseline assumes that none of these activities—previous accountability systems, public lists of schools in need of improvement, anticipatory policy shifts—influenced achievement. That is unlikely.^{[vi]}
Unlike NCLB, there was no “pre-CCSS” state version of Common Core. States vary in how quickly and aggressively they have implemented CCSS. For the BCR analyses, two indexes were constructed to model CCSS implementation. They are based on surveys of state education agencies and are named for the two years in which the surveys were conducted. The 2011 survey reported the number of programs (e.g., professional development, new materials) on which states reported spending federal funds to implement CCSS; strong implementers spent money on more activities. The 2011 index was used to investigate eighth-grade math achievement in the 2014 BCR. A new implementation index was created for this year’s study of reading achievement. The 2013 index is based on a survey asking states when they planned to complete full implementation of CCSS in classrooms; strong states aimed for full implementation by 2012–2013 or earlier.
Fourth-grade NAEP reading scores serve as the achievement measure. Why fourth grade and not eighth? Reading instruction is a key activity of elementary classrooms but has all but disappeared by eighth grade. What remains of “reading” as an independent subject, which has typically morphed into the study of literature, is subsumed under the English-language arts curriculum, a catchall term that also includes writing, vocabulary, listening, and public speaking. Most fourth-grade students are in self-contained classes; they receive instruction in all subjects from one teacher. The impact of CCSS on reading instruction—the recommendation that nonfiction take a larger role in reading materials is a good example—will be concentrated in the activities of a single teacher in elementary schools. The burden for meeting CCSS’s press for nonfiction, on the other hand, is expected to be shared by all middle and high school teachers.^{[vii]}
Table 2-1 displays NAEP gains using the 2011 implementation index. The four-year period between 2009 and 2013 is broken down into two parts: 2009–2011 and 2011–2013. Nineteen states are categorized as “strong” implementers of CCSS on the 2011 index, and from 2009 to 2013 they outscored the four states that did not adopt CCSS by a little more than one scale score point (a gain of 0.87 vs. a loss of 0.24, for a difference of 1.11 points). The non-adopters are the logical control group for CCSS, but with only four states in that category—Alaska, Nebraska, Texas, and Virginia—it is sensitive to big changes in one or two states. Alaska and Texas both experienced a decline in fourth-grade reading scores from 2009 to 2013.
The 1.11-point advantage in reading gains for strong CCSS implementers is similar to the 1.27-point advantage reported last year for eighth-grade math. Both are small. The reading difference in favor of CCSS is equal to approximately 0.03 standard deviations of the 2009 baseline reading score. Also note that the differences were greater in 2009–2011 than in 2011–2013 and that the “medium” implementers performed as well as or better than the strong implementers over the entire four-year period (a gain of 0.99).
Table 2-2 displays calculations using the 2013 implementation index. Twelve states are rated as strong CCSS implementers, seven fewer than on the 2011 index.^{[viii]} Data for the non-adopters are the same as in the previous table. From 2009 to 2013, the strong implementers gained 1.27 NAEP points while the non-adopters lost 0.24 points, a difference of 1.51 points. The thirty-four states rated as medium implementers gained 0.82. The strong implementers on this index are states that reported full implementation of CCSS-ELA by 2013. Their larger gain in 2011–2013 (1.08 points) distinguishes them from the strong implementers in the previous table. The overall advantage of 1.51 points over non-adopters represents about 0.04 standard deviations of the 2009 NAEP reading score, not a difference with real-world significance. Taken together, the 2011 and 2013 indexes estimate that NAEP reading gains from 2009 to 2013 were one to one-and-a-half scale score points larger in the strong CCSS implementation states than in the states that did not adopt CCSS.
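The standard-deviation conversion is simple arithmetic. Assuming a student-level standard deviation of roughly 37 NAEP scale points for fourth-grade reading (an approximation of mine, not a figure taken from the report), the reported gaps translate as:

```python
# Assumed student-level SD of NAEP fourth-grade reading scores: roughly 37
# scale points (an approximation, not a figure from the report).
NAEP_SD = 37

for gain in (1.11, 1.51):
    print(f"{gain} points ≈ {gain / NAEP_SD:.2f} SD")
```

Under that assumption, the two differences come out to about 0.03 and 0.04 standard deviations, consistent with the figures quoted in the text.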
As noted above, the 2013 implementation index is based on when states scheduled full implementation of CCSS in classrooms. Other than reading achievement, does the index seem to reflect changes in any other classroom variable believed to be related to CCSS implementation? If the answer is “yes,” that would bolster confidence that the index is measuring changes related to CCSS implementation.
Let’s examine the types of literature that students encounter during instruction. Perhaps the most controversial recommendation in the CCSS-ELA standards is the call for teachers to shift the content of reading materials away from stories and other fictional forms of literature in favor of more nonfiction. NAEP asks fourth-grade teachers the extent to which they teach fiction and nonfiction over the course of the school year (see Figure 2-1).
Historically, fiction has dominated fourth-grade reading instruction, and it still does. The percentage of teachers reporting that they teach fiction to a “large extent” exceeded the percentage answering “large extent” for nonfiction by 23 points in 2009 and 25 points in 2011. In 2013, the difference narrowed to only 15 percentage points, primarily because of nonfiction’s increased use. Fiction still dominated in 2013, but not by as much as in 2009.
The differences reported in Figure 2-1 are national indicators of fiction’s declining prominence in fourth-grade reading instruction. What about the states? We know that states were involved to varying degrees with the implementation of Common Core from 2009 to 2013. Is there evidence that fiction’s prominence was more likely to weaken in the states most aggressively pursuing CCSS implementation?
Table 2-3 displays the data tackling that question. Fourth-grade teachers in strong implementation states decisively favored the use of fiction over nonfiction in 2009 and 2011. But the prominence of fiction in those states experienced a large decline in 2013 (12.4 percentage points). The decline for the entire four-year period, 2009–2013, was larger in the strong implementation states (10.8) than in the medium implementation (7.5) or non-adoption states (9.8).
This section of the Brown Center Report analyzed NAEP data and two indexes of CCSS implementation, one based on data collected in 2011 and the second on data collected in 2013. NAEP scores for 2009–2013 were examined. Fourth-grade reading scores improved by 1.11 scale score points more in states with strong implementation of CCSS than in states that did not adopt CCSS. A similar comparison in last year’s BCR found a 1.27-point difference on NAEP’s eighth-grade math test, also in favor of states with strong implementation of CCSS. These differences, although certainly encouraging to CCSS supporters, are quite small, amounting to (at most) 0.04 standard deviations (SD) on the NAEP scale. A threshold of 0.20 SD—five times larger—is often invoked as the minimum size for a test score change to be regarded as noticeable. The current study’s findings are also merely statistical associations and cannot be used to make causal claims. Perhaps other factors, unmeasured by NAEP or the other sources of data analyzed here, are driving test score changes.
The analysis also found that fourth grade teachers in strong implementation states are more likely to be shifting reading instruction from fiction to nonfiction texts. That trend should be monitored closely to see if it continues. Other events to keep an eye on as the Common Core unfolds include the following:
1. The 2015 NAEP scores, typically released in the late fall, will be important for the Common Core. In most states, the first CCSS-aligned state tests will be given in the spring of 2015. Based on the earlier experiences of Kentucky and New York, results are expected to be disappointing. Common Core supporters can respond by explaining that assessments given for the first time often produce disappointing results. They will also claim that the tests are more rigorous than previous state assessments. But it will be difficult to explain stagnant or falling NAEP scores in an era when implementing CCSS commands so much attention.
2. Assessment will become an important implementation variable in 2015 and subsequent years. For analysts, the strategy employed here, modeling different indicators based on information collected at different stages of implementation, should become even more useful. Some states are planning to use the Smarter Balanced assessments, others are using the Partnership for Assessment of Readiness for College and Careers (PARCC), and still others are using their own homegrown tests. To capture variation among the states on this important dimension of implementation, analysts will need to use indicators that are up to date.
3. The politics of Common Core injects a dynamic element into implementation. The status of implementation is constantly changing. States may choose to suspend, to delay, or to abandon CCSS. That will require analysts to regularly reconfigure which states are considered “in” Common Core and which states are “out.” To further complicate matters, states may be “in” some years and “out” in others.
A final word. When the 2014 BCR was released, many CCSS supporters commented that it is too early to tell the effects of Common Core. The point that states may need more time operating under CCSS to realize its full effects certainly has merit. But that does not discount everything states have done so far—including professional development, purchasing new textbooks and other instructional materials, designing new assessments, buying and installing computer systems, and conducting hearings and public outreach—as part of implementing the standards. Some states are in their fifth year of implementation. It could be that states need more time, but innovations can also produce their biggest “pop” earlier in implementation rather than later. Kentucky was one of the earliest states to adopt and implement CCSS, yet that state’s NAEP fourth-grade reading score declined in both 2009–2011 and 2011–2013. The optimism of CCSS supporters is understandable, but a one-and-a-half-point NAEP gain might be as good as it gets for CCSS.
[i] These ideas were first introduced in a 2013 Brown Center Chalkboard post I authored, entitled “When Does a Policy Start?”
[ii] Maria Glod, “Since NCLB, Math and Reading Scores Rise for Ages 9 and 13,” Washington Post, April 29, 2009.
[iii] Mark Schneider, “NAEP Math Results Hold Bad News for NCLB,” AEIdeas (Washington, D.C.: American Enterprise Institute, 2009).
[iv] Lisa Guisbond with Monty Neill and Bob Schaeffer, NCLB’s Lost Decade for Educational Progress: What Can We Learn from this Policy Failure? (Jamaica Plain, MA: FairTest, 2012).
[v] Derek Neal and Diane Whitmore Schanzenbach, “Left Behind by Design: Proficiency Counts and Test-Based Accountability,” NBER Working Paper No. W13293 (Cambridge, MA: National Bureau of Economic Research, 2007), 13.
[vi] Careful analysts of NCLB have allowed different states to have different starting dates: see Thomas Dee and Brian A. Jacob, “Evaluating NCLB,” Education Next 10, no. 3 (Summer 2010); Manyee Wong, Thomas D. Cook, and Peter M. Steiner, “No Child Left Behind: An Interim Evaluation of Its Effects on Learning Using Two Interrupted Time Series Each with Its Own Non-Equivalent Comparison Series,” Working Paper 09-11 (Evanston, IL: Northwestern University Institute for Policy Research, 2009).
[vii] Common Core State Standards Initiative, “English Language Arts Standards, Key Design Consideration,” retrieved from: http://www.corestandards.org/ELA-Literacy/introduction/key-design-consideration/
[viii] Twelve states shifted downward from strong to medium and five states shifted upward from medium to strong, netting out to a seven-state swing.
Part II of the 2015 Brown Center Report on American Education
Over the next several years, policy analysts will evaluate the impact of the Common Core State Standards (CCSS) on U.S. education. The task promises to be challenging. The question most analysts will focus on is whether the CCSS is good or bad policy. This section of the Brown Center Report (BCR) tackles a set of seemingly innocuous questions compared to the hotbutton question of whether Common Core is wise or foolish. The questions all have to do with when Common Core actually started, or more precisely, when the Common Core started having an effect on student learning. And if it hasn’t yet had an effect, how will we know that CCSS has started to influence student achievement?
The analysis below probes this issue empirically, hopefully persuading readers that deciding when a policy begins is elemental to evaluating its effects. The question of a policy’s starting point is not always easy to answer. Yet the answer has consequences. You can’t figure out whether a policy worked or not unless you know when it began.^{[i]}
The analysis uses surveys of state implementation to model different CCSS starting points for states and produces a second early report card on how CCSS is doing. The first report card, focusing on math, was presented in last year’s BCR. The current study updates state implementation ratings that were presented in that report and extends the analysis to achievement in reading. The goal is not only to estimate CCSS’s early impact, but also to lay out a fair approach for establishing when the Common Core’s impact began—and to do it now before data are generated that either critics or supporters can use to bolster their arguments. The experience of No Child Left Behind (NCLB) illustrates this necessity.
After the 2008 National Assessment of Educational Progress (NAEP) scores were released, former Secretary of Education Margaret Spellings claimed that the new scores showed “we are on the right track.”^{[ii]} She pointed out that NAEP gains in the previous decade, 19992009, were much larger than in prior decades. Mark Schneider of the American Institutes of Research (and a former Commissioner of the National Center for Education Statistics [NCES]) reached a different conclusion. He compared NAEP gains from 19962003 to 20032009 and declared NCLB’s impact disappointing. “The preNCLB gains were greater than the postNCLB gains.”^{[iii]} It is important to highlight that Schneider used the 2003 NAEP scores as the starting point for assessing NCLB. A report from FairTest on the tenth anniversary of NCLB used the same demarcation for pre and postNCLB time frames.^{[iv]} FairTest is an advocacy group critical of high stakes testing—and harshly critical of NCLB—but if the 2003 starting point for NAEP is accepted, its conclusion is indisputable, “NAEP score improvement slowed or stopped in both reading and math after NCLB was implemented.”
Choosing 2003 as NCLB’s starting date is intuitively appealing. The law was introduced, debated, and passed by Congress in 2001. President Bush signed NCLB into law on January 8, 2002. It takes time to implement any law. The 2003 NAEP is arguably the first chance that the assessment had to register NCLB’s effects.
Selecting 2003 is consequential, however. Some of the largest gains in NAEP’s history were registered between 2000 and 2003. Once 2003 is established as a starting point (or baseline), pre2003 gains become “preNCLB.” But what if the 2003 NAEP scores were influenced by NCLB? Experiments evaluating the effects of new drugs collect baseline data from subjects before treatment, not after the treatment has begun. Similarly, evaluating the effects of public policies require that baseline data are not influenced by the policies under evaluation.
Avoiding such problems is particularly difficult when state or local policies are adopted nationally. The federal effort to establish a speed limit of 55 miles per hour in the 1970s is a good example. Several states already had speed limits of 55 mph or lower prior to the federal law’s enactment. Moreover, a few states lowered speed limits in anticipation of the federal limit while the bill was debated in Congress. On the day President Nixon signed the bill into law—January 2, 1974—the Associated Press reported that only 29 states would be required to lower speed limits. Evaluating the effects of the 1974 law with national data but neglecting to adjust for what states were already doing would obviously yield tainted baseline data.
There are comparable reasons for questioning 2003 as a good baseline for evaluating NCLB’s effects. The key components of NCLB’s accountability provisions—testing students, publicizing the results, and holding schools accountable for results—were already in place in nearly half the states. In some states they had been in place for several years. The 1999 iteration of Quality Counts, Education Week’s annual report on statelevel efforts to improve public education, entitled Rewarding Results, Punishing Failure, was devoted to state accountability systems and the assessments underpinning them. Testing and accountability are especially important because they have drawn fire from critics of NCLB, a law that wasn’t passed until years later.
The Congressional debate of NCLB legislation took all of 2001, allowing states to pass anticipatory policies. Derek Neal and Diane Whitmore Schanzenbach reported that “with the passage of NCLB lurking on the horizon,” Illinois placed hundreds of schools on a watch list and declared that future state testing would be high stakes.^{[v]} In the summer and fall of 2002, with NCLB now the law of the land, state after state released lists of schools falling short of NCLB’s requirements. Then the 20022003 school year began, during which the 2003 NAEP was administered. Using 2003 as a NAEP baseline assumes that none of these activities—previous accountability systems, public lists of schools in need of improvement, anticipatory policy shifts—influenced achievement. That is unlikely.^{[vi]}
Unlike NCLB, Common Core had no “pre-CCSS” state version. States vary in how quickly and aggressively they have implemented CCSS. For the BCR analyses, two indexes were constructed to model CCSS implementation. They are based on surveys of state education agencies and named for the two years in which the surveys were conducted. The 2011 survey reported the number of programs (e.g., professional development, new materials) on which states reported spending federal funds to implement CCSS. Strong implementers spent money on more activities. The 2011 index was used to investigate eighth grade math achievement in the 2014 BCR. A new implementation index was created for this year’s study of reading achievement. The 2013 index is based on a survey asking states when they planned to complete full implementation of CCSS in classrooms. Strong states aimed for full implementation by 2012–2013 or earlier.
Fourth grade NAEP reading scores serve as the achievement measure. Why fourth grade and not eighth? Reading instruction is a key activity of elementary classrooms but has all but disappeared by eighth grade. What remains of “reading” as an independent subject, which has typically morphed into the study of literature, is subsumed under the English-Language Arts curriculum, a catchall term that also includes writing, vocabulary, listening, and public speaking. Most students in fourth grade are in self-contained classes; they receive instruction in all subjects from one teacher. The impact of CCSS on reading instruction—the recommendation that nonfiction take a larger role in reading materials is a good example—will be concentrated in the activities of a single teacher in elementary schools. The burden for meeting CCSS’s press for nonfiction, on the other hand, is expected to be shared by all middle and high school teachers.^{[vii]}
Table 2-1 displays NAEP gains using the 2011 implementation index. The four-year period between 2009 and 2013 is broken down into two parts: 2009–2011 and 2011–2013. Nineteen states are categorized as “strong” implementers of CCSS on the 2011 index, and from 2009 to 2013, they outscored the four states that did not adopt CCSS by a little more than one scale score point (0.87 vs. −0.24, a 1.11-point difference). The non-adopters are the logical control group for CCSS, but with only four states in that category—Alaska, Nebraska, Texas, and Virginia—it is sensitive to big changes in one or two states. Alaska and Texas both experienced a decline in fourth grade reading scores from 2009 to 2013.
The 1.11-point advantage in reading gains for strong CCSS implementers is similar to the 1.27-point advantage reported last year for eighth grade math. Both are small. The reading difference in favor of CCSS is equal to approximately 0.03 standard deviations of the 2009 baseline reading score. Also note that the differences were greater in 2009–2011 than in 2011–2013 and that the “medium” implementers performed as well as or better than the strong implementers over the entire four-year period (gain of 0.99).
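The conversion from scale points to standard-deviation units used throughout this section is simple arithmetic, sketched below in Python. The baseline SD of roughly 37 points is an assumption, chosen because it reproduces the report’s figure of approximately 0.03 SD for a 1.11-point difference; the report’s own calculation uses the actual SD of 2009 fourth grade NAEP reading scores.

```python
# Hypothetical sketch: express NAEP scale-score differences in
# standard-deviation units. BASELINE_SD is an assumed value (~37 points),
# not a figure taken from the report.
BASELINE_SD = 37.0

def effect_size(point_difference, sd=BASELINE_SD):
    """Return a scale-score difference as a fraction of the baseline SD."""
    return point_difference / sd

print(round(effect_size(1.11), 2))  # reading: strong implementers vs. non-adopters
print(round(effect_size(1.27), 2))  # last year's eighth grade math comparison
```

Either gap lands at roughly 0.03 SD, far below the 0.20 SD threshold invoked later in this section as the minimum for a noticeable change.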
Table 2-2 displays calculations using the 2013 implementation index. Twelve states are rated as strong CCSS implementers, seven fewer than on the 2011 index.^{[viii]} Data for the non-adopters are the same as in the previous table. In 2009–2013, the strong implementers gained 1.27 NAEP points compared to −0.24 among the non-adopters, a difference of 1.51 points. The thirty-four states rated as medium implementers gained 0.82. The strong implementers on this index are states that reported full implementation of CCSS-ELA by 2013. Their larger gain in 2011–2013 (1.08 points) distinguishes them from the strong implementers in the previous table. The overall advantage of 1.51 points over non-adopters represents about 0.04 standard deviations of the 2009 NAEP reading score, not a difference with real world significance. Taken together, the 2011 and 2013 indexes estimate that NAEP reading gains from 2009 to 2013 were one to one-and-a-half scale score points larger in the strong CCSS implementation states than in the states that did not adopt CCSS.
As noted above, the 2013 implementation index is based on when states scheduled full implementation of CCSS in classrooms. Other than reading achievement, does the index seem to reflect changes in any other classroom variable believed to be related to CCSS implementation? If the answer is “yes,” that would bolster confidence that the index is measuring changes related to CCSS implementation.
Let’s examine the types of literature that students encounter during instruction. Perhaps the most controversial recommendation in the CCSS-ELA standards is the call for teachers to shift the content of reading materials away from stories and other fictional forms of literature in favor of more nonfiction. NAEP asks fourth grade teachers the extent to which they teach fiction and nonfiction over the course of the school year (see Figure 2-1).
Historically, fiction has dominated fourth grade reading instruction. It still does. The percentage of teachers reporting that they teach fiction to a “large extent” exceeded the percentage answering “large extent” for nonfiction by 23 points in 2009 and 25 points in 2011. In 2013, the difference narrowed to only 15 percentage points, primarily because of nonfiction’s increased use. Fiction still dominated in 2013, but not by as much as in 2009.
The differences reported in Figure 2-1 are national indicators of fiction’s declining prominence in fourth grade reading instruction. What about the states? We know that they were involved to varying degrees with the implementation of Common Core from 2009 to 2013. Is there evidence that fiction’s prominence was more likely to weaken in states most aggressively pursuing CCSS implementation?
Table 2-3 displays the data tackling that question. Fourth grade teachers in strong implementation states decisively favored the use of fiction over nonfiction in 2009 and 2011. But the prominence of fiction in those states experienced a large decline in 2013 (12.4 percentage points). The decline for the entire four-year period, 2009–2013, was larger in the strong implementation states (10.8) than in the medium implementation (7.5) or non-adoption states (9.8).
This section of the Brown Center Report analyzed NAEP data and two indexes of CCSS implementation, one based on data collected in 2011, the second from data collected in 2013. NAEP scores for 2009–2013 were examined. Fourth grade reading scores improved by 1.11 scale score points more in states with strong implementation of CCSS than in states that did not adopt CCSS. A similar comparison in last year’s BCR found a 1.27-point difference on NAEP’s eighth grade math test, also in favor of states with strong implementation of CCSS. These differences, although certainly encouraging to CCSS supporters, are quite small, amounting to (at most) 0.04 standard deviations (SD) on the NAEP scale. A threshold of 0.20 SD—five times larger—is often invoked as the minimum size for a test score change to be regarded as noticeable. The current study’s findings are also merely statistical associations and cannot be used to make causal claims. Perhaps other factors are driving test score changes, unmeasured by NAEP or the other sources of data analyzed here.
The analysis also found that fourth grade teachers in strong implementation states are more likely to be shifting reading instruction from fiction to nonfiction texts. That trend should be monitored closely to see if it continues. Other events to keep an eye on as the Common Core unfolds include the following:
1. The 2015 NAEP scores, typically released in the late fall, will be important for the Common Core. In most states, the first CCSS-aligned state tests will be given in the spring of 2015. Based on the earlier experiences of Kentucky and New York, results are expected to be disappointing. Common Core supporters can respond by explaining that assessments given for the first time often produce disappointing results. They will also claim that the tests are more rigorous than previous state assessments. But it will be difficult to explain stagnant or falling NAEP scores in an era when implementing CCSS commands so much attention.
2. Assessment will become an important implementation variable in 2015 and subsequent years. For analysts, the strategy employed here—modeling different indicators based on information collected at different stages of implementation—should become even more useful. Some states are planning to use Smarter Balanced Assessments, others are using the Partnership for Assessment of Readiness for College and Careers (PARCC), and still others are using their own homegrown tests. To capture variation among the states on this important dimension of implementation, analysts will need to use indicators that are up-to-date.
3. The politics of Common Core injects a dynamic element into implementation. The status of implementation is constantly changing. States may choose to suspend, to delay, or to abandon CCSS. That will require analysts to regularly reconfigure which states are considered “in” Common Core and which states are “out.” To further complicate matters, states may be “in” some years and “out” in others.
A final word. When the 2014 BCR was released, many CCSS supporters commented that it is too early to tell the effects of Common Core. The point that states may need more time operating under CCSS to realize its full effects certainly has merit. But that does not discount everything states have done so far—including professional development, purchasing new textbooks and other instructional materials, designing new assessments, buying and installing computer systems, and conducting hearings and public outreach—as part of implementing the standards. Some states are in their fifth year of implementation. It could be that states need more time, but innovations can also produce their biggest “pop” earlier in implementation rather than later. Kentucky was one of the earliest states to adopt and implement CCSS. That state’s NAEP fourth grade reading score declined in both 2009–2011 and 2011–2013. The optimism of CCSS supporters is understandable, but a one-and-a-half-point NAEP gain might be as good as it gets for CCSS.
[i] These ideas were first introduced in a 2013 Brown Center Chalkboard post I authored, entitled, “When Does a Policy Start?”
[ii] Maria Glod, “Since NCLB, Math and Reading Scores Rise for Ages 9 and 13,” Washington Post, April 29, 2009.
[iii] Mark Schneider, “NAEP Math Results Hold Bad News for NCLB,” AEIdeas (Washington, D.C.: American Enterprise Institute, 2009).
[iv] Lisa Guisbond with Monty Neill and Bob Schaeffer, NCLB’s Lost Decade for Educational Progress: What Can We Learn from this Policy Failure? (Jamaica Plain, MA: FairTest, 2012).
[v] Derek Neal and Diane Schanzenbach, “Left Behind by Design: Proficiency Counts and Test-Based Accountability,” NBER Working Paper No. W13293 (Cambridge: National Bureau of Economic Research, 2007), 13.
[vi] Careful analysts of NCLB have allowed different states to have different starting dates: see Thomas Dee and Brian A. Jacob, “Evaluating NCLB,” Education Next 10, no. 3 (Summer 2010); Manyee Wong, Thomas D. Cook, and Peter M. Steiner, “No Child Left Behind: An Interim Evaluation of Its Effects on Learning Using Two Interrupted Time Series Each with Its Own Non-Equivalent Comparison Series,” Working Paper 09-11 (Evanston, IL: Northwestern University Institute for Policy Research, 2009).
[vii] Common Core State Standards Initiative. “English Language Arts Standards, Key Design Consideration.” Retrieved from: http://www.corestandards.org/ELALiteracy/introduction/keydesignconsideration/
[viii] Twelve states shifted downward from strong to medium and five states shifted upward from medium to strong, netting out to a seven state swing.
Part III of the 2015 Brown Center Report on American Education
Student engagement refers to the intensity with which students apply themselves to learning in school. Traits such as motivation, enjoyment, and curiosity—characteristics that have interested researchers for a long time—have been joined recently by new terms such as “grit,” which now approaches cliché status. International assessments collect data from students on characteristics related to engagement. This study looks at data from the Program for International Student Assessment (PISA), an international test given to fifteen-year-olds. In the U.S., most PISA students are in the fall of their sophomore year. The high school years are a time when many observers worry that students lose interest in school.
Compared to their peers around the world, how do U.S. students appear on measures of engagement? Are national indicators of engagement related to achievement? This analysis concludes that American students are about average in terms of engagement. Data reveal that several countries noted for their superior ranking on PISA—e.g., Korea, Japan, Finland, Poland, and the Netherlands—score below the U.S. on measures of student engagement. Thus, the relationship of achievement to student engagement is not clear-cut, with some evidence pointing toward a weak positive relationship and other evidence indicating a modest negative relationship.
Education studies differ in units of analysis. Some studies report data on individuals, with each student serving as an observation. Studies of new reading or math programs, for example, usually report an average gain score or effect size representing the impact of the program on the average student. Other studies report aggregated data, in which test scores or other measurements are averaged to yield a group score. Test scores of schools, districts, states, or countries are constructed like that. These scores represent the performance of groups, with each group serving as a single observation, but they are really just data from individuals that have been aggregated to the group level.
Aggregated units are particularly useful for policy analysts. Analysts are interested in how Fairfax County or the state of Virginia or the United States is doing. Governmental bodies govern those jurisdictions and policymakers craft policy for all of the citizens within the political jurisdiction—not for an individual.
The analytical unit is especially important when investigating topics like student engagement and its relationship with achievement. That relationship is inherently individual, focusing on the interaction of psychological characteristics. It is also prone to reverse causality, meaning that the direction of cause and effect cannot readily be determined. Consider self-esteem and academic achievement. Determining which one is cause and which is effect has been debated for decades. Students who are good readers enjoy books, feel pretty good about their reading abilities, and spend more time reading than other kids. The possibility of reverse causality is one reason that beginning statistics students learn an important rule: correlation is not causation.
Starting with the first international assessments in the 1960s, a curious pattern has emerged. Data on students’ attitudes toward studying school subjects, when examined at the national level, often exhibit the opposite of the relationship with achievement that one would expect. The 2006 Brown Center Report (BCR) investigated the phenomenon in a study of “the happiness factor” in learning.^{[i]} Test scores of fourth graders in 25 countries and eighth graders in 46 countries were analyzed. Students in countries with low math scores were more likely to report that they enjoyed math than students in high-scoring countries. Correlation coefficients for the association of enjoyment and achievement were −0.67 at fourth grade and −0.75 at eighth grade.
Confidence in math performance was also inversely related to achievement. Correlation coefficients for national achievement and the percentage of students responding affirmatively to the statement, “I usually do well in mathematics,” were −0.58 among fourth graders and −0.64 among eighth graders. Nations with the most confident math students tend to perform poorly on math tests; nations with the least confident students do quite well.
That is odd. What’s going on? A comparison of Singapore and the U.S. helps unravel the puzzle. The data in Figure 3-1 are for eighth graders on the 2003 Trends in Mathematics and Science Study (TIMSS). U.S. students were very confident—84% either agreed a lot or a little (39% + 45%) with the statement that they usually do well in mathematics. In Singapore, the figure was 64% (46% + 18%). With a score of 605, however, Singaporean students registered more than one full standard deviation (80 points) higher on the TIMSS math test than the U.S. score of 504.
When within-country data are examined, the relationship runs in the expected direction. In Singapore, highly confident students score 642, approximately 100 points above the least-confident students (551). In the U.S., the gap between the most- and least-confident students was also about 100 points—but at a much lower level on the TIMSS scale, at 541 and 448. Note that the least-confident Singaporean eighth grader still outscores the most-confident American, 551 to 541.
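The Singapore/U.S. reversal is a textbook case of a relationship flipping sign under aggregation. The toy numbers below are invented for illustration—they are not the TIMSS figures; the point is only that a positive within-group correlation can coexist with a negative between-group one.

```python
# Toy illustration (invented data): each tuple is (confidence, score).
country_a = [(1, 540), (2, 590), (3, 640)]   # high-scoring, low-confidence profile
country_b = [(2, 450), (3, 500), (4, 550)]   # lower-scoring, higher-confidence profile

def corr(pairs):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / (vx * vy) ** 0.5

# Within each country, confidence and score rise together.
print(corr(country_a), corr(country_b))   # both 1.0 in this toy data

# At the country level, the more confident country scores lower on average.
means = [(2.0, 590.0), (3.0, 500.0)]      # (mean confidence, mean score)
print(corr(means))                        # -1.0: the sign reverses
```

The same mechanism produces the negative national-level coefficients reported above alongside positive student-level relationships.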
The lesson is that the unit of analysis must be considered when examining data on students’ psychological characteristics and their relationship to achievement. If presented with country-level associations, one should wonder what the within-country associations are. And vice versa. Let’s keep that caution in mind as we now turn to data on fifteen-year-olds’ intrinsic motivation and how nations scored on the 2012 PISA.
PISA’s index of intrinsic motivation to learn mathematics comprises responses to four items on the student questionnaire: 1) I enjoy reading about mathematics; 2) I look forward to my mathematics lessons; 3) I do mathematics because I enjoy it; and 4) I am interested in the things I learn in mathematics. Figure 3-2 shows the percentage of students in OECD countries—thirty of the most economically developed nations in the world—responding that they agree or strongly agree with the statements. A little less than one-third (30.6%) of students responded favorably to reading about math, 35.5% responded favorably to looking forward to math lessons, 38.2% reported doing math because they enjoy it, and 52.9% said they were interested in the things they learn in math. A ballpark estimate, then, is that one-third to one-half of students respond affirmatively to the individual components of PISA’s intrinsic motivation index.
Table 3-1 presents national scores on the 2012 index of intrinsic motivation to learn mathematics. The index is scaled with an average of 0.00 and a standard deviation of 1.00. Student index scores are averaged to produce a national score. The scores of 39 nations are reported—29 OECD countries and 10 partner countries.^{[ii]} Indonesia appears to have the most intrinsically motivated students in the world (0.80), followed by Thailand (0.77), Mexico (0.67), and Tunisia (0.59). It is striking that developing countries top the list. Universal education at the elementary level is only a recent reality in these countries, and they are still struggling to deliver universally accessible high schools, especially in rural areas and especially to girls. The students who sat for PISA may be an unusually motivated group. They also may be deeply appreciative of having an opportunity that their parents never had.
The U.S. scores about average (0.08) on the index, statistically about the same as New Zealand, Australia, Ireland, and Canada. The bottom of the table is extremely interesting. Among the countries with the least intrinsically motivated kids are some PISA high flyers. Austria has the least motivated students (−0.35), but that is not statistically significantly different from the score for the Netherlands (−0.33). What’s surprising is that Korea (−0.20), Finland (−0.22), Japan (−0.23), and Belgium (−0.24) score at the bottom of the intrinsic motivation index even though they historically do quite well on the PISA math test.
Let’s now dig a little deeper into the intrinsic motivation index. Two components of the index are how students respond to “I do mathematics because I enjoy it” and “I look forward to my mathematics lessons.” These sentiments are directly related to schooling. Whether students enjoy math or look forward to math lessons is surely influenced by factors such as teachers and curriculum. Table 3-2 rank orders PISA countries by the percentage of students who “agree” or “strongly agree” with the questionnaire prompts. The nations’ 2012 PISA math scores are also tabled. Indonesia scores at the top of both rankings, with 78.3% enjoying math and 72.3% looking forward to studying the subject. However, Indonesia’s PISA math score of 375 is more than one full standard deviation below the international mean of 494 (standard deviation of 92). The tops of the tables are dominated primarily by low-performing countries, but not exclusively so. Denmark is an average-performing nation that has high rankings on both sentiments. Liechtenstein, Hong Kong-China, and Switzerland do well on the PISA math test and appear to have contented, positively oriented students.
Several nations of interest are shaded. The bar across the middle of the tables, encompassing Australia and Germany, demarcates the median of the two lists, with 19 countries above and 19 below that position. The United States registers above the median on looking forward to math lessons (45.4%) and a bit below the median on enjoyment (36.6%). A similar proportion of students in Poland—a country recently celebrated in popular media and in Amanda Ripley’s book, The Smartest Kids in the World,^{[iii]} for making great strides on PISA tests—enjoy math (36.1%), but only 21.3% of Polish kids look forward to their math lessons, very near the bottom of the list, anchored by the Netherlands at 19.8%.
Korea also appears in Ripley’s book. It scores poorly on both items. Only 30.7% of Korean students enjoy math, and even fewer, 21.8%, look forward to studying the subject. Korean education is depicted unflatteringly in Ripley’s book—as an academic pressure cooker lacking joy or purpose—so its standing here is not surprising. But Finland is another matter. It is portrayed as laid-back and student-centered, concerned with making students feel relaxed and engaged. Yet only 28.8% of Finnish students say that they study mathematics because they enjoy it (among the bottom four countries) and only 24.8% report that they look forward to math lessons (among the bottom seven countries). Korea, the pressure cooker, and Finland, the laid-back paradise, look about the same on these dimensions.
Another country that is admired for its educational system, Japan, does not fare well on these measures. Only 30.8% of students in Japan enjoy mathematics, despite the boisterous, enthusiastic classrooms that appear in Elizabeth Green’s recent book, Building a Better Teacher.^{[iv]} Japan does better on the percentage of students looking forward to their math lessons (33.7%), but still places far below the U.S. Green’s book describes classrooms with younger students, but even so, surveys of Japanese fourth and eighth graders’ attitudes toward studying mathematics report results similar to those presented here. American students say that they enjoy their math classes and studying math more than students in Finland, Japan, and Korea.
It is clear from Table 3-2 that at the national level, enjoying math is not positively related to math achievement. Nor is looking forward to one’s math lessons. The correlation coefficients reported in the last row of the table quantify the magnitude of the inverse relationships. The −0.58 and −0.57 coefficients indicate a moderately negative association, meaning, in plain English, that countries with students who enjoy math or look forward to math lessons tend to score below average on the PISA math test. And high-scoring nations tend to register below average on these measures of student engagement. Country-level associations, however, should be augmented with student-level associations that are calculated within each country.
The 2012 PISA volume on student engagement does not present within-country correlation coefficients on intrinsic motivation or its components. But it does offer within-country correlations of math achievement with three other characteristics relevant to student engagement. Table 3-3 displays statistics for students’ responses to: 1) whether they feel like they belong at school; 2) their attitudes toward school, an index composed of four factors;^{[v]} and 3) whether they had arrived late for school in the two weeks prior to the PISA test. These measures reflect an excellent mix of behaviors and dispositions.
The within-country correlations trend in the expected direction, but they are small in magnitude. Correlation coefficients for math performance and a sense of belonging at school range from 0.02 to 0.18, meaning that the country exhibiting the strongest relationship between achievement and a sense of belonging—Thailand, with a 0.18 correlation coefficient—isn’t registering a strong relationship at all. The OECD average is 0.08, which is trivial. The U.S. correlation coefficient, 0.07, is also trivial. The relationship of achievement with attitudes toward school is slightly stronger (OECD average of 0.11), but is still weak.
Of the three characteristics, arriving late for school shows the strongest correlation, an unsurprising inverse relationship of −0.14 in OECD countries and −0.20 in the U.S. Students who tend to be tardy also tend to score lower on math tests. But, again, the magnitude is surprisingly small. The coefficients are statistically significant because of large sample sizes, but in a real world “would I notice this if it were in my face?” sense, no, the correlation coefficients are suggesting not much of a relationship at all.
The PISA report presents within-country effect sizes for the intrinsic motivation index, calculating the achievement gains associated with a one-unit change in the index. One of several interesting findings is that intrinsic motivation is more strongly associated with gains at the top of the achievement distribution, among students at the 90^{th} percentile in math scores, than at the bottom of the distribution, among students at the 10^{th} percentile.
The report summarizes the within-country effect sizes with this statement: “On average across OECD countries, a change of one unit in the index of intrinsic motivation to learn mathematics translates into a 19 score-point difference in mathematics performance.”^{[vi]} This sentence can be easily misinterpreted. It means that within each of the participating countries, students who differ by one unit on PISA’s 2012 intrinsic motivation index score about 19 points apart on the 2012 math test. It does not mean that a country that gains one unit on the intrinsic motivation index can expect a 19-point score increase.^{[vii]}
Let’s now see what that association looks like at the national level.
PISA first reported national scores on the index of intrinsic motivation to learn mathematics in 2003. Are gains that countries made on the index associated with gains on PISA’s math test? Table 3-4 presents a score card on the question, reporting the changes that occurred in thirty-nine nations—in both the index and math scores—from 2003 to 2012. Seventeen nations made statistically significant gains on the index; fourteen nations had gains that were, in a statistical sense, indistinguishable from zero—labeled “no change” in the table; and eight nations experienced statistically significant declines in index scores.
The U.S. scored 0.00 in 2003 and 0.08 in 2012, notching a gain of 0.08 on the index (statistically significant). Its PISA math score declined from 483 to 481, a decline of 2 scale score points (not statistically significant).
Table 3-4 makes it clear that national changes on PISA’s intrinsic motivation index are not associated with changes in math achievement. The countries registering gains on the index averaged a decline of 3.7 points on PISA’s math assessment. The countries that remained about the same on the index had math scores that also remained essentially unchanged (0.09). And the most striking finding: countries that declined on the index (an average decline of 0.15) actually gained an average of 10.3 points on the PISA math scale. Intrinsic motivation went down; math scores went up. The correlation coefficient for the overall relationship, not shown in the table, is −0.30.
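The score-card logic behind Table 3-4 can be sketched as a simple group-and-average. The five country rows below are placeholders invented so that two of the group means echo the figures reported above; they are not PISA’s actual country data.

```python
# Hedged sketch of the Table 3-4 tabulation with invented example values:
# group countries by their 2003-2012 change on the motivation index, then
# average the math-score changes within each group.
countries = {
    # name: (index_change_category, math_score_change) -- placeholder data
    "Country A": ("gain", -5.0),
    "Country B": ("gain", -2.4),
    "Country C": ("no change", 0.1),
    "Country D": ("decline", 9.0),
    "Country E": ("decline", 11.6),
}

groups = {}
for _, (category, delta_math) in countries.items():
    groups.setdefault(category, []).append(delta_math)

for category, deltas in groups.items():
    print(category, sum(deltas) / len(deltas))
```

With these invented inputs, the "gain" group averages −3.7 and the "decline" group averages 10.3, mirroring the pattern the report describes: motivation up, scores down, and vice versa.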
The analysis above investigated student engagement. International data from the 2012 PISA were examined on several dimensions of student engagement, focusing on a measure that PISA has employed since 2003, the index of intrinsic motivation to learn mathematics. The U.S. scored near the middle of the distribution on the 2012 index. PISA analysts calculated that, on average, a one-unit change in the index was associated with a 19-point gain on the PISA math test. That is the average of within-country calculations, using student-level data that measure the association of intrinsic motivation with PISA score. It represents an effect size of about 0.20—a positive effect, but one that is generally considered small in magnitude.^{[viii]}
The unit of analysis matters. Between-country associations often differ from within-country associations. The current study used a difference-in-difference approach that calculated the correlation coefficient for two variables at the national level: the change in the intrinsic motivation index from 2003 to 2012 and the change in PISA score over the same time period. That analysis produced a correlation coefficient of −0.30, a negative relationship that is also generally considered small in magnitude.
Neither approach can justify causal claims nor address the possibility of reverse causality occurring—the possibility that high math achievement boosts intrinsic motivation to learn math, rather than, or even in addition to, high levels of motivation leading to greater learning. Poor math achievement may cause intrinsic motivation to fall. Taken together, the analyses lead to the conclusion that PISA provides, at best, weak evidence that raising student motivation is associated with achievement gains. Boosting motivation may even produce declines in achievement.
Here’s the bottom line for what the PISA data recommend to policymakers: programs designed to boost student engagement—perhaps a worthy pursuit even if unrelated to achievement—should be evaluated for their effects in small-scale experiments before being adopted broadly. The international evidence does not justify wide-scale concern over current levels of student engagement in the U.S. or support the hypothesis that boosting student engagement would raise student performance nationally.
Let’s conclude by considering the advantages that national-level, difference-in-difference analyses provide that student-level analyses may overlook.
1. They depict policy interventions more accurately. Policies are actions of a political unit affecting all of its members. They do not simply affect the relationship of two characteristics within an individual’s psychology. Policymakers who ask the question, “What happens when a country boosts student engagement?” are asking about a country-level phenomenon.
2. Direction of causality can run differently at the individual and group levels. For example, we know that enjoying a school subject and achievement on tests of that subject are positively correlated at the individual level. But they are not always correlated—and can in fact be negatively correlated—at the group level.
3. By using multiple years of panel data and calculating change over time, a difference-in-difference analysis controls for unobserved-variable bias by “baking into the cake” those unobserved variables at the baseline. The unobserved variables are assumed to remain stable over the time period of the analysis. For the cultural factors that many analysts suspect influence between-nation test score differences, stability may be a safe assumption. Difference-in-difference, then, would be superior to cross-sectional analyses in controlling for cultural influences that are omitted from other models.
4. Testing artifacts from a cultural source can also be dampened. Characteristics such as enjoyment are culturally defined, and the language employed to describe them is also culturally bounded. Consider two of the questionnaire items examined above: whether kids “enjoy” math and how much they “look forward” to math lessons. Cultural differences in responding to these prompts will be reflected in betweencountry averages at the baseline, and any subsequent changes will reflect fluctuations net of those initial differences.
[i] Tom Loveless, “The Happiness Factor in Student Learning,” The 2006 Brown Center Report on American Education: How Well are American Students Learning? (Washington, D.C.: The Brookings Institution, 2006).
[ii] All countries with 2003 and 2012 data are included.
[iii] Amanda Ripley, The Smartest Kids in the World: And How They Got That Way (New York, NY: Simon & Schuster, 2013)
[iv] Elizabeth Green, Building a Better Teacher: How Teaching Works (and How to Teach It to Everyone) (New York, NY: W.W. Norton & Company, 2014).
[v] The attitude toward school index is based on responses to: 1) Trying hard at school will help me get a good job, 2) Trying hard at school will help me get into a good college, 3) I enjoy receiving good grades, 4) Trying hard at school is important. See: OECD, PISA 2012 Database, Table III.2.5a.
[vi] OECD, PISA 2012 Results: Ready to Learn: Students’ Engagement, Drive and SelfBeliefs (Volume III) (Paris: PISA, OECD Publishing, 2013), 77.
[vii] PISA originally called the index of intrinsic motivation the index of interest and enjoyment in mathematics, first constructed in 2003. The four questions comprising the index remain identical from 2003 to 212, allowing for comparability. Index values for 2003 scores were rescaled based on 2012 scaling (mean of 0.00 and SD of 1.00), meaning that index values published in PISA reports prior to 2012 will not agree with those published after 2012 (including those analyzed here). See: OECD, PISA 2012 Results: Ready to Learn: Students’ Engagement, Drive and SelfBeliefs (Volume III) (Paris: PISA, OECD Publishing, 2013), 54.
[viii] PISA math scores are scaled with a standard deviation of 100, but the average withincountry standard deviation for OECD nations was 92 on the 2012 math test.
« Part II: Measuring Effects of the Common Core 
Part III of the 2015 Brown Center Report on American Education
Student engagement refers to the intensity with which students apply themselves to learning in school. Traits such as motivation, enjoyment, and curiosity—characteristics that have interested researchers for a long time—have recently been joined by new terms such as "grit," which now approaches cliché status. International assessments collect data from students on characteristics related to engagement. This study looks at data from the Program for International Student Assessment (PISA), an international test given to fifteen-year-olds. In the U.S., most PISA students are in the fall of their sophomore year. The high school years are a time when many observers worry that students lose interest in school.
Compared to their peers around the world, how do U.S. students appear on measures of engagement? Are national indicators of engagement related to achievement? This analysis concludes that American students are about average in terms of engagement. Data reveal that several countries noted for their superior ranking on PISA—e.g., Korea, Japan, Finland, Poland, and the Netherlands—score below the U.S. on measures of student engagement. The relationship of achievement to student engagement is thus not clear-cut, with some evidence pointing toward a weak positive relationship and other evidence indicating a modest negative relationship.
Education studies differ in units of analysis. Some studies report data on individuals, with each student serving as an observation. Studies of new reading or math programs, for example, usually report an average gain score or effect size representing the impact of the program on the average student. Other studies report aggregated data, in which test scores or other measurements are averaged to yield a group score. Test scores of schools, districts, states, or countries are constructed in this way. These scores represent the performance of groups, with each group serving as a single observation, but they are really just data from individuals that have been aggregated to the group level.
Aggregated units are particularly useful for policy analysts. Analysts are interested in how Fairfax County or the state of Virginia or the United States is doing. Governmental bodies govern those jurisdictions and policymakers craft policy for all of the citizens within the political jurisdiction—not for an individual.
The analytical unit is especially important when investigating topics like student engagement and its relationship with achievement. That relationship is inherently individual, centered on the interaction of psychological characteristics. It is also prone to reverse causality, meaning that the direction of cause and effect cannot readily be determined. Consider self-esteem and academic achievement. Determining which one is cause and which is effect has been debated for decades. Students who are good readers enjoy books, feel pretty good about their reading abilities, and spend more time reading than other kids. Any of those characteristics could plausibly cause, or be caused by, the others. The possibility of reverse causality is one reason that beginning statistics students learn an important rule: correlation is not causation.
Starting with the first international assessments in the 1960s, a curious pattern has emerged. Data on students’ attitudes toward studying school subjects, when examined at the national level, often exhibit the opposite of the expected relationship with achievement. The 2006 Brown Center Report (BCR) investigated the phenomenon in a study of “the happiness factor” in learning.^{[i]} Test scores of fourth graders in 25 countries and eighth graders in 46 countries were analyzed. Students in countries with low math scores were more likely to report that they enjoyed math than students in high-scoring countries. Correlation coefficients for the association of enjoyment and achievement were -0.67 at fourth grade and -0.75 at eighth grade.
Confidence in math performance was also inversely related to achievement. Correlation coefficients for national achievement and the percentage of students responding affirmatively to the statement, “I usually do well in mathematics,” were -0.58 among fourth graders and -0.64 among eighth graders. Nations with the most confident math students tend to perform poorly on math tests; nations with the least confident students do quite well.
That is odd. What’s going on? A comparison of Singapore and the U.S. helps unravel the puzzle. The data in Figure 3-1 are for eighth graders on the 2003 Trends in International Mathematics and Science Study (TIMSS). U.S. students were very confident—84% either agreed a lot or a little (39% + 45%) with the statement that they usually do well in mathematics. In Singapore, the figure was 64% (46% + 18%). With a score of 605, however, Singaporean students registered more than one full standard deviation (80 points) above the U.S. score of 504 on the TIMSS math test.
When within-country data are examined, the relationship runs in the expected direction. In Singapore, highly confident students score 642, approximately 100 points above the least-confident students (551). In the U.S., the gap between the most- and least-confident students was also about 100 points—but at a much lower position on the TIMSS scale, 541 versus 448. Note that the least-confident Singaporean eighth grader still outscores the most-confident American, 551 to 541.
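The Singapore-U.S. reversal is an instance of a broader statistical phenomenon (often discussed under the heading of Simpson's paradox), and it is easy to reproduce with synthetic data. The sketch below uses invented numbers, not TIMSS data: two hypothetical countries in which confidence predicts score positively within each country, while the more confident country has the lower mean score. Pooling the students flips the sign of the correlation.

```python
import random

random.seed(0)

def simulate(n, conf_mean, score_mean):
    """Draw (confidence, score) pairs for one hypothetical country.
    Within the country, higher confidence goes with higher scores."""
    students = []
    for _ in range(n):
        conf = random.gauss(conf_mean, 1.0)
        score = score_mean + 30 * (conf - conf_mean) + random.gauss(0, 20)
        students.append((conf, score))
    return students

def pearson(pairs):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / (vx ** 0.5 * vy ** 0.5)

# Country A: confident but lower-scoring. Country B: the reverse.
a = simulate(500, conf_mean=1.0, score_mean=500)
b = simulate(500, conf_mean=-1.0, score_mean=600)

within_a = pearson(a)    # positive within-country association
within_b = pearson(b)    # positive within-country association
pooled = pearson(a + b)  # negative once the two countries are pooled
```

Nothing about either country's schooling changed between the two calculations; only the unit of analysis did.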
The lesson is that the unit of analysis must be considered when examining data on students’ psychological characteristics and their relationship to achievement. If presented with country-level associations, one should wonder what the within-country associations are. And vice versa. Let’s keep that caution in mind as we now turn to data on fifteen-year-olds’ intrinsic motivation and how nations scored on the 2012 PISA.
PISA’s index of intrinsic motivation to learn mathematics comprises responses to four items on the student questionnaire: 1) I enjoy reading about mathematics; 2) I look forward to my mathematics lessons; 3) I do mathematics because I enjoy it; and 4) I am interested in the things I learn in mathematics. Figure 3-2 shows the percentage of students in OECD countries—thirty of the most economically developed nations in the world—responding that they agree or strongly agree with the statements. A little less than one-third (30.6%) of students responded favorably to reading about math, 35.5% responded favorably to looking forward to math lessons, 38.2% reported doing math because they enjoy it, and 52.9% said they were interested in the things they learn in math. A ballpark estimate, then, is that one-third to one-half of students respond affirmatively to the individual components of PISA’s intrinsic motivation index.
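As a rough sketch of how such an index works: average each student's item responses, then standardize the result to a mean of 0.00 and a standard deviation of 1.00. PISA's actual index is constructed with item response theory scaling rather than a simple average, so the code below, with made-up responses, only conveys the idea.

```python
# Hypothetical responses: four students x four Likert items coded 1-4
# (1 = strongly disagree ... 4 = strongly agree). PISA's real index is
# built with item response theory scaling, not a simple average.
responses = [
    [1, 2, 2, 3],
    [4, 4, 3, 4],
    [2, 2, 2, 2],
    [3, 3, 4, 4],
]

raw = [sum(r) / len(r) for r in responses]  # per-student mean response
mean = sum(raw) / len(raw)
sd = (sum((x - mean) ** 2 for x in raw) / len(raw)) ** 0.5

# Standardized index: mean 0.00 and SD 1.00 by construction.
index = [(x - mean) / sd for x in raw]
```

Averaging the standardized student scores within each country then yields a national score like those in the table that follows.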
Table 3-1 presents national scores on the 2012 index of intrinsic motivation to learn mathematics. The index is scaled with an average of 0.00 and a standard deviation of 1.00. Student index scores are averaged to produce a national score. The scores of 39 nations are reported—29 OECD countries and 10 partner countries.^{[ii]} Indonesia appears to have the most intrinsically motivated students in the world (0.80), followed by Thailand (0.77), Mexico (0.67), and Tunisia (0.59). It is striking that developing countries top the list. Universal education at the elementary level is only a recent reality in these countries, and they are still struggling to deliver universally accessible high schools, especially in rural areas and especially to girls. The students who sat for PISA may be an unusually motivated group. They also may be deeply appreciative of having an opportunity that their parents never had.
The U.S. scores about average (0.08) on the index, statistically about the same as New Zealand, Australia, Ireland, and Canada. The bottom of the table is extremely interesting. Among the countries with the least intrinsically motivated kids are some PISA high flyers. Austria has the least motivated students (-0.35), but that is not statistically significantly different from the score for the Netherlands (-0.33). What’s surprising is that Korea (-0.20), Finland (-0.22), Japan (-0.23), and Belgium (-0.24) score at the bottom of the intrinsic motivation index even though they historically do quite well on the PISA math test.
Let’s now dig a little deeper into the intrinsic motivation index. Two components of the index are how students respond to “I do mathematics because I enjoy it” and “I look forward to my mathematics lessons.” These sentiments are directly related to schooling. Whether students enjoy math or look forward to math lessons is surely influenced by factors such as teachers and curriculum. Table 3-2 rank orders PISA countries by the percentage of students who “agree” or “strongly agree” with the questionnaire prompts. The nations’ 2012 PISA math scores are also tabled. Indonesia scores at the top of both rankings, with 78.3% enjoying math and 72.3% looking forward to studying the subject. However, Indonesia’s PISA math score of 375 is more than one full standard deviation below the international mean of 494 (standard deviation of 92). The tops of the tables are dominated primarily by low-performing countries, but not exclusively so. Denmark is an average-performing nation that has high rankings on both sentiments. Liechtenstein, Hong Kong-China, and Switzerland do well on the PISA math test and appear to have contented, positively oriented students.
Several nations of interest are shaded. The bar across the middle of the tables, encompassing Australia and Germany, demarcates the median of the two lists, with 19 countries above and 19 below that position. The United States registers above the median on looking forward to math lessons (45.4%) and a bit below the median on enjoyment (36.6%). A similar proportion of students in Poland—a country recently celebrated in popular media and in Amanda Ripley’s book, The Smartest Kids in the World,^{[iii]} for making great strides on PISA tests—enjoy math (36.1%), but only 21.3% of Polish kids look forward to their math lessons, very near the bottom of the list, which is anchored by the Netherlands at 19.8%.
Korea also appears in Ripley’s book. It scores poorly on both items. Only 30.7% of Korean students enjoy math, and even fewer, 21.8%, look forward to studying the subject. Korean education is depicted unflatteringly in Ripley’s book—as an academic pressure cooker lacking joy or purpose—so its standing here is not surprising. But Finland is another matter. It is portrayed as laid-back and student-centered, concerned with making students feel relaxed and engaged. Yet only 28.8% of Finnish students say that they study mathematics because they enjoy it (among the bottom four countries) and only 24.8% report that they look forward to math lessons (among the bottom seven countries). Korea, the pressure cooker, and Finland, the laid-back paradise, look about the same on these dimensions.
Another country that is admired for its educational system, Japan, does not fare well on these measures. Only 30.8% of students in Japan enjoy mathematics, despite the boisterous, enthusiastic classrooms that appear in Elizabeth Green’s recent book, Building a Better Teacher.^{[iv]} Japan does better on the percentage of students looking forward to their math lessons (33.7%), but still places far below the U.S. Green’s book describes classrooms with younger students, but even so, surveys of Japanese fourth and eighth graders’ attitudes toward studying mathematics report results similar to those presented here. American students say that they enjoy their math classes and studying math more than students in Finland, Japan, and Korea.
It is clear from Table 3-2 that at the national level, enjoying math is not positively related to math achievement. Nor is looking forward to one’s math lessons. The correlation coefficients reported in the last row of the table quantify the magnitude of the inverse relationships. The -0.58 and -0.57 coefficients indicate a moderately negative association, meaning, in plain English, that countries with students who enjoy math or look forward to math lessons tend to score below average on the PISA math test. And high-scoring nations tend to register below average on these measures of student engagement. Country-level associations, however, should be augmented with student-level associations calculated within each country.
The 2012 PISA volume on student engagement does not present within-country correlation coefficients on intrinsic motivation or its components. But it does offer within-country correlations of math achievement with three other characteristics relevant to student engagement. Table 3-3 displays statistics for students’ responses to: 1) whether they feel like they belong at school; 2) their attitudes toward school, an index composed of four factors;^{[v]} and 3) whether they had arrived late for school in the two weeks prior to the PISA test. These measures reflect an excellent mix of behaviors and dispositions.
The within-country correlations trend in the expected direction, but they are small in magnitude. Correlation coefficients for math performance and a sense of belonging at school range from 0.02 to 0.18, meaning that even the country exhibiting the strongest relationship between achievement and a sense of belonging—Thailand, with a correlation coefficient of 0.18—isn’t registering a strong relationship at all. The OECD average is 0.08, which is trivial. The U.S. correlation coefficient, 0.07, is also trivial. The relationship of achievement with attitudes toward school is slightly stronger (OECD average of 0.11), but is still weak.
Of the three characteristics, arriving late for school shows the strongest correlation, an unsurprising inverse relationship of -0.14 in OECD countries and -0.20 in the U.S. Students who tend to be tardy also tend to score lower on math tests. But, again, the magnitude is surprisingly small. The coefficients are statistically significant because of large sample sizes, but in a real-world, “would I notice this if it were in my face?” sense—no, the correlation coefficients suggest not much of a relationship at all.
The PISA report presents within-country effect sizes for the intrinsic motivation index, calculating the achievement gains associated with a one-unit change in the index. One of several interesting findings is that intrinsic motivation is more strongly associated with gains at the top of the achievement distribution, among students at the 90^{th} percentile in math scores, than at the bottom of the distribution, among students at the 10^{th} percentile.
The report summarizes the within-country effect sizes with this statement: “On average across OECD countries, a change of one unit in the index of intrinsic motivation to learn mathematics translates into a 19 score-point difference in mathematics performance.”^{[vi]} This sentence can be easily misinterpreted. It means that within each of the participating countries, students who differ by one unit on PISA’s 2012 intrinsic motivation index score about 19 points apart on the 2012 math test. It does not mean that a country that gains one unit on the intrinsic motivation index can expect a 19-point score increase.^{[vii]}
Let’s now see what that association looks like at the national level.
PISA first reported national scores on the index of intrinsic motivation to learn mathematics in 2003. Are gains that countries made on the index associated with gains on PISA’s math test? Table 3-4 presents a score card on the question, reporting the changes that occurred in thirty-nine nations—in both the index and math scores—from 2003 to 2012. Seventeen nations made statistically significant gains on the index; fourteen nations had gains that were, in a statistical sense, indistinguishable from zero—labeled “no change” in the table; and eight nations experienced statistically significant declines in index scores.
The U.S. scored 0.00 in 2003 and 0.08 in 2012, notching a gain of 0.08 on the index (statistically significant). Its PISA math score declined from 483 to 481, a decline of 2 scale score points (not statistically significant).
Table 3-4 makes it clear that national changes on PISA’s intrinsic motivation index are not associated with changes in math achievement. The countries registering gains on the index averaged a decline of 3.7 points on PISA’s math assessment. The countries that remained about the same on the index had math scores that also remained essentially unchanged (0.09). And the most striking finding: countries that declined on the index (an average decline of 0.15) actually gained an average of 10.3 points on the PISA math scale. Intrinsic motivation went down; math scores went up. The correlation coefficient for the relationship overall, not shown in the table, is -0.30.
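The computation behind this score card is simple to sketch: take each country's change on the index and its change in math score, then correlate the changes across countries. All entries below are hypothetical except the United States row, which uses the figures cited above.

```python
# Hypothetical panel: (index_2003, index_2012, math_2003, math_2012).
# Only the U.S. entry reproduces figures cited in the text.
panel = {
    "Country A":     (0.10, 0.40, 520, 510),    # motivation up, score down
    "United States": (0.00, 0.08, 483, 481),
    "Country C":     (-0.20, -0.35, 490, 503),  # motivation down, score up
    "Country D":     (0.30, 0.55, 470, 468),
    "Country E":     (-0.10, -0.30, 500, 509),
}

# Change scores: 2012 minus 2003, for the index and for math.
d_index = [i12 - i03 for i03, i12, _, _ in panel.values()]
d_math = [m12 - m03 for _, _, m03, m12 in panel.values()]

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx ** 0.5 * vy ** 0.5)

# With these invented numbers the changes are negatively correlated,
# mirroring the direction of the result reported in the text.
r = pearson(d_index, d_math)
```

Each country contributes one observation, which is what makes this a national-level analysis.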
The analysis above investigated student engagement. International data from the 2012 PISA were examined on several dimensions of student engagement, focusing on a measure that PISA has employed since 2003, the index of intrinsic motivation to learn mathematics. The U.S. scored near the middle of the distribution on the 2012 index. PISA analysts calculated that, on average, a one-unit change in the index was associated with a 19-point gain on the PISA math test. That is the average of within-country calculations, using student-level data that measure the association of intrinsic motivation with PISA score. It represents an effect size of about 0.20—a positive effect, but one that is generally considered small in magnitude.^{[viii]}
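The effect-size arithmetic is worth making explicit: divide the 19-point association by the average within-country standard deviation on the 2012 math test (92 points, per the endnote).

```python
# A one-unit index change is associated with 19 PISA score points; the
# average within-country SD on the 2012 math test was 92 points.
gain_per_unit = 19.0
within_country_sd = 92.0

effect_size = gain_per_unit / within_country_sd  # roughly 0.21
```

Dividing by the overall scale SD of 100 instead gives 0.19; either way, the effect falls in the range conventionally labeled small.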
The unit of analysis matters. Between-country associations often differ from within-country associations. The current study used a difference-in-difference approach that calculated the correlation coefficient for two variables at the national level: the change in the intrinsic motivation index from 2003 to 2012 and the change in PISA score over the same period. That analysis produced a correlation coefficient of -0.30, a negative relationship that is also generally considered small in magnitude.
Neither approach can justify causal claims, nor can either rule out reverse causality—the possibility that high math achievement boosts intrinsic motivation to learn math, rather than, or in addition to, high levels of motivation leading to greater learning. Poor math achievement may likewise cause intrinsic motivation to fall. Taken together, the analyses lead to the conclusion that PISA provides, at best, weak evidence that raising student motivation is associated with achievement gains. Boosting motivation may even accompany declines in achievement.
Here’s the bottom line of what the PISA data recommend to policymakers: programs designed to boost student engagement—perhaps a worthy pursuit even if unrelated to achievement—should be evaluated for their effects in small-scale experiments before being adopted broadly. The international evidence does not justify wide-scale concern over current levels of student engagement in the U.S., nor does it support the hypothesis that boosting student engagement would raise student performance nationally.
Let’s conclude by considering the advantages that national-level, difference-in-difference analyses provide that student-level analyses may overlook.
1. They depict policy interventions more accurately. Policies are actions of a political unit affecting all of its members. They do not simply affect the relationship of two characteristics within an individual’s psychology. Policymakers who ask the question, “What happens when a country boosts student engagement?” are asking about a country-level phenomenon.
2. Relationships can run in different directions at the individual and group levels. For example, we know that enjoying a school subject and achievement on tests of that subject are positively correlated at the individual level. But they are not always correlated—and can in fact be negatively correlated—at the group level.
3. By using multiple years of panel data and calculating change over time, a difference-in-difference analysis controls for unobserved-variable bias by “baking into the cake” those unobserved variables at the baseline. The unobserved variables are assumed to remain stable over the time period of the analysis. For the cultural factors that many analysts suspect influence between-nation test score differences, stability may be a safe assumption. Difference-in-difference analysis, then, would be superior to cross-sectional analyses in controlling for cultural influences that are omitted from other models.
4. Testing artifacts from a cultural source can also be dampened. Characteristics such as enjoyment are culturally defined, and the language employed to describe them is also culturally bounded. Consider two of the questionnaire items examined above: whether kids “enjoy” math and how much they “look forward” to math lessons. Cultural differences in responding to these prompts will be reflected in between-country averages at the baseline, and any subsequent changes will reflect fluctuations net of those initial differences.
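Points 3 and 4 rest on the same mechanism, which a toy example makes concrete: if a stable cultural factor shifts a country's measurements by a constant amount in both waves, taking the 2003-to-2012 difference removes it entirely. The numbers below are invented for illustration.

```python
# Two hypothetical countries with stable "cultural" response offsets
# that distort observed scores by a constant amount in both waves.
cultural_offset = {"X": 25, "Y": -25}
true_2003 = {"X": 500, "Y": 500}
true_2012 = {"X": 510, "Y": 490}  # X truly improves, Y truly declines

observed_2003 = {c: true_2003[c] + cultural_offset[c] for c in cultural_offset}
observed_2012 = {c: true_2012[c] + cultural_offset[c] for c in cultural_offset}

# Differencing the two waves cancels the constant offsets, leaving
# only the true movement: +10 for X, -10 for Y.
change = {c: observed_2012[c] - observed_2003[c] for c in cultural_offset}
```

The cancellation holds only if the offset really is stable across waves, which is exactly the assumption stated in point 3.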
[i] Tom Loveless, “The Happiness Factor in Student Learning,” The 2006 Brown Center Report on American Education: How Well are American Students Learning? (Washington, D.C.: The Brookings Institution, 2006).
[ii] All countries with 2003 and 2012 data are included.
[iii] Amanda Ripley, The Smartest Kids in the World: And How They Got That Way (New York, NY: Simon & Schuster, 2013).
[iv] Elizabeth Green, Building a Better Teacher: How Teaching Works (and How to Teach It to Everyone) (New York, NY: W.W. Norton & Company, 2014).
[v] The attitude toward school index is based on responses to: 1) Trying hard at school will help me get a good job, 2) Trying hard at school will help me get into a good college, 3) I enjoy receiving good grades, 4) Trying hard at school is important. See: OECD, PISA 2012 Database, Table III.2.5a.
[vi] OECD, PISA 2012 Results: Ready to Learn: Students’ Engagement, Drive and Self-Beliefs (Volume III) (Paris: PISA, OECD Publishing, 2013), 77.
[vii] PISA originally called the index of intrinsic motivation the index of interest and enjoyment in mathematics, first constructed in 2003. The four questions comprising the index remained identical from 2003 to 2012, allowing for comparability. Index values for 2003 scores were rescaled based on 2012 scaling (mean of 0.00 and SD of 1.00), meaning that index values published in PISA reports prior to 2012 will not agree with those published after 2012 (including those analyzed here). See: OECD, PISA 2012 Results: Ready to Learn: Students’ Engagement, Drive and Self-Beliefs (Volume III) (Paris: PISA, OECD Publishing, 2013), 54.
[viii] PISA math scores are scaled with a standard deviation of 100, but the average within-country standard deviation for OECD nations was 92 on the 2012 math test.

Editor's Note: The introduction to the 2015 Brown Center Report on American Education appears below.
The 2015 Brown Center Report (BCR) represents the 14^{th} edition of the series since the first issue was published in 2000. It includes three studies. Like all previous BCRs, the studies explore independent topics but share two characteristics: they are empirical and based on the best evidence available. The studies in this edition are on the gender gap in reading, the impact of the Common Core State Standards-English Language Arts on reading achievement, and student engagement.
Part one examines the gender gap in reading. Girls outscore boys on practically every reading test given to a large population. And they have for a long time. A 1942 Iowa study found girls performing better than boys on tests of reading comprehension, vocabulary, and basic language skills. Girls have outscored boys on every reading test ever given by the National Assessment of Educational Progress (NAEP)—the first long-term trend test was administered in 1971—at ages 9, 13, and 17. The gap is not confined to the U.S. Reading tests administered as part of the Progress in International Reading Literacy Study (PIRLS) and the Program for International Student Assessment (PISA) reveal that the gender gap is a worldwide phenomenon. In more than sixty countries participating in the two assessments, girls are better readers than boys.
Perhaps the most surprising finding is that Finland, celebrated for its extraordinary performance on PISA for over a decade, can take pride in its high standing on the PISA reading test solely because of the performance of that nation’s young women. With its 62-point gap, Finland has the largest gender gap of any PISA participant, with girls scoring 556 and boys scoring 494 points (the OECD average is 496, with a standard deviation of 94). If Finland were only a nation of young men, its PISA ranking would be mediocre.
Part two is about reading achievement, too. More specifically, it’s about reading and the English Language Arts standards of the Common Core (CCSS-ELA). It’s also about an important decision that policy analysts must make when evaluating public policies—determining when a policy begins. How can CCSS be properly evaluated?
Two different indexes of CCSS-ELA implementation are presented, one based on 2011 data and the other on data collected in 2013. In both years, state education officials were surveyed about their Common Core implementation efforts. Because forty-six states originally signed on to the CCSS-ELA—and with at least forty still on track for full implementation by 2016—little variability exists among the states in terms of standards policy. Of course, the four states that never adopted CCSS-ELA can serve as a small control group. But variation is also found in how the states are implementing CCSS. Some states are pursuing an array of activities and aiming for full implementation earlier rather than later. Others have a narrow, targeted implementation strategy and are proceeding more slowly.
The analysis investigates whether CCSS-ELA implementation is related to 2009–2013 gains on the fourth grade NAEP reading test. The analysis cannot verify causal relationships between the two variables, only correlations. States that have aggressively implemented CCSS-ELA (referred to as “strong” implementers in the study) show a gain one to one and one-half points larger on the NAEP scale than nonadopters of the standards. This association is similar in magnitude to an advantage found in a study of eighth grade math achievement in last year’s BCR. Although positive, these effects are quite small. When the 2015 NAEP results are released this winter, it will be important for the fate of the Common Core project to see whether strong implementers of the CCSS-ELA can maintain their momentum.
Part three is on student engagement. PISA tests fifteen-year-olds on three subjects—reading, math, and science—every three years. It also collects a wealth of background information from students, including their attitudes toward school and learning. When the 2012 PISA results were released, PISA analysts published an accompanying volume, Ready to Learn: Students’ Engagement, Drive, and Self-Beliefs, exploring topics related to student engagement.
Part three provides secondary analysis of several dimensions of engagement found in the PISA report. Intrinsic motivation, the internal rewards that encourage students to learn, is an important component of student engagement. National scores on PISA’s index of intrinsic motivation to learn mathematics are compared to national PISA math scores. Surprisingly, the relationship is negative. Countries with highly motivated kids tend to score lower on the math test; conversely, higher-scoring nations tend to have less-motivated kids.
The same is true for responses to the statements, “I do mathematics because I enjoy it,” and “I look forward to my mathematics lessons.” Countries with students who say that they enjoy math or look forward to their math lessons tend to score lower on the PISA math test compared to countries where students respond negatively to the statements. These counterintuitive findings may be influenced by how terms such as “enjoy” and “looking forward” are interpreted in different cultures. Within-country analyses address that problem. The correlation coefficients for within-country, student-level associations of achievement and other components of engagement run in the anticipated direction—they are positive. But they are also modest in size, with correlation coefficients of 0.20 or less.
Policymakers are interested in questions requiring analysis of aggregated data—at the national level, that means between-country data. When countries increase their students’ intrinsic motivation to learn math, is there a concomitant increase in PISA math scores? Data from 2003 to 2012 are examined. Seventeen countries managed to increase student motivation, but their PISA math scores fell an average of 3.7 scale score points. Fourteen countries showed no change on the index of intrinsic motivation—and their PISA scores also evidenced little change. Eight countries witnessed a decline in intrinsic motivation. Inexplicably, their PISA math scores increased by an average of 10.3 scale score points. Motivation down, achievement up.
Correlation is not causation. Moreover, the absence of a positive correlation—or, in this case, the presence of a negative correlation—does not refute a possible positive relationship. The lesson here is not that policymakers should adopt the most effective way of stamping out student motivation. The lesson is that the level of analysis matters when analyzing achievement data. Policy reports must be read warily—especially those freely offering policy recommendations. Beware of analyses that rely exclusively on within- or between-country test data without attempting to reconcile discrepancies at other levels of analysis. Those analysts could be cherry-picking the data. Also, consumers of education research should grant more credence to approaches that model change over time (as in difference-in-difference models) than to cross-sectional analyses that only explore statistical relationships at a single point in time.
Part I: Girls, Boys, and Reading » 
Editor's Note: The introduction to the 2015 Brown Center Report on American Education appears below. Use the Table of Contents to navigate through the report online, or download a PDF of the full report.
TABLE OF CONTENTS
Part I: Girls, Boys, and Reading
Part II: Measuring Effects of the Common Core
The 2015 Brown Center Report (BCR) represents the 14th edition of the series since the first issue was published in 2000. It includes three studies. Like all previous BCRs, the studies explore independent topics but share two characteristics: they are empirical and based on the best evidence available. The studies in this edition are on the gender gap in reading, the impact of the Common Core State Standards in English Language Arts on reading achievement, and student engagement.
Part one examines the gender gap in reading. Girls outscore boys on practically every reading test given to a large population. And they have for a long time. A 1942 Iowa study found girls performing better than boys on tests of reading comprehension, vocabulary, and basic language skills. Girls have outscored boys on every reading test ever given by the National Assessment of Educational Progress (NAEP)—the first long-term trend test was administered in 1971—at ages 9, 13, and 17. The gap is not confined to the U.S. Reading tests administered as part of the Progress in International Reading Literacy Study (PIRLS) and the Program for International Student Assessment (PISA) reveal that the gender gap is a worldwide phenomenon. In more than sixty countries participating in the two assessments, girls are better readers than boys.
Perhaps the most surprising finding is that Finland, celebrated for its extraordinary performance on PISA for over a decade, can take pride in its high standing on the PISA reading test solely because of the performance of that nation’s young women. Finland has the largest gender gap of any PISA participant: girls score 556 and boys 494, a 62-point difference (the OECD average is 496, with a standard deviation of 94). If Finland were only a nation of young men, its PISA ranking would be mediocre.
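A quick back-of-the-envelope check makes the point concrete. The sketch below uses only the figures quoted above (girls 556, boys 494, OECD mean 496, standard deviation 94):

```python
# Arithmetic check of the Finland gender-gap claim, using only the figures
# quoted in the text above.
girls, boys = 556, 494
oecd_mean, oecd_sd = 496, 94

gap = girls - boys  # the largest gender gap among PISA participants

# Boys alone sit essentially at the OECD average -- "mediocre" in SD units.
boys_z = (boys - oecd_mean) / oecd_sd
print(f"gap = {gap} points; boys vs. OECD mean = {boys_z:+.2f} SD")
# → gap = 62 points; boys vs. OECD mean = -0.02 SD
```

Finland’s boys sit roughly 0.02 standard deviations below the OECD mean, which is why a boys-only Finland would rank near the middle of the pack.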
Part two is about reading achievement, too. More specifically, it’s about reading and the English Language Arts standards of the Common Core (CCSS-ELA). It’s also about an important decision that policy analysts must make when evaluating public policies—the determination of when a policy begins. How can CCSS be properly evaluated?
Two different indexes of CCSS-ELA implementation are presented, one based on 2011 data and the other on data collected in 2013. In both years, state education officials were surveyed about their Common Core implementation efforts. Because forty-six states originally signed on to the CCSS-ELA—and with at least forty still on track for full implementation by 2016—little variability exists among the states in terms of standards policy. Of course, the four states that never adopted CCSS-ELA can serve as a small control group. But variation is also found in how states are implementing CCSS. Some states are pursuing an array of activities and aiming for full implementation earlier rather than later. Others have a narrow, targeted implementation strategy and are proceeding more slowly.
The analysis investigates whether CCSS-ELA implementation is related to 2009-2013 gains on the fourth-grade NAEP reading test. The analysis cannot verify causal relationships between the two variables, only correlations. States that have aggressively implemented CCSS-ELA (referred to as “strong” implementers in the study) evidence a one- to one-and-a-half-point larger gain on the NAEP scale compared to nonadopters of the standards. This association is similar in magnitude to an advantage found in a study of eighth-grade math achievement in last year’s BCR. Although positive, these effects are quite small. When the 2015 NAEP results are released this winter, it will be important for the fate of the Common Core project to see if strong implementers of the CCSS-ELA can maintain their momentum.
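The comparison described above is, in spirit, a difference-in-differences contrast: each group’s 2009-to-2013 gain is computed first, and then the gains are compared across groups. The sketch below uses invented scores purely for illustration; the hypothetical numbers are chosen so that the gap lands at the top of the reported range:

```python
# Difference-in-differences sketch of the contrast described in the text.
# All scores are hypothetical, invented for illustration only; the study's
# actual state-level NAEP data are not reproduced here.
naep = {
    # group: (mean scale score in 2009, mean scale score in 2013)
    "strong_implementers": (220.0, 223.0),
    "nonadopters":         (221.0, 222.5),
}

# Step 1: within-group gains over time.
gains = {group: after - before for group, (before, after) in naep.items()}

# Step 2: difference of the gains -- the quantity of policy interest.
did = gains["strong_implementers"] - gains["nonadopters"]
print(f"gain gap = {did:.1f} NAEP points")  # → gain gap = 1.5 NAEP points
```

Because each group is differenced against its own 2009 baseline, stable differences between the groups cancel out, which is why the report treats change-over-time designs as more credible than single-year snapshots.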
Part three is on student engagement. PISA tests fifteen-year-olds on three subjects—reading, math, and science—every three years. It also collects a wealth of background information from students, including their attitudes toward school and learning. When the 2012 PISA results were released, PISA analysts published an accompanying volume, Ready to Learn: Students’ Engagement, Drive, and Self-Beliefs, exploring topics related to student engagement.
Part three provides secondary analysis of several dimensions of engagement found in the PISA report. Intrinsic motivation, the internal rewards that encourage students to learn, is an important component of student engagement. National scores on PISA’s index of intrinsic motivation to learn mathematics are compared to national PISA math scores. Surprisingly, the relationship is negative. Countries with highly motivated kids tend to score lower on the math test; conversely, higher-scoring nations tend to have less-motivated kids.
The same is true for responses to the statements, “I do mathematics because I enjoy it,” and “I look forward to my mathematics lessons.” Countries with students who say that they enjoy math or look forward to their math lessons tend to score lower on the PISA math test compared to countries where students respond negatively to the statements. These counterintuitive findings may be influenced by how terms such as “enjoy” and “looking forward” are interpreted in different cultures. Within-country analyses address that problem. The correlation coefficients for within-country, student-level associations of achievement and other components of engagement run in the anticipated direction—they are positive. But they are also modest in size, with correlation coefficients of 0.20 or less.
Policymakers are interested in questions requiring analysis of aggregated data—at the national level, that means between-country data. When countries increase their students’ intrinsic motivation to learn math, is there a concomitant increase in PISA math scores? Data from 2003 to 2012 are examined. Seventeen countries managed to increase student motivation, but their PISA math scores fell an average of 3.7 scale score points. Fourteen countries showed no change on the index of intrinsic motivation—and their PISA scores also evidenced little change. Eight countries witnessed a decline in intrinsic motivation. Inexplicably, their PISA math scores increased by an average of 10.3 scale score points. Motivation down, achievement up.
Correlation is not causation. Moreover, the absence of a positive correlation—or in this case, the presence of a negative correlation—is not refutation of a possible positive relationship. The lesson here is not that policymakers should adopt the most effective way of stamping out student motivation. The lesson is that the level of analysis matters when analyzing achievement data. Policy reports must be read warily—especially those freely offering policy recommendations. Beware of analyses that exclusively rely on within- or between-country test data without making any attempt to reconcile discrepancies at other levels of analysis. Those analysts could be cherry-picking the data. Also, consumers of education research should grant more credence to approaches modeling change over time (as in difference-in-differences models) than to cross-sectional analyses that only explore statistical relationships at a single point in time.
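The reversal between levels of analysis is easy to reproduce with synthetic data. In the sketch below (all numbers invented), motivation and achievement are positively but modestly related within every country, yet the country means run in the opposite direction:

```python
# Synthetic illustration of the aggregation reversal described above:
# motivation and score are positively related within each country, yet
# country means correlate negatively. All numbers are invented.
import random
import statistics

random.seed(0)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# (mean motivation, mean score) per country: high-motivation countries
# score lower, mimicking the between-country pattern in the PISA data.
countries = [(3.5, 450.0), (3.0, 500.0), (2.5, 550.0)]

within_rs, means = [], []
for mot_mu, score_mu in countries:
    m = [mot_mu + random.gauss(0, 0.5) for _ in range(1000)]
    # Within a country, more-motivated students score a bit higher
    # (+10 points per unit of motivation), plus substantial noise.
    s = [score_mu + 10 * (mi - mot_mu) + random.gauss(0, 30) for mi in m]
    within_rs.append(pearson(m, s))
    means.append((statistics.fmean(m), statistics.fmean(s)))

between_r = pearson([a for a, _ in means], [b for _, b in means])
print("within-country r:", [round(r, 2) for r in within_rs])
print("between-country r:", round(between_r, 2))
```

This is the aggregation effect (a cousin of Simpson’s paradox) the paragraph warns about: neither the student-level nor the country-level correlation alone tells the whole story.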
The Brown Center on Education Policy at Brookings recently released the fourth iteration of its annual Education Choice and Competition Index (ECCI). The 2014 ECCI examines the status of K-12 school choice during the 2013-2014 school year in the 100+ largest school districts in the U.S.[i] The ECCI describes the state of school choice based on data derived from the federal government’s National Center for Education Statistics, individual school district websites, surveys of district personnel, and performance by schools on state assessments of academic achievement. The data are organized based on a conceptual model in which good implementations of school choice provide parents with: many choices among types of school; a supply of comparatively higher-performing schools; good information on school quality on which to base choice; a choice process that is efficient and equitable to all students; funding that follows students to their school of choice, with policies to close unpopular schools; and free transportation for students from home to any school of choice.
In his keynote address at the release of the 2014 ECCI, Senator Lamar Alexander noted that he felt like a character in the movie Groundhog Day because he had been giving the same speech on school choice every ten years, predicting in 1992 that by the year 2000 school choice would no longer be an issue as all parents would be able to freely choose any K12 accredited school for their child.
Is the nation, in fact, stuck in a recurring scene in which parents awaken to the sounds of Sonny and Cher on their radio and trundle their children off to the public school that is closest to their home because that is their only option? Or have we broken out of the tradition of zip code education in ways that suggest that Senator Alexander’s rosy prediction on school choice, first given in 1992, is closer to realization than many would think?
Heretofore, our annual ECCI release has not covered a long enough period to detect meaningful trends. In this report, we lengthen our analysis, introducing annual data based on the ECCI scoring rubrics that extend the series backward in time to the 2000-2001 school year.
Our present interests are descriptive. We address some of the dimensions on which school choice has changed in large school districts since the beginning of this century and some of the dimensions on which choice has been static. We believe this information, provided here as a preliminary first look at our newly constructed dataset, provides important context for several constituencies. Among them is the U.S. Congress, which is presently about the business of reauthorizing the Elementary and Secondary Education Act. It should be relevant for federal legislators to know whether school choice is an idiosyncratic policy preference that has always been scattered around America’s largest cities and school districts, or is something that is moving with a speed and direction that suggests a public appetite. If the latter, what should Congress do to address areas in which school choice is impacted by federal policy?
Information on change and stasis in school choice is also important to decisions by policymakers and voters at the state and local levels. Where do states and large school districts stand with respect to the counterparts against which they benchmark themselves, and with respect to general trend lines? Information on long-term trends in school choice also can inform the efforts of advocates (and opponents) of school choice by revealing features of school choice policies that seem amenable or resistant to change.
An important component of choice is variety. To the extent that all schools provide the same curriculum to similar students with similar teachers and staff, choice is between Tweedledum and Tweedledee. We see modest growth over the time period we examine in enrollment in alternatives to traditional public schools in the form of magnet schools (up from seven percent to 10 percent) and charter schools. In contrast, enrollment in private schools has declined (from 13 to 11 percent). Thus, a parent living in a metropolitan region served by one of our large school districts has, on average, a bit more choice of an alternative school today than in 2000, though regular public schools are still the dominant service providers. There are substantial differences among districts on this variable, with, for example, a substantial majority of students served by alternative schools in New Orleans, LA and a near parity between alternative and traditional schools in Washington, D.C., whereas nearly all students are served by traditional public schools in districts such as Fort Worth, TX and Santa Ana, CA.
Student Enrollment
One of the most central dimensions we tracked is the extent to which school districts make choice easily available, either through a process in which parents have to choose, or through a process in which students receive a default assignment to a neighborhood school but parents can easily seek a transfer of their child to another school. As depicted in the following graph, changes over time in the availability of school choice have been dramatic.[ii] In the 2000-2001 school year, only 24 percent of districts afforded parents school choice (20 percent through easy transfers from default schools and four percent through a full-fledged open enrollment process). Today, that number has more than doubled to 55 percent of districts allowing choice. Put another way, in 2000-2001, 75 percent of our districts made transferring out of one’s default assigned school difficult or nearly impossible. Today that number has dropped to 45 percent.
Change in Student Assignment to Schools Over Time
We see similar trends in other facets of choice. In 2000-2001, 13 percent of districts offered virtual programs or allowed their students to enroll in virtual classes that counted towards graduation or matriculation. Today, that share has jumped to 88 percent.
Virtual Programs or Courses Allowed
In 2000-2001, 19 percent of the districts in our sample had a published policy to close or restructure schools based on declining enrollment. Today, that has more than doubled to 50 percent.
Policy to Restructure or Close Schools with Declining Enrollment
There are even larger changes (from about 10 percent to about 80 percent) in the proportion of districts that fund schools based on a formula in which a substantial portion of each school’s allocation of district funds is determined by enrollment, both in terms of size and student needs (e.g., special education, ELL). Such funding policies, when combined with easily available school choice, have the potential to put competitive pressure on individual schools that are losing students to make themselves more attractive to parents and students.
Popularity of Schools Reflected in Funding
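Enrollment-driven formulas of the kind described above are often called weighted student funding. A minimal sketch follows, with a hypothetical base amount and hypothetical need weights (real district formulas vary widely):

```python
# Weighted student funding sketch: district dollars follow students, with
# extra weight for higher-need categories. The base amount and the weights
# are hypothetical, chosen only to illustrate the mechanism.
BASE_PER_PUPIL = 8_000  # dollars per enrolled student (assumed)
WEIGHTS = {"sped": 0.90, "ell": 0.40, "low_income": 0.25}  # extra share of base

def school_allocation(enrollment: int, need_counts: dict) -> float:
    """Base funding times enrollment, plus weighted supplements for need."""
    extra = sum(WEIGHTS[need] * count for need, count in need_counts.items())
    return BASE_PER_PUPIL * (enrollment + extra)

# A school that loses 50 students loses 50 x $8,000 of base funding -- the
# competitive pressure on unpopular schools that the text describes.
print(f"${school_allocation(500, {'sped': 60, 'ell': 100, 'low_income': 200}):,.0f}")
```

Because the allocation scales with both headcount and student need, popularity is rewarded without penalizing schools that serve higher-need populations.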
Whereas most of the components of school choice tracked by the ECCI have changed, sometimes substantially, since 2000-2001, the provision of transportation to students has not budged: 10 percent of districts then and now provide transportation for students to any public school of choice within the district. Limitations on transportation of students to and from school place severe practical constraints on the exercise of school choice for families in which all the adults hold down jobs with 9-to-5 workdays or do not have a car.
It may feel like Groundhog Day to Senator Alexander, but, in fact, since the 2000-2001 school year, the story of school choice in the nation’s largest districts has been one of change rather than repetition. The senator’s prediction of a day in which every parent chooses her child’s school is still far from realization, but over a quarter of children in large school districts today are attending alternative schools that have been chosen by parents, roughly half of districts make it relatively easy for a parent to exercise choice among public schools, and districts are managing their portfolios of schools and budgets in ways that favor popular schools. For advocates of school choice, that is progress. And for policymakers, these shifts indicate that there is both public interest and political feasibility for school choice in the nation’s largest school districts. The stagnation of transportation options points to an area for policy improvement, and a possible reason why the parents who may benefit the most from school choice are the least able to choose schools outside of their neighborhood.
This quick look at trends in school choice does not address the impact of changes in school choice on outcomes such as student achievement and school productivity. But the data we have assembled should be useful to researchers in addressing these questions. This database is available to qualified researchers for a variety of analyses and potential projects. For more information please click here.
[i] The first release of the ECCI covered 30 districts, whereas the subsequent three releases covered more than 100.
[ii] The percent of school districts is derived from the number of districts for which we have complete data from 2000-2001 to the present.
Authors’ Conflict of Interest Disclaimer: The Walton Family Foundation, which has a mission to enhance school choice, provided funding for the work reported herein. With the exception of its initial decision to fund the program of work of which this report is a part, the Walton Family Foundation has had no involvement in any aspect of the activities carried out by the authors relevant to this report.
With both houses of Congress moving apace to reauthorize the Elementary and Secondary Education Act (ESEA), the question is not whether the new legislation will reduce the federal government’s footprint in K-12 education; it assuredly will. The question is whether, in their understandable efforts to rein in Washington’s influence, legislators can preserve those elements of federal policy that stand to benefit students and taxpayers—particularly those that fulfill functions that would otherwise go unaddressed within our multilayered system of education governance.
One key unresolved issue involves the status of competitive grant programs, through which the Department of Education invites states and school districts to apply for funds to support programs that address federally identified priorities. In the current environment, Congress may be tempted to eschew all programs structured in this way, preferring to rely on formulas to ensure that schools receive their fair share of federal funds. That would be a mistake. Flexible competitive grant programs that encourage innovations in policy and practice and ensure that they are subjected to rigorous evaluation should remain a part of ESEA going forward. In particular, the Investing in Innovation (i3) fund, a program created through the American Recovery and Reinvestment Act that is not a part of the reauthorization bills now moving through Congress, deserves a second look.
Increased reliance on competitive grants has been arguably the defining feature of the Obama administration’s K-12 education policy. Its signature Race to the Top program (RTT) asked states to compete for $4.35 billion in federal grants based on their commitment to implement a 19-item reform agenda. Expansive in its scope, RTT quickly became a symbol of what Senate Health, Education, Labor and Pensions Committee chairman Lamar Alexander has characterized as the Department’s efforts to dictate to states and school districts the details of how best to improve local schools. Congressional discontent with RTT-style policies is not limited to Republicans, however. Most legislators prefer to claim credit for funds allocated by formula rather than risk the ire of constituents whose applications are rejected, and rural members in particular often feel as if their districts are at a disadvantage when funding is competitive. Perhaps because of this discontent, President Obama’s 2016 budget proposal did not include funds for a new RTT competition.
But rather than paint all competitive grants with a broad brush, it is useful to consider differences in their structure. The table below shows that competitive grant programs can vary on at least two dimensions. First, the programs can be broad, aiming to incentivize policy changes in multiple areas in one fell swoop, or narrowly focused on a specific challenge facing most school systems. Second, grants can be awarded based on applicants’ willingness to commit to a detailed set of policy changes and program requirements prescribed by Washington, or they can be awarded based on past success, with funding levels tied to the strength of the evidence the applicant is able to present of their program’s effectiveness.
Federal Competitive Education Grants: A Typology with Examples

                    Selection Criteria
Policy Focus        Prescriptive and based on commitments    Flexible and based on evidence
Broad               Race to the Top                          Investing in Innovation
Narrow              Teacher Incentive Fund                   Replicating and Expanding High-Quality Charter Schools
RTT epitomized the broad, prescriptive approach to competitive grants. Although presented by supporters as an opportunity for states to put forward their best and most innovative ideas, in fact the selection criteria amounted to a detailed list of commitments in areas ranging from state standards and data systems to teacher evaluation systems and strategies to turn around low-performing schools. Because funding was based primarily on future commitments, the program did little to alter the compliance-oriented relationship between federal officials and state and local educators once grants were awarded. As Rick Hess of the American Enterprise Institute has written, “the aftermath entailed years of invasive federal monitoring…during which junior staff at the U.S. Department of Education exerted remarkable influence over the states that received RTT funds.” While it is too soon to know whether states awarded RTT grants will see improvements in student outcomes, there is little hope that their efforts will be a source of rigorous evidence on the merits of specific policies they pursued. The sheer number of policies states were required to implement simultaneously makes it all but impossible to isolate the impact of any one.
Yet RTT was the exception, not the rule. Its scale reflected the unique circumstances of the post-financial-crisis stimulus package, and maintaining a single competitive grant program at this scale has already proven to be politically infeasible.
Other competitive grant programs are structured quite differently, with a narrow focus tied to a distinct federal purpose. For example, the Teacher Incentive Fund created by the second Bush administration asks school districts and charter schools to commit to implementing performance-based teacher compensation systems. The rationale is that local officials will be more likely to adopt politically controversial changes to how teachers are compensated when outside resources are available to support their efforts. For the past few years, the Department of Education has also offered grants directly to Charter Management Organizations seeking to expand or replicate high-quality schools. Those schools need not adhere to a particular pedagogical model but must instead document a track record of improving student outcomes. The Teacher Incentive Fund and grants to expand and replicate high-quality charter schools have been included in both the House committee’s bill and in Senator Alexander’s initial discussion draft.
Those bills do not, however, include the Investing in Innovation fund (i3), the second major competitive grant program created through the stimulus package. Initially funded at $650 million, i3 allowed school districts, charter schools, and nonprofit organizations working in partnership with one of those entities to apply for grants to support innovative programs aligned with one of four broadly defined federal priorities (e.g., supporting effective teachers and principals or improving the use of data). In other words, i3 was broad in its focus but avoided prescription with respect to the design of the programs eligible for federal support.
The origins and implementation of i3 have been ably chronicled by Ron Haskins and Greg Margolis, who present the program as a cornerstone of the Obama administration’s broader efforts to base spending on social programs on rigorous evidence. Two specific aspects of the design of i3 are especially noteworthy. First, the competition used a tiered evidence model to align the amount of funding a program could receive to the strength of the evidence to support its effectiveness. Second, grant winners were required to conduct rigorous evaluations and were selected in part based on the quality of their proposed evaluation design. Across the first four funding cohorts, i3 supported 53 randomizedcontrol trials—the goldstandard design for evaluations of program effectiveness and one that until recently was virtually unknown in the education sector. (Full disclosure: my primary employer, the Harvard Graduate School of Education, has benefited from i3 as a direct grantee and through evaluation contracts; I am the principal investigator on two of those contracts.)
A competitive grant program that includes these design elements need not be called i3. Indeed, it need not be drafted as a standalone program at all. The Coalition for EvidenceBased Policy has proposed language that would simply allow the Department of Education to reserve up to one percent of funding of all ESEA programs (except Title I) to award grants for innovation and research, with grant amounts based on the tiered evidence model used in i3. The proposal is modeled on the Small Business Innovation Research program under which 11 federal agencies since 1982 have set aside a small percentage of their budgets to award grants to small companies engaged in the development and evaluation of new technologies. As the Coalition notes, both the Government Accountability office and the National Academy of Sciences have offered consistently positive assessments of the program’s success. Importantly, the proposal like SBIR would include small businesses as eligible grantees, addressing a shortcoming of the original i3 program that arguably limited the types of innovations proposed.
The Coalition’s proposal could be strengthened by giving the Institute of Education Sciences the lead role in assessing the strength of applicants’ evidence of effectiveness and in supporting required evaluation activities. The risk with these competitions when carried out by the Office of the Secretary is that they become politicized, that they are judged by review panels without methodological competence, and that they are overseen, once awarded, by career staff in program offices that do not have the background to monitor what is, at root, a program evaluation grant. These risks could be substantially reduced if the competition were funded as a line item in the IES budget, with statutory language requiring that review panels include both practitioners and researchers.
Properly designed competitive grant programs provide an opportunity for Congress to target resources at federal priorities and encourage innovative problemsolving while avoiding federal mandates. They should avoid prescription and both reward and produce rigorous evidence, thus increasing the share of education dollars spent on evidencebased programs while at the same time fulfilling the federal government’s unique responsibility for producing and disseminating highquality evidence on the best ways to improve American schools. The i3 program was a promising step in this direction. It would be unfortunate if Congress were to miss the opportunity to make something similar a permanent feature of ESEA.
With both houses of Congress moving apace to reauthorize the Elementary and Secondary Education Act (ESEA), the question is not whether the new legislation will reduce the federal government’s footprint in K-12 education; it assuredly will. The question is whether, in their understandable efforts to rein in Washington’s influence, legislators can preserve those elements of federal policy that stand to benefit students and taxpayers—particularly those that fulfill functions that would otherwise go unaddressed within our multilayered system of education governance.
One key unresolved issue involves the status of competitive grant programs, through which the Department of Education invites states and school districts to apply for funds to support programs that address federally identified priorities. In the current environment, Congress may be tempted to eschew all programs structured in this way, preferring to rely on formulas to ensure that schools receive their fair share of federal funds. That would be a mistake. Flexible competitive grant programs that encourage innovations in policy and practice and ensure that they are subjected to rigorous evaluation should remain a part of ESEA going forward. In particular, the Investing in Innovation (i3) fund, a program created through the American Reinvestment and Recovery Act that is not a part of the reauthorization bills now moving through Congress, deserves a second look.
Increased reliance on competitive grants has been arguably the defining feature of the Obama administration’s K-12 education policy. Its signature Race to the Top program (RTT) asked states to compete for $4.35 billion in federal grants based on their commitment to implement a 19-item reform agenda. Expansive in its scope, RTT quickly became a symbol of what Senate Health, Education, Labor and Pensions Committee chairman Lamar Alexander has characterized as the Department’s efforts to dictate to states and school districts the details of how best to improve local schools. Congressional discontent with RTT-style policies is not limited to Republicans, however. Most legislators prefer to claim credit for funds allocated by formula rather than risk the ire of constituents whose applications are rejected, and rural members in particular often feel as if their districts are at a disadvantage when funding is competitive. Perhaps because of this discontent, President Obama’s 2016 budget proposal did not include funds for a new RTT competition.
But rather than paint all competitive grants with a broad brush, it is useful to consider differences in their structure. The table below shows that competitive grant programs can vary on at least two dimensions. First, the programs can be broad, aiming to incentivize policy changes in multiple areas in one fell swoop, or narrowly focused on a specific challenge facing most school systems. Second, grants can be awarded based on applicants’ willingness to commit to a detailed set of policy changes and program requirements prescribed by Washington, or they can be awarded based on past success, with funding levels tied to the strength of the evidence the applicant is able to present of their program’s effectiveness.
Federal Competitive Education Grants: A Typology with Examples

                                Selection Criteria
                    Prescriptive and              Flexible and
Policy Focus        based on commitments          based on evidence
------------        ----------------------        -----------------------------
Broad               Race to the Top               Investing in Innovation
Narrow              Teacher Incentive Fund        Replicating and Expanding
                                                  High-Quality Charter Schools
RTT epitomized the broad, prescriptive approach to competitive grants. Although presented by supporters as an opportunity for states to put forward their best and most innovative ideas, in fact the selection criteria amounted to a detailed list of commitments in areas ranging from state standards and data systems to teacher evaluation systems and strategies to turn around low-performing schools. Because funding was based primarily on future commitments, the program did little to alter the compliance-oriented relationship between federal officials and state and local educators once grants were awarded. As Rick Hess of the American Enterprise Institute has written, “the aftermath entailed years of invasive federal monitoring…during which junior staff at the U.S. Department of Education exerted remarkable influence over the states that received RTT funds.” While it is too soon to know whether states awarded RTT grants will see improvements in student outcomes, there is little hope that their efforts will be a source of rigorous evidence on the merits of specific policies they pursued. The sheer number of policies states were required to implement simultaneously makes it all but impossible to isolate the impact of any one.
Yet RTT was the exception, not the rule. Its scale reflected the unique circumstances of the post-financial-crisis stimulus package, and maintaining a single competitive grant program at this scale has already proven to be politically infeasible.
Other competitive grant programs are structured quite differently, with a narrow focus tied to a distinct federal purpose. For example, the Teacher Incentive Fund created by the second Bush administration asks school districts and charter schools to commit to implementing performance-based teacher compensation systems. The rationale is that local officials will be more likely to adopt politically controversial changes to how teachers are compensated when outside resources are available to support their efforts. For the past few years, the Department of Education has also offered grants directly to Charter Management Organizations seeking to expand or replicate high-quality schools. Those schools need not adhere to a particular pedagogical model but must instead document a track record of improving student outcomes. The Teacher Incentive Fund and grants to expand and replicate high-quality charter schools have been included in both the House committee’s bill and in Senator Alexander’s initial discussion draft.
Those bills do not, however, include the Investing in Innovation fund (i3), the second major competitive grant program created through the stimulus package. Initially funded at $650 million, i3 allowed school districts, charter schools, and nonprofit organizations working in partnership with one of those entities to apply for grants to support innovative programs aligned with one of four broadly defined federal priorities (e.g., supporting effective teachers and principals or improving the use of data). In other words, i3 was broad in its focus but avoided prescription with respect to the design of the programs eligible for federal support.
The origins and implementation of i3 have been ably chronicled by Ron Haskins and Greg Margolis, who present the program as a cornerstone of the Obama administration’s broader efforts to base spending on social programs on rigorous evidence. Two specific aspects of the design of i3 are especially noteworthy. First, the competition used a tiered evidence model to align the amount of funding a program could receive with the strength of the evidence supporting its effectiveness. Second, grant winners were required to conduct rigorous evaluations and were selected in part based on the quality of their proposed evaluation design. Across the first four funding cohorts, i3 supported 53 randomized-control trials—the gold-standard design for evaluations of program effectiveness and one that until recently was virtually unknown in the education sector. (Full disclosure: my primary employer, the Harvard Graduate School of Education, has benefited from i3 as a direct grantee and through evaluation contracts; I am the principal investigator on two of those contracts.)
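The logic of the tiered evidence model can be sketched in a few lines of code. The tier names and dollar caps below are illustrative only, loosely patterned on i3’s structure rather than drawn from any specific funding cohort; the point is simply that stronger existing evidence unlocks a larger maximum award.

```python
# Illustrative sketch of a tiered evidence funding model.
# Tier names and dollar figures are hypothetical, not the actual
# caps from any particular i3 competition.
EVIDENCE_TIERS = {
    # tier: (evidence an applicant must present, maximum award in dollars)
    "development": ("a reasonable hypothesis and promising early results", 5_000_000),
    "validation": ("moderate evidence of effectiveness", 30_000_000),
    "scale-up": ("strong evidence of effectiveness", 50_000_000),
}

def maximum_award(tier: str) -> int:
    """Return the funding cap for a given evidence tier."""
    if tier not in EVIDENCE_TIERS:
        raise ValueError(f"unknown evidence tier: {tier}")
    return EVIDENCE_TIERS[tier][1]
```

The key design choice is that the cap is a function of demonstrated evidence, not of the applicant’s future commitments—the inverse of RTT’s approach.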
A competitive grant program that includes these design elements need not be called i3. Indeed, it need not be drafted as a standalone program at all. The Coalition for Evidence-Based Policy has proposed language that would simply allow the Department of Education to reserve up to one percent of the funding of all ESEA programs (except Title I) to award grants for innovation and research, with grant amounts based on the tiered evidence model used in i3. The proposal is modeled on the Small Business Innovation Research (SBIR) program, under which 11 federal agencies since 1982 have set aside a small percentage of their budgets to award grants to small companies engaged in the development and evaluation of new technologies. As the Coalition notes, both the Government Accountability Office and the National Academy of Sciences have offered consistently positive assessments of the program’s success. Importantly, the proposal, like SBIR, would include small businesses as eligible grantees, addressing a shortcoming of the original i3 program that arguably limited the types of innovations proposed.
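The arithmetic of the set-aside is straightforward. The sketch below uses hypothetical budget figures, chosen only to show the mechanics: the reserve is one percent of each covered program’s appropriation, with Title I excluded.

```python
def innovation_set_aside(program_budgets: dict, rate: float = 0.01) -> float:
    """Sum `rate` of each ESEA program's budget, excluding Title I,
    per the Coalition's proposed set-aside for innovation grants."""
    return sum(amount * rate
               for program, amount in program_budgets.items()
               if program != "Title I")

# Hypothetical appropriations, for illustration only.
budgets = {
    "Title I": 14_400_000_000,   # excluded from the set-aside
    "Title II": 2_300_000_000,
    "Title III": 740_000_000,
}
reserve = innovation_set_aside(budgets)  # 1% of Title II + Title III
```

Even at one percent, a set-aside across the non-Title I programs would fund a meaningful number of tiered-evidence grants each year.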
The Coalition’s proposal could be strengthened by giving the Institute of Education Sciences (IES) the lead role in assessing the strength of applicants’ evidence of effectiveness and in supporting required evaluation activities. When such competitions are carried out by the Office of the Secretary, the risk is that they become politicized, that applications are judged by review panels lacking methodological competence, and that grants, once awarded, are overseen by career staff in program offices who do not have the background to monitor what is, at root, a program evaluation grant. These risks could be substantially reduced if the competition were funded as a line item in the IES budget, with statutory language requiring that review panels include both practitioners and researchers.
Properly designed competitive grant programs provide an opportunity for Congress to target resources at federal priorities and encourage innovative problem-solving while avoiding federal mandates. They should avoid prescription and both reward and produce rigorous evidence, thus increasing the share of education dollars spent on evidence-based programs while at the same time fulfilling the federal government’s unique responsibility for producing and disseminating high-quality evidence on the best ways to improve American schools. The i3 program was a promising step in this direction. It would be unfortunate if Congress were to miss the opportunity to make something similar a permanent feature of ESEA.