The “Reference Student”: What Standardized Tests Actually Measure (and What They Don’t)

Greg Mullen
Mar 8
Updated: Mar 19

In many fields, researchers have identified a design problem known as the “reference man.”

For decades, systems in medicine, engineering, and public policy were often built around a statistical model of an average adult male body—sometimes defined as a 70-kilogram “reference man.” Drug dosages, safety standards, and equipment design were frequently calibrated around this hypothetical average person.

Over time, researchers discovered an obvious problem: Very few real people actually resemble the statistical average used to design the system. Women, children, and people with different body types were often poorly served by systems built around that statistical model.

A famous example comes from the U.S. Air Force in the 1950s, when researchers discovered that cockpits designed for the “average pilot” actually fit almost no pilots at all. Once engineers realized that the statistical average did not represent real individuals, they redesigned cockpits with adjustable controls that could accommodate a wide range of body sizes. As these limitations became clearer, many other industries began revising their own standards to account for human variation.

Education systems face a similar challenge.

The idea of a statistical “reference student” appears within academic standards that organize learning expectations by age-based grade levels. In theory, these grade levels represent the point at which most students, on average, are expected to develop particular academic skills, such as learning fractions around age 8-9 (CCSS 3.NF.1) and creating summaries of a text's main idea and key details around age 9-10 (CCSS RI.4.2).

When students begin Kindergarten around age 5, these grade-level expectations allow schools to measure students against a linear progression of academic standards across grade levels. This helps schools coordinate with curriculum designers, allows teachers to plan grade-level-specific lessons, and lets assessments be designed around a shared set of expectations specific to each grade level.

However, child development does not unfold according to a perfectly synchronized timetable.

Decades of research in developmental psychology from scholars such as Jean Piaget, Lev Vygotsky, and more recent cognitive development researchers have consistently shown that children progress through intellectual and conceptual development at different rates and at different points in time. Two students who are roughly the same age may be ready for very different levels of abstraction in reading comprehension, mathematical reasoning, or written expression. [See this article for another perspective on standards across grade levels.]

Some students may grasp certain concepts earlier than expected. Others may need more time, practice, or experiences before those same ideas become fully accessible. Yet assessments designed to measure progress in reference to a statistical average of academic achievement place an unnecessary burden on teachers, parents, and especially students to meet and exceed these statistically average expectations. In other words, the assessments are designed to unfairly and inequitably measure performance relative to a grade-level “reference student.”

While this design makes it possible to compare results across schools, districts, and states, it also means that the assessments are not designed to fully account for the natural variation among actual learners in their developmental readiness to learn increasingly complex concepts and skills.

For example, some students taking a third-grade assessment may already be reasoning comfortably with ideas typically introduced in later grade levels. Others may still be consolidating foundational concepts that are essential for mastering those same standards. Both students are developing, but they are developing along different trajectories within a much larger learning continuum.

Understanding this distinction helps clarify both what standardized tests can tell us and what they cannot.

They can provide useful information about how large groups of students are performing relative to age-based academic expectations. But they cannot fully describe the pace, sequence, or individuality of each student’s motivational, psychosocial, and intellectual development. [Read more in this post, Why Care About Cognitive and Psychosocial Development?]

So Why Do These Tests Exist?

Across the United States, students participate in large-scale academic assessments designed to measure learning outcomes at the system level. In California, for example, many students commonly take:

CAASPP state assessments, which measure achievement relative to California academic standards
NWEA MAP assessments, which many schools use throughout the year to monitor academic growth

These assessments expanded nationwide after federal education laws such as No Child Left Behind (2001) and later the Every Student Succeeds Act (2015), which require states to measure student achievement and report the results publicly.

The purpose of these tests is not primarily to evaluate individual children. Instead, they help policymakers and districts answer large-scale questions such as:

What percentage of students across the state are improving in reading and math?
Are certain schools or districts demonstrating growth over time?
Where should resources and funding be directed?
Which schools may require additional oversight or support?

For charter schools, these results may also play a role in charter renewal decisions, where districts examine academic outcomes alongside (and sometimes with greater emphasis than) other indicators of school effectiveness. In this sense, standardized assessments function primarily as large-scale data tools used to evaluate schools within an education system.

What These Assessments Measure

Assessments like NWEA MAP and CAASPP measure specific academic skills aligned with grade-level standards. For example, they may assess a student’s ability to analyze a reading passage, solve multi-step math problems, interpret informational text, and apply mathematical reasoning. Each of these tasks corresponds to specific concepts and skills named in the standards.

However, because these assessments must generate comparable data across thousands or millions of students, they focus on tasks that can be scored consistently and efficiently. As a result, standardized assessments provide what researchers often describe as a snapshot of academic performance at a single moment in time. This means the results describe how a student performed on a specific set of grade-level academic tasks on a particular day, rather than providing a complete picture of the student’s learning process.

For example, I taught a middle school classroom where a few students were several grade levels behind in various reading and math skills. After I worked with these students at their level, they made gains spanning two to three grade levels in half of an academic year. However, these assessments continued to categorize these students as “Developing” (which meant they were not “Meeting” or “Exceeding” their assigned grade-level expectations). While some may argue that this could motivate students to keep working on that progress, in reality students often conflate these assessment results with their identity, which negatively impacts their motivation to progress.

If these assessments actually tracked growth across grade levels, schools could be celebrated for helping students make huge gains in years when they are willing and able to do so. Such tracking could also take into consideration years when community and family difficulties overwhelm children and reduce their willingness and capacity to learn topics that may seem unnecessary to them during those difficult times.

This makes it important for us to reflect on who and what these assessments are for, what they measure, and perhaps most importantly what they do not measure.

What These Tests Cannot Capture

While standardized tests can measure certain academic skills, they cannot capture many aspects of development that matter deeply in a child’s growth. For example, standardized assessments cannot directly measure:

curiosity and intrinsic motivation
creativity and original thinking
perseverance through long-term challenges
collaboration and interpersonal skills
empathy and ethical decision-making
the ability to plan, revise, and complete complex projects
growth in confidence or independence as a learner

Consider a student who spends several weeks researching a topic they care about, revising their ideas, asking questions, and presenting their learning to classmates. That process may demonstrate curiosity, persistence, communication skills, and deep understanding of a concept. Yet a standardized test might only capture a small portion of that learning, such as whether the student can answer a few multiple-choice questions about a reading passage on a similar topic.

Similarly, imagine two students solving a difficult math problem. One student may reach the correct answer quickly but struggle to explain their reasoning, while another student may take longer, try several strategies, and ultimately develop a deeper understanding of the concept. A timed test can record whether the final answer is correct, but it cannot fully capture the thinking process, the motivation to initiate and sustain effort and experimentation, or the persistence that often leads to meaningful learning.

These capacities develop through long-term learning experiences, relationships, and reflection, not through a single timed assessment. This makes it all the more important to reflect on how much value we are placing on these assessments as each child develops through a unique combination of interests, strengths, challenges, and learning experiences that cannot be fully represented by a statistical model of an “average student.”

Why Schools Still Take These Assessments Seriously

Even though standardized tests capture only part of the learning process, schools must still take them seriously because education systems have chosen to use these assessments as key indicators in policy and resource decisions.

At the state and district level, large-scale assessment data is often used when determining:

school accountability ratings
funding priorities and resource allocation
intervention supports for struggling schools
charter school renewals or program oversight

Because these decisions influence how schools are evaluated and supported, schools must still prepare students to participate in these assessments. At the same time, it is important for educators and families to recognize that children do not develop academic understanding on a perfectly synchronized timeline.

As students grow, their readiness to grasp increasingly complex reading, writing, and mathematical ideas often ebbs and flows. Some students may be ready for certain grade-level concepts earlier, while others may need more time and experience before those ideas fully take hold.

This is why it is important for parents and teachers to stay aligned in understanding that standardized tests measure how students perform relative to age-based grade-level expectations at a specific moment, while real learning unfolds over a longer developmental journey. This does mean some students may be ready for higher grade-level academic topics and may benefit from these assessments as a means of accessing certain schools that are tailored to their capacity to learn at those higher levels. However, the majority of students tend to show strengths and gaps across a particular grade-level set of skills. It is important that we not treat those gaps as a failing of any one student because their “score” is higher or lower than the norm or average. Rather, we should recognize that each student is ready to learn certain concepts and skills which, once mastered, will prepare them for the next level of those concepts and skills at their own pace, and not at the pace of the “reference student.”

How I Talk About Testing With Students

In my classroom, we frame testing in a balanced way. Students are encouraged to:

approach the assessment calmly
take their time and try their best
see the test as one experience among many ways they learn

At the same time, students are reminded of something equally important: a test score does not define who someone is as a learner. A child’s development is much broader than a single number. Parents can help reinforce this message at home as testing approaches.

Simple reminders such as encouraging a good night’s sleep, a healthy breakfast, and a calm mindset can help students feel prepared without feeling pressure. It can also be helpful to remind children that the goal is simply to try their best and show what they know today, while understanding that learning continues to grow over time. When students hear the same message from both teachers and parents, they are more likely to approach testing with confidence and perspective.

How Do These Assessments Fit The Bigger Goal of Education?

Remember: the long-term purpose of education is not simply to produce test scores. It is to help students become thoughtful, capable human beings who can understand complex ideas, solve meaningful problems, work with others, initiate and sustain motivation for approaching new challenges, and continue learning as a process (rather than a product) throughout their lives.

Standardized assessments may provide analysts and administrators useful information about the education system they have created, but the deeper work of education happens every day through relationships, exploration, and sustained learning.

In my classroom, this broader perspective is why learning is approached as more than preparation for tests. Yes, we are looking at the test format and exploring how test questions are designed, but students are encouraged to develop academic skills alongside habits such as curiosity, persistence, collaboration, and reflective thinking. These capacities support students in becoming increasingly self-directed learners, or individuals who can understand their progress, take ownership of their learning, and continue developing long after any particular test has been completed.

Final Thought: When Test Scores Become Educational Currency

I talk about the currency of assessments and grades in this podcast episode with Dr. Matt Townsley, one of the nation's leading experts on leadership for standards-based grading. In this episode, I explain how grades have evolved into a transactional currency within the education system, which shifts student motivation away from learning and toward accumulating points that can be exchanged as a form of capital for future opportunities. In other words, the system often encourages students to optimize for the grade rather than to understand learning as a process that benefits them and their world.

When it comes to these large-scale test scores, they can often influence access to:

academic programs
gifted and advanced placement tracks
school accountability ratings
charter renewal decisions
funding priorities

In this way, numbers originally designed to measure academic performance can gradually become signals used to distribute opportunity within the education system. Ideally, assessments are tools that communicate feedback for learning specific to an individual, but standardized tests are intentionally designed for large-scale comparability against a “reference student,” which means they must narrow learning into forms that can be measured quickly and consistently across millions of students.

This creates a tension within modern education systems.

The same tools that help states analyze educational trends are not well suited to capturing the broader dimensions of learning that define the development of a whole person. Recognizing this tension does not mean dismissing standardized assessments. These systems provide useful information for policymakers and school leaders. But it does invite an important reminder: A score can describe a moment in academic performance. It cannot fully describe a developing human being. Assessments may measure pieces of the learning journey, but the journey itself is much larger.

Ultimately, when grades and test scores become the currency used to access opportunities within the education system, students and institutions naturally begin optimizing for those numbers rather than for the deeper learning those numbers were originally meant to represent.

Greg Mullen

March 8, 2026

References

California Department of Education. (2023). California Assessment of Student Performance and Progress (CAASPP). https://www.cde.ca.gov/ta/tg/ca/

Every Student Succeeds Act (ESSA), Pub. L. No. 114–95 (2015).

National Research Council. (2011). Incentives and Test-Based Accountability in Education. National Academies Press.

NWEA. (2022). MAP Growth Technical Overview. https://www.nwea.org