Understanding Assessment Validity and Reliability Guide

Validity and reliability are the two pillars of assessment theory. As an adjunct professor of assessment, I have had to define these concepts and explain their significance to teacher candidates in all content areas. Candidates often confuse the two, so I developed a few memorable examples that would stick in their minds long after they earned their teaching licenses.

What Does It Mean For a Test to be Valid?

Does the test measure what it purports to measure? For example, if you suspect you have the flu, would you take a pregnancy test? No, because it does not purport to measure whether you have the flu. To measure if you have the flu, you would need to take a test that was developed to determine whether you have the flu.

If you want a proficiency test, then be sure to select a test that has been proven by research to be a valid test of proficiency. If you want to test achievement, there are many assessment tools that do that. So be sure you know what you really want to test – then select the tool that has been demonstrated as valid for that purpose.

What Does It Mean For an Assessment to be Reliable?

Does the test provide consistent results? A test that is reliable produces stable, dependable results over time and across different conditions. You should be able to test the same individuals repeatedly and get similar results. For example, a student will not go to bed one night functioning at Novice Mid on the ACTFL Proficiency scale and wake up the next morning magically functioning at Advanced High. The words “rely” and “reliable” are related. Remember that you can rely on a reliable test to provide consistent results.
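One common way to put a number on this kind of consistency is a test-retest correlation: give the same test to the same people twice and see how closely the two sets of scores track each other. The sketch below is only an illustration. The scores are made up, the helper name pearson_r is hypothetical, and a real reliability study would use far more test takers and more careful statistics.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 scores")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same six students on two sittings of one test.
first_sitting = [52, 61, 70, 78, 85, 90]
second_sitting = [54, 60, 72, 77, 86, 91]

r = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation: {r:.2f}")
```

A value close to 1 means students landed in nearly the same order both times, which is what you would expect from a reliable test. A value near 0 would mean the second sitting tells you little about the first.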

Why Does It Matter?

For one thing, it matters to test takers. They want to know that their score really means something to them and to admissions officers, educators, and prospective employers. If you cannot rely on something to test what it says it tests and provide consistent results, then why test at all?

Imagine you are on a diet. You weigh yourself every morning on the bathroom scale. Every day you get a different number that does not seem to correlate with what you ate yesterday or how your pants fit. Now you go for your annual checkup at the doctor’s office and when the nurse weighs you, the number is different again! The scale in the doctor’s office has been calibrated by professionals who use it day after day to weigh many different people. That scale is valid and reliable, so you can be sure that the number on the doctor’s office scale is the amount you really weigh!
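The scale story can be turned into a small calculation. In the sketch below, every number is invented for illustration: a hypothetical 70 kg reference weight is placed on each scale five times. The average error (bias) is only a loose stand-in for validity, and the spread of the readings is a stand-in for reliability.

```python
from statistics import mean, stdev

TRUE_WEIGHT = 70.0  # hypothetical reference weight, in kg


def summarize(readings, true_value):
    """Return (bias, spread) for repeated readings of a known weight."""
    bias = mean(readings) - true_value  # how far off the scale is on average
    spread = stdev(readings)            # how much the readings bounce around
    return bias, spread


# Made-up readings: five weighings of the same reference weight on each scale.
bathroom_scale = [71.6, 75.9, 72.5, 76.7, 71.0]
clinic_scale = [70.1, 69.9, 70.0, 70.2, 69.8]

b_bias, b_spread = summarize(bathroom_scale, TRUE_WEIGHT)
c_bias, c_spread = summarize(clinic_scale, TRUE_WEIGHT)

print(f"bathroom scale: bias {b_bias:+.2f} kg, spread {b_spread:.2f} kg")
print(f"clinic scale:   bias {c_bias:+.2f} kg, spread {c_spread:.2f} kg")
```

With these invented readings, the bathroom scale is both off on average and inconsistent from one weighing to the next, while the clinic scale stays close to the reference weight every time.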

Prove It!

So now, with my funny examples, you care that your assessment tool is valid and reliable, but how can you really know that? Check if the tool has been thoroughly researched by academics who determined it to be both valid and reliable, and then published their findings in a peer-reviewed journal. If not, then the tool has not been confirmed to be valid and reliable. Just saying something is valid and reliable does not demonstrate that.

Be wary of “research” that is conducted internally with a small sample size rather than by external researchers who are not affiliated with the assessment company. Such research does not prove validity or reliability. What it shows is that there has been NO external validation of the test’s validity and reliability. If there were, it would be publicized!

Establishing AAPPL’s Validity and Reliability

The research into AAPPL’s validity and reliability is well known. AAPPL’s original design and test framework were based on the 2006 ACTFL Assessment of Uses and Needs, a survey of over 1,600 world language instructors and administrators regarding the assessments they used and the kinds of assessments they needed. Based on rigorous piloting, field testing, and follow-on studies conducted for nearly a decade, the AAPPL represents effective practices in world language assessment.

Analyses of 9,000 test takers demonstrate that the AAPPL can reliably differentiate examinee results according to different levels as described by the AAPPL performance scores. In addition, item difficulty parameters reflect the targeted proficiency levels.

Cox and Malone (2018) further document AAPPL rater reliability and articulate a validity argument using evidence from over 10,000 test results. For a more detailed discussion of AAPPL validity and reliability, refer to:

Cox, T. L., & Malone, M. E. (2018). A validity argument to support the ACTFL Assessment of Performance toward proficiency. Foreign Language Annals, 51(3), 548-574. Retrieved from https://doi.org/10.1111/flan.12353

Be an Alert Consumer!

Think critically before you buy language assessments. Ask about external research that confirms claims of reliability and validity. Who did it? Where was it published? Then, read it yourself to be sure.

For more information about adding ACTFL assessments like the AAPPL to your testing program, contact LTI at https://www.languagetesting.com/contact-us/sales
