What is a reliable assessment?
Many factors contribute to the reliability of an assessment, but two of the most critical for teachers to recognise are:
- the precision of the questions and tasks used in prompting students’ responses;
- the accuracy and consistency of the interpretations derived from assessment responses.
Designing questions and assessment processes that work in the same way for different students at different points in time is a skill that takes time to hone, but one that pays repeated dividends for teachers and their students.
An assessment is a means of creating a set of circumstances in which a student can represent their knowledge, skill and understanding in an observable form. Because it is a proxy for something unseen, and because interpretation is often needed to make sense of the information it produces, some error is always present.
Sources of error include, among many others:
- the assessor’s unfamiliarity with the topic being assessed
- the assessor’s unfamiliarity with robust assessment practices
- bias (teachers are human, after all!)
- the subjectivity of the material to be assessed
- the conditions in which students take the assessment
Classroom assessment practices can be improved in many ways to increase reliability, and one of the most immediate is to improve inter-rater and intra-rater reliability.
Inter-rater reliability: the extent to which different assessors reach the same judgement about the same work. Getting people to agree with one another on simple matters can be hard enough, so complex judgements (such as whether two teachers, marking independently, award consistent grades for the same writing task) pose real reliability challenges.
Intra-rater reliability: the extent to which one assessor reaches the same judgement about the same work on different occasions. Most people acknowledge that high inter-rater reliability is difficult to achieve, but an often-overlooked challenge lies in the accuracy and consistency of one's own judgements.
Imagine marking several pieces of work of the same quality at different times of the day, week, month and year. Particularly where subjective judgement is needed, you can imagine how your decisions, comments and grades might vary depending on the time of day, hunger, how many other tasks you are juggling in your mind, caffeine intake…
Improving rater reliability: this begins with acknowledging that every assessment has some degree of unreliability inherent in it. Greater reliability improves the quality of the information the assessment process produces, and so increases its potential value to teachers and students. Below are three ways to improve the reliability of assessment in school:
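One common way to put a number on rater agreement, though not one this article prescribes, is to compare two sets of grades for the same scripts: raw percent agreement, plus Cohen's kappa, which discounts the agreement two raters would reach by chance alone. The same calculation covers both cases above: pass two teachers' grades for inter-rater reliability, or one teacher's grades from two marking sessions for intra-rater reliability. This is a minimal Python sketch, assuming each script receives exactly one categorical grade; the function names and the grades in the example are hypothetical.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Share of scripts on which both sets of ratings give the same grade."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty lists of ratings")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement corrected for the agreement expected by chance alone."""
    observed = percent_agreement(ratings_a, ratings_b)  # also validates input
    n = len(ratings_a)
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement: the probability that both ratings land on the same
    # grade if each were assigned independently at its own grade frequencies.
    expected = sum(
        (counts_a[grade] / n) * (counts_b[grade] / n)
        for grade in set(counts_a) | set(counts_b)
    )
    if expected == 1:
        # Both sets use one identical grade throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical grades awarded independently to the same ten scripts.
teacher_a = ["A", "B", "B", "C", "A", "C", "B", "A", "C", "B"]
teacher_b = ["A", "B", "C", "C", "A", "C", "B", "B", "C", "B"]

print(f"{percent_agreement(teacher_a, teacher_b):.2f}")  # 0.80
print(f"{cohens_kappa(teacher_a, teacher_b):.2f}")  # 0.70
```

Here the two teachers agree on 8 of 10 scripts, but because some of that agreement would be expected by chance, kappa comes out lower than the raw figure. A value of 1 means perfect agreement and 0 means no better than chance.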
- Use exemplar student work to show what success looks like in specific assignments, and be explicit about the criteria;
- Blind-mark assignments: this reduces bias and increases rater reliability;
- Blind-moderate samples of students' work: this increases rater reliability and also offers a good professional development opportunity to share standards.
Given that information from assessments is used to make decisions about the needs and progress of pupils, shouldn't we be able to answer the question "how reliable is your assessment?" And how many of us could?
Reliability in the assessment of student learning is also about accuracy and consistency and, as a rule, the higher the stakes of the decision we want to make based on assessment information, the more accurate and consistent we want the information to be. High-stakes decisions need highly reliable information. As we saw with validity, a determination of how reliable an assessment needs to be is informed by its intended end uses.
More information on the reliability and validity of assessment in education can be found here.