Criterion-referenced test
A criterion-referenced test is a style of test that uses test scores to generate a statement about the behavior that can be expected of a person with that score. Most tests and quizzes written by school teachers can be considered criterion-referenced tests. In this case, the objective is simply to see whether the student has learned the material. Criterion-referenced assessment can be contrasted with norm-referenced assessment and ipsative assessment. Criterion-referenced testing was a major focus of psychometric research in the 1970s.
Level of measurement
Level of measurement or scale of measure is a classification that describes the nature of information within the values assigned to variables. Psychologist Stanley Smith Stevens developed the best-known classification with four levels, or scales, of measurement: nominal, ordinal, interval, and ratio. This framework of distinguishing levels of measurement originated in psychology and has since had a complex history, being adopted and extended in some disciplines and by some scholars, and criticized or rejected by others.
Criterion validity
In psychometrics, criterion validity, or criterion-related validity, is the extent to which an operationalization of a construct, such as a test, relates to, or predicts, a theoretical representation of the construct, known as the criterion. Criterion validity is often divided into concurrent and predictive validity based on the timing of measurement for the "predictor" and outcome. Concurrent validity refers to a comparison between the measure in question and an outcome assessed at the same time.
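A common way to quantify how strongly a test relates to its criterion is a correlation coefficient between test scores and criterion scores. The following is a minimal sketch, assuming the Pearson correlation is an acceptable index for the data at hand; the function name `pearson_r` and its argument names are illustrative and do not come from the source.

```python
import math


def pearson_r(test_scores, criterion_scores):
    """Pearson correlation between paired test and criterion scores.

    Assumption: both sequences have the same length (at least 2)
    and neither is constant, otherwise the denominator is zero.
    """
    n = len(test_scores)
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(test_scores, criterion_scores))
    # Square roots of the sums of squared deviations.
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in test_scores))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in criterion_scores))
    return cross / (dev_x * dev_y)
```

If the criterion is measured at the same time as the test, the coefficient would speak to concurrent validity; if the criterion is measured later, it would speak to predictive validity.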
Rating scale
A rating scale is a set of categories designed to elicit information about a quantitative or a qualitative attribute.
Scale (social sciences)
In the social sciences, scaling is the process of measuring or ordering entities with respect to quantitative attributes or traits. For example, a scaling technique might involve estimating individuals' levels of extraversion, or the perceived quality of products. Certain methods of scaling permit estimation of magnitudes on a continuum, while other methods provide only for relative ordering of the entities. The level of measurement is the type of data that is measured.
Summative assessment
Summative assessment, summative evaluation, or assessment of learning is the assessment of participants in an educational program. Summative assessments are designed to assess both the effectiveness of the program and the learning of the participants. This contrasts with formative assessment, which summarizes the participants' development at a particular time in order to inform instructors of student learning progress. The goal of summative assessment is to evaluate student learning at the end of an instructional unit by comparing it against a standard or benchmark.
Formative assessment
Formative assessment, formative evaluation, formative feedback, or assessment for learning, including diagnostic testing, is a range of formal and informal assessment procedures conducted by teachers during the learning process in order to modify teaching and learning activities to improve student attainment. The goal of a formative assessment is to monitor student learning to provide ongoing feedback that can help students identify their strengths and weaknesses and target areas that need work.
Test score
A test score is a piece of information, usually a number, that conveys the performance of an examinee on a test. One formal definition is that it is "a summary of the evidence contained in an examinee's responses to the items of a test that are related to the construct or constructs being measured." Test scores are interpreted with a norm-referenced or criterion-referenced interpretation, or occasionally both. A norm-referenced interpretation means that the score conveys meaning about the examinee with regard to their standing among other examinees.
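The two interpretations can be contrasted with a small sketch. It assumes one simple convention for each: a percentile rank defined as the percentage of a norm group scoring strictly below the examinee (other definitions exist), and a pass/fail decision against a fixed cut score. The function names, the norm group, and the cut score are hypothetical.

```python
def percentile_rank(score, norm_group):
    """Norm-referenced view: percentage of the norm group scoring
    strictly below `score` (one of several common conventions)."""
    below = sum(1 for other in norm_group if other < score)
    return 100.0 * below / len(norm_group)


def meets_criterion(score, cut_score):
    """Criterion-referenced view: does the score reach a fixed
    standard, regardless of how other examinees performed?"""
    return score >= cut_score
```

The same raw score can look strong under one interpretation and weak under the other, for instance when most of the norm group falls below a demanding cut score.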
Rasch model
The Rasch model, named after Georg Rasch, is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between the respondent's abilities, attitudes, or personality traits, and the item difficulty. For example, it may be used to estimate a student's reading ability or the extremity of a person's attitude to capital punishment from responses on a questionnaire.
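For the dichotomous case (right/wrong or agree/disagree items), the trade-off described above is usually written as a logistic function of the difference between person ability and item difficulty. The sketch below shows only that response function, not the estimation of abilities or difficulties from data; the function and parameter names are illustrative.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct (or endorsed) response under the
    dichotomous Rasch model: exp(a - d) / (1 + exp(a - d)),
    with ability and difficulty on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))
```

When ability equals difficulty the probability is 0.5, and it rises toward 1 as ability exceeds difficulty.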
Cronbach's alpha
Cronbach's alpha (Cronbach's α), also known as tau-equivalent reliability (ρ_T) or coefficient alpha (coefficient α), is a reliability coefficient and a measure of the internal consistency of tests and measures. Numerous studies warn against using it unconditionally. Reliability coefficients based on structural equation modeling (SEM) or generalizability theory are superior alternatives in many situations. Lee Cronbach named the coefficient alpha in a 1951 publication.
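The coefficient is commonly computed as α = k / (k − 1) · (1 − Σ item variances / variance of total scores), where k is the number of items. The following is a minimal sketch of that formula, assuming complete data with one row per respondent and one column per item; the function name and data layout are illustrative, and sample variances are used (the ratio is the same with population variances as long as the choice is consistent).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores.

    Assumptions: every row has the same number of items (k >= 2),
    there are at least two respondents, and total scores are not
    all identical (otherwise the total variance is zero).
    """
    k = len(rows[0])
    # Transpose so each tuple holds all respondents' scores on one item.
    items = list(zip(*rows))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)
```

A high value from this computation does not by itself establish that the items measure a single construct, which is one reason the coefficient should not be used unconditionally.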