Book/Analysis of a Test and Its Items
A summative didactic test can be understood as a tool for measuring the level of knowledge and skills that a student has acquired during instruction. The results of such high-stakes testing can have fundamental consequences for test takers – for example, acceptance or rejection for further study, certification for a profession, or the award of a degree. If a test were inadequate for its purpose and did not measure the qualities we expect it to measure, substantial errors in decision-making could occur, reducing the effectiveness and jeopardizing the credibility of the entire system. It is therefore important to measure and continuously monitor the quality of tests and test items.
Some properties of tests (and items) can be described using the intuitively comprehensible categories of difficulty and sensitivity. Difficulty can be understood as the probability that a test taker will not answer the given test or item correctly. Sensitivity refers to the degree to which a test or item differentiates between better and less well prepared students.
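These two notions can be sketched numerically. In the sketch below, difficulty is the proportion of incorrect answers, and sensitivity is estimated by the upper-lower index (the difference in success rates between high- and low-scoring groups); the 27% split, the function names, and the data are illustrative assumptions, not prescribed by the text.

```python
def item_difficulty(scores):
    """Difficulty as the proportion of test takers who answered incorrectly."""
    return 1 - sum(scores) / len(scores)

def item_sensitivity(item_scores, total_scores, fraction=0.27):
    """Upper-lower index: item success rate in the top-scoring group
    minus the rate in the bottom-scoring group (illustrative choice)."""
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    k = max(1, int(fraction * len(total_scores)))
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

# Example scoring matrix: one row per test taker, 1 = correct, 0 = incorrect
matrix = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
totals = [sum(row) for row in matrix]
item0 = [row[0] for row in matrix]
print(item_difficulty(item0))           # → 0.25 (one of four answered incorrectly)
print(item_sensitivity(item0, totals))  # → 1.0 (only the weakest student failed it)
```

A sensitivity near zero (or negative) would flag an item that does not separate better-prepared from less-prepared students.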
In addition to these intuitive metrics, we also use the terms reliability and validity to describe test properties. Reliability expresses the accuracy and repeatability of the test: it tells us whether retesting a student with a different version of the same test would confirm the previous result. Validity tells us whether a test or item measures the knowledge we actually want to measure.
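In practice, reliability is often estimated from a single test administration with an internal-consistency coefficient such as Cronbach's alpha. The text does not name a specific coefficient, so the following is only an illustrative sketch using population variances and made-up 0/1 scores.

```python
def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(matrix):
    """Cronbach's alpha for a scoring matrix:
    one row per test taker, one column per item (here 0/1 scores)."""
    n_items = len(matrix[0])
    item_vars = [variance([row[j] for row in matrix]) for j in range(n_items)]
    total_var = variance([sum(row) for row in matrix])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

matrix = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(cronbach_alpha(matrix))  # → 0.75 for this toy data
```

Values closer to 1 indicate that the items measure consistently; on real tests, far more items and test takers are needed for a meaningful estimate.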
In addition to these traditional metrics, considerable attention has been paid in recent years to the fairness of tests, i.e., to verifying that a test does not in some way disadvantage certain groups of test takers.
Item analysis makes it possible to evaluate the characteristics of individual test items, especially their difficulty and sensitivity, based on an analysis of the completed tests. Item analysis can also include distractor analysis, which examines in more detail the quality of the options offered in closed-ended (multiple-choice) items – for example, how test takers choose among the offered answers depending on their overall performance.
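A minimal distractor analysis can be sketched by splitting test takers into an upper and a lower half by total score and counting how often each option was chosen in each half. The median split, function name, and data below are illustrative assumptions.

```python
from collections import Counter

def distractor_table(choices, total_scores):
    """For one item, return {option: (count in upper half, count in lower half)},
    where halves are formed by sorting test takers by total score."""
    order = sorted(range(len(choices)), key=lambda i: total_scores[i])
    half = len(order) // 2
    lower = Counter(choices[i] for i in order[:half])
    upper = Counter(choices[i] for i in order[half:])
    return {opt: (upper[opt], lower[opt]) for opt in set(choices)}

choices = ["A", "B", "A", "C", "A", "B"]  # "A" is the keyed (correct) answer
totals  = [ 9,   3,   8,   2,   7,   4 ]  # overall test scores
print(distractor_table(choices, totals))
# → {'A': (3, 0), 'B': (0, 2), 'C': (0, 1)}
```

In a well-functioning item, the correct answer is chosen mainly by the upper group, while each distractor attracts some of the lower group; a distractor nobody chooses, or one preferred by strong students, deserves revision.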
Item analysis provides a range of psychometric data for each item, which makes it possible to construct independent tests with similar properties.
Test analysis should include descriptive statistics and a graphical display of the results, most often in the form of a histogram. Comparing the graphs from individual test administrations helps us assess whether, for example, some items used in the test have been leaked.
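Such a summary can be produced with a few lines of code. The sketch below prints basic descriptive statistics and a text histogram of total scores; the data are invented, and a real report would typically use a plotting library instead of ASCII bars.

```python
from statistics import mean, median, stdev

# Illustrative total scores from one test administration
totals = [12, 15, 15, 17, 18, 18, 18, 20, 22, 25]

print(f"mean={mean(totals):.1f} median={median(totals)} sd={stdev(totals):.1f}")

# Text histogram: one '#' per test taker at each observed score
for score in sorted(set(totals)):
    print(f"{score:3d} | {'#' * totals.count(score)}")
```

Placing such histograms from successive administrations side by side makes shifts in the score distribution (e.g., a sudden jump suggesting leaked items) easy to spot.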
Let us first look at the properties of the test as a whole, especially its reliability and validity.