Testing Cycle


Like education as a whole, testing is a cyclical process. In preparing, administering, and evaluating each test, we enact (perhaps inadvertently) the simplest form of the test cycle.

Fig. 7.1 Diagram of the simplest intuitive test cycle.

Once we start preparing tests repeatedly and systematically, we begin to feed our experience from previous runs into the creation of new items and tests. This feedback loop forms the first complete test cycle, at the end of which we are ready to work (better) on a new round of tests.

For high-stakes tests, which must meet standards of validity and reliability, a number of steps must be carried out that the intuitive approach does not make explicit. A typical test cycle for a high-stakes test might look like this:

Fig. 7.2 Diagram of the test cycle for high-stakes tests. Loosely based on [1].

The outline of this book follows this “big” test cycle in a practical sense. Most of the steps have a corresponding chapter, so we comment on the individual steps only briefly here: the aim is merely to point to what is discussed in the relevant chapter.

Defining Learning Objectives

Work on a test should start with clarifying its objectives. By defining the learning objectives, the teacher delimits the scope of the subject matter that students should master after completing the course and specifies the key competencies to be tested.

Test Plan

The test plan is another key point of the whole process. It must establish how many items the test will contain for each thematic area and which item types will be used. This phase is particularly important if the test is prepared in several versions that are to be compared with each other. The learning objectives are reflected in the selection of questions and in the proportions in which individual topics are represented in the upcoming test. This kind of test planning is called blueprinting, after the blue-colored copies of building plans.
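A blueprint can be captured as simple structured data. A minimal sketch in Python; the topics, counts, and formats below are illustrative assumptions, not taken from any real test:

    # A minimal test blueprint: for each topic, how many items of each
    # format the test should contain (all names and counts illustrative).
    blueprint = {
        "Anatomy":    {"single_best_answer": 10, "true_false": 5},
        "Physiology": {"single_best_answer": 8,  "true_false": 4},
        "Pathology":  {"single_best_answer": 12, "true_false": 6},
    }

    total_items = sum(n for formats in blueprint.values()
                      for n in formats.values())
    print(f"Planned test length: {total_items} items")  # 45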

Plan Review

When preparing important assessments, the influence of individual educators' personal preferences must be minimized. The test plan therefore needs to be reviewed by other teachers, so that the representation of topics and the choice of test formats rest on the consensus of several teachers.

Item Creation

Perhaps the most demanding stage of test preparation is the creation of items. Teachers design new items on the topics determined by the blueprint and in the formats it prescribes.

Item Review

Items usually bear the “handwriting” of their author. That is why we present them to other teachers who know both the target group and the subject matter. The drafted items are submitted for assessment to a group of experts (the test preparation methodology of the Rogō program, for example, recommends 5-9 people, which is, of course, mostly unrealistic in our conditions). Working from a prepared form, the reviewers go through the test items, verify the individual aspects that a new item must meet, and suggest any necessary modifications. An experienced item author must then go through the individual reviews, assess the relevance of the comments, and edit the items where necessary. The person supervising the reviews can give reviewers credit for good-quality reviews; this provides the reviewers with feedback and at the same time generates information about their performance and usefulness.

Assembling a Test

The author of the test selects items from the pool of created items so as to fulfill the intent of the blueprint while also complying with other (often unstated) requirements: keeping the test's difficulty and time demands reasonable, ensuring that the number of calculation items is not higher than in other versions, and so on.
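Assuming every item in the pool is tagged with its topic and format (the key names below are our assumption, not a prescribed scheme), the basic selection step can be sketched as follows, using a blueprint with the structure shown earlier:

    import random

    def assemble_test(pool, blueprint, seed=0):
        # pool: list of dicts with (assumed) keys "topic" and "format";
        # blueprint: maps topic -> {format: required count}
        rng = random.Random(seed)  # fixed seed keeps a version reproducible
        chosen = []
        for topic, formats in blueprint.items():
            for fmt, count in formats.items():
                candidates = [item for item in pool
                              if item["topic"] == topic
                              and item["format"] == fmt]
                if len(candidates) < count:
                    raise ValueError(f"pool too small for {topic}/{fmt}")
                chosen.extend(rng.sample(candidates, count))
        return chosen

A real assembly must also balance the unstated requirements mentioned above (difficulty, time demands, counts of calculation items), which this sketch ignores.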

Piloting and Reviewing a Test

If the test is to be of high quality, a pilot run and a test review must also be part of its preparation. A pilot run is advisable to check the behavior of the individual items and of the test as a whole. Analysis of the pilot results can reveal the (in)ability of items to differentiate students by mastery of the material, clarify the items' objective difficulty, and so on. Because pilot testing is demanding in time and organization, often only the first live run of the test serves as the pilot. In addition to piloting, the quality of the test should be checked by a group of experts who identify and remove the last remaining errors and any ambiguous or problematic wording. Even when, after all these checks, it seems that no problems can remain in the test, some are always discovered!
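What the pilot analysis computes can be illustrated with two classical indices: item difficulty (the proportion of correct answers) and item discrimination (here, the correlation between an item's score and the rest-of-test score). A minimal sketch in Python; the 0/1 students-by-items score matrix layout is our assumption:

    import statistics

    def pearson(x, y):
        mx, my = statistics.fmean(x), statistics.fmean(y)
        sx, sy = statistics.pstdev(x), statistics.pstdev(y)
        if sx == 0 or sy == 0:      # constant column: correlation undefined
            return 0.0
        cov = statistics.fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
        return cov / (sx * sy)

    def item_analysis(scores):
        # scores: one row per student, one 0/1 entry per item
        results = []
        for i in range(len(scores[0])):
            item = [row[i] for row in scores]
            rest = [sum(row) - row[i] for row in scores]  # rest-of-test score
            results.append((statistics.fmean(item),   # difficulty (p-value)
                            pearson(item, rest)))     # discrimination
        return results

Items with near-zero or negative discrimination are prime candidates for review or removal.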

Setting Boundaries

An important step is setting the borderline for passing the test. If the test is tied to criteria that the participant must meet in order to pass, this step is called absolute standardization, and its time comes precisely at this point in the test cycle. Setting the pass criteria in advance assures test takers that the boundary will not be placed deliberately so that a particular participant still passes; for teachers, an objective cutoff is a means of ensuring that only sufficiently competent students pass the test. There are several methods for finding the “boundary”; recall, for example, the gold standards: the Angoff and Ebel methods.
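For illustration, the core of the Angoff method is simple to compute: each expert estimates, for every item, the probability that a borderline (minimally competent) student answers it correctly; each expert's estimates are summed, and the pass mark is the average over the panel. A minimal sketch with fabricated judgments:

    import statistics

    # angoff[e][i] = expert e's estimate of the probability that a
    # borderline student answers item i correctly (values fabricated)
    angoff = [
        [0.6, 0.8, 0.4, 0.7, 0.5],
        [0.5, 0.9, 0.5, 0.6, 0.6],
        [0.7, 0.7, 0.3, 0.8, 0.5],
    ]

    per_expert = [sum(row) for row in angoff]  # each expert's implied cutoff
    pass_mark = statistics.fmean(per_expert)   # panel consensus
    print(f"Pass mark: {pass_mark:.2f} of {len(angoff[0])} points")  # 3.03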

Test Implementation

As we have already mentioned, a written test can be administered on paper or electronically. In both cases, it is necessary to create the “test versions”, distribute the tests to students, and collect their answers. For tests whose results will have a significant impact, we must additionally ensure a level playing field for the participants, for example through supervision during the test.

Processing of Responses

This step of the test cycle mainly concerns paper testing, where the answers on the collected answer sheets must be optically scanned.

Preliminary Analysis of the Test

For high-stakes tests, it is desirable to pre-analyze the test before evaluating the results. Suspicious item behavior can reveal gross errors such as a miskeyed item or faulty wording. For such items, either the key is corrected before the test is scored, or the item is removed from the tally (all participants receive a point for it). This prevents a situation where, after the results are announced, someone complains about an incorrect question and it becomes necessary to recalculate, say, the ranking of admitted students.
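Both remedies amount to rescoring the test before evaluation. A minimal sketch, assuming each student's answers are stored as a list of choices:

    def score(responses, key, full_credit_items=()):
        # responses: one list of answers per student; key: correct answers;
        # full_credit_items: indices of flawed items scored 1 for everyone
        return [sum(1 if i in full_credit_items or ans == key[i] else 0
                    for i, ans in enumerate(answers))
                for answers in responses]

    key = ["B", "C", "A", "D"]
    key[2] = "C"   # correct a miskeyed item, then simply rescore
    # score(responses, key, full_credit_items={3})  # or drop item 4 entirely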

Grading Students

Grading is the most important output of the test. During grading, the total scores achieved by individual students can be compared to determine their relative standing. Using expert estimation (e.g. the Ebel or Angoff method), we determine the threshold for the “pass”/“fail” decision (so-called absolute standardization), and by dividing the passing interval into the necessary number of parts, we can express students' results on a grading scale: grades. Anonymizing the tests before the full evaluation (grading) helps ensure equal conditions for the participants.
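As an illustration of the final step, the interval from the pass mark up to the maximum score can be divided into equal bands, one per grade. A minimal sketch, assuming a four-grade scale above the failing grade:

    def grade(points, cutoff, max_score, scale=("A", "B", "C", "D")):
        # below the cutoff the student fails; the interval from cutoff
        # to max_score is split into equal bands, best grade on top
        if points < cutoff:
            return "F"
        band = (max_score - cutoff) / len(scale)
        return scale[min(int((max_score - points) / band), len(scale) - 1)]

    print(grade(58, cutoff=30, max_score=60))  # A
    print(grade(31, cutoff=30, max_score=60))  # D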

Setting Relative Boundaries

If the test is aimed at comparing students' performance with one another, the pass threshold cannot be set before this moment, when the results are known and any problems revealed by the preliminary analysis have been resolved. Test takers are ranked by their test scores, and the dividing line is set either by some method of relative standardization or, when the test served to select suitable candidates for a limited number of positions, simply by the quota.
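When a quota drives the decision, the cutoff is simply the score of the last admitted candidate. A minimal sketch (admitting ties at the boundary is one possible policy, not the only one):

    def quota_cutoff(scores, n_places):
        # rank by score; the cutoff is the n-th best result
        ranked = sorted(scores, reverse=True)
        cutoff = ranked[n_places - 1]
        admitted = [s for s in scores if s >= cutoff]
        return cutoff, admitted

    cutoff, admitted = quota_cutoff([42, 55, 38, 55, 47, 33, 51], n_places=3)
    print(cutoff, len(admitted))  # 51 3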

Grading of Students

In this step, the students' results are converted into grades and released to the students.

Analysis of Test Results

After the test round, data is available for examining in more detail how the test actually performed. While the preliminary analysis aimed only at identifying and eliminating possible problems, in the full test analysis we examine the characteristics of the items and of the test, which lets us give authors and reviewers feedback on how well they “hit the mark” in their work. The test is a measuring instrument and, like any instrument, it has characteristics that are important to know. Ideally, the quality of the test can be estimated even before live deployment, i.e. already during pilot testing; its properties then need to be verified on the target group during live use. When a test is used repeatedly, it is useful to compare results across individual runs.
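One basic characteristic examined at this stage is the internal consistency of the total score, commonly estimated by Cronbach's alpha. A minimal sketch in Python, assuming the same 0/1 students-by-items score matrix as in the pilot analysis above:

    import statistics

    def cronbach_alpha(scores):
        # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
        k = len(scores[0])                    # number of items
        item_vars = [statistics.variance([row[i] for row in scores])
                     for i in range(k)]
        total_var = statistics.variance([sum(row) for row in scores])
        return k / (k - 1) * (1 - sum(item_vars) / total_var)

The closer alpha is to 1, the more consistently the items measure a common ability; a low value suggests the total score mixes unrelated things or contains noisy items.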

Evaluation and Reporting

The results of the test analysis are reported as feedback both to the authors and reviewers and to the relevant responsible persons in the institution's hierarchy. Since reporting is a standard part of high-impact testing, many test-analysis programs (Iteman, Xcalibre, Remark Office, ...) include tools for preparing the report or parts of it.


Reference

  1. SCHUWIRTH, Lambert W. T. and Cees P. M. VAN DER VLEUTEN. General overview of the theories used in assessment: AMEE Guide No. 57. Medical Teacher [online]. 2011, 33(10), 783-797 [cited 2021-10-31]. ISSN 0142-159X. Available at: doi:10.3109/0142159X.2011.611022