Book (2022)/Test Piloting

Credible testing of learning outcomes, especially if it affects students’ further progress, presupposes that we know the properties of the test being used before it is actually used in a live setting. Pilot testing and pretesting are used to determine test properties.

A note on the terminology: The two terms partially overlap. In this book, the term pilot testing is mostly used as a broader designation covering both steps. Where the two steps need to be distinguished, pilot testing refers to a more general “proof of concept”: a kind of feasibility study, run on a small group of students, that reveals possible errors in the concept and design of the test and can also provide useful subjective feedback. The term pretest then refers to a more formal and detailed trial of the test. It makes it possible to estimate the psychometric properties of the questions, their difficulty, and the ability of the test to distinguish between stronger and weaker test takers, and to obtain both subjective and objective feedback from the tested group.

Pretesting evaluates the test with procedures comparable to those used to draw conclusions from “live” testing. For the pilot run itself, a smaller group of students (for example, 20^[1]) with the same level of knowledge and motivation as the target group is sufficient. The pretest, which is used to calculate the statistical parameters of the items, needs a larger group of at least 100 respondents.

Because assembling a relevant group is demanding and time-consuming, the first “live” run of the test is often used as the pretest. The findings from evaluating the preliminary tests need to be incorporated into the design of the final version of the test. It is usually necessary to modify at least some items. If the pretest shows significant deficiencies, however, the entire test concept may need to be reworked^[2].

Subjective Feedback

Subjective feedback provides very important information from a selected sample of the target group of respondents, typically from selected students. Their subjective opinions can help us identify ambiguities or errors in the questions. The opinion of each member of the selected group must be taken into account, and their comments and suggestions must be considered. The composition of the pilot group should be balanced, i.e. it should not consist, for example, only of students with above-average results or, on the contrary, only of clearly underperforming students. Several methods are available for collecting the feedback. With respect to the efficiency of further processing, the most widespread is a questionnaire in electronic form, whose answers can be easily processed and passed on to the working group in a clear format. Subjective feedback can be obtained in the following ways:

Questionnaire
Discussion group
Frontal (whole-class) discussion, which works for a small number of students but becomes ineffective with larger groups
Notes in the test or so-called thinking aloud (see ^[3]), in which students are asked to comment on or record their thought processes while solving the test

Objective Feedback

Objective feedback is important for its irrefutability, which rests on the mathematical processing of the pilot test results. Its conclusions indicate which unsatisfactory test items may need to be modified. The best-known and most widely used outputs of test evaluation include:

Evaluation of test item difficulty (identification of easy and difficult questions, unsatisfactory items, the possibility of arranging items according to difficulty)
Determination of the sensitivity of individual items (analysis and adjustment or exclusion of items with undesirable sensitivity)
Evaluation of test quality as a whole, primarily its reliability and validity
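
The three outputs listed above can be illustrated with a minimal sketch. The text does not prescribe particular formulas, so the choices here are assumptions: difficulty is taken as the proportion of correct answers, sensitivity as the upper-lower index (the difference in success rate between the top and bottom 27 % of test takers), and reliability as the Kuder-Richardson 20 coefficient for items scored 0/1. The function names and the small `responses` matrix are hypothetical and serve only as an illustration; a real pretest would use at least 100 respondents.

```python
from statistics import mean, pvariance

def item_difficulty(responses):
    """Proportion of test takers who answered each item correctly (0 to 1)."""
    n_items = len(responses[0])
    return [mean(row[i] for row in responses) for i in range(n_items)]

def upper_lower_index(responses, fraction=0.27):
    """Success rate in the top group minus success rate in the bottom group.

    The groups are the best and worst `fraction` of test takers by total score
    (0.27 is a common convention, assumed here).
    """
    ranked = sorted(responses, key=sum)            # weakest test takers first
    k = max(1, round(len(ranked) * fraction))      # size of each extreme group
    lower, upper = ranked[:k], ranked[-k:]
    n_items = len(responses[0])
    return [
        mean(row[i] for row in upper) - mean(row[i] for row in lower)
        for i in range(n_items)
    ]

def kr20(responses):
    """Kuder-Richardson 20 reliability estimate for dichotomous (0/1) items."""
    n_items = len(responses[0])
    p = item_difficulty(responses)
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        return float("nan")                        # all total scores identical
    item_var = sum(pi * (1 - pi) for pi in p)      # sum of p*q over items
    return n_items / (n_items - 1) * (1 - item_var / total_var)

# Hypothetical data: rows are students, columns are items, 1 = correct answer.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

difficulty = item_difficulty(responses)
sensitivity = upper_lower_index(responses)
reliability = kr20(responses)
print(difficulty, sensitivity, round(reliability, 3))
```

In this toy data set, the first item is answered correctly by five of the six students and the last by only one, so both would be candidates for review as too easy or too difficult. An item whose upper-lower index is close to zero or negative does not separate stronger from weaker test takers.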

When evaluating the results of the pilot group test, we must bear in mind the possible differences between the pilot group and the target group, caused, for example, by the different motivations of the two groups. It is a good idea to minimize these differences in advance, e.g. with a suitable “legend” accompanying the pilot test.

References

[1] ALDERSON, J., Caroline CLAPHAM and Dianne WALL. Language test construction and evaluation. New York, NY, USA: Cambridge University Press, 1995, 310 p. ISBN 0-521-47255-5.
[2] KOMENDA, Martin and Andrea POKORNÁ. Benefity a úskalí elektronického testování [online]. Brno: Masarykova univerzita, 2011. Also available from <https://www.mefanet.cz/res/file/publikace/benefity-uskali-elektronickeho-testovani.pdf>.
[3] TAVAKOL, Mohsen and Reg DENNICK. Post Examination Analysis of Objective Tests. 1st ed. AMEE, 2011. AMEE guide; vol. 54. ISBN 978-1-903934-91-3.
