Multiple Choice Items

Multiple-choice items predominate in written tests today. Their main advantage is that they are easy to score. As we shall see below, they can take a number of forms. What they all have in common is that the examinee chooses one or more answers from the options offered. How the options are presented (as radio buttons, checkboxes, or a drop-down menu) is not decisive.

When it comes to properties and use in tests, it is important to divide multiple choice items—regardless of their formal appearance—into two groups:

Dichotomous items (TRUE/FALSE items): One offered option (or more of them) is completely correct, the others are completely wrong.

Example:

Indicate whether the statement is true:

A fin whale is a mammal that lives in the sea TRUE – FALSE

Scoring dichotomous items is simple. Most often, one point is awarded for a correct answer, nothing for an incorrect one. Less common are scoring schemes in which other numbers of points are assigned, e.g. the point gain is weighted according to the difficulty of the item, or points are deducted for an incorrect answer.

The disadvantage of individual dichotomous items is that simple guessing yields, on average, 50% of the maximum possible score. At first glance, this may not be a problem as long as the threshold that the student must reach to pass the test is set appropriately. Guessing nevertheless reduces the discriminative power of the test. Therefore, some authors recommend various modifications of dichotomous items, for example requiring that, along with each "FALSE" answer, the student state how the statement would need to be changed to make it "TRUE"^[1]. This effectively creates a combination of a multiple-choice question and an open-ended question.
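The effect of guessing on a pass threshold can be illustrated with a short calculation. The following is a minimal sketch, assuming that every item is answered by independent random guessing with a 50% chance of success; the function name `prob_pass_by_guessing` and the test length of 20 items are illustrative assumptions, not part of any cited method.

```python
from math import comb


def prob_pass_by_guessing(n_items: int, pass_mark: int, p_correct: float = 0.5) -> float:
    """Probability of answering at least `pass_mark` of `n_items` correctly
    when every answer is an independent random guess (binomial model)."""
    return sum(
        comb(n_items, k) * p_correct**k * (1 - p_correct) ** (n_items - k)
        for k in range(pass_mark, n_items + 1)
    )


# Hypothetical test with 20 TRUE/FALSE items.
print(prob_pass_by_guessing(20, 12))  # pass mark 60%: about 0.25
print(prob_pass_by_guessing(20, 16))  # pass mark 80%: about 0.006
```

Under these assumptions, roughly one in four pure guessers would still pass a 20-item test with a 60% threshold, which is why the threshold for dichotomous items has to be set well above 50%.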

Bundles of dichotomous items

(Also referred to as: multiple true/false, MTF; multiple response question, MRQ.)

Sometimes, several dichotomous items are combined into a bundle with a common stem.

Example:

A dog is a popular pet. There are many breeds that vary in size, color and temperament. Which statement about dogs is true?

a) Some dog breeds have no hair at all. TRUE FALSE

b) Regardless of size and color, all dog breeds belong to a single biological species. TRUE FALSE

An important feature of bundles of dichotomous items (MTF) is that the examinee has to decide on each statement independently, without regard to the other statements in the bundle. In other words, we can break the above bundle down into two separate dichotomous items:

Item 1:

A dog is a popular pet. There are many breeds that vary in size, color and temperament.

Indicate whether the statement is true:

Some dog breeds have no hair at all. TRUE FALSE

Item 2:

A dog is a popular pet. There are many breeds that vary in size, color and temperament.

Indicate whether the following statement is true:

Regardless of size and color, all dog breeds belong to a single biological species. TRUE FALSE

The formal appearance of dichotomous items and their bundles can vary. Most often, a TRUE/FALSE answer is chosen for each statement. It is less appropriate to ask the test taker to mark the statements that are true and leave the false statements unmarked. In this case, the bundle of dichotomous items (MTF) resembles items with a single best answer (SBA), which, however, have different characteristics and do not require an answer for each of the offered options separately.

There are, however, other possibilities, in which mutually exclusive alternatives are indicated.

Example:

We have five test tubes available. Each of them contains 1 ml of a solution of one of the carbohydrates listed below with a concentration of 1 g/l.

We add 1 ml of potassium hydroxide solution (2 g/l) to each of the test tubes and boil the mixture briefly. Then we add a solution containing complex-bound divalent copper to all test tubes. The resulting color of the mixture is blue in some tubes and red in others.

For each carbohydrate, circle what color you expect the mixture to be after the described experiment is over:

a) amylose BLUE – RED

b) fructose BLUE – RED

c) glucose BLUE – RED

Bundles of dichotomous items are often scored differently from individual dichotomous items. Various scoring schemes are used:

All or nothing – 100% (most often 1 point) is given if all answers in the bundle are correct, 0 points in all other cases
Partial score – each component dichotomous question is scored independently, e.g. with 0.25 points
Partial weighted score – each component dichotomous question is scored independently, and each has a different point value (e.g. according to importance or difficulty)
Penalty scoring – negative points are given for some wrong answers
Partial scoring with a threshold – e.g. PS₅₀: if the test taker answers the entire bundle correctly, they receive 100%. If they answer more than half of the component dichotomous questions correctly, they receive 50%. In other cases, they get nothing.
Guessing correction – an estimate is made of the score the examinee could have achieved for the bundle by random guessing, and the result is corrected accordingly
Other, more complex methods.

The all-or-nothing and PS₅₀ methods are used most frequently, while the others are falling out of use.
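Three of the scoring schemes listed above can be sketched in code as follows. This is only an illustration under stated assumptions: a bundle is represented as a list of TRUE/FALSE values, the result is expressed as a share of the maximum score for the bundle, and all function names are hypothetical.

```python
def count_correct(key, answers):
    """Number of statements answered in agreement with the key."""
    if not key or len(key) != len(answers):
        raise ValueError("key and answers must be non-empty and of equal length")
    return sum(k == a for k, a in zip(key, answers))


def all_or_nothing(key, answers):
    """Full score only if every statement in the bundle is answered correctly."""
    return 1.0 if count_correct(key, answers) == len(key) else 0.0


def partial_score(key, answers):
    """Each statement contributes an equal share of the score."""
    return count_correct(key, answers) / len(key)


def ps50(key, answers):
    """PS50: 100% for a fully correct bundle, 50% if more than half
    of the statements are correct, otherwise nothing."""
    correct = count_correct(key, answers)
    if correct == len(key):
        return 1.0
    if correct > len(key) / 2:
        return 0.5
    return 0.0


# Hypothetical bundle of four statements; the examinee gets three right.
key = [True, True, False, True]
answers = [True, False, False, True]
print(all_or_nothing(key, answers))  # 0.0
print(partial_score(key, answers))   # 0.75
print(ps50(key, answers))            # 0.5
```

The same answer sheet thus yields 0%, 75%, or 50% depending on the scheme, which shows how strongly the choice of scoring method affects the result for a bundle.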

Items with a single best answer (SBA): The most frequently used, and at the same time the most effective, type of multiple-choice item is the item with a single best answer. In appearance, such items may resemble bundles of dichotomous items, but they are constructed differently. Again, the item has a stem followed by several options. The task is to choose the answer that is significantly better than all the others. The examinee therefore does not evaluate each option separately to decide whether it is valid, as in a bundle of dichotomous items, but compares the offered options with one another. It may be that none of the options is completely correct in all circumstances, and that none is completely wrong. It must, however, be possible to rank the options from best to worst.

Comparison of TRUE/FALSE and SBA type multiple-choice items
TRUE/FALSE item	Item with a single best answer
The shape of the Earth is close to ___________	The shape of the Earth is close to that of a rotating body. Which of the following is it most similar to?
Sphere TRUE – FALSE	Sphere
Ellipsoid TRUE – FALSE	Ellipsoid
Ovoid TRUE – FALSE	Ovoid
Cylinder TRUE – FALSE	Cylinder

Both items in the example ask the same question and offer the same options. In both cases, the author considers the second option (ellipsoid) to be the correct answer. Note, however, that the TRUE/FALSE item is not completely unambiguous: it can be argued that the Earth is not exactly an ellipsoid, and, on the other hand, that its shape can be approximated accurately enough by a sphere for some purposes.

In the case of an SBA-type item, the situation is different: the examinee has to choose the most accurate (not necessarily completely accurate) answer. The solution is clear.

In general, SBA-type items are more suitable for higher education than bundles of dichotomous items^[2]. The fact that an individual SBA option often cannot be declared completely correct or completely wrong reflects real life, so testing with SBA items better prepares students for real-world practice. In addition, it tends to be difficult to write a larger number of MTF items on a given topic that are truly unambiguous. The pursuit of clarity often leads to refining the wording, which becomes longer and more detailed but often also more instructive, resulting in an MTF item that is clear but also very easy. For a given topic, more high-quality SBA items than MTF items can therefore usually be created. The common concern that an item with a single correct answer will be easier and more predictable than a bundle of dichotomous items, which may have more than one correct answer, is not justified: in practice, properly constructed SBA-type items tend to be more difficult and usually discriminate better than MTF items.

The reader can find more detailed information on the creation of SBA type items in the chapter entitled Recommendations for the Creation of Test Items.

Note on multiple-choice questions (MCQ): We often come across the term multiple-choice question (MCQ). This is a more general term that includes multiple true/false (MTF), single best answer (SBA), and other types of items. In this publication, we deliberately avoid the term MCQ, as its meaning is not clear-cut. In everyday communication, the term is often narrowed down to the locally most common type of item, so it means different things in different geographical areas:

In the literature, MCQs are most often synonymous with single-answer items, i.e. SBA.
In some parts of the world, MCQs are most often used as a designation for bundles of dichotomous items, i.e. MTF.

Due to the fundamental differences in the construction and properties of SBA and MTF, the term MCQ can cause unpleasant misunderstandings.

Matching Questions

A matching question consists of a set of premises and a set of answers. The examinee's task is to assign the best answer to each premise. Matching items can have different ratios between the number of premises and answers, and various subtypes are sometimes distinguished accordingly. In the simplest case, the number of premises and answers is the same, and each answer belongs to exactly one premise. Another possibility is that there are more premises than answers, so that some answers are used more than once.

Example:

Put each animal into a group according to the type of food it eats
1. Domestic pig _____	A. Carnivores
2. Desert lion _____	B. Herbivores
3. Steppe zebra _____	C. Omnivores
4. Przewalski’s horse _____
5. Nile crocodile _____

Conversely, there may be more answers offered than premises. An extreme case is the so-called extended matching question (EMQ). In many ways, EMQs resemble several SBA items in a row, but the range of options is significantly larger (typically more than ten) and the same set of answers is used for multiple premises. EMQ-type items have become widespread in medicine, where they have mainly been used for testing clinical disciplines.

Example:

Choose the most likely diagnosis for each case report of back pain from the following menu:

A. Ankylosing spondylitis

B. Dissection of the aorta

C. Intervertebral disc herniation

D. Lumbar spondylosis

E. Vertebral fracture

F. Intervertebral disc infection

G. Pars interarticularis defect

H. Metastasis to the vertebral body

I. Renal colic

J. Herpes zoster

Item 1:

A 23-year-old man has a six-month history of lower back pain. The pain mainly affects the thoracolumbar junction and the right buttock. The pain is usually worst in the morning, and makes it difficult for him to get out of bed. There is a partial improvement during the day. During the examination, we find limited mobility of the lumbar spine, especially lateral flexion.

Item 2:

A 32-year-old woman comes in due to sudden pain in the lower back. The pain is constant, it does not depend on her position. All spinal movements are limited and painful. Three weeks ago, she had a urinary tract infection which was treated with amoxicillin.

Matching items can take a variety of graphical forms. For example, the test taker can write the letter or number of the chosen answer for each premise, or they can connect the premises and answers with a line. When testing on a computer, answers are often selected from a drop-down list, or answers are dragged to the premises using the mouse. In a broader view, matching items also include, for example, placing labels into an image.

Matching items are widely used, for example, in language learning. Their properties are very similar to those of single best answer questions; a matching item is essentially a bundle of SBA questions. In many fields, matching questions are gradually being abandoned and replaced by SBA-type questions. Using fewer item types in a test is usually an advantage, because the test taker does not have to think as much about what form of answer is expected and can better concentrate on answering the questions themselves. This also makes the test "friendlier" and reduces test anxiety.

Matching items are scored using procedures similar to those for bundles of dichotomous items (MTF), most commonly the all-or-nothing, partial score, or PS₅₀ methods.

Ordering Items

The examinee has the task of ordering the presented items (e.g. concepts, events) according to a certain rule. It can be, for example, the ordering of the steps of a certain procedure or the arrangement of some objects according to some quantity or property.

Example:

Rank the liquids from highest to lowest freezing point.

Water

Oil

Alcohol

Glycerine

From a formal point of view, ordering items can resemble matching questions, since the examinee assigns a position to each item. In some cases, ordering items can have more than one correct solution, e.g. the seasons follow one another in the order spring – summer – autumn – winter, but also autumn – winter – spring – summer, etc.
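For cyclic sequences such as the seasons, one way to accept every correct solution is to treat any rotation of the reference order as correct. The following is a minimal sketch under that assumption; the function name `is_rotation_of` is hypothetical.

```python
def is_rotation_of(reference, answer):
    """True if `answer` lists the same items as `reference` in the same
    cyclic order, regardless of which item comes first."""
    reference, answer = list(reference), list(answer)
    if len(reference) != len(answer):
        return False
    if not reference:
        return True
    # Compare the answer with every rotation of the reference order.
    return any(
        answer == reference[i:] + reference[:i]
        for i in range(len(reference))
    )


seasons = ["spring", "summer", "autumn", "winter"]
print(is_rotation_of(seasons, ["autumn", "winter", "spring", "summer"]))  # True
print(is_rotation_of(seasons, ["spring", "autumn", "summer", "winter"]))  # False
```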

The weakness of ordering items is that they are difficult to score. An all-or-nothing method is sometimes used, but it tends to have low sensitivity. Therefore, the answer is most often evaluated pair by pair: for each pair of neighboring items in the examinee's answer, it is checked whether the two items are in the correct order:

Example:

On the table lie four cubes of the same size, each cast from one metal – iron, aluminum, copper and gold. Sort the cubes from lightest to heaviest.

The correct order: aluminum – iron – copper – gold

Examinee's answer: aluminum – copper – iron – gold

aluminum – copper: correct order

copper – iron: wrong order

iron – gold: correct order

The examinee receives 2/3 of a point for the item.
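The pairwise evaluation in the example above can be sketched as follows. This is an illustrative implementation, assuming that only neighboring pairs in the examinee's answer are compared and that each pair carries the same weight; the function name `pairwise_order_score` is hypothetical.

```python
from fractions import Fraction


def pairwise_order_score(correct_order, answer):
    """Share of neighboring pairs in `answer` whose two items appear
    in the same relative order as in `correct_order`."""
    if sorted(correct_order) != sorted(answer) or len(answer) < 2:
        raise ValueError("answer must be a reordering of at least two items")
    # Position of each item in the correct sequence.
    rank = {item: position for position, item in enumerate(correct_order)}
    pairs = list(zip(answer, answer[1:]))
    correct_pairs = sum(rank[first] < rank[second] for first, second in pairs)
    return Fraction(correct_pairs, len(pairs))


correct = ["aluminum", "iron", "copper", "gold"]
answer = ["aluminum", "copper", "iron", "gold"]
print(pairwise_order_score(correct, answer))  # 2/3
```

Unlike all-or-nothing scoring, this scheme still rewards an answer in which only two neighboring items are swapped.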

References

[1] KUBISZYN, Tom and Gary BORICH. Educational Testing and Measurement. Wiley, 2000. 530 pp. ISBN 9780471364962.

[2] Simbak, Nordin; Aung, Myat; Ismail, Salwani; Mat Jusoh, Norhasiza; Ali, Tarik; Yassin, Wisam; Haque, Mainul; Mohd Amin Rebuan, Husbani (2014). Comparative Study of Different Formats of MCQs: Multiple True-False and Single Best Answer Test Formats, in a New Medical School of Malaysia. International Medical Journal, 21, 562–566.
