Validity
Validity (also correctness, truthfulness, faithfulness, or soundness) describes the extent to which a test measures what we want it to measure. The validity of a test refers to the extent to which conclusions based on its results are meaningful and useful. In other words, a valid test is designed correctly and its results are not unduly affected by systematic errors.
By definition, the validity of a test is the extent to which both theory and gathered evidence support the proposed interpretation of test scores when the test is used as recommended[1]. It is clear from the definition that validity (as opposed to reliability) is a construct that cannot be measured directly. It can only be inferred from the context of other observations.
In practice, we must ask whether our test is really measuring what it is supposed to measure. The resulting validity is affected by a whole chain of assumptions that must be kept in mind. For example, if we use a test on standard high school subjects to select applicants to study medicine, then we should consider:
- Whether the test measures the knowledge and skills the student could have acquired in high school.
- Whether the ability to master subjects taught in high school predicts the ability to graduate from college.
- Whether graduating from college predicts a graduate's ability to be a good doctor.
- Whether the test result is affected by any secondary factors (e.g. availability of preparatory materials).
Making a precise statement of validity clearly runs into some fundamental problems. For example, it is difficult to define what makes a good doctor. Abroad, this is sometimes circumvented by examining the degree of academic and professional success of graduates, but this is a simplification: even a completely unambitious graduate who leaves to work as a district doctor in a remote region can be a good doctor. When estimating the validity of entrance exams, we are therefore often satisfied with the success rate expressed as the ability to graduate within the standard length of study. The compromises continue, however: in practice it is often impossible to wait out an entire period of regular study to verify validity, so we settle for academic success after, for example, the first year of study. This adds another link to our chain of assumptions, namely that successful completion of the first years of study predicts success over the entire course of study to an acceptable extent. In reality, such an assumption may hold only to a limited extent, because, for example, the first years of study at medical schools are devoted to theoretical disciplines and the later years to clinical training.
Test Validation
Since the validity of a test cannot be measured directly, in practice we focus on validating the test by collecting evidence that the test is sound. Test validation involves the collection of empirical data and logical arguments that demonstrate that the conclusions are indeed appropriate. The evidence we seek to demonstrate validity can be of a varied nature. Individual types of evidence are not interchangeable, but rather they intertwine and complement each other.
Content Validation
Content validation deals with the relationship between the content of the test and the target competencies that the test taker is meant to have achieved. During test preparation (especially when planning and reviewing the test), several experienced educators address the question of whether and to what extent the questions included in the test cover the knowledge and skills being tested, and conversely, whether all the questions fall within the area being tested and do not test something else. It is also examined whether the representation of items devoted to individual topics is proportionally balanced. Assessment of content validity is, in a way, a check of whether the test plan (i.e. its blueprint) has been followed.
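As an illustration only (not part of any cited methodology), the blueprint check can be as simple as comparing planned and actual item counts per topic. The topic labels and counts in this Python sketch are hypothetical.

    from collections import Counter

    # Hypothetical blueprint: planned number of items per topic
    blueprint = {"algebra": 10, "geometry": 6, "statistics": 4}

    # Topics of the items actually written for the test (hypothetical)
    items = ["algebra"] * 11 + ["geometry"] * 6 + ["statistics"] * 3

    actual = Counter(items)
    for topic, planned in blueprint.items():
        flag = "" if actual[topic] == planned else "  <-- deviates from blueprint"
        print(f"{topic:<12} planned {planned:>2}, written {actual[topic]:>2}{flag}")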
It always depends on the purpose of the test. For example, if the purpose of the test is to evaluate the educational program, it may also include topics that have not yet been covered, and the test then actually determines how students are able to cope with new topics. On the other hand, if the test is intended to assess whether the test taker can advance to the next year, the content of the test must be strictly based on the content of the material already taught[1].
During content validation, it is also necessary to check that the interpretation of the achieved test scores does not favor any of the tested subgroups.
Criterion Validation
The content validation mentioned above is used to verify whether the test being prepared corresponds to the objectives of the tested field. However, it does not demonstrate how such a test corresponds to the objective criteria (e.g. academic success) with which we would like to compare our test. This is where criterion validation comes in, which examines the relationship between the test result and an objective independent criterion or criteria (grades, progression to further study, successful completion of school, ...).
In general, we distinguish two types of studies that examine the connection between the test and the criterion: concurrent and predictive studies.
When investigating concurrent validity, we administer the test being validated and the criterion measure at the same time and examine whether they are really alternative ways of measuring the same construct[1]. In principle, another already validated test can serve as the concurrent criterion. We then find out to what extent the results of the new test agree with this verified test. The degree of agreement can be expressed, for example, using the correlation coefficient.
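For example, the agreement between a new test and an already validated test can be estimated with the Pearson correlation coefficient. The following sketch uses made-up paired scores; only the general approach, not the data, comes from the text above.

    import numpy as np

    # Hypothetical paired scores of the same students on both tests
    new_test = np.array([12, 18, 15, 22, 9, 17, 20, 14])
    validated_test = np.array([30, 41, 36, 48, 25, 40, 44, 33])

    # Pearson correlation as the degree of agreement between the two measures
    r = np.corrcoef(new_test, validated_test)[0, 1]
    print(f"concurrent validity estimate: r = {r:.2f}")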
Predictive validity describes the extent to which our test predicts future values of some criterion. It is a key parameter of all admissions tests: the purpose of entrance exams is to select the students with the best dispositions for future studies, so it is appropriate to determine whether the tests used really predict success in those studies. In practice, this means that the correlation of entrance exam results with study success is computed, or that a regression model predicting study success is estimated from the data.
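A minimal sketch of the latter, with entirely hypothetical scores: a simple linear regression predicts a numeric measure of study success from the entrance-exam score, and the correlation between the two serves as the predictive-validity coefficient.

    import numpy as np

    # Hypothetical entrance-exam scores and a later measure of study success
    entrance_score = np.array([55, 62, 70, 48, 80, 66, 74, 59])
    study_success = np.array([2.1, 2.4, 2.9, 1.8, 3.4, 2.6, 3.1, 2.2])

    # Fit a simple linear regression: success ~ intercept + slope * score
    slope, intercept = np.polyfit(entrance_score, study_success, deg=1)
    print(f"predicted success for an applicant scoring 65: {intercept + slope * 65:.2f}")

    # Correlation between exam score and success = predictive validity coefficient
    r = np.corrcoef(entrance_score, study_success)[0, 1]
    print(f"predictive validity estimate: r = {r:.2f}")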
Additionally, we may be interested in whether a given test brings new information beyond what we obtain in other ways, i.e. what its incremental validity is. In the case of the aforementioned entrance tests, we may ask, for example, whether they add new information about the applicant's future studies beyond that provided by his or her high school grades. For example, a study[2] based on data from students admitted to the First Faculty of Medicine of Charles University showed that secondary school performance explains roughly 15% of the variability in academic success. Adding the entrance exam result increases the percentage of explained variability to 22%, adding information on successfully completed profile subjects in high school raises it to 25%, and information on the year of graduation raises it further to 30%. All of these effects were statistically significant in the model, demonstrating their incremental validity[3].
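The logic of such an analysis can be sketched as a comparison of nested regression models, where the gain in explained variance (R^2) measures incremental validity. The data below are simulated for illustration and are not the data of the cited study.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    hs_grades = rng.normal(size=n)                               # high-school performance
    exam = 0.5 * hs_grades + rng.normal(size=n)                  # entrance-exam score
    success = 0.4 * hs_grades + 0.3 * exam + rng.normal(size=n)  # study success

    def r_squared(X, y):
        """R^2 of an ordinary least-squares fit with intercept."""
        X = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return 1 - resid.var() / y.var()

    r2_base = r_squared(hs_grades[:, None], success)
    r2_full = r_squared(np.column_stack([hs_grades, exam]), success)
    print(f"R^2 grades only: {r2_base:.2f}; grades + exam: {r2_full:.2f}")
    print(f"incremental validity of the exam (gain in R^2): {r2_full - r2_base:.2f}")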
Those interested in test validation can find more detailed information in a number of sources[4], [5], [6].
Construct Validation
The construct validity of a test refers to whether the test measures the intended psychological construct. It is one of the most important forms of validity evidence. A test attempts to assess skills of the student that cannot be measured directly in any way; they are latent. We therefore try to create an abstract conceptual construct (a model) that helps us understand and describe this latent ability.
As an example, let's imagine a math test. A latent ability may be the ability to solve a certain type of mathematical word problem. If the test is supposed to assess this latent ability, but the test questions are written in long, rambling sentences, it may be that we are actually measuring the ability to navigate a long and complicated text, which is a completely different construct. Performance is then influenced by a factor that has no connection with the measured construct; from the test's perspective, this is construct-irrelevant variance.
Demonstrating construct validity requires gathering multiple sources of evidence. Evidence is needed that the test measures what it is supposed to measure (in this case, knowledge of basic mathematics) and also evidence that the test does not measure what it is not supposed to measure (reading skills). This is referred to as convergent and discriminant validity evidence.
Convergent validity evidence consists of providing evidence that two tests that are supposed to measure closely related skills or types of knowledge are highly correlated. This means that two different tests end up scoring students similarly. Discriminant validity evidence, by the same logic, consists of providing evidence that two tests that do not measure closely related skills or types of knowledge are not highly correlated (i.e., will produce different student rankings).
Both convergent and discriminant validity provide important evidence for construct validity. As stated earlier, a basic mathematics test should primarily measure math-related constructs and not reading-related constructs. In order to determine the construct validity of a particular mathematics test, it would be necessary to show that the correlations of the results of that test with the results of other mathematics tests are higher than the correlations with reading tests.
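A minimal simulation of this comparison (all scores hypothetical): two math tests share a common latent ability, a reading test does not, and the correlation matrix shows the convergent correlation exceeding the discriminant one.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 300
    math_ability = rng.normal(size=n)     # latent math construct
    reading_ability = rng.normal(size=n)  # latent reading construct

    # Observed test scores = latent ability plus measurement noise
    math_test_a = math_ability + 0.4 * rng.normal(size=n)
    math_test_b = math_ability + 0.4 * rng.normal(size=n)
    reading_test = reading_ability + 0.4 * rng.normal(size=n)

    corr = np.corrcoef([math_test_a, math_test_b, reading_test])
    print(f"math A vs math B (convergent):    r = {corr[0, 1]:.2f}")
    print(f"math A vs reading (discriminant): r = {corr[0, 2]:.2f}")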
Generalization of Validity Evidence
For the practical use of the test-criterion relationship in new settings (e.g., the same course in the next academic year), evidence is needed that validity findings obtained in previous settings can be used to predict the degree of validity in a new but similar setting. This step, which stands in opposition to the situational specificity hypothesis, is called generalization of validity and is usually verified through meta-analyses: we assess whether the parameters of previous studies of criterion validity are reasonably comparable. The results generally support generalization of validity, suggesting that a new validation study is not needed in each new case unless the conditions and parameters of the study are significantly different[7].
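As a sketch of the meta-analytic step, in the bare-bones spirit of Schmidt and Hunter, with invented study results: pool the validity coefficients across settings weighted by sample size, and compare the observed variance with what sampling error alone would produce.

    import numpy as np

    # Hypothetical validity coefficients and sample sizes from four settings
    r = np.array([0.32, 0.41, 0.28, 0.36])
    n = np.array([120, 85, 200, 150])

    r_bar = np.sum(n * r) / np.sum(n)                   # sample-size-weighted mean validity
    var_obs = np.sum(n * (r - r_bar) ** 2) / np.sum(n)  # observed variance of coefficients
    var_err = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)    # variance expected from sampling error
    print(f"pooled validity: {r_bar:.2f}")
    print(f"variance beyond sampling error: {max(var_obs - var_err, 0):.4f}")

If little variance remains beyond sampling error, the validity evidence generalizes across settings.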
Summary of Validity Evidence
Overall validation integrates the individual pieces of evidence for the validity of the intended interpretation of test scores, including the technical quality, fairness, and score reliability of the test.
References
- ↑ Broken citation in the source: no text was given for the reference named "Standardy" (presumably AMERICAN EDUCATIONAL RESEARCH ASSOCIATION, AMERICAN PSYCHOLOGICAL ASSOCIATION and NATIONAL COUNCIL ON MEASUREMENT IN EDUCATION. Standards for Educational and Psychological Testing. Washington, DC : American Educational Research Association).
- ↑ ŠTUKA, Čestmír, Patrícia MARTINKOVÁ and Karel ZVÁRA, et al. The prediction and probability for successful completion in medical study based on tests and pre-admission grades. The New Educational Review [online]. 2012, vol. 28, no. 2, pp. 138-152. Also available from <http://www.educationalrev.us.edu.pl/vol/tner_2_2012.pdf>. ISSN 1732-6729.
- ↑ Broken citation in the source: no text was given for the reference named "Testovani2012".
- ↑ BYČKOVSKÝ, Petr and Karel ZVÁRA. Konstrukce a analýza testů pro přijímací řízení. 1st edition. Praha : Univerzita Karlova v Praze, Pedagogická fakulta, 2007. 79 pp. ISBN 978-80-7290-331-3.
- ↑ ZVÁRA, Karel. Regrese. 1st edition. Praha : MATFYZPRESS, vydavatelství Matematicko-fyzikální fakulty Univerzity Karlovy v Praze, 2008. 254 pp. ISBN 978-80-7378-041-8.
- ↑ BYČKOVSKÝ, Petr and Karel ZVÁRA. Konstrukce a analýza testů pro přijímací řízení. 1st edition. Praha : Univerzita Karlova v Praze, Pedagogická fakulta, 2007. 79 pp. ISBN 978-80-7290-331-3.
- ↑ SCHMIDT, Frank L. and John E. HUNTER. Development of a general solution to the problem of validity generalization. Journal of Applied Psychology. 1977, vol. 62, pp. 529-540.