Book (2022)/The Limitations of Testing

Although education at universities and higher education institutions in general has been around for centuries, there is still no clear consensus on its actual objective[1]. There are probably multiple such objectives, and it also depends on the focus of the university. In general, we can probably say that a university graduate should leave as a professional prepared for the performance of a certain profession, occupation, or role. The traditional idea is that this requires the acquisition of knowledge and numerous skills. However, this alone is not enough. A university graduate should be able to work more or less independently, i.e., perform certain activities. After all, the authorization to perform a certain profession is often linked with obtaining a university degree.

The performance of a specific activity often requires more than simply acquiring a range of factual knowledge. You need to understand a certain area, have certain skills, but also adopt a professional attitude. If we are to guarantee that a university graduate is capable of professionally performing work in the given field, we should verify that he or she is sufficiently competent not only in terms of knowledge, but also at the corresponding “higher” levels.

Dimensions of Knowledge, Skills, and Attitudes

When describing learning objectives, for example in a course annotation or its breakdown into a syllabus, the description is usually based on the substantive content of the subject – a list of topics that we want to teach. But this kind of division is not enough on its own. For high-quality learning, and then also for the assessment of its results, it is useful to add a second dimension to the list of thematic areas – to divide each topic into several levels according to the complexity of the educational objectives.

The most widely used model describing the complexity of the objectives of education, upbringing, and professional training is Bloom's taxonomy. It is in fact not one but three hierarchical models (also referred to as domains):

  • The cognitive (knowledge-based) domain,
  • The affective (emotion-based) domain, and
  • The psychomotor (action-based) domain.


In education, one works most often with the first domain, i.e., the domain of cognitive objectives, which relates to knowledge and the ability to work with and use it. However, many authors repeatedly point out that the other two domains are an equally important, albeit harder to grasp, part of education. Education in the affective domain leads to the development of professional attitudes, while the psychomotor domain includes the acquisition of practical skills.

Several versions of Bloom's taxonomy are currently in use. Beginning in the 1990s, it was extensively revised, and a two-dimensional map was created. Instead of three domains, this revised version works with four knowledge dimensions – factual, conceptual, procedural, and metacognitive knowledge. Each knowledge dimension is then crossed with six categories of the cognitive process dimension: remember, understand, apply, analyze, evaluate, and create.

Given that the term Bloom's taxonomy refers to several different concepts and versions, we have chosen to work in more detail with only one of them, traditionally referred to as Bloom's taxonomy of educational objectives. It more or less corresponds to the cognitive domain of Bloom's original taxonomy from the 1950s and to the dimension of factual knowledge from the revision mentioned above.

Bloom's Taxonomy of Educational Objectives

Within the scope of learning objectives, the taxonomy describes the levels of competences and skills relating to factual knowledge, its comprehension, and the understanding of context. It has six levels: the lowest is knowledge, followed by understanding, application, analysis, and evaluation, with creation as the highest. Traditional education tends to work precisely with this set of objectives, primarily with its lower levels.

Knowledge (also remembering)
The lowest level of learning objectives is knowledge, i.e. remembering and the ability to recall facts and the most fundamental concepts. The student learns basic terms and definitions. We mainly want the student to state something, repeat something, explain a concept, classify something in a certain classification scheme, etc., which the student can do even without a full understanding of the concepts he or she is working with – for example, a student can correctly classify a plant species in a family solely thanks to the fact that the student has learned which species belong to the respective family. At the same time, the students do not need to have any idea what the plant they are talking about looks like, what the characteristics of the given family are, and why the given plant actually belongs to that family. Knowledge at this level often has the character of isolated data from an encyclopedic dictionary.
Comprehension (also understanding)
Accomplishment of this educational objective can typically be demonstrated as the ability to explain or interpret certain material. A student who has achieved understanding is able to explain the main idea. With a grasp of the material, the student can also describe, compare, sort, or translate something into another language that he or she speaks. Although the student understands the given subject, he or she may not yet be able to put it to use – typically this is reflected in the fact that the student cannot combine it with other knowledge that he or she also has and understands. For example, the student comprehends the movement of the Earth around the Sun and its connection with the changing of the seasons, as well as being familiar with the shape of the Earth, but he or she cannot answer the question of how high the sun stands above the horizon in different geographical regions at the time of the summer solstice.
Application
A typical indication of having achieved this level is the ability to use acquired knowledge in new situations. The student can solve questions and problems that he or she has not encountered before. To do this, they must recognize connections and relationships and find a way to use them to solve problems. Often, the student must use rules or procedures that they already know, but in a different way than they have up to now. However, the student is only able to work with relatively uncomplicated assignments. This level is not sufficient for solving a complicated question, because, for example, the student does not yet distinguish between data essential for the solution, insignificant data, and data that is possibly missing and must still be acquired. A more complicated problem is therefore unsolvable at this level, because even though the student has the necessary component knowledge, he or she becomes lost in it.
Analysis
In Bloom's taxonomy, analysis means breaking down a complex whole or problem into smaller parts, which enables a better comprehension of it. Analytical skills are needed, for example, to distinguish cause and effect, or to find evidence that supports some generalizing statement. This level also includes the ability to recognize the structure of some information, break it down into individual components, assess the relationships between them and, thanks to this, estimate the credibility of the information source. The achievement of this educational objective can also be demonstrated, for example, with a thought experiment in which the student estimates how a certain intervention would change a certain course of events. To do this, they must analyze the events, recognize their components and the links between them, and determine what exactly would change with the intervention under consideration and what consequences it would lead to.
Evaluation
Evaluation is one of the highest objectives in Bloom's taxonomy. It refers to the ability to evaluate information and, based on this evaluation, to make informed decisions, take positions, or defend opinions. Evaluation requires the ability to first analyze information and is therefore linked to the preceding objective. Fundamentally, evaluation aims to assess the individual parts of the information and judge their significance and validity. The resulting verdict, meanwhile, should be key to solving a certain problem or creating something new. A typical task that demonstrates the achievement of this objective is, for example, the preparation of a professional review of a scientific article, including a recommendation for its publication, rejection, or modification. Clearly, completing such a task requires not only the achievement of all previous objectives, but also some creativity. It is not surprising, then, that the order of the two highest objectives sometimes differs in different versions of Bloom's taxonomy.
Creation
The ultimate objective that a student can achieve through learning is creation. It is an expressly creative, productive objective. Achieving this objective allows the student to propose a new original solution, design something, invent something, etc.

Bloom's taxonomy forces us to think about how people learn, which is also valuable when considering how to assess learning outcomes. Although this is undoubtedly the most commonly used “compartmentalization” of the educational process, it still has its drawbacks and critics. The previous description makes it clear that while the nature of the lower categories of Bloom's taxonomy is quite unambiguous and easily applicable in practice, the higher levels are prone to increasingly abstract definitions, are more ambiguous, and there are even doubts about how exactly to organize them. Bloom's taxonomy suffers from multiple limitations[2][3][4], for example:

  • Bloom's taxonomy assumes that a person learns in a linear, sequential fashion – that he or she progresses from the most basic objectives to the higher ones. In reality, this is not the case. A learner can jump between individual “levels” and repeatedly return to lower categories. If a person learns to evaluate or create something, then he or she reanalyzes the result of his or her work, adds to his or her knowledge and learns to understand it, etc.
  • A person can also learn from the top of the pyramid towards its base, by creating something. In this case, other processes come into play, ones that Bloom's taxonomy does not describe very well: research, trying out a solution, creating a prototype, revision, and critical assessment. Only these activities then force one to seek out sources and acquire new factual knowledge[5].
  • Bloom's taxonomy assumes that a person learns in isolation from others – it is individualistic. It overlooks the social and connectivist aspects of learning, which are very important in higher education[6].

Other Taxonomies

Bloom's taxonomy is relatively complex – recall that in the text above we only worked with one of the knowledge dimensions, so we completely omitted everything related to procedural skills or attitudes. This, together with its incompleteness, has led to the emergence of other, variously conceived models of learning and of the levels of competence achieved[7][8].

In the 1990s, specifically for the assessment of knowledge and skills, the so-called Miller's pyramid began to be used in medicine[9], and it then gradually spread to other fields[10][11]. The original four levels of assessed competencies were later supplemented by a fifth level[12].

  • Knowledge (the student is at the level of “knows”).
  • Competence (“knows how”). The examinee can integrate the knowledge from the previous level into the current context.
  • Performance (“shows how”). The skill is already comprehensive; the examinee has a command of the area and combines a wide range of knowledge and skills, often acquired in various subjects and areas of study.
  • Action (“does”). In practice, the examinee performs all the necessary actions correctly. This level should be reached, for example, by a candidate for the state final exam.
  • Identity (“is”, a true professional). A person who has reached this level can consistently demonstrate the attitudes, values, and behaviors expected of a representative of a certain professional group. It can be said that the given person thinks, behaves, and feels like a doctor, teacher, lawyer, designer, etc.

If we accept that a university graduate should be a professional prepared to perform a certain activity, we should verify during the course of the studies, or at their end, that they have actually acquired the relevant competencies. While knowledge and understanding can usually be assessed very well using both standardized tests and non-standardized methods (such as oral examinations), assessing the highest levels of competence is more complicated. We would certainly consider it absurd if, for example, an orchestra hired a violinist based on a written test or an oral exam on violin playing. It would be equally absurd to claim that someone is, for example, a well-prepared teacher, lawyer, historian, or doctor based on a simple written test or oral exam. Such an exam would verify, with greater or lesser credibility, that he or she has acquired the knowledge and understanding that are a necessary prerequisite for the performance of the profession. But it would not ascertain whether he or she can also apply that knowledge in an appropriate manner, has acquired the necessary skills, and can actually perform the activities that the specific work involves.

Knowledge and understanding can be accurately assessed using a well-prepared and standardized written test. Assessment of skills by written test, on the other hand, is only possible in specific cases. We can use a written test if the skill is, for example, the ability to solve a mathematical problem or to describe some reaction using chemical equations. However, most skills cannot be assessed by written test or oral exam – it would be difficult to test, for example, laboratory tasks, practical work with surveying instruments or blood sampling in this way.

To verify skills and activities, it is possible to use techniques generally referred to as practical examination or workplace-based assessment[13][14]. As a rule, these are methods that can be standardized only with great difficulty. For activities that are tested in practice, it is simply not possible to ensure identical (standardized) conditions for a larger number of candidates. Moreover, these methods often assess skills that cannot be fully scored in a standardized fashion, such as communication with a client or teamwork.

However, even these practical tests can be made objective, i.e., arranged in such a way as to suppress the influence of undesirable factors – for example, the subjectivity of the examiner or the variability of the conditions under which the test takes place. The next step is the validation of the practical tests, i.e., verification that the test result truly reflects the skills acquired, which are needed in real world practice.

Methods that make practical testing more objective tend to have several typical features:

  • Long-term or repeated performance is monitored rather than one-time performance within a single test session. If the practical exam takes place in a short period of time (e.g., during a single day), it is divided into several separate parts (often referred to as stations). Each of them is assessed by different assessors and each is focused on a different range of skills and activities.
  • The examinee is evaluated independently by a larger number of evaluators. The evaluators include experts in the given field, but often also other persons – for example, classmates, role players who conduct model communication with the examinee during the exam, and sometimes even technical staff.
  • The exam and its evaluation are structured, i.e., the evaluators score the monitored aspects of the performance according to predetermined criteria (so-called exam rubrics).
  • The exam is validated as a whole, or its individual parts are validated.


A major shift in practical examination formats was brought about by the introduction of so-called objective structured clinical examinations (OSCE) in medicine in the mid-1970s. Twenty years later, this approach began to be used in other fields as well, which is why it is sometimes referred to as objective structured practical examination, OSPE. During the OSCE, students go through a series of stations in which they are confronted with common situations arising in everyday practice and are to perform a certain procedure. They are evaluated using a structured questionnaire, which is filled in by the attending evaluators. More weight is given to steps and procedures of a more general nature (in medicine, for example, the prevention of the spread of infection, communication with the patient during the procedure, patient instruction and explanation of the procedure, etc.), while less weight is given to actions that are narrowly specific.
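The weighting of checklist items described above can be sketched as a simple calculation. The following Python fragment is only an illustrative sketch – the station items, weights, and awarded scores are invented, not taken from any real OSCE form:

```python
# Illustrative OSCE station scoring: each checklist item has a weight, and
# more general steps (communication, infection control) are weighted more
# heavily than narrowly specific technical ones.

# (description, weight, awarded score in the range 0..1) - all invented
station_rubric = [
    ("Introduces self and explains the procedure to the patient", 3.0, 1.0),
    ("Hand hygiene / prevention of the spread of infection",      3.0, 1.0),
    ("Selects the correct instrument for the task",               1.0, 0.5),
    ("Performs the narrowly specific technical step",             1.0, 1.0),
]

def station_score(rubric):
    """Weighted score for one station, expressed as a percentage."""
    total_weight = sum(weight for _, weight, _ in rubric)
    achieved = sum(weight * score for _, weight, score in rubric)
    return 100.0 * achieved / total_weight

print(round(station_score(station_rubric), 2))  # -> 93.75
```

An overall OSCE result would then combine such scores from several independently assessed stations, in line with the features of objective practical testing listed earlier.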

Another way to assess the achievement of higher educational objectives is to compile portfolios. A portfolio is a systematically created set of samples of a student's work that demonstrates their efforts, progress, and achievement of educational objectives throughout a course or curriculum[15][16][17][18]. It is a distinctly constructivist assessment tool. Its advantage is that it faithfully reflects the achievement of the highest, creative educational objectives, as well as the adoption of professional habits, attitudes, value rankings, etc. On the other hand, a portfolio is a highly individualized evaluation tool that does not allow for standardization, and it is also difficult to make more objective. Compared to other tools, portfolios are quite time-consuming for both the student and the teacher. Implementing portfolio assessment requires careful preparation and seamless integration with the curriculum[19].

Quantitative and Qualitative Forms of Evaluation

In the preceding text, we mainly approached the evaluation of education results as a measure of the extent of the student's achievement of the expected knowledge and skills. Taken this way, the result of the assessment is a certain quantity, grade, or numerical value. The charm of this concept lies in its comprehensibility and in the fact that the validity of the conclusions we draw can easily be examined by scientific methods. We can talk about the accuracy and reliability of such conclusions, quantify the degree of accuracy by statistical methods, and even measure the uncertainty with which we communicate the result. We can also track the impact of every change we make in teaching and testing. Altogether, this allows us to standardize exams – to ensure that assessment results are substantiated, reproducible, objective, and valid.
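As one concrete example of such a statistical method, the internal consistency (reliability) of a test is often summarized by Cronbach's alpha. The sketch below is illustrative only – the score matrix is invented, and the hand-rolled variance helper merely keeps the example self-contained:

```python
# Cronbach's alpha, a standard reliability (internal-consistency) estimate:
#   alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)

def pvariance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def cronbach_alpha(scores):
    """scores: rows = examinees, columns = items (points per item)."""
    k = len(scores[0])                        # number of items
    items = list(zip(*scores))                # per-item score columns
    totals = [sum(row) for row in scores]     # total score per examinee
    item_variance_sum = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))

# Invented data: 5 examinees x 4 dichotomously scored items
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 3))  # -> 0.8
```

In practice one would use an established statistical library rather than this toy implementation, but the formula itself is the standard one.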

However, even the hierarchization of learning objectives according to the most common concept of Bloom's taxonomy already shows that standardized testing and examinations cannot capture the entire breadth of higher education. We have shown that it can only capture lower levels of cognition. Higher educational objectives can be evaluated with non-standardized, but still objective methods, and the result can still be of some value. However, for the evaluation of the most complex objectives, skills and attitudes, a simple one-dimensional expression is not enough.

Comprehensive competence, attitudes, behavior, or the achievement of professionalism in a certain area cannot be measured by a number. Although they are not measurable (or at least cannot be expressed by a one-dimensional quantity), they are describable. They can be described by verbal evaluation. This, however, always has a subjective component, and is typically a non-standardized assessment.

Approaches to evaluating educational outcomes are constantly evolving. Over the past few decades, there has been intensive development in the area of standardized assessment, and this method has become an integral part of education in developed countries. Thanks to the verifiability and defensibility of its results, it has gradually displaced non-standardized methods entirely in many areas. In recent years, however, the limits of such an approach have increasingly been pointed out[20]. Standardized methods remain the best-known tool in certain stages of learning, but some authors caution against them being the only tool used[21][22]. Non-standardized methods are neither inferior nor superior to standardized ones. They are merely different tools, each of which is suitable for something different.


References

  1. The aims of higher education. Pretoria: Council on Higher Education, 2013. ISBN 978-1-919856-84-1.
  2. SOOZANDEHFAR, Seyyed Mohammad Ali and Mohammad Reza ADELI. A Critical Appraisal of Bloom's Taxonomy. American Research Journal of English and Literature [online]. 2016, 2016(2), 1-9 [cited 2021-11-16]. ISSN 2378-9026. Available at: https://www.arjonline.org/papers/arjel/v2-i1/14.pdf
  3. CASE, Roland. The unfortunate consequences of Bloom's taxonomy. Social Education [online]. National Council for the Social Studies, 2013, 77(4), 196-200 [cited 2021-10-11]. ISSN 0037-7724.
  4. TUTKUN, Omer, Dilek GÜZEL, Murat KOROĞLU and Hilal İLHAN. Bloom's Revized Taxonomy and Critics on It. The Online Journal of Counselling and Education [online]. 2012, pp. 23-30 [cited 2021-10-11]. ISSN 2146-8192. Available at: https://www.researchgate.net/publication/299850265_Bloom's_Revized_Taxonomy_and_Critics_on_It
  5. BERGER, Ron. Here's What's Wrong With Bloom's Taxonomy: A Deeper Learning Perspective. Education Week [online]. 14 March 2018 [cited 2021-11-16]. Available at: https://www.edweek.org/education/opinion-heres-whats-wrong-with-blooms-taxonomy-a-deeper-learning-perspective/2018/03
  6. Criticisms of Bloom's Taxonomy. Teacher Commons: A place for teachers to share [online]. 24 April 2008 [cited 2021-11-16]. Available at: http://teachercommons.blogspot.com/2008/04/bloom-taxonomy-criticisms.html
  7. O'NEILL, Geraldine and Feargal MURPHY. Assessment: Guide to Taxonomies of Learning [online]. Dublin: UCD Teaching and Learning, 2010 [cited 2021-10-11]. Available at: https://www.ucd.ie/t4cms/ucdtla0034.pdf
  8. MALAMED, Connie. Alternatives to Bloom's Taxonomy for Workplace Learning. The eLearning Coach [online]. 2020 [cited 2021-11-16]. Available at: https://theelearningcoach.com/elearning_design/alternatives-to-blooms-taxonomy/
  9. MILLER, G. E. The assessment of clinical skills/competence/performance. Academic Medicine. 1990, vol. 65, no. 9 Suppl, pp. 63-67. ISSN 1040-2446. PMID: 2400509.
  10. PEÑALVER, Elena Alcalde. Financial Translation. In: CUI, Ying and Wei ZHAO, eds. Handbook of Research on Teaching Methods in Language Translation and Interpretation [online]. IGI Global, 2015, pp. 102-117 [cited 2021-11-16]. ISBN 9781466666153. Available at: doi:10.4018/978-1-4666-6615-3.ch007
  11. KREVIČ, Nataša. Katalyzátor změn vyučování?: Inovace v hodnocení žáků [A catalyst for change in teaching?: Innovation in student assessment]. Pro vzdělávání [online]. Praha: Národní ústav pro vzdělávání, 2019 [cited 2021-11-16]. Available at: http://provzdelavani.nuv.cz/clanky/ze-zahranici/katalyzator-zmen-vyucovani-inovace-v-hodnoceni-za
  12. CRUESS, Richard L., Sylvia R. CRUESS and Yvonne STEINERT. Amending Miller's Pyramid to Include Professional Identity Formation. Academic Medicine [online]. 2016, 91(2), 180-185 [cited 2021-11-16]. ISSN 1040-2446. Available at: doi:10.1097/ACM.0000000000000913
  13. Definitions: Workplace-based assessment (WPBA). The Royal College of Pathologists [online]. London, 2019 [cited 2021-11-16]. Available at: https://www.rcpath.org/trainees/assessment/workplace-based-assessment-wpba.html
  14. PRAKASH, Jyoti, K. CHATTERJEE, K. SRIVASTAVA, V. S. CHAUHAN and R. SHARMA. Workplace based assessment: A review of available tools and their relevance. Industrial Psychiatry Journal [online]. 2020, 29(2) [cited 2021-11-16]. ISSN 0972-6748. Available at: doi:10.4103/ipj.ipj_225_20
  15. KLENOWSKI, Val, Sue ASKEW and Eileen CARNELL. Portfolios for learning, assessment and professional development in higher education. Assessment & Evaluation in Higher Education [online]. 2006, 31(3), 267-286 [cited 2021-11-16]. ISSN 0260-2938. Available at: doi:10.1080/02602930500352816
  16. SEIFERT, Kelvin. Advantages and disadvantages. In: Educational Psychology [online]. OpenStax CNX, 2011, pp. 318-320 [cited 2021-10-11]. Available at: https://www.opentextbooks.org.hk/ditatopic/6468
  17. HERMAN, Joan L. and Stephen A. ZUNIGA. Assessment: Portfolio Assessment. Education Encyclopedia [online]. [cited 2021-11-16]. Available at: https://education.stateuniversity.com/pages/1769/Assessment-PORTFOLIO-ASSESSMENT.html
  18. STITT-BERGH, Monica and Yao HILL. What is a portfolio?: Using Portfolios in Program Assessment. Assessment and Curriculum Support Center [online]. Mānoa, Hawaii: University of Hawaiʻi at Mānoa [cited 2021-11-16]. Available at: https://manoa.hawaii.edu/assessment/resources/using-portfolios-in-program-assessment/
  19. DRIESSEN, Erik, Jan VAN TARTWIJK, Cees VAN DER VLEUTEN and Val WASS. Portfolios in medical education: why do they meet with mixed success? A systematic review. Medical Education [online]. 2007, 41(12), 1224-1233 [cited 2021-11-16]. ISSN 0308-0110. Available at: doi:10.1111/j.1365-2923.2007.02944.x
  20. VAN DER VLEUTEN, Cees. OSCEs by Cees van der Vleuten. Maastricht University [online]. 2019 [cited 2021-11-13]. Available at: https://www.maastrichtuniversity.nl/news-events/newsletters/article/+5u+DZKHLUQtFBjwefD8Tg
  21. The Limits of Standardized Tests for Diagnosing and Assisting Student Learning. FairTest [online]. Jamaica Plain: National Center for Fair & Open Testing, 2007 [cited 2021-11-16]. Available at: https://fairtest.org/limits-standardized-tests-diagnosing-and-assisting
  22. RIFFERT, Franz. The Use and Misuse of Standardized Testing: A Whiteheadian Point of View. Interchange [online]. 2005, 36(1-2), 231-252 [cited 2021-10-12]. ISSN 0826-4805. Available at: doi:10.1007/s10780-005-2360-0