Quantifying Metacognition — Some Numeracy behind Self-Assessment Measures


Ed Nuhfer, Retired Professor of Geology and Director of Faculty Development and Director of Educational Assessment, enuhfer@earthlink.net, 208-241-5029

Early this year, Lauren Scharff directed us to what might be one of the most influential reports on quantification of metacognition: Kruger and Dunning’s 1999 “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments.” In the 16 years that have since elapsed, a popular belief sprang from that paper and became known as the “Dunning-Kruger effect.” Wikipedia describes the effect as a cognitive bias in which relatively unskilled individuals suffer from illusory superiority, mistakenly assessing their ability to be much higher than it really is. Wikipedia thus describes a true metacognitive handicap: a lack of ability to self-assess. I consider Kruger and Dunning (1999) seminal because it represents what may be the first attempt to establish a way to quantify metacognitive self-assessment. Yet, as time passes, we always learn ways to improve on any good idea.

At first, quantifying the ability to self-assess seems simple. It appears that comparing a direct measure of confidence to perform taken through one instrument with a direct measure of demonstrated competence taken through another instrument should do the job nicely. For people skillful in self-assessment, the scores on both self-assessment and performance measures should be about equal. Seriously large differences can indicate underconfidence on one hand or “illusory superiority” on the other.
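The naive comparison described above can be written in a few lines. This is only an illustrative sketch: the function names, the percentage scale, and the 10-point tolerance are assumptions made for the example and do not come from any published instrument.

```python
def self_assessment_gap(self_assessed_pct, measured_pct):
    """Signed gap in percentage points between confidence and performance.

    Positive values mean the person rated themselves above their measured
    score; negative values mean they rated themselves below it.
    """
    return self_assessed_pct - measured_pct


def describe_gap(gap, tolerance=10.0):
    """Label a gap. The tolerance is an arbitrary, hypothetical threshold."""
    if gap > tolerance:
        return "overconfident"
    if gap < -tolerance:
        return "underconfident"
    return "roughly accurate"


if __name__ == "__main__":
    # Hypothetical (self-assessed %, measured %) pairs for three people.
    for confidence, performance in [(85, 60), (55, 58), (40, 75)]:
        gap = self_assessment_gap(confidence, performance)
        print(confidence, performance, gap, describe_gap(gap))
```

The calculation itself is trivial; the difficulty lies in whether the two numbers being subtracted are trustworthy measures in the first place.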

The Signal and the Noise

In practice, measuring self-assessment accuracy is not nearly so simple. The instruments of social science yield data that combine two things: the signal, which expresses the relationship between our actual competency and our self-assessed feelings of competency, and significant noise generated by our human error and inconsistency.

By analogy, consider the signal as your favorite music on a radio station, the measuring instrument as your radio receiver, the noise as the static that intrudes on your favorite music, and the data as the actual mix of noise and signal that you hear. The radio signal may truly exist, but unless we construct suitable instruments to detect it, we will not be able to generate convincing evidence that it exists. Such failures can lead to the conclusion that metacognitive self-assessment is no better than random guessing.

Your personal metacognitive skill is analogous to an ability to tune to the clearest signal possible. In this case, you are “tuning in” to yourself, to your “internal radio station,” rather than tuning the instruments that can measure this signal externally. In developing self-assessment skill, you are working to attune your personal feelings of competence to reflect the clearest and most accurate self-assessment of your actual competence. Feedback from the instruments has value because it helps us to see how well we have achieved the ability to self-assess accurately.
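One way to make the signal-and-noise picture concrete is a small simulation. The sketch below assumes a deliberately simple model that is not taken from any particular study: each person's self-assessed score equals their actual score plus random Gaussian noise. The names `simulate_r` and `noise_sd`, the 0 to 100 scale, and the noise levels are all illustrative choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def simulate_r(noise_sd, n_people=2000, seed=1):
    """Correlation between actual and self-assessed scores under the toy model.

    Assumed model: self_assessed = actual + Gaussian noise with the given
    standard deviation. Scores are not clipped to the 0-100 range.
    """
    rng = random.Random(seed)
    actual = [rng.uniform(0, 100) for _ in range(n_people)]
    self_assessed = [a + rng.gauss(0, noise_sd) for a in actual]
    return pearson_r(actual, self_assessed)


if __name__ == "__main__":
    for noise_sd in (5, 30, 60):
        print(noise_sd, round(simulate_r(noise_sd), 2))
```

In this toy model the underlying signal is identical in every run; only the static changes. As the noise grows, the measured correlation shrinks, even though every simulated person's self-assessment is centered on their true score.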

Instruments and the Data They Yield

General, global questions such as: “How would you rate your ability in math?” “How well can you express your ideas in writing?” or “How well do you understand science?” may prove to be crude, blunt self-assessment instruments. Instead of single general questions, more granular instruments like knowledge surveys that elicit multiple measures of specific information seem needed.

Because the true signal is harder to detect than often supposed, researchers need a critical mass of data to confirm the signal. Pressures to publish in academia can cause researchers to rush to publish results from small databases obtainable in a brief time rather than spending the time, sometimes years, needed to generate a database of sufficient size to provide reproducible results.
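The effect of database size on reproducibility can be explored with a hedged sketch like the one below. It repeats the same imaginary study many times at a given sample size and reports how much the resulting correlation varies from one repetition to the next. The model (actual score plus Gaussian noise), the noise level, and the names `r_spread` and `replications` are assumptions for illustration only.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def r_spread(n_people, noise_sd=30.0, replications=200, seed=7):
    """Repeat a simulated study and summarize how much r varies.

    Returns (smallest r, largest r, standard deviation of r) across the
    replications. Every replication draws from the same assumed model, so
    any variation in r is due to sampling alone.
    """
    rng = random.Random(seed)
    rs = []
    for _ in range(replications):
        actual = [rng.uniform(0, 100) for _ in range(n_people)]
        self_assessed = [a + rng.gauss(0, noise_sd) for a in actual]
        rs.append(pearson_r(actual, self_assessed))
    return min(rs), max(rs), statistics.pstdev(rs)


if __name__ == "__main__":
    for n_people in (20, 100, 1000):
        lowest, highest, sd = r_spread(n_people)
        print(n_people, round(lowest, 2), round(highest, 2), round(sd, 3))
```

Under these assumptions, small samples produce correlations that swing widely between repetitions of the "same" study, while large samples cluster tightly around one value. A single small study could therefore report a correlation that the next small study fails to reproduce.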

Understanding Graphical Depictions of Data

Some graphical conventions that have become almost standard in the self-assessment literature produce ordered patterns from random noise. These patterns invite researchers to interpret the order as produced by the self-assessment signal. Graphing nonsense data generated from random numbers in varied graphical formats can reveal what pure randomness looks like when depicted in any graphical convention. Knowing the patterns of randomness helps build the numeracy needed to understand self-assessment measurements.

Some obvious questions I am anticipating follow: (1) How do I know if my instruments are capturing mainly noise or signal? (2) How can I tell when a database (either my own or one described in a peer-reviewed publication) is of sufficient size to be reproducible? (3) What are some alternatives to single global questions? (4) What kinds of graphs portray random noise as a legitimate self-assessment signal? (5) When I see a graph in a publication, how can I tell if it is mainly noise or mainly signal? (6) What kinds of correlations are reasonable to expect between self-assessed competency and actual competency?
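As one illustration of how a graphical convention can turn noise into apparent order, the sketch below generates nonsense data in which self-assessed and actual scores are completely independent random numbers, then summarizes them by quartile of actual score, a layout often used in this literature. This is an assumed, minimal reconstruction of the general idea, not the procedure used in any specific paper; the name `quartile_means` and the 0 to 100 scale are illustrative.

```python
import random


def quartile_means(n_people=4000, seed=3):
    """Mean actual and mean self-assessed score per quartile of actual score.

    Both scores are independent uniform random numbers, so there is no
    self-assessment signal in the data at all.
    Returns four (mean_actual, mean_self_assessed) pairs, lowest quartile first.
    """
    rng = random.Random(seed)
    people = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n_people)]
    people.sort(key=lambda person: person[0])  # order by actual score
    size = n_people // 4
    rows = []
    for q in range(4):
        group = people[q * size:(q + 1) * size]
        mean_actual = sum(actual for actual, _ in group) / size
        mean_self = sum(self_score for _, self_score in group) / size
        rows.append((mean_actual, mean_self))
    return rows


if __name__ == "__main__":
    for number, (mean_actual, mean_self) in enumerate(quartile_means(), start=1):
        print(f"Quartile {number}: actual {mean_actual:5.1f}, self-assessed {mean_self:5.1f}")
```

With purely random inputs, the mean self-assessed score sits near the middle of the scale in every quartile, while the mean actual score climbs from quartile to quartile by construction. Plotted as two lines, the lowest quartile appears to overestimate badly and the highest appears to underestimate, a pattern that resembles "unskilled and unaware" even though the data contain no signal.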

Are There Any Answers?

Getting some answers to these meaty questions requires more than a short blog post, but some help is just a click or two away. This blog directs readers to “Random Number Simulations Reveal How Random Noise Affects the Measurements and Graphical Portrayals of Self-Assessed Competency” (Numeracy, January 2016) with acknowledgments to my co-authors Christopher Cogan, Steven Fleisher, Eric Gaze and Karl Wirth for their infinite patience with me on this project. Numeracy is an open-access journal, and you can download the paper for free. Readers will likely see the self-assessment literature in a different way after reading the article.

About Ed Nuhfer

Ed Nuhfer received his PhD in geology from University of New Mexico, and served as a geologist and researcher in industry and government before starting an academic career. He held tenure as a full professor at four different universities, authored publications on environmental geology, sedimentary geology, geochemistry, petrology and geoscience education, and served as a mentor for hundreds of geology and reclamation students. He served as a regional/national officer for the American Society for Surface Mining and Reclamation, the American Institute of Mining Engineers and as national editor for The American Institute of Professional Geologists from which he received three presidential certificates of merit and the John Galey Sr Public Service Award. His book, The Citizens' Guide to Geologic Hazards, won a Choice award for "outstanding academic books" from the Association of College and Research Libraries. While on sabbatical in 1988-1989 in Colorado, he discovered faculty development and returned to found one of the first faculty development centers in Wisconsin. He subsequently served as Director of Faculty Development for University of Wisconsin at Platteville, University of Colorado at Denver, and Idaho State University, as Director of Faculty Development and Assessment of Student Learning at California State University Channel Islands, founded the one-week faculty development program "Boot Camp for Profs," which he directed for nearly twenty years, received the national Innovation Award Finalist and the Faculty Development Innovation Award from POD, and served in his last full-time job as Director of Educational Effectiveness at Humboldt State University "years beyond when I thought I would want to retire" before finally retiring in 2014.
He has authored over a hundred publications in faculty development, and served as an invited presenter and featured speaker of workshops for The Geological Society of America, POD, AAC&U, WASC, Lilly Conferences, and as an invited presenter of workshops and keynotes on faculty development and assessment for many universities and conferences. He continues to work as a writer and researcher, and as a columnist for National Teaching and Learning Forum, for which he has written Developers' Diary for over twelve years, a column based on the unique theme of using fractals and chaos as a key to understanding teaching and learning. Ed remains a member of the editorial review boards for several journals and publishers and is winding up a seven-year project with colleagues as principal investigator in developing and testing the Science Literacy Concept Inventory.

2 thoughts on “Quantifying Metacognition — Some Numeracy behind Self-Assessment Measures

  1. Roman Taraban

    Ed Nuhfer provides an insightful analysis of the potential disparity between students’ judgments of what they know and what they actually know. I take this to apply to judgments-of-learning (JOLs), judgments-of-memory, judgments-of-comprehension, etc. I especially appreciate Ed’s apt analogy to tuning in a radio signal through the ambient noise. The analogy immediately brings a couple of possibilities to mind. One possibility is that the person actually has the knowledge (as indicated on the more objective measure, like a test), but the method we use to ask the person to judge his knowledge, like collecting JOLs, is faulty. If we used a more sensitive or appropriate measure, we would find that the person is a better judge of the extent of his knowledge. Ed addresses this possibility, in part, by suggesting that global JOLs will generate more noise than more specific JOLs.

    Another way to look at this is to say that the “unskilled” is not necessarily “unaware”, to pick up on the Kruger-Dunning effect, but that the signal itself is weak. I will use an analogy to fog. When I am driving through dense fog, I have more difficulty judging distances and objects, not because my visual perceptual system has gone bad, but because the physical representations that the visual system is processing are ambiguous. We can readily think of the neural representations that the person is judging in a JOL as a mental representation that is weak, ambiguous, shrouded in fog. The person’s ability to judge is just as good as it’s ever been, but the representations that are being judged are not providing a clear signal. In this sense, the unskilled-unaware suggestion loses much of its relevance.

    In a previous blog “Metacognitive Judgments of Knowing”, I showed how the discrepancy between students’ judgments and what they know was a function of how much they actually knew. Specifically, the higher their recall of the text that they studied, the less discrepant was their prior judgment of how much they would recall. Perhaps my fog analogy works here. That is, the less you know, the more difficult it is to make judgments about the mental representations you have. Therefore, the “unskilled” may need more guidance with study practices – i.e., with creating better mental representations, let’s say, through an elaboration strategy. Giving them practice with JOLs – i.e., with making better metacognitive judgments, might be less useful.

    In any case, Ed Nuhfer’s post is quite useful in highlighting some of the difficulties in interpreting students’ judgments, and specifically, in trying to quantify metacognition.

  2. Ed Nuhfer

    Roman, thank you for the great discussion.

    We are currently drafting a part 2 of the numeracy paper that deals with some explanations for the noise of self-assessment. Realize first that self-assessed competence arises largely from the affective domain. The tests of competence tap largely the cognitive domain. Thus we are plotting graphs and making subtractions of cognitive versus affective measures. In physical sciences we do not add and subtract physical properties with different units. There is no ready physical science equivalent to what we are doing in measures of self-assessment accuracy.

    Examine “tip of the tongue phenomenon” on the web. That metacognitive phenomenon is verifiable and quite real. Thus, when we feel we know something, we really do know it even though we cannot recall it in the moment, and a test may register that we do not really know. In that case, the test is wrong, not the self-assessed feeling. The test measures only instant recall or understanding. Self-assessment recognizes that we may need to think a few hours or even days before we assemble the correct answer, but the feeling that we actually have the ability to accomplish the task is correct.
