What is the Point of a Final Exam

2022-01-06 :: academia

It’s weird, but professors are almost never taught how to teach, how to design a course, how to assess students, how to design an exam or what the point of an exam even is. We’re just expected to pick this up on our own, I guess. It’s not as nonsensical as it sounds, since we are trained how to do research and communicate that research, and there is some overlap. But still.

If my experience is any indication, we just pick up an existing course structure and more or less follow that. Oh, the last person who taught this course used this material, and this syllabus, and these exams, so I’ll just do more or less that for now. If we’re ambitious and/or want to shoot our tenure track in the foot, we might try to innovate soon after. Otherwise, we might innovate later.

Anyway, I’m not good at doing things just because that’s how they’ve always been done; I need first principles. After designing, administering, grading, and invigilating several of them, I was struggling to figure out what the point of a final exam is. So I had a bunch of conversations on Twitter, and now I’m collecting my thoughts on what that point is and how it might, or might not, serve a purpose in my course.

Warning: I am not an education researcher, and this is not research, and it’s got a lot of stream-of-consciousness.

What is the point of a final?

Obviously, a final exam is a part of a course, so what is the point of a course? Presumably, to teach students. If only we lived in such a utopia.

I’d start by assuming the purpose of a university course is:

to teach students; and
to credential students.

I consider credentialing to be less important than teaching, but acknowledge that there is a use in credentialing, so I should do it and do it well.

Credentialing

A final exam can very easily accomplish credentialing: you put everything the students should have learned in the course on the final exam, and you measure how well they do, giving them a grade (credential).

But is a final exam the best way to accomplish credentialing? What goes into credentialing? I think the main principle is that we’d like the credential to be close to ground truth. A high mark should indicate high mastery, and vice versa. How do we know if an exam is measuring mastery well?

After a bunch of discussions, I think there are a few threats that could cause measurement error in any assessment:

1. cheating;
2. student performance error (e.g., stress, anxiety, illness);
3. design error (e.g., confusing instructions or questions, too little time to complete assessment, questions that don’t measure learning objectives);
4. attribution error (e.g., measuring a group assessment that doesn’t reflect an individual’s mastery);
5. sampling error (e.g., a grade on an assessment early in the semester does not actually indicate lack of mastery if the student learns later but has no opportunity to demonstrate improvement later; or, an assessment in one context may not be representative in another context, so another assessment in another context is useful).

I’m going to mostly ignore (3). It’s a research area of its own (e.g., Item Response Theory, Concept Inventory), for one, and I’m no expert in that area. But design errors in an exam are as likely (or unlikely) as in any other assessment, or perhaps less because course staff can spend more time polishing a single exam than multiple assessments.

Exams, and final exams, reduce (1) and (4) very well. Exams are typically very structured, which makes cheating difficult compared to homework, projects, etc., and makes it possible to measure a single individual properly. Take-home or online exams don’t have this benefit, so they must be designed to be more difficult or complex, or require invasive invigilation technology.

Final exams in particular mitigate (5), since they occur at the very end of the semester. Exams in general also provide an opportunity to revisit assessing several learning objectives in another, broader, perhaps integrated context.

Exams seem very susceptible to (2). Universities typically have policies in place to support accessibility and deal with illness, to help address some of (2). But many students find a single, highly weighted exam very stressful, and they may fail to perform on this one, very important assessment, despite having mastered material and demonstrated that mastery throughout the semester. On the other hand, some students find exams far less stressful, and far less time consuming, than many intermediate assessments throughout a semester. Students who find planning or long term focus challenging, or have chronic accessibility issues, might find an exam much easier than a project, for example.

So exams are pretty good for credentialing, in that they avoid sources of measurement error, as long as they’re well designed. But they’re very stressful. So if you need a credentialing assessment that reduces (1), (4), and (5), a final exam seems like a good choice, if you can find a way to reduce stress and anxiety. Maybe novel exam structures could do that; however, novel structures might cause more of (1) and (4), and be more complex to design, thus making (3) even trickier.

Some learning objectives might also be difficult to assess with other forms of assessment than exams. For example, in my project-based compilers course, some of my learning objectives don’t fit into the course-long “write a compiler” project. They are latent skills I’d like students to learn from building a compiler, but I can’t figure out how to explicitly measure that learning in the project. This could be a failure of my ability, but it suggests the structure of exams is simpler to employ.

Teaching

I consider credentialing less important than teaching, yet I covered it first because exams seem most related to credentialing.

Some of my colleagues pointed out that you can make the exam part of the teaching process. This is very counter to how I’ve seen exams.

An exam, particularly a final exam, can be an opportunity for students to reflect on all the material they have seen. They could be presented with a new challenge or new material that they should be able to learn from if they’ve mastered the course material. They could also be given a chance for explicit transfer from one context to another, related to point (5) in the previous section about measuring skills in different contexts.

Given the stress caused by the assessment function of an exam, particularly a final exam, I’m not sure how useful this is compared to other methods of teaching. If you want to give students the opportunity to see something in a new context, why not multiple (smaller) projects, or multiple homeworks? Novel exam structures or lower stakes can mitigate the stress, as discussed above, but come with design and measurement challenges. And at what point is a less stressful, novel structured “exam” really an exam, and not something else?

Another way the exam, and final exam, serve teaching is actually taking advantage of the high stakes of the exam. By giving students adequate time and material to review for the exam, they’re able to review all the material in the course in preparation for a high-stakes exam. This review is useful for learning, even if the exam itself is not.

This seems at odds with research on teaching as it relates to grades, though. I’d suggest that it’s the grade, not the exam, that causes students to review. Teaching More by Grading Less (or Differently) notes that grades motivate students perversely, lowering interest in learning but causing anxiety and interest in avoiding a bad grade. This suggests that using exams as a motivator for reviewing material is not a good way to motivate learning.

Final exams in particular seem next to useless for teaching, since students also don’t get very much qualitative feedback. Even if they do, they have very little reason, method, or even time to review that feedback and apply it. So even if the exam is cleverly structured to enable learning in a new context, or lets students integrate disparate lessons, will they know whether they successfully integrated lessons or transferred skills to a new context? Even if the instructor spent the time to provide qualitative feedback (which, in my experience, is not what happens with a final exam), the incentives are against the student using it.

Is a final exam right for my course?

For Teaching

I’m not really convinced by the value of the exam for teaching, in itself. But, I could see two reasons:

your course doesn’t provide another final opportunity for students to integrate disparate lessons into a whole;
your course must introduce a variety of disparate skills that students have little opportunity to revisit, but that a final exam encourages them to review.

This requires mitigating the stress that causes performance problems on the exam, and avoiding destroying the intrinsic motivation for learning. I don’t think stress is a problem in itself, if it can be managed. I’m more concerned about replacing intrinsic motivation with extrinsic.

For Credentialing

Final exams seem very useful for courses with group work, or where detecting cheating is difficult. I’d hate to optimize for trying to catch cheating, but I think detecting it is necessary, particularly since credentialing is part of the goal of a university course. In-person exams are a much less invasive and less time-consuming mode of invigilating.

Similarly, with heavy group work, it is difficult to assess individual mastery. Alternatives include oral assessment or review of the group work. This is resource intensive, but also enables more qualitative feedback so could have additional value for teaching.

An Example

In my case, I’m trying to decide whether exams are right for my compilers course. The course is largely structured around a single semester-long project where the students implement a compiler (a big software project). The first two weeks’ milestones are completed individually, but the rest of the semester is completed in a group. Students who complete and contribute to the project can be reasonably assumed to have mastered a large portion of the material and most of the learning objectives. The project is not graded until the end of the semester, and students are provided considerable intermediate feedback, so they have a lot of opportunity to learn from feedback, improve, and demonstrate improvement.

Cheating is a minor concern, but shirking is a bigger concern. A student could easily coast on their group.

I think an exam has some minor effect in discouraging shirking, and at least lets us catch it. I think performing oral code reviews and some form of survey of group contribution would be a more direct way of identifying this and providing more qualitative feedback. However, I don’t really have the resources for more than about 1 code review. We could perhaps implement it stochastically.
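To make the stochastic idea a bit more concrete, here is a minimal sketch of how it could work: for each milestone, draw a small random subset of groups for an oral code review, so every group knows it might be reviewed but the staff only pays for a few reviews. Everything here is an assumption for illustration; the function name, the milestone labels, and the number of reviews per milestone are hypothetical and not part of any actual course tooling.

```python
import random


def review_schedule(groups, milestones, reviews_per_milestone, seed=None):
    """Pick, for each milestone, a random subset of groups to review.

    groups: list of group identifiers (hypothetical labels).
    milestones: list of milestone identifiers (hypothetical labels).
    reviews_per_milestone: how many reviews the staff can afford each time.
    seed: optional seed so the draw can be reproduced and audited.
    """
    rng = random.Random(seed)
    # Never ask for more reviews than there are groups.
    k = min(reviews_per_milestone, len(groups))
    schedule = {}
    for milestone in milestones:
        # Sampling without replacement: no group is drawn twice
        # for the same milestone, but a group can be drawn again later.
        schedule[milestone] = rng.sample(groups, k)
    return schedule


if __name__ == "__main__":
    groups = ["group-1", "group-2", "group-3", "group-4", "group-5"]
    for milestone, chosen in review_schedule(groups, ["m3", "m4", "m5"], 2).items():
        print(milestone, chosen)
```

Drawing independently for each milestone means a group reviewed once can still be reviewed again, which is probably what you want if the goal is to discourage coasting; the cost is that some groups may never be drawn at all.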

An exam provides some opportunity for learning, in itself, but at a high cost. I can use it to have students apply the same material in a new context. However, in such a time-constrained and high-stakes setting, I’m not convinced it is particularly effective or worth the cost. The project itself is cumulative, and provides all the opportunity and material needed to review the course, so encouraging review isn’t necessary.

Compared to an exam, providing more, low-stakes opportunities to apply lessons in a new context seems like a better approach, since it mitigates (2) and provides more opportunity for feedback, learning from the feedback, and demonstrating learning. Making these exercises individual would address cheating and shirking, but might require them to be higher stakes, at least cumulatively. This would have a stronger effect for detecting shirking early. However, this would disadvantage students who find multiple assessments problematic. Such students are already at a disadvantage because of the time investment the project requires.

The exam does provide an opportunity to assess some learning objectives that I can’t figure out how to assess in the project. Perhaps I could integrate these into the aforementioned lower stakes exercises, or figure out how to integrate them into the project. Or perhaps these learning objectives aren’t really important, if I think the project is most important.

So it seems like the primary purpose the exam is serving in my course is credentialing. It helps us avoid attribution error, and to a lesser extent sampling error, at the cost of normal performance errors. It serves little, but some, learning function.

The learning function might be better served by multiple exercises outside the project. There’s a tension in using them: they need to be high stakes to serve the credentialing function and avoid sampling error, but low stakes to serve the learning function. Sampling error is not a big problem in this course, so maybe I should favour low-stakes exercises.

Attribution error is a problem, but stochastic code review and group work surveys might be a better way to address this than an exam.

If the exam is not in-person, the increased possibility of cheating negates some effect on attribution error, although code similarity detection tools make detecting cheating easier than detecting attribution error.

The exam is perhaps a simpler mechanism to employ than a combination of more small exercises and stochastic code review, but could result in more performance error.

Is an exam right? I don’t think it’s a wrong choice, but I think there are better choices, particularly in favour of learning, and now I have a better idea of what the trade-offs are.