> have to manually run the code in your head
If you're doing a good job, you have to do that anyway, or at least have enough of a spidey sense for broken code to know when to investigate and add an extra test case.
Something like 30% of the time at $WORK, interviewers report the candidate as having solved the problem when a closer inspection reveals UB, memory corruption, and other bullshit. The test cases pass, and I think that's part of the problem. You can't tune out and avoid deeply understanding the submission.
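To make that concrete, here is a made-up C sketch of the kind of thing I mean. It is not an actual submission from $WORK, and `find_index` and `check_small_cases` are names invented for illustration. The binary search passes every small test, yet the midpoint computation is signed overflow (UB) once the array holds more than roughly 2^30 elements.

```c
#include <assert.h>

/* Hypothetical candidate submission: binary search over a sorted int array.
   Returns the index of key, or -1 if key is absent. */
int find_index(const int *a, int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        /* Latent bug: lo + hi can exceed INT_MAX for very large n,
           and signed overflow is undefined behavior in C.
           A safe form is: lo + (hi - lo) / 2 */
        int mid = (lo + hi) / 2;
        if (a[mid] == key) {
            return mid;
        } else if (a[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return -1;
}

/* The kind of small test suite an interview typically runs.
   Every case passes, because none comes near the overflow. */
void check_small_cases(void) {
    const int a[] = {1, 3, 5, 7, 9};
    assert(find_index(a, 5, 1) == 0);   /* first element */
    assert(find_index(a, 5, 9) == 4);   /* last element */
    assert(find_index(a, 5, 4) == -1);  /* absent key */
    assert(find_index(a, 0, 4) == -1);  /* empty range */
}
```

No test you'd realistically write in a 45-minute interview catches this; you only see it by reading the midpoint line and asking what happens at the extremes.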
> If you're doing a good job
I think the problem is that the grader has to run your code in their head. That's a whole different problem.
I would imagine every professor (or assistant) who graded any programming exam I wrote was doing just that, and I would further expect them to let some things slide as long as the general direction was correct.