U-M finds students with alphabetically lower-ranked names receive lower grades

lqet
63 replies
1d3h

I work in academia. When we grade exams, the order of the exams on the stack is the order in which they were collected in the room (people can sit wherever they like). For grading, we are usually 5 people in a single room, and everyone grades a specific exercise for consistency. The exams are getting shuffled heavily, with everyone just grabbing stacks, looking for exams where "their" exercise was not yet graded, and taking them out. So basically, the order in which we grade exams can be considered random.

However, I also grade weekly exercise sheets during the semester, and these are committed into a repository, where each student has a folder that... begins with the first letter of their first name. Everyone I have ever worked with acknowledges that you have to shuffle the order in which you grade these submissions each week, for fairness. Several effects come into play: (1) you are usually less tired at the beginning, (2) your mood gets better during the last 2 sheets because you know you are done soon, (3, and crucially) at the beginning, you have not yet seen all the common errors / developed a "feeling" for them, and you might thus miss them in early submissions, but spot them immediately in later submissions.
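A minimal sketch of that weekly shuffle, assuming the submissions are simply a list of folder names (the names and the `grading_order` helper below are made up for illustration, not anything from an actual grading setup):

```python
import random

def grading_order(folders, rng=None):
    """Return a freshly shuffled copy of the submission folders."""
    rng = rng or random.Random()
    order = list(folders)  # copy, so the repository listing itself is untouched
    rng.shuffle(order)     # uniform in-place shuffle
    return order

# Hypothetical folder names, sorted the way the repository lists them.
folders = ["alice", "bob", "carol", "dave", "erin"]
this_week = grading_order(folders)
```

Calling `grading_order(folders)` again next week gives an independent order, so nobody is systematically first or last.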

Another alphabetic effect: In elementary school, my name was on top of the list of students in my class. I remember that I often had to do some special job simply because I was the first name on this list (for example, carry a group ticket when we visited some museum, keep track of something, be the first at something where nobody wanted to be the first, with everyone watching, be the first to be graded in PE, again with everyone watching, etc.). As a fairly shy kid, this already annoyed me in first grade.

cvwright
17 replies
1d3h

My strategy was to, like you said, grade problem by problem. Then for each problem, first find all those who got full marks. Then group the others into piles based on what mistakes they made.

This ensures that everyone who made the same mistake(s) gets the same grade. It also tends to shuffle the order of the exams after every problem.
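As a rough sketch of the pile idea (not the exact procedure used, just one way to model it), each answer can be keyed by the set of mistakes it contains, so a combination of mistakes becomes its own pile. The mistake labels, point values, and function names are hypothetical:

```python
from collections import defaultdict

def group_by_mistakes(answers):
    """Map each set of mistakes to the students who made exactly that set."""
    piles = defaultdict(list)
    for student, mistakes in answers.items():
        piles[frozenset(mistakes)].append(student)
    return dict(piles)

def grade_piles(piles, deductions, full_marks):
    """Give every student in a pile the same score, floored at zero."""
    scores = {}
    for mistakes, students in piles.items():
        score = max(0, full_marks - sum(deductions[m] for m in mistakes))
        for student in students:
            scores[student] = score
    return scores

# Hypothetical answers to one problem: student -> mistakes found.
answers = {
    "s1": [],
    "s2": ["off_by_one"],
    "s3": ["off_by_one"],
    "s4": ["off_by_one", "wrong_complexity"],
}
deductions = {"off_by_one": 1, "wrong_complexity": 2}
piles = group_by_mistakes(answers)
scores = grade_piles(piles, deductions, full_marks=5)
```

Here `s2` and `s3` land in the same pile and necessarily get the same score, which is the consistency property described above.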

Obviously you don’t need this strategy for simple multiple choice questions, and it’s probably also not a great fit for long-form essays. But it worked great for technical short answer problems in CS and security.

jcla1
8 replies
1d2h

This sounds like an organisational nightmare to be honest. You'd be going through the pile of exams multiple times (at least twice) and what do you do if there are multiple mistakes that are common in a single exam question?

Also: if you're sorting into "mistakes piles" for single exercises, how can you parallelise marking of separate and independent questions?

cvwright
6 replies
1d2h

Teach at a broke public university, and you never have to juggle huge teams of TAs.

kkylin
4 replies
1d1h

Even at top-notch universities (public or private), when I talk to retired faculty, grading almost always comes up as a reason they don't want to teach anymore.

[Edit: not disagreeing with your point.]

bobthepanda
3 replies
21h51m

Not only is it generally time intensive, you are also subject to lots of tiring back and forth with some students about their grades.

No grading is perfect, but there’s also some undercurrent of an attitude that students have paid to be there and are entitled to a certain grade.

mschuster91
2 replies
19h59m

No grading is perfect, but there’s also some undercurrent of an attitude that students have paid to be there and are entitled to a certain grade.

Given that students have taken on hundreds of thousands of dollars in debt that they'll have to repay no matter what and on top of that a lot of jobs being completely out of reach these days without an academic degree (that for fucks sake isn't remotely required by virtually all jobs requiring it!), that's completely understandable.

Want to fix higher education? Bring the hammer down on companies abusing it as a proxy for legally discriminating against classes of society that are closely correlated with poor academic outcomes. Academic education should be reserved for the best of the best of our youth, and it should be fully paid for by the government, not simply another hurdle to pass to get a job that pays barely more than flipping burgers.

bsder
0 replies
13h5m

Given that students have taken on hundreds of thousands of dollars in debt that they'll have to repay no matter what and on top of that a lot of jobs being completely out of reach these days without an academic degree (that for fucks sake isn't remotely required by virtually all jobs requiring it!), that's completely understandable.

Would that my students were this engaged before the exam. Guess which students show up the most often for office hours? ... yeah, the ones that are getting the best grades.

If my students spent half as much time learning the subject as arguing with me about grades, they would be getting a higher grade than the one they are arguing for.

bobthepanda
0 replies
16h49m

I think it is rational that students can feel entitled to that.

I also think that the vast majority of poorly paid, non-tenured professors and other teaching staff don't love being the targets of this harassment, since it's not their fault and largely out of their control, and it's not like they're getting the bulk of the tuition money. (That mostly goes to administrative expenses and sports programs.)

Heck, most adjunct faculty are often paid below minimum wage and qualify for food stamps.

jcla1
0 replies
1d

I do (I'm a mathematician). We are usually between 4 and 10 people marking an exam with anywhere between 50 and 600 participants.

kkylin
0 replies
1d1h

Online tools like Gradescope make this a little less painful (but still painful), but sometimes it's what's needed, especially on problems that are a little open-ended.

underdeserver
2 replies
1d1h

Sibling comment already said so, but I want to emphasize - this requires two run-throughs (at least).

When I was grading homework, it took about 5 hours a week per class per run through. They didn't pay me enough to make sense for it to be 10 hours.

raydev
1 replies
21h49m

A second pass wouldn't necessarily take the same amount of time, especially if you note the issues/concerns on your first pass.

underdeserver
0 replies
21h33m

True, but the overhead is large. I graded intro linear algebra and intro calculus, so there were a lot of students - I think 150 or so - and most of them were wrong.

Graders know that wrong homework takes much longer than correct homework to grade. It's correct? Full marks, move on. Is it wrong? Well, how wrong is it? Did they make a bad assumption, but followed it through to its conclusion? Did they forget a minus sign? Or is it complete hogwash?

So it might not be 10 hours, but still would be around 8 hours. And that's still too much.

bobbiechen
1 replies
1d1h

When I was a TA at CMU, we used Gradescope https://www.gradescope.com/ for this. Every exam would be scanned and divided into problems (based on a predefined template - fixed page space for answers).

Then, each problem was assigned to a TA. Either there's a predefined rubric, or you create it as you go (-1 point for mistake X, half credit for mistake Y, etc.). There's a pretty slick interface where you just read the answer, and use keyboard shortcuts to apply the relevant deductions.

It still has the issue that every time you change the rubric, you'd need to go back and re-do previously-graded instances of that problem. But it was way faster and (equally important) less tiring.

Tijdreiziger
0 replies
18h38m

There’s also open-source software that does the same job at TU Delft: https://zesje.tudelft.nl/

(disclaimer: I briefly worked on the software for my bachelor’s thesis)

ska
0 replies
23h56m

For final exams, we used to mark across all sections of a course (so for 101 type courses, this can be hundreds to 1000s of papers).

Get all the profs and TAs together, break into groups taking one problem or set of problems. Then you randomly sample (each group takes a stack) to get a feel for the 'typical' errors, once that's done - you are a machine going through the stacks.

Every once in a while (not that often) you run into a novel error or approach, and the group discusses.

nextos
0 replies
1d2h

My CS school implemented OCR test sheets, with some exceptions, and equivalent strategies, such as test suites and benchmarks for programming assignments. This was done to avoid subjective grading, as it was a big issue even in well-intentioned cases.

Often, you still get big problems, but the set of solutions is small. It's always three options plus a fourth option (none / all). If you make a mistake you score negative points. It's not perfect, sometimes wording is ambiguous and it's unclear whether you need to tick the fourth catch-all option, but I found it better than the alternatives as it removes most arbitrariness from the process, but obviously has other issues.

Regular exams often had wildly different grading standards for the same course depending on the class, and thus on the professor who was correcting exams. This was really annoying.

anticensor
0 replies
12h55m

An even better strategy is to have the papers scanned by a double-sided scanner and graded by an AI grader.

spullara
9 replies
1d2h

Everything in this thread just randomizes who doesn't get graded fairly.

karaterobot
7 replies
1d1h

Is there a better solution? It's not for teachers to be perfect. Since that's not possible, it's not a solution.

jtriangle
3 replies
22h38m

No, the solution is for the scoring to be handled by software that doesn't exist yet. Some things have easy, objective measures of correctness. STEM is mostly this way. Others, your humanities et al, are fairly subjective.

You could probably cover most of this with an LLM, and access to a large body of graded material for a given course, provided said material was graded fairly. Generating that data would be time consuming, as, any given assignment would need to be graded by as many people as possible in order to find a fair average.

From there, it's simple comparison between your sample work and the presented work. We're probably a decade from this really being viable en masse, but, it no doubt will happen, and for better or worse we'll likely end up with EDUAAS (education as a service).

jacoblambda
2 replies
18h54m

LLMs are not going to be a solution. LLMs have absolutely no concept of truth.

And not everything has an objective solution. Even those that do often have a process associated with them and factoring in that work/process is an important part of grading. Reducing that subjective grading process to only objective solutions being right is grossly reductive and disproportionately punishes students who have the process right and understand the material but make small errors. That's exactly what you don't want to do.

---

Instead the solution is to make sure each assignment gets multiple eyes on it and in a random order. Then to document biases and trends in biases so that the TAs and professors can be aware of them and mitigate them.

It's a process problem that can only be solved by a process solution. Replacing the graders with technology or reducing problems to a binary right/wrong will never ever solve this and in many cases will end up being more harmful than the biases they claim to solve would be.

coredog64
1 replies
18h16m

The LLM can compile verbose prose down to a short summary. If the summaries of each chunk are consistent, then it’s at least structurally well written. Then you grade the summary itself.

brewdad
0 replies
16h20m

At that point you are grading the work of the LLM, not the student.

thaumasiotes
0 replies
1d

Yes, you can grade objectively.

stevage
0 replies
18h59m

Automatically unskew the results after grading based on this finding?

jacoblambda
0 replies
19h10m

Probably it would be something like as follows:

Have a group of N graders. And a parity of k. Let's say N is 6 and k is 2. Randomly shuffle the assignments and partition the assignments into N groups.

Each grader gets assigned k of the N groups such that they share at most 1 overlap with any other grader and each group is assigned to k people. The assignment orders are shuffled for each grader. They mark up and then grade the assignments.

Then for each of the N groups, randomly shuffle the group and equally distribute the assignments to the N-k graders.

Now each grader reviews the assignment grades/markups (in random order) and assigns a grade based on the k grades/markups from the previous rounds along with a rationale for the grade assigned.

From there the student receives the final assigned grade, the rationale for the grade, and the k markups. If they have a complaint they can go to the professor (who then can also see the k initial grades along with everything else) to dispute the grade for the assignment.

---

This way each TA only has to mark up (class size * k / N) assignments, and review (class size / N) assignments to assign a final grade (which should take far less time to do than the initial markups). On top of that every assignment has a guaranteed (k + 1) separate eyes on it. And then the professors can serve as an unbiased arbiter while retaining all the context from the process.
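A sketch of how that plan could be laid out, under a few assumptions of mine: groups are handed out cyclically (grader g marks up groups g, g+1, ..., g+k-1), which satisfies the "at most 1 overlap" condition for k = 2 with at least three graders but not for larger k, and "the N-k graders" is read as the graders who did not mark up that group. `plan_grading` and the paper names are hypothetical.

```python
import random

def plan_grading(assignments, n_graders=6, k=2, seed=None):
    """Return (groups, markup, review) for a two-round grading plan."""
    if not 0 < k < n_graders:
        raise ValueError("need 0 < k < n_graders")
    rng = random.Random(seed)
    shuffled = list(assignments)
    rng.shuffle(shuffled)
    # Partition the shuffled papers into n_graders groups of near-equal size.
    groups = [shuffled[i::n_graders] for i in range(n_graders)]

    # Markup round: grader g gets k consecutive groups (cyclically),
    # so every group is marked up by exactly k graders.
    markup = {g: [(g + j) % n_graders for j in range(k)]
              for g in range(n_graders)}

    # Review round: each group's papers are dealt round-robin to the
    # N - k graders who did not mark up that group.
    review = {g: [] for g in range(n_graders)}
    for group_id, papers in enumerate(groups):
        reviewers = [g for g in range(n_graders) if group_id not in markup[g]]
        papers = list(papers)
        rng.shuffle(papers)
        for i, paper in enumerate(papers):
            review[reviewers[i % len(reviewers)]].append(paper)
    return groups, markup, review

# Hypothetical class of 60 with N = 6 graders and k = 2.
papers = [f"paper{i:02d}" for i in range(60)]
groups, markup, review = plan_grading(papers, n_graders=6, k=2, seed=1)
markup_load = {g: sum(len(groups[i]) for i in gs) for g, gs in markup.items()}
```

With 60 papers, every grader marks up 60 * 2 / 6 = 20 of them, matching the (class size * k / N) figure. Review load averages class size / N per grader, though this simple round-robin does not force it to be exactly equal for everyone.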

To take it an additional step further, the professors could sample a random subset of the assignments to verify the markup and grading is going properly.

And those reviews/grade adjustments can then be recorded (along with the final grade/rationales) to document how a given TA's grading deviates from the final reviewed grade or the grade the professor assigns. Likewise for a TA's final assigned grade deviating from the professor's. This would allow deviations to be mitigated over time and major deviations to be identified.

jibe
0 replies
1d2h

For a single assignment, yes. But at least randomization might mitigate the effect across a term.

madeofpalk
6 replies
1d2h

you have to shuffle the order in which you grade these submissions each week, for fairness

I don't think this is fair. It's just a more randomly distributed unfairness, rather than by a deterministic factor (like the student's name)

'Fair' would be each student is assessed independently for the work they did, rather than their mark being impacted by how early or late they were marked.

shepherdjerred
2 replies
1d1h

It would be essentially impossible to have something "truly" fair for open-ended questions since humans are stateful.

Maybe this is a case that AI could actually do quite well.

Manually grade the answers and identify the classes of mistakes. Then hand the classes of mistakes to the AI and ask for it to determine which answers have which types of mistakes.

Once you've done that, you just need to associate a deduction for each type of mistake and do some simple math.

vagrantJin
1 replies
20h28m

what do you mean AI? you must be joking.

shepherdjerred
0 replies
19h59m

Imagine a question: compare bubble sort and quick sort algorithm.

Some students might mix up the algorithms, some might give the incorrect computation complexity, some might describe them incorrectly in some way.

Manually grade some (or all of) the answers by noting the kinds of things students got wrong (e.g. the above criteria). Then, feed in to ChatGPT (or your favorite alternative) the answer + the categories of mistakes to expect.

Here's a simplified example: https://chat.openai.com/share/bf801e12-51d5-4255-9968-bbf91b...

luplex
0 replies
1d2h

There are many notions of "fairness", many of which are logically incompatible with each other.

In this example, I think it's kind of fair to give everyone an equal chance of being advantaged. You're not hurting anyone specifically.

jcparkyn
0 replies
17h48m

I think an important difference is that when you shuffle them, the unfairness stops being correlated across multiple assignments, so the "aggregate" unfairness over the course of the semester is much lower.
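A toy illustration of that point, assuming the only thing that matters is a student's position in the grading order (no actual bias model, just positions). It compares each student's average position over a semester with and without weekly shuffling; the class size and number of weeks are arbitrary:

```python
import random
from statistics import pstdev

def average_positions(n_students, n_weeks, shuffle, seed=0):
    """Average grading position of each student over n_weeks."""
    rng = random.Random(seed)
    totals = [0] * n_students
    order = list(range(n_students))
    for _ in range(n_weeks):
        if shuffle:
            rng.shuffle(order)  # new random order every week
        for position, student in enumerate(order):
            totals[student] += position
    return [t / n_weeks for t in totals]

fixed = average_positions(30, 12, shuffle=False)
shuffled = average_positions(30, 12, shuffle=True)

# Spread of the per-student averages: large means some students are
# consistently early or late, small means it evens out over the term.
spread_fixed = pstdev(fixed)
spread_shuffled = pstdev(shuffled)
```

With a fixed order the averages are just 0 through 29, a spread of about 8.7 positions. With shuffling, every student's average drifts toward the middle of the stack, so whatever the position effect is, it is shared out rather than pinned on the same people.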

gqcwwjtg
0 replies
1d1h

Is that distinction worth making here? There’s no way to “assess independently” the work of each student without some amount of randomness. But I think that’s okay, because isn’t randomly distributed unfairness just… fairness?

donatj
5 replies
1d2h

In around the year 2000 I had an essay due that day I had forgotten, and about ten minutes of computer lab time before home room in the morning. I wrote an introduction and conclusion; then filled the remainder with copy pasted chunks of the introduction and conclusion. The thought being at least I’d get a laugh. If anyone had read the thing it would have been clear it was nonsense.

I received an 80% with no notes or markup.

I have been left wondering for the last 25 years how much student work is actually even reviewed.

I work in EdTech and every time we add a feature that requires manual teacher review of student work you will see that some teachers are VERY diligent while others never touch it.

jtriangle
3 replies
22h45m

I know a guy who copy/pasted a wikipedia article, in line citations and all, and submitted it for a sociology class and got an A, no notes, nothing.

mixmastamyk
2 replies
20h33m

He “only cheated himself.” :-D

wolverine876
1 replies
17h51m

The point is to develop skills and knowledge, so I would agree. Do you disagree?

mixmastamyk
0 replies
17h21m

I agree, but we used to cringe at this saying when young, so funny to bring it back now.

filipezf
0 replies
1d1h

There was this numerical calculus class at Uni where the teacher forbade us to use the calculator. So I just programmed the integral on it, got the partial steps, and just wrote random numbers to fill the substeps. Got full grade :D In another case, everybody got to pass the class, but after vacation we found the stack of exams completely untouched under a desk. The teacher had a side business to run...

V__
4 replies
1d3h

A teacher friend of mine always goes through his stack twice. Once to correct all mistakes and a second time to write down points. As you said, once you have seen all mistakes you know how "bad" of a mistake it actually is.

smogcutter
3 replies
1d2h

As you said, once you have seen all mistakes you know how "bad" of a mistake it actually is.

Crucially, this is not quite what the poster said. It’s not about stack ranking students against each other.

Say every paper makes the same subtle mistake, and you only notice it halfway through the pile. Unless you go back through them all, you’ll unfairly grade the later entries more harshly.

Zancarius
1 replies
1d2h

It’s not about stack ranking students against each other.

It's not, but it sort of has that effect, albeit indirectly, and definitely unfairly.

smogcutter
0 replies
13h56m

I think we’re talking about the same thing, but to clarify my meaning:

If you weigh the severity of students mistakes (or successes for that matter) in relation to each other rather than to an objective rubric, you’re effectively stack ranking them whether you mean to or not.

kkylin
0 replies
1d1h

I'm not a big fan of putting everything in the cloud, but one of the advantages of online grading systems is that it is easier to make this kind of adjustment. The workflow goes like this: make a rubric item for a specific kind of mistake (it takes a little experience to know which mistakes are likely one-off and which ones are likely to be repeated by other students), assign X points, and later if you decide there are worse mistakes, adjust the points and that gets applied to everyone.
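A small sketch of why that adjustment is cheap, assuming the system stores which rubric items were applied to each submission rather than the resulting points. This is an illustration of the idea only, not a description of how Gradescope or any other tool works internally; the item names and point values are invented:

```python
def score(applied_items, rubric, full_marks):
    """Score a submission from the rubric items applied to it."""
    return max(0, full_marks - sum(rubric[item] for item in applied_items))

# Hypothetical rubric: item -> points deducted.
rubric = {"sign_error": 1, "bad_assumption": 2}
# What the grader recorded per student: items, not points.
applied = {"s1": ["sign_error"], "s2": ["bad_assumption"], "s3": []}

before = {s: score(items, rubric, 10) for s, items in applied.items()}

# Later the grader decides the bad assumption is a worse mistake.
rubric["bad_assumption"] = 3
after = {s: score(items, rubric, 10) for s, items in applied.items()}
```

Only `s2` changes (from 8 to 7), and no submission has to be reopened, because the scores are derived from the recorded items.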

starttoaster
3 replies
17h2m

This might come off rude on accident, but I mean genuinely without malice. When I'm writing an essay to submit to my professor/teacher, I am asked to make multiple drafts to get a proper end result that is ready to submit. Understanding that educational staff is already often overworked, should I expect _less_ from the person I receive my education from? If you acknowledge that many of the grades I receive are actually not fair to me, and there's an attempt to randomize the order that papers are graded, many of the grades that I received (whether high or low) were done partially (that is to say, the opposite of "impartially".) And there's a real concern that in your example where the submissions are committed to a repository that you need to shuffle, that my submission ends up in a similar position in the stack week after week, unless you're actually doing something to ensure my position in the stack is different between submissions. It's probably sufficient in many cases but doesn't guarantee randomness unless the algorithm to randomize submissions takes previous stack orderings into account.

jamiek88
1 replies
16h57m

It’s simply human nature. Teachers can either lie to themselves and you about it or mitigate it. What more could you possibly want from them as humans?

starttoaster
0 replies
16h54m

I somewhat assumed there would be commenters suggesting the human angle as a retort. That's why I prefaced with both "this is what the teacher expects of me" and "understanding that educational staff is already often overworked." It just seems to me that the current systems aren't sufficient, and acknowledging that is what leads people to improving those systems. The above commenter suggested what they do in academia as workarounds to what the study showed, and I'm saying even that is not sufficient.

It seems like you're agreeing with me, but jumping to their defense with "people are fallible." People are fallible, that's why we build systems to take human elements out of it. Recognizing where humanity has soured something is key to that.

advael
0 replies
15h23m

I know it's not the point of your post, but I think it's worth pointing out that you're misunderstanding randomness (albeit in a very typical way). Although randomness is likely eventually (over a lot of instances) going to be the most "fair" way to distribute where your submission is in the order, it does not guarantee that it will always be different, and in fact a "random" algorithm that took previous orderings into account would be provably less random than one that didn't

It's also worth noting that randomization in a context like this is inherently an imperfect solution to a problem that generally can't be solved perfectly. If we find out that weird ordering biases exist, I think randomization is done on the assumption that many we don't know about could also exist, that there's no clear way to mitigate them completely, and then randomizing the order per-instance is just the best we can do to ensure it's fair (Which, again, won't be perfect. Perfect isn't available)
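To make the trade-off concrete, here is a sketch of a schedule that does take previous orderings into account: a plain rotation, where each week's order is last week's shifted by one. It is an assumption of mine about what such an algorithm might look like, not something proposed in the thread, and `rotated_orders` is a made-up name:

```python
import random

def rotated_orders(students, n_weeks, seed=None):
    """Weekly grading orders: a random start, then shift by one each week."""
    rng = random.Random(seed)
    base = list(students)
    rng.shuffle(base)  # only the starting order is random
    n = len(base)
    # Week w is the base order rotated left by w positions.
    return [base[w % n:] + base[:w % n] for w in range(n_weeks)]

students = ["a", "b", "c", "d", "e"]
weeks = rotated_orders(students, n_weeks=5, seed=0)
# Positions each student occupied over the 5 weeks.
positions = {s: sorted(order.index(s) for order in weeks) for s in students}
```

Over as many weeks as there are students, everyone occupies every position exactly once, which independent shuffles cannot promise. The price is the one described above: after week one the whole schedule is predictable, and each student is always graded right after the same neighbour, so it is much less random than a fresh shuffle.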

dheera
2 replies
1d2h

When I saw the title I would have thought that the higher concentrations of Asian names starting with V, W, X, Y, Z would have led to higher grades at that end of the alphabet, and thought that effect would have eclipsed anything else.

pks016
0 replies
1d1h

Anecdotally, the course I grade has this effect (just looking at the average score). I have been grading this course for the last 5 years (9-10 times). Last names with L-Z score slightly more than A-L.

lupire
0 replies
1d1h

Indian names start with A, B, N. Chinese names also start with C, F, L.

yeahwhatever10
1 replies
1d2h

When I was a TA I always did a second pass to make sure everything was even. It’s not that hard.

eks391
0 replies
1d

It's hard when you are the only TA for 260 students who get 3 assignments per week, you must also hold free hours and you aren't allowed to go over 27 hrs each week so the school isn't breaking federal laws.

bandrami
1 replies
1d

We tried a lot of things. What eventually worked was ending grades. You mastered the material or you did not; perhaps a couple of students mastered it with high marks.

Obvs this takes an administration that is OK with that, which most aren't.

dev_tty01
0 replies
20h23m

Having hired a lot of engineers, I can tell you that mastery of material is nothing close to a bimodal distribution.

xorvoid
0 replies
23h47m

We graded similarly, incidentally, when I was at U-of-M (lol). I don’t think we ever sorted by name so I don’t know if we’d have a bias effect by name unless it’s an implicit bias towards lexicographical esthetics. I won’t deny that grading fatigue can have subjective effects. I always thought we did a pretty fair and objective job. I taught Computer Architecture and we developed answer keys and grading scales before grading a single test. Of course assigning partial credit always ended up being pretty subjective. Typically though people would err in the same ways and so those would be subjectively identical. I never thought names factored into this much but, to be fair, no one ever collected data…

Finally, I guess I’ll admit that I’m probably very biased because my initials are A.B. and I’ve always gotten excellent grades, so… maybe maybe maybe

ripjaygn
0 replies
1d

While this helps the students with names lower down the order, people who are graded later still suffer.

pjdesno
0 replies
1d2h

There are all sorts of good ways to avoid these biases. I use the same practice described above for paper exams, and grading order for eg question 2 may be affected by score on question 1, but it won’t be affected by name or ID number.

If you use Canvas or Gradescope with the default settings, it’s almost impossible to avoid this sort of bias.

Worse yet, in Gradescope you’re strongly steered towards grading with a fixed “rubric” with specific points off for each of N pre-defined errors, allowing grading to be done by TAs with little more knowledge than the students themselves, resulting in scores which have little relationship to the quality of the student answer.

gonzo41
0 replies
1d1h

Have you ever thought about just passing out a set of grades on random to random individuals and see how that shakes out. Like totally random and unjustified grades. D minus for an A+ student. A+ for fails etc. Just random chaos. Then just score the final correctly and see the effect?

Or just having a Kafkaesque pass fail grade with no feedback for each student relative to their own performance over time with an expected growth rate applied?

euroderf
0 replies
4h25m

For grading essay assignments, and possibly also essay-style exams:

It is important to get a feel for the collective level of writing before grading essays individually, and it is important to avoid over-grading or under-grading essays at the beginning or end of a stream of papers. Therefore I did a three-stage grading process, with three colors of pens:

The first pass, with a red pen, is marking up single-point problems like misspellings and glaring usage errors. Also of course the general level of writing begins to seep in. This pass of course includes all papers, and it can pass fairly quickly.

The second pass, with a green pen, mostly just marks in the margin where a (good) point is made or a conclusion is reached. This is to prepare for the next pass. Again, all papers are done in this pass.

The third pass, in blue pen, is where the quality of the writing is assessed and critiqued. Maybe some short notes in the margins, maybe just comments at the end of the essay.

When students get their papers back, there are some chuckles (or whatevs) when students see all the pretty colors. But after I explain the method and its rationale, the method is clear and understood (and also appreciated?).

Fnoord
0 replies
1d

[..] As a fairly shy kid, this already annoyed me in first grade.

(I suppose the cons outweighed the pros.)

Did you perceive any pros?

I suppose one way to do grades is first read through all papers to get an idea of the levels of the students. Though you still have bias/nepotism and such then. Perhaps a teamwork or committee would work, or teachers swapping classes/schools?

I had a French teacher in high school who dropped a pen on the list of students, and the person where it landed would get rehearsal. People in mid (waves) were fried.

Plus, there is also the issue of certain last names being common in certain cultures, leading to skewed statistics.

zdw
36 replies
1d4h

As someone whose initials are Z and W, I tend to notice alpha sort a lot. Asking a friend whose initials are A and B about this, it's not something they ever noticed.

I haven't noticed a grading/ranking difference, but far more frequently I'll hear that "oh, we ran out of item/time/etc. before we got to you", which has made me much more sensitive to issues of planning/organization.

zeroonetwothree
18 replies
1d3h

Outside of school I can’t think of even a single instance of alphabetical sorting of my name (I have a middle letter). What situations are you in that this comes up a lot?

IshKebab
7 replies
1d2h

Yeah I don't think it really happens outside school, but school is pretty formative and it happens all the time in school.

wryoak
6 replies
1d1h

It happens in your phone contacts when you’re deciding who to talk to. You’re starting with your Abrahams, Billys and Changs, probably rarely reaching out to your Xaviers, Yusufs and Zeldas about going out tonight because you’ve already assembled a crew by the time you reach the Mimis, Natashas and Ottos.

IshKebab
2 replies
1d1h

I don't think many people use their phone contact list like that.

godelski
1 replies
1d1h

I wouldn't be surprised. It's very natural. Probably not for that specific use case but if for some reason you are actually going through the list then it's natural

smeej
0 replies
19h27m

Plus, it's common for me to meet someone on a first-name basis and not find out their last name right away. And people's last names change more often than their first names. Phone sorting by first name is the way to go.

thaumasiotes
1 replies
1d

Do none of your friends like or dislike any of your other friends?

wryoak
0 replies
14h29m

Probably but that’s not how they’re organized in my contacts. It’s a list not a graph.

fsckboy
0 replies
21h14m

just want to add that in my lifetime that switched from being "by last name" to "by first name". So, Yusuf Ahmed and Abraham Zigfeld experienced a noticeable shift in popularity that they were totally unprepared for

wcunning
1 replies
1d1h

My daily standup is run by the order my boss sees the participants in the JIRA board -- My first name starts with W, so I'm last in that list. Makes staying engaged the whole meeting hard...

macintux
0 replies
21h25m

I'm the first in the list, which has some advantages, but I do get tired of always being the first person to throw themselves on whatever grenade is lying around.

libria
1 replies
1d1h

Probably every single health or wellness "Find a Provider" portal lists them A-Z. That's a multi-billion dollar industry. If I was Dr. Zachary Zane, I'd change my name.

sitkack
0 replies
8h41m

AAA Aches and Ailment

kaashif
1 replies
1d2h

I have a middle letter and also don't remember this happening much.

We should ask people with later letters if they remember this more.

godelski
0 replies
1d1h

I'm a near last letter surname. It's not uncommon for arbitrary things to be sorted by name, but a ton of official things use surname ordering. There's things that also I tend to seem to be last on where I don't know the sorting method, but I suspect it isn't uncommon for someone to just throw in a sort somewhere (though it's also common to see people do things in a LIFO so disadvantage people who get shit done on time... My apartment renewal does that...). I also remember getting a PCR test in covid where they binned by last name.

I can just say I do remember being last in a lot of arbitrary and official things and seeing other friends just get done with it faster and have to waste less time sitting and waiting.

zo1
0 replies
23h52m

As the other poster said, the order of standup and other such things. Having a "Z" means that you're usually last, and sometimes people make a point of "hey let's do it in reverse today" where I end up being first.

I remember when working on joint tasks, by the time it got to me, most of the people that worked with me had already given their updates and details. So when it was my turn, I'd say "same as A, B, C", cause they'd given all the juicy details.

Other than that, it's pretty straightforward and boring. The world doesn't magically function differently for us.

talsperre
0 replies
13h56m

The order in JIRA boards during the daily standups comes to mind. I am sure there are similar examples in other domains that are not software related.

smeej
0 replies
1d2h

My mom made the critical mistake of marrying from first five letters down to last five letters during the police academy, only later to be released from the "we have to expose you to tear gas so you know how it feels and only use it judiciously" chamber in alphabetical order by last name.

It was 40ish years ago and I still don't think she's forgiven my dad.

hu3
0 replies
1d2h

Company Discord of a client. My name is among the top.

It's a remote job so, being frequently visible in that list can be an advantage.

StevenXC
7 replies
1d3h

Like most inequities, those who are in the benefiting group frequently don't realize that privilege.

sdwr
5 replies
1d2h

They realize (bring form to, make real) them, but don't realize (understand) them

godelski
2 replies
1d1h

This reminds me of cliques. I give them the definition: insight everyone can recite but nobody can act upon.

DangitBobby
1 replies
23h23m

I think you mean clichés.

godelski
0 replies
17h12m

I do. Gotta live swipe and homophones

wryoak
1 replies
1d2h

I hate how much I love this worthlessly picky comment

dustingetz
0 replies
1d3h

I for one am glad that I was not born a mosquito, the odds are not in our favor!

washadjeffmad
1 replies
1d2h

Similar initials, frequently last in line, and same.

I wonder if this was the kiln of my patience and acceptance, or if people who road rage and get frustrated with waits are more likely to have earlier lettered names?

ambrose2
0 replies
17h50m

I really cannot stand sitting in a car in traffic and my last name starts with an A, interesting!

CamelCaseName
0 replies
1d1h

Do you have something to say about this? I'm confused, why did I read this wikipedia page?

winwang
0 replies
1d

(just doing roll call here with initials WW)

underlipton
0 replies
23h28m

This seems like a good example (free of cultural baggage) of how people with privilege often don't notice that they're receiving that privilege. What seemed normal and fair to your friend turned out to be an advantage that they didn't even consider.

arp242
0 replies
23h42m

When I was a kid marbles were the big thing, and if you were playing with them in class the teacher would put it in a big glass jar. When it was full he would call out the kids and each would get a handful.

I was last in the alphabet; this was already an issue with books we had to read; you could choose which book to read, but it was always in alphabetical order. When it was my turn there were just a few left, and certainly all the popular high-demand ones were gone.

Anyway, when it finally was my turn to get my marbles he was all out. When I asked "where's my marbles?" he just shrugged and said "all out". I must've been about 7. Lots of crying ensued and I think I got some marbles from other kids, but it wasn't about the marbles – not really.

I still don't understand how anyone can expect any different result...

andoma
0 replies
1d2h

This reminds me of a funny event from when I was in fourth or fifth grade. When the class was supposed to stand in line we always had to sort based on last name. My last name started with Ö (the last letter of the alphabet in the Nordics) so I always ended up last. Then one time, the teacher said something like "Let's reverse the order today, but wait, we also sort on the first name". My first name starts with an A so I ended up last in line anyway, much to the joy of everyone :)

RheingoldRiver
0 replies
21h28m

Asking a friend whose initials are A and B about this, it's not something they ever noticed.

Kinda surprised, my last name starts with C and I was hyper-aware of this and how random it was probably all the way from kindergarten. Being a child and therefore an asshole, I was grateful for my advantage rather than thinking the system was unjust.

hilux
33 replies
1d4h

Wang noted that for a small group of graders (about 5%) that grade from Z to A, the grade gap flips as expected

This is critical. Otherwise we could not discount some group (e.g. some ethnicity) disproportionately occupying one end of the alphabet or another.

Super interesting and important finding. I hope this gets wide visibility and universities take a break from politicking to fix the problem - presumably through enforced randomizing.

buggy6257
31 replies
1d4h

Enforced randomization isn't going to fix the problem, it just evenly distributes the problem.

Based on these results, it would mean that the graders are just getting tired/lazy/inattentive the further they get in their stack of papers to grade. That's the problem that needs to be fixed, not the order they get graded in. Enforced randomization is simply a short term alleviation so no student(s) get disproportionately affected by this phenomenon.

bluGill
7 replies
1d4h

it would mean that the graders are just getting tired/lazy/inattentive the further they get in their stack of papers

Or maybe they are getting better / more picky.

I know in code reviews I often pass a few and then notice something that I realize was also wrong in previous reviews I allowed, but later reviews that day (week?) will not allow that.

13of40
2 replies
1d3h

I've participated in day-long and multi-day interview events for job candidates, and I see the same effect. At the beginning you don't have a frame of reference and you're more likely to question your own decision or give someone the benefit of the doubt, but by the end you're far more systematic, plus a little bit numb to the effect your decision is having.

throwaway35777
0 replies
1d2h

by the end you're far more systematic, plus a little bit numb to the effect your decision is having

Maybe decision fatigue is supposed to bias humans toward the optimal solution for the fiancee problem [1].

[1] https://en.m.wikipedia.org/wiki/Secretary_problem

cyanydeez
0 replies
1d3h

For grading, you could probably just add a mediating factor and throw in test cases that calibrate the factor and then you curve everyone on that factor.

It'd seemingly be more work but would result in averages that are more reasonable to the changes in stress.

labcomputer
1 replies
1d3h

Yes, and:

Additionally, universities (and, by extension departments) want grades to approximately follow a normal distribution (and yes, you in the back, their actions show they do actually want that, even if they say otherwise).

When you start grading a problem you have some idea what a "good" solution looks like, what an "ok" solution looks like, and same for "bad" solutions... If you award points based on that, the result will be a normal-ish distribution. But your idea of a good/ok/bad solution evolves as you see more papers.

There's two reasons for that:

First, you can't (ahead of time) imagine all the ways that students will invent to fuck up a problem set, and find edge cases in your grading rubric that result in unfairly-high or -low scores. As you gain experience teaching, you will anticipate more of the ways, but you will never anticipate every way.

Second, the TA/grader wants to be able to stack-rank the papers and have the scores be monotonic. The grader wants this because non-monotonic scoring triggers far more complaining than harsh scoring or picky scoring. When you come across papers that are worse than ones you've already recently graded, you assign even lower scores.

This results in a ratcheting effect with more extreme scores as you get closer to the bottom of the pile. But, since the mean score is usually a B/B-/C+ (~75-85), and since scores are usually limited to the range 0-100, this means that papers closer to the bottom will receive statistically lower scores.

Now, you could go back and re-grade ones you've already done, but:

1. The university is officially only paying you for 20hrs/week (and requires a signed end-of-semester statement attesting to the same).

2. The assigned workload of teaching and grading doesn't permit a two-pass grading scheme while keeping within 20 hours.

3. If you complain to the graduate ombudsman about the workload needing more than 20 hours, you won't have funding next semester (so you have a prisoner's dilemma among TAs who might want to grade more fairly).

4. If you're grading (say) a final exam for a frosh/soph class, you're probably in a room with 4-8 other graders late into the night. One effective way to make your coworkers hate you is to be that guy who always finishes grading his stack last, when everyone is worried about catching the last train/bus.

Basically, all the incentives are aligned to make this happen.

hilux
0 replies
15h14m

That's thought-provoking - thank you.

Essentially, unless it's an old exam where the universe of bad answers is already known, you need two passes - a discovery pass followed by the grading pass.

rjzzleep
0 replies
1d3h

I had the same conclusion. You learn things as you go, including things you don't like.

bigfudge
0 replies
1d3h

In my case, I have to make a conscious effort to remain consistently (in)tolerant of lazy writing. It’s hard to keep on reading between the lines and giving the benefit of the doubt.

throwaway35777
4 replies
1d3h

I was a grader once. I guarantee if someone gives a good answer they'll get full marks even near the bottom of the stack. For BS answers I'll admit I got less generous as the hours went on.

No one's getting hurt by this system if it's randomized. It's a matter of graders giving out partial credit for wrong answers which is discretionary. Rarely students are granted a small mercy. Seems OK.

dunham
1 replies
1d3h

I was one of many TAs for a large math class in college (pre-calc - think high school math for college students). For uniformity, the prof had the partial credit down to a science - specifying points for getting certain aspects of the problem. For the finals, a few TAs would be assigned to a given page, for uniformity.

The fascinating thing was that the distribution of grades was about the same every year.

And I had a math prof for analysis who would give negative points for BS answers. You could say “I need X but don’t know how to prove it” in the middle of a proof, but if you made up something that was incorrect, you’d get negative points.

hilux
0 replies
15h6m

Oh, that brings back memories! "For every epsilon, there is a delta ..."

bumby
1 replies
1d3h

For BS answers I'll admit I got less generous as the hours went on.

What do you think is the cause of this? Do you become more cynical (and less generous) because you’ve seen so many BS answers previously? Is it just that getting fatigued makes you less generous?

ihaveajob
0 replies
1d3h

When I was a TA in grad school, I noticed the same. Early on I thought some BS answers were at least kind of funny, and I gave them the benefit of the doubt, maybe giving more attention to the parts that were correct. After I saw similar answers later on, the novelty wore off and I was probably less amused, so the inclination to be lenient disappeared. Sometimes I went back to previous decisions if I remembered them, to be fair, but I don't think I always remembered since the volume could be high (grading 80 exams in a row is TEDIOUS).

BugsJustFindMe
4 replies
1d3h

Enforced randomization isn't going to fix the problem, it just evenly distributes the problem.

Evenly distributing the problem does fix the problem. Proportionality is what matters. Grading being arbitrary is fine if everyone is graded equally.

zeroonetwothree
1 replies
1d3h

Random order would still mean a few students in the class get unlucky and near the end the majority of the time. Although over the course of all classes it would tend to even out somewhat.

It’s certainly better than fixed order.

BugsJustFindMe
0 replies
1d3h

"randomization" is not the important part here. "evenly distributing" is. It is absolutely possible to reorder the sequence fairly such that your scenario doesn't occur. It could even to a human observer look randomized if you want. In a trivial example case where the effect were linear you could just switch the order back and forth, and on average every student would receive the same middle-of-group impact.
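A minimal sketch of the back-and-forth idea, assuming the effect depends only on (and is linear in) a paper's position in the stack. The function name `average_position` is made up for illustration; nothing here comes from the study itself.

```python
def average_position(n_students, n_assignments):
    """Mean grading position per student when the order flips every assignment."""
    totals = [0] * n_students
    for assignment in range(n_assignments):
        order = list(range(n_students))
        if assignment % 2 == 1:
            order.reverse()  # odd-numbered assignments are graded back to front
        for position, student in enumerate(order):
            totals[student] += position
    return [total / n_assignments for total in totals]


# Five students, two assignments: everyone averages the middle position.
print(average_position(5, 2))  # [2.0, 2.0, 2.0, 2.0, 2.0]
```

With an odd number of assignments the last one is unpaired, so the equalization is only exact for an even count.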

whiterknight
1 replies
1d3h

The mistake is assuming grades are an objective measurement, and not gamification to try to help you learn.

BugsJustFindMe
0 replies
1d3h

It's a common mistake. So common, in fact, that it has real practical impact on students at the edge who might not otherwise have failed or passed.

skhunted
3 replies
1d3h

For me I grade tests as follows. The stack is created as students turn in the test. I grade the first page in that order. The stack reverses for the second page. So on and so forth. I teach college math. I just can't imagine a system of grading done in alphabetical order.

falseprofit
1 replies
1d3h

Scanning and grading on a computer can alphabetize them.

skhunted
0 replies
1d2h

That makes sense. I haven't had people upload assignments for a long time. I'd forgotten that this was a thing.

kurthr
0 replies
1d3h

I also came here to say this. My only guess is that the alphabetization (by the "learning management system") is there to make filling the grades into a table "easier" for the computer or for the person handing out the results? Why is it "easier" if the system doesn't have to order them at all, or it could do so by student number (same issue as alphabetical order) or something random, which is the other (non default) option for the "learning management system".

I feel like only the most obsessive compulsive humans would have this issue (without computer "help"), as the last thing I wanted to do as a TA was to add another step of ordering all the papers before grading them. I also always reviewed the first few papers I graded after grading the rest to make sure I was being fair, because it was obvious to me that until I saw a representative distribution of answers I couldn't do fair grading/marking.

bumby
1 replies
1d3h

As the number of assignments grows, wouldn’t randomization help converge on the more accurate grades (in aggregate)?

falseprofit
0 replies
1d3h

It would help, but with only a couple dozen courses and most determined by a couple exams it’s not quite a large number.

WaitWaitWha
1 replies
1d3h

graders are just getting tired/lazy/inattentive the further they get in their stack of papers to grade.

I will admit to this. Initially, my patience and tolerance for errors is significantly higher than towards the end of the grading. By the second hour of grading, I am not only mentally exhausted, my tolerance is also significantly lower.

I try to prevent this by creating a very explicit grading rubric and I stick to it as much as possible.

ghaff
0 replies
1d3h

Clear rubrics are the thing where possible. They aren't everywhere though. I've been on conference committees and so many different factors come into play--including how late in the day it is. But, in that case, a bunch of people are rating and commenting and there's no strict order so it probably evens out to a reasonable degree.

hilux
0 replies
1d4h

In the real world, universities are never going to fix the problem of overworked and underpaid grad students getting tired.

furyofantares
0 replies
1d3h

It's a 0.6 gap from top to bottom out of a score of 100. Plus or minus a third of a percent from average. Pretty small effect. But it would add up (or, well, persist - it wouldn't get bigger) if it happens to you for every assignment for every class and that sucks.

If there's more than one assignment you can basically erase it by randomizing each separately.

If you really care beyond that then randomize for one assignment, flip it for the next, then randomize again for the next etc.
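A rough simulation of the three schemes, assuming the penalty grows linearly with position from 0 to the 0.6 points quoted above. The linear shape, the class size, and all function names are assumptions made for illustration, not something taken from the paper.

```python
import random

GAP = 0.6  # assumed top-to-bottom penalty in points (out of 100)


def penalty(position, n):
    """Assumed linear penalty: 0 for the first paper, GAP for the last."""
    return GAP * position / (n - 1)


def spread(orders):
    """Max minus min of the per-student mean penalty over all grading orders."""
    n = len(orders[0])
    totals = [0.0] * n
    for order in orders:
        for position, student in enumerate(order):
            totals[student] += penalty(position, n)
    means = [total / len(orders) for total in totals]
    return max(means) - min(means)


def fixed(n, k):
    """Same (e.g. alphabetical) order for every one of k assignments."""
    return [list(range(n)) for _ in range(k)]


def shuffled(n, k, rng):
    """Independent random order for each of k assignments."""
    orders = []
    for _ in range(k):
        order = list(range(n))
        rng.shuffle(order)
        orders.append(order)
    return orders


def shuffle_then_flip(n, k, rng):
    """Random order, then its reverse, repeated; k is assumed to be even."""
    orders = []
    for _ in range(k // 2):
        order = list(range(n))
        rng.shuffle(order)
        orders.append(order)
        orders.append(order[::-1])
    return orders


rng = random.Random(0)
print("fixed:            ", spread(fixed(30, 20)))
print("shuffled:         ", spread(shuffled(30, 20, rng)))
print("shuffle then flip:", spread(shuffle_then_flip(30, 20, rng)))
```

Under these assumptions the fixed order keeps the full gap between the luckiest and unluckiest student, independent shuffles shrink it, and the shuffle-then-flip pairing cancels it up to floating-point rounding.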

davrosthedalek
0 replies
1d3h

In my experience, it's not tired/lazy/inattentive, but resignation. You normally have some expectation what students will be able to solve. Typically, these expectations are set too high. That's very common, not only for me, but for pretty much anyone I know. So over the time of grading, one adjusts down the expectations and gives partial credit earlier, for example.

cyanydeez
0 replies
1d3h

Unfortunately, it's gonna be AI to the "rescue" and the problem is obfuscated.

andix
0 replies
1d3h

Even distribution would fix the problem. If grading has a subjective component, there will always be deviations from the "correct" grade. If those patterns are randomly distributed over all students, their grade averages will be comparable again.

freeopinion
0 replies
1d3h

My first thought was, "Who takes the time to sort before grading?" Computers change the world in such incredibly subtle ways. Of course, such subtleties exist without computers. This is just one case where computers make the subtleties more detectable.

tokai
8 replies
1d4h

One simple fix would be to make random order the default setting.

Fixed in the sense that the bias will be random. Presumably students graded last will still receive lower grades.

exe34
5 replies
1d3h

It should average out over their career at the university - whereas if the alphabetical order is kept, then they would be systematically penalised.

zeroonetwothree
4 replies
1d3h

It won’t average out perfectly. There will still be lucky and unlucky students.

Of course it’s better than a fixed order, and if it’s easy to switch then might as well. But we should keep thinking about how we can make it even better.

furyofantares
3 replies
1d3h

Since the effect looks very small, it looks to me like it's only a problem because it adds up if it happens for every assignment for every course. I don't think it needs to average out perfectly; it looks to me like you'd have to be astronomically lucky/unlucky for it to matter if each assignment is in random order.

zeroonetwothree
2 replies
1d3h

Some courses are only graded based on a small number of tests. I actually went to UM and a grade might be something like 30% midterm 60% final 10% homework (obviously different professors have different systems). In that case if you get unlucky just twice on the two tests you basically get the full penalty.

furyofantares
1 replies
1d3h

I'm not sure how much a +/- 0.3 (out of 100) deviation from average on a single course matters even if you end up dead first/last for both midterm and final in that example. I mean, it will matter sometimes. But it's (by far) not as big a deal as if it happens for all your courses.

Still, yes, you could flip the order from midterm to final instead of randomizing both and the effect goes to more like +/- 0.1 out of 100 for the luckiest and unluckiest.
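The arithmetic behind those two numbers, worked through with the 30/60/10 weighting from the parent comment. This assumes a student at the very end of the stack loses 0.3 points on that exam, one at the very front gains 0.3, and homework is unaffected by grading order; `course_shift` is just an illustrative helper.

```python
WEIGHTS = {"midterm": 0.30, "final": 0.60, "homework": 0.10}


def course_shift(shifts):
    """Weighted shift in the course grade given per-component shifts in points."""
    return sum(WEIGHTS[part] * shifts.get(part, 0.0) for part in WEIGHTS)


# Graded dead last on both exams: close to the full 0.3 penalty.
last_both = course_shift({"midterm": -0.3, "final": -0.3})

# Graded first on the midterm, last on the final (order flipped between exams).
flipped = course_shift({"midterm": +0.3, "final": -0.3})

print(round(last_both, 2), round(flipped, 2))  # -0.27 -0.09
```

The flipped case does not cancel exactly because the final carries twice the weight of the midterm.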

gwern
0 replies
1d1h

Yes, that sort of mirror-sampling would reduce variance. The problem is, though, you need to know all the uses of randomness in order to properly counterbalance them, and these systems are already enough of a pain to use.

(For example, if you have two, you can simply swap: but what about other biases? like if it's broken in half to assign to 2 grades. Or what about if there are three exams? And what about balance across other courses? if you want to do variance-reduction and tricks like antithetic sampling, you need to know all this in order to structure it properly - get it wrong, and you may make things worse.)

So that's why simple random shuffling would be preferred. It allows total ignorance of all other uses (past present and future), handles all ordering biases, and can be done independent in parallel across arbitrary sets of courses/exams/grades/students.

tetha
0 replies
1d2h

There are however other factors involved in the grade, which have a higher impact on the grade. Like, understanding of the material and ability to present a solution. - E - I'm mostly saying that because a bunch of comments are jumping on this as a significant bias against some students.

From my experience as a tutor, yes, this bias exists. But it won't turn a horribly wrong or an excellently correct solution into anything else.

I eventually knew my strugglers and my excellers. I'd skim the excellers first, because if they messed up, something bad was going on. Then I'd go through the strugglers to see problems. And then I'd grade the rest first in whatever order I got the sheets, then the strugglers and then the excellers. I needed the baseline to see how bad the worst ones actually do. Some exercise sheets were an accidental adventure, I can tell you.

And writing it like that, it sounds totally callous and cold. But focusing on the lower third in the exercises and communicating their struggles to the TA and prof was very appreciated by everyone, especially those students. It makes sure to get the important fundamentals right.

kibwen
0 replies
1d4h

It would be less than ideal, but still an improvement over the current situation as long as the order is re-randomized for every assignment, because at least then you'd only be occasionally disadvantaged rather than consistently disadvantaged.

noodlesUK
8 replies
1d4h

At my university, almost all of our marking was pseudonymised. We were assigned a random candidate number at the beginning of each year, and that is what went on our important papers/exams. The less important coursework often didn’t bother with this, and used our student numbers instead, but the general idea was the same.

We didn’t put our names on any of our work other than our dissertation (and a few trivial assignments that didn’t impact overall marks). It wasn’t that hard to de-anonymise, but it meant that the system had a bit more integrity.

It’s a really straightforward system to implement and I don’t know why it isn’t done more frequently.

I also think that our VLE sorted assignments by time of submission rather than any identifier.

ghaff
2 replies
1d3h

University exams, this probably makes a lot of sense. After all, the exam is the exam and whether a student is well-spoken and actively participates in class shouldn't matter for an exam grade. I'm less convinced that blinded conference proposals are a good idea--an argument I've had with various people. If you know based on past experience that someone will almost certainly hit a home run, I'm less inclined to pick a random person without obvious qualifications for the same topic--although just picking friends of the committee can obviously go too far.

wongarsu
1 replies
1d1h

You could try to work around that by first grading all anonymized proposals, then grading all potential speakers without knowing their proposal. In the third round you deanonymize and look at the weighted average of the two grades. You probably still need some judgment calls because the combination of speaker and topic can be important. But the score would give you a good base to work off.

Maybe you could make it even more impartial by allowing conditional scores in the first two rounds. Like "Jim is a 6, but an 8 if his talk is about molecular biology" or "this Lessons Learnt talk is a 5, but if it's by X, Y or Z it's a 9"
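A small sketch of how the third-round score could be computed. The data layout (a base score plus an optional `"if"` table of conditional scores), the equal weighting, and the function names are all assumptions for illustration.

```python
def resolve(score, key):
    """Return the conditional score for `key` if one was given, else the base."""
    return score.get("if", {}).get(key, score["base"])


def combined_score(proposal, speaker, topic, speaker_name, proposal_weight=0.5):
    """Weighted average of the blind proposal score and the blind speaker score."""
    proposal_score = resolve(proposal, speaker_name)  # conditional on who speaks
    speaker_score = resolve(speaker, topic)           # conditional on the topic
    return proposal_weight * proposal_score + (1 - proposal_weight) * speaker_score


# Round 2: "Jim is a 6, but an 8 if his talk is about molecular biology"
jim = {"base": 6, "if": {"molecular biology": 8}}
# Round 1: "this Lessons Learnt talk is a 5, but if it's by X, Y or Z it's a 9"
lessons_learnt = {"base": 5, "if": {"X": 9, "Y": 9, "Z": 9}}
# A hypothetical molecular biology proposal with no conditions attached
molbio_talk = {"base": 7}

print(combined_score(lessons_learnt, jim, "postmortems", "Jim"))     # 5.5
print(combined_score(molbio_talk, jim, "molecular biology", "Jim"))  # 7.5
```

The judgment calls mentioned above would still sit on top of this; the number is only a starting point.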

ghaff
0 replies
1d1h

Yeah, but I'm not sure conference proposals by themselves actually have a lot of value given that, in many cases (ask me how I know), the presentations won't actually exist until a week or two before the event.

Certainly a talk by X that's totally unconnected from anything they're directly involved with has less value.

__MatrixMan__
1 replies
1d3h

I think I get better feedback when the teacher knows who I am. Grades are secondary.

ghaff
0 replies
1d2h

I'm not sure exam grades at the university level are really the place to get useful feedback beyond grades.

xhkkffbf
0 replies
1d3h

I think the point is that some automated systems like Canvas may hide the names, but they're still presented in alphabetical order. Pseudonyms don't help if you don't shuffle them.

trescenzi
0 replies
1d2h

Wouldn't a possible outcome here though be that it just randomly reduces grades instead of reducing them in a way that's related to the students? If the issue is the sorting the random candidate numbers would still be sorted. It solves the problem of bias related to the individual but it doesn't solve the problem of bias related to the way that the submissions are sorted.

A random identifier coupled with a random sort order seem like the way to go here.
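A minimal sketch of that combination, assuming candidate numbers are assigned once per year and the grading order is reshuffled for every assignment. The function names and the five-digit ID range are illustrative choices, not any real system's behavior.

```python
import random


def assign_candidate_ids(names, rng):
    """Give each student a distinct random five-digit candidate number."""
    ids = rng.sample(range(10000, 100000), len(names))
    return dict(zip(names, ids))


def grading_order(candidate_ids, rng):
    """Fresh random order of candidate numbers for one assignment."""
    order = list(candidate_ids.values())
    rng.shuffle(order)
    return order


rng = random.Random()  # unseeded: different results on every run
ids = assign_candidate_ids(["Ahmed", "Lee", "Zigfeld"], rng)

# Sorting by candidate number hides names but repeats the same order
# for every assignment that year.
print(sorted(ids.values()))

# Shuffling per assignment removes that fixed order as well.
print(grading_order(ids, rng))
print(grading_order(ids, rng))
```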

omoikane
0 replies
1d2h

I had classes like that, where at the beginning of the quarter, each student gets assigned a username of the form "<course id> <three alpha characters>" and all participation is based on username from then on. Even though the usernames are seemingly random, certain usernames started gaining reputations on the class discussion forums, and students come to recognize some names.

But computer science courses tend to have very objective rubrics for grading, so I am not sure the anonymity mattered much.

llm_trw
8 replies
1d3h

I'm willing to take bets that in 15 years there will be a scandal about faked data by at least one of the researchers in this paper.

It smells just like every other interesting psychology result that at best is a fluke.

verdverm
5 replies
1d3h

Unlikely. If you talk with anyone who's done grading, this will likely jive with our experience and make us data aware of the outcomes. Like anything, with grading you can get into a flow, and the more you process an assignment, the more answers you've seen and those can change how you grade future answers

zeroonetwothree
2 replies
1d3h

I really doubt you can notice a 0.6% discrepancy anecdotally. They only detected it in the study because of the massive amount of data they used.

Classic confirmation bias.

verdverm
1 replies
1d2h

Anecdotally, I would go back and adjust grades on individual problems from earlier in the stack.

I can very easily notice my own over strictness from early in the stack.

2cynykyl
0 replies
1d1h

For sure. I also find I have to update my rubric to give more/less part marks, which also requires going back. It takes about 10-15 graded papers before things settle down.

somenameforme
1 replies
1d3h

Not really taking a position on this one way or the other, but I would say that "this jives with my experience" is near to being a prerequisite for junk science. Somebody saying something controversial is going to be challenged -- confirming biases is precisely how you peddle junk.

For instance the Journal of Personality and Social Psychology [1] is a terrible journal, with a replication success rate in the 20% range. Yet it's ironically well regarded. Both can probably be explained by the exact same phenomenon - go read their articles and they read like a stream of bias confirmations for those of a certain ideological orientation -- the same orientation that's clearly widely shared amongst social science researchers.

[1] - https://psycnet.apa.org/PsycARTICLES/journal/psp/126/2

verdverm
0 replies
1d2h

I absolutely observed my own biases and created techniques to mitigate... a few that come to mind

1. Grade problem by problem. This actually makes grading sooo much easier on your own mind

2. Take a second pass to look for outliers in consistency

3. When possible, craft problems that can be automatically graded for correctness. This leaves more time for commentary on the quality of the solution

(I taught computer science, which lends itself to some of this)

The harder bias to handle is the one you develop for students one way or another through the course of a semester or course. Perceived effort shifts grades

zeroonetwothree
0 replies
1d3h

I think it’s maybe less likely since this is looking at actual grades and not some kind of survey or experiment. But certainly it’s always a concern in social sciences until we get reproduction.

hilux
0 replies
1d3h

The result seems pretty intuitive to me. The test is easy to re-run, unless the data have been "lost," which is not mentioned.

Most importantly, none of the researchers is a psychologist or behavioral economist or any kind of "social scientist."

ryandrake
7 replies
1d3h

Maybe related, or maybe not, but I remember when I was in K-12 school back in the 80s and early 90s, they would always seat us physically in the class front-to-back by last name. So the kids with last names starting with A-D or so would always be in front, and the kids with last names starting with U-Z would be in the back. For every class. I remember this because many of my friends had last names "near" my last name since we were always in close proximity to each other. I vaguely remember, by the time we were in high school, there were definitely more high-achieving kids with A-D last names and definitely more of the troublemakers were U-Z. Was it caused by sitting in closer proximity to the teacher and getting more teacher attention? We'll never know because this wasn't an experiment and there wasn't a control group.

wongarsu
3 replies
1d2h

"students who sit closer are more likely to be high achievers" might also be the source of most of the stereotypes of people with glasses. It took me years to realize I'm mildly shortsighted, so the first half of school I chose seats in the front half of the classroom to make reading the blackboard easier. Many of my friends had glasses and preferred to sit up front because their glasses didn't fully correct their vision.

RheingoldRiver
1 replies
21h30m

In a somewhat reverse scenario, when I was in 4th grade (9 years old), I knew 100% that I was getting nearsighted, and I absolutely did NOT want glasses. Fortunately (debatable) we got to pick our seats so I always picked a seat in the very first row, where I could kinda-sorta-almost see what was written on the board if I squinted. And I was also way above my grade level so I was able to fake it pretty well for most of the year even when this started to fail me. My mom insisted on taking me to get my eyes checked about 2/3 of the way through the year and I couldn't fake my way through that, though, so I finally got glasses, but by that point I was used to sitting at the front of the room, so I choose front-of-room seats when possible for most of the rest of my schooling. There's probably some moral here but I don't know what it is.

smeej
0 replies
19h31m

I moved states and schools midway through 3rd grade and was seated alphabetically, in the back, for the first time in my life. The teachers in my previous school knew me to be a model student, so would sit me up front "to set an example."

My parents couldn't figure out for the life of them why I was suddenly struggling and thought I was having adjustment issues. I had taught myself to read when I was 3; how could I suddenly be having trouble keeping up?

It took longer to figure out because I was only nearsighted in one eye. I was tall for my grade, so as long as the person in front of me to the left was shorter than me or the teacher was writing high enough on the board, I was fine, because my left eye was fine. But when everything aligned just wrong, I was suddenly helpless, because my right eye could barely see clearly an arm's length from my face! It's a hard thing to notice when only one of your eyes isn't working very well, especially when you're 9.

Ekaros
0 replies
1d1h

I remember at that age that my sight was going worse quite quickly. So in process there will be many points where your glasses might be slightly lacking.

nsriv
1 replies
1d2h

I'm a teacher now, and this made me wince. It's exactly how I've been told by my parents that seating worked for them in school (India, 60s-80s) but their grading was done by semi-anonymous roll numbers.

user_7832
0 replies
1d2h

Today I'm 99% sure all CBSE board exams (I think equivalent to A-levels?) are randomized heavily. However I did notice the name's alphabetical order effect in school, albeit in a minor way (folks with later letters were less involved in anything a teacher might need a volunteer for).

mertd
0 replies
1d1h

Circular shift is the trivial solution. In my high school every row moved up on Mondays and the front row moved to the back. Of course you could argue the ones who started at the front on week 1 still have an advantage but it's likely not that significant.
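
For illustration, the weekly rotation can be sketched in a few lines of Python. The row labels and the `rotate_rows` helper are made up for this example, not anything the school actually used:

```python
from collections import deque

def rotate_rows(rows, weeks):
    """Return the seating after `weeks` Monday rotations.

    rows[0] is the front row. Each week every row moves up by one
    and the current front row goes to the back.
    """
    seating = deque(rows)
    seating.rotate(-weeks)  # negative = rotate left: front goes to the back
    return list(seating)

rows = ["row A", "row B", "row C", "row D"]  # hypothetical classroom
for week in range(len(rows) + 1):
    print(week, rotate_rows(rows, week))
```

After as many weeks as there are rows, everyone is back where they started, so over a full cycle each row spends the same amount of time at the front.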

jedberg
6 replies
1d2h

This is basically the reason my kids have the last name that they do.

My last name starts with E and my wife's with Y. Bucking tradition, she didn't change her name when we got married, so when we had kids we had to decide what name to give them. We opted to hyphenate.

Historically, hyphenated last names were [Woman's last name]-[Man's last name]. However, my wife hated that her last name was near the end of the alphabet growing up.

We bucked tradition once again and put my name first, so that when sorted alphabetically they would be at the front of the list. Incidentally their first names start with A and B so that they show up at the front when sorted by first name too.

lelanthran
2 replies
1d1h

Bucking tradition, she didn't change her name when we got married,

Unless you were married earlier than the 90s, I wouldn't really call that "bucking tradition" any time from, say, the mid-90s onwards.

If you really want to buck tradition, then don't get married - just live together, and have kids :-)

(After all, there's nothing more traditional than marriage, is there?)

jedberg
1 replies
23h45m

In the US, 80% of women still take their husband’s last name.

But you hit on an important point — a lot of couples are just skipping marriage now.

We went halfway there — we bought the house together years before we got married.

zeroonetwothree
0 replies
22h51m

Owning a house together is probably a more serious commitment anyway

zvolsky
0 replies
1d1h

Haha, I've always enjoyed being at the end getting less attention from teachers. If the data merely shows a correlation, it may as well be explained by us at the end being under less pressure.

throw_pm23
0 replies
1d2h

Wow, you really gave your children a headstart there :)

mjh2539
0 replies
13h34m

In Latin American countries (and Spain) the paternal surname goes first, followed by the maternal surname.

dotnet00
6 replies
1d4h

Yep, I noticed this with myself too when I first did some grading a few months ago.

There was also the factor that the ones I graded initially did not make certain mistakes or answered in expected ways, such that when I did encounter unexpected answers/mistakes, I had to go back and rethink the grading on the papers I had graded previously. E.g., if someone answered in a way that made me think an answer I considered incorrect was actually less wrong.

I only had to deal with a small class, so backtracking was doable and I graded the papers in whatever shuffled up order they were turned in, otherwise there would have definitely been a bias.

JadeNB
4 replies
1d2h

I only had to deal with a small class, so backtracking was doable and I graded the papers in whatever shuffled up order they were turned in, otherwise there would have definitely been a bias.

Grading papers in submission order just introduces a different bias, though.

(For what it's worth, I'm in the same boat and I do the same, because I don't trust my ability to give the papers any true random sorting by hand, so I take the very weak randomization that the submission order gives me.)

dotnet00
3 replies
1d2h

Introducing a slight bias factor that is randomized each time results in a lower average bias compared to a bias factor that is the same every time. Plus, as these weren't take-home assignments, I think someone finishing earlier is more likely to be either someone who was already going to score well, or someone who was already going to make the most common errors.

withinboredom
1 replies
1d1h

I take tests extremely quickly, I either know the answer or guess it from what I know. I don't think about it. I was usually one of the first people to turn in tests.

I was usually (almost always) the last person to turn in assignments, I like to be one of the last people out of a door or the last person in a line (I don't like crowds).

Grading by order-turned-in would almost always mean my assignment would be one of the first or last ones graded.

If I were to guess, if you did a frequency analysis of people to order, you'd find there was always a certain group who turned it in first, and another group that turned it in last.

brewdad
0 replies
15h41m

You need to find a classmate to be a chaos gremlin that randomly mixes up the pile when they drop off their assignments.

JadeNB
0 replies
1d1h

Introducing a slight bias factor that is randomized each time results in a lower average bias compared to a bias factor that is the same every time.

That's what I'm saying—it's reasonable to believe that the submission time is correlated with other factors, such as ability or confidence (though the effect can cut both ways, with extremely able students submitting early because they finish early or late because they are extra careful, and similarly for other factors). Thus, this isn't really randomization, just correlation with another factor than the name.

bee_rider
0 replies
1d3h

I especially noticed this when grading programming projects, because it is slightly complicated.

I’d either find that:

A bug was really common, got to re-evaluate after the first couple times I see it, apparently it is an easy mistake to make.

Or, I’d find a new bug that was pretty common, but which I didn’t know about at first. Got to update my tests and re-run everybody.

I tended to be really thorough and re-do the whole stack eventually, but it was a real pain. Could have half-assed it of course, but they spend weeks on these things, feel like I owe them honest feedback.

It would tend to lead me to “softer” grading as well, if you are lazy and only check for a couple bugs, you might take a large number of points off for each problem. Finding some problems and punishing them harshly is not very fair for those students that randomly hit the bugs you expect. If you find every bug, you can only take a couple points off per bug without tanking everybody’s score.

princeb
4 replies
1d4h

“We kind of suspect that fatigue is one of the major factors that is driving this effect, because when you’re working on something for a long period of time, you get tired and then you start to lose your attention and your cognitive abilities are dropping,” Pei said.

there is a similar effect found here https://en.wikipedia.org/wiki/Hungry_judge_effect

zeroonetwothree
2 replies
1d3h

The thing is, it’s unclear why that effect would make you give people lower grades. Surely an equally reasonable guess is that reduced cognitive ability could make you give higher grades because you don’t notice errors?

janci
0 replies
23h45m

Sometimes you see the result is wrong so you do not give any points initially and then look on the steps and try to find something that looks correct to give at least some points. The willingness to track through every step diminishes with increasing fatigue.

bee_rider
0 replies
1d3h

It depends on what you are doing and how you are grading. I’d try to not take many points off if an error is somehow “really easy to make,” but that depends on my ability to evaluate the difficulty of mistakes.

tokai
0 replies
1d3h

I believe the hungry judge effect has generally been accepted as false.

xmddmx
3 replies
1d3h

Is anyone confused by "lower-ranked names"? To me this means A, B, C, but the article says "Wang said students whose surnames start with A, B, C, D or E received a 0.3-point higher grade out of 100 possible points than compared with when they were graded randomly."

So I guess "alphabetically lower ranked" means the last letters of the alphabet, not first? Confused.

samatman
0 replies
1d3h

This is an important observation!

The programmer's perspective and the user's perspective aren't always the same, and both need consideration. A user is going to see a list: it starts at the top, and it ends at the bottom. The first fields are higher, the later fields are lower.

Of course, if this is a sorted list, the first field will be the "lowest" value, for whatever comparison is used to sort it.

pks016
0 replies
1d1h

Yes, while grading we divide the students by their last names.

ghaff
0 replies
1d2h

Yeah, I misunderstood this at first and then was somewhat confused by the comments until I actually clicked through and looked at the post. :-)

I can actually believe the effect going in either direction and it's small.

jimmar
3 replies
1d4h

Order effects are real. I'm a prof. I notice that the longer I grade, the less motivated I am to take off points and then justify why I took off those points. It's easier just to give points and move on. (And if anybody wants to criticize this, I'll be happy to launch into a diatribe on the psychometric dumpster fire that most assignments and their associated grading scales really are.)

zeroonetwothree
1 replies
1d3h

This is the opposite of the effect they found. I do wonder if there is a big difference depending on grader and the study found some kind of average.

jimmar
0 replies
1d3h

The article mentions that the paper is under review, but I'm guessing the effect size is small and that individual differences between graders are very substantial. The article states:

The researchers collected available historical data of all programs, students and assignments on Canvas from the fall 2014 semester to the summer 2022 semester.

Thousands of students X 8 years X lots of assignments per year and you get a sample size so big that it would be hard not to find statistically significant effects.
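
A rough back-of-the-envelope sketch of that point, assuming a simple two-group comparison. The standard deviation of 15 points and the group sizes below are invented for illustration; they are not figures from the study:

```python
import math

def z_score(gap, sd, n_per_group):
    """z statistic for a difference in means between two equal-sized groups.

    Assumes both groups share the same standard deviation `sd`.
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    return gap / standard_error

# Same 0.6-point gap, very different sample sizes (all numbers hypothetical).
for n in (100, 10_000, 1_000_000):
    print(n, round(z_score(gap=0.6, sd=15, n_per_group=n), 2))
```

The gap stays the same while the standard error shrinks with the square root of the sample size, so a tiny effect that is invisible in one class can clear the usual 1.96 threshold easily once the data covers years of assignments.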

dgacmu
0 replies
1d3h

Also prof: me too. I'm much more likely to provide comments on the first couple of exams I grade than on the later ones.

I've found that gradescope is helpful in this regard, because it at least forces every point assignment to be matched to a rubric item. I don't have data, but I believe it makes our grading a lot more uniform compared to the pre-gradescope days. (This might be easier in grading computer science exams than in more subjective areas, though.)

candrewlee14
3 replies
1d

Serious unintended consequences of ordering… Reminds me of the hungry judge effect [1] - judges tend to be more harsh before a break and more lenient after.

[1] https://en.m.wikipedia.org/wiki/Hungry_judge_effect

thaumasiotes
2 replies
1d

https://nautil.us/impossibly-hungry-judges-236688/

we should dismiss this finding, simply because it is impossible. When we interpret how impossibly large the effect size is, anyone with even a modest understanding of psychology should be able to conclude that it is impossible that this data pattern is caused by a psychological mechanism. As psychologists, we shouldn’t teach or cite this finding, nor use it in policy decisions as an example of psychological bias in decision making.

SamBam
1 replies
16h43m

Odd article. It simply states that the effect size is too big to be believable (it repeatedly calls it "impossible," but it doesn't seem like it can possibly mean "literally impossible" or "mathematically impossible"). It doesn't give any alternative explanations or specific ways the study is wrong. And it links to a rebuttal by the original authors where they responded to a bunch of the suggestions for data error or confounding factors and found that their results remain.

thaumasiotes
0 replies
16h24m

That is explained in pretty much the section I quoted. The explanation of the effect is given in the article's links.

But the article is written specifically to make the point that it should be enough to observe that it isn't possible for the effect to be real. You aren't making a good point when you cite an effect that is obviously nonsense.

stikit
2 replies
1d3h

A .3 point difference isn’t going to make a real difference to anyone’s life and is likely a wash when other yet undiscovered biases are in the mix. Unfairness and bias is a critical factor in driving people to extraordinary achievements.

wolverine876
0 replies
17h43m

Unfairness and bias is a critical factor in driving people to extraordinary achievements.

The evidence shows a strong negative correlation between bias and achievement: extraordinary achievements are disproportionately made by people in groups that are not the target of bias. Look at top government officials, SV leaders, Nobel Prize winners, etc etc - mostly white males.

The biggest targets of bias in the US, for example - probably women and black people - generally get the worst results (in areas where there is discrimination). By contrast, wherever black people aren't subject to bias, such as in certain forms of music and certain sports, achievement is extraordinary. Imagine all that talent and drive in other fields.

inemesitaffia
0 replies
9h19m

It stacks over time

redandblack
2 replies
1d2h

When I studied engineering in India, we never put our names on the finals at college. Everyone gets an exam ID and that goes on the answer sheets.

Also, it is never your professor who grades you - the answer sheets are collected and lecturers/professors will correct them at the state level across all the engineering colleges in my state.

I do not know how it is now as there has been an explosion of colleges in the state. But expect the standardized tests are similarly conducted.

user_7832
0 replies
1d2h

As far as I know even now it's the same for government universities (eg Delhi/Mumbai Uni). But private unis may just have a few/one profs grade everything.

kwhitefoot
0 replies
1d1h

A lot of bachelor's degrees these days are awarded on the basis of modules with no finals. For instance when I did a course on C# a few years ago in Norway that was worth 6 points (I got full marks :-) ). If I had done another 29 modules of similar difficulty I would have got 180 points and been awarded a BSc in Computer Science.

It's quite different from the way it was when I studied physics in the 1970s when only the final counted. Annual exams only determined whether one was allowed to continue but had no effect on the class of degree that was awarded.

klysm
2 replies
1d3h

Job interviews have similar effects

1-6
1 replies
1d3h

Order matters a lot but recruiters typically present the highest flyers first and the lower candidates last.

ghaff
0 replies
1d3h

In my experience, it varies. I've been on interview panels where we just weren't feeling it for a number of candidates and basically told the recruiter to try harder and eventually hit someone where we said "That's who we want. Find a way to make it happen."

dcposch
2 replies
15h12m

I bet this correlation goes away if you separate the data by ethnicity.

carabiner
1 replies
15h6m

Yeah Chen, Cho, and Cohen are up there and would bias results.

justrealist
0 replies
13h37m

Wang, Zhao, Xi.

zeroonetwothree
0 replies
1d3h

This looks like one of the classic studies that won’t reproduce. For one thing, the effect size is unreasonably large. 50% more positive words just because of sequence order would be so huge we should be able to notice it anecdotally.

1-6
2 replies
1d3h

Let’s just hope parents don’t try to game the system by starting to name their kids AAAi Aung.

nsenifty
0 replies
1d3h

I'm Indian (in the US) and I've noticed a vast majority of my Indian friends name their kids Aanav, Aanir or Aanvi etc. some of which aren't even words in any Indian language. Now I probably know why.

jen20
0 replies
1d3h

Fortunately Bobby is near the front of the alphabet anyway!

xyst
1 replies
1d3h

That 0.6 pt gap over multiple semesters is the difference between graduating with “summa cum laude” or “magna cum laude”

zeroonetwothree
0 replies
1d3h

It’s 0.6% so it would only be if you happened to drop a letter grade as a result. Like 90.5 -> 89.9. And that would have to happen multiple times to significantly affect your GPA.

retrac
1 replies
23h32m

Electoral ballots have often listed the candidates in alphabetical order, but some studies have suggested that it gives a small benefit to the first person listed. [1] Many election authorities, in Canada at least, have shifted to randomizing the order in some way [2]. Some people have even played with alphabetical sort for novelty purposes; a man in Ontario changed his legal name to "Above Znoneofthe" so he would appear last on the ballot as "Znoneofthe, Above".

[1] https://electionlab.mit.edu/research/ballot-order-effects

[2] https://www.cbc.ca/news/canada/british-columbia/vancouver-do...

zeroonetwothree
0 replies
22h52m

In the US it’s usually randomised as well

nebulous1
1 replies
1d3h

I wonder why Helen Wang chose this as a research topic

jeegsy
0 replies
1d2h

Well spotted!

zeroonetwothree
0 replies
1d3h

If anything the difference only being 0.6% seems pretty impressive for the brain.

danilor
1 replies
1d3h

Has anybody found a link to this study? Or even the title?

I searched the authors in google scholar but I couldn't find it.

analog31
1 replies
1d3h

I propose one of the following:

1. Keep the present system of grading by alphabetical order

2. Record the order in which the papers are actually graded

When the grading is done, the teacher assigns a point scale (A = 90, B = 80 or whatever) but the computer does a regression fit and removes the bias.
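
A minimal sketch of what that regression step could look like, assuming the bias is roughly linear in grading position and that grading position is unrelated to the students' actual ability. Both are big assumptions, and `remove_order_trend` is a name made up for this example:

```python
from statistics import mean

def remove_order_trend(scores):
    """Remove a linear trend over grading position from raw scores.

    scores[i] is the raw score of the i-th paper graded. A least-squares
    line score ~ a + b * position is fitted, and the b * (position - mean
    position) part is subtracted, so the class average stays unchanged.
    """
    n = len(scores)
    if n < 2:
        return list(scores)  # nothing to fit with fewer than two papers
    positions = range(n)
    x_bar = mean(positions)
    y_bar = mean(scores)
    sxx = sum((x - x_bar) ** 2 for x in positions)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(positions, scores))
    slope = sxy / sxx
    return [y - slope * (x - x_bar) for x, y in zip(positions, scores)]

# Hypothetical stack where the grader gets half a point harsher per paper.
raw = [90 - 0.5 * i for i in range(10)]
print(remove_order_trend(raw))
```

If the grading order were alphabetical and surnames happened to correlate with real performance, this would also subtract a real difference, which is one reason to record the actual order and randomize it rather than rely on the fit alone.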

2cynykyl
0 replies
1d1h

This is a great idea! Next time I mark a stack of exams I will also note the time of day that the mark was entered. I can then cross-reference this with how long I have been sitting between breaks, since my last meal, etc, etc. Unfortunately I will not have this opportunity until mid-fall 2024.

yencabulator
0 replies
21h31m

It seems it would take less time for Instructure, Inc. (makers of the mentioned software) to fix this than it took to do this research.

Anyone know whether this is happening, and if not why not?

underseacables
0 replies
1d4h

Anchoring?

stevage
0 replies
19h0m

Would it be possible to simply accept that this exists and automatically unskew the grades after marking?

shipmaster
0 replies
1d1h

My last name starts with a letter at the bottom of the alphabet. I notice this all the time. Anecdote from this year: My son is in a high school class that requires constant input from the teacher on long running projects they have. The teacher reviews the projects alphabetically by surname, about 40% of the time, the teacher never gets to the bottom of the class, and asks the students to find her after school if they have issues. But the nature of the projects definitely requires proactive comments from the teacher. I ask my son to go find the teacher regardless and get a pro-active review, but not all the kids do that, and hence the potential for a lower grade.

samatman
0 replies
1d3h

A computer-based system like this is an opportunity to remove all personal details from an assignment while grading it, it baffles me that this isn't the default.

The database could tag every assignment with a UUID4, and present them for grading top-to-bottom in UUID lexical order, without exposing who is being graded in any way.

You can't fix fatigue bias, but this would distribute it randomly. It also removes the opportunity for favoritism and hostility, subconscious or otherwise, which is probably more important.

Once grading is completed, the assignments are reconnected with students. Give the profs a way to mark assignments with metadata, sometimes they need to talk to a student personally about something, this should be made easy.

Grades can't be immutable, professors need discretion in that, but it would leave an audit trail if professors maliciously modified grades (or the opposite). That should be uncommon to begin with, but both professors and students benefit from an audit trail here.

A system like this should be used whenever it's practical, and always for high-stakes tests like midterms and finals. Not making a case against oral exams here, just that when it's possible to blind the grading process, it should be.
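
To make the idea concrete, here is a rough sketch of the blinding step in Python. The function names and student IDs are hypothetical, and a real system would keep the key-to-student mapping somewhere the grader cannot read it:

```python
import uuid

def blind_queue(student_ids):
    """Give each submission a random UUID4 key.

    Returns the keys in lexical order (which is effectively a random
    grading order) plus the private key -> student mapping.
    """
    key_to_student = {str(uuid.uuid4()): sid for sid in student_ids}
    grading_order = sorted(key_to_student)
    return grading_order, key_to_student

def reconnect(grades_by_key, key_to_student):
    """After grading, map the blinded grades back to students."""
    return {key_to_student[key]: grade for key, grade in grades_by_key.items()}

order, mapping = blind_queue(["s001", "s002", "s003"])  # made-up student IDs
grades = {key: 85 for key in order}  # the grader only ever sees the keys
print(reconnect(grades, mapping))
```

Generating fresh keys per assignment means the same student lands in a different spot in the queue each time, so fatigue effects get spread around instead of hitting the same names.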

redandblack
0 replies
1d2h

The other benefit for being higher in the alpha order is you get the snow day calls first - 4:30 am, and get to call your friends before school calls them.

We were always woken up by my daughter screaming as her friends called her. No such luck for the post-pandemic kids.

prof-dr-ir
0 replies
1d2h

Randomizing the grading order just hides the problem at the level of an individual course, but at least it helps in the average.

More worrying is when e.g. job candidates are discussed (often in alphabetical order) and people simply tire out near the end of the meeting. When this happens, be sure to suggest taking a break!

pavlov
0 replies
1d1h

Clearly evidence of anti-Polish bias when all the Zbigniews and Zygmunts and Wojteks get lower grades. (Or just another example of correlation vs. causation in action)

p0w3n3d
0 replies
1d3h

Just do name coding. I doubt this happens everywhere in the world

mistrial9
0 replies
1d3h

current curricular trends in California include "algebra removed from 8th grade as unfair" (or more extreme rhetoric given) and this week "equity grading for K-8" where there is no D or F given in any subject. These real-life changes combined with something so arbitrary as this one as "news" really give an impression of a collapse of some kind in public education discourse.

markusde
0 replies
1d

I noticed this in myself last time I was a TA. I'd go back and re-grade the first 15 assignments or so to make sure the rules were being applied consistently.

m3kw9
0 replies
1d3h

But all the Wangs and Xiangs and Zhus are still getting high grades

levocardia
0 replies
1d2h

Wang said students whose surnames start with A, B, C, D or E received a 0.3-point higher grade out of 100 possible points than compared with when they were graded randomly. Likewise, students with later-in-the-alphabet surnames received a 0.3-point lower grade — creating a 0.6-point gap.

The hand-wringing over such a small effect size seems unwarranted. I suspect you would find similar effect sizes for other small interventions, like whether the grading took place during the week or the weekend, or in the morning vs. the evening.

largbae
0 replies
1d3h

What other popular systems might lead to different outcomes based on sort order? Dating site matches? Your own contact list?

Interesting category of problems...

jncfhnb
0 replies
1d3h

Most exam grading is not viewing the writing as a whole but rather looking for instances of specific points to assign credit for. One could imagine an LLM being quite effective at labeling sentences as pertaining to a predefined idea at scale.

huffmsa
0 replies
1d2h

I had a theory in school that this was the case for presentations too so I always forced myself to go first. No one else to compare me against, and no sitting around getting jittery.

ghghgfdfgh
0 replies
18h49m

There's a section of one of the Diary of a Wimpy Kid books that talks about this exact thing. I was reminded of it as soon as I saw the headline. The justification it comes up with is that kids with names at the front of the alphabet sit in the front of the classroom, so they get called on and learn more. It definitely turned some gears in my brain when I first read it as a teen. Here's the relevant page: https://imgur.com/a/6wIx6qg

flawsofar
0 replies
1d2h

what’s weird is just how long it took to find a statistic like this one

faitswulff
0 replies
1d3h

I wonder if these biases are replicable in LLMs.

diogenescynic
0 replies
15h17m

It's the same with applying to jobs. The first applicants have a greater likelihood to get the job. If you're given a list of names... you're just generally more likely to pick something from the top of the list than the bottom.

corimaith
0 replies
23h54m

If we changed our policy of exams from discriminative to evaluative, grading bias would be a trivial issue but here we are since we just NEED ways to fit everyone into numbers that we can easily use.

cm2187
0 replies
1d3h

We know there are big disparities of academic success by ethnic group (cf the whole harvard discrimination against asians controversy), and there are also big concentrations of patronyms by ethnic groups (or at the minimum first letters that are more common in one part of the world than another). And on top of that if the university itself discriminates against certain ethnic groups in its recruitment it will reinforce this bias (like if asians students require better grades to get in, it is unsurprising those students that get in perform better than the rest).

That would be my best guess for a rationale behind that result.

beryilma
0 replies
1d2h

With huge grade inflation in US universities, all students are already getting better grades than they really deserve. The amount of gymnastics that professors do to pass all students is insane. So, no student is really receiving a lower grade.

TrianguloY
0 replies
1d1h

I also have the theory that having an app/software starting with A, B, or an "alphabetically first" letter was noticeable in the past. Nowadays things are usually sorted "algorithmically", but it was common for stores to list searches with some alphabetical score, which meant that those apps were usually shown first.

Even now, for example, if you go to Play Store and want to know the apps that you had but are not installed, the default sorting is by name.

TrianguloY
0 replies
1d1h

As a different but similar situation: I have a first name that is usually at the top when sorted alphabetically. Nowadays it's not a problem anymore, but as a kid I usually received a lot of calls from people that either misclicked or didn't know how to use a phone properly. It turned out it was because I was the first on the phonebook list.

StefanBatory
0 replies
1d2h

I have a surname that's alphabetically low. Even at uni, the number of times I went to class and came out empty-handed because my teacher didn't score my assignment in time (at my uni, 90% of the time we have an oral discussion about it), so that I had to come back the next week while others didn't, is way too high.

RecycledEle
0 replies
1d1h

I can explain why the kids with A names outperform the kids with Z names.

As someone whose first and last names are both very early in the alphabet, I was always called on first or second when I was in elementary school and middle school. I always had to be there early.

My friend whose name was very late in the alphabet learned he did not have to be ready for the first minute or two of class.

He would be standing near the door talking as I was quickly pulling out last night's homework, and I would be marked down for not being ready while he would later be commended for being ready when the teacher called his name.

As a teacher, I see that the kids who stand outside the door talking do not do as well as the kids who are there early.

COGlory
0 replies
1d3h

Multiple factors at play here.

1) Rubrics are often defined, but the application of the rubric is by a human. Application will shift as the grader gets a sense of the class's understanding.

2) As you get fatigued while grading, you'll make mistakes, and be less tolerant of others. Especially if you're an overworked adjunct or graduate student.

3) There are probably a lot more last names early in the alphabet so weighting is important.

My policy on this when I was a grad student was to publish the rubric, and ask all students to check their grades too.

Aldo_MX
0 replies
1d4h

Maybe the answer is smaller groups?

1shooner
0 replies
1d3h

This reminds me of an experience I had of just the opposite: tightly-controlled consistency in writing assessments:

Almost 20 years ago I worked for a standardized test essay grading service. We graded against all sorts of secondary-level rubrics (not AP, who do their own). These would usually be from 9th to 12th grade, from every US state, and evaluating everything from reading comprehension to subject matter-specific assessment. We'd do weeks-long jobs of a single test (e.g. Alabama 9th grade reading proficiency). These usually had at least 3 dimensions, and at least 4 points per dimension. We would go through a week or more of training on a rubric, then another week of 'leveling', where a manager would occasionally bring you aside and talk through why that '3' you gave on a dimension should have been a '2'.

By the end of the training, we usually had had enough discussions and encountered enough edge cases to understand the weaknesses/inconsistencies in the rubric (which we had to abide by anyway). Once we were running at full-speed, everything was still double-graded and inconsistent scores were reviewed. Sometimes graders were pulled if they still didn't get the rubric.

It was a simultaneously stimulating and very boring job, and most readers were educators themselves. I wonder how long before it disappears completely.