How to scale grades to take into account how strong the students are?

Question

I am interested in different methods for scaling grades from a class test, say. In this case there is only one test and this gives the whole class grade. I remember from years ago a method that took into account the relative strength of the students taking the class by looking at their grades in other classes. However I can't remember the details at all. The idea was that a hard test taken by strong students would be scaled up more than an easy test taken by weak students.

Is anyone familiar with this idea and how it might work mathematically?

In response to comments let me give an example. Say there is an optional class on Semiotics and Structuralism that the strongest students typically choose to take. The mean for this unit is 60% typically, say. Say there is also another optional class on how to spell the words in Harry Potter book one that the weakest students take. Say this gets the same mean of 60% typically. It would be interesting to know what formulas have been used to scale the former class so that the mean is higher than the latter.

Are you sure that the goal is to judge the students on some larger scale, or is it to assess whether they've learned what the course offers? "Education as filter", or "universities as status-gatekeepers" are a bit ugly, I think. Can you clarify your goals? — paul garrett, Jun 12 '18 at 00:56
@paulgarrett The structure of the degree requires that your grade for each optional class is comparable. That is, ideally you can’t get higher grades just by choosing easier classes. That’s all the scaling is designed to address. — Simd, Jun 12 '18 at 04:46
In some sense this kind of comparison is routinely made, at least implicitly, when comparing CVs. Is it better to publish a lot of papers in mediocre journals or a few papers in strong journals? Is a transcript with solid grades in graduate-level courses better than one with superlative grades in undergraduate classes? What is not typically done is to rescale grades within a class based on factors external to that class and its evaluation system. In most university systems such a scheme would be against the rules, and would certainly generate a sense of injustice in the students evaluated. — Dan Fox, Jun 12 '18 at 05:27
@DanFox Scaling is very common isn’t it? See https://academia.stackexchange.com/questions/8261/how-to-scale-or-curve-the-grades-for-an-examination?rq=1 for example. — Simd, Jun 12 '18 at 07:48
Handing out different grades for the same results sounds like a great idea. Your students will love it. — Glen Pierce, Jun 12 '18 at 14:04
@Anush: Such practices are mostly peculiar to the US, and mostly limited to forcing grades to conform to some preestablished distribution, which is different from rescaling based on factors completely external to the evaluation of the particular class, such as would be a given student's performance in a different class. For example, such a practice would be plainly against the rules (even the law) in most European countries (I can't speak of other areas simply because of ignorance, but I suspect it would be similarly viewed). Even in the US, such a rescaling would be quite unusual. — Dan Fox, Jun 12 '18 at 14:57
@DanFox I think there must have been crossed wires. I am talking about scaling all the grades for a class according to some formula. Not scaling individual students differently from each. That would certainly be very bad. — Simd, Jun 12 '18 at 17:32
@GlenPierce I am sorry if I have been unclear. As an example, say there is a class on the theory of advanced semiotics that always attracts the most talented students. Say the mean grade is typically 60% and there is another class that always attracts weaker students but also has the same mean. The idea is to scale the students from the first class to something so that the mean is more than 60%. You don't look at each individual student and scale them separately which would certainly be bad. — Simd, Jun 12 '18 at 17:34
Your two question halves are totally separate; are you talking about scaling tests or class grades? — Azor Ahai -him-, Jun 12 '18 at 19:02
@AzorAhai They are the same thing for the purposes of my question. That is, the mark you get from the class test at the end of the class is your grade for the class. — Simd, Jun 12 '18 at 19:16
@Anush Then you should clarify that; most classes don't have only one test that is the whole grade for the quarter. — Azor Ahai -him-, Jun 12 '18 at 19:28
So if I take the 9:00am class with the nerds I'll get a worse grade than the 3:00pm class with the jocks? I'm REALLY confused how this isn't giving different grades for the exact same coursework. — Glen Pierce, Jun 13 '18 at 14:28
@GlenPierce Let's say the hard class that attracts the strongest students is called Semiotics and Structuralism. In my university it would only be offered once per year. If the class mean is 60% but all the students who take it (assume there are >50 say) typically get 80+ in other classes then this may indicate the class was harder than typical. Maybe a fairer mean for the class is closer to 80% to reflect their true ability. — Simd, Jun 13 '18 at 14:35
@GlenPierce If you take an easy class full of weak students and the mean mark is 60%, then it may be fair to expect a strong student to get much more than 60% in that class. If the scaling were done right, the final mark the strong student gets would be the same as if they had taken the class with lots of strong students. That is, it gets scaled up in one case and down in the other, but maybe (hopefully) to the same mark. This is clearly an inexact game but, as I say elsewhere, this sort of scaling goes on informally all the time. — Simd, Jun 13 '18 at 14:38
I don't see how this helps anyone and it seems to hurt students who are interested in complex subjects but didn't come from a background that supports them. This policy seems like it will further structural racism. Don't do it. — Glen Pierce, Jun 13 '18 at 14:53
@GlenPierce It just helps people get the marks they deserve, if done right. In other words, it's the direct opposite of what you say. For example, take your hypothetical student who is interested in a complex subject but didn't come from a background that supports them. Without the scaling they can be punished for taking the hard class if the mean is set to the mean over all classes, despite the class typically being taken by very strong students on average. — Simd, Jun 13 '18 at 14:57
How about letting students know the level of mastery they need to achieve to earn a particular grade, giving exams that fairly assess the level of mastery, and using them to assign a grade? — Scott Seidman, Jun 13 '18 at 15:55
@ScottSeidman That makes sense but doesn't fully address the issue. As I mentioned elsewhere, if the mean grade for a class is much higher than the mean grades for other classes, some explanation for this will need to be given to the department. A typical explanation that I have seen is that the students who took it were really strong on average. Evidence for this is the grades those students got on other classes. At least where I know, just stating that you as professor designed everything perfectly doesn't wash typically. — Simd, Jun 13 '18 at 16:00
Recommended: New York Times, 2016: "Why We Should Stop Grading Students on a Curve". https://www.nytimes.com/2016/09/11/opinion/sunday/why-we-should-stop-grading-students-on-a-curve.html — Daniel R. Collins, Jun 13 '18 at 18:12
@DanielR.Collins That is interesting but not directly relevant. The setup I am discussing is quite different from grading on a curve. Everyone could get 100% for example. In fact in some sense, it is the (really an) opposite. — Simd, Jun 13 '18 at 18:16

JWH2006 · Answer 1 · 2018-06-12T12:02:11.690

6

Are you asking about just converting everything into standardized scores?

In which case, you take the mean (and standard deviation) of the relative performance measure (for example, if you are teaching a senior-level course, you might use the major GPA distribution rather than the overall GPA distribution) and assign everyone a z-score based on their position in the resulting normal distribution.

You do the same for your test: standardize the scores. For example, if your test average is 50 with an SD of 10, a person who got a 70 has a z of 2, while a 45 has a z of -0.5.
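
As a rough illustration of the standardization described above, here is a minimal Python sketch. The function names `z_score`, `rescale` and `standardize_all` are made up for this example, and the reporting scale (mean 80, SD 10) is only an assumed default, not something prescribed by any institution.

```python
from statistics import mean, pstdev


def z_score(score, mu, sigma):
    """Standardize one raw score against a given mean and standard deviation."""
    return (score - mu) / sigma


def rescale(z, target_mean=80.0, target_sd=10.0):
    """Map a z-score onto an assumed reporting scale (hypothetical defaults)."""
    return target_mean + target_sd * z


def standardize_all(scores):
    """Z-scores for a whole class, using the class's own mean and population SD.

    Note: this divides by zero if every score in the class is identical.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    return [z_score(s, mu, sigma) for s in scores]


# The numbers from the example above: test mean 50, SD 10.
print(z_score(70, 50, 10))            # z of the student who scored 70
print(z_score(45, 50, 10))            # z of the student who scored 45
print(rescale(z_score(70, 50, 10)))   # the 70 mapped onto the assumed scale
```

Whether to use the population or the sample standard deviation is a judgment call; with class sizes of 50 or more the difference is small.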

Now, if you are planning to base their recorded grade on expected performance, you are probably opening up a serious can of worms. For example, a student who consistently scores in the z-score range of 1.8 to 2.1 and then only scores at 1.5 is given a 75 (assuming a mean score of 80 and an SD of 10), while a student who consistently scores in the -0.5 to -0.8 range and then scores a z of zero is given an 85 on the test.

If this is your plan, I would seriously rethink it. Not only will it upset students, it penalizes high performing students and rewards low performers.
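
To make the arithmetic in that example concrete, here is a small Python sketch. It assumes the recorded grade is the target mean plus the SD times the gap between the test z-score and the student's usual z-score. That formula is one reading of the example, and the name `expectation_adjusted_grade` is hypothetical. The usual z-scores of 2.0 and -0.5 are picked from within the ranges quoted above.

```python
def plain_grade(test_z, mean_grade=80.0, sd=10.0):
    """Grade from the test z-score alone (ordinary standardized scaling)."""
    return mean_grade + sd * test_z


def expectation_adjusted_grade(test_z, usual_z, mean_grade=80.0, sd=10.0):
    """Grade based on how far the test z-score is from the student's usual z-score.

    This is the scheme the answer warns against, written out as an assumed formula.
    """
    return mean_grade + sd * (test_z - usual_z)


# Strong student: usually around z = 2.0, scores z = 1.5 on this test.
strong = expectation_adjusted_grade(test_z=1.5, usual_z=2.0)
# Weaker student: usually around z = -0.5, scores z = 0.0 on this test.
weak = expectation_adjusted_grade(test_z=0.0, usual_z=-0.5)

print(strong, weak)                          # expectation-based grades
print(plain_grade(1.5), plain_grade(0.0))    # grades from the test alone
```

Under the expectation-based formula the student with the better test result ends up with the lower recorded grade, which is the objection raised above.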

edited Jun 12 '18 at 12:02

answered Jun 11 '18 at 14:41


Thank you. I am not sure I understand " it penalizes high performing students and rewards low performers" though. If strong students choose a particular hard optional class, the idea is that their grades would be scaled up. So this benefits strong students. – Simd Jun 11 '18 at 14:45
@Anush Then you risk being unfair in the other direction. Just because students did poorly in the past, and they did well on this test, it doesn't necessarily mean the test was easy. Especially if the sample size is small. – user37208 Jun 11 '18 at 16:09
@user37208 You definitely need at least 50 students. But all scaling comes with risks of unfairness I think. – Simd Jun 11 '18 at 16:13
I think you are best served by piloting your course for a semester and then building up a norm for your tests. The reality is, we as academics are trained in our field, not necessarily in writing a good instrument to measure student understanding and knowledge in that field. I firmly believe that if you plan to put this into action, it would be worth reaching out to the psychometricians in your university's college of education. They will be better able to guide you in putting your plan into action with fidelity and a high probability of good results. – JWH2006 Jun 11 '18 at 18:11
Thanks @JWH2006. To be honest it was more out of intellectual interest as I know they used to do this in some major universities I studied at, but I can't remember what the exact formula was. – Simd Jun 11 '18 at 19:29

score 4 · Answer 2 · answered Jun 13 '18 at 06:20

I strongly recommend looking at this idea from a legal standpoint. At least in Germany, we would simply not be allowed to do that. The result in one test may never affect the grading in another course, even if the courses are closely related (e.g. I'm teaching Intro to CS and Programming for first-semester students; both courses are highly interwoven, sometimes even sharing or exchanging time slots, but the results are completely independent of each other, which sometimes results in students having to retake only one of the courses the next year).

Another thought: students might be interested in, and high performers in, topic A but dislike topic B. If a good grade in A affected the grading in B, this would simply be unfair.

The only thing you can do (in my opinion) is to give all of them high grades if they deserve it. But you will need your own mental model of which performance is required for each grade. You should define this prior to the exams and change it only if you made seriously wrong assumptions.

This might have an interesting effect, by the way: the level in your course will drop over time, since "optimizing" students will recognize that all people in your course are getting good grades, and so they start taking it ;-). But usually you can demonstrate the required performance level with some short tests in the first weeks or so.

OBu

That's very interesting and slightly surprising. Something similar to what I have described is standard practice in a number of leading universities (although often carried out without an explicit formula). I wonder if this comes under the category of things that are both impossible and commonplace :) – Simd Jun 13 '18 at 07:52
In fact you are implicitly doing it. I'm teaching an elective course which is only taken by very strong students. The grades are usually either very good, or the students quit after a few weeks because the workload is too high. I'm inviting very good students to select this course because I want to have them in - so in fact I'm pre-selecting them based on prior performance, which has a similar effect. But I define at the beginning what the requirements are for good / very good grades. – OBu Jun 13 '18 at 07:59
Yes and at some point, if your university is anything like the ones I know, you will be asked by the department to justify why your students get higher grades in this class than the mean over all classes. There is no way to do this other than showing data that your students are stronger than average by using data from other classes that they have taken. – Simd Jun 13 '18 at 08:02
This is the good thing about German universities - at least for now no-one cares ;-))). But I would respond that they should show me statistical evidence that the individual students are getting better grades in my class than they did in others. And, if the effect occurs, that it is not because I'm teaching better ;-). That should keep them busy ;-). – OBu Jun 13 '18 at 08:09
You make German universities sound great :) Your sensible choice of reply, asking them to give statistical evidence that the individual students are getting better grades in your class than they did in others, touches on the same issue as my question. In the end you have to look at the grades those students got in other classes, which is exactly what the formula I can't remember was meant to address. – Simd Jun 13 '18 at 08:11

score 2 · Answer 3 · answered Jun 12 '18 at 13:21

You need to re-use questions over several years

A small subset of well-vetted questions needs to be reused across years to help distinguish the difficulty of tests from the skill of students. Let's say the mean percentage getting these questions right is 60%. In a year with a strong cohort of students, more than 60% will get them right. If these same (strong) students perform poorly on the other questions, it's a sign that the other questions were too hard, and indicates that the scores should be curved.
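
A minimal sketch of that comparison in Python, under a strong simplifying assumption: if the new questions were as hard as the reused (anchor) questions, the cohort would get the same fraction right on both. The function name, the input format and the idea of using the gap directly as a curve are illustrative assumptions, not an established equating procedure.

```python
from statistics import mean


def anchor_comparison(anchor_fracs, new_fracs, historical_anchor_rate=0.60):
    """Compare a cohort's results on reused questions with its results on new ones.

    anchor_fracs / new_fracs: per-student fraction correct on each question set.
    historical_anchor_rate: long-run fraction correct on the reused questions
    (0.60 is the figure used in the text above).
    """
    anchor_rate = mean(anchor_fracs)
    new_rate = mean(new_fracs)
    gap = anchor_rate - new_rate
    return {
        # Positive: this cohort beats past cohorts on the same questions.
        "cohort_effect": anchor_rate - historical_anchor_rate,
        # Positive: the new questions were harder for this cohort than the anchors.
        "difficulty_gap": gap,
        # Assumed curve: lift new-question scores by the gap, never lower them.
        "suggested_shift": max(0.0, gap),
    }


# Hypothetical cohort of four students (invented numbers).
result = anchor_comparison(
    anchor_fracs=[0.8, 0.7, 0.9, 0.8],
    new_fracs=[0.5, 0.4, 0.6, 0.5],
)
print(result)
```

With only a handful of anchor questions the estimates will be noisy, so a shift computed this way is better treated as a prompt for review than as an automatic adjustment.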

NB: care must be taken to avoid these questions being "leaked" to other students. In our course (I teach in a large coordinated course) we have a pool of vetted questions that we shuffle every year to reduce the potential for leaking/cheating.

Thanks for this. I am really referring to a class that always attracts stronger students. — Simd, Jun 12 '18 at 18:24
@Anush ah -- sorry I see your edit and this doesn't address your clarified question — sessej, Jun 12 '18 at 18:35

Federico Poloni · Answer 4 · 2018-06-13T08:45:09.660

[added after the edit to the question] First of all, this is a philosophical decision. In some countries / universities / models, grades are intended as an absolute measure (how well does this student know Calculus I with respect to an unwritten gold standard of mathematical expertise?). In others, they are a relative measure (how well does this student know Calculus I with respect to the rest of their class?). From what I understand, the 'relative' thinking is more common in the US, and the 'absolute' one in Europe. You need to decide which one you wish to go for. Both have their merits and their drawbacks.

As for "how to do it in practice", I'm definitely not familiar with it, but you might be referring to models like the polytomous Rasch model.

Basically the idea is that the result of a test is given by f(difficulty of the test, ability of the student) + error, and you can obtain the two independent variables by fitting.

This produces an estimate of the difficulty of each test and the ability of each student, which you can then reuse to rescale things at your will.

There is plenty of software to compute these scores, for instance R and Python packages.
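
To show the fitting idea in its simplest form, here is a Python sketch that replaces the Rasch model with a plain additive model, grade = ability - difficulty + error, fitted by alternating averages (a least-squares fit). This is a deliberate simplification and not the polytomous Rasch model itself. The function name, student names, course names and grades are all invented, chosen so that the "hard" and "easy" classes both have a raw mean of 60, as in the question.

```python
from collections import defaultdict


def fit_additive_model(grades, iterations=200):
    """Fit grade ~ ability[student] - difficulty[course] by alternating averages.

    grades: dict mapping (student, course) -> grade.
    Difficulties are centred to average zero so the model is identifiable.
    """
    students = sorted({s for s, _ in grades})
    courses = sorted({c for _, c in grades})
    ability = {s: 0.0 for s in students}
    difficulty = {c: 0.0 for c in courses}

    for _ in range(iterations):
        # Update each ability given the current difficulties.
        total, count = defaultdict(float), defaultdict(int)
        for (s, c), g in grades.items():
            total[s] += g + difficulty[c]
            count[s] += 1
        for s in students:
            ability[s] = total[s] / count[s]

        # Update each difficulty given the current abilities.
        total, count = defaultdict(float), defaultdict(int)
        for (s, c), g in grades.items():
            total[c] += ability[s] - g
            count[c] += 1
        for c in courses:
            difficulty[c] = total[c] / count[c]

        # Re-centre: shifting both by the same amount leaves predictions unchanged.
        shift = sum(difficulty.values()) / len(courses)
        for c in courses:
            difficulty[c] -= shift
        for s in students:
            ability[s] -= shift

    return ability, difficulty


# Invented data: everyone takes "core"; strong students also take "hard",
# weaker students also take "easy". Both optional classes have a raw mean of 60.
grades = {
    ("ann", "hard"): 65, ("ann", "core"): 75,
    ("bob", "hard"): 55, ("bob", "core"): 65,
    ("cat", "easy"): 65, ("cat", "core"): 55,
    ("dan", "easy"): 55, ("dan", "core"): 45,
}
ability, difficulty = fit_additive_model(grades)

# One possible rescaling: add each course's estimated difficulty back to its marks.
adjusted = {key: g + difficulty[key[1]] for key, g in grades.items()}
print(difficulty)
print(adjusted)
```

In this toy data the fit assigns the hard class a difficulty of about +10 and the easy class about -10, so the adjusted marks separate two classes that share the same raw mean. The estimates are only meaningful when classes are linked through shared students (here via "core"); for real data a dedicated item response theory package would be the safer choice.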

score 1 · Answer 5 · answered Jun 13 '18 at 15:19

"Scaling" to a curve and adjusting the grade to reflect your expectations of a student's performance are two different things. The first is a common practice, though not without some controversy; the second is confirmation bias put into practice, and should be avoided. Every student in a course is entitled to the same treatment.

Scott Seidman

Your last sentence is entirely right of course. The question is, what do departments do in practice with a class where the mean is too high or low? Don't they look at the students who took the class and see how strong they are to work out what actions to take? I suppose if they scale to a curve this never happens. But then the department needs to worry that giving the same mean for every class is unfair on those taking the really hard classes. – Simd Jun 13 '18 at 15:20
@Anush -- no. You don't look at who the students are and figure out what grade to give them. If you give an exam, and the mean grade is 95/100, you may well choose to make that mean grade an A- or some such, but not based on who the students are. – Scott Seidman Jun 13 '18 at 15:23
I am not sure what the mapping from numbers to letter grades is, sorry. Is that like scaling the mean to the mean over all classes in the year? – Simd Jun 13 '18 at 15:24
At least in universities that I know, if the mean is too high for a class there is a departmental discussion about what to do. One argument for not scaling the mean down that is used is that really strong students took the class. I think you would argue that that shouldn't be taken into account but it is something that is commonly said. – Simd Jun 13 '18 at 15:26
