58

Some tests have minimums in their possible score range. Cisco's 300-1000 point range and the SAT's 200-800 points-per-section range come to mind.

What purpose does this serve? I assume there is some statistical logic behind it. Maybe it would make more sense to me if I understood how they go about calculating the score from a given number of (in)correct questions.

BowlesCR
  • There's no statistical reason for it, as shifting the scores downward by 200 would result in the same variance and a mean shifted by 200. I've always assumed it was to spare the feelings of people who scored very low. –  Jan 06 '16 at 20:31
  • For example, on a multiple-choice test with 5 options for each question, a person with zero knowledge will get ~20% of the answers correct by pure chance. It can therefore make sense to set 20% as the minimum score, acknowledging that getting 20% of the answers right doesn't indicate any greater ability than somehow getting only 10% right. – Peteris Jan 07 '16 at 18:41
  • Some grading systems have a negative nonzero minimum score. – gerrit Jan 08 '16 at 11:23
  • I'm surprised to see that no one has challenged the question's assumption. I haven't taken the SAT myself, but if this (http://www.snopes.com/college/exam/sat.asp) is true, then it is possible to get less than 200 points on the SAT. The authorities just "don't report scores that are lower than 200." (It's a valid question to ask "why?", but that's a different, and likely less interesting, question.) – kfx Jan 12 '16 at 17:39
  • @gerrit: Neat link. I wonder why the numerical equivalents of the grades are 8, 7, 5, 1, -7, and -23. What an odd sequence. – anomaly Feb 19 '16 at 04:29
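Peteris's chance-level point can be checked with a quick simulation. This is only a sketch: the test parameters (100 questions, 5 options) and the function name are invented for illustration, not taken from any real exam.

```python
import random

def simulate_guesser(n_questions, n_options, trials=2000, seed=42):
    """Estimate the mean fraction correct for a candidate who guesses
    uniformly at random on every question."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        for _ in range(n_questions):
            # The guess and the answer key are both uniform over the options.
            if rng.randrange(n_options) == rng.randrange(n_options):
                correct += 1
    return correct / (trials * n_questions)

# With 5 options, zero knowledge still earns about 20% of the raw points,
# so a 20% floor on the raw scale would carry no information.
print(round(simulate_guesser(100, 5), 2))  # ~0.2
```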

7 Answers

61

According to the Encyclopedia of Research Design (page 629), it signals that these are interval variables, not ratio variables:

Standardized tests, including Intelligence Quotient (IQ), Scholastic Achievement Test (SAT), Graduate Record Examination (GRE), Graduate Management Admission Test (GMAT), and Miller Analogies Test (MAT) are also examples of an interval scale. For example, in the IQ scale, the difference between 150 and 160 is the same as that between 80 and 90. Similarly, the distance in the GRE scores between 350 and 400 is the same as the distance between 500 and 550.

Standardized tests are not based on a "true zero" point that represents the lack of intelligence. These standardized tests do not even have a zero point. The lowest possible score for these standardized tests is not zero. Because of the lack of a "true zero" point, standardized tests cannot make statements about the ratio of their scores. Those who have an IQ score of 150 are not twice as intelligent as those who have an IQ score of 75. Similarly, such a ratio cannot apply to other standardized tests including SAT, GRE, GMAT, or MAT.

Salkind, Neil J., ed. Encyclopedia of research design. Vol. 1. Sage, 2010.
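The quoted interval-vs-ratio distinction can be made concrete with a toy computation (the scores below are invented for illustration, not real SAT data):

```python
# On an interval scale, differences survive an arbitrary shift of the whole
# scale; ratios do not. Hypothetical scores on a 200-800 style scale:
scores = [200, 400, 800]
shifted = [s - 200 for s in scores]  # same performances, re-anchored scale

# Differences between two performances are identical on both scales:
assert scores[2] - scores[1] == shifted[2] - shifted[1] == 400

# Ratios change with the (arbitrary) anchor, so "twice as good" is meaningless:
print(scores[2] / scores[1])    # 2.0
print(shifted[2] / shifted[1])  # 3.0
```

The same two performances are "twice" and "three times" apart depending on where the scale is anchored, which is exactly why ratio statements are invalid here.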

ff524
  • I'm not sure this answers the question very well: it's certainly true that the scores are interval measurements, but it seems strange that the scores would be offset ONLY to indicate that--it just seems oddly subtle. – Matt Jan 08 '16 at 01:12
  • @Matt I think what ff524 describes is like the relation between Celsius and Kelvin. Why is 0C != 0K but 0C = 273.15K? The answer is xC - yC = xK - yK for all x and y. – padawan Jan 08 '16 at 02:04
  • The block quote seems authoritative and valuable and for me it is not clear enough to understand completely. Pointing out that the difference between 150 and 160 is the same as the difference between 80 and 90 does not illuminate anything to me because that's what is normally the case for scores that do start at 0. In short: I still don't get it. – Todd Wilcox Jan 08 '16 at 03:09
  • @ToddWilcox You're right: the first paragraph is also true of ratio scales (which do have a "true zero.") The second paragraph is true of interval scales but not true of ratio scales - that is, multiplication and division are valid operations for ratio scales but not valid for interval scales. Also see this wikibook. – ff524 Jan 08 '16 at 03:14
  • I did notice that but I don't understand the significance. Two questions come to mind: 1) Why use an interval scale instead of any other kind of scale (this seems to be the thrust of the original question)? And 2) Can an interval scale score be converted to a ratio scale score? If not, what does an interval scale score tell us in the first place? – Todd Wilcox Jan 08 '16 at 03:24
  • Ok I followed the comment link and I suggest this answer would be improved a lot by explaining some of what is in there. If I understand correctly, test scores that have no zero point can only tell you if you did better or worse than the others taking the test, and they can tell you how much better or worse you did, and they can tell you if you did better or worse than last time, but they do not tell you anything about how many questions you got right. Do I understand that correctly? – Todd Wilcox Jan 08 '16 at 03:36
  • @cagirici, sure, I don't disagree that the scores are meant to be interval (i.e., "Celsius") measurements, but I think that adding an offset to the scores (and hoping people notice) is an odd and subtle way of indicating that. Wouldn't it be easier to write "INTERVAL MEASUREMENT" or something somewhere in the guide to interpreting the scores? – Matt Jan 08 '16 at 06:28
  • @Todd Not really. The key idea is that the interval scale has no real reference point ("absolute zero "), and relationships involving division or multiplication (e.g. "twice as much") have no meaning without a real reference point. For example: is an 800 score twice as high as a 400 score? If I shift all the scores down by 200 (which I can do, because there's no true reference point), so those same scores become 600 and 200 - is 600 twice as good as 200? No to both. That's the idea. – ff524 Jan 08 '16 at 08:45
  • This doesn't answer the question at all. The question was not "what are the properties of an interval scale" but why does someone get 200 points for handing in a blank paper (zero effort)? – Falco Jan 08 '16 at 09:25
  • @Falco When you say "get 200 points" you assume that the reference is 0 (i.e. that you get 200 points more than 0.) That's not valid on an interval scale, where there's no absolute reference. You might also say that the 200 minimum score means you get 400 points for zero effort (400 more than -200) or 1 point for zero effort (1 more than 199.) All of those statements are meaningless on an interval scale. – ff524 Jan 08 '16 at 09:31
  • @ff524 exactly that makes the 200 completely arbitrary! Blank page could also be 5million points. But most people would probably assign intuitively 0 Points for zero effort, just because it feels natural. So why the arbitrary 200 ? Even 100 feels more natural than 200. – Falco Jan 08 '16 at 09:34
  • @Falco The reason we often start interval scales at a number that isn't zero is to signal that there's no true reference, and that people shouldn't apply operations that "feel natural" to this scale. Most people intuitively associate a zero with an absolute reference, as you know, which would be a wrong thing to do in this case. (The specific choice of non-zero number is not meaningful, but the fact that it's non-zero is a convention that serves as a deliberate signal.) – ff524 Jan 08 '16 at 09:38
  • You wrote "not really". Which one(s) of my statements is/are not correct? – Todd Wilcox Jan 08 '16 at 12:00
  • @ToddWilcox "they do not tell you anything about how many questions you got right." - It's possible for an exam score on any kind of scale, including ratio scale, to not tell you how many questions you got right. (e.g., if questions are not weighted uniformly.) But for a detailed explanation of different kinds of scales of measurement, try [stats.se] - it's probably out of scope of this answer. – ff524 Jan 08 '16 at 12:04
17

I might be able to help answer this from a background in psychometrics. Where I work we produce many tests that are all standardised and then equated to be put onto the same scale. These scales, however, are unrelatable from one test to another, unless of course the two differing tests have had an equating study completed to determine the shift factor that transfers, say, the scale of Test 1 onto the scale of Test 2.

To construct a scale, we first analyse the test data: student response data and item (question) data. We do the analysis using the Rasch model, which takes into account only two variables, the students' abilities and the items' difficulties. This allows us to construct a dataset that contains the logit levels of the students' abilities and of the items' difficulties.

Definition of Logit:

A logit is a unit of measurement used to report relative differences between candidate ability estimates and item difficulties. Logits form an equal-interval level of measurement, which means that the distance between each point on the scale is equal (the distance from 1 to 2 is the same as the distance from 99 to 100).

Once the logit tables have been created they can be used to create a scale by applying a simple linear transformation, such as:

scale score = 10 * logit difficulty + 250

In some of the work I do we have scale scores that actually fall below 0; however, in most of my work, scale scores are constructed such that the minimum is around 200 or so. The construction of the scale is, for the most part, entirely arbitrary.
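Assuming the constants from the example above (a slope of 10 and an intercept of 250; real testing programs pick their own), the transformation can be sketched as:

```python
def logit_to_scale(logit, slope=10.0, intercept=250.0):
    """Map a Rasch logit onto a reporting scale via a linear transformation.
    The slope and intercept here mirror the example in the text; actual
    programs choose their own constants, which is why scale floors are
    arbitrary rather than meaningful zeros."""
    return slope * logit + intercept

# Illustrative (made-up) logit values: the floor of the reported scale is
# whatever the chosen constants produce.
for d in (-4.0, 0.0, 4.0):
    print(d, "->", logit_to_scale(d))  # 210.0, 250.0, 290.0
```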

If you wish to see how the logits of students and items are calculated please read:

https://en.wikipedia.org/wiki/Rasch_model#The_mathematical_form_of_the_Rasch_model_for_dichotomous_data

Also, as an extra note: there are other models for test analysis besides the Rasch model (also called the 1PL):

  • the 2PL, which adds an item-discrimination parameter to the Rasch model;
  • the 3PL, which adds a guessing parameter to the 2PL; this creates a minimum probability of getting the item correct, which depends on the guess value;
  • the 4PL, which adds a slip parameter to the 3PL; this creates a ceiling probability, below 1, of getting an item correct.
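As a rough illustration of what those extra parameters do, here is a sketch of the 1PL and 3PL item response functions; the parameter values are invented, and the function names are mine rather than any standard library's:

```python
import math

def rasch_p(ability, difficulty):
    """Rasch / 1PL: P(correct) depends only on the ability-difficulty gap,
    measured in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def p_3pl(ability, difficulty, discrimination=1.0, guess=0.0):
    """3PL: a discrimination slope (the 2PL addition) plus a guessing
    floor, so P(correct) never falls below the guess value however low
    the ability."""
    core = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guess + (1.0 - guess) * core

# A very low-ability candidate on an average item: under Rasch the success
# probability collapses toward 0, but a 0.2 guessing floor (think 5-option
# multiple choice) keeps it above 20%.
print(round(rasch_p(-5.0, 0.0), 3))           # 0.007
print(round(p_3pl(-5.0, 0.0, guess=0.2), 3))  # 0.205
```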

I hope this helps and provides some extra information that may be of use.

TMP4
  • This seems the most helpful and plausible to me. Additionally, I dug this up on the SAT: https://sat.collegeboard.org/scores/how-sat-is-scored "We do a statistical analysis to make sure the test is an accurate representation of your skills... ...equating adjusts for slight differences in difficulty between test editions and ensures that a student's score ... on one edition of a test reflects the same ability ... on another edition of the test. Equating also ensures that a student's score does not depend on how well others did..." – BowlesCR Jan 08 '16 at 18:16
12

In addition to the reasons already mentioned: because we want a more natural scale for the answers. Sometimes scores for an individual answer are on a scale of 1-5 or 1-10, because that is more human-friendly than 0-4 or 0-9 (unless the human is a programmer). Adding the individual scores up then results in a nonzero minimum.

Federico Poloni
  • Can you provide an example of a question where it makes sense to say "it is impossible to get it completely wrong" (which is what a scale of 1-5 seems to imply)? – Mike Ounsworth Jan 06 '16 at 22:19
  • @MikeOunsworth This sounds like a trick question. :) In my view, a 1-5 scale does not imply that it is impossible to get a question completely wrong. It's just an arbitrary range. – Federico Poloni Jan 06 '16 at 22:31
  • Understood. It just seems completely bizarre to get a non-zero score for handing in a blank page, nothing about that seems natural. – Mike Ounsworth Jan 06 '16 at 22:33
  • @MikeOunsworth Check https://en.wikipedia.org/wiki/Grading_systems_by_country for instance. Percentages are the most used system, I agree, but still there are many 1-based scales on the list. – Federico Poloni Jan 06 '16 at 22:44
  • Those scales are about cumulative grades. Your answer to this question is about grading individual answers on a non-zero scale. – Mike Ounsworth Jan 06 '16 at 22:47
  • @MikeOunsworth You are correct. I don't have similar data for individual answers, unfortunately. – Federico Poloni Jan 06 '16 at 22:55
  • For exams like the SAT, GRE, the sum of answer scores ("raw score") does start at zero. So at least for those exams, this isn't a likely reason. – ff524 Jan 07 '16 at 02:10
  • There are many, many questionnaires about subjective issues that employ Likert scales, which typically run from 1 to 5 or 7 or 10. Scores from multiple Likert scaled items are typically added up to give a total score, which necessarily is at least the number of items. Yes, this is not the SAT or GRE. – Stephan Kolassa Jan 07 '16 at 08:26
  • @FedericoPoloni You can't really compare grading systems to adding up points. Grading systems are usually a fixed labeling for certain percentages (so more than 90% right will give you an A or a "1" or a "10"), but in most countries you don't usually add up these grade marks (it would be like adding up As and Bs in the US); you usually compute a mean score on the same scale. So when handing out absolute scores, i.e. how many points you scored, almost any normal test will score you zero points for doing nothing. – Falco Jan 08 '16 at 09:29
3

My maths teacher in high school used to say that just showing up and writing your name on the paper is worth something: respect, at the bare minimum. Thus you get something for the effort of being there. From a data-management perspective, it is certainly easier to reserve zero for special cases such as absent or kicked out. And from a statistical perspective, if said scores are compounded into a final GPA, then a zero would drag down your average pessimistically, and educators try to be optimistic about their pupils.

user283885
  • I suspect this is something teachers say when they don't have enough questions to make the exam points add up to 100. – ff524 Jan 08 '16 at 12:06
  • Not necessarily. From my educational background, grading runs from 1 to 10, 10 being highest; however, the general rule was to grade from 4 upwards, because this smoothed the scale for kids learning and trying but failing by a margin. We try to encourage those kids to pass. We even have "bonus points". The convention was to grade below 4 as punishment for bad behavior. Remember there are kids that try to learn but have it hard, and there are brats in dire need of correction. – user283885 Jan 08 '16 at 16:42
  • The reasoning for walking the 4 line was that if the kid had 2 subjects graded less than 5, he would be held back a year; or, if the kid is a nuisance, transferred to a special class. Thus we even use grades like 4.5, with the option of a 5 roundup if the kid agrees to take supplemental homework, for example. Remember, as a teacher the hope is that the kid will ultimately pass on his own work, and the purpose is not to crush him under a lost-cause mark. However, there is only so much you can do as a teacher, and by no means can I fix bad parenting through grades or talk. – user283885 Jan 08 '16 at 16:58
  • All in all, I think that scores that give you something for nothing are also used in critical exams that affect the kid's outcome, as a correction to decrease the odds of failure. These kinds of exams usually put lots of stress on the individual, and some kids might just lock down and freeze under pressure. – user283885 Jan 08 '16 at 17:09
2

It may depend on the test.


  • The Wechsler SD15 IQ test is intended to produce scores such that the mean score is 100 with a standard deviation of 15, so about 5 percent of the population has an IQ score below 75 points. Assuming that the scores are normally distributed, subjects receiving a score of zero would be so vanishingly rare (a billionth of a percent of test-takers) that it would be impossible to ensure that the scores remain valid so far out in the tails. It would also be very difficult to ensure that these very impaired subjects realize that/how they're being tested at all. Pinning down the precise value may not have much clinical value either, so extremely low scores can be reported as <20 (or whatever).
  • The SAT uses a scoring system that penalizes random guessing:
      • Correct answers increase the score by a point
      • Blank answers neither earn nor lose points
      • Incorrect answers decrease the score by a fraction of a point.

By choosing an appropriate fraction for the penalty, you can ensure that guessing has zero expected value. However, unless an offset is added, subjects can potentially receive scores below zero if they perform worse than chance. These very low results may not be particularly informative, so perhaps ETS reports something like max(earned score, chance-level score).
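Both points above can be sanity-checked with a few lines. The constants are the ones quoted in this answer (mean 100, SD 15; a 1/4-point penalty with 5 options), and the helper function is my own sketch, not anything a testing body publishes:

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc((mean - x) / (sd * math.sqrt(2.0)))

# Wechsler-style IQ scale: mean 100, SD 15. The fraction of test-takers at or
# below a score of 0 is astronomically small, as the answer claims:
p_zero = normal_cdf(0.0, mean=100.0, sd=15.0)
print(p_zero)  # ~1.3e-11, i.e. about a billionth of a percent

# SAT-style penalty: +1 for correct, 0 for blank, -1/4 for wrong. With 5
# options, random guessing has exactly zero expected value per item:
p_correct = 1.0 / 5.0
ev_per_guess = p_correct * 1.0 - (1.0 - p_correct) * 0.25
print(ev_per_guess)  # the penalty exactly cancels the lucky guesses

# An unlucky guesser can still land below zero raw points, though, which is
# why the reported scale needs an offset or a floor.
```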

Matt
  • The College Board does not report max(earned score, 0) for the SAT. If you get a negative raw score, it's scaled to a different (lower) adjusted grade than if you get a zero raw score (e.g. by handing in a blank paper.) – ff524 Jan 08 '16 at 20:23
  • This chart says that any raw scores below a -1 (math) or -2 (reading) are reported as a 200. – Matt Jan 08 '16 at 20:30
  • On that particular exam, yes (every exam administration can be scaled slightly differently.) And a 0 raw score is scaled to 220, which is higher. – ff524 Jan 08 '16 at 20:33
1

In the Netherlands, most children at the end of primary school take the Cito test, which is much like the SAT, but with a score ranging from 501 to 550. According to this (Dutch) article, this is done to prevent parents from associating the score with school grades, which range from 1 to 10, and with IQ tests, which have an average of 100.

Ezra
0

I would believe that this lower bound and the range are chosen arbitrarily, perhaps to make people feel better when they get a low score (say 300-500 or thereabouts for Cisco, or 200-400 or so for the SAT). So it is made that way for psychological reasons, and the specific numbers are arbitrary. I hope this answers your question; there may be other possibilities, but I think this one stands out as the most probable.

user47063
  • Do you have any evidence to back up your suggestions here, or is it just guesswork? – David Richerby Jan 07 '16 at 10:35
  • @DavidRicherby I don't think it's either. It seems more like a plausible deduction, if not of the intent of the scale, then of what is certainly an effect and a desired one at that. (I'm kind of echoing the answer though, which pretty clearly answers your question in the first place). –  Jan 07 '16 at 21:49
  • My point is that the guesswork of some anonymous stranger on the internet doesn't carry a whole lot of weight, since we have no way at all of evaluating whether your guess is likely to be correct. The reason I asked if you had any evidence is that the answer might have been "yes". It was possible that you'd phrased your answer very cautiously because you were basing it on something you vaguely remembered reading ten years ago, for example. – David Richerby Jan 07 '16 at 21:54
  • @DavidRicherby From this answer, I learned that setting a minimum score of 200 v. 0 has an important psychological impact on test takers. The answer presented no formal evidence but pointed out this effect. Now I believe it. I'm glad this answer was posted. –  Jan 07 '16 at 23:15
  • @djechlin You believe something after reading this answer that you did not believe before reading it? If so, that is not inherently valuable and in fact would be detrimental if what you now believe is demonstrably not the case. Plausible deductions are valuable as starting points for research, not as final answers. My understanding is that the latter is preferred on stack exchange. To me a perfectly plausible deduction is that a minimum score of 200 has no psychological benefit if everyone knows that's the minimum. Test takers are likely mentally subtracting 200 from whatever score they got. – Todd Wilcox Jan 08 '16 at 03:16
  • @ToddWilcox It would be detrimental if you read any answer and it later turns out to be demonstrably not the case. Your deduction is not plausible to me since it contradicts much about psychology, namely the fact that we don't usually perform math whenever we look at something. –  Jan 08 '16 at 04:02