
Weird grade distribution. Statistics help?

Almo

So I just graded a test in my class. I got a weird distribution as shown in the attached picture. The grades in question are:

56%
55%
52%
52%
51%
51%
51%
48%
47%
45%
35%
13%
13%
9%

I feel like the extreme bunching is trying to tell me something, but I don't know what. It was a 15-question test, with each question worth 5 points. They were long answer/essay questions, and students often got partial credit.

I may post a distribution of which questions had how many points scored on average, once I have that data in an easy-to-use format.

Anyone have any interesting thoughts on this?
 

Attachments

  • GradeDist.png
It's probably telling you that there's a jump in difficulty for the last 50% of the marks. Did they all tend to get the same questions wrong?
 
Well, all it tells me is that your class failed hard.

What was the test on?
 
Did you specify a failing/passing level?

(In some of the university classes I attended, there were no grades as such; there was a level that let you 'pass'/get the certificate for that particular class. Such distributions were fairly common in those classes, since there simply was no point in doing more work than necessary. Students could opt to get a grade, usually required for justifying a stipend, but the vast majority of students just needed to pass and did not need a grade.)
 
Looks like a mixture (two subpopulations, the good and the bad) and that the test was too hard.
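A quick way to check that impression of two subpopulations is to sort the scores and look for the widest gap between consecutive grades; a minimal sketch using the scores posted in the thread:

```python
# Check for a two-subpopulation (bimodal) split: sort the grades and
# find the largest gap between consecutive scores.
grades = [56, 55, 52, 52, 51, 51, 51, 48, 47, 45, 35, 13, 13, 9]

s = sorted(grades)
gaps = [(s[i + 1] - s[i], s[i], s[i + 1]) for i in range(len(s) - 1)]
widest = max(gaps)  # (gap size, lower endpoint, upper endpoint)
print(widest)       # → (22, 13, 35): a 22-point hole between 13 and 35

low = [g for g in grades if g <= widest[1]]
high = [g for g in grades if g >= widest[2]]
print(len(low), len(high))  # → 3 11
```

The 22-point hole between 13 and 35 splits the class into a group of 3 and a group of 11, which is what a two-component mixture would look like at this sample size.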
 
I may post a distribution of which questions had how many points scored on average, once I have that data in an easy-to-use format.

Anyone have any interesting thoughts on this?

That would be handy. Another useful tabulation (no chart necessary) would be the number of responses for each question. But I'm inclined to agree that the test was just too hard. Either that, or you are the victim of a conspiracy by your students (but that way lies madness!)
 
They were long answer/essay questions, and students often got partial credit.


I wonder if the source of the error isn't you. Maybe you didn't grade blindly enough. So, when you saw that a student had some idea what was going on, you gave partial credit in a generous way to get them up to the class median. However, if the student started out bombing heavily, you gave up on him/her and didn't show any mercy in later essays.

If you graded all tests from first question to last, this should show up as higher grades on later questions for the class median students.

I do not mean any insult. Such an error in subjective grading seems perfectly human.
 
Not necessarily bunching. You may simply be seeing a reasonable but non-Gaussian distribution. Perhaps a Poisson distribution is more appropriate. Look more closely at the values in the 40-60 range, binned in increments of 2.

This might tell you that your underlying test is not uniform - the questions are not uniformly spread from easy to difficult (and, as others say, there are too many difficult questions).
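The suggested 2-point binning of the 40-60 range takes only a couple of lines; a sketch using the grades from the first post:

```python
# Bin the 40-60 range in increments of 2, as suggested, to look for
# structure within the main cluster.
grades = [56, 55, 52, 52, 51, 51, 51, 48, 47, 45, 35, 13, 13, 9]

bins = {lo: [g for g in grades if lo <= g < lo + 2] for lo in range(40, 60, 2)}
for lo, members in bins.items():
    print(f"{lo}-{lo + 1}: {'*' * len(members)}")
```

With only 10 scores in that range the histogram is noisy, but it does show the pile-up at 50-53.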
 
Extra details:

It was a game design test. 8 weeks before the test, I told them there would be a test on Zuma Blitz, the Facebook game. I said they should play it enough to get at least to level 15, perhaps more to be sure.

I then gave them a sample test with answers for Bejeweled Blitz, so they would have some idea what I would be looking for. Several of them were on my friends list, and I can see what level they got to, and it wasn't higher than 5 for many of them.

I periodically warned them they had to play it enough, or the test would be quite hard for them.

There was plenty of time, as they had 4 hours to take the test. Most finished in less than 2, and started working on their final projects (some just left).

Judging by peoples' responses here, I had better do the per-question grade distribution. Will post that tomorrow.
 
Given a small sample size (15 questions) and a non-random sample (your teaching, the class material, the students), why would you expect a bell curve?

I'd say your results suggest it was a bad test, bad teaching, or an under-prepared batch of students.

To determine which variable caused the results, you need to consider what changed. Is this new material for you, new test questions, a new grading method, or a new class of students?
 
Extra details:

It was a game design test. 8 weeks before the test, I told them there would be a test on Zuma Blitz, the Facebook game. I said they should play it enough to get at least to level 15, perhaps more to be sure.

I then gave them a sample test with answers for Bejeweled Blitz, so they would have some idea what I would be looking for. Several of them were on my friends list, and I can see what level they got to, and it wasn't higher than 5 for many of them.

I periodically warned them they had to play it enough, or the test would be quite hard for them.

There was plenty of time, as they had 4 hours to take the test. Most finished in less than 2, and started working on their final projects (some just left).
Was there some value in this game?

Unless I was studying game programming, I'd be annoyed that a professor dictated a single learning method: playing a game.

Judging by peoples' responses here, I had better do the per-question grade distribution. Will post that tomorrow.
Excellent idea.
 
My guess would be that:

1) If you played the game as required, a score of 45% to 55% is very easy and expected.
2) If you didn't play the game as required, a score of 5% to 15% is very easy and expected.

So the test tests whether or not somebody played the game as required (and not much else). Everybody fits into those categories, except the 35% which could be someone who either didn’t play the game fully as required or who did but just didn’t do as well as expected on the test.

If you were expecting the test to test game design skills, my guess is that it actually tested who played the game and who didn’t without much relevance to game design skills.
 
I’m not sure why you think the distribution seems unusual. It looks to me like a “pop quiz” distribution. Everybody is assigned to read Chapter 13 from the textbook. The teacher gives a pop quiz on Chapter 13. Most students score in the same range, indicating that they read Chapter 13. A few are very low, indicating that they didn’t read Chapter 13. One or two are below the normal range, but not really low, indicating that they didn’t read all of Chapter 13 or that they did but are struggling.
 
Random question: what does skill in playing a game have to do with skill in designing a game? I know it's possible for designers to make games that they personally can't beat; that's why there are dev-mode 'cheats' and back doors. I haven't personally played Zuma Blitz, but if it's in the same vein as the other Zuma games, I'd expect it to be dexterity based, and of the sort that gets faster and faster until the user can't keep up.

It's possible that your grade distribution could be because you have two populations, one that has fast enough reflexes to get to the level you specified for a pass, and one that does not, with a few players falling in the middle.

As a further aside, if someone made skill at a specific computer game, especially a dexterity-based game, part of the grade for a game design class, I'd be in the dean's office lodging a protest the very same day. Skill at Zuma has nothing at all to do with skill at programming.
 
If you can see how far your students - or some of them - got in the game, maybe you should first see if there is a correlation between the levels they played and the score they reached.

I have to agree that having to play a game to a certain level in order to be able to pass a test is a bad idea - but I'd have to know the game and the questions to be able to judge that.
 
I feel like the extreme bunching is trying to tell me something, but I don't know what. It was a 15-question test, with each question worth 5 points. They were long answer/essay questions, and students often got partial credit.

I'd suggest that you felt that a number of people gave reasonable but incomplete answers to most of the questions and thus got 2 or 3 out of 5 for many of the questions. Then it doesn't seem unreasonable that most people would get 40 to 60%.
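That intuition is easy to sanity-check: if every one of the 15 questions earns 2 or 3 points out of 5, the total is forced into the 30-45 out of 75 band, i.e. exactly 40-60%. A small simulation with made-up per-question scores:

```python
# Sanity check of the partial-credit explanation: 15 questions each
# scoring 2 or 3 out of 5 necessarily total between 30/75 (40%) and
# 45/75 (60%).
import random

random.seed(1)  # fixed seed so the run is repeatable
for _ in range(5):
    total = sum(random.choice([2, 3]) for _ in range(15))
    pct = 100 * total / 75
    print(f"{total}/75 = {pct:.0f}%")
```

So widespread "reasonable but incomplete" answers alone would reproduce the main cluster.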
 
Extra details:

It was a game design test. 8 weeks before the test, I told them there would be a test on Zuma Blitz, the Facebook game. I said they should play it enough to get at least to level 15, perhaps more to be sure.

I then gave them a sample test with answers for Bejeweled Blitz, so they would have some idea what I would be looking for. Several of them were on my friends list, and I can see what level they got to, and it wasn't higher than 5 for many of them.

I periodically warned them they had to play it enough, or the test would be quite hard for them.

There was plenty of time, as they had 4 hours to take the test. Most finished in less than 2, and started working on their final projects (some just left).

Judging by peoples' responses here, I had better do the per-question grade distribution. Will post that tomorrow.

Um, what were the skill levels prior to testing? And the age distribution?
 
Is it possible for you to give some sample questions?

I'd be interested in answering them several times while increasing my Zuma skills.

You said players were advised to advance to level 15 at least, possibly more. So it wasn't about specific features that were released or used in specific levels, was it?
 