
Weird grade distribution. Statistics help?

Almo

So I just graded a test in my class. I got a weird distribution as shown in the attached picture. The grades in question are:

56%
55%
52%
52%
51%
51%
51%
48%
47%
45%
35%
13%
13%
9%

I feel like the extreme bunching is trying to tell me something, but I don't know what. It was a 15-question test, with each question worth 5 points. They were long answer/essay questions, and students often got partial credit.

I may post a distribution of which questions had how many points scored on average, once I have that data in an easy-to-use format.

Anyone have any interesting thoughts on this?
 

Attachments

  • GradeDist.png
It's probably telling you that there's a jump in difficulty for the last 50% of the marks. Did they all tend to get the same questions wrong?
 
Well, all it tells me is that your class failed hard.

What was the test on?
 
Did you specify a failing/passing level?

(In some of the university classes I attended, there were no grades as such; there was a level that let you 'pass'/get the certificate for that particular class. Such distributions were fairly common in those classes, since there simply was no point in doing more work than necessary. Students could opt to get a grade, usually required for justifying a stipend, but the vast majority of students just needed to pass and did not need a grade.)
 
Looks like a mixture (two subpopulations, the good and the bad) and that the test was too hard.
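A quick way to check that impression of two subpopulations is to sort the scores and look for the widest gap between consecutive grades; a minimal sketch using the scores posted in the thread:

```python
# Check for a two-subpopulation (bimodal) split: sort the grades and
# find the largest gap between consecutive scores.
grades = [56, 55, 52, 52, 51, 51, 51, 48, 47, 45, 35, 13, 13, 9]

s = sorted(grades)
gaps = [(s[i + 1] - s[i], s[i], s[i + 1]) for i in range(len(s) - 1)]
widest = max(gaps)  # (gap size, lower endpoint, upper endpoint)
print(widest)       # → (22, 13, 35): a 22-point hole between 13 and 35

low = [g for g in grades if g <= widest[1]]
high = [g for g in grades if g >= widest[2]]
print(len(low), len(high))  # → 3 11
```

The 22-point hole between 13 and 35 splits the class into a group of 3 and a group of 11, which is what a two-component mixture would look like at this sample size.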
 
I may post a distribution of which questions had how many points scored on average, once I have that data in an easy-to-use format.

Anyone have any interesting thoughts on this?

That would be handy. Another useful tabulation (no chart necessary) would be the number of responses for each question. But I'm inclined to agree that the test was just too hard. Either that, or you are the victim of a conspiracy by your students (but that way lies madness!)
 
They were long answer/essay questions, and students often got partial credit.


I wonder if the source of the error isn't you. Maybe you didn't grade blindly enough. So, when you saw that a student had some idea what was going on, you gave partial credit in a generous way to get them up to the class median. However, if the student started out bombing heavily, you gave up on him/her and didn't show any mercy in later essays.

If you graded all tests from first question to last, this should show up as higher grades on later questions for the class median students.

I do not mean any insult. Such an error in subjective grading seems perfectly human.
 
Not necessarily bunching. You may simply be seeing a reasonable but non-Gaussian distribution. Perhaps a Poisson distribution is more appropriate. Look more closely at the values in the 40-60 range, binned in increments of 2.

This might tell you that your underlying test is not uniform - the questions are not uniformly spread from easy to difficult (and, as others say, there are too many difficult questions).
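The suggested 2-point binning of the 40-60 range takes only a couple of lines; a sketch using the grades from the first post:

```python
# Bin the 40-60 range in increments of 2, as suggested, to look for
# structure within the main cluster.
grades = [56, 55, 52, 52, 51, 51, 51, 48, 47, 45, 35, 13, 13, 9]

bins = {lo: [g for g in grades if lo <= g < lo + 2] for lo in range(40, 60, 2)}
for lo, members in bins.items():
    print(f"{lo}-{lo + 1}: {'*' * len(members)}")
```

With only 10 scores in that range the histogram is noisy, but it does show the pile-up at 50-53.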
 
Extra details:

It was a game design test. 8 weeks before the test, I told them there would be a test on Zuma Blitz, the Facebook game. I said they should play it enough to get at least to level 15, perhaps more to be sure.

I then gave them a sample test with answers for Bejeweled Blitz, so they would have some idea what I would be looking for. Several of them were on my friends list, and I can see what level they got to, and it wasn't higher than 5 for many of them.

I periodically warned them they had to play it enough, or the test would be quite hard for them.

There was plenty of time, as they had 4 hours to take the test. Most finished in less than 2, and started working on their final projects (some just left).

Judging by peoples' responses here, I had better do the per-question grade distribution. Will post that tomorrow.
 
Given a small sample size (15 questions) and a non-random sample (your teaching, the class material, the students), why would you expect a bell curve?

I'd say your results suggest it was a bad test, bad teaching, or an under-prepared batch of students.

To determine which variable caused the results, you need to consider what changed. Is this new material for you, new test questions, a new grading method, or a new class of students?
 
Extra details:

It was a game design test. 8 weeks before the test, I told them there would be a test on Zuma Blitz, the Facebook game. I said they should play it enough to get at least to level 15, perhaps more to be sure.

I then gave them a sample test with answers for Bejeweled Blitz, so they would have some idea what I would be looking for. Several of them were on my friends list, and I can see what level they got to, and it wasn't higher than 5 for many of them.

I periodically warned them they had to play it enough, or the test would be quite hard for them.

There was plenty of time, as they had 4 hours to take the test. Most finished in less than 2, and started working on their final projects (some just left).
Was there some value in this game?

Unless I was studying game programming, I'd be annoyed that a professor dictated a single learning method: playing a game.

Judging by peoples' responses here, I had better do the per-question grade distribution. Will post that tomorrow.
Excellent idea.
 
My guess would be that:

1) If you played the game as required, a score of 45% to 55% is very easy and expected.
2) If you didn't play the game as required, a score of 5% to 15% is very easy and expected.

So the test tests whether or not somebody played the game as required (and not much else). Everybody fits into those categories, except the 35% which could be someone who either didn’t play the game fully as required or who did but just didn’t do as well as expected on the test.

If you were expecting the test to test game design skills, my guess is that it actually tested who played the game and who didn’t without much relevance to game design skills.
 
I’m not sure why you think the distribution seems unusual. It looks to me like a “pop quiz” distribution. Everybody is assigned to read Chapter 13 from the textbook. The teacher gives a pop quiz on Chapter 13. Most students score in the same range, indicating that they read Chapter 13. A few are very low, indicating that they didn’t read Chapter 13. One or two are below the normal range, but not really low, indicating that they didn’t read all of Chapter 13 or that they did but are struggling.
 
Random question: what does skill in playing a game have to do with skill in designing a game? I know it's possible for designers to make games that they personally can't beat; that's why there are dev-mode 'cheats' and back doors. I haven't personally played Zuma Blitz, but if it's in the same vein as the other Zuma games, I'd expect it to be dexterity based, and of the sort that gets faster and faster until the user can't keep up.

It's possible that your grade distribution could be because you have two populations, one that has fast enough reflexes to get to the level you specified for a pass, and one that does not, with a few players falling in the middle.

As a further aside, if someone made skill at a specific computer game, especially a dexterity-based game, part of the grade for a game design class, I'd be in the dean's office lodging a protest the very same day. Skill at Zuma has nothing at all to do with skill at programming.
 
If you can see how far your students - or some of them - got in the game, maybe you should first see if there is a correlation between the levels they played and the score they reached.

I have to agree that having to play a game to a certain level in order to be able to pass a test is a bad idea - but I'd have to know the game and the questions to be able to judge that.
 
I feel like the extreme bunching is trying to tell me something, but I don't know what. It was a 15-question test, with each question worth 5 points. They were long answer/essay questions, and students often got partial credit.

I'd suggest that you felt that a number of people gave reasonable but incomplete answers to most of the questions and thus got 2 or 3 out of 5 for many of the questions. Then it doesn't seem unreasonable that most people would get 40 to 60%.
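That intuition is easy to sanity-check: if every one of the 15 questions earns 2 or 3 points out of 5, the total is forced into the 30-45 out of 75 band, i.e. exactly 40-60%. A small simulation with made-up per-question scores:

```python
# Sanity check of the partial-credit explanation: 15 questions each
# scoring 2 or 3 out of 5 necessarily total between 30/75 (40%) and
# 45/75 (60%).
import random

random.seed(1)  # fixed seed so the run is repeatable
for _ in range(5):
    total = sum(random.choice([2, 3]) for _ in range(15))
    pct = 100 * total / 75
    print(f"{total}/75 = {pct:.0f}%")
```

So widespread "reasonable but incomplete" answers alone would reproduce the main cluster.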
 
Extra details:

It was a game design test. 8 weeks before the test, I told them there would be a test on Zuma Blitz, the Facebook game. I said they should play it enough to get at least to level 15, perhaps more to be sure.

I then gave them a sample test with answers for Bejeweled Blitz, so they would have some idea what I would be looking for. Several of them were on my friends list, and I can see what level they got to, and it wasn't higher than 5 for many of them.

I periodically warned them they had to play it enough, or the test would be quite hard for them.

There was plenty of time, as they had 4 hours to take the test. Most finished in less than 2, and started working on their final projects (some just left).

Judging by peoples' responses here, I had better do the per-question grade distribution. Will post that tomorrow.

Um, what were the skill levels prior to testing? And the age distribution?
 
Is it possible for you to give some sample questions?

I'd be interested in answering them several times while increasing my Zuma skills.

You said players were advised to advance to level 15 at least, possibly more. So it wasn't about specific features that were released or used in specific levels, was it?
 