
Rating system for game

Almo

Hi!

I'll be working on a problem at my job for a bit. Since it's a relatively interesting problem, I thought maybe a few math weirdos here might like to discuss it. I have to be vague about the details of the game's mechanics because it's still a secret project, but I'm sure I can describe the challenge sufficiently.

Here's the issue. I'm working on a game intended to go out to an audience of about 500,000 people. More would be good, and obviously if it doesn't go over well, it might be less.

It is a game involving player-versus-player interactions, and we want a rating system that can automatically rate players' performance.

At its core, people will create challenges given a toolbox and a budget of points to spend. These will be uploaded to a server and will be persistent (note that I did not say that we will persist the challenges :)). Then players can browse challenges and attempt to overcome them. They do so by bringing a set of tools from their toolbox, selected using the same point system.

I've read about the Elo system and also found an interesting system for rating Bridge play.

We can't use Elo out of the box, because Elo responds to win/loss/draw. We have the potential for a player to attempt a 1000 point challenge with a 750 point toolbox. What happens if they lose, but with 100 points remaining on the other side? That looks like a win to me. The Bridge system up there deals with that by estimating how many points a player should win by, then counting it as a win or loss based on how close they get to that amount, not whether or not they actually win.
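
To make that concrete, here's a rough Python sketch of the kind of margin-based Elo variant I have in mind. The "expected elimination" value and the scaling constants are placeholders of my own, not the actual Bridge formula:

```python
def expected_score(attacker_rating, challenge_rating):
    """Standard Elo expectation: a value in (0, 1)."""
    return 1.0 / (1.0 + 10 ** ((challenge_rating - attacker_rating) / 400.0))

def margin_score(points_eliminated, expected_elimination, challenge_points):
    """Turn 'points eliminated vs. points we expected to be eliminated' into a
    fractional result in [0, 1], where 0.5 means 'performed exactly as expected'.
    The linear scaling here is a placeholder, not the Bridge formula."""
    surplus = (points_eliminated - expected_elimination) / challenge_points
    return min(1.0, max(0.0, 0.5 + surplus))

def update_challenge(challenge_rating, attacker_rating, points_eliminated,
                     expected_elimination, challenge_points, k=32.0):
    """Elo-style update treating the challenge as the 'opponent'.
    A better-than-expected attempt lowers the challenge's rating."""
    s = margin_score(points_eliminated, expected_elimination, challenge_points)
    e = expected_score(attacker_rating, challenge_rating)
    return challenge_rating - k * (s - e)

# The example from above: a 750-point toolbox against a 1000-point challenge,
# where eliminating 750 points is "par" but the player eliminates 900.
print(update_challenge(challenge_rating=1500, attacker_rating=1500,
                       points_eliminated=900, expected_elimination=750,
                       challenge_points=1000))  # -> 1495.2
```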

We intend to have a "stock market" that shows the current ratings of challenges, and we intend to give out bigger rewards for defeating higher-ranked challenges. Players also earn an income of sorts from maintaining a high rank. Of course this means abusers will want to try to break the system for personal gain. I'm aware it's hard or impossible to prevent that, but we'd at least like to minimize abuse.

Questions I have for anyone who might be interested in the problem:

1) Am I barking up the wrong tree looking at Elo and this bridge system?
2) How deep into conceptualization do I need to go? For example, Elo assumed chess play was normally distributed around a player's "skill" (it turns out not to be).
3) Have I given enough information about the system for a meaningful discussion?

Ultimately, I know I'm asking for help with doing my job here, and that's on the weird side. So I won't take it personally if nobody answers. :)
 
I can't seem to wrap my head around what you mean by "100 points left on the other side".

Are you saying that with that 1000 point challenge, the player uses up his 750 points worth of tools and the 1000 point challenge has been reduced to a 100 point challenge that he no longer has tools to solve?
 
What happens if they lose, but with 100 points remaining on the other side? That looks like a win to me.
It sounds like a loss to me. If the major part of the game is deciding how and when to spend your points, losing with a big cache of points represents really bad strategy.

ETA: I guess it matters what the stated objective of the game is. In chess, the objective is to put your opponent's King in checkmate. If you lose a game of chess, it doesn't matter at all that you kept, for example, all your pawns.

On the other hand, if it's the sort of game that everyone eventually loses, maybe you're deciding how many scenarios (or how long) you survived while spending a certain amount of points. Then, the only way I'd count unspent points as a credit would be in a tie break for two players who otherwise survived the same amount of time. If that's the case, conserving points at the end should be stated as part of the objective. If it's not, dying with zero points may have been better play than dying with a big cache of unused points.
 
I can't seem to wrap my head around what you mean by "100 points left on the other side".

Are you saying that with that 1000 point challenge, the player uses up his 750 points worth of tools and the 1000 point challenge has been reduced to a 100 point challenge that he no longer has tools to solve?

Ah sorry.

You go in with your 750 points' worth of tools and eliminate 900 points' worth of the challenge, using all 750 of your own points. If you could continue, there would still be 100 points' worth of stuff to overcome.

So, yes: the challenge has been reduced to a 100 point challenge that he no longer has tools to solve.
 
It sounds like a loss to me. If the major part of the game is deciding how and when to spend your points, losing with a big cache of points represents really bad strategy.

If the player with the 750-point toolbox removes 900 points' worth of the challenge, he fails to "win" because he doesn't remove all 1000 points of the challenge.

But, he removed more than 750, which indicates he played better than expected if you were just looking at the points. So I would call that a win for the purpose of ratings, and the 1000 point challenge would then have perhaps a value of 950.
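
As rough arithmetic, one way to turn "beat the point-for-point expectation by 150" into a 50-point devaluation is to apply a damping factor; the 1/3 here is purely illustrative:

```python
def revalue_challenge(challenge_points, toolbox_points, points_eliminated,
                      damping=1.0 / 3.0):
    """Devalue (or inflate) a challenge based on how far an attempt beat or
    missed the point-for-point expectation. The damping factor is illustrative."""
    surplus = points_eliminated - toolbox_points  # > 0 means over-performance
    return challenge_points - damping * surplus

# A 750-point toolbox eliminates 900 points of a 1000-point challenge:
print(revalue_challenge(1000, 750, 900))  # -> 950.0
```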
 
I see, there are points allotted at the beginning AND points scored (presumably based on how many challenges a player overcomes or beats or whatever). So it's what I said: something about how long you survived. In other words, there's losing on the first challenge or losing on your 157th challenge--not equivalent losses.
 
In that case I would try and come up with a rating system that assigns a score based on:

- ratio of challenge points to tool points used in the attempt
- ratio of remaining tool points after completion to starting tool points
- ratio of challenge points remaining after tool points run out to starting challenge points

I would set some fixed scores for specific values of those ratios and scale accordingly.

For example:

1) Challenge points = Tool points would scale your score by a factor of 1, that is to say your score will not be increased or decreased.

2) Challenge points = 2 x Tool points would scale your score by a factor of 2

3) Challenge points = 1/2 Tool points would scale your score by a factor of 0.5

etc.


I would pick a base score such as 100 for an even challenge.

Then I would play with the ratios of remaining challenge points (on a failure) or tool points (on success) to further scale the score accordingly.

I would then play with those ratios to see what range of scores can reasonably be expected to occur (would someone ever have 90% of tool points remaining on an equal points challenge?) and adjust accordingly.
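
Roughly, in code, it could look something like this; the exact way the three ratios are combined below is just one choice among many and would need tuning against real play data:

```python
def attempt_score(challenge_points, tool_points_used, tool_points_left,
                  challenge_points_left, base=100.0):
    """Score an attempt from the three ratios above. All scaling choices
    here are placeholders meant to be tuned."""
    # Difficulty factor: a challenge worth twice your tool points doubles
    # the score, one worth half your tool points halves it, etc.
    difficulty = challenge_points / tool_points_used

    if challenge_points_left <= 0:
        # Success: bonus for finishing with tool points to spare.
        starting_tool_points = tool_points_used + tool_points_left
        efficiency = 1.0 + tool_points_left / starting_tool_points
    else:
        # Failure: penalty proportional to how much of the challenge was left.
        efficiency = 1.0 - challenge_points_left / challenge_points

    return base * difficulty * efficiency

# An even challenge cleared with nothing to spare scores the base 100.
print(attempt_score(1000, 1000, 0, 0))    # -> 100.0
# A challenge worth twice your tools, failed with 100 of its points left.
print(attempt_score(1000, 500, 0, 100))   # -> 180.0
```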
 
I see, there are points allotted at the beginning AND points scored (presumably based on how many challenges a player overcomes or beats or whatever). So it's what I said: something about how long you survived. In other words, there's losing on the first challenge or losing on your 157th challenge--not equivalent losses.

You're right: where you lose is important. Officially speaking, we won't be awarding points. You get to the end and succeed, or you don't. It's even possible to get to the end without eliminating some of the challenges as they can sometimes be bypassed, though I expect this effect to be reasonably low. For example, we'll redesign things if you could get through without eliminating 50% of the challenge points.

But I'm thinking we can use the point system to get an internal handle on the players' performances. Though I guess players do need to know what they're rated on, so we'd have to make it public.

Have you taken a look at the TrueSkill research that Microsoft has done? This is what the rankings on Xbox are based on.

http://research.microsoft.com/en-us/news/features/trueskill.aspx
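
If Python is anywhere in your pipeline, there's also a third-party package called trueskill that implements the model. A minimal sketch, treating a challenge as one of the two "players" in a 1-vs-1 match (whether that mapping fits your game is my assumption):

```python
# pip install trueskill
from trueskill import Rating, rate_1vs1, quality_1vs1

challenge = Rating()  # defaults: mu=25, sigma=25/3
attacker = Rating()

# Note that TrueSkill, like Elo, only understands win/loss/draw, so the
# "how much was left over" question would still need to be mapped onto those.
print(quality_1vs1(attacker, challenge))              # chance of a close match
attacker, challenge = rate_1vs1(attacker, challenge)  # attacker beat the challenge
print(attacker, challenge)
```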

Thanks for that. That was actually the next thing I was going to look for, though I didn't know what it was called so you saved me some hunting time. :)

Molinaro: I'll take a look at how that might work in our system. I'm in the investigative phase right now, and many ideas will be taken down and evaluated.
 
I'm not sure I understand what's going on here.

It seems like you're trying to simultaneously figure out (1) how good each player is, (2) how hard each challenge is, and (3) how useful each tool is, based on the outcomes of lots of games in which various players use various tools to attempt various challenges.

Is that about right?

And the whole points business is just a (possible) means to this end?

Or, does the number of points assigned to a tool, say, have some fixed significance that won't change regardless of the outcomes of any games in which it's used?
 
I'm not sure I understand what's going on here.

It seems like you're trying to simultaneously figure out (1) how good each player is, (2) how hard each challenge is, and (3) how useful each tool is, based on the outcomes of lots of games in which various players use various tools to attempt various challenges.

Is that about right?

And the whole points business is just a (possible) means to this end?

Or, does the number of points assigned to a tool, say, have some fixed significance that won't change regardless of the outcomes of any games in which it's used?

Players won't have ratings, only the challenges they produce have ratings. We will evaluate the usefulness of the tools using play data, but that's not something that will be adjusted on a daily basis. That's something we'd patch over the long term to make sure the point system works properly.

The system I've described is for rating the quality of a challenge given data for attempts on the challenge.
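
As a bare-bones sketch of what I mean by rating a challenge purely from attempt data, with all the field names and constants made up:

```python
def rate_challenge(attempts, start=1500.0, k=16.0):
    """Fold a stream of attempt records into a single challenge rating.
    No per-player ratings are involved; 'par' is eliminating as many
    challenge points as toolbox points brought."""
    rating = start
    for a in attempts:
        observed = (a["points_eliminated"] - a["toolbox_points"]) / a["challenge_points"]
        # Attempts that beat par pull the challenge's rating down; attempts
        # that fall short push it up.
        rating -= k * observed
    return rating

attempts = [
    {"challenge_points": 1000, "toolbox_points": 750, "points_eliminated": 900},
    {"challenge_points": 1000, "toolbox_points": 1000, "points_eliminated": 600},
]
print(rate_challenge(attempts))  # -> 1504.0
```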
 
