Math/Statistics major needed

Stormich · #1 02-01-2011, 09:39 AM

Well, basically the name says it all. We are in need of a Math/Statistics major who would be willing to help us optimize the ELO calculations the ladder uses to determine ratings. This is the last (and probably trickiest) part to be done before Ladder season 2 starts. Eso was doing this for us but he unfortunately vanished, so now we need someone else to help us finish things up so we can finally start the new season.
Anyone with knowledge of team-based rating systems is free to apply. Unfortunately this is pro bono work and the only pay would be the recognition for doing so.
For contact you can message me or apply in this thread.

Thank you for your time

I hope we get someone so we can start pewpewing season 2 already TT

Ribilla · #2 02-01-2011, 05:25 PM

I do Engineering at Cambridge, but most of the maths on our course centres around complex numbers and vector calculus, so I'm not familiar with "team based rating systems". If you need some number crunching once the basic idea is in place though, I'd be happy to help.

Wok3N^ · #3 02-01-2011, 05:40 PM

I just submitted a request to League of Legends devs to see what ELO system they use. They also have a "Ladder" which is 5v5 and the ELO system seems to work.

mikesol · #4 02-01-2011, 06:10 PM

Quote:

Originally Posted by Wok3N^

I just submitted a request to League of Legends devs to see what ELO system they use. They also have a "Ladder" which is 5v5 and the ELO system seems to work.

I was reading up on the LoL elo system and apparently there are a lot of people who think it's flawed and needs to be re-done. World of Warcraft started off with elo like chess but apparently that doesn't extend well into a multi-person team. Microsoft has developed a "TrueSkill" rating system that seems to be pretty good. You can read about it here. I think I understand it fairly well.

For those who read about it - any thoughts?

elxir · #5 02-01-2011, 06:23 PM

i like the current system but the biggest issue is when a god awful player is on one team, and the other team has 5-6 respectable people

perhaps reduce how many points you lose (but not gain) if a player on your team is under 1500?

alternatively pursue my brilliant plan of an 'open' ladder server 1, and an 'elite' 1800+ ladder server 2

win baby win

Rainmaker · #6 02-01-2011, 06:34 PM

Quote:

Originally Posted by Stormich

Well, basically the name says it all. We are in need of a Math/Statistics major who would be willing to help us optimize the ELO calculations the ladder uses to determine ratings. This is the last (and probably trickiest) part to be done before Ladder season 2 starts. Eso was doing this for us but he unfortunately vanished, so now we need someone else to help us finish things up so we can finally start the new season.
Anyone with knowledge of team-based rating systems is free to apply. Unfortunately this is pro bono work and the only pay would be the recognition for doing so.
For contact you can message me or apply in this thread.

Thank you for your time

I hope we get someone so we can start pewpewing season 2 already TT

I'm no major, but I'm studying industrial engineering @ University of Buenos Aires, and I've taken the subjects "Statistics & Probability" and "Technical Statistics".
From a personal point of view, I don't think that ELO works for a multiplayer team-based ranking system, because ELO was/is designed for 1v1 match-ups.
Boiling it down: the ranking represents only the performance of your team, and not you. Therefore your ranking will change, and you will be paying for the achievements or failures of your teams.
ELO could be adapted the way TrueSkill from MS does it. It adds a second variable, which takes into account the individual contributions of the player to the team.
But I don't think this really would work. Why?
Well, first we have to understand that the ELO system is about measuring the RELATIVE skill between 2 players, hence the sum of all players' rankings divided by the number of players = constant. In other ranking systems the idea of "scaling up" (like in an MMORPG) makes for an ABSOLUTE skill system (where usually there is a top barrier, like "lvl 60" or 20M exp).

Explaining the math behind ELO:
ELO works pretty simply: it takes your current score (you always start at the middle of the table, 1600 or 1500 for Altitude I think) and it adds points if you win or removes them if you lose.
The number of points added/removed isn't always the same. It is based on the prediction of the result. ELO considers that you are less likely to lose a game if your rank is higher than your opponent's, or less likely to win if your rank is lower than your opponent's.
e.g.: A: rank 2500; B: rank 2000.

ELO makes its prediction assuming a Normal distribution of the ranking, whose parameters are mu and sigma.
mu = your average ranking prediction
sigma = variation.
So, for example, ELO assumes that B is unlikely to win this match, because of the difference between A's and B's rankings. That way ELO rewards B highly if he wins (adding more ranking points) and gives a high penalty to A (removing the same amount of ranking points) for losing against a low-ranking player.

As you can see this ranking system is pretty straightforward, and depends on the fact that the only variables are the 2 players (A and B), so the prediction is about these subjects' skill.
If we try to extrapolate this system to team-based games, it won't work as well. You will start seeing some incongruities in the ranking of some players.
And most team-based games have this problem: you can't measure a single player's skill in a team-based game.
Why? Because the game result measures the skill of the teams, not the player. In the long run it doesn't matter in a TBD game if you killed 50 or 30 planes, if your team didn't score the 6 goals or didn't kill the enemy base. It is an assumption that killing more planes makes it more likely for you to score, which may not always be true.
This is why adding variables to ELO for team-based games isn't going to work.

But, for a non-professional purpose, and only for the heck of it, I would like to give it a try :/

elxir · #7 02-01-2011, 06:50 PM

well, it wouldn't be a bad idea to account for base dmg / kills / goals / assists in the ranking, but ****ty team players who just try to kill a lot (especially in ball) who don't help you win would benefit without actually being good.

elxir · #8 02-01-2011, 06:51 PM

ETA: what if you ranked each player on each team head to head with the person most similar in rank on the other team, roughly like what fake intothewow suggested?

Urpee · #9 02-01-2011, 07:08 PM

Quote:

Originally Posted by elxir

ETA: what if you ranked each player on each team head to head with the person most similar in rank on the other team, roughly like what fake intothewow suggested?

I think this is a good point and I thought the same when I saw some team compositions.

In fact I would recommend the following picking strategy:

1) Pick a team randomly to start, label it A, the other B.
2) Pick top ranked and bottom ranked player for team A of remaining pool
3) Pick top ranked and bottom ranked player for team B of remaining pool
4) Repeat 2) 3) until all slots are filled.

This matches best to best and worst to worst and converges to the center in a balanced way. The previous pick is guaranteed to outperform the next. The starting team can have an advantage but it is randomized, so on average everybody has the same advantage.
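A rough Python sketch of the picking order above. The function name `pick_teams`, the player names and the ratings are made up for illustration, and the handling of an odd team size (the last slot takes only the best remaining player) is an assumption the steps don't spell out.

```python
import random

def pick_teams(ratings, team_size, rng=random):
    """Split players into two teams by alternating top+bottom picks."""
    # Strongest first; only the players that fit into the two teams are used.
    pool = sorted(ratings, key=ratings.get, reverse=True)[: 2 * team_size]
    teams = ([], [])
    turn = rng.randrange(2)              # step 1: random starting team
    while pool:
        team = teams[turn]
        team.append(pool.pop(0))         # best remaining player
        if pool and len(team) < team_size:
            team.append(pool.pop())      # worst remaining player
        turn = 1 - turn                  # steps 2 and 3 alternate
    return teams

# Hypothetical 5v5 lobby with ratings 2400, 2300, ..., 1500.
demo = {"p%d" % i: 2400 - 100 * i for i in range(10)}
team_a, team_b = pick_teams(demo, team_size=5)
print(team_a, sum(demo[p] for p in team_a) / 5)
print(team_b, sum(demo[p] for p in team_b) / 5)
```

With these evenly spaced ratings the two team averages come out at 1960 and 1940, so the starting team keeps a small edge, as noted above.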

But the second problem with ladder is that the ladder scores simply are not reflective of player ability so clearly the way the scores are arrived at or maintained or factored in needs some tweaking.

Zero-sum is nice because one can make claims about the stability of the ranking. But then again almost all ladders I know are reset after some time, so very long term stability really isn't an issue. It's more the question if the scoring system has the right dynamics (i.e. good players with a low ranking will bubble up and bad players with a high ranking will plummet, rankings that are sensible have good stability properties).

TrueSkill looks interesting. Testing against expectation seems a very clean setup and makes a lot of sense.

Rainmaker · #10 02-01-2011, 07:09 PM

Quote:

Originally Posted by elxir

ETA: what if you ranked each player on each team head to head with the person most similar in rank on the other team, roughly like what fake intothewow suggested?

I didn't mean that, though I thought of that possibility too, but it wouldn't be fair.
There is another thing I hadn't thought of, but heard someone mention:
How does the current scramble system work?
I heard that it tries to make the sum of all the players' rankings on one side match the sum on the other.
So, for example, there is an average ranking of 1700 on team red and an average ranking of 1700 on team blue as well?

My personal opinion is that we should keep it as it is currently, but make the amount of points each player gets variable.
For example, looking at the current numbers every player gets +/- 24 points for each match played.
Instead (in TBD), a winning team has 24*5 = 120 points.
We would redistribute these in a different way (it's like having a ranking filter inside ELO).
For example, in TBD, 30 out of the 120 are allocated to bombers: that way ~33 dmg done to base equals 10 ELO points, 5 dmg equals 1~2 points, 10 dmg 3 points. This could be simplified: instead of a linear scale, we could do it in blocks.
0~5 dmg = 1 points.
6~15 dmg = 3 points
30~34 dmg = 9 points
(always checking that all dmg done to base sums up to 100 dmg and 30 ELO points)
60 points would be equally distributed between team members, 12 to each.
The remaining 30 would be redistributed according to the kill-death-assist performance of each team player.

This goes both ways: winning teams gain points, losers lose points.
So, for example, someone who loses but does 30 dmg to the enemy base and has a 1.5 kill ratio should lose few points, and someone with 0 base dmg and a 0.7 ratio would lose more points.

Urpee · #11 02-01-2011, 07:27 PM

I don't think kill ratio should factor in. Winning should matter and contributions that directly lead to this. In TBD that's damage to base, in ball it's goals. I agree it'd be nice if a 6-5 game in ball had different impact than a 6-0 game, and perhaps game time can be factored in as well.

One can lose an incredibly balanced game and have the same impact as a sweep right now.

But rewarding kill ratio just encourages bad behavior. If you win with a 0.7 kill ratio it really shouldn't matter, you won. And if you cannot win with a 2.0 kill ratio, well clearly you had the wrong priorities.

Rainmaker · #12 02-01-2011, 07:58 PM

Agreed on that, Urpee. I was just brainstorming; on second thought, kill ratio would lead to unintended bad behaviours.

Stormich · #13 02-01-2011, 08:46 PM

Giving bonuses to bomb runners/scorers would just make them go higher. If you look at ladder you'll see the top players are all bomb runners/scorers.

We tried head to head calculating as the first option (before ladder was even public) and Eso said that it was a pretty flawed system so he designed a new one for us (which still has a couple of flaws that need fixing)

Regarding Trueskill, it seems the best way to actually calculate this stuff but we'd need to start everything from scratch and it would be a big time commitment from anyone that would be willing to do this.

Anyway for the guys willing to help you'll have to wait til nobo chimes in about what exactly he needs help with (I did this without his direct approval lol).

blln4lyf · #14 02-01-2011, 08:58 PM

Putting things in for kills, goals, base hits, etc. is probably not a good idea. Some will focus more on getting these things than on what is best for their team to win, which will result in worse overall play.

A Nipple · #15 02-01-2011, 10:29 PM

Quote:

Originally Posted by elxir

i like the current system but the biggest issue is when a god awful player is on one team, and the other team has 5-6 respectable people

perhaps reduce how many points you lose (but not gain) if a player on your team is under 1500?

alternatively pursue my brilliant plan of an 'open' ladder server 1, and an 'elite' 1800+ ladder server 2

win baby win

This idea reminds me of SC2, I like it. Plus it means that to stay in that half of the ladder you have to keep fighting to stay up.

Edit: the filter system intothewow mentioned seemed interesting, whereby the plane played got different points (only that part). In general there is only a small handful of players who always play heavy in ladder and maintain a high rating without needing to resort to bomb running with a light plane. It could be argued people could switch planes in game. Therefore the calculation would need to consider match time and the plane played for the majority of that match, unless there is another way?!

Nip nip

P.s. I get mah compoota back Thursday = yay! =D

Stormich · #16 02-01-2011, 10:36 PM

The 1800+ server is a nice idea, unfortunately we don't have a big enough playerbase for this to work

Urpee · #17 02-01-2011, 10:51 PM

I agree that rewarding individual performances in a team ladder is a really bad idea, at least in my mind that wasn't what I was promoting. My impression was that factoring in goals and bombings was not per player, but per team.

I.e. if a team wins 6-0 it is different than when it wins 6-5.

But I think this is a lesser problem than the scores not migrating in accordance to performance well and there being a bulk of inactive players making the summation over scores questionable.

dr. carbon · #18 02-02-2011, 01:50 AM

How bout we set the base as a zero-sum game and then change it a little bit, making it a pseudo-zero-sum economy by awarding the winners extra for a 6-0/full base health win and detracting less from the losers for a 6-5 game or 1% base health left on the winner's base.

Simple solution to a complex problem. If you want more info on non zero-sum properties, get a game theory major to help you... maybe Ingbo or donk may know something about it as a lot of Pro Poker Players have game theory or psych majors? idk ):

dr. carbon · #19 02-02-2011, 01:52 AM

Rating caps wont work, b/c if lets say 1500 became the new minimum for a server, that means the player base would consistently fall for that server b/c higher echelon players would be competing. If everyone becomes rich, then everyone becomes poor b/c rich would change into the new poor. same concept.

nobodyhome · #20 02-02-2011, 04:40 AM

Lotsa things going on in this thread. First of all, here is the current rating system, described fully at the bottom of this post: http://altitudegame.com/forums/showthread.php?t=2469 . That formula is all there is to it. It is simply ELO with "your rating" substituted with "the average of your team's rating". The only caveat (an important one, though) is that with the current balancing system, your team's average rating will nearly 100% of the time be balanced to be exactly the other team's average rating, thus resulting in the +/- 24 or 25 pattern you see (26 is not as common because currently the system rounds down in case of decimals). The balancing system goes hand in hand with the rating system--consideration of it is important when deciding what to do with the rating system.
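For readers who don't want to open the linked thread, here is a minimal Python sketch of a team-average ELO update. It assumes the linked formula is the standard ELO expectation with a multiplier of 50, which is what the "50" and the E formula mentioned elsewhere in this thread suggest; the function names are hypothetical and the rounding-down step is not modelled.

```python
def expected_score(own_avg, opp_avg):
    """ELO win probability for a team, from the two team averages."""
    return 1.0 / (1.0 + 10.0 ** ((opp_avg - own_avg) / 400.0))

def rating_change(own_avg, opp_avg, won, k=50):
    """Points every player on the team gains (or loses) for one game."""
    score = 1.0 if won else 0.0
    return k * (score - expected_score(own_avg, opp_avg))

# Perfectly balanced teams: E = 0.5, so the swing is exactly half of k.
print(rating_change(1700, 1700, won=True))    # 25.0
print(rating_change(1700, 1700, won=False))   # -25.0

# Averages that differ by a few points only move the swing by a fraction.
print(rating_change(1698, 1702, won=True))    # a little above 25
```

Because the balancer almost always produces equal averages, E stays at about 0.5 and every game is worth roughly the same number of points.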

Here are a few of the current problems of the ladder rating system:

1. New players are placed in the exact center of the ratings distribution (at 1500), thus making them overrated nearly 100% of the time. This is an inevitable consequence of the zero-sum system: solutions like "oh why don't we just start people at 1000 instead of 1500" will only serve to shift the distribution 500 points downwards, making new players still overrated in relation to everybody else. The solution of course is that zero-sum should no longer be a property of the rating system--this is not a problem when you have a ladder that resets regularly. If a level of inflation were to be introduced into ladder (greater than we have now) then this problem could be solved because the distribution of points would shift upwards over time.

2. When a player's skill drastically changes in some way (maybe trying out a new plane, or maybe hasn't played in a while and is rusty, or fixed his internet connection so he no longer lags, or just plain had an epiphany and got better) this change is not reflected in the ladder as quickly as could be done. This is because the max point gain or loss for each game is constant (at 25 when balanced). This can be fixed by replacing the "50" in the ratings formula with a value "K" that represents the "uncertainty" of that player's rating. Thus, a new player would have a high K, vets would have a low K, and a player who's dropping a lot or winning a lot recently could also have a high K.
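One possible shape for such a K, sketched in Python. Everything here is hypothetical: the function names, the start value of 100, the veteran value of 40 and the 50-game settling period are placeholders rather than anything proposed in the thread, and the "recent streak" idea is left out.

```python
def k_factor(games_played, k_new=100.0, k_vet=40.0, settle_games=50):
    """Hypothetical schedule: K shrinks linearly from k_new to k_vet."""
    progress = min(games_played, settle_games) / settle_games
    return k_new + (k_vet - k_new) * progress

def rating_change(expected, won, games_played):
    """ELO update where the multiplier depends on the player's experience."""
    score = 1.0 if won else 0.0
    return k_factor(games_played) * (score - expected)

# Same balanced game (E = 0.5), very different swings.
print(rating_change(0.5, True, games_played=0))     # 50.0 for a brand-new player
print(rating_change(0.5, True, games_played=25))    # 35.0 halfway through
print(rating_change(0.5, True, games_played=500))   # 20.0 for a veteran
```

Once players in the same game carry different K values, the points won and lost no longer cancel out, which fits the point above that the ladder does not have to stay zero-sum.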

About segregated (1800+ and 1800-) servers: This won't work, not only because of our low population, but because in order for a ratings system to work correctly, you need to be able to play against a variety of opponents.

About in-game bonuses to points: This also won't work. A rating system must have no preconception of what the game actually is: it should only care about whether you win or lose. Arbitrarily defining some behaviors to be "good" and some to be "bad" will only cause players to be good at doing those things, not necessarily good at winning in Altitude. Even doing some sort of scale with basehealth and for goals is bad because sometimes, letting the opponent hit your base is good for winning (for example, it may be a good strategy to try to defend 4v5 and let one of your loopies try to sneak past them for a counterhit, whereas if basehealth was counted into your rating then you might be disincentivized to do that). Goals are a little bit different because the game state very nearly resets after each goal, but even then not entirely: the ball is given to the other team when you score a goal, so it might be good to let the other team score because that would mean you get the ball back.

Urpee · #21 02-02-2011, 05:36 AM

A note on the formula. For practical purposes I really only ever see a 25 or a 24 point swing. I'd say that's an insignificant difference as to hardly warrant the complexity of the formula used to determine it.

It certainly is in part due to the success of the balancing algorithm that just virtually always tends to get this close.

But I think the formula could be made more interesting simply by computing E like this:

E = 1 / [1 + 10^ ([(Avg rating of your opponents)-(your rating)] / 400)]

I.e. rather than weighing the team averages, one is compared against the average level of the opposing team. I.e. if a high-ranked player plays against high-ranked competition they have less to lose than against low-ranked competition.

This will mean that in a game not everybody gets the same swing. Underranked players who compete in very high level games will rise faster and high-ranked players who fail to compete in low-ranked games will drop faster. A player who competes in a game with an average rank at their own standing will see the kinds of swings we already know.

A variation of this is a formula that takes the average of both teams:

E = 1 / [1 + 10^ ([(Avg rating of both teams)-(your rating)] / 400)]

(i.e. this proposal suggests making E player dependent, not the K).
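The two variants side by side, as a small Python sketch. The function names are hypothetical, and it assumes "average of both teams" means the plain average over all players in the game, including the player being rated.

```python
def expected(own_rating, reference_rating):
    """ELO expectation of one player against a reference rating."""
    return 1.0 / (1.0 + 10.0 ** ((reference_rating - own_rating) / 400.0))

def e_vs_opponents(own_rating, opponents):
    """First variant: compare the player with the opposing team's average."""
    return expected(own_rating, sum(opponents) / len(opponents))

def e_vs_everyone(own_rating, own_team, opponents):
    """Second variant: compare the player with the average of both teams."""
    everyone = list(own_team) + list(opponents)
    return expected(own_rating, sum(everyone) / len(everyone))

# Balanced 2v2 (both teams average 1700): the variants give the same E.
print(e_vs_opponents(1900, [1700, 1700]))                # about 0.76
print(e_vs_everyone(1900, [1900, 1500], [1700, 1700]))   # about 0.76

# Unbalanced 2v2: the variants differ.
print(e_vs_opponents(1900, [1700, 1700]))                # about 0.76
print(e_vs_everyone(1900, [1900, 1900], [1700, 1700]))   # about 0.64
```

When the balancer equalizes the two team averages, the opposing average and the overall average are the same number, so the two variants only give different results in unbalanced games.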

Rainmaker · #22 02-02-2011, 07:22 AM

Quote:

Originally Posted by nobodyhome

The only caveat (an important one, though) is that with the current balancing system, your team's average rating will nearly 100% of the time be balanced to be exactly the other team's average rating, thus resulting in the +/- 24 or 25 pattern you see (26 is not as common because currently the system rounds down in case of decimals). The balancing system goes hand in hand with the rating system--consideration of it is important when deciding what to do with the rating system.

Quote:

Originally Posted by Urpee

A note on the formula. For practical purposes I really only ever see a 25 or a 24 point swing. I'd say that's an insignificant difference as to hardly warrant the complexity of the formula used to determine it.

It certainly is in part due to the success of the balancing algorithm that just virtually always tends to get this close.

But I think the formula could be made more interesting simply by computing E like this:

E = 1 / [1 + 10^ ([(Avg rating of your opponents)-(your rating)] / 400)]

I.e. rather than weighing the team averages, one is compared against the average level of the opposing team. I.e. if a high-ranked player plays against high-ranked competition they have less to lose than against low-ranked competition.

Yes, this is what I thought was screwing with the rating system.
ELO relies on measuring the relative difference between you and your opponent.
If both teams always have the same average rating, then the difference is minimal, which makes the system less sensitive. The model "perceives" everyone as having the same hours played and the same skill, so the points gained are the same.

There are 2 possible solutions:
1. Make teams random (then, imbalance)
2. Make the relative difference as Usurper said.

IMO 2. is the answer.
For those who don't know, the K factor affects the sensitivity of the system to changes.
A greater difference in ratings means greater risks, means greater gains, which is reflected in more points won/lost.
Most modern ELO-based systems use a variable K factor (even TrueSkill). A high K factor means a high fluctuation in your points gained/lost; a low K factor means fewer points won/lost.
In most multiplayer games this K factor usually relies on one or both of these parameters:
*Total games played
*Total hours played
2 variables Altitude keeps a record of.

For example, TrueSkill (Microsoft's rating system for Xbox) uses a factor C multiplied by hours played; that is because you want to minimize the points won during games with a faulty connection (high pings), a low number of players, quits, forfeits, etc., thus ensuring that only properly played games are the basis for your rating calculation.

I think this is something the Ladder Organizers should maybe consider: a low K factor ensures the stability of the rating in the long term; a high K factor seems better for adjusting over the first games, but requires resets to avoid overrating/underrating people (for example players securing their wins, picking matches, etc.).

My solution:
1. Correct the formula as Usurper suggested: "opponent team's average rating" - "my rating"
2. Make the K factor a function of hours played and/or games played.
This way more experienced players gain/lose rank slightly more slowly, which stops them sitting at the top of the leaderboard with a difference of 300 to 500 points (as seen now).*

I would like to emphasize point 1, as it is my main concern about what seems to be screwing with the rating.
The K factor could remain constant (a value of 30~60 is ok) as long as resets are made every 6 or 12 months. That also tends to eliminate the inflation/deflation effect.

*: this is another reason why the system seemed screwed: higher-rated players were considered equal to low-rated players, not taking into account one of the premises of the ELO system, win percentage based on skill rating. Both teams being equally rated meant that the "point transaction" was always kept at a minimum (24 points for the designed system), while making it even harder to climb the leaderboard, because you would need 8 games won in a row to make a difference of 200 points (the breakthrough difference on which it is based).

e.g.:
A & B teams with an average of 1700. Team A wins. Team B loses.

Player1A rated: 1675
points gained with corrected: 27 points
Player1B rated: 1675
points lost with corrected: -23 points

Player2A rated: 1900
points gained with corrected: 12 points
Player2B rated: 1900
points lost with corrected: -38 points

Player3A rated: 1700
points gained with corrected: 25 points
Player3B rated: 1700
points lost with corrected: -25 points

Player4A rated: 1725
points gained with corrected: 23 points
Player4B rated: 1725
points lost with corrected: -27 points

NewPlayerA rated: 1500
points gained with corrected: 38 points
NewPlayerB rated: 1500
points lost with corrected: -12 points

(I did the math myself using the formulas provided in the topic by nobo, using what Usurper suggested)

Here you can see what I explained about how ELO works: see that the system cares about the relative difference in rating:
a player 200 above the team average loses 38 points
a player 200 below the team average gains 38 points

Now, a player at the average (1700) gains/loses 25 points. This is what was happening with the current system: it "assumed" everyone was equally rated, because the teams were arranged that way (which is just collateral damage of trying to make teams balanced).

Now to adjust it, we could change the factor K (here it is 50), so points gained/lost are higher or smaller. This depends on the ranking.
Initially I would keep it at 50. But I would like to make it vary with games played/hours played. Making it smaller, a player loses or gains fewer points for each match won/lost.
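The numbers in the example can be reproduced with a few lines of Python. This is only a sketch: `swing` is a made-up name, K = 50 as stated above, and rounding to the nearest point is an assumption chosen because it matches the posted figures.

```python
def swing(rating, opp_avg, won, k=50):
    """Per-player ELO change against the opposing team's average rating."""
    expected = 1.0 / (1.0 + 10.0 ** ((opp_avg - rating) / 400.0))
    score = 1.0 if won else 0.0
    return round(k * (score - expected))

# Both teams average 1700; each entry is (points if won, points if lost).
table = {r: (swing(r, 1700, True), swing(r, 1700, False))
         for r in (1500, 1675, 1700, 1725, 1900)}

for rating, (gain, loss) in table.items():
    print(rating, gain, loss)
# 1500 38 -12
# 1675 27 -23
# 1700 25 -25
# 1725 23 -27
# 1900 12 -38
```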

elxir · #23 02-02-2011, 07:29 AM

the problem with factoring in hours played is that a shocking number of people are only decent at one mode, and god awful at all other modes

nobodyhome · #24 02-02-2011, 07:33 AM

@Urpee: your first formula is the system originally used when ladder first started a year ago. It was used for a week until Eso pointed out that the system was wrong. Here is the original post where Eso laid out his arguments: http://altitudegame.com/forums/showt...ge=2#post34206 . Here is a much more recent thread in which your formula was proposed and discussed: http://altitudegame.com/forums/showthread.php?t=5730

Your second formula is more interesting but it completely misinterprets what the "E" in the formula is: it is a PROBABILITY that is assigned to each team of whether they will win or not. Much more strange are the effects of that formula. Since the balancer forces complete balance nearly 100% of the time, assume that there is a 50/50 chance of winning. Assume also that the average ladder game has an average rating of both teams of ~1900 or so (as is the case currently). Then a player whose rating is higher than 1900 will lose >25 points every time they lose, and win <25 points every time they win. Since they win ~50% of the time (as forced by the balancer), their rating will converge towards 1900. Those who have ratings lower than 1900 will experience the opposite effect. Thus you end up with a ladder in which everybody has a ~1900 rating.
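The convergence argument can be checked numerically. The Python sketch below is an illustration under the assumptions stated in the post (every game averages 1900, the win rate is pinned at 50%) plus K = 50; the function names are hypothetical, and it follows the average trend instead of simulating random wins and losses.

```python
def expected(own_rating, reference_rating):
    """ELO expectation of one player against a reference rating."""
    return 1.0 / (1.0 + 10.0 ** ((reference_rating - own_rating) / 400.0))

def average_drift(rating, game_avg=1900, win_rate=0.5, k=50):
    """Expected rating change per game under the both-teams-average formula."""
    e = expected(rating, game_avg)
    gain = k * (1.0 - e)      # points for a win
    loss = k * (0.0 - e)      # points for a loss (negative)
    return win_rate * gain + (1.0 - win_rate) * loss

def simulate(rating, games):
    """Apply the average drift game after game (trend only, no randomness)."""
    for _ in range(games):
        rating += average_drift(rating)
    return rating

print(average_drift(2300))   # about -20.5 per game
print(average_drift(1500))   # about +20.5 per game
print(average_drift(1900))   # 0.0
print(round(simulate(2300, 300)), round(simulate(1500, 300)))   # 1900 1900
```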

Rainmaker · #25 02-02-2011, 07:51 AM

Quote:

Originally Posted by nobodyhome

Your second formula is more interesting but it completely misinterprets what the "E" in the formula is: it is a PROBABILITY that is assigned to each team of whether they will win or not. Much more strange are the effects of that formula. Since the balancer forces complete balance nearly 100% of the time, assume that there is a 50/50 chance of winning. Assume also that the average ladder game has an average rating of both teams of ~1900 or so (as is the case currently). Then a player whose rating is higher than 1900 will lose >25 points every time they lose, and win <25 points every time they win. Since they win ~50% of the time (as forced by the balancer), their rating will converge towards 1900. Those who have ratings lower than 1900 will experience the opposite effect. Thus you end up with a ladder in which everybody has a ~1900 rating.

Keep in mind you added a variation after Esotheric's initial introduction of ELO. You added an auto balance, which screws with ELO.
Now Esotheric's argument that, for example, the game isn't "zero sum" is false, as auto balance keeps the average of all ratings = 1500 (as each point gained by one player is lost by another). This is because teams are no longer imbalanced.

Quote:

Originally Posted by Esotheric

Example:

Player A 1500 and Player B 2300 play against two 1500s.
The 1500s win!

Player A loses 50*(1/2), Player B loses 50*(100/101)
The 1500s gain 50*(10/11) each.

Total loss= -74.2574257
Total gain= 90.9090909

Doing the math with the modification, though we will have to change the numbers a little:
(2300+1500)/2 = 1900

Team A:
A.Player 1: rated 2300
A.Player 2: rated 1500

Team B:
B.Player 1: rated 1900
B.Player 2: rated 1900

Team A wins:
A.Player 1: +4.5
A.Player 2: +45.5
B.Player 1: -25
B.Player 2: -25

Team B wins:
A.Player 1: -45.5
A.Player 2: -4.5
B.Player 1: +25
B.Player 2: +25

(see that sum of all is 0; you have to be careful with decimals though!)
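A quick Python check of this 2v2 example, again only as a sketch with made-up function names and K = 50. It also tries a second, hypothetical line-up to see whether the zero sum holds in general.

```python
def swing(rating, opp_avg, won, k=50):
    """Per-player ELO change against the opposing team's average rating."""
    expected = 1.0 / (1.0 + 10.0 ** ((opp_avg - rating) / 400.0))
    return k * ((1.0 if won else 0.0) - expected)

def match_swings(winners, losers):
    """Return (gains of the winning team, losses of the losing team)."""
    win_avg = sum(winners) / len(winners)
    lose_avg = sum(losers) / len(losers)
    gains = [swing(r, lose_avg, True) for r in winners]
    losses = [swing(r, win_avg, False) for r in losers]
    return gains, losses

# The example above: Team A (2300, 1500) beats Team B (1900, 1900).
gains, losses = match_swings([2300, 1500], [1900, 1900])
print([round(g, 1) for g in gains], losses)   # [4.5, 45.5] [-25.0, -25.0]
print(round(sum(gains) + sum(losses), 6))     # 0.0 up to float rounding

# Hypothetical lopsided team with the same 1900 average.
gains2, losses2 = match_swings([2000, 2000, 1700], [1900, 1900, 1900])
print(round(sum(gains2) + sum(losses2), 2))   # about -1.02
```

In the example the sum is zero because 2300 and 1500 sit symmetrically around 1900. With a lopsided spread the total is close to zero but not exactly zero, even though the team averages are still equal.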

Let me break this down for those who aren't following the math:
Players from Team B are close to the avg rating of the match, so they get the medium point swing, which is 25 (24 if numbers are rounded down).
The high-ranked A.Player gains little because he is "risking" little: he is playing against lower-rated players (exactly 400 points below him each). You have to keep in mind that relations are not linear here; the MORE you risk (the greater the difference between ratings) the more you will lose/gain.
So, a highly rated player playing against lowly rated players (compared to him) will gain few points, and will lose a lot. This was explained by Esotheric before: the system is designed this way because IT IS EXPECTED that A.Player.1 wins this match.
But as I stated waaaay before, ELO wasn't designed for team-based games, but we're trying to adapt it. ELO isn't considering that the 2300-rated player is playing with a 1500-rated player, against 1900 players.
We forced the teams to be evenly rated on average.
So what this ranking actually does is compare YOUR skill against your opponents' AVERAGE SKILL.
IMO this is the most accurate we can get without complicating things too much.

Furthermore, you are assuming wrong with the 1900 example; you have to consider that the system HAS TO BE RESET in order to take the new effects into account. (I know you run all previous records through every modification done to ELO.) That's why I put emphasis on the K factor: if your ranking system is a long-term one, K should vary with games played; if not, the solution is to reset the ranking often, so there are no inflation/deflation effects (this is well explained in the ELO system article on Wikipedia).
When you ran your old records through the new system you didn't take into account the "auto balance" feature, which ensured that the zero sum was accomplished; that's when the old ratings started adding or losing ELO points.

-------

Quote:

Originally Posted by elxir

the problem with factoring in hours played is that a shocking number of people are only decent at one mode, and god awful at all other modes

Make 2 different rating boards, plus make it only based on games played.

andy · #26 02-02-2011, 11:54 AM

The balancer should give you a 50% win chance. If you're rated higher than all the other players in the ladder (e.g. you're playing with all 1500-2000 players and you're a 2500) you shouldn't lose more points than the others, considering the balancer will make sure your winning chance is 50%; otherwise you can just never climb the ladder if you get +12 and -38 changes at a 52% win rate.

I will look into some solutions later, just thinking this one wouldn't work.
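The arithmetic behind the +12 / -38 concern, as a small Python sketch (the function names are made up for illustration):

```python
def expected_change(win_rate, gain, loss):
    """Average points per game for a given win rate, gain and loss."""
    return win_rate * gain - (1.0 - win_rate) * loss

def break_even_win_rate(gain, loss):
    """Win rate at which the average change is exactly zero."""
    return loss / (gain + loss)

print(expected_change(0.52, 12, 38))   # about -12 points per game
print(break_even_win_rate(12, 38))     # 0.76
```

So a player on +12 / -38 swings would need to win about 76% of their games just to hold their rating, far above the roughly 50% the balancer aims for.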

CCN · #27 02-02-2011, 11:59 AM

Something I like is how I can judge how i'm playing based on my rating. ergo 2300 6 months ago is (+200 points now) = now and I can use that to judge where i am at.

If there is too much inflation it will be hard to figure out long-term true skill levels (there is a small risk of those who play the most in a short space of time benefiting too much from inflation).

Urpee · #28 02-02-2011, 02:38 PM

Quote:

Originally Posted by nobodyhome

@Urpee: your first formula is the system originally used when ladder first started a year ago. It was used for a week until Eso pointed out that the system was wrong. Here is the original post where Eso laid out his arguments: http://altitudegame.com/forums/showt...ge=2#post34206 . Here is a much more recent thread in which your formula was proposed and discussed: http://altitudegame.com/forums/showthread.php?t=5730

Your second formula is more interesting but it completely misinterprets what the "E" in the formula is: it is a PROBABILITY that is assigned to each team of whether they will win or not. Much more strange are the effects of that formula. Since the balancer forces complete balance nearly 100% of the time, assume that there is a 50/50 chance of winning. Assume also that the average ladder game has an average rating of both teams of ~1900 or so (as is the case currently). Then a player whose rating is higher than 1900 will lose >25 points every time they lose, and win <25 points every time they win. Since they win ~50% of the time (as forced by the balancer), their rating will converge towards 1900. Those who have ratings lower than 1900 will experience the opposite effect. Thus you end up with a ladder in which everybody has a ~1900 rating.

I see. I think that is clearly where the issue is encoded. Now take the system as is and apply it to something like arena in WoW. As the teams are fixed, using the team scores for balancing and weighting them for win%, even if not perfect, works sensibly well.

However on Ladder we have a random mix of people. One may have lots of great players who just don't have compatible play styles etc. It's not at all clear that the average score of a team compared to the average score of another team is a good measure of the team's likely performance.

That said, looking at the actual win/loss percentages for people with more than 200 games, it's not bad. People who play for a very long time do not manage to have a win% exceedingly far from 50%, which is a good sign.

Still I don't think it's actually correct to assume that the average gives a prediction of the winning probability with the same stability as set teams.

Note that the second formula also has an interpretation as a probability. It gives the likelihood that you as an individual outperform (or underperform) the average player currently competing.

How about this:

w1 = weight of team (perhaps 0.7)
w2 = weight of individual (perhaps 0.3)
w1+w2 is required to be 1.

E = w1*E1+w2*E2
E1 = 1 / [1 + 10^ ([(Avg rating of your opponents)-(Avg rating of your team)] / 400)]
E2 = 1 / [1 + 10^ ([(Avg rating of both teams)-(your rating)] / 400)]
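
As a rough sketch, the weighted formula above could be written like this in Python. The function and parameter names are made up for illustration, and 0.7/0.3 are just the tentative weights from the post.

```python
def expected(own, other, scale=400.0):
    """Standard Elo expected score of `own` against `other`."""
    return 1.0 / (1.0 + 10.0 ** ((other - own) / scale))


def blended_expected(player, team_avg, opp_avg, game_avg, w_team=0.7, w_ind=0.3):
    """E = w1*E1 + w2*E2, with w1 + w2 required to be 1."""
    if abs(w_team + w_ind - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    e_team = expected(team_avg, opp_avg)  # E1: team vs. opposing team
    e_ind = expected(player, game_avg)    # E2: player vs. average of both teams
    return w_team * e_team + w_ind * e_ind


# A 2800 player in a balanced game averaging 1500 is expected to do better than 50%.
print(blended_expected(2800.0, 1500.0, 1500.0, 1500.0))
# A 1000 player in a balanced game averaging 1900 is expected to do worse.
print(blended_expected(1000.0, 1900.0, 1900.0, 1900.0))
```

On the 400: with that value, a 400-point gap corresponds to 10:1 odds. A smaller `scale` makes the same gap count for more.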

A second conceivable adjustment is increasing K to increase the impact of E, or making 400 smaller, again to increase the impact. Is there any particular justification for 400 rather than a smaller number?

To justify the formula: the idea is that if a 2800 player plays in a game averaging 1500, clearly that player should have an impact (more easily outplay defenders, etc.) and that should be reflected in the scoring. If a player cannot meet the expectation of having that impact, that should show. Conversely, if a player with a score of 1000 competes in a game of 1900 and wins, clearly the player was less of a drag than expected and should rise faster.

blln4lyf · #29 02-02-2011, 02:45 PM

Quote:

Originally Posted by IntoTheWalls

Team A:
A.Player 1: rated 2300
A.Player 2: rated 1500

Team B:
B.Player 1: rated 1900
B.Player 2: rated 1900

Team A wins:
A.Player 1: +4.5
A.Player 2: +45.5
B.Player 1: -25
B.Player 2: -25

Team B wins:
A.Player 1: -45.5
A.Player 2: -4.5
B.Player 1: +25
B.Player 2: +25

It is a team game though, not 1v1. Just because a player is rated 2300 compared to the average 1900 does not mean they should lose more points, mainly because it is not a 1v1 and they have 5 other players affecting their team play. Likewise, a player rated 1500 or lower can be placed on the highest player's team due to balance, and if the higher player does carry the team and the lower player wins without doing much, he will get a huge point increase. Factor this over a large number of games and what you will get is that player A (2300) and player B (1500) will both be a lot closer in rank than they should be. Player A will be pushed below his real value and player B above his; any time such players are winning/losing near 50% of their games, they will be pushed well below/above where they should be ranked.

You said that player 1 expects to win, but this isn't true. If player A-1 (2300 rating) had 5 others like him facing a team with 6 players rated much lower, then yes, they would expect to win. But it does not work that way for a team with 2300, 1500 vs. a team with 1900, 1900, since player 1 only expects to win about 50% of the time in this setup, if the players are near their actual value.

I also don't understand why you would compare the 2300-rated player to the average rating of the other team because, like I stated above, your team minus you (the 2300-rated player) is going to be rated below the average rating of the other team. No matter how I choose to look at the way you set it up, I see an unsuccessful system. :/
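
For reference, the point changes in the quoted example can be reproduced if one assumes K = 50 and that each player's own rating is compared against the opposing team's average. Both are inferences from the numbers, not something stated in the quote, and the function names are made up.

```python
K = 50.0  # assumed; it is the value that reproduces the quoted numbers


def expected(own, other, scale=400.0):
    """Standard Elo expected score of `own` against `other`."""
    return 1.0 / (1.0 + 10.0 ** ((other - own) / scale))


def per_player_deltas(team, opponents, team_won, k=K):
    """Each player's own rating is compared to the opposing team's average."""
    opp_avg = sum(opponents) / len(opponents)
    score = 1.0 if team_won else 0.0
    return [k * (score - expected(rating, opp_avg)) for rating in team]


team_a = [2300.0, 1500.0]
team_b = [1900.0, 1900.0]

# Team A wins
print([round(d, 1) for d in per_player_deltas(team_a, team_b, True)])
print([round(d, 1) for d in per_player_deltas(team_b, team_a, False)])
# Team B wins
print([round(d, 1) for d in per_player_deltas(team_a, team_b, False)])
print([round(d, 1) for d in per_player_deltas(team_b, team_a, True)])
```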

Quote:

Originally Posted by andy

The balancer should give you a 50% win chance. If you're rated higher than all the other players in the ladder (e.g. you're playing with all 1500-2000 players and you're a 2500), you shouldn't lose more points than the others, considering the balancer will make sure your winning chance is 50%. Otherwise you can just never climb the ladder if you get +12 and -38 changes at a 52% win rate.

I will look into some solutions later, just thinking this one wouldn't work.

Didn't see this, but yeah this sums up what I was saying.

mlopes · #30 02-02-2011, 03:02 PM

Just my 2 cents. Rewarding goals and bomb hits while ignoring kills is a bad idea. In TBD, a bomb run is the result of a team push, but only the runner would be rewarded; in ball, defenders and map people who help push would also get very low ranks no matter how much they're helping the team.

Urpee · #31 02-02-2011, 03:19 PM

Quote:

Originally Posted by andy

The balancer should give you a 50% win chance. If you're rated higher than all the other players in the ladder (e.g. you're playing with all 1500-2000 players and you're a 2500), you shouldn't lose more points than the others, considering the balancer will make sure your winning chance is 50%. Otherwise you can just never climb the ladder if you get +12 and -38 changes at a 52% win rate.

I will look into some solutions later, just thinking this one wouldn't work.

Let me actually address this.

Now, psychologically it is nice if one can just climb the score in an absolute sense. But frankly I don't think that is even the intent of the ladder right now. The intent is that the scores balance out at the level that a player has.

But I think you put your finger at an important point. Right now the score is in a very strict and specific way linked to win% and in fact overall long-term win%.

Given that for practical purposes the score change is 25, one can with good accuracy compute the ladder score from overall win% and #gamesplayed.
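
That back-of-the-envelope calculation might look like this. It assumes everyone starts at 1500 and that every game moves the score by exactly 25, which is only approximately true; the function name is made up.

```python
def estimated_rating(games, win_rate, start=1500.0, step=25.0):
    """Approximate ladder score from games played and overall win rate.

    Assumes a starting score of `start` and a fixed +/- `step` per game.
    """
    wins = win_rate * games
    losses = games - wins
    return start + step * (wins - losses)


# An overall 50% win rate lands on the starting score, however many games were played.
print(estimated_rating(450, 0.50))
# A 55% win rate over 200 games is 20 net wins.
print(estimated_rating(200, 0.55))
```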

I think that describes some of the flaws of the system.

This works for people who are veterans and are close to their skill ceiling. But it does not work for people who go through changes in win ratio thanks to learning.

Let me take my case as an example. I started off not that great and tanked down into the 40s%. Since then I have maintained a win% above 50%. But only just now have I reached an overall win% of 50.

Note that I have done this competing in games that quite regularly have an average team score above 1800, and with a win% of 50 I'm sitting at 1498. Clearly there is something broken here, in that a player who competes well against tough competition does not rise sufficiently. And players who put in a good start and then perform at 50% will only very, very slowly converge to the median of the ladder (as slowly as the win% itself converges).

It's quite clear to me that the score has to react more strongly than swings in win% to actually converge people to their true score in a sensible time.

And yes that will mean that being high on the ladder will be more difficult to maintain and to grow. But that's the pesky psychology of this. Surely people who are high on the ladder do not want the system to make it harder for them. That isn't really the point though. The point is that one gets a sensible convergence of scores to the true means, because the balancing itself is reliant on the scores being close to true means.

Currently, every time I am added into a team score, I'd claim that the team score does not reflect the true mean, because I sure as heck am not scored correctly. But the whole system is built on the premise that the team mean is a good way to estimate game outcomes. Well, given that player scores converge so slowly, clearly in a lot of games that is just not a correct assumption.

P.S. TL;DR: In other words, if the swings more closely reflect recent performance rather than overall win%, we get better convergence and more accurate scores to balance teams with.

andy · #32 02-02-2011, 03:31 PM

The system is built to give you a 50% win rate when you reach your rating. This is a team game: if you're ranked 3000 and the other players are ranked 1000 or 2000, you will get more 1000 players than 2000 players on your team, while the other team will have more 2000 players. By no means will you be able to win more easily if you play against lower-ranked players; the teams will always be balanced. You will start winning more if you improve.

Quoted from Esoteric:

Quote:

ELO is a system that works based on assigning each player/team "odds" of winning, and then rewarding based on that. For instance if you have two players, player a with 1900 and player b with 1500, player a should beat player b 10 times for each time player b beats player a -- A wins 10/11 times.

However, we're dealing with teams rather than a 1v1 so we need to estimate the odds that team a beats team b. You, however, compare a player's ELO against the enemy team's ELO. You are, essentially saying that each person on my team has different odds of beating the enemy team. Obviously, this is somewhat nonsensical. I can't have a 2/3 chance of winning while my teammate has a 1/2 chance--we have an equal chance of winning as we're on the same team.
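
To put numbers on the quote: the 10/11 figure follows from the standard Elo expected-score formula, and a team-level version assigns one probability to the whole team. The sketch below assumes teams are compared by their average rating and uses K = 50 (an assumption); the function names are illustrative.

```python
K = 50.0  # assumed K factor


def expected(own, other, scale=400.0):
    """Standard Elo expected score of `own` against `other`."""
    return 1.0 / (1.0 + 10.0 ** ((other - own) / scale))


def team_expected(team, opponents):
    """One win probability for the whole team, from the two team averages."""
    return expected(sum(team) / len(team), sum(opponents) / len(opponents))


def team_deltas(team, opponents, team_won, k=K):
    """Every player on the team gets the same change."""
    score = 1.0 if team_won else 0.0
    delta = k * (score - team_expected(team, opponents))
    return [delta for _ in team]


# 1v1: a 400-point gap gives the stronger player 10/11.
print(expected(1900.0, 1500.0))

# Teammates share the same odds, whatever their individual ratings.
print(team_expected([2300.0, 1500.0], [1900.0, 1900.0]))
print(team_deltas([2300.0, 1500.0], [1900.0, 1900.0], team_won=True))
```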

Urpee · #33 02-02-2011, 03:41 PM

Isn't it obvious that this is predicated on the assumption that current scores are a decent reflection of ranking and that the scores converge sensibly fast to their true score?

I have had a winning ratio for quite some time now. My score is still way low. I'm at the zero sum mean, but still below the median score and clearly below the average game mean score.

The problem is that convergence is too poor. Hence lots of games are balanced on a false assumption of scores rather than actual balance. Some people may be close to their score but many may be far away. It seems like luck, not balancing, if the outcome depends on whether you get someone who is scored well versus someone who is misrated and is waiting on a long chain of games to converge.

If it takes hundreds of games to reach one's true ranking, clearly it's broken if for every game the assumption is that the score is a sensibly good reflection of the team.

Clearly if one is underrated one should rise fast. If one is overrated one should drop fast. And if one is rated correctly one should oscillate with little change about one's rating.

This is not what we are seeing.

blln4lyf · #34 02-02-2011, 03:42 PM

Urpee: why are you not rated correctly? You are winning around 50% of your games at a 1500 rating, meaning you normally get at least one very high rated player on your team. If your team was made with you having say a 2000 rating, who says you would win anywhere near 50%?

Edit: no one's rating will be perfect, but I think it is safe to assume most who have 150 games played are real close. Example: You have over 450 games played, and you have won 53 of your last 100 games. So 100 games ago you were rated at 1425 and managed to climb to 1500 (rounded) after 100 games. Seems pretty accurate to me.

Urpee · #35 02-02-2011, 03:50 PM

Cut the first say 50 games from my track record and reevaluate. My score will be different. In fact if I started ladder fresh now and we take my last 50 games as what my performance would be my score would be different.

Which of these scenarios reflect my performance? Clearly I'm misscored at 1500 given how I played.

As said this is fine if there is no learning. But do you know how long I had to put up a winning ratio to get up from 40s to 50%? Convergence is way slow.

If you truly believe that this is working I cannot really say much.

Take someone who early on gets a bunch of wins and then maintains 50%, and another player who loses the same number and then maintains 50%: both will converge very slowly. In fact, look at even great players. Many have long losing or winning streaks. It's just luck whether you have them early or late, and that early luck will be hard to mitigate.

I don't think we disagree about the mechanism. I'm just saying that very obviously the system does not help convergence; it creeps along with the change in overall win%. People who play lots of games at some win percentage above 50% are very stable, because one game will not change this drastically. Same for people who have a sensible number of games at 40%. Danielle, even if she played a massive winning streak, will never dig herself out of that hole. There is not enough pull to converge. Percentage swings scale with number of games played. And given that now it's directly linked, at a certain point the system is locked in and it becomes very hard to change one's score (independent of where that score actually should be!).

blln4lyf · #36 02-02-2011, 04:00 PM

I edited above to include information. Quite frankly, I don't think you are underrated at the moment. You have to realize your teams get better when you are rated lower, and that helps you move up if you in fact improve. Winning just above 50% means you are damn close to your true value at the moment.

This is the info I added to my above post, I think before you saw it and posted a response: No one's rating will be perfect, but I think it is safe to assume most who have 150 games played are real close. Example: You have over 450 games played, and you have won 53 of your last 100 games. So 100 games ago you were rated at 1425 and managed to climb to 1500 (rounded) after 100 games. Seems pretty accurate to me.

Back when ladder started I played TA and didn't play as a team player much. I just TA'ed all over the place and would score a good amount but not provide any defense, any killing, or anything really besides ball movement/offense. At the time my ranking tanked down to the bottom 1/2 pages and I was rated I think 1150 or so, which was real low for the time. When I did improve (not so much my skill, but my play style) I shot up to 1700ish fairly quickly, because at 1150 I was very underrated once I got better, and if my true value was 1700 I was playing over 500 points below my true value, which led to around a 60% winning rate, likely higher tbh. I may have the highest ball rating currently, but I have been where you are/were, and if you were truly underrated by that much, you would have jumped a lot more in your last 100ish games.

Quote:

Originally Posted by Urpee

Cut the first say 50 games from my track record and reevaluate. My score will be different. In fact if I started ladder fresh now and we take my last 50 games as what my performance would be my score would be different.

Which of these scenarios reflect my performance? Clearly I'm misscored at 1500 given how I played.

If you cut the first 50 games then you would have been playing at a different rating while you made your climb. Therefore, such a climb may not have existed. The higher your rating, the worse rated players your team will have, and vice versa. Who says you would win 50% of your games if you removed a top 5 player from your teams and gave you a player ranked 50 instead, and the player ranked 50 on the other team was replaced with a top 5 player? This is an extreme example, but it shows that winning 50% at a 1500 rating is different than winning 50% at a 1000 rating, which is also different than winning 50% at a 2000, 2500, or 3000 rating.

Urpee · #37 02-02-2011, 04:20 PM

Quote:

Originally Posted by blln4lyf

Winning just above 50% means you are damn close to your true value at the moment.

Not at all. This is the myth that seems to be encoded in this discussion.

Let me give you a toy ladder. 12 people compete and we seed them randomly with these scores: 4 have 3000, 4 have 1500, 4 have 0. But they actually are all equally good.

We play and in fact it turns out that everybody wins 50%. The system will allow this and given people's actual skill they maintain their 50% ratio.

This is our current system. Is this working? You truly want to claim that people's scores are well reflected and converge properly?

It's a myth that someone playing at an overall 50% win ratio is properly ranked. The system encourages locking them into place wherever they are, no matter their actual correct ranking.

If the system worked, we would get a convergence of everybody to 1500 and this convergence would be sensibly fast. Currently there is no such mechanism. Because the player who is scored 0 competing in an average 1500 game gets no benefit over a player who is scored 3000 in a 1500 game. That is the convergence mechanism that would be needed to fix this example, but it's nowhere to be found.
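
The toy ladder can be checked with a few lines of Python. This is only a sketch: it assumes K = 50, teams balanced so both average 1500, a true 50% win chance for everyone, and (for the comparison) a hypothetical individual term weighted at 0.3, like the weighted formula proposed earlier in the thread.

```python
K = 50.0  # assumed K factor


def expected(own, other, scale=400.0):
    """Standard Elo expected score of `own` against `other`."""
    return 1.0 / (1.0 + 10.0 ** ((other - own) / scale))


def drift_team_only(true_win_prob, team_avg, opp_avg, k=K):
    """Average points per game when only team averages set the expectation."""
    return k * (true_win_prob - expected(team_avg, opp_avg))


def drift_blended(true_win_prob, rating, team_avg, opp_avg, game_avg,
                  w_ind=0.3, k=K):
    """Average points per game when an individual term is mixed in (hypothetical)."""
    e = ((1.0 - w_ind) * expected(team_avg, opp_avg)
         + w_ind * expected(rating, game_avg))
    return k * (true_win_prob - e)


for seed in (3000.0, 1500.0, 0.0):
    team_only = drift_team_only(0.5, 1500.0, 1500.0)
    blended = drift_blended(0.5, seed, 1500.0, 1500.0, 1500.0)
    print(seed, team_only, round(blended, 2))
```

With team averages alone the expected change is zero for every seed, so nobody is pulled anywhere. With the individual term mixed in, the players seeded 3000 lose roughly 7.5 points per game on average and the players seeded 0 gain roughly 7.5, at least while they are that far from 1500.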

And there is this myth that just because I have a 1500 now and a 50% win ratio that it's swell. In fact it may just be that me being misranked gets matched against another player who is misranked and we end up at 50% win and don't move. Only if the system converged properly would it be fair to assume that that other player I'm balanced against is actually about at the right spot. But it's blatantly obvious this isn't the case. You will find people above 1500 who aren't all that good. And you will find people below 1500 who are quite good. The reason for this is simple: The evaluation has gone awry and once you are at a wrong spot the system has insufficient correctives.

blln4lyf · #38 02-02-2011, 05:39 PM

K, I've been at both ends and I disagree. Surely not everyone's rating is on point, but it's a lot closer than you give it credit for. The reason for the change is so that when people come into ladder they don't get grossly overrated by being 1500 to start, causing an imbalance in teams (due to the zero-sum). There is absolutely no reason to penalize a player beyond giving them lower rated teammates, because a higher rated player is not supposed to win well over 50% of their games if their team is set up to be balanced with the other team. Once again, it is not 1v1. I'm giving up after this post, but just because someone is rated higher does not mean they should lose 42.5 points for a game that is supposed to be balanced team wise, while someone on the same team, with the same balance (team wise), should only lose 5. It would work that way if it was 1v1, but it is not 1v1. Maybe I am wrong here, but I just don't see it at all. Your idea that everyone is rated incorrectly and that is why you're still not moving up that fast is, to me, just you being unwilling to accept that you are only playing at a level around 1500 rating at the moment.

andy · #39 02-02-2011, 05:50 PM

I'm not gonna bother to discuss this with you when you have no understanding of how the ladder works. You are perfectly rated if you are winning 50% of your last games; you haven't improved, I'm sorry for you. Let me give you a brief explanation: if you were underrated, the ladder would balance the teams thinking that they both have a 50% chance of winning, but then you're in one and are underrated, so your team should probably have a 60% chance of winning (totally arbitrary number, it depends on how underrated you are).

We saw how fast you can climb the ladder when people used smurfs (check blln's smurf, it's still out there).

In the end your win % should be around 50% +/- 1% if you play enough games. What gives you the rating is the difference between wins and losses.

Pieface · #40 02-02-2011, 06:14 PM

To play the devil's advocate here, I believe what Urpee was trying to get across is that the situation of being underrated only benefits you if you assume the other team is correctly rated to start with. If you have someone equally underrated (or overrated) on the other team, you will still win about 50% of the time even though compared to the total ladder population's skill you should be rated higher. In essence, the prevalence of misrated players in ladder prevents you from following your predicted ranking trajectory: winning if you're underrated and losing if you're overrated.

I've also experienced these huge win/loss streaks that dramatically change your rating, but I wouldn't necessarily attribute them to ladder working the way it was designed to. If they do follow from that, it's clear that you need a certain set of conditions to achieve a large change in ranking. These situations only come every so often, which is why it takes so long to achieve your true rating. In turn, the fact that you haven't yet achieved your predicted rank prevents others playing with you from following their projected behavior as well. With the current system it's a cycle that's only broken when you get games where everyone except yourself is perfectly rated.