Math/Statistics major needed


Stormich
02-01-2011, 09:39 AM
Well, basically, the name says it all. We need a Math/Statistics major willing to help us optimize the ELO calculations ladder uses to determine ratings. This is the last (and probably trickiest) part to be done before Ladder season 2 starts. Eso was doing this for us but he unfortunately vanished, so now we need someone else to help us finish things up so we can finally start the new season.
Anyone with knowledge of team-based rating systems is free to apply. Unfortunately this is pro bono work and the only pay is the recognition for doing it.
To get in contact you can message me or apply in this thread.

Thank you for your time :)

I hope we get someone so we can start pewpewing season 2 already TT

Ribilla
02-01-2011, 05:25 PM
I do Engineering at Cambridge, but most of the maths on our course centres around complex numbers and vector calculus, so I'm not familiar with "team-based rating systems". If you need some number crunching once the basic idea is in place, though, I'd be happy to help.

Wok3N^
02-01-2011, 05:40 PM
I just submitted a request to the League of Legends devs to see what ELO system they use. They also have a "Ladder", which is 5v5, and their ELO system seems to work.

mikesol
02-01-2011, 06:10 PM
I just submitted a request to the League of Legends devs to see what ELO system they use. They also have a "Ladder", which is 5v5, and their ELO system seems to work.

I was reading up on the LoL ELO system and apparently a lot of people think it's flawed and needs to be redone. World of Warcraft started off with chess-style ELO, but apparently that doesn't extend well to multi-person teams. Microsoft has developed a "TrueSkill" rating system that seems to be pretty good. You can read about it here (http://research.microsoft.com/en-us/projects/trueskill/details.aspx). I think I understand it fairly well.

For those who read about it - any thoughts?

elxir
02-01-2011, 06:23 PM
i like the current system but the biggest issue is when a god awful player is on one team, and the other team has 5-6 respectable people

perhaps reduce how many points you lose (but not gain) if a player on your team is under 1500?

alternatively pursue my brilliant plan of an 'open' ladder server 1, and an 'elite' 1800+ ladder server 2

win baby win

Rainmaker
02-01-2011, 06:34 PM
Well, basically, the name says it all. We need a Math/Statistics major willing to help us optimize the ELO calculations ladder uses to determine ratings. This is the last (and probably trickiest) part to be done before Ladder season 2 starts. Eso was doing this for us but he unfortunately vanished, so now we need someone else to help us finish things up so we can finally start the new season.
Anyone with knowledge of team-based rating systems is free to apply. Unfortunately this is pro bono work and the only pay is the recognition for doing it.
To get in contact you can message me or apply in this thread.

Thank you for your time :)

I hope we get someone so we can start pewpewing season 2 already TT

I'm no major, but I'm studying industrial engineering @ University of Buenos Aires, and I've done the subjects "Statistics & Probability" and "Technical Statistics".
From a personal point of view, I don't think ELO works for a multiplayer team-based ranking system, because ELO was/is designed for 1v1 matchups.
In short: the rating represents only the performance of your team, not you; your rating will change as you pay for the achievements or failures of your teams.
ELO could be adapted the way Microsoft's TrueSkill does: it adds a second variable which takes into account the individual contribution of the player to the team.
But I don't think this would really work. Why?
Well, first we have to understand that the ELO system is about measuring the RELATIVE skill between 2 players, hence the sum of all players' ratings divided by the number of players = constant. Other ranking systems built on the idea of "scaling up" (like an MMORPG) are ABSOLUTE skill systems (where there is usually a top barrier, like "lvl 60" or 20M exp).

Explaining the math behind ELO:
ELO is pretty simple: it takes your current score (you always start in the middle of the table, 1600, or 1500 for Altitude I think) and adds points if you win or removes them if you lose.
The points added/removed aren't always the same; they are based on the predicted result. ELO considers that you are less likely to lose a game if your rating is higher than your opponent's, and less likely to win if your rating is lower than your opponent's.
e.g.: A: rating 2500; B: rating 2000.

ELO makes its prediction assuming a normal distribution of ratings, whose variables are mu and sigma:
mu = your average rating prediction
sigma = the variation.
So, for example, ELO assumes that B is unlikely to win a matchup this lopsided, because of the difference between A's and B's ratings. That way ELO rewards B highly if he wins (adding more rating points) and gives A a high penalty (removing the same amount of rating points) for losing to a lower-rated player.

As you can see, this rating system is pretty straightforward, and it depends on the fact that the only variables are the 2 players (A and B), so the prediction concerns those two players' skill alone.
If we try to extrapolate this system to team-based games, it won't work as well. You will start seeing incongruities in some players' rankings.
And most team-based games have this problem: you can't measure a single player's skill in a team-based game.
Why? Because the game result measures the skill of the teams, not the player. In the long run it doesn't matter in a TBD game whether you killed 50 or 30 planes if your team didn't score the 6 goals or didn't kill the enemy base. It is an assumption that killing more planes will make it more likely for you to score, which may not always be true.
This is why adding variables to ELO for team-based games isn't going to work.
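
If you want to play with the numbers, here is a rough Python sketch of that 1v1 update (the function names and K = 50 are just my choices for illustration):

def expected_score(rating, opponent_rating):
    # Probability that this rating beats the opponent (draws ignored).
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))

def elo_update(rating, opponent_rating, won, k=50):
    # won = 1 for a win, 0 for a loss.
    return rating + k * (won - expected_score(rating, opponent_rating))

# A (2500) beats B (2000): the favourite gains only ~2.7 points.
print(round(elo_update(2500, 2000, 1), 1))  # 2502.7
print(round(elo_update(2000, 2500, 0), 1))  # 1997.3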


But, for a non-professional purpose, and only for the heck of it, I would like to give it a try :/

elxir
02-01-2011, 06:50 PM
well, it wouldn't be a bad idea to account for base dmg / kills / goals / assists in the ranking, but ****ty team players who just try to rack up kills (especially in ball) without helping you win would benefit without actually being good.

elxir
02-01-2011, 06:51 PM
ETA: what if you ranked each player on each team head to head with the person most similar in rank on the other team, roughly like what fake intothewow suggested?

Urpee
02-01-2011, 07:08 PM
ETA: what if you ranked each player on each team head to head with the person most similar in rank on the other team, roughly like what fake intothewow suggested?

I think this is a good point and I thought the same when I saw some team compositions.

In fact I would recommend the following picking strategy:

1) Pick a team randomly to start, label it A, the other B.
2) Pick top ranked and bottom ranked player for team A of remaining pool
3) Pick top ranked and bottom ranked player for team B of remaining pool
4) Repeat 2) 3) until all slots are filled.

This matches best to best and worst to worst and converges to the center in a balanced way. Each pick is guaranteed to rank at least as high as the next. The starting team can have an advantage, but it is randomized, so on average everybody has the same advantage.
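
A rough Python sketch of that picking strategy (assuming the pool comes in as (player, rating) pairs; the names are mine):

import random

def pick_teams(pool):
    # Sort ascending by rating; alternately give each team the top and
    # bottom remaining players, starting with a randomly chosen team.
    remaining = sorted(pool, key=lambda p: p[1])
    teams = ([], [])
    turn = random.randrange(2)
    while remaining:
        teams[turn].append(remaining.pop())       # top-ranked remaining
        if remaining:
            teams[turn].append(remaining.pop(0))  # bottom-ranked remaining
        turn = 1 - turn
    return teams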

But the second problem with ladder is that the ladder scores simply are not reflective of player ability, so clearly the way the scores are arrived at, maintained, or factored in needs some tweaking.

Zero-sum is nice because one can make claims about the stability of the ranking. But then again, almost all ladders I know are reset after some time, so very long-term stability really isn't an issue. The question is more whether the scoring system has the right dynamics (i.e. good players with a low ranking will bubble up, bad players with a high ranking will plummet, and sensible rankings have good stability properties).

TrueSkill looks interesting. Testing against expectation seems a very clean setup and makes a lot of sense.

Rainmaker
02-01-2011, 07:09 PM
ETA: what if you ranked each player on each team head to head with the person most similar in rank on the other team, roughly like what fake intothewow suggested?
I didn't mean that, though I thought of that possibility too; it wouldn't be fair, however.
There is another thing I hadn't thought of, but heard someone mention:
How does the current scramble system work?
I heard that it tries to make the sum of all the players' ratings on one side match that of the other side.
So, for example, there is an average rating of 1700 on team red and an average rating of 1700 on team blue as well?

My personal opinion is that we should keep it as it is currently, but make the amount of points each player gets variable.
For example, looking at the current numbers, every player gets +/- 24 points per match played.
Instead (for TBD), a winning team would have 24*5 = 120 points to distribute.
We would redistribute these in a different way (it's like having a ranking filter inside ELO), as sketched below.
For example, in TBD, 30 out of the 120 are allocated to bombers: that way ~33 dmg done to the base equals 10 ELO points, 5 dmg equals 1~2 points, 10 dmg 3 points. This could be simplified: instead of a linear scale, we could do it in blocks.
0~5 dmg = 1 point.
6~15 dmg = 3 points
30~34 dmg = 9 points
(always checking that all dmg done to the base sums to 100 dmg and 30 ELO points)
60 points would be distributed equally between team members, 12 to each.
The other 30 would be redistributed according to the kill-death-assist performance of each team player.

This goes either way for each team: winning team + points, losers - points.
So, for example, someone who loses but does 30 dmg to the enemy base and has a 1.5 kill ratio should lose fewer points, and someone with 0 base dmg and a 0.7 ratio would lose more points.
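
A rough Python sketch of that 30/60/30 split (the proportional kill-death-assist share here is just one possible choice, since I left that part TBD):

def distribute(players):
    # players: list of dicts with 'base_dmg' (summing to ~100) and 'kda'.
    total_dmg = sum(p['base_dmg'] for p in players) or 1
    total_kda = sum(p['kda'] for p in players) or 1
    return [30 * p['base_dmg'] / total_dmg   # bomber share
            + 60 / len(players)              # equal share
            + 30 * p['kda'] / total_kda      # performance share
            for p in players]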

Urpee
02-01-2011, 07:27 PM
I don't think kill ratio should factor in. Winning should matter, and contributions that directly lead to it. In TBD that's damage to the base; in ball it's goals. I agree it'd be nice if a 6-5 game in ball had a different impact than a 6-0 game, and perhaps game time could be factored in as well.

One can lose an incredibly balanced game and have the same impact as a sweep right now.

But rewarding kill ratio just encourages bad behavior. If you win with a 0.7 kill ratio it really shouldn't matter, you won. And if you cannot win with a 2.0 kill ratio, well clearly you had the wrong priorities.

Rainmaker
02-01-2011, 07:58 PM
Agreed on that, Urpee. I was just brainstorming; on second thought, kill ratio would lead to unintended bad behaviours.

Stormich
02-01-2011, 08:46 PM
Giving bonuses to bomb runners/scorers would just push them higher. If you look at ladder you'll see the top players are all bomb runners/scorers.

We tried head-to-head calculation as the first option (before ladder was even public) and Eso said it was a pretty flawed system, so he designed a new one for us (which still has a couple of flaws that need fixing).

Regarding TrueSkill, it seems the best way to actually calculate this stuff, but we'd need to start everything from scratch and it would be a big time commitment for anyone willing to do this.

Anyway, for the guys willing to help: you'll have to wait till nobo chimes in about what exactly he needs help with (I did this without his direct approval lol).

blln4lyf
02-01-2011, 08:58 PM
Putting in points for kills, goals, base hits, etc. is probably not a good idea. Some will focus more on getting those things than on what is best for their team to win, which will result in worse overall play.

A Nipple
02-01-2011, 10:29 PM
i like the current system but the biggest issue is when a god awful player is on one team, and the other team has 5-6 respectable people

perhaps reduce how many points you lose (but not gain) if a player on your team is under 1500?

alternatively pursue my brilliant plan of an 'open' ladder server 1, and an 'elite' 1800+ ladder server 2

win baby win

This idea reminds me of SC2; I like it. Plus it means that to stay in that half of the ladder you have to keep fighting to stay up.

Edit: the filter system intothewow mentioned seemed interesting, whereby the plane played gets different points (only that part). In general there is only a small handful of players who always play heavy in ladder and maintain a high rating without resorting to bomb running with a light plane. It could be argued that people can switch planes in game; therefore the calculation would need to consider match time and the plane played for the majority of that match, unless there is another way?!

Nip nip

P.s. I get mah compoota back Thursday = yay! =D

Stormich
02-01-2011, 10:36 PM
The 1800+ server is a nice idea; unfortunately we don't have a big enough playerbase for this to work.

Urpee
02-01-2011, 10:51 PM
I agree that rewarding individual performance in a team ladder is a really bad idea; at least in my mind that wasn't what I was promoting. My impression was that factoring in goals and bombings was not per player, but per team.

I.e. if a team wins 6-0 it is different than when it wins 6-5.

But I think this is a lesser problem than the scores not migrating in accordance with performance, and the bulk of inactive players making the summation over scores questionable.

dr. carbon
02-02-2011, 01:50 AM
How about we set the base as a zero-sum game and then change it a little bit, making the economy a pseudo-zero-sum game: award the winners extra for a 6-0 / full-base-health win, and deduct less from the losers of a 6-5 game / one that left the winner's base at 1% health.

Simple solution to a complex problem. If you want more info on non-zero-sum properties, get a game theory major to help you... maybe Ingbo or donk may know something about it, as a lot of pro poker players have game theory or psych majors? idk ):

dr. carbon
02-02-2011, 01:52 AM
Rating caps won't work, b/c if, let's say, 1500 became the new minimum for a server, the player base for that server would consistently fall, b/c only higher-echelon players would be competing. If everyone becomes rich, then everyone becomes poor, b/c rich would change into the new poor. Same concept.

nobodyhome
02-02-2011, 04:40 AM
Lotsa things going on in this thread. First of all, here is the current rating system, described fully at the bottom of this post: http://altitudegame.com/forums/showthread.php?t=2469 . That formula is all there is to it. It is simply ELO with "your rating" substituted with "the average of your team's rating". The only caveat (an important one, though) is that with the current balancing system, your team's average rating will nearly 100% of the time be balanced to exactly match the other team's average rating, thus resulting in the +/- 24 or 25 pattern you see (26 is not as common because currently the system rounds down in case of decimals). The balancing system goes hand in hand with the rating system--consideration of it is important when deciding what to do with the rating system.

Here are a few of the current problems of the ladder rating system:

1. New players are placed in the exact center of the ratings distribution (at 1500), thus making them overrated nearly 100% of the time. This is an inevitable consequence of the zero-sum system: solutions like "oh why don't we just start people at 1000 instead of 1500" will only serve to shift the distribution 500 points downwards, leaving new players still overrated in relation to everybody else. The solution of course is that zero-sum should no longer be a property of the rating system--this is not a problem when you have a ladder that resets regularly. If a level of inflation were introduced into ladder (greater than we have now) then this problem could be solved, because the distribution of points would shift upwards over time.

2. When a player's skill drastically changes in some way (maybe trying out a new plane, or maybe he hasn't played in a while and is rusty, or fixed his internet connection so he no longer lags, or just plain had an epiphany and got better), this change is not reflected in the ladder as quickly as it could be. This is because the max point gain or loss for each game is constant (at 25 when balanced). This can be fixed by replacing the "50" in the ratings formula with a value "K" that represents the "uncertainty" of that player's rating. Thus, a new player would have a high K, vets would have a low K, and a player whose rating has been dropping a lot or rising a lot recently could also have a high K.
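
A rough sketch in Python of what I mean (the exact decay schedule here is invented purely for illustration; only the structure matters):

def k_factor(games_played, recent_streak, k_max=50, k_min=20):
    # Uncertainty decays with experience...
    k = max(k_min, k_max - games_played)
    # ...but a big recent streak raises it again.
    if abs(recent_streak) >= 5:
        k = min(k_max, k + 10)
    return k

def rating_change(team_avg, opp_avg, won, games_played, streak):
    # Same team-average ELO as today, with K per player instead of a flat 50.
    expected = 1.0 / (1.0 + 10 ** ((opp_avg - team_avg) / 400.0))
    return k_factor(games_played, streak) * (won - expected)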


About segregated (1800+ and 1800-) servers: This won't work, not only because of our low population, but because in order for a ratings system to work correctly, you need to be able to play against a variety of opponents.

About in-game bonuses to points: This also won't work. A rating system must have no preconception of what the game actually is: it should only care about whether you win or lose. Arbitrarily defining some behaviors as "good" and some as "bad" will only cause players to become good at doing those things, not necessarily good at winning in Altitude. Even doing some sort of scale with base health or goals is bad, because sometimes letting the opponent hit your base is good for winning (for example, it may be a good strategy to defend 4v5 and let one of your loopies try to sneak past them for a counterhit, whereas if base health were counted into your rating you might be disincentivized to do that). Goals are a little bit different because the game state almost completely resets after each goal, but even then not entirely: the ball is given to the other team when you score, so it might even be good to let the other team score because that means you get the ball back.

Urpee
02-02-2011, 05:36 AM
A note on the formula. For practical purposes I really only ever see a 25 or 24 point swing. I'd say that's an insignificant difference, hardly warranting the complexity of the formula used to determine it.

It certainly is in part due to the success of the balancing algorithm that just virtually always tends to get this close.

But I think the formula could be made more interesting simply by computing E like this:

E = 1 / [1 + 10^ ([(Avg rating of your opponents)-(your rating)] / 400)]

I.e. rather than weighing one team average against the other, each player is compared against the average level of the opposing team. I.e. if a high-ranked player plays against high-ranked competition, they have less to lose than against low-ranked competition.

This will mean that in a game not everybody gets the same swing. Underranked players who compete in very high level games will rise faster and high ranked player who fail to compete in low ranked games will drop faster. A player who competes in a game with average rank of their standing will see the kinds of swings we already know.

A variation of this is a formula that takes the average of both teams:

E = 1 / [1 + 10^ ([(Avg rating of both teams)-(your rating)] / 400)]

(i.e. this proposal suggests making E player dependent, not the K).
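
In Python, the two per-player variants side by side (function names are mine):

def e_vs_opponents(player_rating, opp_avg):
    # Variant 1: the player against the opposing team's average.
    return 1.0 / (1.0 + 10 ** ((opp_avg - player_rating) / 400.0))

def e_vs_game(player_rating, game_avg):
    # Variant 2: the player against the average of both teams.
    return 1.0 / (1.0 + 10 ** ((game_avg - player_rating) / 400.0))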

Rainmaker
02-02-2011, 07:22 AM
The only caveat (an important one, though) is that with the current balancing system, your team's average rating will nearly 100% of the time be balanced to exactly match the other team's average rating, thus resulting in the +/- 24 or 25 pattern you see (26 is not as common because currently the system rounds down in case of decimals). The balancing system goes hand in hand with the rating system--consideration of it is important when deciding what to do with the rating system.

A note on the formula. For practical purposes I really only ever see a 25 or 24 point swing. I'd say that's an insignificant difference, hardly warranting the complexity of the formula used to determine it.

It certainly is in part due to the success of the balancing algorithm that just virtually always tends to get this close.

But I think the formula could be made more interesting simply by computing E like this:

E = 1 / [1 + 10^ ([(Avg rating of your opponents)-(your rating)] / 400)]

I.e. rather than weighing one team average against the other, each player is compared against the average level of the opposing team. I.e. if a high-ranked player plays against high-ranked competition, they have less to lose than against low-ranked competition.

Yes, this is what I thought was screwing with the rating system.
ELO relies on measuring the relative difference between you and your opponent.
If both teams always have the same average rating, then the difference is minimal, making the system less sensitive. The model "perceives" everyone as if they had the same hours played and the same skill, so the points gained are the same.

There are 2 possible solutions:
1. Make teams random (then, imbalance)
2. Use the relative difference as Urpee said.

IMO 2. is the answer.
For those who don't know, the K factor affects the sensitivity of the system to changes.
A greater difference in ratings means greater risk, which means greater gains, which reflects in more points won/lost.
Usually most modern ELO-based systems use a variable K factor (even TrueSkill). A high K factor means high fluctuation in points gained/lost; a low K factor means fewer points won/lost.
Usually in most multiplayer games this K factor relies on one or both of these parameters:
*Total games played
*Total hours played
Two variables Altitude keeps track of.

For example, TrueSkill (Microsoft's rating system for Xbox) uses a factor C multiplied by hours played; that is because you want to minimize the points won during faulty-connection games (high pings), games with a low number of players, quits, forfeits, etc., thus ensuring that only properly played games feed your rating calculation.


I think this is something the Ladder organizers should consider: a low K factor ensures the stability of the rating in the long term; a high K factor is better for adjusting the first games, but requires resets to avoid overrating/underrating people (for example players securing their wins, picking matches, etc.).



My solution:
1. Correct the formula as Urpee suggested: "opposing team's average rating" - "my rating"
2. Make the K factor a variable of hours played and/or games played.
This way more experienced players gain/lose rank slightly more slowly, so we stop having them at the top of the leaderboard with a lead of 300 to 500 points (as seen now).*

I would like to emphasize point 1, as it is my main suspect for what is screwing with the rating.
The K factor could remain constant (a value of 30~60 is ok) as long as resets are made every 6 or 12 months. That also tends to eliminate the inflation/deflation effect.

*: this is another reason why the system seemed screwed: higher-rated players were treated the same as lower-rated players, ignoring one of the premises of the ELO system, win percentage based on skill rating. With both teams equally rated, the "point transaction" was always kept at the minimum (24 points in the designed system), making it even harder to climb the leaderboard, because you would need 8 games won in a row to make a difference of 200 points (the breakthrough difference the system is based on).

ie:
Teams A & B average 1700. Team A wins. Team B loses.

Player1A rated: 1675
points gained with corrected: 27 points
Player1B rated: 1675
points lost with corrected: -23 points

Player2A rated: 1900
points gained with corrected: 12 points
Player2B rated: 1900
points lost with corrected: -38 points

Player3A rated: 1700
points gained with corrected: 25 points
Player3B rated: 1700
points lost with corrected: -25 points

Player4A rated: 1725
points gained with corrected: 23 points
Player4B rated: 1725
points lost with corrected: -27 points

NewPlayerA rated: 1500
points gained with corrected: 38 points
NewPlayerB rated: 1500
points lost with corrected: -12 points

(I did the math myself using the formulas nobo provided in the topic, applying what Urpee suggested)
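
In Python, for anyone who wants to check my numbers (K = 50, opposing team average 1700, the corrected E):

K, OPP_AVG = 50, 1700

def swing(rating, won):
    # won = 1 for a win, 0 for a loss; per-player E against the opposing average.
    e = 1.0 / (1.0 + 10 ** ((OPP_AVG - rating) / 400.0))
    return round(K * (won - e))

for r in (1675, 1900, 1700, 1725, 1500):
    print(r, swing(r, 1), swing(r, 0))
# 1675 +27/-23, 1900 +12/-38, 1700 +25/-25, 1725 +23/-27, 1500 +38/-12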

Here you can see how what I explained about ELO works: the system cares about the relative difference in rating:
a player 200 above the team average loses 38 points
a player 200 below the team average gains 38 points

Now, a player at the average (1700) gains/loses 25 points. This is what was happening with the current system: it "assumed" everyone was equally rated, because the teams were arranged that way (which is just collateral damage of trying to make teams balanced).

Now to adjust it, we could change the factor K (here it is 50), so points gained/lost are higher or lower. This depends on the ranking.
Initially I would keep it at 50. But I would make it vary with games played/hours played: making it smaller, a player loses or gains fewer points for each match won/lost.

elxir
02-02-2011, 07:29 AM
the problem with factoring in hours played is that a shocking number of people are only decent at one mode, and god awful at all other modes

nobodyhome
02-02-2011, 07:33 AM
@Urpee: your first formula is the system originally used when ladder first started a year ago. It was used for a week until Eso pointed out that the system was wrong. Here is the original post where Eso laid out his arguments: http://altitudegame.com/forums/showthread.php?t=2469&page=2#post34206 . Here is a much more recent thread in which your formula was proposed and discussed: http://altitudegame.com/forums/showthread.php?t=5730

Your second formula is more interesting, but it completely misinterprets what the "E" in the formula is: it is a PROBABILITY assigned to each team of whether they will win or not. Much stranger are the effects of that formula. Since the balancer forces complete balance nearly 100% of the time, assume that there is a 50/50 chance of winning. Assume also that the average ladder game has an average rating across both teams of ~1900 or so (as is the case currently). Then a player whose rating is higher than 1900 will lose >25 points every time he loses, and win <25 points every time he wins. Since he wins ~50% of the time (as forced by the balancer), his rating will converge towards 1900. Those with ratings lower than 1900 will experience the opposite effect. Thus you end up with a ladder in which everybody has a ~1900 rating.
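
A tiny Python simulation of that drift (the parameters are illustrative):

import random

def e2(rating, game_avg):
    return 1.0 / (1.0 + 10 ** ((game_avg - rating) / 400.0))

rating, K, GAME_AVG = 2400.0, 50, 1900.0
for game in range(500):
    won = random.random() < 0.5  # the balancer forces ~50% wins
    rating += K * (won - e2(rating, GAME_AVG))
print(round(rating))  # ends up near 1900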

Rainmaker
02-02-2011, 07:51 AM
Your second formula is more interesting, but it completely misinterprets what the "E" in the formula is: it is a PROBABILITY assigned to each team of whether they will win or not. Much stranger are the effects of that formula. Since the balancer forces complete balance nearly 100% of the time, assume that there is a 50/50 chance of winning. Assume also that the average ladder game has an average rating across both teams of ~1900 or so (as is the case currently). Then a player whose rating is higher than 1900 will lose >25 points every time he loses, and win <25 points every time he wins. Since he wins ~50% of the time (as forced by the balancer), his rating will converge towards 1900. Those with ratings lower than 1900 will experience the opposite effect. Thus you end up with a ladder in which everybody has a ~1900 rating.

Keep in mind you added a variation after Esoteric's initial introduction of ELO: you made an auto-balancer, which screws with ELO.
Now Esoteric's argument that, for example, the game isn't "zero sum" is false, as auto-balance keeps the average of all ratings = 1500 (each point gained by one player is lost by another). This is because teams are no longer imbalanced.
Example:
Example:

Player A (1500) and Player B (2300) play against two 1500s.
The 1500s win!

Player A loses 50*(1/2), Player B loses 50*(100/101)
The 1500s gain 50*(10/11) each.

Total loss = -74.5049505
Total gain = +90.9090909
Doing the math with the modifications, we have to change the numbers a little:
(2300+1500)/2 = 1900

Team A:
A.Player 1: rated 2300
A.Player 2: rated 1500

Team B:
B.Player 1: rated 1900
B.Player 2: rated 1900

Team A wins:
A.Player 1: +4.5
A.Player 2: +45.5
B.Player 1: -25
B.Player 2: -25


Team B wins:
A.Player 1: -45.5
A.Player 2: -4.5
B.Player 1: +25
B.Player 2: +25

(see that the sum of all changes is 0; you have to be careful with decimals though!)

Let me break this down for those who aren't following the math:
The players from Team B are close to the average rating of the match, so they get the middle point swing, which is 25 (24 if numbers are rounded down).
The high-ranked A.Player1 gains little because he is "risking" little: he is playing against players rated well below him (exactly 400 points below him each). Keep in mind the relations are not linear here: the MORE you risk (the greater the difference between ratings), the more you will lose/gain.
So a highly rated player, playing against lowly rated players (compared to him), will gain few points and lose a lot. Esoteric explained this before: the system is designed this way because IT IS EXPECTED that A.Player1 wins this match.
But as I stated waaaay before, ELO wasn't designed for team-based games; we are trying to adapt it. ELO isn't considering that the 2300-rated player is playing alongside a 1500-rated player, against 1900-rated players.
We forced the teams to be evenly rated on average.
So what this ranking actually does is compare YOUR skill against your opponents' AVERAGE SKILL.
IMO this is the most accurate we can get without complicating things too much.


Furthermore, your assumption in the 1900 example is wrong; you have to consider that the system HAS TO BE RESET in order to take the new effects into account. (I know you re-run all previous records with every modification done to ELO.) That's why I put emphasis on the K factor: if your ranking system is long-term, K should vary with games played; if not, the solution is to reset the ranking often, so there are no inflation/deflation effects. (This is well explained in the ELO system article on Wikipedia.)
When you ran your old records through the new system you didn't take into account the "auto balance" feature, which had ensured that the zero sum was maintained; that's when the old ratings started gaining or losing ELO points.





-------
the problem with factoring in hours played is that a shocking number of people are only decent at one mode, and god awful at all other modes
Make 2 different rating boards, plus base it only on games played.

andy
02-02-2011, 11:54 AM
The balancer should give you a 50% win chance. If you're rated higher than all the players in the ladder (e.g. you're playing with all 1500-2000 players and you're a 2500), you shouldn't lose more points than the others, considering the balancer will make sure your winning chance is 50%; otherwise you can just never climb the ladder if you get +12 and -38 changes at a 52% winrate.

I will look into some solutions later; I just think this one wouldn't work.

CCN
02-02-2011, 11:59 AM
Something I like is how I can judge how I'm playing based on my rating, e.g. a 2300 from 6 months ago equals a 2300 (+200 points of inflation) now, and I can use that to judge where I am at.

If there's too much inflation it will be hard to figure out long-term true skill levels (with the small risk that those who play the most in a short space of time benefit too much from inflation).

Urpee
02-02-2011, 02:38 PM
@Urpee: your first formula is the system originally used when ladder first started a year ago. It was used for a week until Eso pointed out that the system was wrong. Here is the original post where Eso laid out his arguments: http://altitudegame.com/forums/showthread.php?t=2469&page=2#post34206 . Here is a much more recent thread in which your formula was proposed and discussed: http://altitudegame.com/forums/showthread.php?t=5730

Your second formula is more interesting, but it completely misinterprets what the "E" in the formula is: it is a PROBABILITY assigned to each team of whether they will win or not. Much stranger are the effects of that formula. Since the balancer forces complete balance nearly 100% of the time, assume that there is a 50/50 chance of winning. Assume also that the average ladder game has an average rating across both teams of ~1900 or so (as is the case currently). Then a player whose rating is higher than 1900 will lose >25 points every time he loses, and win <25 points every time he wins. Since he wins ~50% of the time (as forced by the balancer), his rating will converge towards 1900. Those with ratings lower than 1900 will experience the opposite effect. Thus you end up with a ladder in which everybody has a ~1900 rating.

I see. I think that's clearly where the issue is encoded. Now take the system as-is and apply it to something like arena in WoW. Since the teams there are fixed, using the team scores for balancing and weighting them by win% works sensibly well, even if it's not perfect.

However, on Ladder we have a random mix of people. One may have lots of great players who just don't have compatible play styles, etc. It's not at all clear that the average score of a team compared to the average score of another team is a good measure of the teams' likely performance.

That said, looking at the actual win/loss percentages for people with more than 200 games, it's not bad. People who play for a very long time do not manage a win% exceedingly far from 50%, which is a good sign.

Still, I don't think it's actually correct to assume that the average predicts the winning probability with the same stability as set teams.

Note that the second formula also has a probability interpretation: it gives the likelihood that you as an individual outperform (or underperform) the average player currently competing.

How about this:

w1 = weight of team (perhaps 0.7)
w2 = weight of individual (perhaps 0.3)
w1+w2 is required to be 1.


E = w1*E1+w2*E2
E1 = 1 / [1 + 10^ ([(Avg rating of your opponents)-(Avg rating of your team)] / 400)]
E2 = 1 / [1 + 10^ ([(Avg rating of both teams)-(your rating)] / 400)]

A second adjustment thinkable is increasing K to increase the impact of E, or making the 400 smaller, again to increase the impact. Is there any particular justification for 400 rather than a smaller number?

To justify the formula: the idea is that if a 2800 player plays in a game of average 1500, clearly that player should have an impact (more easily outplay defenders, etc.) and that should be reflected in the scoring. If a player cannot meet the expectation of having that impact, that should show. Conversely, if a player with a score of 1000 competes in a game of 1900 and wins, clearly the player was less of a drag than expected and should rise faster.
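
A rough Python sketch of that blend (using the example weights above):

def blended_e(player, team_avg, opp_avg, w1=0.7, w2=0.3):
    # E1: team against team; E2: the player against the whole game's average.
    e1 = 1.0 / (1.0 + 10 ** ((opp_avg - team_avg) / 400.0))
    game_avg = (team_avg + opp_avg) / 2.0
    e2 = 1.0 / (1.0 + 10 ** ((game_avg - player) / 400.0))
    return w1 * e1 + w2 * e2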

blln4lyf
02-02-2011, 02:45 PM
Team A:
A.Player 1: rated 2300
A.Player 2: rated 1500

Team B:
B.Player 1: rated 1900
B.Player 2: rated 1900

Team A wins:
A.Player 1: +4.5
A.Player 2: +45.5
B.Player 1: -25
B.Player 2: -25


Team B wins:
A.Player 1: -45.5
A.Player 2: -4.5
B.Player 1: +25
B.Player 2: +25

It is a team game though, not 1v1. Just because a player is rated 2300 against an average of 1900 does not mean they should lose more points, mainly because it is not a 1v1 and they have 5 other players affecting their team's play. Likewise, a player rated 1500 or lower can be placed on the highest player's team due to balance, and if the higher player does carry the team and the lower player wins without doing much, he will get a huge point increase. Factor this over a large number of games and what you get is that player A (2300) and player B (1500) will both end up a lot closer in rank than they should be. Player A will be pushed below his real value and player B above his; any time such players are winning/losing near 50% of their games, they will be pushed well below/above where they should be ranked.

You said that player 1 expects to win, but this isn't true. If player A-1 (2300 rating) had 5 others like him facing a team with 6 players rated much lower, then yes, they would expect to win, but it does not work that way for a team of 2300, 1500 vs. a team of 1900, 1900, since player 1 only expects to win about 50% of the time in this setup, if the players are near their actual value.

I also don't understand why you would compare the 2300-rated player to the average rating of the other team, because, like I stated above, your team minus you (the 2300-rated player) is going to be rated below the average of the other team. No matter how I choose to look at the way you set it up, I see an unsuccessful system. :/

The balancer should give you a 50% win chance. If you're rated higher than all the players in the ladder (e.g. you're playing with all 1500-2000 players and you're a 2500), you shouldn't lose more points than the others, considering the balancer will make sure your winning chance is 50%; otherwise you can just never climb the ladder if you get +12 and -38 changes at a 52% winrate.

I will look into some solutions later; I just think this one wouldn't work.

Didn't see this, but yeah this sums up what I was saying.

mlopes
02-02-2011, 03:02 PM
Just my 2 cents. Rewarding goals and bomb hits while ignoring kills is a bad idea. In TBD a bomb run is the result of a team push, yet it would only reward the runner; and in ball, defenders and map people who help push would get very low ranks no matter how much they're helping the team.

Urpee
02-02-2011, 03:19 PM
The balancer should give you a 50% win chance. If you're rated higher than all the players in the ladder (e.g. you're playing with all 1500-2000 players and you're a 2500), you shouldn't lose more points than the others, considering the balancer will make sure your winning chance is 50%; otherwise you can just never climb the ladder if you get +12 and -38 changes at a 52% winrate.

I will look into some solutions later; I just think this one wouldn't work.

Let me actually address this.

Now, psychologically it is nice if one can climb the score in an absolute sense. But frankly I don't think that is even the intent of the ladder right now. The intent is that the scores balance out at the level of a player's skill.

But I think you put your finger on an important point. Right now the score is linked in a very strict and specific way to win%, and in fact to overall long-term win%.

Given that for practical purposes the score change is 25, one can compute the ladder score from overall win% and #games played with good accuracy (score ≈ 1500 + 25 × (wins − losses)).

I think that describes some of the flaws of the system.

This works for people who are veterans and are close to their skill ceiling. But it does not work for people who go through changes in win ratio thanks to learning.

Let me take my case as an example. I started off not that great and tanked down into the 40s%. Since then I have maintained a win% above 50%, but only just now have I reached an overall win% of 50.

Note that I have done this competing in games that quite regularly have an average team score above 1800, and with a win% of 50 I'm sitting at 1498. Clearly something is broken here when a player who competes well against tough competition does not rise sufficiently. And players who put in a good start and then perform at 50% will only very, very slowly converge to the median of the ladder (as slowly as the win% itself converges).

It's quite clear to me that the score has to react to recent performance more strongly than overall win% does, to actually converge people to their true score in a sensible time.

And yes, that will mean that a high ladder position will be more difficult to maintain and to grow. But that's the pesky psychology of this. Surely people who are high on the ladder do not want the system to make it harder for them. That isn't really the point, though. The point is to get a sensible convergence of scores to the true means, because the balancing itself relies on the scores being close to true means.

Currently, every time I am added into a team score, I'd claim that that team score does not reflect the true mean, because I sure as heck am not scored correctly. But the whole system is built on the premise that the team mean is a good way to estimate game outcomes. Well, given how slowly player scores converge, in a lot of games that is just not a correct assumption.

P.S. TL;DR: In other words, if the swings more closely reflect recent performance rather than overall win%, we get better convergence and more accurate scores to balance teams with.

andy
02-02-2011, 03:31 PM
The system is built to give you a 50% winrate when you reach your rating. This is a team game: if you're ranked 3000 and the other players are ranked 1000 or 2000, you will get more 1000 players than 2000 players while the other team gets more 2000 players; by no means will you win more easily just because you play against lower-ranked players, the teams will always be balanced. You will start winning more if you improve.

Quoted from Esoteric:

ELO is a system that works based on assigning each player/team "odds" of winning, and then rewarding based on that. For instance if you have two players, player a with 1900 and player b with 1500, player a should beat player b 10 times for each time player b beats player a -- A wins 10/11 times.

However, we're dealing with teams rather than a 1v1 so we need to estimate the odds that team a beats team b. You, however, compare a player's ELO against the enemy team's ELO. You are, essentially saying that each person on my team has different odds of beating the enemy team. Obviously, this is somewhat nonsensical. I can't have a 2/3 chance of winning while my teammate has a 1/2 chance--we have an equal chance of winning as we're on the same team.

Urpee
02-02-2011, 03:41 PM
Isn't it obvious that this is predicated on the assumption that current scores are a decent reflection of ranking and that the scores converge sensibly fast to their true score?

I have had a winning ratio for quite some time now. My score is still way low. I'm at the zero sum mean, but still below the median score and clearly below the average game mean score.

The problem is that convergence is too poor. Hence lots of games are balanced on a false assumption about scores rather than on actual balance. Some people may be close to their score, but many may be far away. It seems like luck, not balancing, if it depends on whether you get someone who is scored well versus someone who is misrated and waiting on a long chain of games to converge.

If it takes hundreds of games to reach one's true ranking, it's clearly broken, given that every game assumes the score is a sensibly good reflection of the team.

Clearly, if one is underrated one should rise fast. If one is overrated one should drop fast. And if one is rated correctly one should oscillate with little change about one's rating.

This is not what we are seeing.

blln4lyf
02-02-2011, 03:42 PM
Urpee: why are you not rated correctly? You are winning around 50% of your games at a 1500 rating, meaning you normally get at least one very high-rated player on your team. If your teams were made with you having, say, a 2000 rating, who says you would win anywhere near 50%?

Edit: no one's rating will be perfect, but I think it is safe to assume most who have 150 games played are real close. Example: you have over 450 games played, and you have won 53 of your last 100 games. So 100 games ago you were rated 1425 and managed to climb to 1500 (rounded) after 100 games. Seems pretty accurate to me.

Urpee
02-02-2011, 03:50 PM
Cut, say, the first 50 games from my track record and reevaluate: my score will be different. In fact, if I started ladder fresh now and we took my last 50 games as my performance, my score would be different.

Which of these scenarios reflects my performance? Clearly I'm misscored at 1500 given how I played.

As said, this is fine if there is no learning. But do you know how long I had to sustain a winning ratio to get up from the 40s to 50%? Convergence is way too slow.

If you truly believe that this is working, I cannot really say much.

Take someone who gets a bunch of wins early on and then maintains 50%, and another player who loses the same and then maintains 50%: both will converge very slowly. In fact, look at even great players: many have had a long losing or winning streak. It's just luck whether you have them early or late, and that early luck will be hard to mitigate.

I don't think we disagree about the mechanism. I'm just saying that the system very obviously does not help convergence; it creeps along with the change in overall win%. People who play lots of games at some win percentage above 50% are very stable, because one game will not change that percentage drastically. Same for people who have a sensible number of games at 40%. Danielle, even if she played a massive winning streak, will never dig herself out of that hole. There is not enough pull to converge. Percentage swings shrink with the number of games played. And given that the score is directly linked to win%, at a certain point the system is locked in and it becomes very hard to change one's score (independent of where that score actually should be!).

blln4lyf
02-02-2011, 04:00 PM
I edited the above to include information. Quite frankly, I don't think you are underrated at the moment. You have to realize your teams get better when you are rated lower, and that helps you move up if you do in fact improve. Winning just above 50% means you are damn close to your true value at the moment.

This is the info I added to my above post; I think you posted your response before you saw it: No one's rating will be perfect, but I think it is safe to assume most who have 150 games played are real close. Example: you have over 450 games played, and you have won 53 of your last 100 games. So 100 games ago you were rated 1425 and managed to climb to 1500 (rounded) after 100 games. Seems pretty accurate to me.

Back when ladder started I played TA and didn't play as a team player much; I just TA'ed all over the place and would score a good amount but not provide any defense, any killing, anything really besides ball movement/offense. At the time my ranking tanked down to the bottom 1/2 pages and I was rated, I think, 1150 or so, which was real low at the time. When I did improve (not so much my skill as my play style) I shot up to 1700ish fairly quickly, because at 1150 I was very underrated once I got better; if my true value was 1700, I was playing over 500 points below it, which led to around a 60% winning rate, likely higher tbh. I may have the highest ball rating currently, but I have been where you are/were, and if you were truly underrated by that much, you would have jumped a lot more in your last 100ish games.

Cut, say, the first 50 games from my track record and reevaluate: my score will be different. In fact, if I started ladder fresh now and we took my last 50 games as my performance, my score would be different.

Which of these scenarios reflects my performance? Clearly I'm misscored at 1500 given how I played.

If you cut the first 50 games, then you would have been playing at a different rating while you made your climb; therefore, that climb may not have existed. The higher your rating, the worse-rated players your team will have, and vice versa. Who says you would win 50% of your games if... you removed a top-5 player from your teams and gave you the player ranked 50 instead, and the player ranked 50 on the other team was replaced with a top-5 player? This is an extreme example, but it shows that winning 50% at a 1500 rating is different from winning 50% at a 1000 rating, which is also different from winning 50% at a 2000, 2500, or 3000 rating.

Urpee
02-02-2011, 04:20 PM
Winning just above 50% means you are damn close to your true value at the moment.

Not at all. This is the myth that seems to be encoded in this discussion.

Let me give you a toy ladder. 12 people compete and we seed them randomly with these scores: 4 have 3000, 4 have 1500, 4 have 0. But they are actually all equally good.

We play, and in fact it turns out that everybody wins 50%. The system will allow this, and given people's actual skill they maintain their 50% ratio.

This is our current system. Is this working? Do you truly want to claim that people's scores are well reflected and converge properly?

It's a myth that someone playing at an overall 50% win ratio is properly ranked. The system encourages locking them into place wherever they are, no matter their actual correct ranking.

If the system worked, we would get a convergence of everybody to 1500, and this convergence would be sensibly fast. Currently there is no such mechanism, because the player who is scored 0 competing in an average 1500 game gets no benefit over a player who is scored 3000 in a 1500 game. That is the convergence mechanism that would be needed to fix this example, but it's nowhere to be found.
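
A tiny Python sketch of that toy ladder, with the flat ±25 swing and forced 50% wins and everything else stripped away:

import random

ratings = [3000]*4 + [1500]*4 + [0]*4
for game in range(1000):
    i = random.randrange(12)  # everyone is equally good...
    ratings[i] += 25 if random.random() < 0.5 else -25  # ...and wins 50%
print(sorted(ratings))  # the three seeded bands are still clearly visible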

And there is this myth that just because I have a 1500 now and a 50% win ratio, it's all swell. In fact it may just be that me being misranked gets matched against another player who is misranked, and we end up at 50% wins and don't move. Only if the system converged properly would it be fair to assume that the other player I'm balanced against is actually at about the right spot. But it's blatantly obvious this isn't the case. You will find people above 1500 who aren't all that good, and you will find people below 1500 who are quite good. The reason is simple: the evaluation has gone awry, and once you are in a wrong spot the system has insufficient correctives.

blln4lyf
02-02-2011, 05:39 PM
K, I've been at both ends and I disagree. Surely not everyone's rating is on point, but it's a lot closer than you give it credit for. The reason for the change is so that when people come into ladder they don't get grossly overrated by starting at 1500, causing an imbalance in teams (due to the zero-sum). There is absolutely no reason to penalize a player beyond giving them lower-rated teammates, because a higher-rated player is not supposed to win well over 50% of their games if their team is set up to be balanced with the other team. Once again, it is not 1v1. I'm giving up after this post, but just because someone is rated higher does not mean they should lose 45.5 points in a game that is supposed to be balanced team-wise, while someone on the same team, with the same balance (team-wise), should only lose 4.5. It would work that way if it were 1v1, but it is not 1v1. Maybe I am wrong here, but I just don't see it at all. Your idea that everyone is rated incorrectly and that is why you're still not moving up that fast is, to me, just you being unwilling to accept that you are only playing at a level around a 1500 rating at the moment.

andy
02-02-2011, 05:50 PM
I'm not gonna bother to discuss this with you when you have no understanding of how the ladder works. You are perfectly rated if you are winning 50% of your recent games; you haven't improved, I'm sorry for you. Let me give you a brief explanation: if you were underrated, the ladder would balance the teams thinking that both have a 50% chance of winning, but then you're on one and are underrated, so your team should probably have a 60% chance of winning (a totally arbitrary number; it depends on how underrated you are).

We saw how fast you can climb the ladder when people used smurfs (check blln's smurf, it's still out there).

In the end your win% should be around 50% +/- 1% if you play enough games; what gives you the rating is the difference between wins and losses.

Pieface
02-02-2011, 06:14 PM
To play devil's advocate here, I believe what Urpee is trying to get across is that being underrated only benefits you if you assume the other team is correctly rated to start with. If you have someone equally underrated (or overrated) on the other team, you will still win about 50% of the time, even though compared to the total ladder population's skill you should be rated higher. In essence, the prevalence of misrated players in ladder prevents you from following your predicted ranking trajectory: winning if you're underrated and losing if you're overrated.

I've also experienced these huge win/loss streaks that dramatically change your rating, but I wouldn't necessarily attribute them to ladder working the way it was designed to. If they do follow from that, it's clear that you need a certain set of conditions to achieve a large change in ranking. These situations only come along every so often, which is why it takes so long to reach your true rating. In turn, the fact that you haven't yet reached your predicted rank prevents others playing with you from following their projected trajectories as well. With the current system it's a cycle that's only broken when you get games where everyone except yourself is perfectly rated.

Rainmaker
02-02-2011, 06:44 PM
The system is built to give you a 50% winrate when you reach your rating.
Yes & no.
WHEN you reach your ideal rating (that's what ELO is meant to find, given enough data), you would have a 50% win rate if you played against someone with your same rating.
When you play against a random player, the system calculates that probability by itself.
How? That's the value E.

E = 1 / {1 + 10^[((opponent rating) - (my rating)) / 400]}

What's that 400? In chess, a difference of 200 points means a winning/losing probability of about 0.75.
As the rating is assumed to follow a normal distribution, you have to consider this 200-point difference both ways:
someone 200 points above you, and someone 200 points below you.

If we use the formula for E, considering the rating difference being -200 (I'm 200 points above my opponent), my probability is:

E = 1 / {1 + 10^[(1700-1900)/400]}
E = 1 / [1 + 10^-0.5]
E = 1 / [1 + 0.316228]
E = 0.759747

I have a 0.76 probability of winning, and 0.24 of losing (actually there is also a chance of a draw).

I tried to explain this to nobodyhome:
you have to stop force-feeding false data into the system. That's what's screwing with the ELO system.
The auto-balancer encodes the assumption (by the organizer / community / etc.) that equal average ratings mean a fair match (a 1700-rated team against a 1700-rated team ON AVERAGE). Each team has the same average rating, so the match is "fair" and the win rate calculated by ELO is 50%:

E = 1 / {1 + 10^[(team1.avg.rating - team2.avg.rating) / 400]}
E = 1 / [1 + 10^(0/400)]
E = 1 / [1 + 1] = 0.5

So here is one thing that was messing with ELO. This is why a 1500-rated player in a 1700-rated game would earn 24 points, and a 3000-rated player in a 1700-rated match would earn 24 points as well. Wait, what?
That's wrong!

Now, ELO is designed to reward players who challenge (and beat) higher-rated opponents. That is because their measured rating is perceived as underrated.
So ELO's point-reward system is:

new rating - old rating = K*(S - E)

S = 1 if you win, 0 if you lose
E = the win probability
K = a factor.

K is an arbitrary number. You could set it to 500, to 2, or to 50 (the current value).
A higher K means more fluctuation: players will lose or gain many points in few matches, so they never settle at one rating.
A high K is used to get a first rating value that approximates the "real" rating of the player.
A lower K means a steadier rating: points gained or lost affect your rating only slightly, and you need a lot of games played (won or lost) to change it.
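
To see the effect of K concretely, here is a small Python snippet (a 1500-rated underdog winning in a 1700-average game; the output is rounded):

def upset_gain(k, my_rating=1500, opp_avg=1700):
    e = 1.0 / (1.0 + 10 ** ((opp_avg - my_rating) / 400.0))
    return k * (1 - e)  # points gained for winning as the underdog

for k in (2, 50, 500):
    print(k, round(upset_gain(k), 1))  # 2 -> 1.5, 50 -> 38.0, 500 -> 379.9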




This is a team game: if you're ranked 3000 and the other players are ranked 1000 or 2000, you will get more 1000 players than 2000 players while the other team gets more 2000 players; by no means will you win more easily just because you play against lower-ranked players, the teams will always be balanced. You will start winning more if you improve.
This is what I mean about force-feeding false data into the ELO system.

Let me try to explain this again. YOU ARE NOT SUPPOSED TO WIN HALF OF YOUR GAMES. That would be an idealized scenario where everyone is equal skill-wise and equally rated.
You are only supposed to have a 0.5 win ratio if you play against players rated equally to you. We were forcing this kind of data into the system by team balancing.
I agree with Esoteric on that point: this system was designed essentially for 1v1; it's really hard to make it work for a team-based game.
How do you boil 5 people down to one rating?
That's one of the problems we are having.
We assume that the average rating of a team is the real representation of that team's skill (the sum of the parts). This is what most systems are criticized for: team dynamics.


I agree with Urpee that the current system is flawed (in a very bad, bad way).
Just that slight change in the formula would stabilize the ranking at first.

Another matter is deciding on the K factor.
I suggested that everyone's first 20 games be calculated with a high K.
Why? The rating becomes more accurate the longer you keep playing: the more data you feed in, the better it works.
This is what many were complaining about: you can't climb the leaderboard.
The first matches will make you lose/win a lot of points, ensuring a rough first approximation of your rating.
After that, K should be lowered gradually to some value (and this is just a judgment call) to reflect skill improvement.
For example:
K = 50
A player rated 1500, playing in a 1700-rated game:
Will gain 38 points if his team wins
Will lose 12 points if his team loses

How does it work?
The player is rated below the game's average; he is right at that 200-point border (win ~0.24, lose ~0.76). So if this player (his team, actually) wins, the system knows he is misrated and corrects his rating upward by a large amount:
His new rating is 1538

If he plays this same setup again:
A player rated 1538, playing in a 1700-rated game:
Will gain 36 points if his team wins
Will lose 14 points if his team loses

And again...:
A player rated 1574, playing in a 1700-rated game:
Will gain 34 points if his team wins
Will lose 16 points if his team loses

And again...:
A player rated 1608, playing in a 1700-rated game:
Will gain 31 points if his team wins
Will lose 19 points if his team loses

After four straight wins he stands at:
A player rated 1639, playing in a 1700-rated game:
Will gain 29 points if his team wins
Will lose 21 points if his team loses

(I assumed this player's team won EVERY match.)
His rating climbed 139 points over the 4-game winning streak.
Under the current system he would have gained 100 points.

Now, let's get real. Let's assume he doesn't only win: after his winning streak, he goes on a 5-game losing streak:

A player rated 1639, playing in a 1700-rated game:
Will gain 29 points if his team wins
Will lose 21 points if his team loses

And again...:
A player rated 1618, playing in a 1700-rated game:
Will gain 31 points if his team wins
Will lose 19 points if his team loses

And again...:
A player rated 1599, playing in a 1700-rated game:
Will gain 32 points if his team wins
Will lose 18 points if his team loses

And again...:
A player rated 1581, playing in a 1700-rated game:
Will gain 33 points if his team wins
Will lose 17 points if his team loses

And again...:
A player rated 1564, playing in a 1700-rated game:
Will gain 34 points if his team wins
Will lose 16 points if his team loses

His rating would be 1548. WHAT?!?!?
But his record is nearly even (4 wins, 5 losses); shouldn't he be back around 1500?
NO. The system takes into account that he was playing in a higher-rated environment. He is a 1500 player in 1700-rated teams, so his chances of winning are supposed to be slimmer. If he wins games anyway, the system is underrating him, so it makes him climb the ranking faster.

Vice versa for a highly rated player in a low-rated environment.

Hope this helped clear up some of your doubts and explain the system's mechanics.

(KEEP in mind I used the modified system, not the current one. The current system awards a near-constant 24~25 points, unable to tell a low-rated player from a high-rated one: it would hand out +/-125 points for a five-game win/loss streak to a 1500-rated player in 1800-rated games just as readily as to a 3000-rated player in 2000-rated games.)
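
To make the worked example above reproducible, here is a minimal sketch (standalone Python; the names are mine) replaying the 4 wins and then the 5 losses with K = 50 against a 1700-rated environment:

    def expected_score(my_rating, opp_rating):
        # Standard ELO expectation: my probability of winning
        return 1 / (1 + 10 ** ((opp_rating - my_rating) / 400))

    K, ENV = 50, 1700
    rating = 1500.0
    for _ in range(4):  # the winning streak
        rating += K * (1 - expected_score(rating, ENV))
    for _ in range(5):  # the losing streak
        rating += K * (0 - expected_score(rating, ENV))
    print(round(rating))  # ~1549; the example above gets 1548 by rounding after every game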

Urpee
02-02-2011, 06:47 PM
Thanks Pieface, that pretty much exactly paraphrases what I'm trying to say.

The point certainly is not to make life unfairly hard for good players or unfairly easy for bad ones. The point is mostly to find a system that has desirable properties: elevating good players and dropping bad ones while keeping a good chance of overall balanced and competitive games.

I don't want to give the impression that I think the current system is awful. I don't think it is; I think it's pretty good. I think it converges just a tad too slowly, and it predictably slows down the longer ladder is active (because overall win% directly correlates with ladder score, and win% variability shrinks the longer one plays).

Basically the goal is to keep the score agile but reflective of skill/performance. The simple alternative is to simply reset ladder more frequently. That works fine and it does essentially achieve the same goal.

andy
02-02-2011, 06:49 PM
Once you reach your rating you should win 50% of your games.

I'll add on later.

Rainmaker
02-02-2011, 07:04 PM
Once you reach your rating you should win 50% of your games.

I'll add on later.
No. You will have a 50% chance of winning only if you play equally rated players.
The assumption that equally rated teams = equally rated players is an approximation; nothing more, nothing less.

You will achieve a 0.5 win rate only when you play against players rated equally to you.
Say you are rated 1680, and let's assume this is your "real" rating. If you keep playing people rated 1680, you should win half of your matches; thus your rating shouldn't fluctuate above ~1730 or below ~1630.

sunshineduck
02-02-2011, 07:34 PM
just a disclaimer i have no idea about any of this math or if it's even possible but..

i would really like to see plane choices implemented into the team balancing formula somehow. like, player X plays plane A 80% of the time, and the formula could account for that somehow. would actually make for more fun and balanced games than having 5 mirandas on one team just because their elo ratings fit together perfectly

Rainmaker
02-02-2011, 07:52 PM
just a disclaimer i have no idea about any of this math or if it's even possible but..

i would really like to see plane choices implemented into the team balancing formula somehow. like, player X plays plane A 80% of the time, and the formula could account for that somehow. would actually make for more fun and balanced games than having 5 mirandas on one team just because their elo ratings fit together perfectly

That should be server-side. You shouldn't be able to pick miranda if there are already 2 mirandas on your team, which is a quite simple and efficient rule.
No more than 2 of the same. Worst-case scenario: 2 randas, 2 biplanes, 1 loopy.

If it's supposed to be a ladder, people will agree (or be forced to agree) to choose bomber or explodet too.

Another suggestion along these lines:
Have the ladder pick a total of 12 players. Only 5 per side would be playing, but in case of a disconnection or a leaver, one of the remaining players can join. To be clear, it isn't a 6-person team: a randomly picked spectator fills the vacated slot.

blln4lyf
02-02-2011, 08:16 PM
I fully understand what you are saying Into The Walls...but there is an issue with this. If you are rated 3000 in an average-1700 game, you still shouldn't get fewer points when your team's average is 1700, because even though you are 3000, the other 5 on your team will have an average rating of 1440, which is what balances your team against the other team.

Why would someone rated higher lose more points when the two teams' averages are equal? They shouldn't; that would be a double disadvantage: stacked with lesser teammates AND given lesser winning potential.

ELO is a 1v1 ranking system, and what you are saying would be 100% correct if ladder were 1v1. But ladder is a team game, which changes how you have to address the system; set it up your way and the ladder rankings will fail miserably. I understand the current system can be upgraded, but not like this. You are applying 1v1-type data to a team game, when in reality your "false data" claim does nothing to change the fact that team A has a single chance of winning, meaning the player rated 3000 has the same chance of winning as the player rated 1000 on his team. They should not be judged differently UNLESS something is in place that makes the player rated 3000 somehow independent from the player rated 1000, which CANNOT happen in a team-oriented game.

The team has to be treated as one unit, as it is team versus team, not (player 1 on team A vs player 2 on team A) vs (player 1 on team B vs player 2 on team B).

blln4lyf
02-02-2011, 08:28 PM
To play the devil's advocate here, I believe what Urpee was trying to get across is that the situation of being underrated only benefits you if you assume the other team is correctly rated to start with. If you have someone equally underrated (or overrated) on the other team, you will still win about 50% of the time even though compared to the total ladder population's skill you should be rated higher. In essence, the prevalence of misrated players in ladder prevents you from following your predicted ranking trajectory: winning if you're underrated and losing if you're overrated.

I've also experienced these huge win/loss streaks that dramatically change your rating, but I wouldn't necessarily attribute them to ladder working the way it was designed to. If they do follow from that, it's clear that you need a certain set of conditions to achieve a large change in ranking. These situations only come every so often, which is why it takes so long to achieve your true rating. In turn, the fact that you haven't yet achieved your predicted rank prevents others playing with you from following their projected behavior as well. With the current system it's a cycle that's only broken when you get games where everyone except yourself is perfectly rated.

I understand this, and I have understood that this is what Urpee was saying, but while it's true to an extent, it isn't the whole story.

If you are underrated, after playing enough games (say 100) you WILL overcome any such obstacles you described that can hold you back from reaching your true ranking. Note that your true ranking is usually within plus or minus 200 points of where you sit after a good number of games, which is decently accurate.

As for proof, I've already told my ball TA story: I climbed fairly quickly at 60% or above when I changed my playstyle, and when I introduced my smurf to ball ladder I hit a 2200 rating or so with about a 70% win percentage. This shows that, random variables aside, if you are underrated you will climb and make that up, and it won't take as long as Urpee suggested. Point blank: the first 50 or so games, which he says keep him underranked, have virtually no effect on him anymore, because he has played 400+ games since then, and that is WAY more than enough for him to reach whatever his true value is.

blln4lyf
02-02-2011, 08:34 PM
No. You will have a 50% chance of winning only if you play equally rated players.
The assumption that equally rated teams = equally rated players is an approximation; nothing more, nothing less.

You will achieve a 0.5 win rate only when you play against players rated equally to you.
Say you are rated 1680, and let's assume this is your "real" rating. If you keep playing people rated 1680, you should win half of your matches; thus your rating shouldn't fluctuate above ~1730 or below ~1630.

Dude, but you don't penalize a highly rated player for playing with lower-rated teammates when the teams are equally rated, BECAUSE the teams are equally rated. It may be an approximation, but it is still just that. If you want to penalize higher-rated players from a 1v1 ELO standpoint, then you pretty much have to put all the higher players on one team and all the lower players on the other and say: okay, since these players are much higher rated, they have a great chance of winning, so they only get +5 if they win and -50 if they lose. And frankly, that is stupid, because the games won't be close. You have to stop thinking of it from a pure 1v1 ELO standpoint; you are letting the basic logic eclipse you, trying to force-feed 1v1 logic into a team game by treating each individual player as his own entity. THAT IS WRONG.

Rainmaker
02-02-2011, 08:47 PM
I fully understand what you are saying Into The Walls...but there is an issue with this. If you are rated 3000 in an average-1700 game, you still shouldn't get fewer points when your team's average is 1700, because even though you are 3000, the other 5 on your team will have an average rating of 1440, which is what balances your team against the other team.

Why would someone rated higher lose more points when the two teams' averages are equal? They shouldn't; that would be a double disadvantage: stacked with lesser teammates AND given lesser winning potential.
A 3000-rated player in a 1700-rated game is supposed to have a winning probability of 0.999437974.
Take into account WHAT it would take for someone to reach a 3000 rating. He would have to be extremely good according to this "new" rating.
Under the current rating he would just need about 60 more wins than losses:
having 63 wins / 0 losses is rated the same as having 1000 wins / 937 losses.

This is what happens with the current rating system:
someone rated 3000 playing against a bunch of 2000s loses 25 points if he loses the match;
the same person playing against 1000-rated people still loses 25 points.

What is the difference, then, between being ranked 3000 and 1000?
You have, intentionally or not, crippled the rating system.

Rainmaker
02-02-2011, 09:07 PM
Dude, but you don't penalize a highly rated player for playing with lower-rated teammates when the teams are equally rated, BECAUSE the teams are equally rated.
I'm not penalizing them on purpose; that's just how the math works.
I know it's messed up. But that feature was introduced to avoid grossly unbalanced teams (a bunch of 3000-rated people against 1000-rated people).

It may be an approximation, but it is still just that. If you want to penalize higher-rated players from a 1v1 ELO standpoint, then you pretty much have to put all the higher players on one team and all the lower players on the other and say: okay, since these players are much higher rated, they have a great chance of winning, so they only get +5 if they win and -50 if they lose. And frankly, that is stupid, because the games won't be close. You have to stop thinking of it from a pure 1v1 ELO standpoint; you are letting the basic logic eclipse you, trying to force-feed 1v1 logic into a team game by treating each individual player as his own entity. THAT IS WRONG.
I understand, but the problem is the other way around: a 1v1 system is being forced into a 5v5.
Somehow you/we have to find a way to condense those 5 people into 1 rating:
people's skills
team dynamics
past experience
etc.

As I said, this is one of the most important criticisms of adapting the system: it's not only us who run into this problem, every multiplayer rating system does. Microsoft ran into it and designed TrueSkill.
Chess tournaments run into it too (along with many more problems, like rating inflation/deflation).

Is your criticism that higher-ranked players lose too many points?
There is a solution to that, and it has also been adopted in chess:
make K vary with the rating.

* Players below 2100 -> K factor of 32 used
* Players between 2100 and 2400 -> K factor of 24 used
* Players above 2400 -> K factor of 16 used

Why? Because some filters or rules leave gaps for unintended bad habits in players.
For example, opponent picking: chess players would play against a highly rated computer using a previously known strategy that worked against it, meaning free ELO points for them.
Some players would also "stop" playing to protect a high rating.
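
As a sketch, that bracketed K is a tiny lookup (Python; the brackets are the chess values listed above):

    def k_for(rating):
        # Chess-style K brackets quoted above
        if rating < 2100:
            return 32
        if rating <= 2400:
            return 24
        return 16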

How would a different K factor affect things?

A 2000-rated player playing in a 1700-rated game:
P(winning) = 0.849

K = 60
Win: +9
Loss: -51

K = 50
Win: +8
Loss: -42

K = 40
Win: +6
Loss: -34

K = 30
Win: +5
Loss: -25

K = 20
Win: +3
Loss: -17

K = 15
Win: +2
Loss: -13
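
A minimal sketch (standalone Python; names are mine) that reproduces this table:

    def expected_score(my_rating, opp_rating):
        return 1 / (1 + 10 ** ((opp_rating - my_rating) / 400))

    e = expected_score(2000, 1700)  # ~0.849
    for k in (60, 50, 40, 30, 20, 15):
        print(f"K = {k}: win +{round(k * (1 - e))}, loss -{round(k * e)}")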

Mind you, ballin: with the new system it is highly improbable to have someone rated 3000 like we do now. My guess is most ratings will sit close together, between 1000 and 2000. There won't be one player 500 points ahead like now; the 1st could be 2224 and the 2nd 2220. But it wouldn't be easy for a player rated 2000 to reach 2230 either.
Why? Because there aren't people rated 3000, so he would have to win a certain number of 1800-rated games to gain those 200 points (my guess is around 12 consecutive wins for a 2000-rated person in 1800-rated games).

If you don't like ratings being so close, you can always change the scale (1500 as the zero-sum starting point, with 200 points marking a real difference in class).


Would any Ladder Organizer mind clarifying this?
First Eso implemented his modified ELO system (in which the skill difference was calculated as "team 1 avg" - "team 2 avg");
later, an auto-team-balance feature was added.

Pieface
02-02-2011, 09:33 PM
If you are underrated, after playing enough games (say 100) you WILL overcome any such obstacles you described that can hold you back from reaching your true ranking.

But that's exactly the problem. If the system requires playing at least 100 (usually more) imbalanced games before you even get close to your "true" skill rating, then it's not working as efficiently as it could. Factor in the almost constant influx of people who play a certain game mode sporadically or are new to ladder, and you effectively ensure that very few games are balanced according to the players' "true" skill, since the newcomers have not played enough for the teams to be balanced correctly.

To be honest, I think what makes establishing a rating system so hard for ladder is the presence of our current autobalance system. It's extremely difficult to come up with something good that would also meld with the way teams are assembled at present.

A Nipple
02-02-2011, 10:54 PM
it's important to remember people's skills fluctuate. At least speaking for myself

Niipneeep

=]

andy
02-02-2011, 11:38 PM
A 3000-rated player in a 1700-rated game is supposed to have a winning probability of 0.999437974.
Take into account WHAT it would take for someone to reach a 3000 rating. He would have to be extremely good according to this "new" rating.
Under the current rating he would just need about 60 more wins than losses:
having 63 wins / 0 losses is rated the same as having 1000 wins / 937 losses.

This is what happens with the current rating system:
someone rated 3000 playing against a bunch of 2000s loses 25 points if he loses the match;
the same person playing against 1000-rated people still loses 25 points.

What is the difference, then, between being ranked 3000 and 1000?
You have, intentionally or not, crippled the rating system.



You don't seem to understand the current system at all. You get +25 because your winning percentage is 50% (and this is because the teams are AUTOBALANCED).
Whether you are rated 1000 or 3000, you will be put in conditions where you have a 50% win chance.

Answer this:
Lets say we have 2 balanced teams

Team A
1 Pro 3000 rating
2 vets 2000 rating
2 noobs 1000 rating

Team B
1 Pro 3000 rating
2 vets 2000 rating
2 noobs 1000 rating

How the hell could the guy rated 3000 have a .9 chance of winning and the 1000 guy on his team have a .1 chance of winning..!??

You will be rated 3000 if you can hold your own when better players are stacked against you. If you are rated 1000 but your skill is that of a 3000 player, you will start winning a bigger percentage of your games until you reach your true rating; at that point your winning percentage will tend to 50%.

Ribilla
02-03-2011, 12:44 AM
You don't seem to understand the current system at all. You get +25 because your winning percentage is 50% (and this is because the teams are AUTOBALANCED).
Whether you are rated 1000 or 3000, you will be put in conditions where you have a 50% win chance.

Answer this:
Lets say we have 2 balanced teams

Team A
1 Pro 3000 rating
2 vets 2000 rating
2 noobs 1000 rating

Team B
1 Pro 3000 rating
2 vets 2000 rating
2 noobs 1000 rating

How the hell could the guy rated 3000 have a .9 chance of winning and the 1000 guy on his team have a .1 chance of winning..!??

You will be rated 3000 if you can hold your own when better players are stacked against you. If you are rated 1000 but your skill is that of a 3000 player, you will start winning a bigger percentage of your games until you reach your true rating; at that point your winning percentage will tend to 50%.

THIS

you always have a 50% chance of winning (assuming that 1 pro + 1 noob == 2 vets) because there is an equivalent counterpart on your team.

If someone had 0.9... chance of winning then whoever was on their team would also have that same chance of winning. IMO the system needs only minor adjustments.

Rainmaker
02-03-2011, 01:45 AM
You don't seem to understand the current system at all. You get +25 because your winning percentage is 50% (and this is because the teams are AUTOBALANCED).
Exactly, that's my point.
Why would you implement an ELO system and then cripple it, when you could just reward the winning team +25 points per player and dock the losing team -25?
The current rating system isn't really doing anything; it barely touches your rating.

Whether you are rated 1000 or 3000, you will be put in conditions where you have a 50% win chance.
You will be placed on a team which, on average, has a 50% chance to win.
You are conflating individual player rating with average team rating.


Answer this:
Lets say we have 2 balanced teams

Team A
1 Pro 3000 rating
2 vets 2000 rating
2 noobs 1000 rating

Team B
1 Pro 3000 rating
2 vets 2000 rating
2 noobs 1000 rating

How the hell could the guy rated 3000 have a .9 chance of winning and the 1000 guy on his team have a .1 chance of winning..!??
Someone rated 3000 is expected to have a ~0.9+ winning chance in a 2000-rated environment, because, being the most skilled (as proven by his previous rating), he is expected to perform much better than a 2000-rated guy, and far better still than a 1000-rated guy.
The team, meanwhile, has a 50% chance of winning.
So the pro wins only (hypothetically; let's say the math gives 5 points) +5,
the vet gets +10, and the noob gets +30.
Why the discrepancy?
The noob wasn't expected to perform in a 2000-rated (team-average) environment, but he did. Either he is underrated or it was a statistical aberration, so his rating gets a big boost.
"That is bloody unfair! The vet carried the whole team on his shoulders!"
Nothing was expected from the noob, yet somehow he must have contributed for the team to win.
And if it was pure chance, then he will lose most of his games in a 2000-rated environment, and those points go back to their "rightful" owners.

My point is that with the new system you will no longer have those overrated players.

You will be rated 3000 if you can hold your own when better players are stacked against you. If you are rated 1000 but your skill is that of a 3000 player, you will start winning a bigger percentage of your games until you reach your true rating; at that point your winning percentage will tend to 50%.
Excuse the expression, but that's pure bullcrap.
I've backed what I stated with math.
The only way to get a 50% win chance is to play someone at your own skill level (your same rating).
The win % is calculated from both players' ratings. It has NOTHING TO DO with whether you have reached your TRUE rating or not.
If you stare at the formula long enough you will see it:

E = 1 / {1 + 10^[(his rating - my rating)/400]}
E = my chance to win
1 - E = my chance to lose

PERIOD, there is nothing more to it. The only variables are YOUR rating and your OPPONENT'S rating. If you keep winning against bad odds (expected to take only 1 game out of 4), the system gives you a boost, because you are being underrated.
If you reached a high rating (2000) but lack the skill and were just lucky with the team balance over the last 5 matches, then over the next matches your rating will drop drastically, because you were overrated due to an anomaly (being placed with highly rated players in a highly rated environment).



Let's take the current rating system and the top #25 as an example:

http://img199.imageshack.us/img199/1250/tbdrating.jpg

Players like Nipzor shouldn't be so close to the top: a 0.5 win rate and a net difference (wins - losses) of only 33.
eth & mikesol have net differences of only 40, with just 100~180 games played.

The current system could easily be replaced by what I said at first:
1. the autobalance system
2. add +25 to each player on the winning team, subtract 25 points from each player on the losing team

Because the only way to climb to the top is to have a net difference of X games (between 40 and 60),
it doesn't matter whether your win ratio is 0.7 with only 100 games played or 0.52 with 1000 games: as long as your win-loss difference is greater than 60 you will be at the top of the leaderboard.

Ladder has to be reset for ANY new system to be implemented correctly; otherwise it will keep carrying the previous system's flaws (over/underrated players).

elxir
02-03-2011, 02:35 AM
but uhh, nipple is like the second best bomber on that list sooo obviously something is working

nobodyhome
02-03-2011, 02:47 AM
This is ridiculous. There is so much misunderstanding in this thread of the concept behind ELO and the current system's implementation of it. Urpee's proposed system (where you average either the other team's ratings or all the players' ratings and then compare that against an individual player) is plain wrong. Don't try to "argue" for the system, because it is simply mathematically wrong.

In my previous post I posted two links in which the system was explained and justified, please read those two links before you try to post against it. If after reading those links you still do not see why Urpee's proposed system is incorrect, please don't post here arguing that Urpee's system is actually better than the current system. Because it is not. Rather, you may post questions asking why it is wrong in order to gain a better understanding. I will not let this thread devolve into people arguing for something mathematically incorrect and then having the people who actually understand the system shoot them down, and then arguing back and forth to see who is right.

This is not to say that the ladder system is completely fine (otherwise we would not be having this thread at all). There are flaws in the system, but Urpee's proposal is not a solution. First of all let's examine how the system actually works. Given two teams, it first assigns a probability that one of the two teams will win using the following formula:

E1 = 1 / [1 + 10^ ([(Avg rating of team2)-(Avg rating of team1)] / 400)]

It then takes the probability and, depending on whether the team won or lost, assigns each team a point gain or loss:

New Rating = Old Rating + [ 50 * ( (1 if won, 0 if loss) - E ) ]

Notice if you find the expected value of this match (probability of win * point gain if win + probability of loss * point decrease if loss), you will find it to be exactly 0. This is because if these two teams are rated correctly, then there should be no change. If they are rated incorrectly, then the result of a series of games should result in each team's ratings moving closer to their correct value.
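
As a quick numeric check of that zero-expected-value property, here is a minimal standalone Python sketch (names are mine):

    def expected(team1_avg, team2_avg):
        # E1: probability that team1 wins, per the formula above
        return 1 / (1 + 10 ** ((team2_avg - team1_avg) / 400))

    K = 50
    e1 = expected(1650, 1750)
    change = e1 * (K * (1 - e1)) + (1 - e1) * (K * (0 - e1))
    print(change)  # 0.0 up to floating-point error, for any ratings and any K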

Now, one of ladder's primary problems is that sometimes, people take too many games to converge to their correct rating. We can solve this by replacing the "50" in the equation with a variable, K, which represents the uncertainty of one's rating. Notice how changing the 50 to any other value will not change the expected value of the equation at all. This is desirable. We can fluctuate the K up and down based on several things. For one, K can start off large for a new player (whose rating is very uncertain as they have not displayed their skill much yet) and decrease the more games you play. For two, K can also increase if you start streaking (either win streak or loss streak), as this can mean that your skill suddenly is changing because you are no longer balanced correctly. For three, K can increase if you also undergo a long period of inactivity (the ladder is no longer certain of your rating because it has not seen you play much recently). We will not be discussing the implementation of this new K variable here, as I have found extensive sources to read up on this and thus it shouldn't be a problem (if you have any suggestions/questions on this particular topic feel free to shoot me a PM or talk to me on altitude).

The second problem (the one I'd like to discuss now) is that we are taking ELO and forcing an adaption of it for team games. Notice how in the formula description above, nowhere in the entire thing is the concept of an individual player even mentioned. This is a reflection of the nature of how we are ranking things. Basically, the only way that we can test skill in Altitude is by gathering two teams, and then pitting them against each other. Consider each game to be a "test", and the output of this test is either "team1 wins" or "team2 wins". Now, if say, team1 wins, then this is a datapoint from which we can gather that team1 played better than team2 in this particular test (this game) and this test only. We would like to reflect this result in the ratings themselves so we decide that team1 should get some points and team2 should lose some points.

However, here's where it gets fuzzy: we decide that team1 as a whole has played better than team1's rating suggests. We have defined team1's rating to be "the average of team1's players' ratings", but this is not necessarily a faithful measure. Because of things like synergy, a team consisting of five players rated 2000 may not be just as good as another team consisting of five players rated 2000 (plane composition comes to mind here). How do we take a team composed of five individual player ratings and form a composite "team rating" from it?

Furthermore, in our current system we assume that if team1 beat team2, which means that team1 played better than its aggregate rating, this means that each of team1's players played better than their individual ratings. We thus reward each player in team1 with equal amounts of points. This is also not necessarily true--it may be that players A, B, and C in team1 played better than their ratings and players D and E in team1 played worse than their ratings. Without looking into the actual in-game factors (individual kills/deaths, bomb hits, etc), is there a better way we can determine the distribution of points to the winner other than just "everybody gets the same"?

ryebone
02-03-2011, 02:59 AM
Just to add a bit of history here, in case anyone has forgotten or didn't know. When ladder was first implemented, it was the standard ELO system, but without any sort of autobalance mechanism. Unfortunately, there were two major flaws with this system:

1) Teams were usually picked by arbitrarily-chosen captains. This process was tedious beyond belief. It often took ten minutes to organize an eight-minute game.

2) People began dodging teams. If they felt the team they were picked to didn't stand a chance of winning, they would say something like "I'm busy" and refuse to join. From a personal-gain standpoint, I totally understand that logic; it's sometimes beneficial to stay away from huge risks for huge gains, and play only when the chance of winning is higher. This is especially true when a large portion of the final result is dictated by forces outside of their immediate control, aka teammates.

For those two reasons, autobalance was implemented. It's clearly a flawed solution, as it takes an already-flawed system (using a 1v1 rating model in a 5v5 setting) and places additional restrictions on it. But from what I have experienced, it actually works fairly well at rating players relatively appropriately.

I'll be honest and say that I tl;dr'ed most of the mathematical posts in this thread (sorry ITW), but from skimming the thread I take it that the two main issues being discussed are 1) the time it takes to get from 1500 to your appropriate rating, and 2) the staleness of always having 50/50 games.

For the first, it can be fixed by allowing huge rating variability for a person's first X games, and gradually decreasing that variability as the person plays more. A moving average could also be implemented by increasing the rating variability if someone enters a winning or losing trend (say, 60-70% wins or losses over the last 10 games), which could happen if someone decides to try a new plane, has an epiphany and suddenly gets better, etc. Obviously this wouldn't be zero-sum, but that's the least of our concerns.

For the second, I personally think it would be good to give the autobalancer some leeway and allow games to have point changes of up to 30/20 (rather than a flat 25/25). As it stands now, your record directly correlates with your rating, which makes the whole thing relatively pointless. Allowing swings of up to 30/20 points would introduce a fair level of variation to keep things interesting without completely overtaking the current system. It would also make spectating more interesting (in spec chat, of course) when there is a clear underdog to root for.

Pieface
02-03-2011, 03:23 AM
Totally off subject, but are there any plans for some sort of rating degradation in ladder 2? It's sort of strange to have a large number of people in the top 30 who simply played enough to make it there and then quit.

Rainmaker
02-03-2011, 04:06 AM
@ryebone:

Looking at it from a distance (after reading the WHOLE Altitude ladder thread, where many complaints against this system were made),
I've come to this:

If the system "feels" accurate in your opinion, keep it as it is.

As I recommended to Nobody on IRC:
if you want ratings to be "guessed" faster, make K variable.
There are many ways, but the principal ideas would be:

For the first X games, K is very large, so you may gain/lose on the order of 100 points.

Make K gradually smaller as you play more games.
For example (just a rough approximation):

K' = (1/1.001^x) * K, x being the number of games played

Example:
http://img15.imageshack.us/img15/50/kfactor.jpg

After 400~600 games played I would switch to a constant K' = K/2 (instead of 24 points, you only gain/lose 12).
On the good side, people in the high ranks won't be heavily punished if they keep losing because autobalance pairs them with all-1000-rated people.
On the bad side, after ~700 games you only gain/lose 12 points, making it harder to climb up or down the board.
I mean, if a player suddenly gets better because of a patch or because of training, it won't be perceived, unless we add the "inflated K due to streaks".

I agree with changing K during streaks (adding a factor which grows with the streak),
i.e.: K' = 1.2^(n-1) * K
the C factor being 1.2^(n-1),
with n the number of consecutive games won or lost.
For 2 games: K' = 1.2*K (a 20% increase)
For 3 games: K' = 1.44*K (a 44% increase)
For 10 games: K' = 5.16*K (about a 416% increase)

A player who wins 5 games in a row currently only gains:
5*24 = 120 points

Using the C factor:
1st: 24 (C = 1)
2nd: 29 (C = 1.2)
3rd: 35 (C = 1.44)
4th: 41 (C = 1.73)
5th: 50 (C = 2.07)
total: 179 points (the 120 base from before, plus a 59-point streak bonus)

The same happens the other way around: the more you lose consecutively, the faster you drop down the ranking.

(Wrote down the formulas so nobo doesn't get mad :D )
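
A minimal sketch of both proposed K schedules (standalone Python; the function names are mine, the constants come from this post, and the K/2 floor encodes the "constant K' = K/2 after enough games" remark above):

    def k_by_games(base_k, games_played):
        # Experience decay: K' = (1 / 1.001**x) * K, held at K/2 once it gets there
        return max(base_k / 1.001 ** games_played, base_k / 2)

    def streak_points(base_points, streak_len):
        # Streak boost: C = 1.2**(n-1) applied to the per-game award
        return base_points * 1.2 ** (streak_len - 1)

    # The 5-win streak above: 24 + 29 + 35 + 41 + 50 = 179 points
    print(sum(round(streak_points(24, n)) for n in range(1, 6)))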

Urpee
02-03-2011, 04:28 AM
This is ridiculous. There is so much misunderstanding in this thread of the concept behind ELO and the current system's implementation of it. Urpee's proposed system (where you average either the other team's ratings or all the players' ratings and then compare that against an individual player) is plain wrong. Don't try to "argue" for the system, because it is simply mathematically wrong.

In my previous post I posted two links in which the system was explained and justified, please read those two links before you try to post against it. If after reading those links you still do not see why Urpee's proposed system is incorrect, please don't post here arguing that Urpee's system is actually better than the current system. Because it is not. Rather, you may post questions asking why it is wrong in order to gain a better understanding. I will not let this thread devolve into people arguing for something mathematically incorrect and then having the people who actually understand the system shoot them down, and then arguing back and forth to see who is right.


My math isn't wrong; at least, no more wrong than what's already there. But rather than just saying you don't like my proposal, you have to do this.

But if we must, go ahead and explain why my math is "wrong". I'm used to having my work critiqued; it happens all the time when you publish math in academic journals. It will be educational to discuss additivity of expectation values, the expectation of a single sample against a population mean, and the modeling of mixed expectations.

For reference, what I have done is add a model for the expectation of an individual in the team relative to the team average. There is nothing wrong with setting up such a model; one can choose to model this or not. That's neither right nor wrong, but a modeling choice. And yes, the current model does not do it. But that is exactly the crux of the problem. ELO was never designed to operate on randomized ensembles of ELO scores taken at their mean. In fact, it's clearly not trivially valid, and some of the symptoms we see arise precisely because the current method treats randomized ensemble expectations as if they were single-sample expectations. Now, the right thing to do is something like TrueSkill, hence why I have favored it. That was deemed too complex, so I was left with brainstorming a simple solution: it models a mixed expectation. How much a player contributes to the team is a model assumption that depends on the game and other unknown factors. One surely could try to formalize this, but all it would change is the weights I propose. I submit that such a formalization is hard, but I am happy to be convinced otherwise.

But let's back off. I was under the impression that the goal was to brainstorm various solutions. That's what I did: I offered a suggestion. That's all it is. Don't like it, don't take it. But don't make up some story that the math is wrong and I don't understand what is on the table. I understand perfectly well what's there. I'm just making suggestions that happen not to be exactly the same as what is currently there. No need to worry, or to instruct everybody on how wrong the math supposedly is. Just say you don't like it. It's quite sufficient.

Rainmaker
02-03-2011, 06:32 AM
I have to agree with Usurpeer here.
The way the ELO model is being applied is absolutely wrong. Esotheric pointed that out at the beginning (autobalance would definitely screw up what he proposed the first time).

So the model (a square peg) has been kicked, twisted, and distorted to fit a set of assumptions (a round hole).
I've gathered, from the way you expressed your opinions, that you are not planning to change your point of view. Not because you haven't seen the evidence, but because you are not willing to put it under the microscope.

It's frustrating to argue with someone who isn't fully able to follow the math behind the model (not you specifically, nobo; I'm talking about the thread in general). This is in no way meant to be patronizing, but plenty of people have come up with a bunch of unrealistic ad-hoc examples with which they believe they have proven it wrong, while they can neither present the math that backs their assumptions nor confirm their predictions in any way.

Just as a way of showing one of my points,
Nobody stated that you can't compare "average rating" and "player rating":
There is so much misunderstanding in this thread of the concept behind ELO and the current system's implementation of it. Urpee's proposed system (where you average either the other team's ratings or all the players' ratings and then compare that against an individual player) is plain wrong. Don't try to "argue" for the system, because it is simply mathematically wrong.

Maybe, realistically, it's tempting to think that the uncertainty involved makes the two in no way comparable.
But from a mathematical point of view, if these two variables are proven* to follow a normal distribution (with parameters mu and sigma), one of the distribution's most important properties is that the sum or difference of two normal variables is again a normal variable, with parameters mu = mu1 - mu2 and sigma = sqrt(sigma1^2 + sigma2^2).

There are millions of examples backing up this little mathematical property.
To calculate the weight of a 6-pack:
each bottle's weight is normal with (mu = 500 grams, sigma = 70), and the cardboard's with (mu = 100, sigma = 20).
The total weight is bottle1 + bottle2 + ... + bottle6 + cardboard:

mu.weight = 500*6 + 100 = 3200
sigma.weight = sqrt(6*70^2 + 20^2) = 173 (the variances add, not the sigmas)
We could state with 90% confidence that the minimum weight of a six-pack is 3200 - 1.28*173 = 2979 grams, i.e. about 3.0 kilograms.
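
A minimal simulation to double-check those numbers (standalone Python, assuming numpy is available):

    import numpy as np

    rng = np.random.default_rng(0)
    bottles = rng.normal(500, 70, size=(100_000, 6)).sum(axis=1)
    total = bottles + rng.normal(100, 20, size=100_000)
    print(total.mean(), total.std())  # ~3200, ~173
    print(np.percentile(total, 10))   # ~2979: the 90%-confidence minimum weight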

There are a lot of practical applications; most are in production control (checking that you keep your defective-production ratio below a certain proportion, like 0.05).

See? You can mix the same kind of measurement from 2 different types of objects.
In the ELO model, the mixed variable is rating; the objects are "team" and "player".
If someone doesn't know this, how can I expect to have a reasonable discussion about which ELO model is best?

I know this sounds really cocky, it's not my intention (maybe I suck at expressing myself in English), but it's my point of view on the matter.



Keeping the focus on Ladder Season 2: I've provided you with some examples, nobo, and I'm still willing to cooperate with you on the project. :)


*: There is a theorem known as the Central Limit Theorem, which states that a sum of variables of ANY type (they could follow any distribution: Gamma, Beta, Pareto, Weibull, maximum Gumbel, minimum Gumbel, exponential, and a long etcetera), once more than ~30 of them are added (or when the coefficient of variation sigma/mu is < 0.2), can be approximated by a normal distribution.

EDIT:

Now, one of ladder's primary problems is that sometimes, people take too many games to converge to their correct rating. We can solve this by replacing the "50" in the equation with a variable, K, which represents the uncertainty of one's rating. Notice how changing the 50 to any other value will not change the expected value of the equation at all.
This is not technically correct: K doesn't represent the uncertainty of a rating.
K can be any number you want (you couldn't pick the uncertainty arbitrarily, now, could you?).
K is a factor that sets the scale on which points are assigned based on the probability of winning or losing. If you want big trades of points you assign a high K value; if you want small transactions of points you use a low K value.
What's the difference? A high K value is desirable for getting a first approximation of someone's rating (everyone starts at the zero-sum value, 1500 for Altitude; in chess it is 1000).
But a high K value is too sensitive: it will change your rating drastically in a few games. So to widen the spread among the high ratings (2400 and above), you want a low K value. That way it's easier to follow an expert's skill improvement, and the influence of a statistical anomaly is reduced.

What measures uncertainty, then?
In the E formula:

E = 1/{1 + 10^[(opp rating - my rating)/400]}

that 400 carries the uncertainty information: it represents the 200 points to the right and the 200 points to the left of the mean of the normal distribution.
In ELO, a 200-point rating difference is taken to mean the higher-rated player has roughly a 0.75 probability of winning.

new rating = old rating + K * [S - E]

And here is K: you can see that a higher K means more points added to, or drawn from, your rating.

Tekn0
02-03-2011, 08:32 AM
just a disclaimer i have no idea about any of this math or if it's even possible but..

i would really like to see plane choices implemented into the team balancing formula somehow. like, player X plays plane A 80% of the time, and the formula could account for that somehow. would actually make for more fun and balanced games than having 5 mirandas on one team just because their elo ratings fit together perfectly

I cannot agree with this more.

In fact, why not group planes into classes? Say player X plays Loopy 50% and Miranda 50% of the time: he is considered an offense player. Player Y plays bomber, whale, or biplane: he can be considered defense/support. Some players will play all planes in roughly equal proportions, and they can simply be placed on either team based on their rating.

Then balance the teams to have 2 or 3 offense players each. Of course nothing can be done if all 12 players play offense, but that usually doesn't happen.

Yes, I know the "people should play all planes, or one heavy and one light plane" argument, but in reality not everyone is equally good at all planes.

Point is, the autobalance algorithm can take these statistics into account too.

Stormich
02-03-2011, 08:37 AM
I cannot agree with this more.

In fact, why not group planes into classes? Say player X plays Loopy 50% and Miranda 50% of the time: he is considered an offense player. Player Y plays bomber, whale, or biplane: he can be considered defense/support. Some players will play all planes in roughly equal proportions, and they can simply be placed on either team based on their rating.

Then balance the teams to have 2 or 3 offense players each. Of course nothing can be done if all 12 players play offense, but that usually doesn't happen.

Yes, I know the "people should play all planes, or one heavy and one light plane" argument, but in reality not everyone is equally good at all planes.

Point is, the autobalance algorithm can take these statistics into account too.


This isn't as simple as it sounds; plus, your solution only works on ball.

Tekn0
02-03-2011, 09:34 AM
This isn't as simple as it sounds; plus, your solution only works on ball.

If you have the stats for it, it shouldn't be hard to implement. And yes, only ball; is it not possible to take this "additional" criterion into account when the game mode is ball?

Again, we classify a player as offense only if they play loopy, miranda, or biplane more than 85% of the time, and as defense if they play whale or bomber 85% of the time. Everyone else goes into a "Whatever" class, and we need not bother with what plane they will play.

Also, since Random was ruled out in Ladder, I think this system makes sense.


Extra checks during the autobalance algorithm for game mode ball:

1. Balance teams using the existing rankings etc.

2. Count the number of offense players assigned to the Left team as cL.

3. Count the number of offense players assigned to the Right team as cR.

4. If cL == cR, do nothing; exit.

5. If cR - cL == 3 (otherwise go to step 6): the Right team has 3 more offense players than the Left. Take the average ranking of the Right team's offense players and move the offense player closest to that average to the Left team. Then take the defense player whose ranking is closest to the chosen offense player's and assign him to the Right team (essentially swapping rating-matched offense and defense players). If we don't have enough defense players we can use the "Whatever" class.

6. Repeat step 5 for cR - cL == 4 and 6, moving 2 and 3 players respectively.

7. Do the same as steps 5 and 6 with cL and cR reversed.

I just wrote this algorithm off hand without too much thought; feel free to discuss. I'm sure it can be tweaked a lot, but I don't have time at the moment to put more thought into it.

Just a very, very rough draft.
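
For discussion's sake, here is a minimal sketch of that swap heuristic (standalone Python; all names, thresholds, and data shapes are my own assumptions, not an actual ladder API):

    OFFENSE_PLANES = {"loopy", "miranda", "biplane"}
    DEFENSE_PLANES = {"whale", "bomber"}

    def classify(plane_usage):
        # plane_usage: dict mapping plane -> fraction of games played with it
        offense = sum(f for p, f in plane_usage.items() if p in OFFENSE_PLANES)
        defense = sum(f for p, f in plane_usage.items() if p in DEFENSE_PLANES)
        if offense > 0.85:
            return "offense"
        if defense > 0.85:
            return "defense"
        return "whatever"

    def rebalance(left, right):
        # left/right: lists of (rating, role) pairs from the normal autobalancer
        cl = sum(1 for _, role in left if role == "offense")
        cr = sum(1 for _, role in right if role == "offense")
        moves = {3: 1, 4: 2, 6: 3}.get(abs(cr - cl), 0)  # steps 5-7 above
        src, dst = (right, left) if cr > cl else (left, right)
        for _ in range(moves):
            offense = [p for p in src if p[1] == "offense"]
            # prefer a rating-matched defense player; fall back to "whatever"
            pool = [p for p in dst if p[1] != "offense"]
            if not offense or not pool:
                break  # nothing sensible to swap
            avg = sum(r for r, _ in offense) / len(offense)
            pick = min(offense, key=lambda p: abs(p[0] - avg))  # offense player nearest the average
            swap = min(pool, key=lambda p: (p[1] != "defense", abs(p[0] - pick[0])))
            src.remove(pick); dst.append(pick)
            dst.remove(swap); src.append(swap)
        return left, right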

VipMattMan
02-03-2011, 12:56 PM
Not at all. This is the myth that seems to be encoded in this discussion.

Let me give you a toy ladder. 12 people compete and we seed them randomly with these scores: 4 have 3000, 4 have 1500, 4 have 0. But they actually are all equally good.

We play and in fact it turns out that everybody wins 50%. The system will allow this and given people's actual skill they maintain their 50% ratio.

This is our current system. Is this working? You truly want to claim that people's scores are well reflected and converge properly?

It's a myth that someone playing at an overall 50% win ratio is properly ranked. The system encourages locking them into place wherever they are, no matter their actual correct ranking.

If the system worked, we would see everybody converge to 1500, and this convergence would be reasonably fast. Currently there is no such mechanism, because the player scored 0 competing in an average-1500 game gets no benefit over a player scored 3000 in a 1500 game. That is the convergence mechanism that would be needed to fix this example, but it's nowhere to be found.

And there is this myth that just because I have a 1500 now and a 50% win ratio that it's swell. In fact it may just be that me being misranked gets matched against another player who is misranked and we end up at 50% win and don't move. Only if the system converged properly would it be fair to assume that that other player I'm balanced against is actually about at the right spot. But it's blatantly obvious this isn't the case. You will find people above 1500 who aren't all that good. And you will find people below 1500 who are quite good. The reason for this is simple: The evaluation has gone awry and once you are at a wrong spot the system has insufficient correctives.

The semi-random nature of the autobalancer, and the fact that different people play at different times, creates that convergence. Your perfect-world ladder, where there would be no change in ratings among players of equal skill, is entirely predicated on EVERYONE in ladder having the exact same skill level. Enter people of differing skill levels, and all of a sudden there will be much more rating change.

The current system actually works pretty damn effectively for determining influence in a game. We've seen it time and time again. People who aren't appropriately rated essentially have control of ladder (which is the real fault of ladder, and is what I'm pretty sure nobo is mostly concerned about). That includes people who hold ratings well above your 1400 rating, which you say is inaccurate.

Ball'n's smurf account instantly shot up the rankings and then hovered around rank 50 in ladder as he played different planes. When he didn't care and decided to play other planes, his team lost. When he did care and played miranda, his team won. People got tired of this and complained. He got irritated by those complaints and decided to play solely randa. Within a day he had climbed into the 10-20 range.

Another instance distinct in my mind was Goose apparently letting someone else play his account for a couple of days while he was ranked around 20. The person on his account wasn't playing at the same skill level and lost something like 27 out of 30 games. When Goose subsequently started playing normally, he won 19 out of 23 games. His rating trended very quickly right back to where it had been. He had full control of those games with a rating well above what yours currently is.

Goose's page from those days - look towards the bottom:
http://64.191.124.60/matchlist.php?id=c347778f-11f7-42f6-8a34-e9c3c47adb7c&mode=ball_6v6&grf=200&sort=played_d

You aren't being held down by ladder's flaws or by your history. Your rating is nearly what it would be if you had just entered ladder for the first time. Most people who enter ladder for the first time at your rating either win or lose lots of games as their rating adjusts to their actual skill level. If a lowly rated player has a sudden increase in skill at any point, they'll begin winning a higher percentage of their games until they reach their appropriate rating.

CCN
02-03-2011, 01:26 PM
without restricting plane setups that problem is hard to fix.

Urpee
02-03-2011, 01:49 PM
Matt, no offense but what happens a lot here is that anecdotes are taken to be more valuable than actual math.

I have given an actual proof that the ladder does not converge under certain conditions and that this is a property of the ladder's design. That neither implies nor means to imply that people cannot rise or drop. What it does prove, without any shadow of a doubt, is that someone playing at 50% is no sign at all that they are at a rating that reflects their performance.

It doesn't really matter if one can give exceptional cases where people rise drastically. What does matter is that this is what the model actually does.

In any model proposed balln and goose would perform well, wins would still be scored positively, and losses would be scored negatively, so I'm not sure what exactly you are trying to say except repeat the disproven mantra that the scores converge properly.

But yes, I should never have mentioned my own case, because it allows people to make it about me. It isn't.

Look if you like the system as is, that's fine. Here is what I have already said about it:

"I don't want to give the impression that I think that the current system is awful. I don't think it is. I think it's pretty good."

It's kind of hard to try to have nuanced discussion when people climb on barricades the moment one tries to point out properties of what is there.

A Nipple
02-03-2011, 02:09 PM
but uhh, nipple is like the second best bomber on that list sooo obviously something is working

well, I did crap the first few hundred games and learned most planes from scratch in a lot more games, as it was the best environment to practise them. I'll get my number one bomber back for next APL hopefully, if I'm not too busy =]

VipMattMan
02-03-2011, 02:24 PM
Matt, no offense but what happens a lot here is that anecdotes are taken to be more valuable than actual math.

I have given an actual proof that the ladder does not converge under certain conditions and that this is a property of the ladder's design. That neither implies nor means to imply that people cannot rise or drop. What it does prove, without any shadow of a doubt, is that someone playing at 50% is no sign at all that they are at a rating that reflects their performance.

It doesn't really matter if one can give exceptional cases where people rise drastically. What does matter is that this is what the model actually does.

In any model proposed balln and goose would perform well, wins would still be scored positively, and losses would be scored negatively, so I'm not sure what exactly you are trying to say except repeat the disproven mantra that the scores converge properly.

But yes, I should never have mentioned my own case, because it allows people to make it about me. It isn't.

Look if you like the system as is, that's fine. Here is what I have already said about it:

"I don't want to give the impression that I think that the current system is awful. I don't think it is. I think it's pretty good."

It's kind of hard to try to have nuanced discussion when people climb on barricades the moment one tries to point out properties of what is there.

Under very specific conditions the convergence isn't there. Due to random dynamics, it's likely that over time the issues of potential non-convergence work themselves out. I don't think you've proven that the convergence doesn't exist in ladder.

The 50% rule is more of an issue of likelihood. It's likely that if you've played a large number of games and you're only winning about 50% of your games, then your skill and rating have probably settled.

Obviously, if you win 50% of your games for months at a 1500 rating and then your skill improves, you're going to increase your rating. Depending on your number of games, your overall win percentage may only increase very, very slightly during that time. At some point you're going to hit a rating wall dependent on your skill, and then you're back to winning approximately 50% of your games.

Whether the 50% rule "exists" goes back to your belief that ladder has a non-convergent behavior and that ratings are inherently inaccurate. I suppose we could go in circles all day about that.

Urpee
02-03-2011, 02:39 PM
Well I have never claimed that convergence doesn't exist. All I have done is give an actual proof that people's claim that they converge to their actual rating isn't trivially true.

It's not merely my belief that ladder has non-convergent behavior; it's demonstrated that it does. That's exactly what I have shown. There is a rather massive difference between the two claims.

To be precise, there are two equilibria to be had. One is an actual rating ceiling, as you call it. The other, and this I have shown, is an actually balanced team. Now, the system encourages balanced teams, so I won't just accept that what I describe is some sort of rare singular case easily disrupted by randomness. You'd have to show that.

You've given a hand-wavy argument that randomness will disturb those cases. I don't actually disagree with the rough notion. But it is plainly not correct to say that someone performing at 50% for a while is at their true rating, and this I have shown.

If you want to give an argument that under a certain kind of likely random disturbance of team balance this is reliably broken, I'd be interested in seeing that proof. But I won't just accept it because you say so or because you have some fitting exceptional anecdotes.

blln4lyf
02-03-2011, 02:49 PM
Matt, no offense but what happens a lot here is that anecdotes are taken to be more valuable than actual math.

I have given an actual proof that the ladder does not converge under certain conditions and that this is a property of the design of the ladder. That neither implies nor means to imply that people cannot rise or drop. What it does prove, without any shadow of a doubt, is that people playing at a 50% win rate is no sign at all that they are at a rating that reflects their performance.

It doesn't really matter if one can give exceptional cases where people rise drastically. What does matter is that this is what the model actually does.

In any model proposed balln and goose would perform well, wins would still be scored positively, and losses would be scored negatively, so I'm not sure what exactly you are trying to say except repeat the disproven mantra that the scores converge properly.

But yes, I should never have mentioned my own case, because it allows people to make it about me. It isn't.

Look if you like the system as is, that's fine. Here is what I have already said about it:

"I don't want to give the impression that I think that the current system is awful. I don't think it is. I think it's pretty good."

It's kind of hard to have a nuanced discussion when people climb on barricades the moment one tries to point out properties of what is there.

What you're saying would never happen though...at least not anywhere near the extent to which you think it would. Seriously dude, I'm not trying to make this about you, nor is matt, or andy. We strongly disagree with you though, and it seems neither of us will budge, but please get off your high horse, it's getting old as schit. As for converging, that is why nobo wants to introduce K, as a high K value when you first join ladder would help you reach your rating, along with it increasing during streaks, etc.

Urpee
02-03-2011, 03:08 PM
I thought the intent of the thread was math discussion. A few posters have now tried to contribute to the math modeling; I think this is what it should be.

Frankly it matters if people like the system and believe strongly in it, but I really am not arguing this here.

This isn't a matter of budging. If there are mistakes or ideas that can be demonstrated to be wrong they can be corrected.

But I don't have to have the discussion. If the devs would rather select their input, that's fine. Perhaps this never should have been a thread debate to begin with. It's bound to mix interest with analysis.

VipMattMan
02-03-2011, 03:08 PM
Well I have never claimed that convergence doesn't exist. All I have done is give an actual proof that people's claim that they converge to their actual rating isn't trivially true.

It's not my belief that ladder has non-convergent behavior - it's obvious that it does; that's exactly what I have shown. There is a rather massive difference between the two claims.

To be precise there are two equilibria to be had. One is an actual rating ceiling, as you call it. The other, and this I have shown, is an actually balanced team. Now the system encourages balanced teams, so I won't just accept that what I describe is some sort of rare singular case that is easily disrupted by randomness. You'd have to show that.

You've given a hand-wavy argument that randomness will disturb those cases. I don't actually disagree with the rough notion. But it is just plainly not correct to say that if you see someone performing at 50% for a while they are at their true rating, and this I have shown.

If you want to give an argument that under a certain kind of likely random disturbance of team balance this is reliably broken, I'd be interested in seeing that proof. But I won't just accept it because you say so or because you have some fitting exceptional anecdotes.

The randomness = different teams, the fact that literally NO ONE has exactly the same skill set/response methodology as any other person, and the fact that no one is ever 100% accurately rated.

It seems as if you're suggesting that rating has literally no skill dynamic whatsoever. If you're suggesting that despite personal experience and despite seeing the algorithms used, I don't know what to say.

All I can tell you is that the game sure does get a lot easier any time my rating dips down too much, and a lot harder when it goes too high. I'm sure that 99% of other ladder players have the same experience. This difference in experience directly correlates to my changing rating. As I've had this experience over a long period of time, I've bounced around the same rating range, and the more games I play the closer my win % comes to 50%.

elxir
02-03-2011, 05:21 PM
i play time anchor like most of the time even though it's like my fifth best setup

so that would skew things

Urpee
02-03-2011, 05:34 PM
It seems as if you're suggesting that rating has literally no skill dynamic whatsoever.

I'm saying no such thing at all.

shrode
02-03-2011, 05:58 PM
What planes people play should not be factored into ladder (as far as the code goes, or limiting the # of planes). One thing that helps people rise is the ability to play multiple planes and do whatever is needed to help the team win. In ball, I'm a fairly good thermo and do it when my team needs me to. Having that ability has helped me win multiple games that I would have lost. This should continue to be rewarded in the next ladder system.

Rainmaker
02-03-2011, 09:22 PM
I cannot agree with this more.

In fact, why not group players into roles? Say player X plays Loopy 50% & Miranda 50% of the time, he is considered an offence player; if player Y plays bomber, whale, or biplane, he can be considered defense/support. Some players will play more or less the same % of all planes, but then they can be placed on either team based on their rating.

Then balance teams to have 2 or 3 offense players. Of course nothing can be done if all 12 players play offense, but clearly that usually does not happen.

Yes, I know the "people should play all planes or one heavy, one light plane" argument, but in reality not everyone is equally good at all planes.

Point is, the auto balance algorithm can take these statistics into account too.


This isn't as simple as it sounds, plus your solution only works on ball.
Stormich is right.
You can't use previous statistics of plane use to predict a player's choice of plane. I mean, you actually could do it; the problem is the effectiveness.
The likelihood is small, because the pick in ladder is situational. If there are already 2 explodets, someone is not likely to pick a 3rd explodet, but a more agile plane.
I think my suggestion would work better in this case, if you want to start "forcing" players to play real "plane formations".
But as I said, it's only a suggestion; you might think that you want weird setups like a 5-explodet team to be available.

As for converging, that is why nobo wants to introduce K, as a high K value when you first join ladder would help you reach your rating, along with it increasing during streaks, etc.
I've seen that no one is making specific replies to my points; either you are passing over them because you think I'm right, or because you are ignoring them.
Either way, I've already proven this point before.
THE CURRENT SYSTEM ISN'T A CONVERGING ONE.
The current system works pretty much like any MMORPG ranking system, rating the players based on who stacks the biggest number of net wins (= total wins - total losses).
The claim that you win 50% when you are at your rating is pure speculation, and it in no way reflects real ratings.

On the K value: at first, rather than converging, the rating is deliberately divergent.
The problem with the original ELO is that everyone starts at 1500, and to make an educated guess you need a good number of games (nearly 100 is good for a 1st approximation).
So, an easy way (but with HIGH uncertainty) is to use a high K, so the rating varies a lot; afterwards you gradually make K tend to a value that is really sensitive to the scale. This way you keep polishing until you reach a steady rating. This works because the player will have good odds against people below his rating but poor odds against people above his rating, keeping him always within +/-50 of his rating (if his real rating is 1764, his current could be 1730 +/-50).
The smaller the K, the more accurate the guess, but the more games you need to get that low-uncertainty guess.
The higher the K, the less accurate the guess (the value could be off by +/-100 rating) but the faster it is achieved.

chess example:
A new player is given the rating of 1000.
Through his first games he will have a high K value, K=32
After he reaches the 2100 rating, he is given K=24
After he reaches the 2400 rating, he is given K=16

This helps have a "zoom" effect on high ranking values, making the spectrum broader. In the high rankings, a difference of 5 points represents a much greater skill gap than a difference of 5 points at a 1000 rating.
This is because of K, but also because the skill is measured with a distribution that isn't linear; nor is the comparison used (E formula).

Mind you, the K value can be chosen arbitrarily. Usually it depends on how accurate you want to be with the rating system, and how many games are played.
Chess, is a kind of sport that the player can have a lot of matches through his career so you are able to pick lower values for K.
Also, a smaller K is less "sensitive" to player improvements, like new tactics (Altitude: new plane setups), new technology (Altitude: nerfs or buffs), etc.
Golf is a sport that doesn't have THAT many "matches" (courses) compared to chess, hence, you have to pick a higher value.
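To make the K mechanics concrete, the whole update fits in a few lines. A sketch; the function names are mine and the thresholds are just the chess ones from above, not proposed ladder values:

<?php
// Sketch of a tiered-K Elo update. Thresholds follow the chess example above;
// they are illustrative, not proposed ladder values.
function expectedScore($ratingA, $ratingB) {
    // E = 1 / (1 + 10^((Rb - Ra)/400))
    return 1 / (1 + pow(10, ($ratingB - $ratingA) / 400));
}

function kFactor($rating) {
    if ($rating >= 2400) return 16; // established: small, precise steps
    if ($rating >= 2100) return 24;
    return 32;                      // low/unknown rating: move fast
}

function updateRating($rating, $opponentRating, $score) {
    // $score is 1 for a win, 0.5 for a draw, 0 for a loss
    $e = expectedScore($rating, $opponentRating);
    return $rating + kFactor($rating) * ($score - $e);
}

// e.g. a 1500 player beating a 1764 player:
echo updateRating(1500, 1764, 1); // 1500 + 32 * (1 - 0.18) = about 1526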

VipMattMan
02-04-2011, 02:40 AM
I've seen that no one is making specific replies to my points; either you are passing over them because you think I'm right, or because you are ignoring them.
Either way, I've already proven this point before.
THE CURRENT SYSTEM ISN'T A CONVERGING ONE.
The current system works pretty much like any MMORPG ranking system, rating the players based on who stacks the biggest number of net wins (= total wins - total losses).
The claim that you win 50% when you are at your rating is pure speculation, and it in no way reflects real ratings.

Here:
You have a 4 player game. 2 people per team. 2 veteran players have 3,000 rating each. 2 new players have just entered ladder, each with 1500 rating. It turns out that one of these players actually has the skill of a 3,000 rating player, and the other does not. Every game gives or costs you 25 points to your overall rating.

We'll say that to equalize team rating ladder assigns one of these 1500 rating players to each 3k player 100% of the time, but switches which player gets which every other game.

Over time the 1500 player that had a 3k skill level trends towards a 3,000 rating while the other actual 3,000 level players maintain their 50% win rate. During this time the "bad" 1500 rating player trends downwards.

Eventually the "good" 1500 rating player achieves a 3,000 skill rating and can now join the ranks of the other 3k rated players and potentially have the "bad" player assigned to him.

Now - this is where your argument comes in. Say that over time the "bad" player gained enough skill to be as skilled as one of the 3,000 rating players. But everyone's winning 50% of their games at this point. That means that convergence is impossible from a mathematical standpoint and that the previously bad player can NEVER attain a 3,000 rating, despite the fact that he's as skilled as the other players.

This is how you view ladder in its current form, and it makes "sense" from a purely mathematical standpoint. But it's wrong from a reality standpoint.

Throw in 6 player teams, an endlessly changing series of people, players of differing skill levels, some variance of slightly inaccurate rating, and a multitude of other variables, and that previously bad player will have full opportunity to achieve a higher rating.

That human variable is where the convergence in ladder exists, and it appears to be very effective. It may never be something you can quite boil down to "two players with the exact same skill level".
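For the curious, that toy scenario is simple enough to simulate. A quick sketch - the assumption that win chances follow the Elo expectation applied to average TRUE skill is mine, and the numbers are made up:

<?php
// Toy simulation of the 2v2 scenario above. Win chances follow the Elo
// expectation applied to average TRUE skill (an assumption for illustration).
mt_srand(42);
$true   = array(3000, 3000, 3000, 1200); // real skill: two vets, a good newbie (2), a weak one (3)
$rating = array(3000, 3000, 1500, 1500); // seeded ladder ratings

$game = 0;
while ($rating[2] < 3000 && $game < 10000) {
    // the balancer pairs each vet with one newbie, swapping every other game
    $teamA = ($game % 2 == 0) ? array(0, 2) : array(0, 3);
    $teamB = ($game % 2 == 0) ? array(1, 3) : array(1, 2);

    $skillA = ($true[$teamA[0]] + $true[$teamA[1]]) / 2;
    $skillB = ($true[$teamB[0]] + $true[$teamB[1]]) / 2;
    $pA = 1 / (1 + pow(10, ($skillB - $skillA) / 400));

    $aWins = (mt_rand() / mt_getrandmax()) < $pA;
    foreach ($teamA as $p) $rating[$p] += $aWins ? 25 : -25; // flat +/-25 as in the example
    foreach ($teamB as $p) $rating[$p] += $aWins ? -25 : 25;
    $game++;
}
echo "good newbie hit {$rating[2]} after $game games\n";
print_r($rating); // the good newbie climbs to 3000; the weak one tanks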

Urpee
02-04-2011, 04:51 AM
Here:
That human variable is where the convergence in ladder exists, and it appears to be very effective.

The idea of math is not to rely on appearance. I think we all can hand-wave a lot. I don't think it will ever settle this.

I won a 20 minute 1900 rated game 6-5 today scoring 3 goals including the game winner. I'm at a 1400 rating now. I think it works just fine obviously.

Because clearly I cannot compete at that level and my score should drop from 1500 to 1400.

In fact the randomness you cite does not help convergence. Yesterday a new player entered and dropped their first 5 games and won their sixth before leaving. I had the bad luck of being paired with him 4 times; I won one (his last one) and lost 3. Clearly it's my skill not offsetting a clearly misranked player that is the problem here, and not that random factors will actually DROP you rather than help you converge to your score. You yourself cited a similar phenomenon with underrated top players taking over games. Same with those drastically overrated players. They will make you tank. Now you are not in the range where the balancer will pair you with these kinds of folks, so you assume that I just got a fair shake.


Look, I agree that the chance of a 3000 player being matched with a misranked 3000 player is vanishingly low. But the chance of a 1500 player being matched with a misranked 1500 is high, and this randomness is not clearly in your favor.

And yes I had many 6-5 games today. The autobalancer worked great. But what this really means is that there is a tight margin and no easy wins.

Frankly it seems to me that scores in the range of roughly 1200-1800 are very fuzzy and other factors play a role. It isn't clear at all that factors will efficiently converge people to their true ranking in that range.

Here is another way to think about it. There are these 1900 games. Everybody below 1900 is expected to lose more than win in that environment, especially if your argument that convergence works is true. So how does that gel with the claim that people will end up at their skill ceiling, as you put it? Above 1900 the pressure is in the right direction, but below 1900 the pressure is in the wrong direction. You are likely to drop more games than you win, but that by definition means that your score is below 1500, not below 1900!

That's roughly why things in that range are fuzzy to say the least. The system does not try to converge people to the mean (which now is closer to 1900 than 1500) etc etc.

Not just skill but a range of other factors and the overall competitiveness of games all contribute here.

But yes, that's me hand-waving. I'd argue it's actually less hand-wavy than what you offer, but it's still hand-waving. So who is right? I think we both are. Sure balln can take over games in which he is drastically underrated, because he will get good team-mates that make his life easy and he will be a force to reckon with. But for people who are not top 25 material, I think you are just blind to what the dynamics are. Truth be told, virtually any system that sensibly correlates wins with rating will work for the very top end, so sure, one can keep this one.

VipMattMan
02-04-2011, 05:25 AM
In fact the randomness you cite does not help convergence. Yesterday a new player entered and dropped their first 5 games and won their sixth before leaving. I had the bad luck of being paired with him 4 times; I won one (his last one) and lost 3. Clearly it's my skill not offsetting a clearly misranked player that is the problem here, and not that random factors will actually DROP you rather than help you converge to your score. You yourself cited a similar phenomenon with underrated top players taking over games. Same with those drastically overrated players. They will make you tank. Now you are not in the range where the balancer will pair you with these kinds of folks, so you assume that I just got a fair shake.

I said in my first post that inappropriately ranked players were the main issue of ladder. That includes new players who may repeatedly get attached to you, causing you to unfairly drop (or rise). Highly rated players get stuck with those players too. I usually end up quitting ladder for the day to avoid that circumstance when I see it happening.

The fact that HE lost the majority of his games the second he entered ladder is an indication that convergence is in fact at play. The flaw is that he by sheer bad luck may be repeatedly assigned to your team while that convergence occurs.

How to fix that issue has pretty much been the golden question of ladder, and really the direction I think nobo wanted this thread to go in.

mikesol
02-04-2011, 05:44 AM
The idea of math is not to rely on appearance. I think we all can hand-wave a lot. I don't think it will ever settle this.

I won a 20 minute 1900 rated game 6-5 today scoring 3 goals including the game winner. I'm at a 1400 rating now. I think it works just fine obviously.

Your point? Not to be harsh but I watched you play today and you do not deserve to be rated 1900. For fun I made a smurf account and played a bunch of games. I'm already up to nearly 1900 with about an 80% win percentage in two days. Does that mean our system is working? NO. You are using a logical fallacy. You are making a hasty generalization. You are trying to rationalize why your score is so bad without looking at the bigger picture. Scoring 3 goals could mean you flew towards the goal and had someone pass it to you. It does not mean you played at the highest caliber. Your team-mates could have carried you throughout your whole game. You could have been playing better than normal. The other team had players who were at your skill level, too. They could have been having an off day.

Because clearly I cannot compete at that level and my score should drop from 1500 to 1400.

In fact the randomness you cite does not help convergence. Yesterday a new player entered and dropped their first 5 games and won their sixth before leaving. I had the bad luck of being paired with him 4 times; I won one (his last one) and lost 3. Clearly it's my skill not offsetting a clearly misranked player that is the problem here, and not that random factors will actually DROP you rather than help you converge to your score. You yourself cited a similar phenomenon with underrated top players taking over games. Same with those drastically overrated players. They will make you tank. Now you are not in the range where the balancer will pair you with these kinds of folks, so you assume that I just got a fair shake.

This is indeed a flaw. Newer players will shake things up a bit. Your skill can help - but ultimately you're screwed in this scenario. Hence why we are trying to figure out a way to make new players get to their rating faster. Regardless, this is still an infinite conditionally convergent series (albeit one that takes a long time to converge). The proof for this sucks but you can read about convergent series here. (http://en.wikipedia.org/wiki/Convergent_series)


Look, I agree that the chance of a 3000 player being matched with a misranked 3000 player is vanishingly low. But the chance of a 1500 player being matched with a misranked 1500 is high, and this randomness is not clearly in your favor.

Explain to me why your chance of being matched with a misranked 1500 is high please. From my experience watching ladder (which I do all the time) - there are not a ridiculous number of misranked 1500 players. If there are - it's because they've only played a handful of games. Take [g6]prince - his rating is probably around 900 - but since he's only played like 10 games (and lost most of them) - he's still up around the 1300s.

And yes I had many 6-5 games today. The autobalancer worked great. But what this really means is that there is a tight margin and no easy wins.

Ok - so you're saying our auto-balance is working? This doesn't do anything to back up your point =X

Frankly it seems to me that scores in the range of roughly 1200-1800 are very fuzzy and other factors play a role. It isn't clear at all that factors will efficiently converge people to their true ranking in that range.

People start off at 1500. It takes them awhile to eventually reach their actual level. Once again I feel like you're just making stuff up to fit your point.

Here is another way to think about it. There are these 1900 games. Everybody below 1900 is expected to lose more than win in that environment, especially if your argument that convergence works is true. So how does that gel with the claim that people will end up at their skill ceiling, as you put it? Above 1900 the pressure is in the right direction, but below 1900 the pressure is in the wrong direction. You are likely to drop more games than you win, but that by definition means that your score is below 1500, not below 1900!

Wrong. People below 1900 are not expected to lose more than they win in that environment BECAUSE they are paired with high level players who can carry them. If I had to do a 2v2 with you against other people - you would have the same chance of winning as me. It is really quite basic statistics. Everyone on a team has the same probability of winning during a specific game. Think of it like this: let's say I have a team of 5 people and you have a team of 5 people. I'm going to flip a coin and if it's heads my team wins and if it's tails your team wins. Everyone on my team has the exact same chance of winning despite the fact that I'm doing the action.

If that's not a good enough example think of a track team. Let's say there are 5 people per team. Some are super awesome and some suck. The super awesome person might play super awesome but he does not have a higher probability of winning than his team-mates because they are all on a team. It is their combined total score that counts. In Altitude - it is the combined play of everything that counts.


That's roughly why things in that range are fuzzy to say the least. The system does not try to converge people to the mean (which now is closer to 1900 than 1500) etc etc.
What? You do realize that the mean is where the average converges, right? Not everyone should be at the mean because not everyone is equal to the mean. If you are a below average player - you will be below the mean. If you are an above average player - you will be above the mean.


Not just skill but a range of other factors and the overall competitiveness of games all contribute here.

Quite true. Hence why we want to try and make this better! Many of the ideas presented are interesting and have given us some thoughts. Nonetheless - your constant claims of not converging or of different probabilities are mistaken.

But yes, that's me hand-waving. I'd argue it's actually less hand-wavy than what you offer, but it's still hand-waving. So who is right? I think we both are. Sure balln can take over games in which he is drastically underrated, because he will get good team-mates that make his life easy and he will be a force to reckon with. But for people who are not top 25 material, I think you are just blind to what the dynamics are. Truth be told, virtually any system that sensibly correlates wins with rating will work for the very top end, so sure, one can keep this one.

I'm sorry for sounding harsh but I've been watching this thread for awhile and it's really been bugging me.

Rainmaker
02-04-2011, 06:13 AM
Nobo:
Could you please post the code of the autobalance system?
I'm very interested in knowing how it works.

VipMattMan: I read your thread, but I'm not interested in analyzing unrealistic ad-hoc situations. There is no point to it.
If we want to test both models, the best way to find irregularities in the systems is to run both models at the same time. (For season 2, I'll ask nobo if he lets me do it.)
Moreover, I would only be able to make an educated case (and not a statement) on the facts, because I would need to know how teams are autobalanced. I know just 2 or 3 algorithms, and each works differently.
e.g.: 10 players are ranked from best to worst (according to rating)
TeamA:
1
10
3
8
6

TeamB:
2
9
4
7
5

6 and 5 could be switched if needed to keep ratings even.
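For reference, the simplest of those algorithms in code form (a plain snake draft; the ratings are made up and this is only illustrative):

<?php
// Plain snake draft: sort by rating, then deal players A,B,B,A,A,B,B,...
// Illustrative only; ratings are made up.
function snakeDraft($ratings) {
    rsort($ratings); // best to worst
    $teams = array(array(), array());
    foreach ($ratings as $i => $r) {
        $t = (int)(($i + 1) / 2) % 2; // pattern 0,1,1,0,0,1,1,0,...
        $teams[$t][] = $r;
    }
    return $teams;
}

print_r(snakeDraft(array(1900, 1820, 1760, 1700, 1650, 1600, 1550, 1500, 1450, 1400)));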

Urpee
02-04-2011, 07:07 AM
you do not deserve to be rated 1900

I never said that I deserve a 1900 rating. I do say that 1400 isn't correct.

For fun I made a smurf account and played a bunch of games. I'm already up to nearly 1900 with about an 80% win percentage in two days. Does that mean our system is working? NO.

Thank you. That is precisely what I have been arguing. That you climb ladder like you do does not mean the system is working! Precisely correct.

You are using a logical fallacy. You are making a hasty generalization.

Really? I didn't claim that balln's rising score or goose's rising score showed the system worked fine with respect to convergence. I.e. I didn't make the generalizations you lament.

I agree with you. We shouldn't generalize.

You are trying to rationalize why your score is so bad without looking at the bigger picture.

Not at all. I have given the math. But no one wants to argue math; rather, they are happy to make it about me.

Scoring 3 goals could mean you flew towards the goal and had someone pass it to you. It does not mean you played at the highest caliber.


Technically I agree with you. Do you think what you just said is also true for a tight game that goes on for 20 minutes, where clearly flying to the goal and scoring isn't all that is going on?

Note that I also don't claim I played at the highest caliber. What I do claim is that I was on the winning team of a competitive match and clearly contributed. Since you talk about generalizations, you understand that the converse is also true: all of that also means it's not clear you can claim I did NOT perform at the highest caliber. You can think of my performance what you may. Fact is that I won a 1900 average game that was very competitive. Now I cannot prove anything here, just like any other anecdote doesn't prove anything here. That was the point!

Your team-mates could have carried you throughout your whole game. You could have been playing better than normal. The other team had players who were at your skill level, too. They could have been having an off day.

Exactly! But at the same time I could be underrated. Who is to be the judge? I'm deliberately giving hand-waving arguments because that's what I have been given. I agree that it doesn't say anything.

We should be discussing math. Not handwaving.


Regardless, this is still an infinite convergent series (albeit one that takes a long time to converge).

How is that a better claim than my handwaving?

I have shown that any balanced matchup that truly scores 50% will not move the score of any player, regardless what the score is. To show convergence you have to show that the pressure on the score is sufficient and not overridden by other factors.
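To spell the arithmetic out: in an Elo-style update the expected change per game is p*K*(1 - E) + (1 - p)*K*(0 - E) = K*(p - E), where p is the player's actual win rate and E is the expected score. In a team scheme E comes from the team averages, so it is the same for everyone in the match; with balanced teams E = 0.5. Anyone genuinely winning 50% of the time therefore has an expected change of exactly 0, and nothing in that expression depends on the player's own seeded rating.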

Explain to me why your chance of being matched with a misranked 1500 is high please. From my experience watching ladder (which I do all the time) - there are not a ridiculous number of misranked 1500 players. If there are - it's because they've only played a handful of games. Take [g6]prince - his rating is probably around 900 - but since he's only played like 10 games (and lost most of them) - he's still up around the 1300s.

Sure. It is indeed hard to have a strong winning record, and not many people have 3000 scores, so the chance of a player there being a gross mismatch is minimal. However a player at 1500 has many factors to consider: (a) there are many people in that range; (b) their scores may all be transitory, not at convergence.

So clearly there is a higher risk of having the 1500 range mismatched than the 3000 range.

I have no clue what the prince example tries to illustrate.


Ok - so you're saying our auto-balance is working? This doesn't do anything to back up your point =X


Which point? The point that at a perfectly balanced field the ratings by definition cannot converge? No, the fact that the auto-balancer works well actually is part of what I'm saying causes the issue.

But clearly we hear cases where it is not working.

Look, if we assume that indeed there is a huge skill gradient between 1800 and 1200 then yes, the autobalancer would have a hard time creating balanced games. But in reality, just the fact that the mean of the game is at 1900 and the pool it picks from is small and in that case was virtually all veterans goes into it too, right?

Have you considered what kind of player variability we actually have on ladder with those games having a mean of 1900 sometimes exceeding 2000?

People start off at 1500. It takes them awhile to eventually reach their actual level. Once again I feel like you're just making stuff up to fit your point.

Can you articulate what you think my point even is? My point is that the balancing and the lack of using the team mean in a game means that theoretically there is no convergence. You claim that there is. I'm happy to see the proof or at least some stronger argument than what I have seen.

Wrong. People below 1900 are not expected to lose more than they win in that environment BECAUSE they are paired with high level players who can carry them.

I don't think you understood the context of the argument correctly. If there is convergence happening people gotta win/lose in accordance with their rating, else they wouldn't have it.

And once that convergence actually happens you get this paradox. The players above 1900 converge to their mean and those below converge to theirs. But there is no way to make that work, given that the means mismatch.

If I had to do a 2v2 with you against other people - you would have the same chance of winning as me. It is really quite basic statistics.

Yes, and in that case we'd have the same rating. It's a little less basic statistics if you mix those ratings while maintaining the mean scores (say 1900) and claim that you can get arbitrary convergence with 1500 reflecting a 50% win ratio.

Everyone on a team has the same probability of winning during a specific game.

Yes but not everybody was equally expected to win. A player with 1500 rating is expected to win 50% of games. A player with 1900 rating is expected to win more.

That too is basic statistics. While players who do team up on a specific game will have the same outcome, the expectation for the outcome was different, hence the adjustment of the expectation (at least in that model) should be different. Currently it's not.

And the consequence is exactly as I describe. People who play at 50% don't have their score adjusted, and it doesn't matter if they are at their actual score or not. Any score will show that behavior.

You can actually go into ladder and test this hypothesis. Check if 1600 players have a 50% chance to win in games with a mean rating of 1600, and what their win % is for 1900 rated games. You basically claim that we will see no difference at all. I'd be really curious if you are correct.

I basically tell you that you will observe that players above the mean team rating will win more than those below, and that this is not only likely but required for the math to work out. The only case where you won't observe this is when the team mean coincides with 1500.

Now I have given you a test. Go prove me wrong.
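If anyone with database access wants to run that test, the tally is simple. A sketch - the record layout (rows with 'player_rating', 'game_mean', 'won') is invented for illustration:

<?php
// Sketch of the proposed test: bucket each player-game by how far the player's
// rating sits from the game's mean rating, then tally win rates per bucket.
// The record fields are invented for illustration.
function winRateByGap($records) {
    $buckets = array();
    foreach ($records as $r) {
        // gap bucketed in 100-point steps, e.g. a 1600 player in a 1900 game -> -300
        $gap = 100 * (int)floor(($r['player_rating'] - $r['game_mean']) / 100);
        if (!isset($buckets[$gap])) $buckets[$gap] = array('games' => 0, 'wins' => 0);
        $buckets[$gap]['games']++;
        if ($r['won']) $buckets[$gap]['wins']++;
    }
    ksort($buckets);
    foreach ($buckets as $gap => $b) {
        printf("gap %+5d: %4d games, %5.1f%% won\n",
            $gap, $b['games'], 100 * $b['wins'] / $b['games']);
    }
}

My prediction is that the win column climbs with the gap; your claim, as I understand it, is that it stays flat at 50% everywhere.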

What? You do realize that the mean is where the average converges, right? Not everyone should be at the mean because not everyone is equal to the mean. If you are a below average player - you will be below the mean. If you are an above average player - you will be above the mean.

Clearly you didn't get the point I made. See above.

your constant claims of not converging or of different probabilities are mistaken.

Sorry. I still haven't heard an argument that shows this. Do you deny that scores do not move if people play at 50% no matter their actual seeded score?

I'm sorry for sounding harsh but I've been watching this thread for awhile and it's really been bugging me.

I have already offered that I really don't have to argue this. If the devs do not want this discussion, let me know and I'll do something else. I really just want to help, and I just don't think refusing to discuss the properties the system obviously has is a way to do it. But yes, it is hard to have this discussion, and it is obvious that I get a lot of pushback, except from IntoTheWalls who is roughly on the same page as me.

But as I said, I don't need to aggravate things at all. Just say the word and we won't be having this discussion at all.

nobodyhome
02-04-2011, 07:11 AM
ok:

class Autobalancer {
    static function balance($players_playing) {
        Log::log_helper("Balancing players...");
        for ($i = 0; $i < count($players_playing); $i++) {
            Log::log_helper("Player $i: " . $players_playing[$i]);
        }

        $playersPerTeam = 0;
        $mode = '';
        Log::log_helper("count players playing " . count($players_playing));
        if (count($players_playing) == 10) {
            $playersPerTeam = 5;
            $mode = "tbd";
        }
        else if (count($players_playing) == 12) {
            $playersPerTeam = 6;
            $mode = "ball";
        }
        else {
            Log::log_error("Invalid number of players sent to autobalancer.");
        }

        Log::log_helper("players per team: $playersPerTeam");

        //interleave players into the match array: slots alternate team1/team2
        $match = array();
        for ($i = 1; $i < $playersPerTeam + 1; $i++) {
            $match['team1_player' . $i] = $players_playing[2 * ($i - 1)]->vaporId;
            $match['team1_player' . $i . '_prev_rating'] = getPlayerRating($mode, $players_playing[2 * ($i - 1)]->vaporId);

            $match['team2_player' . $i] = $players_playing[(2 * ($i - 1)) + 1]->vaporId;
            $match['team2_player' . $i . '_prev_rating'] = getPlayerRating($mode, $players_playing[(2 * ($i - 1)) + 1]->vaporId);
        }

        print_r($match);

        $cAutobalance = new CAutobalance();
        $cAutobalance->Balance($match);

        print_r($cAutobalance->BalancedTeam1);
        print_r($cAutobalance->BalancedTeam2);

        $ret_balanced_teams = array();
        $ret_balanced_teams[0] = new Team();
        $ret_balanced_teams[1] = new Team();

        //map the balanced CPlayer_AB entries back onto the original player objects
        $balanced_team_1_players = $cAutobalance->BalancedTeam1->players;
        for ($i = 0; $i < count($balanced_team_1_players); $i++) {
            for ($j = 0; $j < count($players_playing); $j++) {
                if ($players_playing[$j]->vaporId == $balanced_team_1_players[$i]->vaporid) {
                    $ret_balanced_teams[0]->players[] = $players_playing[$j];
                    break;
                }
            }
        }
        $balanced_team_2_players = $cAutobalance->BalancedTeam2->players;
        for ($i = 0; $i < count($balanced_team_2_players); $i++) {
            for ($j = 0; $j < count($players_playing); $j++) {
                if ($players_playing[$j]->vaporId == $balanced_team_2_players[$i]->vaporid) {
                    $ret_balanced_teams[1]->players[] = $players_playing[$j];
                    break;
                }
            }
        }

        return $ret_balanced_teams;
    }
}
class CPlayer_AB {
    public $id;
    public $vaporid;
    public $elo;
    public $name;

    public function __construct($id, $vaporid = '', $elo = 0) {
        $this->id = $id;
        $this->vaporid = $vaporid;
        $this->elo = $elo;
    }
}
class CTeam_AB {
    public $players;
    public $elo;
    public $variance;

    public function __construct($players, $elo = 0, $variance = 0) {
        $this->players = $players;
        $this->elo = $elo;
        $this->variance = $variance;
    }
}

class CAutobalance {
    public $BalancedTeam1;
    public $BalancedTeam2;

    public function __construct() {
        /*
        ****ERY:
        totalplayers in JS contains the players in the order they are in current_players
        totalplayers in PHP contains the players in the order winner1,loser1,winner2,loser2 etc
        */
    }

    public function Validate(&$match) {
        $maxPlayers = 5;
        if (isset($match['team1_player6'])) $maxPlayers = 6;
        $maxPlayers++; //the loop bound below is exclusive
        $team1 = new CTeam_AB(array(), 0, 0);
        $team2 = new CTeam_AB(array(), 0, 0);
        $playerId = 0;

        for ($i = 1; $i < $maxPlayers; $i++) {
            $team1->players[] = new CPlayer_AB($playerId, $match['team1_player'.$i], $match['team1_player'.$i.'_prev_rating']);
            $playerId++;
            $team2->players[] = new CPlayer_AB($playerId, $match['team2_player'.$i], $match['team2_player'.$i.'_prev_rating']);
            $playerId++;
        }
        for ($i = 0; $i < count($team1->players); $i++) {
            if ($team1->players[$i]->vaporid == $this->BalancedTeam1->players[0]->vaporid) {
                //at this point we have determined that a player on the winning team was on
                //AutoBalance's BalancedTeam1. this means the rest of his team should be here
                $nAllies = 0;
                //if $nAllies reaches a full team by the end of these loops, we have found
                //his teammates on BalancedTeam1, and the match was autobalanced
                for ($j = 0; $j < count($team1->players); $j++) {
                    for ($k = 0; $k < count($this->BalancedTeam1->players); $k++) {
                        if ($team1->players[$j]->vaporid == $this->BalancedTeam1->players[$k]->vaporid) {
                            $nAllies++;
                            break;
                        }
                    }
                }
                //a full team is $maxPlayers - 1 allies ($maxPlayers was incremented above
                //for the loop bound, so comparing against it directly could never match)
                if ($nAllies == $maxPlayers - 1) return true;
                else return false;
            }
            elseif ($team1->players[$i]->vaporid == $this->BalancedTeam2->players[0]->vaporid) {
                //team1_player[$i] is on BalancedTeam2, this means the rest of his team is here as well
                $nAllies = 0;
                for ($j = 0; $j < count($team1->players); $j++) {
                    for ($k = 0; $k < count($this->BalancedTeam2->players); $k++) {
                        if ($team1->players[$j]->vaporid == $this->BalancedTeam2->players[$k]->vaporid) {
                            $nAllies++;
                            break;
                        }
                    }
                }

                if ($nAllies == $maxPlayers - 1) return true;
                else return false;
            }
        }

        return false;
    }

    public static function SetTeamAverageElo(&$team) {
        $avg = 0;
        for ($i = 0; $i < count($team->players); $i++) {
            $avg += $team->players[$i]->elo;
        }
        if (count($team->players) == 0) return $team->elo = 0; //guard against an empty team
        $team->elo = $avg / count($team->players);
    }

    private function FillOpposingTeam(&$team1, $totalplayers) {
        $team2 = new CTeam_AB(array(), 0, 0);

        //everyone who is not on team1 goes to team2
        for ($i = 0; $i < count($totalplayers); $i++) {
            $exists = false;
            $playerId = $totalplayers[$i]->id;

            for ($j = 0; $j < count($team1->players); $j++) {
                if ($team1->players[$j]->id == $playerId) {
                    $exists = true;
                    break;
                }
            }

            if (!$exists) $team2->players[] = $totalplayers[$i];
        }

        return $team2;
    }

    public static function SetTeamVariance(&$team) {
        //variance = E[elo^2] - (mean elo)^2; SetTeamAverageElo must run first
        $team->variance = ($team->elo * -1) * $team->elo;
        for ($i = 0; $i < count($team->players); $i++) {
            $team->variance += ($team->players[$i]->elo * $team->players[$i]->elo) / count($team->players);
        }
    }

    private function GetMisMatch(&$team1, &$team2) {
        //elo difference dominates; the variance difference is a weak tie-breaker
        return (abs($team1->elo - $team2->elo) + abs($team1->variance - $team2->variance) / 150000);
    }

    public static function ExpectedWin($winners, $losers) {
        $rwinners = new CTeam_AB(array(), 0, 0);
        $rlosers = new CTeam_AB(array(), 0, 0);
        $playerId = 1;
        foreach ($winners as $winner => $stats) {
            $rwinners->players[] = new CPlayer_AB($playerId, 0, $stats['prev_rating']);
            $playerId++;
        }
        $playerId = 1;
        foreach ($losers as $loser => $stats) {
            $rlosers->players[] = new CPlayer_AB($playerId, 0, $stats['prev_rating']);
            $playerId++;
        }

        CAutobalance::SetTeamAverageElo($rwinners);
        CAutobalance::SetTeamAverageElo($rlosers);

        //catch an eventual garbage result
        $xyz = pow(2, 1.1);

        //standard Elo expectation on the team averages, expressed as percentages
        $rPercent = 1 / (1 + pow(10, ($rlosers->elo - $rwinners->elo) / 400));
        $rPercent = round($rPercent * 100);

        $lPercent = 1 / (1 + pow(10, ($rwinners->elo - $rlosers->elo) / 400));
        $lPercent = round($lPercent * 100);

        return array($rPercent, $lPercent);
    }

    public function Balance(&$match) {
        $totalplayers = array();
        $playerId = 0;
        $nPlayers = 5;
        if (isset($match['team1_player6'])) $nPlayers = 6;
        $nPlayers++;
        for ($i = 1; $i < $nPlayers; $i++) {
            $totalplayers[] = new CPlayer_AB($playerId, $match['team1_player'.$i], $match['team1_player'.$i.'_prev_rating']);
            $playerId++;
            $totalplayers[] = new CPlayer_AB($playerId, $match['team2_player'.$i], $match['team2_player'.$i.'_prev_rating']);
            $playerId++;
        }

        $matchBadness = 1000000;
        $team3 = new CTeam_AB(array(), 0, 0);
        $team4 = new CTeam_AB(array(), 0, 0);

        //brute-force search over candidate rosters for team1. player 0 is pinned
        //to team2, so each split is only evaluated once. the bounds are derived
        //from the player count; the original hardcoded bounds (<6,<7,...,<11)
        //were correct for 10 players but silently skipped some 12-player splits.
        $half = count($totalplayers) / 2;

        for ($i = 1; $i < $half + 1; $i++) {
            for ($j = $i + 1; $j < $half + 2; $j++) {
                for ($k = $j + 1; $k < $half + 3; $k++) {
                    for ($l = $k + 1; $l < $half + 4; $l++) {
                        for ($m = $l + 1; $m < $half + 5; $m++) {
                            if ($nPlayers == 7) {
                                //we're in BALLER MODE: six per team, one more nested loop
                                for ($n = $m + 1; $n < $half + 6; $n++) {
                                    $team1 = new CTeam_AB(array(), 0, 0);
                                    $team1->players[] = $totalplayers[$i];
                                    $team1->players[] = $totalplayers[$j];
                                    $team1->players[] = $totalplayers[$k];
                                    $team1->players[] = $totalplayers[$l];
                                    $team1->players[] = $totalplayers[$m];
                                    $team1->players[] = $totalplayers[$n];

                                    $team2 = $this->FillOpposingTeam($team1, $totalplayers);

                                    $this->SetTeamAverageElo($team1);
                                    $this->SetTeamVariance($team1);
                                    $this->SetTeamAverageElo($team2);
                                    $this->SetTeamVariance($team2);

                                    $mismatch = $this->GetMisMatch($team1, $team2);

                                    if ($mismatch < $matchBadness) {
                                        $team3 = $team1;
                                        $team4 = $team2;
                                        $matchBadness = $mismatch;
                                    }
                                }
                                continue;
                            }

                            $team1 = new CTeam_AB(array(), 0, 0);
                            $team1->players[] = $totalplayers[$i];
                            $team1->players[] = $totalplayers[$j];
                            $team1->players[] = $totalplayers[$k];
                            $team1->players[] = $totalplayers[$l];
                            $team1->players[] = $totalplayers[$m];

                            $team2 = $this->FillOpposingTeam($team1, $totalplayers);

                            $this->SetTeamAverageElo($team1);
                            $this->SetTeamVariance($team1);
                            $this->SetTeamAverageElo($team2);
                            $this->SetTeamVariance($team2);

                            $mismatch = $this->GetMisMatch($team1, $team2);

                            if ($mismatch < $matchBadness) {
                                $team3 = $team1;
                                $team4 = $team2;
                                $matchBadness = $mismatch;
                            }
                        }
                    }
                }
            }
        }
        //loops done
        $this->BalancedTeam1 = $team3;
        $this->BalancedTeam2 = $team4;
    }
}

?>

Tekn0
02-04-2011, 09:23 AM
Stormich is right.
You can't use previous statistics of plane use to predict a player's choice of plane. I mean, you actually could do it; the problem is the effectiveness.
The likelihood is small, because the pick in ladder is situational. If there are already 2 explodets, someone is not likely to pick a 3rd explodet, but a more agile plane.
I think my suggestion would work better in this case, if you want to start "forcing" players to play real "plane formations".
But as I said, it's only a suggestion; you might think that you want weird setups like a 5-explodet team to be available.


E.g.: Sunaku, Goose, Alasard, Moser.Steve end up on the same team, and the other team has Shmo, Ball'n, SuperFifou, ufo. My point is, distribute the randas and whales. (Yes, they are capable of playing other planes, but they are probably best at whale and randa/loopy respectively, at least from what I've seen.)

I think you're missing the point completely. The aim is NOT to force people to play planes (if possible). If for example ladder is filled with 10 randas, nothing can be done, but in a scenario where you can somewhat roughly balance light/heavy planes per team it might be better.

If someone plays loopy/randa 99% of the time, the point is to avoid forcing them to play whale and screwing up the team, when the other team has 3 whales who could simply be balanced out (swapped) instead.

The algorithm does this ONLY when it is possible, like many light planes end up on same team and we take that into account and try to re-balance.

It will not have any effect on those who play a variety of planes.

Can you comment on the algorithm I posted? Maybe I didn't make it clear enough?

beefheart
02-04-2011, 02:10 PM
As for proof, I've already stated my ball TA story about how I climbed fairly quickly at 60% or above when I changed my playstyle; also, when I introduced my smurf to ball ladder I hit a 2200 rating or something with like a 70% win percentage, showing that, random variables aside, if you are underrated you will climb and make that up, and it won't take as long as Urpee suggested.

I think this argument cannot be generalized. Balln is actually very good, probably the best atm with a rating around 3000. Of course once you are 1500 you will rise pretty fast to the top 10, because you stand out of the crowd a lot. And I think that goes for roughly the top 40 players.

Take my example: I would estimate my deserved rank to be around 50-200. But I cannot be decisive enough to actually secure this position all the time. My variance in rank is so incredibly big: I went from 40 to 120 to 60 to 400 to 30 to 500 to 120 to 800+ currently, and I have played almost 700 games. The only conclusion I can draw is that either my performance is very volatile or ladder does a very bad job of pinpointing my rank.

So I think ladder can pretty precisely estimate the top 50 rankings (because they have unique skills that make them able to win almost any match), but below that it lacks this ability.

Edit: In my view (although I didn't read the whole discussion) the main reason for this is that ladder cannot find an (approximate) equilibrium quickly enough, due to the influx of new players and the slow convergence to every individual player's true ranking. The improved ELO system should try to improve on these two points imo.

Urpee
02-04-2011, 02:26 PM
Thanks nobo for the balancing algorithm. It's a brute-force search for the least-imbalanced team pair, so it clearly will find the minimum.

The matching criterion is:

|Team1AvgElo - Team2AvgElo| + |Team1Variance - Team2Variance|/150000

Variance = E[PlayerElo^2] - TeamMeanElo^2

Basically the balancer will minimize elo and variance. I assume that 150000 is a heuristic (sqrt(150000) is roughly 387, so just under 400 points if converted to an approximate equivalent stdev). This makes variance a weak weight in the equation when the Elo scores are sufficiently different, but it can be the dominant score if the Elo scores are the same or very close.

Basically two teams being equal the balancer will try to minimize the difference in spread between the teams.

I'm actually not sure if this really is the correct thing to do but it's interesting.

That means that an extremely good player is actually not necessarily matched with the very worst player, because that would not minimize variance in that team. To be more accurate, if the variance term is dominant (or the only nonzero term), then this optimizes against the best player being matched with the worst, because matching the worst player with the other team is going to give a lower difference in team variance.

But all that only kicks in if the team matches have a close to ideal ELO match.

I would actually favor a somewhat more complicated scheme that uses pair-wise Elo differences rather than this.

Assume that teams are sorted in order of rating:

(|Team1[1] - Team2[1]| + |Team1[2] - Team2[2]| + ... + |Team1[6] - Team2[6]|)/(6*380)

That is, rather than put pressure on variance, put pressure on team-wise pairing. This makes it less likely that one team ends up with more people in a certain rating range than the other when that is avoidable, and it does not have the pathology of actually trying to prevent the best players from being matched up with the worst.

Again this is scaled down to only kick in if teams are very close in Elo score to begin with.
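In code it would be something like this (reusing nobo's CTeam_AB shape; just a sketch of the idea, not tested):

<?php
// Sketch of the pair-wise criterion above. Assumes $team1/$team2 are CTeam_AB
// objects as in nobo's code. Replaces the variance term, not the Elo-mean term.
function GetPairwiseMismatch($team1, $team2) {
    $elos1 = array();
    $elos2 = array();
    foreach ($team1->players as $p) $elos1[] = $p->elo;
    foreach ($team2->players as $p) $elos2[] = $p->elo;
    rsort($elos1); // best to worst on each side
    rsort($elos2);

    $sum = 0;
    $n = count($elos1);
    for ($i = 0; $i < $n; $i++) {
        $sum += abs($elos1[$i] - $elos2[$i]); // compare rank-for-rank
    }
    // scaled down so it only matters when the team Elo means are already close
    return $sum / ($n * 380);
}

Dropped into nobo's Balance() in place of the variance term, it would keep the same brute-force search; only the tie-breaking between near-equal splits changes.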

An alternative scheme is what I proposed earlier. But the downside of that one is that it does not have probable optimality with respect to ELO difference.

(Again, everything I say is descriptive or a suggestion, don't take it as anything else. Discard at will).

mikesol
02-04-2011, 05:09 PM
Urpee -

Rather than try to respond to each and every sentence you wrote, I'm going to try a different approach. Please confirm or deny these points.

You believe:

1) That you deserve to be rated significantly higher than 1400.
2) That players on a team have a different probability of winning.
3) That this algorithm does not converge.


If my assumptions are not correct please say so.

My responses based on those assumptions:

You do deserve to be rated at 1400. After thousands of hours of play time, me and most of the other admins can tell pretty well whether people are over-rated or under-rated. This comes with experience. You frankly are not an above average player. Looking at your score and saying "Hey that's not what I think I deserve" is committing a logical fallacy. You commit it but then say you're not making any generalizations... This is not hand-waving. There is no magic to this. More experienced players simply know more about the game. No math is needed. There is not a formula I can give you to show you how bad or good you are. Nonetheless, people can watch you and rank you accordingly. Passing this off as simple hand-waving is rather silly.

I already told you that the proof that this series converges is very long and very hard to type in here. I linked you to a wonderful wikipedia article explaining converging series mathematically. Basically you prove that this series converges by proving that the sum from 1 to infinity of 1 / (1 + 10^((2a - x)/400)) converges. If you wish to counter me please feel free. However, you have given no such proof that this does not converge.

In response to players having a different probability of winning: Let mu denote the mean skill of a player and sigma the spread. The greater the difference between two player's mu values, assuming their sigma values are similar - the greater the chance of the player with the higher mu value performing better in a game.

This principle holds true in the TrueSkill ranking system. BUT, this does not mean that the players with the larger mu's are always expected to win. Rather, this means that their chance of winning is higher than that of the players with the smaller mu's. The team's skill is assumed to be the sum of the skills of the players. In other words, if you have a team with a combined total score of 8000 - you'd expect them to do better than a team with a combined total score of 7500.

This enables players to be compared for relative chance of drawing. The more even the skills of match participants, the more likely it is that this configuration will end up in a draw or a very close game. Hence making it more interesting and fun to watch for every participant.

However, because some players actually deserve to be higher or lower - matching teams at equal scores lets this play out. If a team is scored at 7500 but one player is really under-rated - their team will have a higher chance of winning compared to the other 7500 score team.
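To put a rough number on the 8000-vs-7500 example (TrueSkill-flavoured; the beta value and the normal-CDF approximation are my own picks for illustration, not Microsoft's exact model):

<?php
// Chance team A beats team B if performances are Gaussian around summed skills.
// beta (per-player performance noise) is a made-up illustrative value.
function phi($x) {
    // standard normal CDF via the Abramowitz-Stegun erf approximation
    $z = abs($x) / M_SQRT2;
    $t = 1 / (1 + 0.3275911 * $z);
    $poly = ((((1.061405429 * $t - 1.453152027) * $t + 1.421413741) * $t
        - 0.284496736) * $t + 0.254829592) * $t;
    $erf = 1 - $poly * exp(-$z * $z);
    return $x >= 0 ? (1 + $erf) / 2 : (1 - $erf) / 2;
}

function teamWinChance($muSumA, $muSumB, $beta, $playersPerTeam) {
    // performance variance grows with the number of players involved
    return phi(($muSumA - $muSumB) / sqrt(2 * $playersPerTeam * $beta * $beta));
}

echo teamWinChance(8000, 7500, 400, 6); // about 0.64 with these assumptions

So even a 500-point edge in summed skill only makes a team roughly a 64/36 favourite under those assumptions, which is why close games are still common.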

Edit: I think I'm starting to understand what we're both saying. Are you saying that before you even know your team-mates, a player who is rated 2200 would be expected to win more games with an average score of 1900 than a player with 1700? That's one of those weird statistics things. A 2200 player is not necessarily expected to win more games (in the future - they already have won more games in the past). If the 2200 player has a 60% win and the 1700 has a 40% win then yes that would be true. However, if the 2200 ranked player has a 51% win and the 1700 ranked player has a 75% win - the 1700 ranked player would have a higher chance of winning in this game because of the auto balance algorithm. Remember the key point here - teams are balanced around having a similar total number of points. If someone has won 75% of their games and is ranked 1700 - they are under-rated or their team-mates have been under-rated for most of their games (enough to make up for them). As most people are appropriately rated, the latter is most likely not the case.


I invite you to show me the statistics or mathematical proofs to counter my claims. You have done no such thing. Yet you continue to talk as if you have proven something. Please link me to the post where you claim to have given a mathematical proof. Maybe I'm just missing the one where you counter my points?


In regards to your responses. While we do indeed want this to be a place to bounce ideas off of each other - you are simply not understanding the core principles of the system. You keep going off on tangents that are irrelevant.

Let me give you an example: You say "Do you deny that scores do not move if people play at 50% no matter their actual seeded score." This shows a vast amount of ignorance of our system. If people are continually winning 50% of their games, that means they stay at roughly the same score and our system has converged. If people are not winning 50% of their games - then they are still working their way towards that. Whether or not this is an accurate convergence is irrelevant to this claim. In fact, you're contradicting yourself by saying it converges and then telling me you have proved otherwise.

Edit: Of course if we ignore these points - I do think your variance idea proposed in the above post is interesting.

Duck Duck Pwn
02-04-2011, 05:36 PM
Players with higher ratings should have a marginal 50% win percentage, due to the auto-balancer. They would have had a higher overall win percentage to reach convergence on their true ranking, but once that is reached, the fact that they are "correctly" ranked means that the contributions they add to their team are accurately modeled. This ignores improvement/deterioration, overrated/underrated players, plane comp, etc., because you are just as likely to be paired with someone who is improving or over/underrated as to have them on the opposing team (any losses you suffer will most likely be recouped afterwards, as you would then be underrated, not to mention that there is still the possibility that you yourself are improving/deteriorating).

Can someone please explain to me why this logic is wrong, short of the argument "you cannot accurately model skill on a ladder"? Because if that is the response, then what is the point in a ladder >_>

mikesol
02-04-2011, 05:53 PM
Players with higher ratings should have a marginal 50% win percentage, due to the auto-balancer. They would have had a higher overall win percentage to reach convergence on their true ranking, but once that is reached, the fact that they are "correctly" ranked means that the contributions they add to their team are accurately modeled. This ignores improvement/deterioration, overrated/underrated players, plane comp, etc., because you are just as likely to be paired with someone who is improving or over/underrated as to have them on the opposing team (any losses you suffer will most likely be recouped afterwards, as you would then be underrated, not to mention that there is still the possibility that you yourself are improving/deteriorating).

Can someone please explain to me why this logic is wrong, short of the argument "you cannot accurately model skill on a ladder"? Because if that is the response, then what is the point in a ladder >_>

Players should have a marginal 50% win rate once they have played many games and have reached an appropriate score. Higher-rated players will have about a 50% win rate, just like lower-rated players.

You are correct that once someone is accurately ranked - their contributions are modeled accurately (provided you ignore all of those crazy hard things to model).

The point of ladder (well not the only point) is that people get better and get worse. If I take 5 months off and come back - I'd probably be a lot worse than I once was and would proceed to lose many games. If I've been repeatedly improving - you'd expect my score to increase. Take donk for instance. When donk first started ladder he was not the most amazing player ever. He did pretty well but eventually he got amazing and shot up to the very top of ladder. When he stopped trying and didn't play as much - he dropped down. Ladder shows you a somewhat accurate portrayal of where you are compared to other people provided that people are actually playing.

Unfortunately, one of the issues with this ladder is you have people like me who never play anymore who are high up in the ranking. It's hard for lower ranked people to get to the top if you have people that quit once they are there. Hence why I think we should have a deflation factor where if you don't play for a month or so you start to lose points.

VipMattMan
02-04-2011, 06:23 PM
I think this argument cannot be generalized. Balln is actually very good, probably the best atm with a rating around 3000. Of course once you are 1500 you will rise pretty fast to the top 10, because you stand out of the crowd a lot. And I think that goes for roughly the top 40 players.

Take my example: I would estimate my deserved rank to be around 50-200. But I cannot be decisive enough to actually secure this position all the time. My variance in rank is so incredibly big: I went from 40 to 120 to 60 to 400 to 30 to 500 to 120 to 800+ currently, and I have played almost 700 games. The only conclusion I can draw is that either my performance is very volatile or ladder does a very bad job of pinpointing my rank.

So I think ladder can pretty precisely estimate the top 50 rankings (because they have unique skills that make them able to win almost any match), but below that it lacks this ability.

Edit: In my view (although I didn't read the whole discussion) the main reason for this is that ladder cannot find an (approximate) equilibrium quickly enough, due to the influx of new players and the slow convergence to every individual player's true ranking. The improved ELO system should try to improve on these two points imo.

The reason your rank varies so much is that there are many, many more people in your rating range.

I had a recent rating peak of 3190 and a recent low of around 2500. I fell about 15 spots in that process. Difference of about 700 rating.

The 40th ranked player has 2150, and the 800th ranked player has a 1400 rating. Difference of 750 rating.

That variation in imperfect rating is going to occur for everyone, in accordance with the dynamics of ladder at the times you play. The idea is that over time your response to those dynamics puts you in the right approximate rating range. Obviously, as you play within your current rating range, your overall rank is probably going to fluctuate wildly.

elxir
02-04-2011, 06:24 PM
Eg: Sunaku, Goose, Alasard, Moser.Steve end up on the same team, and the other team has Shmo, Ball'n, SuperFifou, ufo. My point is, distribute the randas and whales. (Yes they are capable of playing other planes but they are probably best at whale and randa/loopy respectively, at least from what I've seen).


this is immaterial. people like myself who are good at 4-5 planes can be higher rated solely based on that fact. we can adjust to different types of teammates whereas other players cannot.

you are trying to make up for a player's weakness through the rating system. that's wrong.

Premier Stalin
02-04-2011, 07:42 PM
I don't think he's trying to say that at all - just that it might help to balance teams if a player's plane preference was factored in after the initial points-based system has been applied.

Although, at the same time, it is better for competition to make people vary how they play in order to be the most effective for their team.

elxir
02-04-2011, 08:05 PM
I don't think he's trying to say that at all - just that it might help to balance teams if a player's plane preference was factored in after the initial points-based system has been applied.

Although, at the same time, it is better for competition to make people vary how they play in order to be the most effective for their team.

that's exactly the point he made and i then rebutted, yes.

someone who can only be effective with one plane will cause their team to lose, sooner or later, due to that lack of versatility. attempting to rectify that within the ladder formula is foolish as it should be a pure player-based decision as to what they can and cannot do. your rating should reflect how good you are, overall, not how good you are with a flak cannon bomber and a team tailored to fit it.

Urpee
02-04-2011, 10:02 PM
1) I don't know what "significantly" is supposed to mean. My rating/ranking is secondary, though. I thought the idea was to understand and improve ladder. I used my own case to try to illustrate. That was obviously a mistake, because now we discuss me rather than what we should be discussing.
2) No. Players on a team have different expectations to win. "Probability" isn't the right word, but in spirit it is roughly the same, I guess. I am not saying that the outcome will be different, just that the expectation for them to win is different.
3) No. I am saying there is a pathology when the team is perfectly balanced: one can prove that then no rating changes, no matter how it is seeded. This means there is no convergence in this case. Clearly some people converge quite well. So let me repeat: I do not say there is no convergence. I say that at the very point the system tries to reinforce (50%) there is no convergence.

Let me actually elaborate on this point.

There are two types of systems dealing with ties.

Those that converge people to their true rating even when they play 50%, and those that do nothing if they play 50%. We have the latter. The former is possible. Right now the assumption is that a player performing at 50% does so because the balancing is great and the scores are accurate, and hence the higher-ranked players have pulled the weight for the win. All I am saying is that I don't think that is obviously the case - and, more to the point, one can design a system that isn't prone to this problem. For that you have to model the level of competition, though.

Because yes, I essentially argue that if people play 1000 games at an average game level of 1900 and they all have 50%, the correct answer should be that everybody has a 1900 rating. A player with a 1400 rating simply would not win 50% consistently in that environment. But this very pull is not currently encoded.


I already told you that the proof that this series converges is very long and very hard to type in here. I linked you to a wonderful Wikipedia article explaining convergent series mathematically.
Basically, you prove that this series converges by proving that the sum from 1 to infinity of 1 / (1 + 10^((2a - x)/400)) converges.


I use convergent series in my work. The Wikipedia article says nothing about the convergence of our setup. The problem, of course, is that the way the system operates, this formula is not applied under the assumptions it was set up for.

Yes, convergence of that formula has been proven - but not under a system that auto-balances for a 50% ratio on an ensemble score!

See how the original ELO and this system differ: ELO does not have the pathology I describe. If two players of equal rank play 50%, their scores do not move. This is correct, and it is the only possible equilibrium in single-player ELO.

However, if you take an ensemble of N players and use their mean scores, you get many possible equilibria. But each player is supposed to have only one correct target score - and we only use convergence over a randomized ensemble to claim that each target score has indeed converged.

I have already given the construction: in the 50% case, the current algorithm does not move players at all, independent of their individual scores in the ensemble. Clearly there is no convergence, even if a player in the ensemble actually outperformed his rank.


If you wish to counter me please feel free. However, you have given no such proof that this does not converge.


Yes I have. Let me sketch the proof again. Pick all ratings at random. They play at 50%. No scores move, independent of what they are. You can fix any hidden variable representing actual skill, and it will be uncorrelated with the scores and remain so. That is the proof. I can set it up as a contradiction if you really want, but it should be pretty obvious.

It's a theoretical point, but given that the system actively tries to stay at 50%, it's a practical concern too. At least, I don't know of an argument showing that this is an unstable singular point (an unstable singularity having the property that perturbations make the dynamics diverge from the singular case).
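
If you want to see it concretely, here is a toy Python sketch using the K = 50 team formula from this thread, assuming the balancer hits exactly equal team averages and everybody wins half their games:

import random

K = 50

def update(rating, team_avg, opp_avg, score):
    """Per-player change from the thread's formula: K * (S - E)."""
    expected = 1 / (1 + 10 ** ((opp_avg - team_avg) / 400))
    return rating + K * (score - expected)

random.seed(1)
seeds = [random.randint(1000, 2500) for _ in range(10)]  # arbitrary, possibly wrong
ratings = list(seeds)

for game in range(1000):
    score = game % 2                     # everyone wins exactly half their games
    for i in range(10):
        # the balancer made the averages equal, so E = 0.5 for everyone
        ratings[i] = update(ratings[i], 1750, 1750, score)

print(ratings == seeds)                  # True: no seed moved, correct or not

Whatever you seed, the final ratings equal the seeds - which is exactly the pathology I am describing.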


Edit: I think I'm starting to understand what we're both saying. Are you saying that before you even know your team-mates, a player who is rated 2200 would be expected to win more games with an average score of 1900 than a player with 1700?


Precisely.

That's one of those weird statistics things. A 2200 player is not necessarily expected to win more games (in the future - they already have won more games in the past). If the 2200 player has a 60% win rate and the 1700 has a 40% win rate, then yes, that would be true. However, if the 2200-ranked player has a 51% win rate and the 1700-ranked player has a 75% win rate, the 1700-ranked player would have a higher chance of winning this game because of the auto-balance algorithm. Remember the key point here - teams are balanced around having a similar total number of points. If someone has won 75% of their games and is ranked 1700, they are under-rated or their team-mates have been under-rated for most of their games (enough to make up for them). As most people are appropriately rated, the latter is most likely not the case.


I agree that it's difficult. I don't actually think you correctly paraphrased my concern here.

Let me see if I can make this clearer.

Sure, it's thinkable that the 2200 plays 50%. In fact it is thinkable that everybody plays 50% without touching the 1700. That requires that the 1700 also plays at 50%; again, note that this is independent of whether the 1700 rating is correct or not. Now let's assume that the 2200 really does just outperform his competition in general and has an actual competitive ratio of 51%, and we assume this for all players above the mean. Note that this requires that those below the mean will drop (no matter whether their ratings are actually supposed to rise).

Does that make sense? Basically if there is any pressure at all at the top this will put downward pressure below the average.

If you place the game mean at 1500, this paradox disappears. Players who perform below 50% are supposed to drop below 1500 and those who perform above 50% are supposed to rise. (Strictly, the mean and median do not need to coincide like this, but given that the E term in altitude's ladder formula is largely ineffective due to the auto-balancer, this is what actually turns out to be true. People with a 50% ratio will have a score at or very close to 1500.)

The point here is quite simple. The system should apply the right trends to give the right outcome. We should not have to assume that the 2200 player is stable, etc. And we should not assume that a 50% game has everybody ranked correctly. The system should apply pressure to challenge these assumptions and, if the pressure proves wrong, correct for it.

This is why I actually argue there should be a pressure toward the game mean score. A player is 1700 and wins 50% in a 1900 game: the rating should rise. If his % then starts to drop, clearly that rise was a mistake. If he keeps rising, clearly there was a misadjustment to begin with.

That would actually introduce a convergent property under 50% conditions!

Currently there is no such pressure in the system. And just because the team mean is taken into account does not mean that low-ranked players get a free ride - if they don't perform as predicted, they will drop after all.

TrueSkill does draw such inferences from draws, btw.

But this very thing is missing.
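
Sketching the kind of pressure I mean (the pull factor is pure illustration, not a tuned value):

K = 50
PULL = 0.02   # hypothetical: close 2% of the gap to the game mean per game

def update(rating, expected, score, game_mean):
    # the normal term, plus a drift toward the level of competition played
    return rating + K * (score - expected) + PULL * (game_mean - rating)

r = 1700.0
for game in range(300):
    # a 1700 player holding 50% in balanced (E = 0.5) games rated 1900
    r = update(r, 0.5, game % 2, 1900)
print(round(r))   # ends near 1900 instead of staying pinned at 1700

And if the pull is wrong, his win rate drops below 50% and the normal (S - E) term pushes him back down - that is the convergent property under 50% conditions.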


I invite you to show me the statistics or mathematical proofs to counter my claims. You have done no such thing.
Yet you continue to talk as if you have proven something. Please link me to the post where you claim to have given a mathematical proof. Maybe I'm just missing the one where you counter my points?


I have proven that 50% games apply no correcting measure to randomly seeded scores. I.e., arbitrary seeds will not move, regardless of whether they are correct or meaningful.

I have also made many very explicit and testable claims. For example, I have claimed that if someone competes at just under 50% in games rated at 1900, they will experience a downward pressure that moves them further and further away from 1900, independent of whether they should actually be rated 1800 or 1000. Do you agree or disagree with the argument I have provided? If not, what is the flaw in it?


In fact, you're contradicting yourself by saying it converges and then telling me you have proved otherwise.


Frankly, you obviously haven't understood what my contention is. There is no contradiction at all. I do not say that there is no convergence. I say that if games currently play out at 50%, the system gives no information about whether the underlying ratings are correct; i.e., if someone plays at 50%, there is no way to argue that they are converging to their "true" rating, because no such pressure is in the system. Is it clear what I am claiming?


Edit: Of course if we ignore these points - I do think your variance idea proposed in the above post is interesting.

Cool.

Tekn0
02-04-2011, 11:20 PM
that's exactly the point he made and i then rebutted, yes.

someone who can only be effective with one plane will cause their team to lose, sooner or later, due to that lack of versatility. attempting to rectify that within the ladder formula is foolish as it should be a pure player-based decision as to what they can and cannot do. your rating should reflect how good you are, overall, not how good you are with a flak cannon bomber and a team tailored to fit it.

I don't see how someone who plays 4-5 planes will get a higher rating unless they TRULY play all planes equally well and better than others do - which I'm sure the MAJORITY of altitude players, even on Ladder, do not. And going by the current top rankings, I don't see players in there who play 4-5 planes. They stick to playing 1 or 2 planes all the time. Even the top-ranked players. I don't want to point out specific names any longer, but you can see that yourself.

Of course, I'm just suggesting a change that some of us feel might be worthwhile. I personally don't mind if no such thing is considered.

shrode
02-05-2011, 12:36 AM
Urpee, I believe that you are not thinking about this correctly.

If you are a 1400 player winning in "a 1900 environment," you will still have a fifty percent chance of winning. This is because the balancer will put you on a team with people rated higher than 1900 to counterbalance your personal rating. So your "1900" team will be something like 1400 1900 1900 1900 2400. See, the 2400 player will carry your slack in this 1900 environment, meaning you will still have a fifty percent chance to win. ALWAYS. If the teams cannot be truly balanced, you gain/lose less to compensate.
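
(If it helps, this is roughly what that balancing step amounts to - a brute-force sketch, not ladder's actual code:)

from itertools import combinations

def balance(ratings):
    """Split ten ratings into the 5v5 pair whose sums are closest."""
    total = sum(ratings)
    team1 = min(combinations(range(len(ratings)), len(ratings) // 2),
                key=lambda t: abs(2 * sum(ratings[i] for i in t) - total))
    team2 = tuple(i for i in range(len(ratings)) if i not in team1)
    return team1, team2

t1, t2 = balance([1400, 1900, 1900, 1900, 2400, 1850, 1900, 1950, 1800, 2000])
print(t1, t2)   # the 1400 lands with the 2400 so the averages match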

For example, I have claimed that if someone competes at just under 50% in games rated at 1900, they will experience a downward pressure that moves them further and further away from 1900, independent of whether they should actually be rated 1800 or 1000. Do you agree or disagree with the argument I have provided? If not, what is the flaw in it?

I disagree, and this is why. If a player is winning just under 50% in games rated 1900, then they must be overrated, or their teammates are overrated - and after enough games, the luck of whether or not your teammates are overrated balances out, so that the only remaining factor is whether or not you are overrated. Once the person is at the appropriate rating, their win-loss will become fifty percent, due to the fact that they will get balanced with better and better (or, in the case of good players, worse and worse) players on their teams. And note this: I'm not saying their cumulative win-loss will become fifty percent, I'm saying their post-finding-the-correct-rating W/L will become fifty percent. Cumulative win/loss does not matter, because that is simply determined by how many games you have played at the correct rating. Therefore, people will NOT continuously be pressured away from 1900, but rather pressured towards their proper rating.

The current system works rather well, except in the cases of new players, players who stop playing for a period of time, or players who drastically change playstyles and become significantly better or worse (causing them to become misrated).

The fallacy in your logic is that you assume that people who play above the mean level of the game are expected to win. This is false, because the autobalancer causes the other team to have just as many rating points above the mean as your team does: zero (or as close to zero as possible). It is the people who play above the level of THEIR INDIVIDUAL rating who are expected to win.

Rainmaker
02-05-2011, 12:40 AM
2. I already told you that the proof that this series converges is very long and very hard to type in here. I linked you to a wonderful Wikipedia article explaining convergent series mathematically. Basically, you prove that this series converges by proving that the sum from 1 to infinity of 1 / (1 + 10^((2a - x)/400)) converges. If you wish to counter me, please feel free. However, you have given no such proof that this does not converge.

Ok. I'm going to make an effort to lay the mathematical point of view out very clearly:

This is the formula provided by Esoteric for probability of win based on team ratings:
E = 1 / [1 + 10^(x)]
also known as a logistic distribution:
http://upload.wikimedia.org/math/9/e/2/9e2a6aae2a5a3ea88c91bdbafda49d60.png
The graph of this distribution function is:
http://upload.wikimedia.org/wikipedia/commons/thumb/c/ca/Normal_Distribution_CDF.svg/500px-Normal_Distribution_CDF.svg.png
and there is obviously no point in proving it converges, as its limit as x -> infinity is 1.

Its Cumulative Distribution Function is this:
http://upload.wikimedia.org/wikipedia/commons/f/fc/Logistic_cdf.png
which obviously converges to 1. Why? It's a property of any function that represents a probability: the sum of all the probabilities over its domain must be 1.

But this is not what was asked earlier in the topic, mikesol (or what I asked).
No one has proved that ratings "converge" (many have repeated the claim that "you will have a 50% win ratio when you stand at your 'true' rating").

Here I'll explain it with a simpler graph:
http://img217.imageshack.us/img217/3185/logistic.jpg
x represents the difference in ratings divided by 2*sigma:
x = (u1 - u2) / (2*s)

So if you look at the graph at a difference ratio of 1.5:
1.5 = (u1 - u2) / (2*s)

Elo uses s = 200 rating points, so:
1.5 = (u1 - u2) / 400
u1 - u2 = 600

So for a difference of 600 points, the probability of winning is around 0.97 for u1 (the higher-rated side) and around 0.03 for u2 (reading off the graph; the exact numbers can be found through the actual formula).
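
You can check this against the formula instead of eyeballing the graph:

def win_chance(u1, u2, s=200):
    """Win expectation for the u1 side under the logistic model."""
    return 1 / (1 + 10 ** ((u2 - u1) / (2 * s)))

print(win_chance(2100, 1500))   # ~0.969 for the side 600 points up
print(win_chance(1500, 2100))   # ~0.031 for the side 600 points down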

Rainmaker
02-05-2011, 12:41 AM
Another, totally different, question is what u1 and u2 are.
u1 is the mean of player 1's rating, which theoretically would follow a normal distribution, looking something like this:
http://content.answers.com/main/content/img/barrons/accounting/images/N4.jpg
Let's build a normal distribution for a random player A. We will take the spread (sigma) proposed in the ELO model, which is 200.
For the mean, let's suppose player A is a pretty skilled player: around 1875.

So his normal distribution is: Rating.A = N(u=1875; s=200).

So, for example, say I would like to know, with 90% confidence, the minimal real rating for this player:
Phi(z) = 0.90 gives z = 1.2816 (a table value), so (x - u)/s = -1.2816
and X (inferred minimal real rating) = 1875 - 200*1.2816 ≈ 1619.
What does this mean? I'm 90% sure that his real rating is above 1619.
His max real rating (90% confidence): 2131.
With 80% confidence we could say that his "TRUE" rating is between 1619 and 2131 (provided that the system has rated him 1875).
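
The same interval, computed with Python's standard library instead of a z-table:

from statistics import NormalDist

mu, sigma = 1875, 200
z = NormalDist().inv_cdf(0.90)    # one-sided 90% quantile, ~1.2816

print(round(mu - z * sigma))      # 1619: 90% sure the real rating is above this
print(round(mu + z * sigma))      # 2131: ...and below this; together an 80% interval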

Someone please explain to me the mathematical deduction you performed to state that a player's rating converges to an unknown number (known as his "TRUE" rating; 1875, for example) when his winning ratio is 0.5 (games won / total games).




On a different subject, my opinion on the current system.
In the beginning, teams weren't autobalanced, so points earned or lost weren't constant as they are now (24); they depended on the teams' average ratings.
So one game could give you 29 points, and another 21.
But after autobalance was introduced, if all teams have at most a 20-point difference in average rating, all players should always win or lose 24 points.

So, if any player achieves his true rating when he reaches the 0.5 ratio, his rating should be 1500.
Look: random.player has a 0.5 ratio, but his rating is 1600. How come?
As I said, the system has been modified on different occasions. One of those was to include autobalance, so won/lost points weren't constant before autobalance was introduced, which led to some fluctuation in these numbers. Also, since autobalance can't always produce two teams with the exact same average rating, the points calculation doesn't always result in 24; sometimes it will be up to 4 points above or below 24 [20~28].

Formulas provided for the original topic are:

E = 1 / {1 + 10^[(avrg1 - avrg2)/400]}
points = 50*(S-E)

Take for example this game of ball:
http://64.191.124.60/match.php?id=15668&mode=ball_6v6

It seems there were only 12 people on the server, as the best arrangement the autobalance system could find was:
team1.avrg = 1741
team2.avrg = 1689

So the probability of team2 losing is E2 = 57.43% (0.574280203, to be exact).
For some reason (I would have to search the code, but I took the liberty of testing it on 7 matches), the point distribution is only calculated from either:
the winning team's chance to lose, or
the losing team's chance to win (mind you, these probabilities are exactly the same).
For this case:
50 * 0.425719796 = 21.28598982 points (rounded to 21).

My guess is that when it was coded, the decision was to keep the system zero-sum, instead of calculating each team's points won/lost separately,
which would result in:
team1: +21 (each)
team2: -29 (each)
(you can check the numbers yourself, but as K = 50, the points drawn from one team plus the points given to the other should always sum to 50).
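
Anyone can re-run the numbers from that match in a few lines of Python:

K = 50
team1_avg, team2_avg = 1741, 1689

# E as used here: the winning team's chance to lose (= the loser's chance to win)
E = 1 / (1 + 10 ** ((team1_avg - team2_avg) / 400))
print(round(E, 4))    # 0.4257
print(round(K * E))   # 21 points change hands, kept zero-sum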

This is why, for most players, the 0.5 ratio = 1500 relation isn't 100% exact (as not all matches were a constant 24 points, as the design assumes).

As you can see for yourself (feel free to check all the calculations, etc., and flag any error), this system doesn't converge on real ratings.
Its purpose is to reward the player who can accumulate the most net wins (total games won - total games lost).
It doesn't matter if you played with 3000-rated people or against them; it doesn't matter if your team's avrg rating is 2000 or 1400.
What matters in the current system is that you collect as many net wins as possible.
For those purposes, it's the same if you have
(Cloud.Ace) 106 - 60
(ACE.Yuyuko) 254 - 208

Both are rated almost equally, because both have 46 net wins.
But you can see that their total games are quite different, as is the win ratio.

Moreover, my personal picture of how a rating system should work: if ACE.yuyuko has a 63.9% win ratio and a 106 - 60 record, you expect him to keep climbing the leaderboard.
But according to what most here have stated, his ratio should be 0.5; for that to happen he would have to lose lots of games, dragging his record down.
How is he supposed to lose lots of games if he is on a 12-win streak?
Does anyone think those 12 wins in a row are pure luck? (that all 4 of his teammates in those 12 games were marvelous players?)


http://img199.imageshack.us/img199/1250/tbdrating.jpg

shrode
02-05-2011, 12:47 AM
Someone please explain to me the mathematical deduction you performed to state that a player's rating converges to an unknown number (known as his "TRUE" rating; 1875, for example) when his winning ratio is 0.5 (games won / total games).

It's a real tricky math deduction. It's the autobalancer, which creates fair teams if everybody is appropriately ranked. Fair teams = 50/50 win/loss. Also, don't look at the total winning ratio (games won / total games); look instead at (games won at correct rank / total games at correct rank). The second is what converges to 50/50, not the first. Well, technically the first eventually will too, but only after a very long time.

mikesol
02-05-2011, 01:01 AM
So you did all that math and you still proved my point o_o. The graph converges as I said. wiki on convergence here. (http://en.wikipedia.org/wiki/Limit_(mathematics)#Convergence_and_fixed_point)

Nowhere have I ever said that a player's score will not fluctuate. Nowhere have I ever said that there is some magic number (like 1875) that someone converges upon. That would make no sense, and I believe you're completely misinterpreting me.

I am saying there is some general area that they fluctuate between - i.e., some number that they converge upon. People do fluctuate, but it's based on a wide variety of factors. Nonetheless, you will not see poor players suddenly jump up to a 3000 rating. They will generally stay where they are until they improve their game play. Going by your example, a player like that would most likely stay between 1600 and 2100. Players generally don't waver 500 points without some extraneous factor. The reason the standard deviation is so high would generally be the unknown of other players. Many players get screwed over if someone is under-rated or over-rated. Nonetheless, as you play more games, this balances out. You will not see anyone in the top 50 suddenly go down to 1500 unless they stop playing this game altogether and come back after many months.

Incidentally my responses weren't even directed at you. They were directed at Urpee and his mathematical flaws.

As far as explaining how we got to this 50% number, I'll try again, but I'm pretty much convinced that you and urpee are on completely different pages from the rest of us. It doesn't appear there is any real progress in convincing anyone here that one of us is right. Note: I could be entirely wrong on this. I'm not saying I'm 100% guaranteed to be right. I'm just saying that we're not on the same page and that these things are not making sense to one another - hence no real progress.

Let me try a simple example, though. Let's say you have 10 players: 2 players at 1000, 2 at 1100, 2 at 1200, 2 at 1300, 2 at 1400. In other words, the average score of the players is 1200. The players are balanced around this, with one from each of those scores on each team. So each team looks like this set: {1000, 1100, 1200, 1300, 1400}. Now, if these people are at their actual ratings, we'd expect each team to win 50% of the time. This follows from the simple definition of rating: if two teams have the same points, they should have the same probability of winning the game. For instance, if we changed one team to be {1400, 1400, 1300, 1300, 1200}, we'd expect that team to win, as they have a higher total score (by "expect to win" I mean their probability of winning is > 50%).

The catch with all of this is that players may not be at their appropriate rating. Let's say the 1400 player really deserves to be 1500. Provided that no other player is different, we'd expect that team to win more than 50% of the games now. That one player has quietly pushed the probability beyond 50%. On the other hand, what if the 1000 player really deserved to be ranked 700? He has brought down his team's chance of winning. Over a repeated number of games, these factors all lead to a fairly accurate rating. That's why after only 30 games or so I'm already back up near the top, whereas someone like urpee is still at his 1400 rating. Because each team should win 50% of the time when it's balanced, if someone is under-rated or over-rated, plays badly, chooses a bad plane, etc., the probability is increased or decreased.
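
Here's that example in a few lines of Python - the only assumption is that true strength, not the displayed rating, decides the game:

def win_prob(avg1, avg2):
    """Chance the avg1 side wins under the logistic model."""
    return 1 / (1 + 10 ** ((avg2 - avg1) / 400))

rated = [1000, 1100, 1200, 1300, 1400]   # what the balancer sees on each side
true1 = [1000, 1100, 1200, 1300, 1500]   # team1's "1400" really plays like a 1500

print(round(win_prob(sum(true1) / 5, sum(rated) / 5), 3))
# ~0.529: the seemingly even game quietly favors the underrated player's team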

Edit: You posted some new stuff, so let me address that too. We would expect tyr to keep increasing, judging solely by his performance so far. Tyr has not played enough games for an accurate portrayal of where he stands. Maybe he really deserves that score, and if he keeps playing it will eventually converge to 50%. You're expected to increase your net wins until you reach some number. o_o It's why you'd expect my smurf, who is at a 77% win rate, to keep winning. My smurf has not reached the 2500 that I'm currently rated at; my net wins are still below where they should be.

Rainmaker
02-05-2011, 01:35 AM
So you did all that math and you still proved my point o_o. The graph converges as I said. wiki on convergence here. (http://en.wikipedia.org/wiki/Limit_(mathematics)#Convergence_and_fixed_point)
Don't get me wrong - I agree with you that those two functions converge; that is one of their properties as probability functions.

The reason the standard deviation is so high would generally be the unknown of other players.
Not really; the standard deviation (s, for the Greek letter sigma) is assigned based on a criterion.
What criterion?
In this model, it is proposed that a difference of 200 points results in the higher-ranked team having a 0.75 chance to win, against 0.25 for the other team.
Taking altitude for example: a 1900 avrg rated team against a 1700 avrg rated team.
How do I know this?
Look at the E formula:

E = 1 / [1 + 10^((u1-u2)/(2*s))]

We could say that we don't agree - we think the rating isn't that accurate - so we say that only a 300-point difference gives team1 a 0.75 winning chance.

Then the new E formula would be: E' = 1 / [1 + 10^((u1-u2)/600)]
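
In code, that tweak is literally a one-parameter change:

def E(avg1, avg2, s=200):
    """The E above with the spread s made explicit (2*s = 400 by default).
    As used in this thread, E is the higher-rated side's chance to lose."""
    return 1 / (1 + 10 ** ((avg1 - avg2) / (2 * s)))

print(round(E(1900, 1700), 3))          # ~0.24 with s = 200
print(round(E(1900, 1700, s=300), 3))   # ~0.317 with s = 300: a flatter curve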



As far as explaining how we got to this 50% number, I'll try again, but I'm pretty much convinced that you and urpee are on completely different pages from the rest of us.
I agree on our disagreement.
I find it rather amusing that statements are made so freely by people who can't back them up (not you in particular, really).
My argument is that the system is faulty from scratch, and there is no point in arguing about it. I base this on how the system ranks players.
IMO players shouldn't be ranked the way they currently are (explained in my previous post, which I EDITED - would you be so kind as to read the last few paragraphs?).

I'm not going to guess whether it was designed that way (faulty, IMO) on purpose or only by accident (Eso's Elo model + autobalancer, etc.). Either way it doesn't seem to matter, because you (the ladder community) are/seem to be happy with it (not 100%, but with some satisfaction in how it turned out/works).

Urpee
02-05-2011, 01:53 AM
They were directed at Urpee and his mathematical flaws.

My math isn't flawed, but I am learning my lesson. Look, Mike, it's obvious that you don't even know what you are talking about.

Explain what this condition means on the wikipedia page you cited:

1) First check that p is indeed a fixed point:

f(p) = p


That said, this thread invited people who do understand math to analyze and give input. Frankly, all that has come back is hearsay arguments and opposition to any actual analysis people find subjectively unwelcome or contrary to their intuition of what the system is claimed to do. If that's what you want, fine - you have it.

If there are some sane devs out there who want input, they can PM me and we can take this conversation offline.

But no one will be able to help if one cannot even talk about the properties of the system, let alone improvements, without drawing all sorts of uninformed opposition.

mikesol
02-05-2011, 02:15 AM
Urpee -

I have nothing more to say to you, and I would ask you to stop posting here. Your comments are degrading into completely useless posts whose only goal is to insult my intelligence without ever reading what is being presented. We can go back and forth all day saying that each of us is more experienced than the other. I feel I have clearly demonstrated why you are wrong, and you clearly feel you are not. Frankly, I don't care anymore. The people working on this all agree that you are completely mistaken in the points we're interpreting from you. Your points may not necessarily be wrong, but if they aren't, you are presenting them in a way that makes absolutely no sense to anyone involved and is, therefore, useless to us.


Into the wall-

I appreciate your more thought-out posts, but I do think it's rather ignorant to say that people can't back up their claims. Obviously people here feel their claims have been backed up. Personally, I feel I have given numerous examples of where your logic is wrong, but it clearly does not come across that way to you. The people working on this project will take what they will from this and move on. If we need help or further input, we'll continue this discussion via PM. Thanks for your time.

nobodyhome
02-05-2011, 02:16 AM
To address IntoTheWow's point:

Yes, the system as it is right now (where the autobalancer balances the average team ratings to be exactly equal almost 100% of the time) makes it so that each game is worth exactly 25 points, and thus your rating is solely a function of your net wins. To be exact, your rating is equal to 1500 + (net wins * 25).

This is why that doesn't quite matter: because of the autobalancer, not all of your games are equally difficult - the difficulty of your matches actually increases as your rating increases. Due to the autobalancing mechanism, you get paired with worse and worse teammates as your rating rises (assuming the average rating of each ladder game is roughly the same, which is a reasonable assumption). Thus, a 3300-rated player has to work much harder to win each game than a 1500-rated player, which is why being able to win only 50% of your games while rated 1300 does not put you at the same skill level as someone who can win 50% of his games while rated 1500.
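
The teammate effect is easy to see numerically (assuming, as above, a roughly constant 1500 game average):

def teammate_avg_needed(my_rating, game_avg=1500, team_size=5):
    """What your 4 teammates must average so the team still hits game_avg."""
    return (game_avg * team_size - my_rating) / (team_size - 1)

for r in (1500, 2100, 3300):
    print(r, round(teammate_avg_needed(r)))
# 1500 -> 1500, 2100 -> 1350, 3300 -> 1050: the higher you climb,
# the weaker the team you must carry to keep winning 50%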

In the case of tyr (the name ACE.yuyuko commonly goes by), mike is right: he has not converged to his rank at all. This is because tyr rarely plays ladder yet is a very good player - thus he is vastly underrated. If he does begin to play, it will take him a good 50+ games before he settles down to a correct rating and begins winning at 50% (note that I do not mean his overall win rate will be 50%, but rather that his recent win rate will become 50%). Unfortunately, this settling down takes a large number of games - a flaw I have already acknowledged, and one that will be solved by introducing an "uncertainty" variable into the equation.
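
I haven't settled on the exact mechanism, but the usual shape in Glicko/TrueSkill-style systems is: big uncertainty means big rating steps, shrinking as games accumulate. A hypothetical sketch:

def k_factor(games_played, k_min=25, k_max=100, half_life=20):
    """Hypothetical: new/returning players take big rating steps, and the
    step size halves every `half_life` games as confidence grows."""
    return k_min + (k_max - k_min) * 0.5 ** (games_played / half_life)

for g in (0, 20, 50, 200):
    print(g, round(k_factor(g)))   # 100, 62, 38, 25

With something like this, a vastly underrated player like tyr would converge in far fewer than 50+ games.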

nobodyhome
02-05-2011, 02:18 AM
Urpee--I am the only dev of ladder. You probably shouldn't waste time checking your PM box.

Urpee
02-05-2011, 02:45 AM
Nobo, it's cool. I have given plenty of constructive input in this thread. What I got back was a lot of negative judgment. It's clear that it's not appreciated, and I don't need that either, thank you.

Good luck finding someone to help with improving ladder. I hope you'll listen to, rather than judge, the next person.

VipMattMan
02-05-2011, 03:11 AM
IMO players shouldn't be ranked the way they currently are (explained in my previous post, which I EDITED - would you be so kind as to read the last few paragraphs?).

I'm not going to guess whether it was designed that way (faulty, IMO) on purpose or only by accident (Eso's Elo model + autobalancer, etc.). Either way it doesn't seem to matter, because you (the ladder community) are/seem to be happy with it (not 100%, but with some satisfaction in how it turned out/works).

You repeatedly go back to this idea that a player winning 69% of his games at a 2000 rating should be ranked much higher than someone with a 2000 rating and a 50% win rate, based on your idea that ladder doesn't converge.

If ladder works as everyone but you and Urpee believes it does, then why would it make sense for a person who has been playing at a higher rating for a longer period of time - therefore having to carry lower-ranked players - to be punished for having hit the point at which they're appropriately rated? Someone just starting ladder with a lower rating and an equal skill level will converge rapidly at the beginning of their playtime, and at some point begin to win approximately 50% of their games. Their overall win rate wouldn't immediately show as 50%; it trends towards 50% over time.

No one is saying that an individual with a 69% win rate has already converged to the rating associated with his actual skill and should now be winning 50% of his games. From ladder's perspective, it simply assumes his current rating is accurate, whatever that rating is. Ladder doesn't know who he is or where he came from; it just knows his rating and assigns him to a team based on cumulative team rating.

Say every player in a specific game is matched with another player with the exact same rating, and all other players are appropriately rated except for him. If he's better than the player on the other team that had the same rating, then his team is probably going to win.

His team's rating rises and the other team's rating goes down. The players on the other team whose ratings just went down may be underrated now, while every other player who was just on his team may be overrated. Teams get shuffled and they go again. Differently shuffled players may win or lose based on who's on their team. The only commonality is that whatever team he's on nearly always wins, until he hits a point where he can't "carry" the weight of his rating.

You used ACE.Yuyuko as some sort of bizarre anecdotal evidence against something no one was arguing. The reality is that your anecdotal evidence shows exactly what everyone else is saying: underrated players converge to a more accurate rating.

If convergence doesn't exist, then why is he winning so many games? Why did he nearly set the ladder record for a win streak on HIS FIRST DAY OF PLAYING? Remember, he started with the same 1500 rating as all the other players who are supposedly "locked in".

http://64.191.124.60/matchlist.php?id=b9c3b910-04d3-4b5b-8b94-e7c62f1276dd&mode=tbd_5v5&grf=100&sort=played_d

As for the whole 50% issue and why win rate shouldn't have any impact on rating: go through all of ladder and notice a certain propensity for players with more games to have a winning percentage closer to 50% than those with fewer games. More games at a certain rating doesn't = worse player.

I suppose it also needs to be clarified that no one is saying the 50% rule people keep referencing is part of an equation. It's an emergent result that EVERY player gets closer to the more games they play.

Pieface
02-05-2011, 04:24 AM
You used ACE.Yuyuko as some sort of bizarre anecdotal evidence against something no one was arguing. The reality is that your anecdotal evidence shows exactly what everyone else is saying: underrated players converge to a more accurate rating.

If convergence doesn't exist, then why is he winning so many games? Why did he nearly set the ladder record for a win streak on HIS FIRST DAY OF PLAYING? Remember, he started with the same 1500 rating as all the other players who are supposedly "locked in".

http://64.191.124.60/matchlist.php?id=b9c3b910-04d3-4b5b-8b94-e7c62f1276dd&mode=tbd_5v5&grf=100&sort=played_d


While I generally agree with what's been said here, I'd like to point out for the sake of clarity that tyr's first set of games (and those of anyone else who's been around since the start of ladder) do not reflect the current algorithm, as they occurred before the autobalance system was introduced. When ladder originated, teams were generally picked by experienced captains. Points were then assigned based on how balanced the rankings were, which is why many people got only 8-9 points for severely imbalanced games. That system led to a huge number of people only playing games they felt they could win. In general, it's probably best to look at rating trends since the new rating/autobalance algorithm was implemented.

VipMattMan
02-05-2011, 04:44 AM
While I generally agree with what's been said here, I'd like to point out for the sake of clarity that tyr's first set of games (and those of anyone else who's been around since the start of ladder) do not reflect the current algorithm, as they occurred before the autobalance system was introduced. When ladder originated, teams were generally picked by experienced captains. Points were then assigned based on how balanced the rankings were, which is why many people got only 8-9 points for severely imbalanced games. That system led to a huge number of people only playing games they felt they could win. In general, it's probably best to look at rating trends since the new rating/autobalance algorithm was implemented.

Ya, my bad. I looked at the date and for some reason thought 2011 rather than 2010. Regardless, all the general points remain the same.

Most people who were skilled before they joined ladder set large win streaks early in their ladder careers, as convergence occurred. I set my personal ladder streak record in my first few days of ladder, and thousands of games later it still hasn't been touched. I started playing ladder post-algorithm changes.

JDR
02-05-2011, 02:10 PM
I see at least 3 posters in this thread that might be just as bat**** crazy as Jared Loughner.
Sorry, but it needed to be said.

Tekn0
02-07-2011, 12:52 PM
Oh, btw, is the new ladder system (if there is going to be one) going to treat 6-5 losses differently from 6-0 losses?

Right now, losing a game 6-5 and losing one 6-0 makes no difference. Normally a 6-0 should not happen on Ladder because of the balancer, but since (I think) we all agree that the balancing will never be perfect due to the influx of new players (until they reach their true scores), there are occasions where you do get terribly lopsided teams.

I feel the team losing 6-5 should not be as heavily penalized as one losing 6-0.

I don't know if match length can or should be taken into consideration (or how 'match quality' is currently judged). I leave it to the better, more experienced among you to pass judgment on these ideas/suggestions.

ryebone
02-07-2011, 07:00 PM
I feel the team losing 6-5 should not be as heavily penalized as one losing 6-0.


Wait, if one team unfortunately gets a new player who's way overrated and causes them to lose 6-0, you want to punish them MORE?

Ingbo
02-07-2011, 07:04 PM
I think i might read this entire thread just for the fun of it later on, lots of good and interesting points here, keep it up fellers!

Ribilla
02-07-2011, 08:52 PM
Wait, if one team unfortunately gets a new player who's way overrated and causes them to lose 6-0, you want to punish them MORE?

If a team gets a new player and only just loses 6-5, you want to punish them more than a team who gets a new player and cannot even score once?

nobodyhome
02-07-2011, 09:02 PM
Sorry, the new ladder system will not (and should not) take into account any in-game factors when determining ratings. This includes things like the final score of the game. A win is a win: a 6-5 win should not be any less of an achievement than a 6-0 win.

York
02-07-2011, 11:11 PM
Sorry, the new ladder system will not (and should not) take into account any in-game factors when determining ratings. This includes things like the final score of the game. A win is a win: a 6-5 win should not be any less of an achievement than a 6-0 win.

I know nothing about the math you guys are talking about. Way out of my depth.

But anyone fighting to change the way the rating works with respect to a 6-5 loss versus a 6-0 loss is a crazy man.

Even if the teams are 100% balanced, there can still be shut outs.
Even if the teams are 0% balanced, there can still be upsets.

(3) sky
(4) donk
(5) tmic
(6) aya
(10) mikesol

would probably not beat

(1) mled
(33) trendy
(33) pieface
(37) wok3n
(105) york

5 Top 10 players probably wouldn't beat this second team, even though that second team has the worst player ever, YorK.

"YORK STFU UR DUMB"

^ K, no. 5 bomb runners will not beat that second team, even though the second team's players don't have ratings that come even close to the bomb-running team's.

Now, everything I said probably makes no sense. So I will just write my point and be gone:

Teams don't lose 0-6 because ladder balance didn't work; they lose that badly because of ****ty plane composition.

Teams lose 0%-100% in TBD not because they got a bad team - it's because they got 3 biplanes! Then the super-high-rated guy on their team basically just gives up, leading to a god-awful loss.

If you don't want to lose 0-6 or 0%-100%, then GET BETTER and learn to play every plane! If you feel that you are losing 0-6 often and all you play is biplane, then expect to lose that badly some more.

I feel ladder already takes everything into consideration with respect to your usefulness.

I have a rating of 1737 in TBD. Is that great? No. Is it horrible? Not really.

Now look at Sinstar's 1656 rating. He plays a whore randa. Is he bad at what he does? God, no - he is actually the best at what he does. Could he be more useful? Of course, if he learned 4 more planes and learned to bomb run. Could he get his rating to 3000? Yup. Can he do it with his 1-plane setup? Nope.



Learn to play the FULL GAME, and then maybe you won't lose so badly anymore.

Tekn0
02-08-2011, 12:20 AM
Well, firstly, I'm not _REALLY_ complaining. I don't mind if the ladder rankings even stay as they are.

Wait, if one team unfortunately gets a new player who's way overrated and causes them to lose 6-0, you want to punish them MORE?

One player needs to be extremely, extremely bad to cause the entire team to lose 6-0. Yes, I know even one player can bring down the team, but losing 6-0 and blaming it entirely on one bad player is a little escapist, IMO - unless he was causing it deliberately.

The reason I asked this is that if I lose 3 of my past games 6-5, it doesn't seem fair to see another team lose 6-0 or 6-1 and take the same hit to the losing team's rankings. Of course, I'm not well versed in team-based ranking systems and how they work, so I won't refute your claim that it should NOT be taken into consideration.

Tekn0
02-08-2011, 12:26 AM
Learn to play the FULL GAME, and then maybe you won't lose so badly anymore.

Thank you for elaborating, York. Though I've not much experience with TBD ratings, I take it they're somewhat similar (in range/value) to ball.

My gripe was more with the 6-5 losses than the 6-0 losses.

Evan20000
02-08-2011, 12:30 AM
Now look at Sinstar's 1656 rating. He plays a whore randa. Is he bad at what he does? God, no - he is actually the best at what he does. Could he be more useful? Of course, if he learned 4 more planes and learned to bomb run. Could he get his rating to 3000? Yup. Can he do it with his 1-plane setup? Nope.

While I agree with what you're saying, keep in mind that when I decided to retire from ladder, I didn't want to sit on a rank I had earned from (badly) bomb running if I wouldn't be playing anymore, so I started playing other planes and performing about as well as you would expect me to with them until I hit a rank outside of the T50.

XX1
02-08-2011, 05:40 AM
While I agree with what you're saying, keep in mind that when I decided to retire from ladder, I didn't want to sit on a rank I had earned from (badly) bomb running if I wouldn't be playing anymore, so I started playing other planes and performing about as well as you would expect me to with them until I hit a rank outside of the T50.

sorry for the interruption, but what is T50? :\ top 50 :D?

elxir
02-08-2011, 06:01 AM
sinnypants was #2 overall when he cared lol

York
02-08-2011, 01:17 PM
Thank you for elaborating, York. Though I've not much experience with TBD ratings, I take it they're somewhat similar (in range/value) to ball.

My gripe was more with the 6-5 losses than the 6-0 losses.

Same difference

While I agree with what you're saying, keep in mind that when I decided to retire from ladder, I didn't want to sit on a rank I had earned from (badly) bomb running if I wouldn't be playing anymore, so I started playing other planes and performing about as well as you would expect me to with them until I hit a rank outside of the T50.

I said it with love, I just had to use you as an example, since you are the best at what you do, whore.

Tekn0
02-08-2011, 01:26 PM
Same difference

Hmm, it seems you didn't get it. Never mind.

Tekn0
02-08-2011, 01:40 PM
If a team gets a new player and only just loses 6-5, you want to punish them more than a team who gets a new player and cannot even score once?

That's my question too.

I'm NOT pushing for a change to include final scores in the rankings.

Now that that's clearly out of the way, can someone explain to me (and Ribilla, perhaps) what is quoted above?

"A win is a win" <-- No I don't think so, ranking is like grading performance, which means it's not ALL or nothing. IMO ranking should take your "effort" into consideration. Or no? (Honest question really).

A 6-5 loss could mean a fluke last goal - a "could have gone either way" situation - but the chances of 6 fluke goals in a 6-0 game are an order of magnitude more improbable.

York said ****ty plane composition is why they lose 6-0, and that it has nothing to do with badly balanced teams... Maybe... but that still does not answer WHY they should not be penalized more for it. People who refuse to "play for the team", get selfish with their setups, and lose badly -should- lose more points...

Can someone explain why teams losing 6-5 lose as many points as a team losing 6-0?

I don't know if I'm being annoying, and I'm not doing this to piss anyone off; I'm really just curious and honestly want to know your reasoning behind this.

blln4lyf
02-08-2011, 03:58 PM
Now look at Sinstar's 1656 rating. He plays a whore randa. Is he bad at what he does? God, no - he is actually the best at what he does. Could he be more useful? Of course, if he learned 4 more planes and learned to bomb run. Could he get his rating to 3000? Yup. Can he do it with his 1-plane setup? Nope.



Learn to play the FULL GAME, and then maybe you won't lose so badly anymore.

If sin stopped whoring and played the ideal way for ladder(in his words its "babying" his team) he'd be top 5. Knowing other planes help but you can absolutely be #1 playing only 1 plane.

VipMattMan
02-08-2011, 04:06 PM
If you're only going to gain or lose a significantly reduced portion of your ~25 rating that you usually get for a game, all of a sudden those 6-5 games become a lot less intense.

The all or nothing method allows a lot more rating variation which keeps rankings from stagnating too much. That adds to the fun, and it lets people get to their approximate rating faster.

Sometimes those 6-5 games aren't as close as they seem either. Last night we had a game where the other team went up 4-0 while our team struggled to find a good composition. The second we found the right composition we went on a 6-1 run and won the game.

If that game had been allowed to run for another 10 minutes, we might have doubled the other team's score. Then again, the other team might have restructured their composition and started to dominate us again. That's just one of those things you can't really know. All you know is that someone has to win, and you have 6 goals to figure out what you have to do to get it right.

In the end, losing 25 rating isn't going to blow you out of the water.

Ribilla
02-08-2011, 05:30 PM
That's my question too.

I'm NOT pushing for a change to include final scores in the rankings.

Now that that's clearly out of the way, can someone explain to me (and Ribilla, perhaps) what is quoted above?

"A win is a win" <-- No, I don't think so. Ranking is like grading performance, which means it's not ALL or nothing. IMO, ranking should take your "effort" into consideration. Or no? (Honest question, really.)

A 6-5 loss could mean a fluke last goal - a "could have gone either way" situation - but the chances of 6 fluke goals in a 6-0 game are an order of magnitude more improbable.

York said ****ty plane composition is why they lose 6-0, and that it has nothing to do with badly balanced teams... Maybe... but that still does not answer WHY they should not be penalized more for it. People who refuse to "play for the team", get selfish with their setups, and lose badly -should- lose more points...

Can someone explain why teams losing 6-5 lose as many points as a team losing 6-0?

I don't know if I'm being annoying, and I'm not doing this to piss anyone off; I'm really just curious and honestly want to know your reasoning behind this.

I was just making a point in response to someone's stupid post. I think that maybe there should be some discrimination between close games and blowouts, but it's not a priority. The whole thing is far too messy to even contemplate before everything else is perfect. Further, plane composition is such an issue here that we would have to factor it into every game, close or not, complicating everything further.

Something to think about, but let's not run before we can walk.

banana
02-08-2011, 06:24 PM
Call me for ladder season 3, when I'll have finished a final-year mathematical modelling module in a few months' time.

ryebone
02-08-2011, 07:14 PM
If a team gets a new player and only just loses 6-5, you want to punish them more than a team who gets a new player and cannot even score once?

They're not being punished more; both teams lose the same number of points for a loss. At the end of the day, a win is a win and a loss is a loss. The Steelers don't get half of a Super Bowl ring because they almost had a comeback win.

vintage
02-08-2011, 09:28 PM
"A win is a win" <-- No I don't think so, ranking is like grading performance, which means it's not ALL or nothing. IMO ranking should take your "effort" into consideration. Or no? (Honest question really).

I don't think ratings should ever take in-game factors into consideration, mostly because it will cause people to take actions that positively affect their rating but don't necessarily positively affect their chance of winning.

This is why I'm all for Nipple's idea of not showing players' scores, so that people won't have as strong an incentive to ratio-whore.

For instance, you might have the chance to bomb the opponent's base with a counter-attack to win the game. However, knowing that they will bomb your base and you won't get as many points, you might choose to defend instead. Basically, you would be giving up the guaranteed win for a chance at a higher rank.

As for ball: once a team is down 3-0 in a very even match-up, they are unlikely to win. Therefore, they have an incentive to take big risks, much like a hockey team that pulls its goalie. They don't care if they lose by a little or by a lot, but a goal that sends them to overtime would be huge. You could also liken this to an option that is out of the money: the more you increase volatility, the more likely you are to get back into the money. The downside is irrelevant, because losing is losing. However, if ranking took the difference in score into account, then people would have an incentive to "play it safe". Why? Because big risks could just as easily take you from 3-0 down to 6-0 as from 3-0 down to a 6-5 win. So your upside is small and your downside is large.

Does this make sense?

The second problem (the one I'd like to discuss now) is that we are taking ELO and forcing an adaptation of it for team games. Notice how in the formula description above, nowhere in the entire thing is the concept of an individual player even mentioned. This is a reflection of the nature of how we are ranking things. Basically, the only way we can test skill in Altitude is by gathering two teams and pitting them against each other. Consider each game to be a "test", and the output of this test is either "team1 wins" or "team2 wins". Now if, say, team1 wins, then this is a datapoint from which we can gather that team1 played better than team2 in this particular test (this game) and this test only. We would like to reflect this result in the ratings themselves, so we decide that team1 should gain some points and team2 should lose some points.

However, here's where it gets fuzzy: we decide that team1 as a whole has played better than team1's rating. Here we have defined team1's rating to be "the average of the ratings of team1's players", but this is not necessarily the right definition. Because of things like synergy, a team consisting of five players rated 2000 may not necessarily be just as good as another team consisting of five players rated 2000 (plane composition comes to mind here). How do we take a team composed of five individual player ratings and use that to form a composite "team rating"?

Furthermore, in our current system we assume that if team1 beat team2, meaning team1 played better than its aggregate rating, then each of team1's players played better than their individual ratings. We thus reward each player in team1 with an equal number of points. This is also not necessarily true - it may be that players A, B, and C in team1 played better than their ratings while players D and E played worse than theirs. Without looking at actual in-game factors (individual kills/deaths, bomb hits, etc.), is there a better way to determine the distribution of points to the winners other than just "everybody gets the same"?

First, I know you said you don't want to discuss the implementation of K, so I won't. I will, however, offer an idea for dealing with newer players that has nothing to do with K. The idea is similar (or identical) to what you find on many chess sites: new players are unranked for their first X games. This allows teams to be formed by an autobalance algorithm that assumes the new player's ranking is wrong. I haven't yet thought about how that algorithm would look, but I wanted to toss the idea out there and see if it was even being considered.

Second, you asked if there was a better way to distribute points to the winners other than equally. I may be missing the point, but wouldn't K do this? Not everyone on the team will have the same number of games played, the same winning streak, etc. Therefore, they would all receive different changes in rank.

Third, you asked about composing a "team rank" from individual ranks. This is the question that interests me most, and I don't have an answer, just some thoughts. Currently you use an average, which I think is flawed to some extent. For example: if you had a very good player ranked 2400 and a guy who was effectively useless ranked 0 against two so-so players ranked 1200, would the teams be even? I really doubt it. I think that 2-on-1 would be such a large advantage that it would overshadow the difference in skill. Thus, 2400 + 0 < 1200 + 1200. Perhaps you could use an algorithm that takes the spread of skills into account--sort of like an OLS. Thoughts?

Alternatively, is it really that important for the teams to be ranked evenly? Chess doesn't restrict the opponents to be the same skill level. If you were to allow for somewhat unbalanced games then it might more quickly show which players were ranked incorrectly.

nobodyhome
02-08-2011, 10:47 PM
First, I know you said you don't want to discuss the implementation of K, so I won't. I will, however, give an idea for helping deal with newer players that has nothing to do with K. The idea is similar (or identical) to what you find on many chess sites: new players are unranked for their first X games. This allows the formation of teams by an autobalance algorithm that assumes the new player's ranking is wrong. I haven't yet thought about how that algorithm would look, but I wanted to toss the idea out there and see if it was even being considered.

This hasn't been thought out much yet, but let's explore your idea a bit further. If a new player is unranked for their first X games, how would they be represented in the rating system (what number would be put into the ratings formula to generate the points to be distributed)? How would the autobalance algorithm take these players into account?

Second, you asked if there was a better way to distribute points to the winners other than equally. I may be missing the point, but wouldn't K do this? Not everyone on the team will have the same number of games played, the same winning streak, etc. Therefore, they would all receive different changes in rank.

True, but I was wondering more about how the points would be distributed pre-K. Then again, the pre-K value is supposed to be a representation of the probability of winning, so changing it probably shouldn't be done.

Third, you asked about composing a "team rank" from individual ranks. This is the question that interests me most, and I don't have an answer, just some thoughts. Currently you use an average, which I think is flawed to some extent. For example: if you had a very good player ranked 2400 and a guy who was effectively useless ranked 0 against two so-so players ranked 1200, would the teams be even? I really doubt it. I think that 2-on-1 would be such a large advantage that it would overshadow the difference in skill. Thus, 2400 + 0 < 1200 + 1200. Perhaps you could use an algorithm that takes the spread of skills into account--sort of like an OLS. Thoughts?

Yes, we were trying to come up with a better way to represent "team rank" from individual ranks besides just using averages when we first wrote this system in the early weeks of ladder. Eso polled several players, asking whether a 2400/1800/1800/1800/1200 team would beat an all-1800 team, and tried to come up with a formula that included variance and not just averages. However, there was such a wide variety of opinion as to whether a high-variance team would be better than a low-variance team that we decided to just go with the average. I'm not sure what OLS is, but if you can come up with a good system for this, let me know.

Alternatively, is it really that important for the teams to be ranked evenly? Chess doesn't restrict the opponents to be the same skill level. If you were to allow for somewhat unbalanced games then it might more quickly show which players were ranked incorrectly.

No, it is not important for teams to be ranked 100% evenly. The rating system should work with all varieties of matchups; the balancer only exists to make games roughly even (and thus more challenging and fun). This is indeed a problem I've been thinking about--currently, the highest-ranked players are not necessarily the best players at Altitude, but the best at overcoming the handicap of having the worst teammates. This is why you see an overrepresentation of bomb runners in the top tbd ranks and an overrepresentation of scorers in the top ball ranks (very few support players). I am considering loosening the balance criteria and opting for a simpler system in which the players are sorted by rating and then balancing the teams by 1, 4, 5, 8, 9, 12 vs 2, 3, 6, 7, 10, 11.
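
In rough Python terms (my own sketch, not actual ladder code), that sort-and-alternate balancer would look something like this:

def balance(players):
    # players: list of (name, rating); sort best-first
    ranked = sorted(players, key=lambda p: p[1], reverse=True)
    team1, team2 = [], []
    for i, p in enumerate(ranked):
        # 0-based index i produces the pattern 1,4,5,8,9,12,... -> team1
        # and 2,3,6,7,10,11,... -> team2
        if i % 4 in (0, 3):
            team1.append(p)
        else:
            team2.append(p)
    return team1, team2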

Boko
02-09-2011, 03:02 AM
I'll bet all of you 10 bucks that the new system, whatever you might come up with in this topic, will still be ****ty. It's just not gonna work and the qq will always ensue.

vintage
02-09-2011, 03:52 AM
This hasn't been thought out much yet, but let's explore your idea a bit further. If a new player is unranked for their first X games, how would they be represented in the rating system (what number would be put into the ratings formula to generate the points to be distributed)? How would the autobalance algorithm take these players into account?

Like I said, I haven't thought this out.

It would make sense to have their actual ranking start at 1500 and then adjust over their first X games as it does now--it just would not be displayed on the ladder page and not be used in the balancing algorithm.

For the balancing algorithm, the first idea that pops into my head would be to assume they are the worst person playing and then pit them against the 2nd worst person in the server. For the actual numbers used in the algorithm, you could assign them the same ranking as the 2nd worst person in the server.
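
A minimal sketch of what I mean (the function name and the 1500 fallback for near-empty servers are placeholders of mine):

def balancing_rating(server_ratings):
    # For balancing purposes, treat the unranked newcomer as being
    # roughly as good as the 2nd-worst ranked player in the server
    ranked = sorted(server_ratings)
    return ranked[1] if len(ranked) >= 2 else 1500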

I think this would work well if there were a representative sample of players in the server. However, if everyone playing was 2000+, I guess this would be pretty ****ty.

I'll think about it more and get back to you. If anyone else has any thoughts please feel free to jump in.

True, but I was wondering more about how the points would be distributed pre-K. Then again, the pre-K value is supposed to be a representation of the probability of winning, so changing it probably shouldn't be done.

Okie doke. I don't think you should change the pre-K values, so I have nothing to add here.

Yes, we were trying to come up with a better way to represent "team rank" from individual ranks besides just using averages when we first wrote this system in the early weeks of ladder. Eso polled several players, asking whether a 2400/1800/1800/1800/1200 team would beat an all-1800 team, and tried to come up with a formula that included variance and not just averages. However, there was such a wide variety of opinion as to whether a high-variance team would be better than a low-variance team that we decided to just go with the average. I'm not sure what OLS is, but if you can come up with a good system for this, let me know.

Interesting.

OLS = Ordinary Least Squares, which I admit was a pretty terrible way to describe what I was thinking about. I was thinking of basically the same thing Eso was: variance. Sort of. The problem with variance is that VAR(1200,1800,1800,1800,1800) = VAR(2400,1800,1800,1800,1800), because variance is the squared deviation from the mean. Instead, it would have to be something like (x[i] - avg(x))*abs(x[i] - avg(x)). So, as for Eso's question, I think variance is bad, and I would have said that the all-1800 team would win more often. However, it doesn't really matter: whether you think high variance is good or bad, if the teams have the same "variance" as I wrote it above, your preference is irrelevant.
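
In code form, the measure I have in mind would be something like this (my own sketch):

def signed_spread(ratings):
    # Like variance, but each squared deviation keeps its sign, so a
    # 2400 outlier and a 1200 outlier no longer look identical
    avg = sum(ratings) / len(ratings)
    return sum((r - avg) * abs(r - avg) for r in ratings)

# Plain variance cannot tell these two teams apart, but:
# signed_spread([1200, 1800, 1800, 1800, 1800]) is negative (dragged down)
# signed_spread([2400, 1800, 1800, 1800, 1800]) is positive (pulled up)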

I wrote a Monte Carlo simulation to see what would happen if you created teams by minimizing the difference in "variance". On average, the difference in average rankings was roughly 94, and in 10,000 simulations the maximum difference in average rankings was roughly 400.

If you're curious I can send you the Excel file.
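
For reference, a rough Python re-creation of that kind of simulation (it reuses signed_spread from the sketch above; the normal(1500, 300) rating pool and 6v6 teams are assumptions of mine, so the exact numbers will differ from my Excel run):

import random
from itertools import combinations

def best_split(pool):
    # Enumerate all 6v6 splits of a 12-player pool and keep the one
    # that minimizes the difference in signed_spread between the teams
    best, best_diff = None, float("inf")
    for team1 in combinations(pool, 6):
        team2 = list(pool)
        for r in team1:
            team2.remove(r)
        diff = abs(signed_spread(list(team1)) - signed_spread(team2))
        if diff < best_diff:
            best, best_diff = (list(team1), team2), diff
    return best

def run_simulation(trials=10000):
    # Record the gap in average rating that the spread-based split leaves
    gaps = []
    for _ in range(trials):
        pool = [random.gauss(1500, 300) for _ in range(12)]
        team1, team2 = best_split(pool)
        gaps.append(abs(sum(team1) / 6 - sum(team2) / 6))
    return sum(gaps) / len(gaps), max(gaps)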

No, it is not important for teams to be ranked 100% evenly. The rating system should work with all varieties of matchups; the balancer only exists to make games roughly even (and thus more challenging and fun). This is indeed a problem I've been thinking about--currently, the highest-ranked players are not necessarily the best players at Altitude, but the best at overcoming the handicap of having the worst teammates. This is why you see an overrepresentation of bomb runners in the top tbd ranks and an overrepresentation of scorers in the top ball ranks (very few support players). I am considering loosening the balance criteria and opting for a simpler system in which the players are sorted by rating and then balancing the teams by 1, 4, 5, 8, 9, 12 vs 2, 3, 6, 7, 10, 11.

I was actually thinking about what would happen if match-ups were random. :p

Obviously there would be games that were just blowouts. However, in those games players' ranks wouldn't increase or decrease much, but if the underdog pulled off a win, then ranks would change by a lot. I can see some benefits to this, but the costs probably outweigh the gains. So, just loosening the criteria is probably a better first step. I don't know if you've run any tests to see how much your suggested method might change the difference in team averages, but I'd like to see how it compares to the method I proposed above.

Rainmaker
02-09-2011, 05:15 AM
I am considering loosening the balance criteria and opting for a simpler system in which the players are sorted by rating and then balancing the teams by 1, 4, 5, 8, 9, 12 vs 2, 3, 6, 7, 10, 11.
I've toyed with this idea myself since you posted the balancing code.

It works pretty well, though you might want to add some extra code, something like:

avg = lambda team: sum(p.rating for p in team) / len(team)
# player5/player6 = the 5th- and 6th-ranked players overall
if avg(team1) > avg(team2):
    team1.remove(player5); team2.append(player5)
    team2.remove(player6); team1.append(player6)

This is because sorting this way sometimes leaves a positive or negative difference in the teams' average ratings.

elxir
02-09-2011, 06:01 AM
I'll bet all of you 10 bucks that the new system, whatever you might come up with in this topic, will still be ****ty. It's just not gonna work and the qq will always ensue.

this...current system is great except when one team has an absolute **** player

nobodyhome
02-09-2011, 07:29 AM
I've toyed with this idea myself since you posted the balancing code.

It works pretty well, though you might want to add some extra code, something like:

avg = lambda team: sum(p.rating for p in team) / len(team)
# player5/player6 = the 5th- and 6th-ranked players overall
if avg(team1) > avg(team2):
    team1.remove(player5); team2.append(player5)
    team2.remove(player6); team1.append(player6)

This is because sorting this way sometimes leaves a positive or negative difference in the teams' average ratings.


What do you mean? Allowing a nonzero difference in average rating is the entire point of switching to the balancing system I proposed. If you wanted to minimize the average rating difference, you'd just keep the system I already have.

Evan20000
02-09-2011, 08:04 AM
this...current system is great except when one team has an absolute **** player

There is no system that can both compensate for someone so bad that a match essentially becomes 4v5/5v6 and still let anyone who wants to play ladder play.

Tekn0
02-09-2011, 09:50 AM
VipMattMan and vintage, thanks a lot for the explanation of why 6-5 and 6-0 losses are viewed equally.

I can now see your argument and reasoning for it, thanks. I think it is correct, but the frustration of losing 6-5 in a 20-minute match lives on :D

Duck Duck Pwn
02-09-2011, 05:43 PM
We could award points retroactively--after the fact--when trying to balance inexperienced players. After X games, when we have a better idea of the inexperienced player's rank, we can award points according to what their rank really was. Games will be unbalanced, but they already were unbalanced. This would just mean teams win fewer points when they were more heavily favored and more points when they were less favored.

However, I am not pushing for this in general, because players get better over time. Winning with a 1200 player today and seeing them become a 3200 player a month later shouldn't affect the points you earned back then. But I think it would make some sense for points to be handled after the fact when dealing with newer players. If we do it this way, the problems we face seem to be the following.

a. What value of X would we use?
b. How would we decide to place this new player after a set number of games?
c. How do we program ladder to check what this person's ranking is after X games?
d. What if this person stops playing ladder and cannot be accurately modeled?

a is largely arbitrary; for b I provide food for thought below; c is a programming issue that I can't deal with; and d is a problem already inherent in ladder.

I think a possible solution is to assume that new players in ladder sit at various ranks between, say, 500-1500 (or perhaps 500-2500 for the sake of a 1500 midpoint, although massive butthurt will ensue if a noob gets 2500), and then, after a set number of games, to award them a ranking in line with their performance. Perhaps structure the placement like a binary search: the first game at 1500, the second at 2000 or 1000, the third at 2250/1750 or 1250/750, and so on for as many iterations as necessary, depending on previous wins and losses (see the sketch below). After their base rank has been set, points can be awarded. It's not the end of the world, considering that ladder is already programmed to give more or fewer points in more uneven games.
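
In rough code terms, that placement series is just a binary search (the 500-2500 bounds follow my suggestion above; the function itself is made up for illustration):

def placement_rating(results, lo=500, hi=2500):
    # Start at the midpoint, move up after a win and down after a loss,
    # halving the step each game: 1500 -> 2000/1000 -> 2250/1750/1250/750...
    est = (lo + hi) / 2
    step = (hi - lo) / 4
    for won in results:
        est += step if won else -step
        step /= 2
    return est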

Sure, it's flawed, especially if this player wins their first game when they probably shouldn't have. But given the player's likely contribution that isn't a very probable problem, nor is it something ladder doesn't already see anyway (overrated players winning).

In the case that this person just stops playing ladder, their point value can simply be assumed to be whatever level ladder had estimated them at. While imperfect, especially if this player wins their first game and throws everything for a loop, it would still be fairly effective, I think. It may be unfair to base a ladder ranking so heavily on the first game, so perhaps the system I provided is not the ideal one, but I feel that the assumption of differing values could be effective in trying to place them on the ladder.

Thoughts/criticisms/etc are appreciated.

Pieface
02-09-2011, 06:26 PM
Not sure how this would work, but how about something like this:

Every player new to ladder would receive a "provisional" rank that is not initially displayed on the master ranking list (Yahoo has a similar method). For balancing purposes, these provisional players would receive a rating of X (you could adjust X based on what you think the average newcomer's skill is--probably somewhere in the 1000-1500 range). Although they technically have this rating, they are still not displayed on the ranking list until they complete a certain number of games (say 20).

For the time that a player is ranked as "provisional", each team playing in the same match as them would earn fewer points for a win and lose fewer points for a loss--hopefully accounting for the fact that the player is not yet correctly rated. The "provisional" player's rating, however, would fluctuate greatly during this time. Using whatever method you deem appropriate (larger rating fluctuations, variable uncertainty for skill, etc.), this period would be the time in which the player's rating settles at its correct value.

By the time the player completed 20-30 games, a good system would have their rating figured about right--they could then be automatically added to the ranking list and their "provisional" status removed. As the player nears the end of their provisional status, their games should affect their own rating less (reflecting the settling of their ranking) and their teammates' ratings more, until both arrive back at the normal +/-25.

Obviously we'd still need to come up with something that makes new players' scores converge to their true values more quickly than they do currently. However, this system might help mitigate the huge gameplay differences we currently experience when new players join ladder. If a new player is assumed not to have reached their definite rank and the system awards points to reflect that fact, they should be able to reach their true rating pretty quickly without hugely affecting those who have already settled.

I'd still recommend some sort of rating deflation for players who haven't participated in ladder for a while. That should prevent them from sitting on a high rank, or from leaving the game and coming back months later extraordinarily overrated. Maybe if someone's rating falls too much as a result of inactivity they could be automatically reassigned to the "provisional" class, so that when they return the system takes into account that their skill has probably changed since they last played and needs some time to readjust without affecting others? You could give them the benefit of the doubt and allow their "provisional" rating to begin at the same value as their former one.

Hope that made sense - if I didn't make myself clear let me know and I can try to explain better. I'll add a couple of example rating graphs later for how I envision this would work to better support the idea.

Pieface
02-09-2011, 10:09 PM
For clarification, this is what I meant (with pictures):

New players should have to play a certain number of games before they are assumed to be correctly rated. When they are still in their "provisional" state, they should not be ranked on the ladder website in the normal category and anyone playing in a game with them should not be able to win/lose as many points. After a certain point, the "provisional" status should be removed and normal endgame behavior can continue.

It would have to be decided whether the provisional status should be removed after a fixed number of games or once the player's rating has stopped fluctuating rapidly. I personally prefer the latter, as it allows for the possibility that it takes more than a specified number of games to settle into the general range where you belong. To this end, I'd recommend scaling down the amount a provisional player's rating can fluctuate as they play more games, while scaling the number of points correctly rated players have at stake back up. Eventually both numbers should converge to 25 and the "provisional" status should be removed. At this point, the player is assumed to be in the correct range of scores and can be ranked on the website without much consequence.
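
As a very rough sketch of the convergence I mean (every specific number here--the starting K of 100, the reduced teammate stake of 10, the 20-game window--is a placeholder of mine, not a recommendation):

def provisional_k(games, k_start=100, k_normal=25, window=20):
    # The newcomer's K eases down from a high value to the normal one
    t = min(games, window) / window
    return k_start + (k_normal - k_start) * t

def teammate_k(games, k_reduced=10, k_normal=25, window=20):
    # Settled players risk fewer points while the provisional player
    # is still being located, ramping back up to the normal stake
    t = min(games, window) / window
    return k_reduced + (k_normal - k_reduced) * t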

I ran some example numbers to get a visual representation of what I'm suggesting. I assumed all players start at a correctly placed rating of 1800, with the newcomer starting at 1500. You can see that for the first few games the new player's score fluctuates wildly while the others' scores change only a small amount. As more games are played and the new player's score is assumed to be getting more accurate, their rating changes less, and the others return to winning or losing similar amounts to before.

Here's what my predicted rating behavior is for an underrated player:
http://farm2.static.flickr.com/1388/5432112254_a1b7ea73c3_z.jpg

And an overrated player:
http://farm5.static.flickr.com/4153/5432112250_9e8bbdb354_z.jpg

Note that I assume the player gets the same team for each game, which obviously won't be the case. If the teams are composed differently, I expect the rating shift of correctly rated players to be affected even less while the "provisional" player's rating range is being located by the system. Also remember that this is only a proposed solution for addressing the problem of newcomers to ladder; it does not take into account what should happen after everyone is correctly placed.

Rainmaker
02-10-2011, 01:59 AM
Hmm, interesting, Pieface!
I proposed something similar to what you are saying (even with the same numbers for a first approach to the rating): 20 games at double the K of other players.
So if a "regular" player is getting 25 points per match, newcomers would get 50.
That slight change in the formula alone would stabilize the ranking early on.

Another matter is deciding on the K factor.
I suggested that everyone's first 20 games be calculated with a high K.
Why? Because the rating becomes more accurate the longer you keep playing: the more data you feed in, the better it works.
This is what many were complaining about: you can't climb the leaderboard.
The first matches will make you lose/win a lot of points, ensuring a rough first approximation of your rating.
After that, K should be lowered gradually to some value (and this is just a judgment call) that still reflects skill improvement.
We could even make K diminish slightly as you play more and more games. Toying with that idea, I sketched some rough numbers:

I've tried other formulas, but a variable K' of the form:

K' = (0.999^x) * K

where K is the constant K (the current one is 50) and x is the number of games played, seems to be a good one.
After 500 games, you win/lose 60% of the original.
After 693 games, you win/lose 50% of the original.
After that, I think keeping K' = 0.5K would be reasonable.
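
A quick sanity check of those numbers (a throwaway snippet of mine, using the K = 50 from above):

K = 50
for x in (0, 100, 500, 693):
    print(x, round(0.999 ** x * K, 1))
# prints: 0 50.0 / 100 45.2 / 500 30.3 / 693 25.0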

If you want ratings to be "guessed" faster, make K variable.
There are many ways, but the principal ideas would be:

For the first X games, K is extremely large, so you may gain/lose 100 points.

Then make K gradually smaller as you play more games.
For example (just a rough approximation):

K' = (1/1.001^x) * K, x being the number of games played

Example:
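For a few values of x (my own arithmetic from the formula above):

x = 100 games: K' = 0.905 K
x = 400 games: K' = 0.670 K
x = 693 games: K' = 0.500 K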


After 400~600 games played, I would pick a constant K' = K/2 (instead of 24 points, you only gain/lose 12).
On the good side, people in the high ranks won't be heavily affected if they keep losing because autobalance pairs them up with all-1000-rated people.
On the bad side, after ~700 games you only gain/lose 12 points, making it harder to climb up or down the board.
I mean, if a player suddenly gets better because of a patch or their training, it won't be perceived--unless we add the "inflated K due to streaks".

I agree with changing K due to streaks (add a factor which grows with the streak), i.e.:
K' = 1.2^(n-1) * K
the C factor being 1.2^(n-1),
where n is the number of consecutive games won or lost.
For 2 games: K' = 1.2*K (a 20% increase)
For 3 games: K' = 1.44*K (a 44% increase)
For 10 games: K' = 5.15*K (more than a 400% increase)

A player who wins 5 games in a row currently only adds:
5*24 = 120 points

Using the C factor:
1st: 24 (C = 1)
2nd: 29 (C = 1.2)
3rd: 35 (C = 1.44)
4th: 41 (C = 1.73)
5th: 50 (C = 2.07)
total: 179 points (the 120 base from before, plus a 59-point streak bonus)

The same happens the other way: the more you lose consecutively, the faster you fall down the ranking.
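
Putting the two ideas together (games-played decay times streak factor), a rough sketch--combining them multiplicatively is just my reading of the proposal:

def k_factor(k_base, games, streak):
    # Decay with games played, floored at K/2 as suggested above
    decay = max(0.999 ** games, 0.5)
    # Streak bonus: C = 1.2^(n-1), n = consecutive wins or losses (n >= 1)
    c = 1.2 ** (streak - 1)
    return k_base * decay * c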