Altitude Game: Forums  


  #41  
Old 02-02-2011, 06:44 PM
Rainmaker
Senior Member
Join Date: Jan 2011
Posts: 108

Quote:
Originally Posted by andy View Post
The system is built to give you a 50% winrate when you reach your rating.
Yes and no.
That holds WHEN you reach your ideal rating (which is what ELO is meant to estimate from a large sample of games): if you then play against someone with your same rating, you would have a 50% win rate.
When you play against an arbitrary player, the system calculates that probability by itself.
How? That's the value E.

E = 1 / {1 + 10^[((opponent rating) - (my rating)) / 400]}

What's that 400? In chess, a difference of 200 points means the stronger player has a probability of about 0.76 of winning (and the weaker one, of losing).
As ratings are assumed to follow a fixed distribution, you have to consider this 200-point difference both ways:
someone 200 points above you, and someone 200 points below you.

If we use the formula for E with a rating difference of -200 (I am rated 1900, my opponent 1700), my probability of winning is:

E = 1 / {1 + 10^[(1700-1900)/400]}
E = 1 / [1 + 10^(-0.5)]
E = 1 / [1 + 0.31623]
E = 0.7597

I have a 0.76 probability of winning, and 0.24 of losing. (Actually there is also some chance of a draw.)
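For the curious, the expected-score formula above is a one-liner in Python (a sketch; the function name is mine, not from the ladder code):

```python
# Elo expected score, as given in the post: E = 1 / (1 + 10^((opp - me)/400)).

def expected_score(my_rating, opponent_rating):
    """Probability that the player with my_rating wins (draws ignored)."""
    return 1 / (1 + 10 ** ((opponent_rating - my_rating) / 400))

# The example above: a 1900 facing a 1700.
print(round(expected_score(1900, 1700), 4))  # 0.7597
print(round(expected_score(1700, 1900), 4))  # 0.2403
```

Note the two probabilities sum to 1, which is why a single E covers both players.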

I tried to explain this to nobodyhome:
you have to stop forcing false data into the system; that's what's screwing with the ELO system.
The auto-balancer embodies the assumption (by the organizer / community / etc.) that equal average ratings mean a fair match (a team averaging 1700 against another averaging 1700). If each team has the same average rating, the match is deemed "fair", and the win probability calculated by ELO is 50%:

E = 1 / {1 + 10^[((team 1 avg rating) - (team 2 avg rating)) / 400]}
E = 1 / [1 + 10^(0/400)]
E = 1 / [1 + 1] = 0.5

So here is one thing that was messing up ELO. This is why a 1500-rated player in a 1700-rated game would earn 24 points, and a 3000-rated player in a 1700-rated match would earn 24 points as well. Wait, what?
That's wrong!

Now, ELO is designed to reward players who challenge (and beat) higher-rated opponents, because winning such games means their measured rating is an underestimate.
So ELO's rule for point rewards is:

new rating - old rating = K * (S - E)

S = 1 if you win, 0 if you lose
E = expected score (the win probability above)
K = a scaling factor.

K is an arbitrary number. You could set it to 500, to 2, or to 50 (the current value).
A higher K means bigger fluctuations: players lose or gain many points in only a few matches, so they never settle at one rating.
A high K is useful for quickly obtaining a first approximation of a player's "real" rating.
A lower K means a steadier rating: each game affects it only slightly, so you need a lot of games (won or lost) to move your rating.
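The reward rule above can be sketched as code (a sketch under the post's definitions; the names are mine):

```python
def expected_score(my_rating, opponent_rating):
    # E from earlier in the post
    return 1 / (1 + 10 ** ((opponent_rating - my_rating) / 400))

def update_rating(old_rating, opponent_rating, won, k=50):
    """new - old = K * (S - E); a larger K means a bigger swing per game."""
    s = 1 if won else 0
    return old_rating + k * (s - expected_score(old_rating, opponent_rating))

# An even match: the winner moves up by K/2 points.
print(update_rating(1500, 1500, won=True, k=50))  # 1525.0
print(update_rating(1500, 1500, won=True, k=15))  # 1507.5
```

The two prints show the K trade-off directly: same game, same result, but K = 50 moves the rating more than three times as far as K = 15.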




Quote:
This is a team game. If you're ranked 3000 and the other players are ranked 1000 or 2000, you will get more 1000-rated players on your team while the other team gets more 2000-rated players; by no means will you win more easily just because you play against lower-ranked opponents, since the teams will always be balanced. You will only start winning more if you improve.
This is what I mean by force-feeding false data to the ELO system.

Let me try to explain this again: YOU ARE NOT SUPPOSED TO WIN HALF OF YOUR GAMES. That would be an idealized scenario where everyone is equally skilled and equally rated.
You are supposed to have a 0.5 win ratio only when you play against players rated the same as you. We were forcing exactly that outcome onto the system through team balancing.
I agree with Esotheric on that point: this system was designed essentially for 1v1; it's really hard to make it work for a team-based game.
How do you reduce 5 people to one rating?
That's one of the problems we are having.
We assume that the average rating of a team is a true representation of that team's skill (the sum of the parts). This is what most such systems are criticized for: team dynamics.


I agree with Usurpers that the current system is flawed (in a very bad way).
Just that slight change in the formula would stabilize the rankings, for a start.

On another matter, we have to decide on the K factor.
I suggested that everyone's first 20 games be calculated with a high K.
Why? The rating becomes more accurate the longer you keep playing: the more data you feed in, the better it works.
This is what many were complaining about: you can't climb the leaderboard.
The first matches would make you lose/win a lot of points, giving a rough first approximation of your rating.
After that, K should be lowered gradually to some value (and this is just a judgment call) that still reflects skill improvement.
For example:
K = 50
A player rated 1500, playing in a 1700-rated game,
will gain 38 points if his team wins,
will lose 12 points if his team loses.

How does that work?
The player is rated below the average; he sits right at that 200-point border (0.24 to win, 0.76 to lose). So if this player (well, his team) wins, the system concludes he is misrated and corrects his rating by a large amount:
his new rating is 1538.

If he plays the same setup again:
A player rated 1538, playing in a 1700-rated game,
will gain 36 points if his team wins,
will lose 14 points if his team loses.

And again:
A player rated 1574, playing in a 1700-rated game,
will gain 34 points if his team wins,
will lose 16 points if his team loses.

And again:
A player rated 1608, playing in a 1700-rated game,
will gain 31 points if his team wins,
will lose 19 points if his team loses.

And again:
A player rated 1639, playing in a 1700-rated game,
will gain 29 points if his team wins,
will lose 21 points if his team loses.

(I assumed his team won EVERY match.)
That winning streak took his rating from 1500 to 1639, a climb of 139 points (38 + 36 + 34 + 31 over the four wins shown; a fifth win would add 29 more).
Under the current flat system, five straight wins would gain 125 points (25 each).
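Those per-game gains (38, 36, 34, 31) can be reproduced mechanically (a sketch assuming K = 50 and rounding to whole points, which matches the figures quoted):

```python
rating, gains = 1500, []
for _ in range(4):  # the four wins that take him from 1500 to 1639
    e = 1 / (1 + 10 ** ((1700 - rating) / 400))  # his win probability that game
    gain = round(50 * (1 - e))                   # K * (S - E) with S = 1, K = 50
    gains.append(gain)
    rating += gain

print(gains)   # [38, 36, 34, 31]
print(rating)  # 1639
```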

Now, let's get real: suppose that after climbing to 1639 he goes on a losing streak.

A player rated 1639, playing in a 1700-rated game,
will gain 29 points if his team wins,
will lose 21 points if his team loses.

And again:
A player rated 1618, playing in a 1700-rated game,
will gain 31 points if his team wins,
will lose 19 points if his team loses.

And again:
A player rated 1599, playing in a 1700-rated game,
will gain 32 points if his team wins,
will lose 18 points if his team loses.

And again:
A player rated 1581, playing in a 1700-rated game,
will gain 33 points if his team wins,
will lose 17 points if his team loses.

And again:
A player rated 1564, playing in a 1700-rated game,
will gain 34 points if his team wins,
will lose 16 points if his team loses.

His rating would be 1548. WHAT?!?!?
But his losses roughly canceled his wins; shouldn't he be back around 1500?
NO. The system takes into account that he was playing in a higher-rated environment. He is a 1500 playing on 1700-rated teams, so his chances of winning are supposed to be slimmer. If he keeps winning games anyway, the system concludes it is underrating him, and he climbs the ranking faster.

Vice versa for a highly rated player in a low-rated environment.

Hope this helped clear up some of your doubts and explain the system's mechanics.

(KEEP in mind I used the modified system, not the current one. The current system hands out a constant 24~25 points, unable to tell a low-rated player from a high-rated one: it would give +/- 125 points over a five-game win/loss streak to a 1500-rated player in a 1800-rated game, just as it would to a 3000-rated player in a 2000-rated game.)

Last edited by Rainmaker; 02-02-2011 at 06:56 PM.
  #42  
Old 02-02-2011, 06:47 PM
Urpee
Senior Member
Join Date: Dec 2010
Posts: 170

Thanks Pieface, that pretty much exactly paraphrases what I'm trying to say.

The point certainly is not to make life unfairly hard for good players or unfairly easy for bad ones. The point is mostly to find a system with desirable properties: elevate good players, drop bad ones, and encourage a good chance of balanced, competitive games overall.

I don't want to give the impression that I think the current system is awful. I don't; I think it's pretty good. I just think it converges a tad too slowly, and it predictably slows down the longer ladder is active (because overall win% directly correlates with ladder score, and win% variability shrinks the longer one plays).

Basically the goal is to keep the score agile but reflective of skill/performance. The simple alternative is to reset ladder more frequently. That works fine and essentially achieves the same goal.
  #43  
Old 02-02-2011, 06:49 PM
andy
Senior Member
Join Date: Nov 2009
Location: Italy
Posts: 1,967

Once you reach your rating you should win 50% of your games.

I'll add on later.
  #44  
Old 02-02-2011, 07:04 PM
Rainmaker
Senior Member
Join Date: Jan 2011
Posts: 108

Quote:
Originally Posted by andy View Post
Once you reach your rating you should win 50% of your games.

I'll add on later.
No. You will have a 50% chance of winning only if you play equally rated players.
The assumption that equally rated teams = equally rated players is an approximation; nothing more, nothing less.

You will achieve a 0.5 win rate only when you play against players rated the same as you.
Say you are rated 1680, and let's assume this is your "real" rating. If you keep playing people rated 1680, you should win half of your matches, and your rating shouldn't fluctuate above about 1730 or below about 1630.
  #45  
Old 02-02-2011, 07:34 PM
sunshineduck
Senior Member
Join Date: Dec 2009
Location: They were naked, I saw many pussy, I walked away. Call me gay but just saying.
Posts: 4,057

just a disclaimer i have no idea about any of this math or if it's even possible but..

i would really like to see plane choices implemented into the team balancing formula somehow. like, player X plays plane A 80% of the time, and the formula could account for that somehow. would actually make for more fun and balanced games than having 5 mirandas on one team just because their elo ratings fit together perfectly
  #46  
Old 02-02-2011, 07:52 PM
Rainmaker
Senior Member
Join Date: Jan 2011
Posts: 108

Quote:
Originally Posted by sunshineduck View Post
just a disclaimer i have no idea about any of this math or if it's even possible but..

i would really like to see plane choices implemented into the team balancing formula somehow. like, player X plays plane A 80% of the time, and the formula could account for that somehow. would actually make for more fun and balanced games than having 5 mirandas on one team just because their elo ratings fit together perfectly
That should be server-side: you shouldn't be able to pick Miranda if there are already 2 Mirandas on your team. That is a simple and efficient rule.
No more than 2 of the same plane. Worst case scenario: 2 randas, 2 biplanes, 1 loopy.

If it's supposed to be a ladder, people will agree (or be forced to agree) to choose bomber or explodet too.

On a related note, another suggestion:
make the ladder pick a total of 12 players. Only 5 per side would be playing, but in case of a disconnect or a leaver, one of the remaining players can join. The point is that it isn't a 6-man team: some spectator should be picked in advance so he can fill the empty slot.
  #47  
Old 02-02-2011, 08:16 PM
blln4lyf
Senior Member
Join Date: Jan 2010
Posts: 886

I fully understand what you are saying, Into The Walls... but there is an issue with this. If you are rated 3000 in an average-1700 game, you still shouldn't get fewer points if your team's average is 1700, because even though you are 3000, the other 5 on your team will have an average rating of 1440, which is meant to balance your team against the other team.

Why would someone rated higher lose more points if his team's average is the same as the other team's? They wouldn't; it would be a double disadvantage to stack them with lesser teammates AND lesser winning potential.

ELO is a 1v1 rating system, and what you are saying would be 100% correct if ladder were 1v1, but being a team game changes how you have to address ladder and the system, and if you set it up that way the ladder rankings will fail miserably. I understand the current system can be improved, but not like this. You are applying 1v1-type logic to a team game, when in reality your "false data" claim does nothing to change the fact that team A has one chance of winning, meaning the player rated 3000 has the same chance of winning as the player rated 1000 on his team. They should not be judged differently UNLESS something makes the player rated 3000 somehow independent from the player rated 1000, which CANNOT happen in a team-oriented game.

The team has to be treated as a whole, as it is team versus team, not a collection of individual player-versus-player matchups.

Last edited by blln4lyf; 02-02-2011 at 08:35 PM.
  #48  
Old 02-02-2011, 08:28 PM
blln4lyf
Senior Member
Join Date: Jan 2010
Posts: 886

Quote:
Originally Posted by Pieface View Post
To play the devil's advocate here, I believe what Urpee was trying to get across is that the situation of being underrated only benefits you if you assume the other team is correctly rated to start with. If you have someone equally underrated (or overrated) on the other team, you will still win about 50% of the time even though compared to the total ladder population's skill you should be rated higher. In essence, the prevalence of misrated players in ladder prevents you from following your predicted ranking trajectory: winning if you're underrated and losing if you're overrated.

I've also experienced these huge win/loss streaks that dramatically change your rating, but I wouldn't necessarily attribute them to ladder working the way it was designed to. If they do follow from that, it's clear that you need a certain set of conditions to achieve a large change in ranking. These situations only come every so often, which is why it takes so long to achieve your true rating. In turn, the fact that you haven't yet achieved your predicted rank prevents others playing with you from following their projected behavior as well. With the current system it's a cycle that's only broken when you get games where everyone except yourself is perfectly rated.
I understand this, and I understood that Urpee was saying it, but while it's true to an extent, it isn't the whole story.

If you are underrated, after playing enough games (say 100) you WILL overcome any such obstacles you described that can hold you back from reaching your true ranking. Note that your true ranking is usually within plus or minus 200 points of where you sit after a good number of games, which is decently accurate.

As for proof, I've already told my ball TA story of how I climbed fairly quickly at 60% or above when I changed my playstyle; also, when I introduced my smurf to ball ladder I hit a 2200 rating or so with about a 70% win percentage. That shows that, random variables aside, if you are underrated you will climb and make it up, and it won't take as long as Urpee suggested. Point blank: the first 50 or so games that he says keep him underranked have virtually no effect on him anymore, because he has played 400+ games since then, and that is WAY more than enough for him to reach whatever his true value is.

Last edited by blln4lyf; 02-02-2011 at 08:41 PM.
  #49  
Old 02-02-2011, 08:34 PM
blln4lyf
Senior Member
Join Date: Jan 2010
Posts: 886

Quote:
Originally Posted by IntoTheWalls View Post
No. You will have a 50% chance of winning only if you play equally rated players.
The assumption that equally rated teams = equally rated players is an approximation; nothing more, nothing less.

You will achieve a 0.5 win rate only when you play against players rated the same as you.
Say you are rated 1680, and let's assume this is your "real" rating. If you keep playing people rated 1680, you should win half of your matches, and your rating shouldn't fluctuate above about 1730 or below about 1630.
Dude, you don't penalize a high-rated player for playing with lower-rated teammates when the teams are equally rated, BECAUSE the teams are equally rated. It may be an approximation, but it is still just that. If you want to penalize higher-rated players from a 1v1 ELO standpoint, then you pretty much have to put all the higher players on one team and all the lower players on the other and say: since this team is much higher rated, it has a great chance of winning, so its players only get +5 for a win but lose 50 for a loss. And frankly, that is stupid, because the games won't be close. You have to stop thinking of it from a pure ELO standpoint; you are letting the textbook logic blind you, force-feeding 1v1 logic into a team game by treating each individual player as their own entity. THAT IS WRONG.
  #50  
Old 02-02-2011, 08:47 PM
Rainmaker
Senior Member
Join Date: Jan 2011
Posts: 108

Quote:
Originally Posted by blln4lyf View Post
I fully understand what you are saying, Into The Walls... but there is an issue with this. If you are rated 3000 in an average-1700 game, you still shouldn't get fewer points if your team's average is 1700, because even though you are 3000, the other 5 on your team will have an average rating of 1440, which is meant to balance your team against the other team.

Why would someone rated higher lose more points if his team's average is the same as the other team's? They wouldn't; it would be a double disadvantage to stack them with lesser teammates AND lesser winning potential.
A 3000-rated player in a 1700-rated game is supposed to have a winning probability of 0.999437974.
Consider WHAT it would take for someone to reach a 3000 rating. Under this "new" rating he would have to be extremely good.
Under the current rating he would just need about 60 more wins than losses:
someone with 63 wins / 0 losses is rated the same as someone with 1000 wins / 937 losses.

This is what happens with the current rating system:
someone rated 3000, playing against a bunch of 2000s, will lose 25 points if he loses the match.
The same person playing against 1000-rated people will still lose 25 points.

What, then, is the difference between being ranked 3000 and 1000?
You have, intentionally or not, crippled the rating system.

Last edited by Rainmaker; 02-02-2011 at 09:24 PM.
  #51  
Old 02-02-2011, 09:07 PM
Rainmaker
Senior Member
Join Date: Jan 2011
Posts: 108

Quote:
Originally Posted by blln4lyf View Post
Dude, you don't penalize a high-rated player for playing with lower-rated teammates when the teams are equally rated, BECAUSE the teams are equally rated.
I'm not penalizing them on purpose; that's just how the math works out.
I know it's messed up, but that feature was introduced to avoid unbalanced teams (a bunch of 3000-rated players against 1000-rated people).

Quote:
It may be an approximation, but it is still just that. If you want to penalize higher-rated players from a 1v1 ELO standpoint, then you pretty much have to put all the higher players on one team and all the lower players on the other and say: since this team is much higher rated, it has a great chance of winning, so its players only get +5 for a win but lose 50 for a loss. And frankly, that is stupid, because the games won't be close. You have to stop thinking of it from a pure ELO standpoint; you are letting the textbook logic blind you, force-feeding 1v1 logic into a team game by treating each individual player as their own entity. THAT IS WRONG.
I understand, but the problem is the other way around: a 1v1 system is being forced onto 5v5.
Somehow you/we have to find a way to compress 5 people into 1 rating:
players' skills
team dynamics
past experience
etc.

As I said, this is one of the most important criticisms of adapting the system. It's not only us who run into this problem; every multiplayer rating system does. Microsoft ran into it and designed TrueSkill.
Chess tournaments run into such problems too (and many more, like rating inflation/deflation).

Is your criticism that higher-ranked players lose too many points?
There is a solution for that, and it has been adopted in chess as well:
make K vary with the rating.

* Players below 2100 -> K factor of 32
* Players between 2100 and 2400 -> K factor of 24
* Players above 2400 -> K factor of 16

Why? Because some rules leave gaps for unintended bad habits.
For example, opponent picking: chess players would play against a highly rated computer using a previously known strategy that worked against it, meaning free ELO points.
Some players would also "stop" playing to protect a high rating.
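A rating-dependent K like those chess brackets could be sketched as follows (the handling of the exact 2100/2400 boundaries is my assumption):

```python
def k_factor(rating):
    """Smaller K for stronger players, so established ratings move slowly."""
    if rating < 2100:
        return 32
    elif rating <= 2400:
        return 24
    else:
        return 16

print(k_factor(1500), k_factor(2250), k_factor(2600))  # 32 24 16
```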

How would a different K factor affect this?

A 2000 player playing in a 1700 rated game:
P of winning = 0.849

K = 60
Win: +9
Loss: -51

K = 50
Win: +8
Loss: -42

K = 40
Win: +6
Loss: -34

K = 30
Win: +5
Loss: -25

K = 20
Win: +3
Loss: -17

K = 15
Win: +2
Loss: -13
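The table above can be regenerated in a few lines (a sketch assuming the same round-to-whole-points convention):

```python
# A 2000-rated player in a 1700-rated game; his win probability is ~0.849.
e = 1 / (1 + 10 ** ((1700 - 2000) / 400))
for k in (60, 50, 40, 30, 20, 15):
    print(f"K = {k}: win {round(k * (1 - e)):+d}, loss {round(k * (0 - e)):+d}")
```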

Mind you, ballin: with the new system it is highly improbable to have someone with a 3000 rating like we do now. My guess is most ratings will be close together, between 1000 and 2000. There won't be one player 500 points ahead like now; the 1st could be 2224 and the 2nd 2220. But it wouldn't be easy for a player ranked 2000 to reach 2230.
Why? Because there aren't people ranked 3000, so he would have to win a certain number of 1800-rated games to gain those 200 points (my guess is around 12 consecutive wins for a 2000-rated person in 1800-rated games).

If you don't like ratings being so close, you can always change the scale (1500 as the zero-sum baseline, and 200 points being a meaningful gap).


Would any ladder organizer mind clarifying this?
First, Eso implemented his modified ELO system (in which the skill difference was calculated as "team 1 avg" - "team 2 avg").
Later, an auto team balance feature was added.

Last edited by Rainmaker; 02-02-2011 at 09:30 PM.
  #52  
Old 02-02-2011, 09:33 PM
Pieface
Super Moderator
Join Date: Aug 2009
Posts: 1,265

Quote:
Originally Posted by blln4lyf View Post
If you are underrated, after playing enough games (say 100) you WILL overcome any such obstacles you described that can hold you back from reaching your true ranking.
But that's exactly the problem. If the system requires playing at least 100 (usually more) imbalanced games before you even get close to your "true" skill rating, then it's not working as efficiently as it could. Factor in the almost constant influx of people who play a given game mode sporadically or are new to ladder, and you effectively ensure that very few games are balanced according to the players' "true" skill, since the newcomers haven't played enough for the teams to be balanced correctly.

To be honest, I think what makes establishing a rating system so hard for ladder is the presence of our current autobalance system. It's extremely difficult to come up with something good that also melds with the way teams are assembled at present.
  #53  
Old 02-02-2011, 10:54 PM
A Nipple
Senior Member
Join Date: Aug 2009
Location: with bongs gf! <3
Posts: 982

it's important to remember that people's skills fluctuate. At least speaking for myself

Niipneeep

=]
  #54  
Old 02-02-2011, 11:38 PM
andy
Senior Member
Join Date: Nov 2009
Location: Italy
Posts: 1,967

Quote:
Originally Posted by IntoTheWalls View Post
A 3000-rated player in a 1700-rated game is supposed to have a winning probability of 0.999437974.
Consider WHAT it would take for someone to reach a 3000 rating. Under this "new" rating he would have to be extremely good.
Under the current rating he would just need about 60 more wins than losses:
someone with 63 wins / 0 losses is rated the same as someone with 1000 wins / 937 losses.

This is what happens with the current rating system:
someone rated 3000, playing against a bunch of 2000s, will lose 25 points if he loses the match.
The same person playing against 1000-rated people will still lose 25 points.

What, then, is the difference between being ranked 3000 and 1000?
You have, intentionally or not, crippled the rating system.


You don't seem to understand the current system at all. You get +25 because your winning percentage is 50% (and this is because the teams are AUTOBALANCED).
Whether you are rated 1000 or 3000, you will be put in conditions that give you a 50% win chance.

Answer this:
Lets say we have 2 balanced teams

Team A
1 Pro 3000 rating
2 vets 2000 rating
2 noobs 1000 rating

Team B
1 Pro 3000 rating
2 vets 2000 rating
2 noobs 1000 rating

How the hell could the guy rated 3000 have a .9 chance of winning and the 1000 guy on his team have a .1 chance of winning..!??


You will be rated 3000 if you can keep winning even as better players are stacked against you. If you are rated 1000 but your skill is that of a 3000 player, you will start winning a bigger percentage of your games until you reach your true rating, at which point your winning percentage will tend to 50%.
  #55  
Old 02-03-2011, 12:44 AM
Ribilla
Senior Member
Join Date: Nov 2010
Location: In ur base, defusin' ur bombs.
Posts: 2,659

Quote:
Originally Posted by andy View Post
You don't seem to understand the current system at all. You get +25 because your winning percentage is 50% (and this is because the teams are AUTOBALANCED).
Whether you are rated 1000 or 3000, you will be put in conditions that give you a 50% win chance.

Answer this:
Lets say we have 2 balanced teams

Team A
1 Pro 3000 rating
2 vets 2000 rating
2 noobs 1000 rating

Team B
1 Pro 3000 rating
2 vets 2000 rating
2 noobs 1000 rating

How the hell could the guy rated 3000 have a .9 chance of winning and the 1000 guy on his team have a .1 chance of winning..!??


You will be rated 3000 if you can keep winning even as better players are stacked against you. If you are rated 1000 but your skill is that of a 3000 player, you will start winning a bigger percentage of your games until you reach your true rating, at which point your winning percentage will tend to 50%.
THIS

you always have a 50% chance of winning (assuming that 1 pro + 1 noob == 2 vets) because there is an equivalent counterpart on your team.

If someone had a 0.9+ chance of winning, then everyone on their team would have that same chance of winning too. IMO the system needs only minor adjustments.
  #56  
Old 02-03-2011, 01:45 AM
Rainmaker
Senior Member
Join Date: Jan 2011
Posts: 108

Quote:
Originally Posted by andy View Post
You don't seem to understand the current system at all. You get +25 because your winning percentage is 50% (and this is because the teams are AUTOBALANCED)
Exactly, that's my point.
Why would you implement an ELO system and then cripple it, when you could just hand the winning team +25 points each and the losers -25?
The current rating system isn't really doing anything; it never genuinely adjusts your rating.

Quote:
Whether you are rated 1000 or 3000, you will be put in conditions that give you a 50% win chance.
You will be placed on a team which, on average, has a 50% chance to win.
You are conflating individual player rating with average team rating.

Quote:

Answer this:
Lets say we have 2 balanced teams

Team A
1 Pro 3000 rating
2 vets 2000 rating
2 noobs 1000 rating

Team B
1 Pro 3000 rating
2 vets 2000 rating
2 noobs 1000 rating

How the hell could the guy rated 3000 have a .9 chance of winning and the 1000 guy on his team have a .1 chance of winning..!??
Someone rated 3000 is expected to have a ~0.9 winning chance in a 2000-rated environment, because, being the most skilled (as proved by his previous rating), he is expected to perform far better than a 2000-rated guy, and far better still than a 1000-rated guy.
The team as a whole, on the other hand, has a 50% chance of winning.
So the pro wins only (hypothetically, say the math gives 5 points) +5,
the vet gets +10, and the noob gets +30.
Why the discrepancy?
The noob wasn't expected to perform in a 2000-rated environment, but he did. Either he is underrated or it was a statistical fluke, so his rating gets a big boost.
"That's bloody unfair! The vet carried the whole team on his shoulders!"
Nothing was expected from the noob, yet somehow he must have contributed for the team to win.
If it was pure chance, he will lose most of his games in a 2000-rated environment, handing those points back to their "rightful" owners.

My point is that with the new system you will no longer have those overrated players.

Quote:
You will be rated 3000 if you can keep winning even as better players are stacked against you. If you are rated 1000 but your skill is that of a 3000 player, you will start winning a bigger percentage of your games until you reach your true rating, at which point your winning percentage will tend to 50%.
Excuse the expression, but that's pure bullcrap.
I've backed my claim with math.
The only way to get a 50% win chance is to play someone at your own skill level (your same rating).
The win % is calculated from both players' ratings; it has NOTHING to do with whether you have reached your TRUE rating or not.
Stare at the formula long enough and you will see it:

E = 1 / {1 + 10^[((his rating) - (my rating)) / 400]}
E = my chance of winning
1 - E = my chance of losing

PERIOD, there is nothing more to it. The only variables are YOUR rating and your OPPONENT'S rating. If you keep winning against bad odds (winning games you were given only 1 chance in 4 to win), the system gives you a boost, because it was underrating you.
If you reached a high rating (2000) but lack the skill and were just lucky with the team balance in your last 5 matches, then over the next matches your rating will drop drastically, because you were overrated due to an anomaly (being carried by high-rated players in a high-rated environment).



Let's take the current rating system and the top #25 as an example (leaderboard screenshot omitted):

Players like Nipzor shouldn't be so close to the top: a 0.5 win rate and a net difference (wins - losses) of only 33.
eth & mikesol sit there with a net difference of only 40, and only 100~180 games played.

The current system could easily be replaced by what I said at the start:
1. the autobalance system
2. give +25 to each player on the winning team, take 25 points from each player on the losing team.

The only way to climb to the top is to build a net difference of X games (between 40 and 60),
so it doesn't matter whether your win ratio is 0.7 over only 100 games or 0.52 over 1000 games: as long as your win-loss difference is greater than about 60, you will be at the top of the leaderboard.

Ladder has to be reset for ANY new system to be implemented correctly; otherwise it will carry over the previous system's flaws (over/underrated players).

Last edited by Rainmaker; 02-03-2011 at 02:05 AM.
  #57  
Old 02-03-2011, 02:35 AM
elxir
Senior Member
Join Date: Feb 2010
Location: All-American
Posts: 2,687

but uhh, nipple is like the second best bomber on that list sooo obviously something is working
  #58  
Old 02-03-2011, 02:47 AM
nobodyhome
Senior Member
Join Date: Apr 2009
Posts: 1,088

This is ridiculous. There is so much misunderstanding of the concept behind ELO and the current system's implementation of it in this thread. Urpee's proposed system (where you average either the other team's ratings or all the players' ratings and then compare it against an individual player) is plain wrong. Don't try to "argue" for the system because it is simply mathematically wrong.

In my previous post I posted two links in which the system was explained and justified, please read those two links before you try to post against it. If after reading those links you still do not see why Urpee's proposed system is incorrect, please don't post here arguing that Urpee's system is actually better than the current system. Because it is not. Rather, you may post questions asking why it is wrong in order to gain a better understanding. I will not let this thread devolve into people arguing for something mathematically incorrect and then having the people who actually understand the system shoot them down, and then arguing back and forth to see who is right.

This is not to say that the ladder system is completely fine (otherwise we would not be having this thread at all). There are flaws in the system, but Urpee's proposal is not a solution. First of all let's examine how the system actually works. Given two teams, it first assigns a probability that one of the two teams will win using the following formula:

E1 = 1 / [1 + 10^ ([(Avg rating of team2)-(Avg rating of team1)] / 400)]

It then takes the probability and, depending on whether the team won or lost, assigns each team a point gain or loss:

New Rating = Old Rating + [ 50 * ( (1 if won, 0 if loss) - E ) ]

Notice if you find the expected value of this match (probability of win * point gain if win + probability of loss * point decrease if loss), you will find it to be exactly 0. This is because if these two teams are rated correctly, then there should be no change. If they are rated incorrectly, then the result of a series of games should result in each team's ratings moving closer to their correct value.

Now, one of ladder's primary problems is that sometimes, people take too many games to converge to their correct rating. We can solve this by replacing the "50" in the equation with a variable, K, which represents the uncertainty of one's rating. Notice how changing the 50 to any other value will not change the expected value of the equation at all. This is desirable. We can fluctuate the K up and down based on several things. For one, K can start off large for a new player (whose rating is very uncertain as they have not displayed their skill much yet) and decrease the more games you play. For two, K can also increase if you start streaking (either win streak or loss streak), as this can mean that your skill suddenly is changing because you are no longer balanced correctly. For three, K can increase if you also undergo a long period of inactivity (the ladder is no longer certain of your rating because it has not seen you play much recently). We will not be discussing the implementation of this new K variable here, as I have found extensive sources to read up on this and thus it shouldn't be a problem (if you have any suggestions/questions on this particular topic feel free to shoot me a PM or talk to me on altitude).
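The update rule and a variable K can be sketched as follows. Only the E formula and the `new = old + K * (S - E)` rule come from the post; the decay and streak constants in `k_factor` are invented placeholders for illustration:

```python
def expected_score(avg1: float, avg2: float) -> float:
    """Expected score of team1 against team2 (Elo logistic, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((avg2 - avg1) / 400.0))

def k_factor(games_played: int, streak: int, base_k: float = 50.0) -> float:
    """Illustrative uncertainty factor: large for new players, inflated on
    win/loss streaks. The constants here are made-up placeholders."""
    k = base_k / (1.0 + games_played / 200.0)   # shrink with experience
    k *= 1.2 ** max(0, abs(streak) - 1)         # grow while streaking
    return k

def new_rating(old: float, e: float, won: bool, k: float) -> float:
    """new = old + K * (S - E), with S = 1 for a win and 0 for a loss."""
    return old + k * ((1.0 if won else 0.0) - e)
```

Whatever K is, the expected change E*K*(1-E) + (1-E)*K*(0-E) is identically zero, which is exactly the property described above: changing 50 to any other K value does not bias the expected movement.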

The second problem (the one I'd like to discuss now) is that we are taking ELO and forcing an adaption of it for team games. Notice how in the formula description above, nowhere in the entire thing is the concept of an individual player even mentioned. This is a reflection of the nature of how we are ranking things. Basically, the only way that we can test skill in Altitude is by gathering two teams, and then pitting them against each other. Consider each game to be a "test", and the output of this test is either "team1 wins" or "team2 wins". Now, if say, team1 wins, then this is a datapoint from which we can gather that team1 played better than team2 in this particular test (this game) and this test only. We would like to reflect this result in the ratings themselves so we decide that team1 should get some points and team2 should lose some points.

However, here's where it gets fuzzy: We decide that team1 as a whole has played better than team1's rating. Here we have defined team1's rating to be "the average of the players of team 1's ratings", but this is not necessarily true. Because of things like synergy, a team consisting of five players rated 2000 may not necessarily be just as good as another team consisting of five players rated 2000 (plane composition comes to mind here). How do we take a team composed of five individual player ratings and use that to form a composite "team rating"?

Furthermore, in our current system we assume that if team1 beat team2, which means that team1 played better than its aggregate rating, this means that each of team1's players played better than their individual ratings. We thus reward each player in team1 with equal amounts of points. This is also not necessarily true--it may be that players A, B, and C in team1 played better than their ratings and players D and E in team1 played worse than their ratings. Without looking into the actual in-game factors (individual kills/deaths, bomb hits, etc), is there a better way we can determine the distribution of points to the winner other than just "everybody gets the same"?
Reply With Quote
  #59  
Old 02-03-2011, 02:59 AM
ryebone ryebone is offline
Senior Member
 
Join Date: Jan 2009
Posts: 470
Default

Just to add a bit of history here, in case anyone has forgotten or didn't know. When ladder was first implemented, it was the standard ELO system, but without any sort of autobalance mechanism. Unfortunately, there were two major flaws with this system:

1) Teams were usually picked by arbitrarily-chosen captains. This process was tedious beyond belief. It often took ten minutes to organize an eight-minute game.

2) People began dodging teams. If they felt the team they were picked to didn't stand a chance of winning, they would say something like "I'm busy" and refuse to join. From a personal-gain standpoint, I totally understand that logic; it's sometimes beneficial to stay away from huge risks for huge gains, and play only when the chance of winning is higher. This is especially true when a large portion of the final result is dictated by forces outside of their immediate control, aka teammates.

For those two reasons, autobalance was implemented. It's clearly a flawed solution, as it takes an already-flawed system (using a 1v1 rating model in a 5v5 setting) and places additional restrictions on it. But from what I have experienced, it actually works fairly well at rating players relatively appropriately.

I'll be honest and say that I tl;dr'ed most of the mathematical posts in this thread (sorry ITW), but from skimming the thread I take it that the two main issues being discussed are 1) the time it takes to get from 1500 to your appropriate rating, and 2) the staleness of always having 50/50 games.

For the first, it can be fixed by allowing huge rating variability for a person's first X games, and gradually decreasing that variability as the person plays more. A moving average could also be implemented by increasing the rating variability if someone enters a winning/losing trend (say, 60-70% wins or losses over the last 10 games), which could happen if someone decides to try a new plane, has an epiphany and suddenly gets better, etc. Obviously this wouldn't be zero-sum, but that's the least of our concerns.

For the second, I personally think it would be good to give the autobalancer some leeway, and allow games to have point exchanges of up to 30-20. As it stands now, your record directly correlates with your rating, which makes the whole thing relatively pointless. Allowing up to 30-20 point swings would introduce a fair level of variation to keep things interesting without completely overtaking the current system. It would also make spectating more interesting (in specchat of course) when there is a clear underdog to root for.
Reply With Quote
  #60  
Old 02-03-2011, 03:23 AM
Pieface Pieface is offline
Super Moderator
 
Join Date: Aug 2009
Posts: 1,265
Default

Totally off subject, but are there any plans for some sort of rating degradation in ladder 2? It's sort of strange to have a large amount of people in the top 30 who simply played enough to make it there and then quit.
Reply With Quote
  #61  
Old 02-03-2011, 04:06 AM
Rainmaker Rainmaker is offline
Senior Member
 
Join Date: Jan 2011
Posts: 108
Default

@ ryebone:

Looking at it from a distance (after reading the WHOLE Altitude ladder thread, where many complaints about this system were made),
I've come to this:

If the system "feels" accurate in your opinion, keep it as it is.

As I recommended to Nobody on IRC:
If you want ratings to be "guessed" faster, make K variable.
There are many ways, but the principal ideas would be:

For the first X games, K is incredibly huge, so you may gain/lose 100 points.

Make K gradually smaller as you play more.
For example (just a rough approximation):

K' = (1/1.001^x)*K, x being the number of games played

Example:


After roughly 700 games you would reach K' = K/2 (instead of 24 points, you only gain/lose 12; 1.001^693 ≈ 2).
On the good side, people in the high ranks won't be heavily penalized if they keep losing because autobalance pairs them up with all the 1000-rated people.
On the bad side, after ~700 games you only gain/lose 12 points, making it harder to climb up/down the board.
I mean, if a player suddenly gets better because of a patch or their training, it won't be perceived, unless we add the "inflated K due to streaks".

I agree with changing K due to streaks (add a factor which increases with the streak),
i.e.: K' = 1.2^(n-1) * K
the C factor being 1.2^(n-1),
n being the number of consecutive games won/lost.
For 2 games: K' = 1.2*K (a 20% increase)
For 3 games: K' = 1.44*K (a 44% increase)
For 10 games: K' ≈ 5.16*K (a ~416% increase)

A player who wins 5 games in a row currently only adds:
5*24 = 120 points

Using the C factor:
1st: 24 (C = 1)
2nd: 29 (C = 1.2)
3rd: 35 (C = 1.44)
4th: 41 (C = 1.73)
5th: 50 (C ≈ 2.07)
total: 179 points (the 120 base from before, plus a 59-point streak bonus)

The same happens the other way: the more you lose consecutively, the faster you drop down the ranking.
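The streak factor above can be sketched in a few lines (the 24-point base K is taken from the example; rounding each game's gain to whole points is my assumption):

```python
# Streak-inflated K as proposed: C = 1.2^(n-1), K' = C * K.
def streak_factor(n: int) -> float:
    """Multiplier after n consecutive wins (or losses)."""
    return 1.2 ** (n - 1)

def streak_gains(base_k: float, streak_len: int) -> list:
    """Points gained on each successive game of a winning streak, rounded."""
    return [round(base_k * streak_factor(n)) for n in range(1, streak_len + 1)]

gains = streak_gains(24, 5)   # [24, 29, 35, 41, 50] -> 179 points total
```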

(wrote down the formulas so nobo doesn't get mad )

Last edited by Rainmaker; 02-03-2011 at 04:16 AM.
Reply With Quote
  #62  
Old 02-03-2011, 04:28 AM
Urpee Urpee is offline
Senior Member
 
Join Date: Dec 2010
Posts: 170
Default

Quote:
Originally Posted by nobodyhome View Post
This is ridiculous. There is so much misunderstanding of the concept behind ELO and the current system's implementation of it here in this thread. Urpee's proposed system (where you average either the other team's ratings or the all the players' ratings and then compare it against an individual player) is plain wrong. Don't try to "argue" for the system because it is simply mathematically wrong.

In my previous post I posted two links in which the system was explained and justified, please read those two links before you try to post against it. If after reading those links you still do not see why Urpee's proposed system is incorrect, please don't post here arguing that Urpee's system is actually better than the current system. Because it is not. Rather, you may post questions asking why it is wrong in order to gain a better understanding. I will not let this thread devolve into people arguing for something mathematically incorrect and then having the people who actually understand the system shoot them down, and then arguing back and forth to see who is right.
My math isn't wrong, at least no more wrong than what's already there. But rather than just say that you don't like my proposal you have to do this.

But if we must, go ahead and explain why my math is "wrong". I'm used to having my work critiqued. Happens all the time if you publish math in academic journals. Will be educational to discuss additivity of expectation values, the expectation of a single sample against a population mean and modeling of mixed expectations.

For reference, what I have done is add a model for the expectation of an individual in the team compared to the team average. There is nothing wrong with setting up such a model; one can choose to model it or not. That is neither right nor wrong, but a modeling choice. And yes, the current model does not do it. But that is exactly the crux of the problem. ELO was never designed to operate on randomized ensembles of ELO scores taken as a mean. In fact it's clearly not trivially valid, and some of the symptoms we see are precisely because the current method treats randomized ensemble expectations as if they were single-sample expectations. Now, the right thing to do is something like TrueSkill, which is why I have favored it. That was deemed too complex, so I was left with brainstorming a simple solution. It models a mixed expectation. Now, how much a player contributes to the team is a model assumption that depends on the game and other unknown factors. One surely could try to formalize this, but all it would change is the weights I propose. I submit that such a formalization is hard, but am happy to be convinced otherwise.

But let's back off. I was under the impression the goal was to brainstorm various solutions. That's what I did. I offered a suggestion. That's all it is. Don't like it, don't take it. But don't make up some story that the math is supposedly wrong and I don't understand what is on the table. I understand perfectly well what's there. I'm just making suggestions that happen to not be exactly the same as what is currently there. No need to worry or to instruct everybody on how wrong the math supposedly is. Just say you don't like it. That's quite sufficient.
Reply With Quote
  #63  
Old 02-03-2011, 06:32 AM
Rainmaker Rainmaker is offline
Senior Member
 
Join Date: Jan 2011
Posts: 108
Default

I have to agree with Urpee here.
The way you are applying the ELO model is absolutely wrong. Esotheric pointed that out at the beginning (autobalance would definitely screw up what he proposed the first time).

So you have kicked, twisted, and distorted the model (a square block) to fit your assumptions (a round gap).
I've come to understand, from the way you express your opinions, that you are not planning to change your point of view. Not because you haven't seen the evidence, but because you are not willing to put it under the microscope.

It's frustrating arguing with people who aren't fully capable of understanding the math behind the model (not you specifically nobo, talking about the thread in general). This is in no way patronizing, but plenty of people have come up with a bunch of unreal ad-hoc examples where they think they have proven it wrong, though they can neither present the math that backs their assumptions nor confirm their predictions in any way.

Just to illustrate one of my points:
Nobody stated that you can't compare an "avrg rating" and a "player rating".
Quote:
Originally Posted by nobodyhome
There is so much misunderstanding of the concept behind ELO and the current system's implementation of it here in this thread. Urpee's proposed system (where you average either the other team's ratings or all the players' ratings and then compare it against an individual player) is plain wrong. Don't try to "argue" for the system because it is simply mathematically wrong.
Maybe intuitively it's hard to accept that, given the uncertainty in each, the two are comparable at all.
But from a mathematical point of view, if these two variables are proven* to follow a Normal distribution (with parameters mu, sigma), one of the most important properties is that the addition or subtraction of two Normal variables yields a new Normal variable, with parameters mu = mu1 - mu2 (for a subtraction) and sigma = sqrt(sigma1^2 + sigma2^2).

There are millions of examples backing up this little mathematical property.
To calculate the weight of a 6-pack:
each bottle's weight follows a Normal (mu = 500 grams, sigma = 70), and the cardboard's a Normal (mu = 100, sigma = 20).
The total weight is: bottle1 + bottle2 + ... + bottle6 + cardboard:

mu.weight = 500*6 + 100 = 3100
sigma.weight = sqrt(6*70^2 + 20^2) ≈ 173
We could state with 90% confidence that the minimum weight of a six-pack is about 3100 - 1.28*173 ≈ 2879 grams ≈ 2.9 kilograms.
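Running the six-pack numbers through the stated property (a throwaway check; note the sigmas combine in quadrature, not by simple addition):

```python
import math

# Sum of independent Normals: the means add; the variances add.
bottle_mu, bottle_sigma = 500.0, 70.0
cardboard_mu, cardboard_sigma = 100.0, 20.0

pack_mu = 6 * bottle_mu + cardboard_mu                            # 3100 g
pack_sigma = math.sqrt(6 * bottle_sigma**2 + cardboard_sigma**2)  # ~172.6 g

# One-sided 90% lower bound (z ~ 1.2816 for the 10th percentile):
lower_90 = pack_mu - 1.2816 * pack_sigma                          # ~2879 g
```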

There are a lot of practical applications; most are used in production control (to check that you keep your defect ratio below a certain proportion, like 0.05).

See? You can mix the same kind of measurement from two different types of objects.
For the ELO model, the mixed variable is rating; the objects are "team" and "player".
If someone doesn't know this, how can I expect to have a reasonable discussion over which ELO model is best?

I know this sounds really cocky, it's not my intention (maybe I suck at expressing myself in English), but it's my point of view on the matter.



Keeping focused on Ladder Season 2: I've provided you with some examples, nobo; I'm still willing to cooperate with you on the project.


*: There is a theorem known as the Central Limit Theorem which states that if variables of ANY TYPE (they could follow any distribution: Gamma, Beta, Pareto, Weibull, maximum Gumbel, minimum Gumbel, exponential, and a large etcetera) are added 30 or more times, or if their sigma/mu coefficient is < 0.2, then the sum can be approximated by a Normal distribution
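A quick Monte Carlo illustration of that footnote (the sample sizes and the choice of an exponential distribution are arbitrary): summing 30 exponential variables already gives a roughly Normal total.

```python
import math
import random

random.seed(1)
n_vars, n_samples = 30, 20000

# Each sample is the sum of 30 independent Exponential(1) draws.
sums = [sum(random.expovariate(1.0) for _ in range(n_vars))
        for _ in range(n_samples)]

mu = n_vars * 1.0                # mean of the sum (means add)
sigma = math.sqrt(n_vars * 1.0)  # std dev of the sum (variances add)

# For a true Normal, ~68.3% of samples fall within one sigma of the mean.
within_1sigma = sum(abs(s - mu) <= sigma for s in sums) / n_samples
```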

EDIT:

Quote:
Now, one of ladder's primary problems is that sometimes, people take too many games to converge to their correct rating. We can solve this by replacing the "50" in the equation with a variable, K, which represents the uncertainty of one's rating. Notice how changing the 50 to any other value will not change the expected value of the equation at all.
This is not technically correct. K doesn't represent the uncertainty of a rating.
K can be any number you want (you couldn't pick the uncertainty arbitrarily, could you?).
K is a factor that sets the scale on which points are exchanged based on the probability of winning or losing. If you want big trades of points you assign a high K value; if you want small transactions of points you use a low K value.
What's the difference? A high K value is desirable for a first approximation of someone's rating (everyone starts at the zero-sum baseline: 1500 in Altitude, 1000 in chess).
But a high K value has too much sensitivity; it will change your rating drastically in a few games. So to widen the spectrum in the high rankings (2400 and above), you want a low K value. That way it's easier to follow an expert's skill improvement, and the influence of statistical abnormalities is reduced.

What measures uncertainty, then?
In the E formula:

E = 1/{1 + 10^[(opp rating - my rating)/400]}
That 400 contains the uncertainty information: it represents the 200 points to the right and the 200 points to the left of the mean of the Normal distribution.
In ELO, a 200-point rating difference is interpreted as the higher-rated player having a 0.75 probability of winning.

new rating = old rating + K*(S - E)
And here is K: you can see that a higher K means more points added to / drawn from your rating.

Last edited by Rainmaker; 02-03-2011 at 07:14 AM.
Reply With Quote
  #64  
Old 02-03-2011, 08:32 AM
Tekn0 Tekn0 is offline
Senior Member
 
Join Date: Dec 2010
Posts: 1,548
Default

Quote:
Originally Posted by sunshineduck View Post
just a disclaimer i have no idea about any of this math or if it's even possible but..

i would really like to see plane choices implemented into the team balancing formula somehow. like, player X plays plane A 80% of the time, and the formula could account for that somehow. would actually make for more fun and balanced games than having 5 mirandas on one team just because their elo ratings fit together perfectly
I cannot agree with this more.

In fact, why not club players into groups? Say player X plays Loopy 50% & Miranda 50% of the time: he is considered an offence player. If player Y plays bomber, whale, or biplane, he can be considered defense/support. Some players play roughly the same % of all planes, but they can be placed on either team based on their rating.

Then balance teams to have 2 or 3 offense players each. Of course nothing can be done if all 12 players play offense, but clearly that usually doesn't happen.

Yes, I know the "people should play all planes, or one heavy and one light plane" argument, but in reality not everyone is equally good at all planes.

Point is, the autobalance algorithm can take these statistics into account too.

Last edited by Tekn0; 02-03-2011 at 08:35 AM.
Reply With Quote
  #65  
Old 02-03-2011, 08:37 AM
Stormich Stormich is offline
Senior Member
 
Join Date: Apr 2009
Location: Corporate Police State
Posts: 1,151
Default

Quote:
Originally Posted by Tekn0 View Post
I cannot agree with this more.

In fact, why not club players into groups? Say player X plays Loopy 50% & Miranda 50% of the time: he is considered an offence player. If player Y plays bomber, whale, or biplane, he can be considered defense/support. Some players play roughly the same % of all planes, but they can be placed on either team based on their rating.

Then balance teams to have 2 or 3 offense players each. Of course nothing can be done if all 12 players play offense, but clearly that usually doesn't happen.

Yes, I know the "people should play all planes, or one heavy and one light plane" argument, but in reality not everyone is equally good at all planes.

Point is, the autobalance algorithm can take these statistics into account too.
This isn't as simple as it sounds, plus your solution only works in ball.
Reply With Quote
  #66  
Old 02-03-2011, 09:34 AM
Tekn0 Tekn0 is offline
Senior Member
 
Join Date: Dec 2010
Posts: 1,548
Default

Quote:
Originally Posted by Stormich View Post
This isn't as simple as it sounds+your solution only works on ball.
If you have the stats for it, it shouldn't really be hard to implement. Also yes, only ball; is it not possible to have this "additional" criterion taken into account when the game mode is ball?

Again, we only classify a player as offense (e.g. if they play > 85% of the time as loopy, miranda, or biplane) and as defense if they play 85% of the time as whale or bomber. Everyone else we classify as the "Whatever" class and need not bother with what plane they will play.

Also, since Random was ruled out in Ladder, I think this system makes sense.


Extra checks during the autobalance algorithm for game mode ball:

1. Balance teams using the existing rankings etc.

2. Count the number of offense players assigned to the Left team as cL.

3. Count the number of offense players assigned to the Right team as cR.

4. If cL == cR, do nothing; exit.

5. If cR - cL == 3 (else go to step 6): the Right team has 3 more offense players than the Left. Take the average rating of the Right team's offensive players and move the offensive player closest to that average to the Left team. Then take the defense player whose rating is closest to the chosen offense player's and assign them to the Right team (essentially swapping rating-matched offense and defense players). If we don't have enough defense players we can use the "Whatever" class.

6. Repeat step 5 for cR - cL == 4 and 6, moving 2 and 3 players respectively.

7. Do the same as steps 5 and 6, except with cL and cR reversed.

I just wrote this algorithm off-hand without too much thought, feel free to discuss. I'm sure this algorithm can be tweaked a lot, but I don't have time at the moment to put more thought into it.

Just a very, very rough draft.
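A rough rendering of that draft as code (the player structure, role names, and the ">= 2 difference" trigger are my assumptions; only the swap-by-closest-rating idea comes from the steps above):

```python
# Sketch: after the normal rating balance, even out offense counts by swapping
# rating-matched offense and non-offense players between teams.
def rebalance(left, right):
    """Each player is a dict like {'name': ..., 'rating': ..., 'role': ...},
    role being 'offense', 'defense', or 'whatever'. Mutates both lists."""
    def offense_count(team):
        return sum(1 for p in team if p['role'] == 'offense')

    # Handles the right-heavy case; the mirrored case would swap the args.
    while offense_count(right) - offense_count(left) >= 2:
        offense = [p for p in right if p['role'] == 'offense']
        avg = sum(p['rating'] for p in offense) / len(offense)
        # Move the offense player closest to the offense average leftwards...
        mover = min(offense, key=lambda p: abs(p['rating'] - avg))
        # ...and trade back the non-offense player closest in rating.
        others = [p for p in left if p['role'] != 'offense']
        if not others:
            break  # nothing to trade back; give up
        swap = min(others, key=lambda p: abs(p['rating'] - mover['rating']))
        right.remove(mover); left.append(mover)
        left.remove(swap); right.append(swap)

# Example: three offense players stacked on the right.
left = [{'name': 'a', 'rating': 1650, 'role': 'defense'},
        {'name': 'b', 'rating': 1400, 'role': 'whatever'}]
right = [{'name': 'c', 'rating': 1700, 'role': 'offense'},
         {'name': 'd', 'rating': 1600, 'role': 'offense'},
         {'name': 'e', 'rating': 1500, 'role': 'offense'}]
rebalance(left, right)   # 'd' (closest to the 1600 average) moves left,
                         # 'a' (closest rating to 'd') moves right
```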

Last edited by Tekn0; 02-03-2011 at 11:06 AM.
Reply With Quote
  #67  
Old 02-03-2011, 12:56 PM
VipMattMan VipMattMan is offline
Senior Member
 
Join Date: Jul 2010
Posts: 122
Default

Quote:
Originally Posted by Urpee View Post
Not at all. This is the myth that seems to be encoded in this discussion.

Let me give you a toy ladder. 12 people compete and we seed them randomly with these scores: 4 have 3000, 4 have 1500, 4 have 0. But they actually are all equally good.

We play and in fact it turns out that everybody wins 50%. The system will allow this and given people's actual skill they maintain their 50% ratio.

This is our current system. Is this working? You truly want to claim that people's scores are well reflected and converge properly?

It's a myth that someone playing at an overall 50% win ratio is properly ranked. The system encourages locking them into place wherever they are, no matter their actual correct ranking.

If the system worked, we would get a convergence of everybody to 1500 and this convergence would be sensibly fast. Currently there is no such mechanism. Because the player who is scored 0 competing in an average 1500 game gets no benefit over a player who is scored 3000 in a 1500 game. That is the convergence mechanism that would be needed to fix this example, but it's nowhere to be found.

And there is this myth that just because I have a 1500 now and a 50% win ratio that it's swell. In fact it may just be that me being misranked gets matched against another player who is misranked and we end up at 50% win and don't move. Only if the system converged properly would it be fair to assume that that other player I'm balanced against is actually about at the right spot. But it's blatantly obvious this isn't the case. You will find people above 1500 who aren't all that good. And you will find people below 1500 who are quite good. The reason for this is simple: The evaluation has gone awry and once you are at a wrong spot the system has insufficient correctives.
The semi-random nature of the autobalancer, and the different times at which different people play, create that convergence. Your perfect-world ladder where there would be no change in ratings with players of equal skill is entirely predicated on EVERYONE in ladder having the exact same skill level. Enter people of differing skill levels and all of a sudden there will be much more rating change.

The current system actually works pretty damn effectively for determining influence in a game. We've seen it time and time before. People who aren't appropriately rated essentially have control of ladder (which is the real fault of ladder, and is what I'm pretty sure nobo is mostly concerned about). That includes people who hold ratings well above your 1400 rating which you say is inaccurate.

Ball'n's smurf account instantly shot up the rankings and then hovered around rank 50 in ladder as he played different planes. When he didn't care and played other planes, his team lost. When he did care and played miranda, his team won. People got tired of this and complained. He got irritated by those complaints and decided to play solely randa. Within a day he had gotten to the 10-20 range.

Another instance distinct in my mind was Goose apparently letting someone else play his account for a couple of days when he was ranked in the 20 range. The person on his account wasn't playing at the same skill level and lost something like 27 out of 30 games. When Goose subsequently started playing normally he won 19 out of 23 games. His rating trended very quickly right back to where he was before. He had full control of those games with a rating well above what yours currently is.

Goose's page from those days - look towards the bottom:
http://64.191.124.60/matchlist.php?i...&sort=played_d

You aren't being held down by ladder's flaws, or your history. Your rating is nearly as if you had just entered ladder for the first time. Most people when they enter ladder for the first time with your rating either win or lose lots of games as their rating adjusts to their actual skill level. If a player is lowly rated and at any point they have a sudden increase of skill then they'll begin winning a higher percentage of their games until they reach their appropriate rating.
Reply With Quote
  #68  
Old 02-03-2011, 01:26 PM
CCN CCN is offline
Senior Member
 
Join Date: Jun 2009
Location: Xiang Gang
Posts: 1,992
Default

without restricting plane setups that problem is hard to fix.
Reply With Quote
  #69  
Old 02-03-2011, 01:49 PM
Urpee Urpee is offline
Senior Member
 
Join Date: Dec 2010
Posts: 170
Default

Matt, no offense but what happens a lot here is that anecdotes are taken to be more valuable than actual math.

I have given an actual proof that the ladder does not converge under certain conditions and that this is a property of the ladder's design. That neither implies nor means to imply that people cannot rise or drop. What it does prove, without any shadow of a doubt, is that someone playing at 50% is no sign at all that they are at a rating that reflects their performance.

It doesn't really matter if one can give exceptional cases where people rise drastically. What does matter is that this is what the model actually does.

In any model proposed balln and goose would perform well, wins would still be scored positively, and losses would be scored negatively, so I'm not sure what exactly you are trying to say except repeat the disproven mantra that the scores converge properly.

But yes, I should never have mentioned my own case, because it allows people to make it about me. It isn't.

Look if you like the system as is, that's fine. Here is what I have already said about it:

"I don't want to give the impression that I think that the current system is awful. I don't think it is. I think it's pretty good."

It's kind of hard to try to have nuanced discussion when people climb on barricades the moment one tries to point out properties of what is there.
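The balanced-team fixed point is easy to see numerically (a throwaway sketch; the 50-point K matches the current ladder's constant, the rest is illustrative): when the balancer equalizes team averages, every game is scored as a coin flip worth exactly ±25, so a player who truly wins 50% has zero expected drift no matter how misrated they individually are.

```python
def expected_score(avg1, avg2):
    """Elo expected score for team1 against team2 (scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((avg2 - avg1) / 400.0))

def point_change(avg1, avg2, won, k=50.0):
    """Per-player change under the current rule: K * (S - E)."""
    return k * ((1.0 if won else 0.0) - expected_score(avg1, avg2))

# Autobalanced match: equal team averages, regardless of who is misrated.
gain = point_change(1500.0, 1500.0, won=True)    # +25.0
loss = point_change(1500.0, 1500.0, won=False)   # -25.0
drift = 0.5 * gain + 0.5 * loss                  # 0.0 expected movement
```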

Last edited by Urpee; 02-03-2011 at 01:50 PM. Reason: typo
Reply With Quote
  #70  
Old 02-03-2011, 02:09 PM
A Nipple A Nipple is offline
Senior Member
 
Join Date: Aug 2009
Location: with bongs gf! <3
Posts: 982
Default

Quote:
Originally Posted by elxir View Post
but uhh, nipple is like the second best bomber on that list sooo obviously something is working
well, I did play crap for the first few hundred games and learned most planes from scratch in a lot more, as it was the best environment to practise them. I'll get my number one bomber spot back for the next APL hopefully, if I'm not too busy =]
Reply With Quote
  #71  
Old 02-03-2011, 02:24 PM
VipMattMan VipMattMan is offline
Senior Member
 
Join Date: Jul 2010
Posts: 122
Default

Quote:
Originally Posted by Urpee View Post
Matt, no offense but what happens a lot here is that anecdotes are taken to be more valuable than actual math.

I have given an actual proof that the ladder does not converge under certain conditions and that this is a property of the ladder's design. That neither implies nor means to imply that people cannot rise or drop. What it does prove, without any shadow of a doubt, is that someone playing at 50% is no sign at all that they are at a rating that reflects their performance.

It doesn't really matter if one can give exceptional cases where people rise drastically. What does matter is that this is what the model actually does.

In any model proposed balln and goose would perform well, wins would still be scored positively, and losses would be scored negatively, so I'm not sure what exactly you are trying to say except repeat the disproven mantra that the scores converge properly.

But yes, I should never have mentioned my own case, because it allows people to make it about me. It isn't.

Look if you like the system as is, that's fine. Here is what I have already said about it:

"I don't want to give the impression that I think that the current system is awful. I don't think it is. I think it's pretty good."

It's kind of hard to try to have nuanced discussion when people climb on barricades the moment one tries to point out properties of what is there.
Under very specific conditions the convergence isn't there. Due to random dynamics, it's likely that over time the issues of potential non-convergence work themselves out. I don't think you've proven that the convergence doesn't exist in ladder.

The 50% rule is more a matter of likelihood. If you've played a large number of games and you're only winning about 50% of them, then your skill and rating have probably settled.

Obviously if you win 50% of your games for months at a 1500 rating and then your skill improves, you're going to increase your rating. Depending on your number of games, your overall win percentage may only increase very slightly during that time. At some point you're going to hit a rating wall dependent on your skill, and then you're back to winning approximately 50% of your games.

Whether the 50% rule "exists" goes back to your belief that ladder has a non-convergent behavior and that ratings are inherently inaccurate. I suppose we could go in circles all day about that.
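The "rating wall" intuition falls straight out of the Elo expected-score formula quoted earlier in the thread. Here is a minimal Python sketch (my own illustration, not the ladder's actual code) on the standard 400-point scale:

```python
# Standard Elo expected score on the 400-point scale, as discussed upthread.
def expected_score(my_rating, opponent_rating):
    """Win probability implied by the rating difference."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - my_rating) / 400.0))

# Equal ratings imply a 50% expected win rate (the "settled" case):
print(expected_score(1700, 1700))  # 0.5

# A 200-point edge implies roughly a 76% expected win rate, so a player
# rated below their real skill keeps winning more than half their games
# until the gap closes -- that is the wall closing in from below.
print(round(expected_score(1900, 1700), 2))  # 0.76
```

The asymmetry is the whole mechanism: as long as your rating lags your skill, your actual win rate exceeds the expected one, and Elo-style updates push you upward.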
Reply With Quote
  #72  
Old 02-03-2011, 02:39 PM
Urpee Urpee is offline
Senior Member
 
Join Date: Dec 2010
Posts: 170
Default

Well I have never claimed that convergence doesn't exist. All I have done is give an actual proof that people's claim that they converge to their actual rating isn't trivially true.

It's not my belief that ladder has non-convergent behavior; it's demonstrable that it does. That's exactly what I have shown. There is a rather massive difference between the two claims.

To be precise there are two equilibria to be had. One is an actual rating ceiling, as you call it. The other, and this I have shown, is an actually balanced team. Now the system encourages balanced teams, so I won't just accept that what I describe is some sort of rare singular case that is easily disrupted by randomness. You'd have to show that.

You've given a hand-wavy argument that randomness will disturb those cases. I don't actually disagree with the rough notion. But it is just plainly not correct to say that if you see someone performing at 50% for a while they are at their true rating, and this I have shown.

If you want to give an argument that under a certain kind of likely random disturbance of team balance this is reliably broken, I'd be interested in seeing that proof. But I won't just accept it because you say so or because you have some fitting exceptional anecdotes.
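The balanced-team equilibrium can be illustrated with a toy example (my own sketch with made-up ratings, not part of Urpee's proof): if the balancer always equalizes team average ratings, the Elo expected score for each side is 0.5, so the expected rating movement is zero even when individual ratings are badly wrong.

```python
# Elo expected score between two team-average ratings (400-point scale).
def expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

team_a = [1500, 1900]   # one under-rated and one over-rated player
team_b = [1700, 1700]   # two accurately rated players
avg_a = sum(team_a) / len(team_a)
avg_b = sum(team_b) / len(team_b)

# The averages match (1700 vs 1700), so the expected score is exactly 0.5:
# each side wins half the time and no rating drifts, despite the two
# mis-rated players on team A. A 50% win rate proves nothing about them.
print(expected_score(avg_a, avg_b))  # 0.5
```

This is exactly the equilibrium being argued about: a rating-based update driven only by team averages cannot distinguish the mis-rated pair from the accurate one.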
Reply With Quote
  #73  
Old 02-03-2011, 02:49 PM
blln4lyf blln4lyf is offline
Senior Member
 
Join Date: Jan 2010
Posts: 886
Default

Quote:
Originally Posted by Urpee View Post
Matt, no offense but what happens a lot here is that anecdotes are taken to be more valuable than actual math.

I have given an actual proof that the ladder does not converge under certain conditions and that this is a property of the design of the ladder. That neither implies nor means to imply that people cannot rise or drop. What it does prove without any shadow of a doubt is that people playing at 50% is no sign at all that they are at a rating that reflects their performance.

It doesn't really matter if one can give exceptional cases where people rise drastically. What does matter is that this is what the model actually does.

In any model proposed balln and goose would perform well, wins would still be scored positively, and losses would be scored negatively, so I'm not sure what exactly you are trying to say except repeat the disproven mantra that the scores converge properly.

But yes, I should never have mentioned my own case, because it allows people to make it about me. It isn't.

Look if you like the system as is, that's fine. Here is what I have already said about it:

"I don't want to give the impression that I think that the current system is awful. I don't think it is. I think it's pretty good."

It's kind of hard to try to have nuanced discussion when people climb on barricades the moment one tries to point out properties of what is there.
What you're saying would never happen though... at least not anywhere near the extent you think it would. Seriously dude, I'm not trying to make this about you, nor is matt, or andy. We strongly disagree with you though, and it seems neither of us will budge, but please get off your high horse, it's getting old as schit. As for converging, that is why nobo wants to introduce K, as a high K value when you first join ladder would help you reach your rating, along with it increasing during streaks, etc.
Reply With Quote
  #74  
Old 02-03-2011, 03:08 PM
Urpee Urpee is offline
Senior Member
 
Join Date: Dec 2010
Posts: 170
Default

I thought the intent of the thread was about math discussions. Now a few posters have tried to contribute to math modeling. I think this is what it should be.

Frankly, whether people like the system and believe strongly in it does matter, but I really am not arguing that here.

This isn't a matter of budging. If there are mistakes or ideas that can be demonstrated to be wrong they can be corrected.

But I don't have to have the discussion. If the devs would rather select their input, that's fine. Perhaps this never should have been a thread debate to begin with. It's bound to mix interest with analysis.
Reply With Quote
  #75  
Old 02-03-2011, 03:08 PM
VipMattMan VipMattMan is offline
Senior Member
 
Join Date: Jul 2010
Posts: 122
Default

Quote:
Originally Posted by Urpee View Post
Well I have never claimed that convergence doesn't exist. All I have done is give an actual proof that people's claim that they converge to their actual rating isn't trivially true.

It's not my belief that ladder has non-convergent behavior; it's demonstrable that it does. That's exactly what I have shown. There is a rather massive difference between the two claims.

To be precise there are two equilibria to be had. One is an actual rating ceiling, as you call it. The other, and this I have shown, is an actually balanced team. Now the system encourages balanced teams, so I won't just accept that what I describe is some sort of rare singular case that is easily disrupted by randomness. You'd have to show that.

You've given a hand-wavy argument that randomness will disturb those cases. I don't actually disagree with the rough notion. But it is just plainly not correct to say that if you see someone performing at 50% for a while they are at their true rating, and this I have shown.

If you want to give an argument that under a certain kind of likely random disturbance of team balance this is reliably broken, I'd be interested in seeing that proof. But I won't just accept it because you say so or because you have some fitting exceptional anecdotes.
The randomness = different teams, the fact that literally NO ONE has exactly the same skill set/response methodology as any other person, and the fact that no one is ever 100% accurately rated.

It seems as if you're suggesting that rating has literally no skill dynamic whatsoever. If you're suggesting that despite personal experience and despite seeing the algorithms used, I don't know what to say.

All I can tell you is that the game sure does get a lot easier any time my rating dips down too much, and a lot harder when it goes too high. I'm sure that 99% of other ladder players have the same experience. This difference in experience directly correlates with my changing rating. As I've had this experience over a long period of time, I've bounced around the same rating range, and the more games I play the closer my win % comes to 50%.
Reply With Quote
  #76  
Old 02-03-2011, 05:21 PM
elxir elxir is offline
Senior Member
 
Join Date: Feb 2010
Location: All-American
Posts: 2,687
Default

i play time anchor like most of the time even though it's like my fifth best setup

so that would skew things
Reply With Quote
  #77  
Old 02-03-2011, 05:34 PM
Urpee Urpee is offline
Senior Member
 
Join Date: Dec 2010
Posts: 170
Default

Quote:
Originally Posted by VipMattMan View Post
It seems as if you're suggesting that rating has literally no skill dynamic whatsoever.
I'm saying no such thing at all.
Reply With Quote
  #78  
Old 02-03-2011, 05:58 PM
shrode shrode is offline
Senior Member
 
Join Date: Jul 2010
Location: Fargo, ND
Posts: 823
Default

What planes people play should not be factored into ladder (as far as the code goes, or limiting the # of planes). One thing that helps people rise is the ability to play multiple planes and do whatever is needed to help the team win. In ball, I'm a fairly good thermo and play it when my team needs me to. Having that ability has helped me win multiple games that I would have lost. This should continue to be rewarded in the next ladder system.
Reply With Quote
  #79  
Old 02-03-2011, 09:22 PM
Rainmaker Rainmaker is offline
Senior Member
 
Join Date: Jan 2011
Posts: 108
Default

Quote:
Originally Posted by Tekn0 View Post
I cannot agree with this more.

In fact, why not club players into groups? Say player X plays Loopy 50% & Miranda 50% of the time, so he is considered an offense player; if player Y plays bomber, whale, or biplane he can be considered defense/support. Some players will play roughly the same % of all planes, but then they can be placed on either team based on their rating.

Then balance teams to have 2 or 3 offense players. Of course nothing can be done if all 12 players play offense, but clearly that usually does not happen.

Yes, I know the "people should play all planes, or one heavy and one light plane" argument, but in reality not everyone is equally good at all planes.

Point is, the auto balance algorithm can take these statistics into account too.
Quote:
Originally Posted by Stormich View Post
This isn't as simple as it sounds+your solution only works on ball.
Stormich is right.
You can't use previous statistics of plane use to predict a player's choice of plane. I mean, you actually could do it; the problem is the effectiveness.
The likelihood is small, because the pick in ladder is situational. If there are already 2 explodets, it's not likely that someone will pick a 3rd explodet rather than a more agile plane.
I think my suggestion would work better in this case, if you want to start "forcing" players to play real "plane formations".
But as I said, it's only a suggestion; you might think that you want weird setups like a 5-explodet team to be available.

Quote:
Originally Posted by blln4lyf View Post
As for converging, that is why nobo wants to introduce K, as a high K value when you first join ladder would help you reach it, along with it increasing during streaks, etc.
I've seen that no one is making specific replies to my points; either you are passing over them because you think I'm right, or you are ignoring them.
Either way, I've already proven this point before.
THE CURRENT SYSTEM ISN'T A CONVERGING ONE.
The current system works pretty much like any MMORPG ranking system, rating the players based on who stacks the biggest amount of net wins (= total wins - total losses).
The 50% win rate when you are at your rating is pure speculation, and it in no way reflects the real ratings.

On the K value: a high K by itself is divergent rather than converging; what matters is how you schedule it.
The problem with the original ELO is that everyone starts at 1500, and to make an educated guess you need a good number of games (nearly 100 is good for a first approach).
So an easy way (but with HIGH uncertainty) is to use a high K, so the rating varies a lot; afterwards you gradually lower K toward a value truly sensitive to the scale. This way you keep polishing until you reach a steady rating. It works because the player will have good odds against people below his rating, but poor odds against people above it, keeping him always within +/-50 rating (if his real rating is 1764, his current could be 1730 +/-50).
The smaller the K, the more accurate the guess, but the more games you need to reach that low uncertainty.
The higher the K, the less accurate the guess (the value could be off by +/-100 rating), but the faster it is reached.
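The speed/accuracy trade-off can be seen in a small deterministic sketch (my own illustration; the 1764 true rating and 1700 opponents are just the numbers used above): each game, the rating moves by K times the gap between the player's long-run win probability and the win probability his current rating predicts.

```python
def expected_score(r_a, r_b):
    """Elo expected score on the 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def converge(true_rating, k, games, start=1500.0, opponent=1700.0):
    """Expected rating trajectory: each game adds
    k * (true win probability - win probability at the current rating)."""
    rating = start
    p_win = expected_score(true_rating, opponent)  # long-run win rate
    for _ in range(games):
        rating += k * (p_win - expected_score(rating, opponent))
    return round(rating)

# High K closes most of the 1500 -> 1764 gap within 50 games;
# low K is still well short after the same number of games.
print(converge(1764, k=32, games=50))
print(converge(1764, k=8, games=50))
```

Because each step is proportional to the remaining gap, the trajectory approaches the true rating from below and never overshoots in this noiseless sketch; in a real ladder the same K also sets the size of the random fluctuation around that value.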

chess example:
A new player is given a rating of 1000.
Through his first games he has a high K value, K=32.
After he reaches a 2100 rating, he is given K=24.
After he reaches a 2400 rating, he is given K=16.

This gives a "zoom" effect on high rating values, making the spectrum broader: in the high rankings, a difference of 5 points represents much greater skill than a difference of 5 points at a 1000 rating.
This is because of K, but also because skill is measured with a distribution that isn't linear; nor is the comparison used (the E formula).

Mind you, the K value can be chosen arbitrarily. Usually it depends on how accurate you want the rating system to be and on how many games are played.
Chess is the kind of sport where a player can have a lot of matches over his career, so you are able to pick lower values for K.
Also, a smaller K is less "sensitive" to player improvements, like new tactics (altitude = new plane setups), new technology (altitude = nerfs or buffs), etc.
Golf is a sport that doesn't have THAT many "matches" (courses) compared to chess; hence, you have to pick a higher value.
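The chess tiering above can be written as a small helper. The 32/24/16 tiers at the 2100 and 2400 boundaries are the ones quoted in this post, not necessarily any federation's current rules:

```python
def k_factor(rating):
    """K tiers from the chess example above: high K for new or low-rated
    players, smaller K as the rating (and confidence in it) grows."""
    if rating < 2100:
        return 32
    if rating < 2400:
        return 24
    return 16

print(k_factor(1000), k_factor(2250), k_factor(2600))  # 32 24 16
```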

Last edited by Rainmaker; 02-03-2011 at 09:41 PM.
Reply With Quote
  #80  
Old 02-04-2011, 02:40 AM
VipMattMan VipMattMan is offline
Senior Member
 
Join Date: Jul 2010
Posts: 122
Default

Quote:
Originally Posted by IntoTheWalls View Post
I've seen that no one is making specific replies to my points; either you are passing over them because you think I'm right, or you are ignoring them.
Either way, I've already proven this point before.
THE CURRENT SYSTEM ISN'T A CONVERGING ONE.
The current system works pretty much like any MMORPG ranking system, rating the players based on who stacks the biggest amount of net wins (= total wins - total losses).
The 50% win rate when you are at your rating is pure speculation, and it in no way reflects the real ratings.
Here:
You have a 4-player game, 2 people per team. 2 veteran players have a 3,000 rating each. 2 new players have just entered ladder, each with a 1500 rating. It turns out that one of these new players actually has the skill of a 3,000-rated player, and the other does not. Every game gives or costs 25 points of rating.

We'll say that, to equalize team ratings, ladder assigns one of these 1500-rated players to each 3k player 100% of the time, but switches which player goes where every other game.

Over time the 1500 player with 3k skill trends towards a 3,000 rating while the actual 3,000-level players maintain their 50% win rate. During this time the "bad" 1500-rated player trends downwards.

Eventually the "good" player reaches a 3,000 rating and can now join the ranks of the other 3k-rated players and potentially have the "bad" player assigned to him.

Now - this is where your argument comes in. Say that over time the "bad" player gained enough skill to be as skilled as one of the 3,000-rated players. But everyone's winning 50% of their games at this point. That means that convergence is impossible from a mathematical standpoint, and that the previously bad player can NEVER attain a 3,000 rating, despite the fact that he's as skilled as the other players.

This is how you view ladder in its current form, and it makes "sense" from a purely mathematical standpoint. But it's wrong from a reality standpoint.

Throw in 6-player teams, an endlessly changing pool of people, players of differing skill levels, some variance from slightly inaccurate ratings, and a multitude of other variables, and that previously bad player will have full opportunity to achieve a higher rating.

That human variable is where the convergence in ladder exists, and it appears to be very effective. It may never be something you can quite boil down to "two players with the exact same skill level".
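The 4-player scenario above can be turned into a rough Monte Carlo sketch. This is my own toy with several simplifying assumptions: true skills are fixed, game outcomes are drawn from the Elo expectation on team-average skill, ratings are not fed back into the balancer (so the trend runs unbounded; only its direction matters), and a flat 25 points change hands per game.

```python
import random

def expected_score(r_a, r_b):
    """Elo expected score on the 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def simulate(games=60, seed=7):
    # Two accurately rated veterans, one under-rated and one over-rated newcomer.
    true_skill = {"vet1": 3000, "vet2": 3000, "good": 3000, "bad": 1500}
    rating = {"vet1": 3000.0, "vet2": 3000.0, "good": 1500.0, "bad": 1500.0}
    rng = random.Random(seed)
    for g in range(games):
        # The balancer pairs each newcomer with a veteran, alternating pairings.
        team_a = ["vet1", "good"] if g % 2 == 0 else ["vet1", "bad"]
        team_b = [p for p in rating if p not in team_a]
        skill_a = sum(true_skill[p] for p in team_a) / 2
        skill_b = sum(true_skill[p] for p in team_b) / 2
        a_wins = rng.random() < expected_score(skill_a, skill_b)
        for p in team_a:
            rating[p] += 25 if a_wins else -25
        for p in team_b:
            rating[p] += -25 if a_wins else 25
    return rating

final = simulate()
# The under-rated newcomer trends up and the over-rated one trends down,
# exactly the direction described in the post.
print(final["good"] > 1500, final["bad"] < 1500)
```

Whatever team the "good" newcomer joins averages 3,000 true skill against 2,250, so it wins the large majority of games; his rating climbs while the "bad" newcomer's falls, without the balancer ever looking at skill directly.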

Last edited by VipMattMan; 02-04-2011 at 02:55 AM.
Reply With Quote