TF2 6s Skill Model

#1

gnat

0 Frags – +

I built a TF2 6s Premier/Invite skill model based on the matches hosted on Liquipedia. I thought some of the results were interesting, so I'm sharing them in this post. You can access the CSVs and Python code on GitHub [url=https://github.com/alex-p-6/TF2_Skill_Engine/tree/main]here[/url].

[b]Data:[/b] I extracted 4,373 games from Liquipedia, covering 2008 to 2024 (not including the current in-progress seasons), played at Premier/Invite level across international LANs, regional LANs, and ETF2L, RGL, ESEA and Ozfortress seasons. This is basically every S- and A-tier event listed here: https://liquipedia.net/teamfortress/Portal:Tournaments, not counting AsiaFortress and SA TF2.

There may be some cases where mercs/subs were used, but given the amount of match data I haven't had time to check each one. Regular-season data was missing for a lot of seasons, but what is there is a good start. Here is the spread of games extracted for the model, by year: [img]https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Distribution%20of%20Matches%20by%20Year.png[/img]

[b]Ranking Skill:[/b] I applied two methods of ranking player skill: an Elo method ([url=https://www.unrankedsmurfs.com/blog/what-is-elo-in-league-of-legends-and-how-does-it-work]info here[/url]) and an OpenSkill method ([url=https://openskill.me]info here[/url]). The Elo method predicted match results with 80.55% accuracy, whilst the OpenSkill method reached 80.86%. That is pretty close, but since the OpenSkill method was slightly better, I'll use the results from that model for the rest of this post.
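
For anyone unfamiliar with how an Elo-style accuracy figure like the one above can be produced, here is a minimal sketch. It is not the code from the repo: the K-factor of 32, the starting rating of 1200, the rule of skipping matches where both sides have equal ratings, and the idea of rating each side as a single entity are all assumptions made for illustration.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score (win probability) of A against B on the Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one match; K = 32 is an assumed value."""
    delta = k * ((1.0 if a_won else 0.0) - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def prediction_accuracy(matches, start: float = 1200.0, k: float = 32.0) -> float:
    """matches: chronological (side_a, side_b, a_won) tuples.

    Each match is predicted from the ratings held *before* it is played,
    and only then are the ratings updated.
    """
    ratings = {}
    correct = total = 0
    for side_a, side_b, a_won in matches:
        ra = ratings.get(side_a, start)
        rb = ratings.get(side_b, start)
        if ra != rb:  # equal ratings give no prediction, so skip them
            total += 1
            correct += (ra > rb) == a_won
        ratings[side_a], ratings[side_b] = elo_update(ra, rb, a_won, k)
    return correct / total if total else float("nan")
```

Predicting before updating matters: scoring a match with ratings that already include its result would inflate the accuracy.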

I ranked players by their player ID/class combination, so many players are listed multiple times, once for each class they played.

A note about OpenSkill: the model assigns each player an [i]average[/i] rating and a [i]confidence interval[/i]. As more games are played, the player's average rating is adjusted whilst the confidence interval gets smaller. These two figures can be combined into a single Ordinal Skill value, which is used for ranking.
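
As a rough illustration of how the two figures collapse into one number, the sketch below uses what I understand to be OpenSkill's documented defaults: a starting mean of 25, a starting uncertainty of 25/3, and an ordinal of mean minus three times the uncertainty. The repo may use different parameters, and the `Rating` class and example players here are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_MU = 25.0                 # assumed starting mean
DEFAULT_SIGMA = DEFAULT_MU / 3.0  # assumed starting uncertainty


@dataclass
class Rating:
    mu: float = DEFAULT_MU
    sigma: float = DEFAULT_SIGMA


def ordinal(rating: Rating, z: float = 3.0) -> float:
    """Conservative skill estimate: the mean minus z standard deviations."""
    return rating.mu - z * rating.sigma


# Hypothetical players: a veteran with many games (small sigma) and a
# newcomer with a higher mean but few games (large sigma).
veteran = Rating(mu=30.0, sigma=2.0)
newcomer = Rating(mu=32.0, sigma=8.0)
ranked = sorted([("veteran", veteran), ("newcomer", newcomer)],
                key=lambda item: ordinal(item[1]), reverse=True)
```

Under these assumptions a brand-new player starts at an ordinal of about 0, and the veteran ranks above the newcomer despite the lower mean, because the model is more certain about the veteran.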

The full rankings are on GitHub, but here is the overall top 20 by [i]current ordinal skill[/i] from the model:
[img]https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Current%20Skill%20-%20Top%2020.png[/img]

And here is the top 20 [i]peak ordinal skill rankings[/i] from all time:
[img]https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Peak%20Skill%20-%20Top%2020.png[/img]

I definitely think some of these ratings are an artefact of the match data available, but it is insightful to see who made the list. The full list is [url=https://github.com/alex-p-6/TF2_Skill_Engine/blob/main/player_open_skill.csv]available here[/url]. I included region codes for each player, as best I could, so you can filter on that.

[b]Role Power:[/b] I found [url=https://www.teamfortress.tv/56401/6s-role-carry-potential-order-discussion]this discussion[/url] from 4 years ago about role influence on TF2 6s outcomes interesting, and I have used this model to have a go at a more data-informed answer. For each match in the dataset, using the skill ratings as they stood when the match was played, I used the model to predict the match outcome based only on a single role-vs-role match-up (e.g. combo scout vs combo scout). Here are the results, ordered from most to least predictive:
[list]
[*]combo scout match-up predicted 72.4088% of matches
[*]pocket soldier match-up predicted 71.7081% of matches
[*]demo match-up predicted 71.6115% of matches
[*]medic match-up predicted 71.4182% of matches
[*]flank scout match-up predicted 70.8142% of matches
[*]roamer soldier match-up predicted 70.8142% of matches
[/list]
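
The single-role comparison behind the list above can be sketched roughly as follows. This is an illustration, not the repo's code: the match record layout, the field names and the toy ratings are all hypothetical, and it assumes the side with the higher pre-match rating in the chosen role is the predicted winner.

```python
def role_accuracy(matches, role: str) -> float:
    """Share of matches won by the side whose player in `role` was rated higher.

    matches: list of dicts with 'team_a' and 'team_b' (role -> pre-match
    rating) and a boolean 'a_won'. Matches where the role is missing or the
    two ratings are equal are skipped.
    """
    correct = total = 0
    for match in matches:
        ra = match["team_a"].get(role)
        rb = match["team_b"].get(role)
        if ra is None or rb is None or ra == rb:
            continue
        total += 1
        correct += (ra > rb) == match["a_won"]
    return correct / total if total else float("nan")


# Toy data with made-up ratings, only to show the call shape.
toy_matches = [
    {"team_a": {"medic": 10.0, "demo": 5.0},
     "team_b": {"medic": 8.0, "demo": 9.0}, "a_won": True},
    {"team_a": {"medic": 4.0, "demo": 7.0},
     "team_b": {"medic": 6.0, "demo": 3.0}, "a_won": False},
]
by_role = {role: role_accuracy(toy_matches, role) for role in ("medic", "demo")}
```

Because strong players tend to play on strong teams, each role's rating is correlated with the other five, which is one possible reason the six percentages sit so close together.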

I don't think it's a huge surprise that combo scout and demo are up there, but I was surprised by the high pocket influence. This could perhaps be skewed by games from earlier metas that were more pocket-centric. I did try to graph role influence over time (year by year), but it didn't come out well, so I haven't included it in this post.

[b]Best Teams of All Time:[/b] Since I found [url=https://www.youtube.com/watch?v=ue3MhmtYuPw]this video[/url] ranking all-time TF2 teams interesting, I also used the model to combine individual player ratings into an overall team rating. Here are the top 20 of all time: [img]https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Team%20Skill%20-%20Top%2020.png[/img]
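
The post does not say how the individual ratings were combined, so the sketch below simply assumes a team rating is the mean of its six players' ordinal skills. The function name, roster names and numbers are hypothetical, and the repo may do something different (for example summing, or weighting by role).

```python
from statistics import fmean


def team_rating(player_ordinals) -> float:
    """Assumed combination rule: the mean ordinal skill of a six-player roster."""
    if len(player_ordinals) != 6:
        raise ValueError("a 6s roster needs exactly six player ratings")
    return fmean(player_ordinals)


# Made-up rosters, ranked from strongest to weakest under the assumed rule.
rosters = {
    "team one": [30.0, 28.0, 27.0, 31.0, 29.0, 26.0],
    "team two": [22.0, 35.0, 20.0, 21.0, 23.0, 24.0],
}
ranking = sorted(rosters, key=lambda name: team_rating(rosters[name]), reverse=True)
```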

[b]End:[/b] That is all I have to share today. If anyone has access to more ETF2L, ESEA, RGL or Ozfortress match results than are available on Liquipedia, I'd be interested in including them in a potential version 2 ranking. Please get in touch.

Otherwise, I'm interested to hear what people think.

#2

YeeHaw

44 Frags – +

finally, we combined every log

#3

plum

3 Frags – +

Would be interesting to see some subsets such as looking within each region (so not all froyotech lol), or looking at different eras to see when scout got a slight edge. I also have a feeling that OpenSkill/Elo might not be ideal for analyzing these types of games. At a high level this is because they don't really have a method of capturing the "team" element of people playing together and building chemistry (there's an argument that maybe they do this implicitly). They are more suited for pugs and similar.

Thank you for doing this though. If anyone is interested in this topic there are some previous threads and attempts here:
[url=https://tf2metrics.wordpress.com/about/]https://tf2metrics.wordpress.com/about/[/url]
[url=https://www.teamfortress.tv/22042/tf2-player-rankings-official-thread]https://www.teamfortress.tv/22042/tf2-player-rankings-official-thread[/url] (you need to use archive.is for the links)

#4

MarioManz

47 Frags – +

[quote=gnat]
Here are the top-20 of all time: [img]https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Team%20Skill%20-%20Top%2020.png[/img]
[/quote]
[img]https://th.bing.com/th/id/OIP.WUwyELdo3Nv9s-EiySR-ywHaFn?rs=1&pid=ImgDetMain[/img]

#5

gnat

5 Frags – +

[quote=plum]Would be interesting to see some subsets such as looking within each region (so not all froyotech lol), or looking at different eras to see when scout got a slight edge.[/quote]

I like your point regarding subsets. I've worked up top-10 peak ranks by region and class.

Top 10 Peak Rank by Region: [img]https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Region%20Top%2010%20Peak.png[/img]

Top 10 Peak Rank by Class: [img]https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Class%20Top%2010%20Peak.png[/img]
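
For anyone who wants to cut the rankings into their own subsets, here is a minimal sketch of a top-N-per-group selection. The column names (`player`, `region`, `peak_ordinal`) and the rows are hypothetical, so they would need to be matched to the actual CSV headers in the repo.

```python
from collections import defaultdict


def top_n_by(rows, key: str, n: int = 10, score: str = "peak_ordinal"):
    """Group rows by `key` and keep the n highest-scoring rows in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        group: sorted(members, key=lambda r: r[score], reverse=True)[:n]
        for group, members in groups.items()
    }


# Made-up rows, only to show the call shape.
rows = [
    {"player": "a", "region": "NA", "peak_ordinal": 30.0},
    {"player": "b", "region": "NA", "peak_ordinal": 35.0},
    {"player": "c", "region": "EU", "peak_ordinal": 28.0},
]
top_by_region = top_n_by(rows, "region", n=1)
```

Passing a class column as `key` instead of `region` gives the per-class version of the same table.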

To look at when scout got an edge likely requires more match results or maybe some fresh eyes. I struggled to pull out a decent way to represent role power over time with the match data available.

[quote=plum]At a high level this is because they don't really have a method of capturing the "team" element of people playing together and building chemistry (there's an argument that maybe it does this implicitly). They are more suited for pugs and similar. [/quote]

You are very right. Ranking systems such as this can only go so far, but they still bring out some interesting findings.

#6

dbk

6 Frags – +

[quote=gnat]Top 10 Peak Rank by Class: [img]https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Class%20Top%2010%20Peak.png[/img][/quote]
wild to censor one of them but not censor nur**y
edit: good work

#7

hannah

5 Frags – +

I created some quick histograms

Current skill (100 bins), mean: ~2.039, standard dev: ~6.277
[spoiler][img]https://i.imgur.com/mLSj1YF.png[/img][/spoiler]

ELO (200 bins), mean: ~1200.956 (higher than the default 1200 ELO), standard dev: ~40.569 (the ELO distribution isn't very normal)
[spoiler][img]https://i.imgur.com/8rEneQL.png[/img][/spoiler]

NA open skill (75 bins), mean: ~3.438, standard dev: ~8.186. The (relatively) higher standard deviation suggests that there is a larger number of players who are much better or worse than the average player.
[spoiler][img]https://i.imgur.com/mkyGoUF.png[/img][/spoiler]

EU open skill (75 bins), mean: ~3.581, standard dev: ~6.776
[spoiler][img]https://i.imgur.com/0U32UHR.png[/img][/spoiler]

AU/NZ open skill (75 bins), mean: ~1.737, standard dev: ~5.435. The (relatively) lower standard deviation would suggest that most prem players in AU/NZ are closer to the "average player" than in other regions.
[spoiler][img]https://i.imgur.com/tHrNIeM.png[/img][/spoiler]
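
Summary figures like the ones above can be reproduced from a list of ratings with a sketch along these lines. It assumes a population standard deviation and equal-width bins spanning the minimum to the maximum value; the original histograms may have been made differently (for example with a sample standard deviation), and the function name is hypothetical.

```python
from statistics import fmean, pstdev


def summarize(values, bins: int):
    """Return (mean, population standard deviation, per-bin counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for value in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    return fmean(values), pstdev(values), counts


mean, spread, counts = summarize([0.0, 1.0, 2.0, 3.0, 4.0], bins=2)
```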

if anyone wants some other kind of histogram to be made please let me know and I will try to make it

#8

hannah

6 Frags – +

I also noticed that the same player might have multiple entries if they changed their name. For example, I've had 2 different RGL names during my time playing in RGL invite (hannah and hannahburger), and I have two different entries with two different Elo ratings as a result.
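
One way to handle this before rating is to map known aliases to a single canonical name. The sketch below is only an illustration: the alias table is hypothetical apart from the hannah/hannahburger pair mentioned above, matching is done case-insensitively (which is itself an assumption), and a stable identifier would be a safer key than names if the data has one.

```python
# Known alias -> canonical name. Only this one pair comes from the thread.
ALIASES = {"hannahburger": "hannah"}


def canonical(name: str, aliases=ALIASES) -> str:
    """Normalise whitespace and case, then resolve any known alias."""
    key = name.strip().casefold()
    return aliases.get(key, key)


def rating_key(name: str, tf2_class: str):
    """Key for the player/class combination used when accumulating ratings."""
    return canonical(name), tf2_class.strip().casefold()
```

With this in place, `rating_key("HannahBurger", "Medic")` and `rating_key("hannah", "medic")` resolve to the same entry, so the match history is no longer split across two ratings.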

#9

gnat

3 Frags – +

[quote=dbk][quote=gnat]Top 10 Peak Rank by Class: [img]https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Class%20Top%2010%20Peak.png[/img][/quote]
wild to censor one of them but not censor nursey[/quote]

Sorry I'm dumb. Fixed