TF2 6s Skill Model
#1

I built a TF2 6s Premier/Invite skill model based on the matches hosted on Liquipedia. I thought some of the results were interesting, so I'm sharing them in this post. You can access the CSVs and Python code on GitHub here: https://github.com/alex-p-6/TF2_Skill_Engine/tree/main

Data: I extracted 4,373 games from Liquipedia, covering 2008 to 2024 (not including the current in-progress seasons), at Premier/Invite level across international LANs, regional LANs, and ETF2L, RGL, ESEA and Ozfortress seasons (basically every S and A tier event from here: https://liquipedia.net/teamfortress/Portal:Tournaments, not counting AsiaFortress and SA TF2).

There may be some cases where mercs/subs were used, but given the amount of match data I haven't had time to check each one. Regular season data was missing for a lot of seasons, but what is there is a good start. Here is the spread of games extracted for the model, by year:

https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Distribution%20of%20Matches%20by%20Year.png

Ranking Skill: I applied two methods of ranking player skill, an Elo method (info here: https://www.unrankedsmurfs.com/blog/what-is-elo-in-league-of-legends-and-how-does-it-work) and an OpenSkill method (info here: https://openskill.me). The Elo method predicted match results with 80.55% accuracy, whilst the OpenSkill method reached 80.86%. It is pretty close, but given the OpenSkill method was slightly better, I'll use the results from that model for the rest of this post.
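For anyone curious how a prediction accuracy like this can be measured, here is a minimal sketch for the Elo side. It is not the code from the repo: averaging player ratings into a team rating, the K-factor of 32, the 1500 starting rating and the function names are all assumptions made for illustration. Each match is predicted from the ratings as they stood before it was played, and only then are the ratings updated.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo win probability for side A against side B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def walk_forward_accuracy(matches, k=32.0, start=1500.0):
    """Predict each match from pre-match ratings, then update the ratings.

    `matches` is a chronological list of (team_a, team_b, a_won) tuples,
    where each team is a list of player keys. The K-factor and starting
    rating are illustrative, not the values used in the post.
    """
    ratings = {}
    correct = 0
    for team_a, team_b, a_won in matches:
        # Assumption: a team's rating is the mean of its players' ratings.
        avg_a = sum(ratings.get(p, start) for p in team_a) / len(team_a)
        avg_b = sum(ratings.get(p, start) for p in team_b) / len(team_b)
        p_a = expected_score(avg_a, avg_b)
        # Correct when the favoured side wins (an exact 50/50 counts as picking A).
        if (p_a >= 0.5) == a_won:
            correct += 1
        # Every player on a side moves by the same amount.
        delta = k * ((1.0 if a_won else 0.0) - p_a)
        for p in team_a:
            ratings[p] = ratings.get(p, start) + delta
        for p in team_b:
            ratings[p] = ratings.get(p, start) - delta
    return correct / len(matches), ratings
```

The same loop works for any rating system: swap the probability and update steps and keep the "predict first, update after" order so the accuracy is never measured on ratings that have already seen the result.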

I ranked players by their player ID/class combo, so many players are listed multiple times, once for each class they played.
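As a rough illustration of what rating by player ID/class combo means in practice, the sketch below keys the rating table on a (player ID, class) pair. The IDs, the class names and the 1500 starting value are hypothetical and not taken from the dataset.

```python
from collections import defaultdict

START_RATING = 1500.0  # illustrative starting value, not from the post

# One rating per (player_id, class) pair, so the same player on two
# classes is tracked as two independent entries.
ratings = defaultdict(lambda: START_RATING)


def rating_key(player_id: str, tf_class: str) -> tuple:
    """Build the lookup key for a player on a specific class."""
    return (player_id, tf_class.lower())


# Hypothetical player "p123": gaining rating on scout leaves soldier untouched.
ratings[rating_key("p123", "Scout")] += 20.0
scout_rating = ratings[rating_key("p123", "scout")]
soldier_rating = ratings[rating_key("p123", "soldier")]
```

The trade-off is that a player who switches class starts again from the default rating on the new class, which is one reason the same name can appear at very different positions in the list.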

A note about OpenSkill - the model assigns each player an average rating and a confidence interval around it. As more games are played, the player's average rating is adjusted whilst the confidence interval gets smaller. These two figures can be combined into a single Ordinal Skill, which is used for ranking.
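A small sketch of how the two figures combine, assuming the usual OpenSkill convention where the average is called mu, the uncertainty is called sigma, and the ordinal is mu minus three sigma. The defaults of 25 and 25/3 are the library's usual starting values as far as I know; whether the repo changes them is not something the post says.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    mu: float = 25.0           # average rating
    sigma: float = 25.0 / 3.0  # uncertainty; shrinks as more games are played


def ordinal(rating: Rating, z: float = 3.0) -> float:
    """Conservative skill estimate: the average minus z times the uncertainty."""
    return rating.mu - z * rating.sigma


newcomer = Rating()                    # no games yet, ordinal is roughly zero
veteran = Rating(mu=30.0, sigma=2.0)   # hypothetical well-established player
```

Because the uncertainty is subtracted, a player with few recorded games ranks low even with a high average, which helps explain why missing match data can move people around in the lists.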

The full rankings are on GitHub, but here is the overall top 20 current ordinal skill rankings from the model:

https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Current%20Skill%20-%20Top%2020.png

And here is the top 20 peak ordinal skill rankings from all time:

https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Peak%20Skill%20-%20Top%2020.png

I definitely think some of these ratings are an artifact of the match data available, but it is insightful to see who made the list. The full list is available here: https://github.com/alex-p-6/TF2_Skill_Engine/blob/main/player_open_skill.csv. I included region codes for each player, as best as I could, so you can filter on that.

Role Power: I found this discussion from 4 years ago (https://www.teamfortress.tv/56401/6s-role-carry-potential-order-discussion) about role influence on TF2 6s outcomes interesting, and I have used this model to have a go at a more data-informed answer. For each match in the dataset, using the skill ratings as they stood when the match was played, I used the model to predict the match outcome based only on a single role vs role match-up (e.g. combo scout vs combo scout). Here are the results, from most to least predictive:

  • combo scout match-up predicted 72.4088% of matches
  • pocket soldier match-up predicted 71.7081% of matches
  • demo match-up predicted 71.6115% of matches
  • medic match-up predicted 71.4182% of matches
  • flank scout match-up predicted 70.8142% of matches
  • roamer soldier match-up predicted 70.8142% of matches
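The single-role comparison above can be sketched roughly like this. It assumes the prediction simply picks the team whose player in that role had the higher pre-match rating, and it skips matches where the two ratings are equal; the post does not say how the actual code handles either point, and the match layout here is made up for the example.

```python
def role_matchup_accuracy(matches, role):
    """Share of matches won by the team with the higher-rated player in `role`.

    `matches` is a list of dicts with keys "a" and "b" (each mapping a role
    name to that player's pre-match rating) and "a_won" (bool). Matches where
    both ratings are equal are skipped, which is a choice made for this
    sketch. Returns None when no match could be decided.
    """
    decided = 0
    correct = 0
    for match in matches:
        rating_a = match["a"][role]
        rating_b = match["b"][role]
        if rating_a == rating_b:
            continue
        decided += 1
        if (rating_a > rating_b) == match["a_won"]:
            correct += 1
    return correct / decided if decided else None


# Two hypothetical matches with made-up ratings.
sample = [
    {"a": {"medic": 10, "demo": 5}, "b": {"medic": 8, "demo": 9}, "a_won": True},
    {"a": {"medic": 7, "demo": 5}, "b": {"medic": 9, "demo": 5}, "a_won": False},
]
medic_accuracy = role_matchup_accuracy(sample, "medic")
demo_accuracy = role_matchup_accuracy(sample, "demo")
```

Note that strong players tend to share teams, so each role's number partly reflects the rest of the roster as well, which may be why all six figures sit within about two percentage points of each other.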

I don't think it is a huge surprise that combo scout and demo are up there, but I was surprised by the high pocket influence. This could perhaps be skewed by games from earlier metas that were more pocket-centric. I did try to graph role influence over time (year by year) but it didn't come out well, so I haven't included it in this post.

Best Teams of All Time: Since I found this video ranking all-time TF2 teams interesting (https://www.youtube.com/watch?v=ue3MhmtYuPw), I also used the model to combine individual player ratings into an overall team rating. Here are the top-20 of all time:

https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Team%20Skill%20-%20Top%2020.png
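The post doesn't spell out how player ratings are combined into a team rating, so the following is only one plausible way to do it. It borrows the aggregation that OpenSkill-style models use internally (sum the averages, sum the squared uncertainties) and then applies the same mu minus three sigma ordinal to the result. The function names and the example numbers are hypothetical.

```python
import math


def team_rating(players):
    """Combine (mu, sigma) pairs into a single team (mu, sigma).

    Assumption: team mu is the sum of player mus, and team sigma is the
    square root of the summed player variances.
    """
    mu = sum(m for m, _ in players)
    sigma = math.sqrt(sum(s * s for _, s in players))
    return mu, sigma


def team_ordinal(players, z=3.0):
    """Conservative team rating, mirroring the per-player ordinal."""
    mu, sigma = team_rating(players)
    return mu - z * sigma


# Hypothetical two-player example (a real 6s roster would have six entries).
example_team = [(30.0, 3.0), (20.0, 4.0)]
```

A plain average of the six player ordinals would be a simpler alternative; the two approaches can rank rosters differently when one player has far fewer recorded games than the rest.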

End: That is all I have to share today. If anyone has access to more match results from ETF2L, ESEA, RGL and Ozfortress than what is available on Liquipedia, I'd be interested in including them in a potential version 2 ranking. Please get in touch.

Otherwise, I'm interested to hear what people think.

#2

finally, we combined every log
#3

Would be interesting to see some subsets, such as looking within each region (so not all froyotech lol), or looking at different eras to see when scout got a slight edge. I also have a feeling that OpenSkill/Elo might not be ideal for analyzing these types of games. At a high level this is because they don't really have a method of capturing the "team" element of people playing together and building chemistry (there's an argument that maybe they do this implicitly). They are more suited to pugs and similar.

Thank you for doing this though. If anyone is interested in this topic there are some previous threads and attempts here:
https://tf2metrics.wordpress.com/about/
https://www.teamfortress.tv/22042/tf2-player-rankings-official-thread (you need to use archive.is for the links)
#4
gnat wrote: Here are the top-20 of all time: https://raw.githubusercontent.com/alex-p-6/TF2_Skill_Engine/refs/heads/main/Images/Team%20Skill%20-%20Top%2020.png

https://th.bing.com/th/id/OIP.WUwyELdo3Nv9s-EiySR-ywHaFn?rs=1&pid=ImgDetMain