Hello,
During the "good guy"/"bad guy" list debacle, I was made aware that some were interested in a cleaned up version of the logs.tf dataset. I wrote a script to port the data to a simpler and more legible schema using sqlite3, then added the ability to update with fresh data from the API. It depends only on Python 3, which should be a common tool for data scientists, there are no external libraries required, for ease of use. The schema can be read at the beginning of the script.
https://github.com/ldesgoui/clone_logs
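To give an idea of what the update path fetches, here's a minimal sketch (not the script itself) of pulling one log's JSON with nothing but the standard library; the /api/v1/log/<id> endpoint is the public logs.tf JSON API, and error handling and rate limiting are omitted:
[quote]import json
import urllib.request

# Fetch one log as JSON, stdlib only.
# Endpoint assumed: https://logs.tf/api/v1/log/<id>
log_id = 2378000
with urllib.request.urlopen(f"https://logs.tf/api/v1/log/{log_id}") as resp:
    log = json.load(resp)

print(sorted(log))  # inspect the top-level keys before mapping them to sqlite[/quote]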
Clones of the first 2,378,000 logs, processed in 100k chunks, as well as a CSV dump containing only chat logs up until April 2019, can be found at: https://mega.nz/#F!l9oGiKCb!lTWT2RSkTYv-TJZb92_ksA (You don't need the script to make use of them; they're just sqlite3 databases.)
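You can inspect them with the standard library alone; a quick sketch (the filename is illustrative, use whichever chunk you grabbed):
[quote]import sqlite3

# Open a downloaded chunk read-only (filename here is made up).
con = sqlite3.connect("file:logs-0000001-0100000.sqlite3?mode=ro", uri=True)

# List every table and its row count; no schema knowledge needed.
for (name,) in con.execute("SELECT name FROM sqlite_master WHERE type = 'table'"):
    (count,) = con.execute(f"SELECT COUNT(*) FROM {name}").fetchone()
    print(name, count)

con.close()[/quote]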
Best,
Computer nerd
torrent, rclone, syncthing
if it's just a one-time snapshot i'd go for torrent, if it's gonna be updated occasionally probably syncthing
edit: i just realized it's an archive, how much bigger is it unpacked? if it's too big syncthing/rclone is probably out
eedit: 130mb per, i should really just read posts
[quote=zen]edit: i just realized it's an archive, how much bigger is it unpacked? if it's too big syncthing/rclone is probably out
eedit: 130mb per, i should really just read posts[/quote]
Awkward use of words on my part: by "archive" I meant the entire history, and it's uncompressed. I haven't waited through compressing it yet because it'd take ages, but the few tests I ran gave an average compression ratio of 4.25, so the grand total should come out to about 7 gigabytes.
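For the unpacked figure zen asked about, working backwards from those two estimates (both are rough guesses, not measurements):
[quote]# Back-of-the-envelope from the estimates above.
compression_ratio = 4.25  # average over a few test runs
compressed_gb = 7         # estimated compressed grand total

print(compressed_gb * compression_ratio)  # ~29.75 GB uncompressed[/quote]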
I also realized I could use LTE to upload; I'll do that once I'm done catching up on the few months I'm missing (logs 2,320,000 to 2,370,000).
EDIT: Uploaded a bunch to MEGA and updated OP
Is there an updated clone with the latest 600k? Or is the play to run
[quote]
clone_logs.py --import archive/*.sqlite3
clone_logs.py --range 2_400_001 3_119_515
[/quote] ?
(Assuming I'm understanding the docs for the --range command correctly.)
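For context, here's how I'm checking where my current clone ends before picking the --range start, under the guess that there's a log table with an integer id column (names guessed; the real schema is at the top of clone_logs.py):
[quote]import sqlite3

con = sqlite3.connect("clone.sqlite3")  # whatever your merged database is called

# "log" and "id" are guesses; check the schema at the top of clone_logs.py.
(last_id,) = con.execute("SELECT MAX(id) FROM log").fetchone()
print(f"highest imported log: {last_id}; --range should start at {last_id + 1}")

con.close()[/quote]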
Sorry for necroposting.