The most interesting part for me was reviewing the full conversation logs afterward to figure out whether my steering actually helped or hurt. Turns out about 4 of my 24 interventions were counterproductive and the agent solved the last two phases completely on its own.
The repo has the full writeup, all the exploit scripts, and a table rating every single human message I sent: https://github.com/kafkasl/ctf
Happy to answer questions about the process, the agent, or the competition.