1 point by anakin87 7 hours ago | 1 comment
    Hi HN, I've been spending some time lately trying to build Reinforcement Learning Environments and training small language models and wanted to share a little course I put together based on my experiments.

    Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs. Now, with RLVR (Reinforcement Learning with Verifiable Rewards) and GRPO (Group Relative Policy Optimization), we can make models learn through trial and error in dynamic environments, which are software artifacts we can build ourselves.
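The core idea behind GRPO can be sketched in a few lines: instead of training a separate value network, each sampled completion is scored relative to the other completions in its group. This is a minimal stdlib-only illustration of that group-relative advantage, not the course's actual training code.

```python
# Minimal sketch of GRPO's group-relative advantage: sample several
# completions for the same prompt, score each with a verifiable reward,
# then normalize rewards within the group (no value network needed).
from statistics import mean, stdev

def group_advantages(rewards, eps=1e-6):
    """Advantage of each completion: (reward - group mean) / group std."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 completions of one prompt, scored 0/1 by a verifier.
# Completions that beat the group average get a positive advantage.
print(group_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions above the group mean are reinforced, those below are discouraged; the advantages of a group always sum to zero.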

    But how do you effectively build RL environments?
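The "verifiable" part of RLVR is the key design constraint: the environment must score completions programmatically, without a human or reward model in the loop. A toy example of such a reward function (the answer parser here is deliberately naive and purely illustrative):

```python
import re

def math_reward(completion: str, gold: str) -> float:
    """Binary verifiable reward: 1.0 if the last number in the model's
    completion matches the gold answer, else 0.0. Real environments use
    stricter answer extraction (e.g. a required answer format)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == gold else 0.0

print(math_reward("Let's see... 6 * 7 = 42. The answer is 42.", "42"))  # 1.0
print(math_reward("I think it's 41.", "42"))                            # 0.0
```

Because the check is deterministic code, it can score thousands of rollouts cheaply, which is what makes trial-and-error training practical.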

    In the repo, I cover:

    - Mapping core RL concepts (Agents, Environments) to the LLM domain.

    - Using the Verifiers open-source library to construct single-turn, multi-turn, and tool-use environments.

    - Hands-on: taking a small language model (LiquidAI's LFM2-2.6B) and turning it into a Tic-Tac-Toe master that beats GPT-5-mini. Build the game Environment, use it to generate synthetic data for SFT warm-up, then train with Group-based Reinforcement Learning.
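To make the "little world" idea concrete, here is a hand-rolled sketch of a multi-turn Tic-Tac-Toe environment showing the pattern the course follows (state, legal-move check, verifiable terminal reward). The course builds its environment with the Verifiers library instead; this class is only illustrative.

```python
# Hypothetical minimal Tic-Tac-Toe environment: board state, a legal-move
# check, and a terminal reward that a verifier can compute from the game
# outcome alone (+1 win, 0 draw, -1 illegal move).
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

class TicTacToeEnv:
    def __init__(self):
        self.board = [" "] * 9   # cells 0..8
        self.player = "X"

    def winner(self):
        for a, b, c in WIN_LINES:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def step(self, cell: int):
        """Apply the current player's move; return (done, reward).
        Illegal moves end the episode with -1, a cheap verifiable signal
        that teaches the model the rules before it learns strategy."""
        if not (0 <= cell <= 8) or self.board[cell] != " ":
            return True, -1.0
        self.board[cell] = self.player
        if self.winner() == self.player:
            return True, 1.0
        if " " not in self.board:
            return True, 0.0  # draw
        self.player = "O" if self.player == "X" else "X"
        return False, 0.0

env = TicTacToeEnv()
for move in (0, 3, 1, 4, 2):  # X takes the top row and wins
    done, reward = env.step(move)
print(done, reward)  # True 1.0
```

Episodes played in an environment like this serve double duty: filtered winning games become SFT warm-up data, and the same reward drives the RL stage.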

    ---

    Links

    Course: https://github.com/anakin87/llm-rl-environments-lil-course

    Video walkthrough: https://www.youtube.com/watch?v=71V3fTaUp2Q

    Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictacto...

    Datasets and Models on HF: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-...

    ---

    I'm fascinated by the idea of building these "little worlds" where LLMs can learn, so I hope it's useful.

    Feel free to share opinions...