2 points by AronDaron 7 hours ago | 1 comment
    Hey,

    I've been building side projects with Claude Code for a few months, but I'm completely new to fine-tuning — started experimenting maybe a week ago. From day one I wanted a GUI for the dataset side of the workflow, so this desktop app grew alongside my very first FT attempts.

    I know there are similar apps out there, but I wanted something simple that non-technical users could run with open-source models end-to-end.

    To sanity-check whether the datasets were actually useful, I fine-tuned Qwen2.5-Coder-7B-Instruct on them and ran HumanEval / HumanEval+ (pass@1, 5 runs). I picked these benchmarks because they match the dataset's focus and run fast on my machine:

    - Base: 55.5% / 49.0%
    - FT V2 (1135 samples from the app): 60.0% / 54.0%

    Error bars don't overlap, so it's at least not noise. Obviously HumanEval is only one slice — YMMV with other categories / criteria.
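
    For anyone curious what "error bars don't overlap" means here, a minimal sketch (the per-run scores below are hypothetical — the post only reports the means; stderr over the 5 runs is my assumption about the setup):

    ```python
    import statistics

    def summarize(scores):
        """Mean and standard error of pass@1 over repeated eval runs."""
        mean = statistics.mean(scores)
        sem = statistics.stdev(scores) / len(scores) ** 0.5
        return mean, sem

    # Hypothetical per-run pass@1 scores averaging to the reported means
    base  = [55.0, 55.8, 55.3, 56.0, 55.4]  # mean 55.5
    ft_v2 = [59.6, 60.3, 59.9, 60.2, 60.0]  # mean 60.0

    (m_b, s_b), (m_f, s_f) = summarize(base), summarize(ft_v2)
    # Non-overlapping bars: base's upper bound sits below FT's lower bound
    print(m_b + s_b < m_f - s_f)
    ```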

    Stack: Next.js 16 + FastAPI + SQLite, packaged as a standalone binary (Win/Linux).

    Code: https://github.com/AronDaron/dataset-generator
    Fine-tuned model: https://huggingface.co/AronDaron/Qwen2.5-Coder-7B-Instruct-D...
    Datasets: https://huggingface.co/datasets/AronDaron/dataset-gen-v1 / https://huggingface.co/datasets/AronDaron/dataset-gen-v2

    Happy to hear feedback, especially if something doesn't work on your setup or if the approach misses something obvious — this is my first public tool release.