43 points by dev-experiments 5 hours ago | 4 comments
  • nl 2 hours ago
    If you are going to go to the bother of fine-tuning for trivial problems like subject classification, then I think you'll find scikit-learn with an SGDClassifier on 2-grams will probably do just as well, and the trained classifier will be under 1MB.

    You can train it in under a minute, and it will work perfectly well on embedded devices.
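
    A minimal sketch of that kind of baseline, assuming word-level unigrams plus bigrams with TF-IDF weighting (the comment only says "2-grams", so the exact features are a guess). The toy texts and labels are made up for illustration, and the pickle size is just one rough way to check the footprint on your own data:

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Toy data, made up for illustration; substitute your own labeled examples.
train_texts = [
    "the team won the match in extra time",
    "the striker scored two goals last night",
    "the new phone ships with a faster chip",
    "the laptop update improves battery life",
    "the senate passed the budget bill today",
    "the minister announced new election rules",
]
train_labels = ["sports", "sports", "tech", "tech", "politics", "politics"]

# ngram_range=(1, 2) uses unigrams and bigrams (an assumption);
# use (2, 2) for bigrams only.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    SGDClassifier(random_state=0),  # default hinge loss, i.e. a linear SVM
)
clf.fit(train_texts, train_labels)

preds = clf.predict(["the goalkeeper saved a penalty", "the chip is faster"])

# Serialized size of the whole pipeline (vocabulary plus weights).
model_bytes = len(pickle.dumps(clf))
print(list(preds), f"{model_bytes} bytes")
```

    On a real dataset the size depends mostly on the vocabulary, so capping it with the vectorizer's max_features parameter is one way to keep the model small.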

    Small LLMs are good choices for text classification in two cases:

    - If you need to provide in-context examples and classify based on them.

    - Your classification goes beyond simple subject-type classifiers. For example, multiple-choice question answering is classification where a small LLM will work but traditional ML methods won't.
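
    Both cases can be sketched as one prompt-building step: in-context examples go into the prompt, and the model's reply letter is mapped back to a choice. The names below (build_prompt, classify, fake_generate) are hypothetical, and fake_generate is a stand-in for whatever small LLM you actually call:

```python
from typing import Callable, Optional, Sequence, Tuple

LETTERS = "ABCDEFGH"

# One in-context example: (question, choices, correct letter).
Example = Tuple[str, Sequence[str], str]


def build_prompt(examples: Sequence[Example], question: str,
                 choices: Sequence[str]) -> str:
    """Format few-shot examples followed by the unanswered question."""
    lines = ["Answer with a single letter."]
    for ex_question, ex_choices, ex_answer in examples:
        lines.append("")
        lines.append(f"Question: {ex_question}")
        for letter, choice in zip(LETTERS, ex_choices):
            lines.append(f"{letter}. {choice}")
        lines.append(f"Answer: {ex_answer}")
    lines.append("")
    lines.append(f"Question: {question}")
    for letter, choice in zip(LETTERS, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def classify(generate: Callable[[str], str], examples: Sequence[Example],
             question: str, choices: Sequence[str]) -> Optional[str]:
    """Return the chosen option, or None if the reply is not a valid letter."""
    reply = generate(build_prompt(examples, question, choices)).strip().upper()
    valid = LETTERS[:len(choices)]
    first = reply[:1]
    if first and first in valid:
        return choices[valid.index(first)]
    return None


# Stand-in for a real model call; it always answers "B".
def fake_generate(prompt: str) -> str:
    return " B"


examples = [("What is 2 + 2?", ["3", "4", "5"], "B")]
question = "Which planet is known as the Red Planet?"
choices = ["Venus", "Mars", "Jupiter"]
print(classify(fake_generate, examples, question, choices))  # prints "Mars"
```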

    • djsjajah an hour ago
      Not with 800 examples. If you are going to consider an n-gram model, I think you are better off getting a frontier LLM to write you an absurd regex.
  • deepsquirrelnet 41 minutes ago
    If you want to go deeper on language models, try these project ideas:

    - Zero-shot encoders like tasksource or GliNER

    - Natural language inference: https://huggingface.co/blog/dleemiller/nli-xenc-ways-to-use

    - GRPO training

    - GEPA prompt tuning Qwen 0.6B (or GEPA, then GRPO)

    - Use an embedding model and train a classifier (MLP, logistic, svm)

    - Use a larger LLM to generate a synthetic dataset (beware of lack of diversity, mine "seed text" from real sources first)

    - Synthetically generate "hard examples" where more than one category may be valid and DPO tune your preferred responses

  • mickael-kerjean an hour ago
    If you are interested in a small language model to fine-tune, gemma3:270m is quite interesting for its size.
  • jszymborski 2 hours ago
    I think the Qwen 0.6B is so cool. It is super fast and as illustrated here it has a clear niche, esp. when fine-tuned.

    I'm also interested in it as a student for distillation.