1 point by barefootsanders 2 hours ago | 1 comment
  • barefootsanders 2 hours ago
    OP here. I built skillthis.ai, a tool that takes a description of your professional expertise and generates a Claude Code skill file (a markdown prompt file that customizes Claude's behavior for specific tasks).
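
    For context, a skill file is just markdown with a short YAML frontmatter block. Here is a minimal sketch of what a generated file might look like (the field names and content are illustrative, not copied from a real skillthis.ai output):

        ---
        name: bartending-advisor
        description: Advice on drink builds, bar workflow, and guest service. Use when a task involves bartending.
        ---

        # Bartending Advisor

        When the user asks about cocktails, bar setup, or service flow:
        - Suggest classic builds first, then variations.
        - Flag ingredient substitutions and batching options.
        - Keep advice practical for a working bar, not a home kitchen.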

    155 people used it over 3.5 weeks. I analyzed the results and found some patterns I didn't expect.

    The headline finding: someone typed "I a bartender" (12 characters, with a typo) and scored 85/100. A 15,576-character technical specification about development process analysis scored 72/100. The bartender result was reproducible; I ran it twice.

    More surprisingly, "hey bro" scored 88/100. The system generated a "Casual Communication Skill" and suggested adding "quantifiable success metrics." The grading algorithm clearly has issues (acknowledged in the post).

    What actually predicted quality:

    - Specific, well-understood domains (plumber, bartender, OKR expert)
    - Task-oriented descriptions (what you do vs. what you are)
    - Brevity with clarity (top scores averaged under 100 characters)
    - Named frameworks or methodologies

    What didn't:

    - Length (negatively correlated with score)
    - Vague enthusiasm
    - Attempts to jailbreak or override Claude's behavior

    The tool uses Claude to generate the skill, then a separate Claude call to grade it. The grading inconsistency is a known problem. To address the input-quality issue, I built a guided question flow that asks three follow-up questions when the input is too vague.
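
    Roughly, the generate-then-grade flow looks like the sketch below (TypeScript with the Anthropic SDK). It's a simplified sketch, not the production code; the model name, prompts, and score parsing are placeholders.

        import Anthropic from "@anthropic-ai/sdk";

        const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

        // Call 1: turn the user's expertise description into a skill file.
        async function generateSkill(expertise: string): Promise<string> {
          const res = await client.messages.create({
            model: "claude-sonnet-4-5", // placeholder model name
            max_tokens: 2000,
            messages: [{
              role: "user",
              content: `Write a Claude Code skill file (markdown with YAML frontmatter) for someone with this expertise:\n\n${expertise}`,
            }],
          });
          const block = res.content[0];
          return block.type === "text" ? block.text : "";
        }

        // Call 2: a separate Claude call grades the generated skill.
        // Grading is nondeterministic, which is where the scoring inconsistency comes from.
        async function gradeSkill(skill: string): Promise<number> {
          const res = await client.messages.create({
            model: "claude-sonnet-4-5",
            max_tokens: 50,
            messages: [{
              role: "user",
              content: `Score this skill file from 0 to 100 for clarity and usefulness. Reply with the number only.\n\n${skill}`,
            }],
          });
          const block = res.content[0];
          return parseInt(block.type === "text" ? block.text : "0", 10);
        }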

    Stack: Next.js, Supabase, Claude API. The blog post links to every skill mentioned so you can see the actual outputs.