Easy to author (at its most basic, just a markdown file), context efficient by default (only preloads yaml front-matter, can lazy load more markdown files as needed), can piggyback on top of existing tooling (for instance, instead of the GitHub MCP, you just make a skill describing how to use the `gh` cli).
Compared to purpose-tuned system prompts they don't require a purpose-specific agent, and they also compose (the agent can load multiple skills that make sense for a given task).
Part of what makes this effective is that AI models are heavy enough that running a sandbox VM alongside them is likely negligible cost-wise, so the major chat UI providers now all give the model such a sandboxed environment. That means skills can also contain Python and/or JS scripts, which is again much simpler, more straightforward, and more flexible than, e.g., requiring the target to expose remote MCPs.
Finally, you can use a skill to tell your model how to properly approach using your MCP server - which previously often required either long prompting, or a purpose-specific system prompt, with the cons I've already described.
Compared to MCPs, this is a much faster and more approachable flow to add "capabilities" to your agents.
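To make the "only preloads the front-matter" point concrete, here is a minimal sketch of how a harness might index skills cheaply and load bodies lazily. The `SKILL.md` filename, the one-folder-per-skill layout, and all function names are assumptions for illustration, and the parser only handles flat `key: value` lines rather than full YAML.

```python
from pathlib import Path
from typing import Iterable


def read_front_matter(lines: Iterable[str]) -> dict[str, str]:
    """Parse flat 'key: value' front-matter delimited by '---' lines.

    Stops consuming input at the closing '---', so when given an open file
    handle the markdown body is never read.
    """
    iterator = iter(lines)
    if next(iterator, "").strip() != "---":
        return {}  # no front-matter present
    meta: dict[str, str] = {}
    for line in iterator:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def index_skills(root: Path) -> list[dict[str, str]]:
    """Build the cheap index: front-matter only, one entry per skill folder.

    Assumes a hypothetical layout of <root>/<skill-name>/SKILL.md.
    """
    index = []
    for skill_file in sorted(root.glob("*/SKILL.md")):
        with skill_file.open(encoding="utf-8") as handle:
            entry = read_front_matter(handle)
        entry["path"] = str(skill_file)
        index.append(entry)
    return index


def skill_body(text: str) -> str:
    """Return the markdown body, loaded only once the agent asks for it."""
    if not text.startswith("---"):
        return text
    parts = text.split("---", 2)
    return parts[2].lstrip("\n") if len(parts) == 3 else text
```

The index is what stays in context on every turn; `skill_body` is only called when a skill's description looks relevant to the task at hand.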
I'm having a hard time figuring out how I could leverage skills in a medium-sized web application project.
It's Python, PostgreSQL, Django.
Thanks in advance.
I wonder if skills are more useful for non-CRUD-like projects. Maybe data science and DevOps.
Maybe you have a custom auth backend that needs an annoying local proxy setup before it can be tested—you don’t need all of those instructions in the primary agents.md bloating the context on every request, a skill would let you separate them so they’re only accessed when needed.
Or if you have a complex testing setup and a multi-step process for generating realistic fixtures and mocks: the AI maybe only needs some basic instructions on how to run the tests 90% of the time, but when it’s time to make significant changes it needs info about your whole workflow and philosophy.
I have a Django project with some hardcoded constants that I source from various third-party sites, which need to be updated periodically. Originally that meant sitting down, visiting a few websites, and copy-pasting identifiers from them. As AI got better at web search I was able to put together a prompt that did pretty well at compiling them. With a skill I can have the AI find the updated info, update the code itself, and provide it some little test scripts to validate it did everything right.
- listTables
- getTableSchema
- executeQuery (blocks destructive queries like anything containing DROP, DELETE, etc.)
I wouldn't trust textual instructions to prevent LLMs from dropping a table.
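For comparison, an `executeQuery`-style guard enforced in code rather than in prose could look roughly like this. It is only a sketch: the keyword list, the function names, and the `run` callback are assumptions, and a keyword filter like this is deliberately conservative (it also rejects harmless queries that merely mention a blocked word in a string literal).

```python
import re
from typing import Callable

# Keywords treated as destructive in this sketch (an assumed, incomplete list).
DESTRUCTIVE_KEYWORDS = ("DROP", "DELETE", "TRUNCATE", "ALTER", "UPDATE", "INSERT")

# Whole-word, case-insensitive match, so 'deleted_at' is not flagged.
_PATTERN = re.compile(r"\b(" + "|".join(DESTRUCTIVE_KEYWORDS) + r")\b", re.IGNORECASE)


def is_destructive(sql: str) -> bool:
    """Return True if the query contains any blocked keyword as a whole word."""
    return _PATTERN.search(sql) is not None


def execute_query(sql: str, run: Callable[[str], object]) -> object:
    """Refuse destructive queries before they reach the database.

    `run` is a caller-supplied callable that actually executes the SQL.
    """
    if is_destructive(sql):
        raise PermissionError("destructive query blocked")
    return run(sql)
```

The point is that the check runs regardless of what the model decides to send, which a paragraph of instructions in a skill cannot guarantee.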
but if it’s something more involved or less frequently used (perhaps some debugging methodology, or designing new data schemas) skills are probably a good fit
The key here is “on demand”. Not every agent or conversation needs to know kung fu. But when they do, a skill is waiting to be consumed. This basic idea is “progressive disclosure” and it composes nicely to keep context windows focused. E.g. I have a Metabase skill to query analytics. Within that I conditionally refer to how to generate authentication if they aren't authenticated. If they are authenticated, that information need not be consumed.
Some practical “skills”: writing tests, fetching Sentry info, using Playwright (a lot of local MCPs are just flat out replaced by skills), submitting a PR according to team conventions (e.g. run lint, review code for X, title matches format, etc.)
So if you have subtle logic in a Skill that’s not mentioned in a description, or you use the skill body to describe use-cases not obvious from the front-matter, it may never be discovered or used.
Additionally, skill descriptions are all essentially prompt injections, whether relevant/vector-adjacent to your current task or not; if they nudge towards a certain tone, that may apply to your general experience with the LLM. And, of course, they add to your input tokens on every agentic turn. (This feature was proudly brought to you by Big Token.) So be thoughtful about what you load in what context.
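A back-of-the-envelope sketch of that per-turn cost, assuming every description is resent on every turn and using the rough "about 4 characters per token" heuristic (both are assumptions; real tokenizers and harness caching will change the numbers):

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (a heuristic)."""
    return max(1, len(text) // 4)


def description_overhead(descriptions: list[str], turns: int) -> int:
    """Input tokens spent on skill descriptions across a whole session.

    Assumes all descriptions are included in the prompt on every turn.
    """
    per_turn = sum(estimate_tokens(d) for d in descriptions)
    return per_turn * turns
```

With two hypothetical descriptions of 400 and 200 characters, that is roughly 150 tokens per turn, or about 3,000 input tokens over a 20-turn session before any skill body is loaded.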
See e.g. https://github.com/openai/codex/blob/a6974087e5c04fc711af68f...
1. Open-Skills: https://github.com/BandarLabs/open-skills
This is really an agentic harness issue, not an LLM issue per se.
In 2026, I think we'll see agentic harnesses much more tightly integrated with their respective LLMs. You're already starting to see this, e.g. with Google's "Interactions" API and how different LLMs expect CoT to be maintained.
There's a lot of alpha in co-optimizing your agentic harness with how the LLM is RL-trained on tool use and reasoning traces.
What do "skills" look like, generically, in this framework?
<Skills>
<Skill>
<Name>postgres</Name>
<Description>Directions on how to query the pre-prod postgres db</Description>
<File>skills/postgres.md</File>
</Skill>
</Skills>
The harness may then periodically resend this notification so that the LLM doesn't "forget" that skills are available. Because the notification is only name + description + file, this is cheap in tokens. The harness's ability to tell the LLM "IMPORTANT: this is a skill, so pay attention and use it when appropriate" and then periodically remind it of this is what differentiates a proper Anthropic-style skill from just sticking "If you need to do postgres stuff, read skills/postgres.md" in AGENTS.md. Just how valuable is this? Not sure. I suspect that a sufficiently smart LLM won't need the special skill infrastructure.
(Note that skill name is not technically required, it's just a vanity / convenience thing).
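As a rough sketch of the harness side, the notification above could be generated from a list of skill entries like this. The function names and the "every N turns" reminder policy are assumptions for illustration, not how any particular harness does it.

```python
import xml.etree.ElementTree as ET


def build_skills_notification(skills: list[dict[str, str]]) -> str:
    """Render name + description + file for each skill as a <Skills> block."""
    root = ET.Element("Skills")
    for skill in skills:
        node = ET.SubElement(root, "Skill")
        ET.SubElement(node, "Name").text = skill["name"]
        ET.SubElement(node, "Description").text = skill["description"]
        ET.SubElement(node, "File").text = skill["file"]
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


def should_remind(turn: int, every: int = 10) -> bool:
    """Resend the notification on turn 0 and then every `every` turns.

    The interval is an assumed policy, chosen only for this sketch.
    """
    return turn % every == 0
```

Only this small block is resent; the file it points to is read by the model when it decides the skill is relevant.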
... And do we know how it does that? To my understanding there is still no out-of-band signaling.
So it's just like a standard way to bring in prompts/scripts to the LLM with support from the tooling directly.
I have many "folders"... each with a README.md, a scripts folder, and an optional GUIDE.md.
Whenever I arrive at some code that I know can be reused easily (for example, a clerk.dev integration that spans both frontend and backend), I used to create a "folder" for it.
When needed, I used to just copy-paste all the folder content using my https://www.npmjs.com/package/merge-to-md package.
This has worked flawlessly for me until now.
Glad we are bringing such capability natively into these coding agents.
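For anyone who wants to try the folder approach without extra tooling, the "paste a whole folder as one markdown document" step can be approximated in a few lines. This is a hypothetical sketch, not the API or behaviour of the merge-to-md package, and it assumes every file in the folder is UTF-8 text.

```python
from pathlib import Path

FENCE = "`" * 3  # markdown code fence, built this way so the sketch embeds cleanly


def merge_to_markdown(files: dict[str, str]) -> str:
    """Concatenate files into one markdown document, one fenced section per file."""
    sections = []
    for name in sorted(files):
        body = files[name].rstrip()
        sections.append(f"## {name}\n\n{FENCE}\n{body}\n{FENCE}\n")
    return "\n".join(sections)


def read_folder(root: Path) -> dict[str, str]:
    """Collect every file under root, keyed by its path relative to root.

    Assumes all files are UTF-8 text; binary files would raise an error.
    """
    return {
        str(path.relative_to(root)): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Calling `merge_to_markdown(read_folder(Path("some-folder")))` gives one blob to paste into a prompt, which is roughly what a skill folder lets the agent pull in by itself.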
Imagine having Skills available that implement authentication systems, multi-tenancy, etc. in your codebase without having to know all the details about how to do this securely and correctly. This would probably boost code quality a lot and prevent insecure/buggy vibe-coded products.
A lot of the things we want continuous learning for can actually be provided by the ability to obtain skills on the fly.
Some paths are emerging as popular, but in a lot of cases we’re still not sure even these are the long-term paths that will remain. It doesn’t help that there’s not a good taxonomy (that I’m aware of) to define and organize the different approaches out there. “Agent”, for example, is a highly overloaded term that means a lot of things, and even in this space agents mean different things to different groups.
For LLMs, we're just about at the stage where we've realized we can jam a sharp thing in the spinny part and use it to cut things. The race is on not only to improve the motors (models) themselves, but to invent ways of holding and manipulating and taking advantage of this fundamental thing that feel so natural that they seem obvious in hindsight.
Tools are useful so the AI can execute commands, but beyond that it's just ways to help you build the context for your prompt: either pulling in premade prompts that provide certain instructions or documentation, or providing more specialized tools for the model to use along with instructions on using those tools.
- you will be getting a TON of spam. Just look at all the MCP folks, and how they're spamming everywhere with their claude-vibed mcp implementation over something trivial.
- the security implications are enormous. You'd need a way to vet stuff, moderate, keep track of things and so on. This only compounds with more traffic, so it'd probably be untenable really fast.
- there's probably 0 money in this. So you'd have to put a lot of work in maintaining a platform that attracts a lot of abuse/spam/prompt kiddies, while getting nothing in return. This might make sense to do for some companies that can justify this cost, but at that point, you'd be wondering what's in it for them. And what control do they exert on moderation/curation, etc.
I think the best we'll get in this space is from "trusted" entities (i.e. recognised coders / personalities / etc), from companies themselves (having skills in repos for known frameworks might be a thing, like it is with agents.md), and maybe from the token providers themselves.
not ranked with comments but I’d expect solid quality from these and they should “just work” in Codex etc.
More like a gallery than a marketplace
Obviously they are empowering Codex and Claude etc, and many will be open source or free.
But for those who have commercial resources or tools to add to the skills choice, is there documentation for doing that smoothly, or a pathway to it?
I can see at least a couple of ways it might be done, such as skills requiring API keys or other authentication approaches, but this adds friction to an otherwise smooth skill integration process.
Having instead a transparent commission on usage sent to registered skill suppliers would be much cleaner but I'm not confident that would be offered fairly, and I've seen no guidance yet on plans in that regard.
Close enough, welcome back index.htm, can't wait to see the first ads being served in my skills
The non-deterministic statistical nature of LLMs means it's inherently an "inevitably unverifiable process" to begin with, even if you pass it some type-checked, linted, skills file or prompt format.
Besides, YAML or JSON or XML or free-form text, for the LLM it's just tokens.
At best you could parse the more structured docs with external tools more easily, but that's about it, not much difference when it comes to their LLM consumption.
There you go, you're welcome.
More generally I think testing AI by using its web search, code execution and ensembling is the missing ingredient to increased usage. We need to define the opposite of AI work - what validates it. This is hard, but once done you can trust the system and it becomes cheaper to change.
Not to mention the advantages it would present for iteration and improvement.
Just the format would be. There's no rigid structure that gets any preferential treatment from the LLM, even if it did accept one. In the end it's just instructions that are no different in any way from the prompt text.
And nothing stops you from making a document that is "parameterized and normalized to some agreed-upon structure" and passing it directly to the LLM as skills content, or parsing it and dumping it as regular skills text content.
With Skills however, you just selectively append more text to prompt and pray.
That said, for many tasks (summaries and data extraction) I do use Gemini 2.5 Flash, as it is cheap and fast. So I'm excited to try Gemini 3 Flash as well.
It’s also interesting to see how instead of a plan mode like CC, Codex is implementing planning as a skill.
(To clarify, I meant that some engineers mostly use CC while others mostly use Codex, as opposed to engineers using both at the same time.)
Otherwise, why not just keep the password in an .env file, and state “grab the password from the .env file” in your Postgres skill?
Why not the filesystem?
I would create a local file (e.g. .env) in each project using postgres, then in my postgres skill, tell the agent to check that file for credentials.
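A minimal sketch of the kind of helper script such a postgres skill could ship, so the agent reads credentials from the project's `.env` instead of having them in the prompt. The variable names follow the usual libpq environment variables (`PGUSER`, `PGPASSWORD`, ...), but which names your `.env` actually uses is an assumption here, as are the function names.

```python
from urllib.parse import quote


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def postgres_dsn(env: dict[str, str]) -> str:
    """Build a connection URI from the assumed variable names."""
    user = quote(env["PGUSER"], safe="")
    password = quote(env["PGPASSWORD"], safe="")  # escape special characters
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    return f"postgresql://{user}:{password}@{host}:{port}/{env['PGDATABASE']}"
```

The skill text then only needs to say "read `.env` in the project root and connect with the resulting URI", and the same skill works unchanged across projects with different credentials.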
Anthropic: https://www.anthropic.com/engineering/equipping-agents-for-t...
Copilot: https://github.blog/changelog/2025-12-18-github-copilot-now-...
Skills are available in both the Codex CLI and IDE extensions.
As of this week, this also applies to Hacker News.
I do also like to make skills for more niche tools, like marimo (a very nice Jupyter replacement). The model probably does know some stuff about it, but not enough, and the agent could find enough online or in context7, but it will waste a lot of time and context figuring it out every time. So instead I will have a deep thinking agent do all that research up front and build a skill for it, and I might customize it to be more specific to my environment, but it's mostly the condensed research of the agent so that I don't need to redo that every time.