At Ontario Digital Service, we built COVID-19 tools, digital ID, and services for 15M citizens. We evaluated LLM systems to improve services but could never procure them.
The blocker wasn't capability—it was liability. We couldn't justify "the model probably won't violate privacy regulations" to decision-makers who need to defend "this system cannot do X."
This post demonstrates the "Prescription Pad Pattern": treating authority boundaries as persistent state that mechanically filters tools.
The logic: Don't instruct the model to avoid forbidden actions—physically remove the tools required to execute them. If the model can't see the tool, it can't attempt to call it.
This is a reference implementation. The same pattern works for healthcare (don't give diagnosis tools to unlicensed users), finance (don't give transfer tools to read-only sessions), or any domain where "98% safe" means "0% deployable."
Repo: https://github.com/rosetta-labs-erb/authority-boundary-ledge...
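To make the mechanics concrete, here's a minimal sketch of the pattern; the names and tool schemas are illustrative, not the repo's actual API. Authority is a set of capability flags, and the tool list is filtered against it before the request ever reaches the model:

    from enum import Flag, auto

    class Capability(Flag):
        READ = auto()
        WRITE = auto()

    # Each tool declares the capability it requires (illustrative registry).
    TOOLS = [
        {"name": "sql_query",   "requires": Capability.READ},
        {"name": "sql_execute", "requires": Capability.WRITE},
    ]

    def filter_tools(granted: Capability) -> list[dict]:
        # Only tools whose required capability is fully granted survive.
        return [t for t in TOOLS if (t["requires"] & granted) == t["requires"]]

    # A read-only session never sees sql_execute, so the model
    # cannot attempt to call it.
    session_tools = filter_tools(Capability.READ)
    # response = client.messages.create(..., tools=session_tools)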
I am only half-joking. Kids talk to LLMs to get their homework done, people use them for therapy or companionship, for work, even to "Google things". Pretty soon you'll find yourself at a bar, wanting to call your friend a dumbass for saying some stupid shit, and instead you'll hear yourself say "You're absolutely right, Jim! ..."
I think this article would really benefit from being rewritten in your own words. The concept is good.
Unfortunately, it's not. Once you read through the slop, the implementation is still getting a pass/fail security response from the LLM, which is exactly what the premise of OP's article rails against.
FWIW, I have been reading policy documents for a long time and I thought you sounded rather human and natural… Just very professional! :)
Yikes (regarding the AI patterns in the comment)
(Now dead: https://thinkdigital.ca/podcast/the-end-of-the-ontario-digit... )
You wouldn't give every user write access to a database in any system. I'm not sure why LLMs are a special case. Is it because people have been "trusting" the LLMs to self-enforce via prompt rules instead of actually setting up granular permissions for the LLM agent process? If so, that's a user training issue and I'm not sure it needs an LLM-specific article.
Secondly, FTA:
> You can stop a database delete with tool filtering, but how do you stop an AI from giving bad advice in text? By using a pattern I call "reifying speech acts into tools."
> The Rule: "You may discuss symptoms, but you are forbidden from issuing a diagnosis in text. You MUST use the provide_diagnosis tool."
> The Interlock:
> If User = Doctor: The tool exists. Diagnosis is possible.
> If User = Patient: The tool is physically removed.
> When the tool is gone, the model cannot "hallucinate" a diagnosis because it lacks the "form" to reason and write it on.
How is this any different from what I described above as trusting LLMs to self-enforce? You're not physically removing anything because the LLM can still respond with text. You're just trusting the LLM to obey what you've written. I know the next paragraph admits this, but I don't understand why it's presented like a new idea when it's not.
With the pattern I'm describing, you'd:
- Filter the tools list before the API call based on user permissions
- Pass only allowed tools to the LLM
- The model physically can't reason about calling tools that aren't in its context, blocking it at the source.
We remove it at the infrastructure layer, vs. the prompt layer.
On your second point, "layer 2": we're currently asking models to actively inhibit their training in order to obey a constricted action space. With Tool Reification, we'd instead train models to treat speech acts as tools and leverage that training, so the model doesn't have to "obey a no"; it simply fails to execute a "do."
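A hedged sketch of what that reification could look like; the provide_diagnosis schema and role check below are made up for illustration, not taken from the repo:

    # Hypothetical: the "diagnosis" speech act exists only as a tool schema.
    PROVIDE_DIAGNOSIS = {
        "name": "provide_diagnosis",
        "description": "Issue a formal diagnosis. Clinician sessions only.",
        "input_schema": {
            "type": "object",
            "properties": {"diagnosis": {"type": "string"},
                           "rationale": {"type": "string"}},
            "required": ["diagnosis"],
        },
    }

    SEARCH_SYMPTOMS = {
        "name": "search_symptoms",
        "description": "Look up general symptom information.",
        "input_schema": {"type": "object",
                         "properties": {"query": {"type": "string"}},
                         "required": ["query"]},
    }

    def tools_for(user_role: str) -> list[dict]:
        tools = [SEARCH_SYMPTOMS]
        if user_role == "doctor":
            tools.append(PROVIDE_DIAGNOSIS)   # patients never get the "form"
        return tools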
This is a bold statement to make without substantiating it. I don't believe private-sector corporations differ from government institutions in this regard.
This one I'm not sure what to do about; I think LLMs might just not be a good fit for it.
> Temporal consistency (constraints that persist across turns)
This can be solved by not using LLMs as "can take turns" machines and only using them as "one-shot answer, otherwise wrong" machines, since prompt following is best early in a conversation and gets really bad quickly as the context grows. Personally, I never go beyond two messages in a chat (one user message, one assistant message); if it's wrong, I clear everything, iterate on the first prompt, and try again. That tends to make the whole "follow system prompt instructions" thing work a lot better.
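For what it's worth, that reset-and-retry workflow is easy to script. A rough sketch, assuming an Anthropic-style client; looks_wrong() and revise() are hypothetical hooks you'd supply:

    def one_shot(client, prompt: str, max_attempts: int = 3) -> str:
        """Fresh single-turn request each attempt; never grow the context."""
        for _ in range(max_attempts):
            response = client.messages.create(
                model="claude-sonnet-4-5",   # model name is illustrative
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],  # always one turn
            )
            text = response.content[0].text
            if not looks_wrong(text):        # hypothetical validation hook
                return text
            prompt = revise(prompt, text)    # hypothetical: edit the prompt, not the history
        return text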
> Hierarchical control (immutable system policies vs. user preferences)
This, I think, was at least partially addressed in the release of GPT-OSS, where instead of just having a system prompt and a user prompt, there are now developer, system, and user prompts, so there is a bigger difference in how the instructions are treated. This document shares some ideas about separating the roles beyond just system/user: https://cdn.openai.com/spec/model-spec-2024-05-08.html
That's why I’m thinking authority state should be external to the model. If we rely on the System Prompt to maintain constraints ("Remember you are read-only"), it fails as the context grows. By keeping the state in an external Ledger, we decouple enforcement from the context window. The model still can't violate the constraint, because the capability is mechanically gone.
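A minimal sketch of what "external to the model" means here, with illustrative names: the constraint lives in a ledger object keyed to the session, not in the chat history, and the tool list is re-derived from it on every turn:

    class AuthorityLedger:
        """Holds session constraints outside the context window."""
        def __init__(self):
            self._constraints: set[str] = set()

        def add(self, constraint: str):          # e.g. "READ_ONLY"
            self._constraints.add(constraint)

        def effective_permissions(self) -> set[str]:
            perms = {"READ", "WRITE"}
            if "READ_ONLY" in self._constraints:
                perms.discard("WRITE")
            return perms

    def tools_for_turn(ledger: AuthorityLedger, tools: list[dict]) -> list[dict]:
        # Each tool declares a required permission, e.g. {"name": "sql_execute", "requires": "WRITE"}.
        # Re-derived every turn, so the constraint can't "fall out" of a long context.
        perms = ledger.effective_permissions()
        return [t for t in tools if t["requires"] in perms]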
> Traditional RBAC: The model sees sql_execute in its available tools. It reasons about using it. It attempts to call it. Then the system blocks the action with 403 Forbidden. The hallucination happens—it just fails at execution.
> Authority Boundary Ledger: The model never sees sql_execute. It’s physically removed from the tools list before the API call reaches the model. The model cannot hallucinate a capability it cannot see.
I don't get it. The thing being proposed seems to be that rather than having all tools available, then returning "not authorized" error or whatever if there isn't enough permissions, you omit the tool entirely, and this is somehow better against hallucinations. Why is this the case? I could easily imagine the reverse, where the tool was omitted but the LLM hallucinates it, or fumbles around with existing tools trying to do its thing. Is there some empirical validation for this, or is it all just vibes?
Also, using this approach means you can't do granular permissions control. For instance, what if you want to limit access to patient records, but only for the given department? You'd still need the tool to be available.
On granular permissions: it's nouns vs. verbs. Data-level permissions still happen at the database layer (the nouns), while this pattern constrains the capability to act (the verbs). If the model does hallucinate a hidden tool, the kernel mechanically blocks the execution before it reaches the system, breaking a retry loop faster than a permissions error would.
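Sketching the execution side of that claim (names are illustrative): even if the model emits a call for a tool it was never shown, the dispatcher refuses it before anything reaches the backing system.

    def dispatch(tool_call: dict, allowed_tools: list[dict]) -> dict:
        """Runs after the model responds, before anything executes."""
        allowed_names = {t["name"] for t in allowed_tools}
        if tool_call["name"] not in allowed_names:
            # Hallucinated or hidden tool: refuse without executing, and say so
            # plainly so the model doesn't spin in a retry loop.
            return {"is_error": True,
                    "content": f"{tool_call['name']} is not available in this session."}
        return execute_tool(tool_call)   # hypothetical executor for permitted tools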
I chuckled on this one.
I'll give the author the benefit of the doubt and imagine he was referring to the act of running a "business"/agenda in parallel with the business that is conducted day to day by normal people.
Yes, employees and managers can be doing the business of selling paper while the CEO is conducting the business of inflating the stock and massaging the numbers in order to fulfill the objective the board told him privately because the owner wants to sell the business to buy a bigger boat and buy a nice apartment in NYC for his angel of a daughter.
Your brain is a lot better at writing than you give it credit for. LLMs can find awkward flows, but they won't do much more than pattern recognition. The only thing an LLM can do is make your article more "aligned" with similar articles. Do you actually want that? For research it might be nice, but even then it should still stand out. If you let an LLM just generate the text for you, it will puke out generic phrases.
> When the tool is gone, the model cannot “hallucinate” a diagnosis because it lacks the “form” to reason and write it on.
What's to stop the model from just hallucinating an entire tool call, or result of the tool call? If there's no tool available it could just make up the result of one, and then further context would treat that as a source of truth. Maybe if you threw an explicit error message, but that still feels like it would be prone to hallucination.
Most importantly, the article makes no mention of the Ontario Digital Service evaluating any LLM systems. It only gives an unrelated anecdote about COVID; there is zero mention of LLMs in relation to ODS. OP mentioned it in a comment in the thread but not in the article. This is extremely strange.
It also seems that ODS was disbanded in early 2024, giving a very short window where they could have possibly evaluated AI tools. Even so, AI has progressed insanely since then.
https://www.reddit.com/r/OntarioPublicService/comments/1boev... https://thinkdigital.ca/podcast/the-end-of-the-ontario-digit...
The github repo that OP posted seems to be complete nonsense and that's why I feel that this is another case where AI has convinced someone they have made a breakthrough even though there is nothing coherent there.
The distinction I'm making is between Execution Control (Firewall) and Cognitive Control (Filter).
Standard RBAC catches the error after the model tries to act (causing 403s, retry loops, or hallucinations). This pattern removes the tool from the context window entirely. The model never considers the action because the "vocabulary" to do it doesn't exist in that session.
Like the difference between showing a user a "Permission Denied" error after they click a button, versus not rendering the button at all.
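Roughly, the two paths look like this (both handlers are illustrative; user.can() and execute_tool() are assumed helpers):

    # Firewall (standard RBAC): the model sees the tool, tries it, and is refused.
    def handle_call_rbac(call: dict, user) -> dict:
        if not user.can(call["name"]):     # hypothetical permission check
            return {"is_error": True, "content": "403 Forbidden"}
        return execute_tool(call)          # hypothetical executor

    # Filter (this pattern): the tool never enters the request at all.
    def visible_tools(tools: list[dict], user) -> list[dict]:
        return [t for t in tools if user.can(t["name"])]   # the rest are never rendered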
1. As many have harped on, the LLM writing is so fluffed up it's borderline unreadable. Please just write in your own voice. It's more interesting and would probably be easier to grok.
2. That repo is obviously vibe-coded, but I suppose it gets the point across. It doesn't give me much confidence in the code itself, however.
And a big thing:
Unless I'm misunderstanding, I feel like you are re-inventing the wheel when it comes to Authorization via MCP, as well as trying to get away with not having extra logic at the app layer, which is impossible here.
MCP servers can use OIDC to connect to your auth server right now: https://modelcontextprotocol.io/docs/tutorials/security/auth...
You give the following abstractions, which I think are interesting thought experiments but unconventional and won't work at all:
Ring 0 (Constitutional): System-level constraints. Never overridable.
Examples: "Never self-replicate", "Never exfiltrate credentials"
Ring 1 (Organizational): Policy-level constraints. Requires admin authority to change.
Examples: "No PII in outputs", "Read-only database access"
Ring 2 (Session): User preferences. Freely changeable by user.
Examples: "Explain like I'm five", "Focus on Python examples"
In Ring 0 and 1 you're still asking the LLM to determine if the security is blocked, which opens it up to jailbreaking. Literally what your whole article is about. This won't work:

    # Generate (Pass filtered tools to LLM)
    response_text, security_blocked = self._call_llm(
        query, history, system_prompt, allowed_tools, tools
    )
Ring 0 and 1 MUST be done via authorization and logic at the application layer. MCP Authorization helps with that, somewhat. Ring 2 can simply be part of your system prompt.

> Standard RBAC acts as a firewall: it catches the model's illegal action after the model attempts it.

That's the point. It's the same reason you will have mirroring implementations of RBAC on a client and a server: you can't trust the client. The LLM can't do RBAC. It can pretend it does, but it can't. The best you can do is inject the user's roles and permissions into the prompt to help with this, if you'd like. But it's kind of a waste of time -- just feed the response back into the LLM so it sees "401 Unauthorized" and either tries something else or lets the user know they aren't allowed.
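A rough sketch of that feedback loop against an Anthropic-style Messages API; call_backend, user_token, history, assistant_content, and tool_use are assumed to exist from the surrounding agent loop:

    # Execute server-side with the caller's real credentials; the backend, not
    # the model, decides authorization. A refused call comes back as text.
    result_text = call_backend(tool_use.name, tool_use.input, user_token)  # hypothetical

    follow_up = client.messages.create(
        model="claude-sonnet-4-5",          # model name is illustrative
        max_tokens=1024,
        tools=tools,
        messages=history + [
            {"role": "assistant", "content": assistant_content},
            {"role": "user", "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": result_text,     # e.g. "401 Unauthorized" when the backend refuses
            }]},
        ],
    )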
I'm sorry, but as a resident of Ontario and a developer, this whole posting just enrages me. I don't want to discourage OP, but you should know there's a lot here that's just incorrect. I'd be much more relaxed about that if it all wasn't just one-shotted by AI.
On enforcement mechanism: You've misunderstood what the system does. It's not asking the LLM to determine security.
The Capacity Gate physically removes tools before the LLM sees them:
    user_permissions = ledger.get_effective_permissions()
    allowed_tools = [
        t for t in tools
        if (user_permissions & t['x-rosetta-capacity']) == t['x-rosetta-capacity']
    ]

If READ_ONLY is active, sql_execute gets filtered out. The LLM can't see or call tools that don't make it into allowed_tools:

    response = client.messages.create(tools=allowed_tools)
This isn't RBAC checking after the fact. It's capability control before reasoning begins. The LLM doesn't decide permissions; the system decides what verbs exist in the LLM's vocabulary.

On Ring 0/1: These are enforced at the application layer via the Capacity Gate. The rings define who can change constraints, not how they're enforced.
On MCP: MCP handles who you are. This pattern handles what you can do based on persistent organizational policies. They're complementary.
The contribution isn't "LLMs can do RBAC" (they can't). It's "here's a pattern for making authority constraints persistent and mechanically enforceable through tool filtering."
Does this clarify the enforcement mechanism?