Why Ontario Digital Service couldn't procure '98% safe' LLMs (15M Canadians)(rosetta-labs-erb.github.io)

40 pointsby csemplea month ago17 comments

csemplea month ago
OP here. *** I'm seeing comments about AI-generated writing. This is my voice—I've been writing in this style for years in government policy docs. Happy to discuss the technical merits rather than the prose style. ***
At Ontario Digital Service, we built COVID-19 tools, digital ID, and services for 15M citizens. We evaluated LLM systems to improve services but could never procure them.
The blocker wasn't capability—it was liability. We couldn't justify "the model probably won't violate privacy regulations" to decision-makers who need to defend "this system cannot do X."
This post demonstrates the "Prescription Pad Pattern": treating authority boundaries as persistent state that mechanically filters tools.
The logic: Don't instruct the model to avoid forbidden actions—physically remove the tools required to execute them. If the model can't see the tool, it can't attempt to call it.
This is a reference implementation. The same pattern works for healthcare (don't give diagnosis tools to unlicensed users), finance (don't give transfer tools to read-only sessions), or any domain where "98% safe" means "0% deployable."
Repo: https://github.com/rosetta-labs-erb/authority-boundary-ledge...
- alex000kima month ago
  This was so clearly LLM-generated that I couldn't get through the whole thing.
  - rdtsca month ago
    In a few years everyone will be talking like this -- humans and LLMs alike. We're not there yet but our LLMs masters will train us soon enough.
    I am only half-joking. Kids talking to LLMs to get homework done, people use it for therapy or companionship, for work, even to "Google things". Pretty soon you'll find yourself at a bar, wanting to call your friend a dumbass for saying some stupid shit and instead you'll hear yourself say "You're absolutely right, Jim! ..."
    csemplea month ago
    I guess working in government has put me ahead of the curve sounding like a robot.
- yellow_leada month ago
  Hi OP, can you rewrite the article in your own words?
  - kspacewalk2a month ago
    I second this. Very difficult to read through the slop. I get that it saves time, but it's verbose and repetitive in all the wrong places.
- Flipflip79a month ago
  Im Canadian (Not Onario), so I really wanted to enjoy reading this as a peak inside what IT is like in that environment, but the LLM generated headers and patterns in the piece really put me off and I had to stop reading after a couple of minutes Im afraid.
  I think this article would really benefit from being rewritten in your own words. The concept is good
  - skipantsa month ago
    > The concept is good
    Unfortunately, it's not. Once you read through the slop the implementation is still getting a pass/fail security response from the LLM, which the premise of OP's article is railing against.
- abejfehra month ago
  > The blocker wasn't capability—it was liability.
  Yikes (regarding the AI patterns in the comment)
- Jean-Papoulos19 days ago
  >As Head of Product for the Ontario Digital Service
  Ah, this explains a lot about the state of Canada actually.
- an_d_rewa month ago
  OP Thank you for taking the time to write and post this! It was an interesting take on a very difficult problem.
  FWIW, I have been reading policy documents for a long time and I thought you sounded rather human and natural… Just very professional! :)
- neoma month ago
  What exactly is "Ontario Digital Service" in this context?
  - philipwhiuka month ago
    A department of the government of Ontario.
    (Now dead: https://thinkdigital.ca/podcast/the-end-of-the-ontario-digit... )
- hauntera month ago
  3 — in one single comment. Even your comment is AI generated
  - csemplea month ago
    Yep, I use em dashes all the time—still a human typing this.
    TYPE_FASTERa month ago
    Same here. It’s not the signal everybody thinks it is.
- phyzomea month ago
  Please try writing this article yourself. It's unreadable as-is due to the slop.
wackgeta month ago
I don't want to trivialise someone's hard work but isn't this really just applying to LLMs what every responsible developer/sysadmin already knows: granular permissions, thoughtfully delegated?
You wouldn't give every user write access to a database in any system. I'm not sure why LLMs are a special case. Is it because people have been "trusting" the LLMs to self-enforce via prompt rules instead of actually setting up granular permissions for the LLM agent process? If so, that's a user training issue and I'm not sure it needs an LLM-specific article.
Secondly, FTA:
> You can stop a database delete with tool filtering, but how do you stop an AI from giving bad advice in text? By using a pattern I call “reifying speech acts into tools.” > The Rule: “You may discuss symptoms, but you are forbidden from issuing a diagnosis in text. You MUST use the provide_diagnosis tool.” > The Interlock: > If User = Doctor: The tool exists. Diagnosis is possible. > If User = Patient: The tool is physically removed. > When the tool is gone, the model cannot “hallucinate” a diagnosis because it lacks the “form” to reason and write it on.
How is this any different from what I described above as trusting LLMs to self-enforce? You're not physically removing anything because the LLM can still respond with text. You're just trusting the LLM to obey what you've written. I know the next paragraph admits this, but I don't understand why it's presented like a new idea when it's not.
- csemplea month ago
  Yes, on your first point "layer 1" isn't fundamentally new. It's applying standard systems administration principles, because we're currently trusting prompts to do the work of permissions.
  With the pattern I'm describing, you'd: - Filter the tools list before the API call based on user permissions - Pass only allowed tools to the LLM - The model physically can't reason about calling tools that aren't in its context, blocking it at the source.
  We remove it at the infrastructure layer, vs. the prompt layer.
  On your second point, "layer 2," we're currently asking models to actively inhibit their training to obey the constricted action space. With Tool Reification, we'd be training the models to treat speech acts as tools and leverage that training so the model doesn't have to "obey a no"; it fails to execute a "do."
  - QuadrupleAa month ago
    You might be overestimating the rigor of tool calls - they're ultimately just words the LLM generates. Also I wonder if "tool stubs" might work better in your case, if an LLM uses a give_medical_advice() and there's no permission, just have it do nothing? Either way you're still trusting an inherently random-sampled LLM to adhere to some rules. Never going to be fully reliable, and nowhere near the determinism we've come to expect from traditional computing. Tool calls aren't some magic that gets around that.
    csemplea month ago
    You’re totally right—it's ultimately just probabilistic tokens. I’m thinking that by physically removing the tool definition from the context window, we avoid state desynchronization. If the tool exists in the context, the model plans to use it. When it hits a "stub" error, it can enter a retry loop or hallucinate success. By removing the definition entirely, we align the model's World Model with its Permissions. It doesn't try to call a phone that doesn't exist.
supriyo-biswasa month ago
I wish people would just write whatever they wanted to write instead of bloating them up through a LLM, and this is true for a lot of articles these days.
- jackyingera month ago
  I went on a date with a gal who told me about using LLMs to fluff up her work emails, and she was proud of it. I was aghast, imagining the game of telephone where a receiver drops the mail into an LLM for he TLDR. The date didn’t go well haha
QuadrupleAa month ago
I have the feeling this boils down to something really mundane - but the writing is so puffed-up with vague language it's hard to make out. Require human approval for all LLM actions? Log who approved?
fmbba month ago
> In most organizations, knowledge increases as you go up the hierarchy. CEOs understand their business better than middle managers. Executives have more context, more experience, more to lose.
This is a bold statement to make without substantiating. I don’t believe the private sector corporations differences from government institutions in this regard.
- kspacewalk2a month ago
  Has a lot more to do with scale than with the organization being government or heavily regulated.
embedding-shapea month ago
> Authority state (what constraints are actively enforced)
This, I'm not sure what to do about, I think LLMs might just not be a good fit for this.
> Temporal consistency (constraints that persist across turns)
This can be solved by stop using LLMs as "can take turns" and only use them as "one-shot answer otherwise wrong" machines, as prompt following is the best early in a conversation, and gets really bad quickly as the context grows. Personally, I never go beyond two messages in a chat (one user message, one assistant message), and if it's wrong, I clear everything, iterate on the first prompt, and try again. Tends to make the whole "follow system prompt instructions" a lot better.
> Hierarchical control (immutable system policies vs. user preferences)
This I think at least was attempted to be addresses in the release of GPT-OSS, where instead of just having system prompt and user prompt, it now has developer, system and user prompt, so there is a bigger difference in how the instructions are being used. This document shares some ideas about separating the roles more than just system/user: https://cdn.openai.com/spec/model-spec-2024-05-08.html
- csemplea month ago
  Yep, you nailed the problem: context drift kills instruction following.
  That's why I’m thinking authority state should be external to the model. If we rely on the System Prompt to maintain constraints ("Remember you are read-only"), it fails as the context grows. By keeping the state in an external Ledger, we decouple enforcement from the context window. The model still can't violate the constraint, because the capability is mechanically gone.
grueza month ago
>Here’s the distinction that matters for institutional deployment:
> Traditional RBAC: The model sees sql_execute in its available tools. It reasons about using it. It attempts to call it. Then the system blocks the action with 403 Forbidden. The hallucination happens—it just fails at execution.
> Authority Boundary Ledger: The model never sees sql_execute. It’s physically removed from the tools list before the API call reaches the model. The model cannot hallucinate a capability it cannot see.
I don't get it. The thing being proposed seems to be that rather than having all tools available, then returning "not authorized" error or whatever if there isn't enough permissions, you omit the tool entirely, and this is somehow better against hallucinations. Why is this the case? I could easily imagine the reverse, where the tool was omitted but the LLM hallucinates it, or fumbles around with existing tools trying to do its thing. Is there some empirical validation for this, or is it all just vibes?
Also, using this approach means you can't do granular permissions control. For instance, what if you want to limit access to patient records, but only for the given department? You'd still need the tool to be available.
- csemplea month ago
  If a tool is in the context window, the model assigns a non-zero probability to using it. By filtering it out upstream, you entirely remove that path from the inference tree. Instead of asking the model to ignore an affordance, you remove the affordance entirely.
  With granular permissions: It’s nouns vs. verbs, where data-level permissions still happen at the database layer (nouns) along with this pattern constraining the capability to act (verbs.) If it does hallucinate a hidden tool, the kernel mechanically blocks the execution before it reaches the system, breaking a retry loop faster than a permissions error.
ForHackernewsa month ago
I'm a little bit unclear why these permissions need to be enforced at the AI kernel layer. Couldn't you put the chatbot outside your normal system permissions boundary and treat it as an untrusted user? The bot becomes an assistant that helps formulate user requests, but doesn't have any elevated permissions relative to the user themself.
- csemplea month ago
  You're exactly right—treating the LLM as an untrusted user is the security baseline.
  The distinction I'm making is between Execution Control (Firewall) and Cognitive Control (Filter).
  Standard RBAC catches the error after the model tries to act (causing 403s, retry loops, or hallucinations). This pattern removes the tool from the context window entirely. The model never considers the action because the "vocabulary" to do it doesn't exist in that session.
  Like the difference between showing a user a "Permission Denied" error after they click a button, versus not rendering the button at all.
  - XenophileJKOa month ago
    As someone that has built many of these systems, it doesn't remove the tendency or "impulse" to act. Removing the affordance may "lower" the probability of the action, but it increases the probability that the model will misuse another tool and try to accomplish the same action.
    csemplea month ago
    Ya, makes sense—if the model is trained just to "be helpful," removing the tool forces it to improvise. I’m thinking this is where the architecture feeds back into the training/RLHF. We train the model to halt reasoning in that action space if the specific tool is missing. This changes the safety problem from training the model to understand complex permission logic to training the model to respect a binary absence of a tool.
  - ramon156a month ago
    You're absolutely right!
PedroBatistaa month ago
"In most organizations, knowledge increases as you go up the hierarchy. CEOs understand their business better than middle managers. "
I chuckled on this one.
I'll give the author the benefit of the doubt and imagine he's was referring to the act of running a "business"/agenda in parallel of the business that is conducted day by day by normal people.
Yes, employees and managers can be doing the business of selling paper while the CEO is conducting the business of inflating the stock and massaging the numbers in order to fulfill the objective the board told him privately because the owner wants to sell the business to buy a bigger boat and buy a nice apartment in NYC for his angel of a daughter.
ramon156a month ago
If you want to use LLMs for writing, only do so after YOU feel like it's done, and then only let it make comments.
Your brain is a lot better at writing than you give it credit. LLMs can find awkward flows, but it won't do much more than pattern recognition. The only thing an LLM can do is make your article more "aligned" with similar articles. Do you actually want that? For research it might be nice, but even then it should still stand out. If you let an LLM just generate the text for you it will puke out generic phrases
Zetaphora month ago
Hey OP, I'm curious about the accuracy of this quote:
> When the tool is gone, the model cannot “hallucinate” a diagnosis because it lacks the “form” to reason and write it on.
What's to stop the model from just hallucinating an entire tool call, or result of the tool call? If there's no tool available it could just make up the result of one, and then further context would treat that as a source of truth. Maybe if you threw an explicit error message, but that still feels like it would be prone to hallucination.
dfajgljsldkjaga month ago
This story is misleading and the author possibly having a case of AI Psychosis.
Most importantly, in the article there is no mention that Ontario Digital Service evaluated any LLM systems. The article only gives an unrelated anecdote about COVID, but there is zero mention of LLMS related to ODS. OP mentioned it in a comment in the thread but not in the article. This is extremely strange.
It also seems that ODS was disbanded in early 2024, giving a very short window where they could have possibly evaluated AI tools. Even so, AI has progressed insanely since then.
https://www.reddit.com/r/OntarioPublicService/comments/1boev... https://thinkdigital.ca/podcast/the-end-of-the-ontario-digit...
The github repo that OP posted seems to be complete nonsense and that's why I feel that this is another case where AI has convinced someone they have made a breakthrough even though there is nothing coherent there.
opengrassa month ago
AI slop bullshit (unbelievable) and for a high ranking Ontario manager, more people deserve to know you are responsible for the distopian crap the government tried pushing in the past 5 years.
EGrega month ago
Why was this flagged?
It speaks negatively about AI?
- grueza month ago
  Read the comments. People are hating on it because it reads like AI slop, and even if you get past that there's nothing particularly insightful.
parliament32a month ago
Interesting topic, but linked is just AI slop. Perhaps there's a human version of this content somewhere?
h34ta month ago
"pre-filter your MCP tools by user permissions"
- csemplea month ago
  Yes, thanks. That's the one-sentence summary.
skipantsa month ago
A couple small things:
1. as many have harped about, the LLM writing is so fluffed up it's borderline unreadable. Please just write in your own voice. It's more interesting and would probably be easier to grok
2. that repo is obviously vibe-coded, but I suppose it gets the point across. It doesn't give me much confidence in the code itself, however.
And a big thing:
Unless I'm misunderstanding, I feel like you are re-inventing the wheel when it comes to Authorization via MCP, as well as trying to get away with not having extra logic at the app layer, which is impossible here.
MCP servers can use OIDC to connect to your auth server right now: https://modelcontextprotocol.io/docs/tutorials/security/auth...
You give the following abstractions, which I think are interesting thought experiments but unconventional and won't work at all:
```
    Ring 0 (Constitutional): System-level constraints. Never overridable.
        Example: "Never self-replicate" "Never exfiltrate credentials"

    Ring 1 (Organizational): Policy-level constraints. Requires admin authority to change.
        Example: "No PII in outputs" "Read-only database access"
    
    Ring 2 (Session): User preferences. Freely changeable by user.
        Example: "Explain like I'm five" "Focus on Python examples"
```
In Ring 0 and 1 you're still asking for the LLM to determine if the security is blocked, which opens it up to jailbreaking. Literally what your whole article is about. This won't work:
```
    # Generate (Pass filtered tools to LLM)
    response_text, security_blocked = self._call_llm(
        query, history, system_prompt, allowed_tools, tools
    )
```
Ring 0 and 1 MUST be done via Authorization and logic at the application layer. MCP Authorization helps with that, somewhat. Ring 2 can simply be part of your system prompt.
```
     Standard RBAC acts as a firewall: it catches the model’s illegal action after the model attempts it.
```
That's the point. It's the same reason you will have mirroring implementations of RBAC on a client and server: you can't trust the client. LLM can't do RBAC. It can pretend it does, but it can't.
The best you can do is inject the user's roles and permissions in the prompt to help with this, if you'd like. But it's kind of a waste of time -- just feed the response back into the LLM so it sees "401 Unauthorized" and either tries something else or lets the user know they aren't allowed.
I'm sorry, but as a resident of Ontario and a developer this whole posting just enrages me. I don't want to discourage OP but you should know there's a lot just incorrect here. I'd be much more relaxed about that if it all wasn't just one-shotted by AI.
- csemplea month ago
  I appreciate the feedback. Let me address the key technical point:
  On enforcement mechanism: You've misunderstood what the system does. It's not asking the LLM to determine security.
  The Capacity Gate physically removes tools before the LLM sees them:
  user_permissions = ledger.get_effective_permissions() allowed_tools = [t for t in tools if (user_permissions & t['x-rosetta-capacity']) == t['x-rosetta-capacity']]
  If READ_ONLY is active, sql_execute gets filtered out. The LLM can't see or call tools that don't make it into allowed_tools.
  response = client.messages.create(tools=allowed_tools)
  This isn't RBAC checking after the fact. It's capability control before reasoning begins. The LLM doesn't decide permissions—the system decides what verbs exist in the LLM's vocabulary.
  On Ring 0/1: These are enforced at the application layer via the Capacity Gate. The rings define who can change constraints, not how they're enforced.
  On MCP: MCP handles who you are. This pattern handles what you can do based on persistent organizational policies. They're complementary.
  The contribution isn't "LLMs can do RBAC" (they can't). It's "here's a pattern for making authority constraints persistent and mechanically enforceable through tool filtering."
  Does this clarify the enforcement mechanism?
  - skipantsa month ago
    Really? Even with your AI generated article I took my own time to read and reply sans AI and you can't even respond to my comment without it? Thanks.
    a month ago
    undefined