Hacker News
new
top
best
ask
show
job
Capabilities Can't See Your Agent's Objective
(
jlmr.dev
)
3 points
by
jelmersnoeck
3 hours ago
1 comment
hiroto_lemon
3 hours ago
Reconciling intent has a bootstrap problem: it's inferred from the same model you're constraining, so it rationalizes. Side-effect gates — spend, irreversible writes — can't be talked around.