by v_CodeSentinal 7 hours ago

The deny list section hit home. I keep seeing agents use unlink instead of rm, or spawn a python subprocess to delete files. Every new rule just taught the agent a new workaround.

Ended up flipping the model — instead of blocking bad actions, require proof of safety before any action runs. No proof, no action. Much harder to route around.

Curious if you've tried anything similar.

hrimfaxi 6 hours ago

What does proof of safety look like in practice? Could you give some examples?

v_CodeSentinal 6 hours ago

Nothing super fancy. For me “proof” just means the agent has to make its intent explicit in a way I can check before running it.

For example:

1) If it wants to delete a file, it has to output the exact path it thinks it’s deleting. I normalize it and make sure it’s inside the project root. If not, I block it.

2) If it proposes a big change, I require a diff first instead of letting it execute directly.

3) After code changes, I run tests or at least a lint/type check before accepting it.
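To make (1) concrete, here is a rough sketch of what the path check could look like in Python. The function name `is_inside_root` and the choice to resolve relative paths against the project root are assumptions for illustration, not necessarily how the setup described here does it.

```python
from pathlib import Path


def is_inside_root(candidate: str, root: str) -> bool:
    """Return True only if `candidate` resolves to a location strictly inside `root`."""
    root_path = Path(root).resolve()
    # Relative paths are interpreted against the root (an assumption of this sketch).
    # An absolute candidate replaces the root in the join, so it is checked as-is.
    # resolve() collapses ".." segments and follows symlinks that exist on disk.
    target = (root_path / candidate).resolve()
    # `parents` excludes the path itself, so the root directory is also rejected.
    return root_path in target.parents
```

The check runs on the resolved path rather than the raw string, so something like `src/../../etc/passwd` is rejected even though it starts with a harmless-looking prefix. It says nothing about time-of-check/time-of-use races, so it is a filter, not a sandbox.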

So it’s less about formal proofs and more about forcing the agent to surface assumptions in a structured way, then verifying those assumptions mechanically.
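One possible shape for "structured intent, then mechanical verification" is a default-deny gate: the agent emits a JSON proposal, and the action only runs if a registered verifier approves it. Everything named here (`gate`, `VERIFIERS`, the `action`/`args` format, the `PROJECT_ROOT` value) is hypothetical and only meant to show the idea.

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Hypothetical project root, for illustration only.
PROJECT_ROOT = Path("/srv/project").resolve()


def verify_delete(args: dict) -> bool:
    """Approve a delete only if the stated path resolves strictly inside the root."""
    target = (PROJECT_ROOT / str(args.get("path", ""))).resolve()
    return PROJECT_ROOT in target.parents


# Only actions listed here can ever be approved.
VERIFIERS: Dict[str, Callable[[dict], bool]] = {"delete_file": verify_delete}


def gate(raw: str) -> bool:
    """Return True only if the proposal parses and its verifier approves it."""
    try:
        proposal = json.loads(raw)
    except json.JSONDecodeError:
        return False  # unparseable intent: no proof, no action
    if not isinstance(proposal, dict):
        return False
    action = proposal.get("action")
    args = proposal.get("args")
    if not isinstance(action, str) or not isinstance(args, dict):
        return False
    verifier = VERIFIERS.get(action)
    if verifier is None:
        return False  # unknown action types are denied by default
    return verifier(args)
```

Because anything without a verifier is rejected, a new workaround (a different command, a subprocess) does not get through just because nobody wrote a rule against it. This only holds if the gate is the sole path by which the agent's actions reach the system.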

Still hacky, but it reduced the “creative workaround” behavior a lot.