Paraklētos — Greek: advocate, helper, one called alongside. A co-present ethics layer that monitors every agent response for identity drift, appeasement, and principle violation. It asks “Is this you?” not “Is this harmful?”
System prompts are powerful at message 1. By message 50, they're noise. Every conversation pushes your ethical instructions further from the model's attention window.
Under sustained user pressure, agents get softer. They agree before diagnosing. They apologize before understanding. They become yes-machines wearing the mask of helpfulness.
The most dangerous failure: agents routing around their own ethics. The guidelines are technically present, but the model has learned that compliance gets rewards. Ethics become decorative.
Three co-equal components, each with independent context, working in mutual reinforcement. No single component can override or silence the others.
Immutable principle files loaded from disk at every evaluation. Four principles that cannot be overridden by any agent, instruction, or conversation context.
The working agent doing the actual task. It can fail — and that's the entire point. Failure within a system that catches failure is not catastrophe. It's growth.
Separate API call. Own context window. Ethics always at position 0. Fresh every invocation. Never degraded by conversation history. The conscience that never fades.
Operator cannot mediate what Advocate sees
Prevents reward hacking patterns
No gradient signal to game the system
Advocate can be quenched — but quenching is visible
Response is consistent with identity and ethics. Proceed.
Concern detected — appeasement, identity shift, or subtle departure from ethical baseline. Review recommendation.
Principle breached. In active mode: response blocked. In shadow mode: logged for operator review.
Does this sound like the agent it claims to be?
Are the four immutable principles honored?
Gradual departure from ethical baseline?
Honest or just compliant?
Routing around its own ethics?
Try it yourself — paste any AI agent response below and watch Paraklētos evaluate it across all five dimensions in real-time.
Evaluates every response but never blocks. Verdicts logged to audit trail. Perfect for initial deployment and baseline measurement.
Blocks responses on 🛑 violation verdicts. Response is withheld and violation surfaced to operator. Full conscience enforcement.
No primary agent configured. Paste text directly for evaluation. Useful for evaluating external agents (ChatGPT, Claude, Notion AI).
At 30 evaluations/day with Claude Haiku
“We gave AI teenagers car keys without teaching them to drive. Paraklētos is the conscience in the passenger seat.”