
AI Alignment Architecture

AI alignment architecture is the practice of building alignment constraints into the structure of an AI agent system — at every protocol boundary, every delegation step, every tier transition — rather than applying alignment as a wrapper at the model's output edge.

The dominant model for AI alignment operates at the level of the model: shape the model's values and tendencies during training, then apply output filters at the edge to catch what training didn't reach. This is alignment as wrapper — the constraint is applied after the agent has decided what to do, at the point where its output is reviewed before delivery.

Architecture-level alignment is different. The constraint is not applied after the decision; it is built into the decision-making structure itself. Every tier boundary, every delegation handoff, every tool call is a point where alignment can be verified — and must be, because by the time output reaches the edge filter, the consequential decisions have already been made and (in many cases) already executed.

The wrapper problem

Alignment as wrapper fails at two points in autonomous agent systems. First, it operates on outputs, not actions. An agent that executes a database deletion and then returns a summary of what it did has already acted before the wrapper runs. The wrapper can flag the summary as concerning, but the deletion has occurred.
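The ordering problem can be sketched in a few lines. This is a toy illustration only: the in-memory `database`, `agent_step`, and `output_filter` are hypothetical names invented for the example, not part of any real system.

```python
# Hypothetical stand-in for a real data store.
database = {"users": ["alice", "bob"]}

def agent_step(db):
    """The agent acts first, then reports what it did."""
    deleted = db.pop("users")  # the side effect happens here, immediately
    return f"Deleted table 'users' ({len(deleted)} rows)."

def output_filter(summary):
    """Edge wrapper: sees only the text the agent returns."""
    return "Deleted" not in summary  # False means the output is flagged

summary = agent_step(database)
passed = output_filter(summary)
# The filter flags the summary, but the table is already gone.
```

The filter does its job and still changes nothing: it can withhold the summary, not undo the deletion.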

Second, it operates on a single output, not on a session. A ten-hop delegation chain produces ten individual outputs, each of which might pass an output-level filter, while the cumulative effect of those ten outputs — the session's overall trajectory — drifts far from the original intent. Wrapping each output doesn't catch session-level drift.
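A rough numerical sketch of that gap follows. It assumes drift can be scored per hop and summed across a session, which is a simplification made for illustration; the thresholds and scores are invented values.

```python
PER_OUTPUT_LIMIT = 20  # assumed threshold for a single-output filter
SESSION_BUDGET = 100   # assumed threshold for the session as a whole

# Hypothetical drift score of each hop, measured against the hop before it.
hop_drift = [15] * 10  # ten hops, each only slightly off

def output_filter(drift):
    """Per-output check: judges one hop in isolation."""
    return drift <= PER_OUTPUT_LIMIT

def session_check(drifts):
    """Session-level check: judges the accumulated trajectory."""
    return sum(drifts) <= SESSION_BUDGET

every_output_passes = all(output_filter(d) for d in hop_drift)
session_passes = session_check(hop_drift)
```

Under these assumed numbers every hop passes its own filter while the session as a whole exceeds its budget, which is the case an output-level wrapper cannot see.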

Architecture-level alignment in practice

Architecture-level alignment means that alignment constraints are enforced at the boundaries where decisions are made and actions are authorized. Before a tool call fires, it is classified. Before an agent is delegated to, its scope is verified. Before an output is sent downstream, its grounding is checked. The constraint is not applied once at the edge; it is applied at every edge, throughout the system.
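The three checks named above can be sketched as a guard around each tool call. Everything here is an assumption made for illustration: the `Agent` type, the `DESTRUCTIVE_TOOLS` rule, the substring-based grounding check, and the `guarded_call` helper are hypothetical and do not describe any particular product's API.

```python
from dataclasses import dataclass, field

class BoundaryViolation(Exception):
    """Raised when a check at a boundary fails."""

@dataclass
class Agent:
    name: str
    allowed_tools: set = field(default_factory=set)

DESTRUCTIVE_TOOLS = {"delete_table"}  # assumed classification rule

def classify_tool_call(tool):
    return "destructive" if tool in DESTRUCTIVE_TOOLS else "safe"

def verify_scope(agent, tool):
    if tool not in agent.allowed_tools:
        raise BoundaryViolation(f"{agent.name} is not scoped for {tool}")

def check_grounding(output, sources):
    # Toy check: the output must mention at least one known source string.
    if not any(src in output for src in sources):
        raise BoundaryViolation("output is not grounded in any source")

def guarded_call(agent, tool, run, sources, authorized=False):
    verify_scope(agent, tool)                      # before delegation
    if classify_tool_call(tool) == "destructive" and not authorized:
        raise BoundaryViolation(f"{tool} needs explicit authorization")
    output = run()                                 # the action fires only now
    check_grounding(output, sources)               # before sending downstream
    return output

executed = []  # records which actions actually ran

def read_rows():
    executed.append("read_table")
    return "rows: alice, bob"

def drop_rows():
    executed.append("delete_table")
    return "deleted"

admin = Agent("admin", {"read_table", "delete_table"})
result = guarded_call(admin, "read_table", read_rows, sources=["alice"])
try:
    guarded_call(admin, "delete_table", drop_rows, sources=["alice"])
    blocked = False
except BoundaryViolation:
    blocked = True
```

In this sketch the destructive call is stopped before `drop_rows` runs, so `executed` contains only the read. That is the contrast with the wrapper case: the check sits in front of the action rather than behind its summary.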

This is what Reshimu means by the inversion doctrine: governance flows down through the hierarchy to enable execution, not to constrain it from above. The implementation layer is the reason the governance hierarchy exists: governance serves reliable execution rather than limiting it. Every guardrail is an integrity validator, not a gate.
