
Engineering under the Second Law

Cybernetics tells you what feedback loops look like. Entropy tells you why they exist. Two stories — but they are two sides of the same story.

The previous articles mapped out how entropy manifests in agent systems: context rot dilutes signal, error cascades amplify noise, the two couple into positive feedback that drives intent drift. The full picture is clear. But it only answers "what happens." It does not answer a more fundamental question: in a system where entropy always increases, is reliable engineering even possible?

In 1948, Claude Shannon gave a surprisingly optimistic answer.

The channel coding theorem

In his landmark paper, Shannon proved something counterintuitive: on a noisy channel, as long as the transmission rate stays below the channel's capacity \(C\), there exists an encoding scheme that can push the error rate arbitrarily close to zero.

The power of this theorem is not in telling you how to encode. It is an existence proof: fighting noise does not require infinite energy — it requires the right structure.

Noise cannot be eliminated. Channel capacity is a hard constraint. But within that constraint, systematic encoding lets the signal pass through noise and arrive at the other end nearly intact.
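A minimal sketch makes the trade concrete: a 3x repetition code over a simulated binary symmetric channel. This toy example is not from Shannon's proof, and repetition codes are far from capacity-achieving; the theorem guarantees that much better codes exist. But even this crude structure drives the error rate well below the raw channel's.

```python
import random

def transmit(bit, p_flip):
    """One use of a binary symmetric channel: flip the bit with probability p_flip."""
    return bit ^ (random.random() < p_flip)

def send_repetition(bit, n, p_flip):
    """Encode one bit as n copies, transmit each, decode by majority vote."""
    received = [transmit(bit, p_flip) for _ in range(n)]
    return int(sum(received) > n / 2)

random.seed(0)
trials = 100_000
p_flip = 0.1

raw_errors = sum(transmit(1, p_flip) != 1 for _ in range(trials))
coded_errors = sum(send_repetition(1, 3, p_flip) != 1 for _ in range(trials))

print(raw_errors / trials)    # ~0.100: the channel's raw flip rate
print(coded_errors / trials)  # ~0.028: 3p^2(1-p) + p^3, well below p
```

Tripling the transmission cost cut the error rate by more than a factor of three, and longer codes do far better: structure, not brute energy, is what beats noise.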

A necessary qualification: agent systems are not the discrete memoryless channels Shannon's math describes. An agent's "noise" has memory, structure, temporal correlation; the superlinear decay in error cascades is direct evidence that successive errors are not independent. Shannon's specific equations do not port over. But the structural insight (that operating space exists within constraint, and that redundancy and verification are the mechanisms that unlock it) reaches far beyond its original proof.

Two kinds of coding

Shannon's framework splits the encoding problem into two halves, each with a different job.

Source coding strips away redundancy. Raw signals from any source contain predictable structure: in English text, the letter 'e' appears far more often than 'z'; adjacent pixels in an image are almost always similar. Source coding exploits that predictability to represent the same information in fewer bits. The core operation is compression.

Channel coding adds structured redundancy back in. After compression, the lean signal is fragile; a single bit flip can cause unrecoverable damage. Channel coding deliberately introduces redundancy: check bits, error-correcting codes, repeated transmissions. This redundancy carries no new information, but it lets the receiver detect and correct errors introduced during transit. The core operation is verification and correction.

The two seem to pull in opposite directions (one removes redundancy, the other adds it), but they work together in the same system: compress down to the essential information, then protect that essential information with structured redundancy as it passes through the noisy channel.
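The compress-then-protect pipeline can be sketched with standard-library tools alone: `zlib.compress` as source coding, a CRC32 check value as channel coding. A toy illustration of the division of labor, not a production transport format.

```python
import zlib

def encode(message: bytes) -> bytes:
    """Source coding then channel coding: compress, then append a CRC32 check value."""
    compressed = zlib.compress(message)      # strip predictable redundancy
    crc = zlib.crc32(compressed)             # 4 bytes of structured redundancy
    return compressed + crc.to_bytes(4, "big")

def decode(packet: bytes) -> bytes:
    """Verify the check bits before trusting the payload, then decompress."""
    payload, crc_bytes = packet[:-4], packet[-4:]
    if zlib.crc32(payload) != int.from_bytes(crc_bytes, "big"):
        raise ValueError("corruption detected in transit")
    return zlib.decompress(payload)

message = b"the letter e appears far more often than z " * 20
packet = encode(message)
assert decode(packet) == message
assert len(packet) < len(message)  # compression won, even with check bytes added

# Flip one bit in transit: the check bits catch it before decompression.
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]
try:
    decode(corrupted)
except ValueError as err:
    print(err)  # corruption detected in transit
```

Note the ordering: compression first makes the payload fragile, which is exactly why the check value is added afterward, over the compressed bytes.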

Dimension          | Source coding                       | Channel coding
-------------------|-------------------------------------|------------------------------------
Operation          | Strip redundancy / compress         | Add structured redundancy
Harness equivalent | Compaction, summarization           | Testing, validation, assertion
Goal               | Represent core info in fewer tokens | Detect and correct execution errors

Squint at what a harness does, and the isomorphism is striking.

Compaction is source coding. When the context window approaches its limit, compaction compresses conversation history, raw tool outputs, and intermediate reasoning into structured summaries, stripping away the predictable, the redundant, the stale, keeping only what matters. Factory's anchored summarization, Anthropic SDK's built-in compaction, OpenAI's compact endpoint. Different engineering paths, same underlying operation: source coding.
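The shape of any such compaction policy can be sketched in a few lines. Here `summarize` is a hypothetical stand-in for a model call, and the size heuristic and `keep_recent` cutoff are illustrative assumptions, not any vendor's actual policy; the loop structure is the point.

```python
def summarize(messages):
    """Stand-in for an LLM summarization call (hypothetical): fold turns into one note."""
    return {"role": "system",
            "content": f"[summary of {len(messages)} earlier turns]"}

def compact(history, budget, keep_recent=4):
    """If estimated context size exceeds the budget, replace older turns with
    a summary, keeping the most recent turns verbatim."""
    size = sum(len(m["content"]) for m in history)
    if size <= budget or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [{"role": "user", "content": f"turn {i}: " + "x" * 200} for i in range(20)]
compacted = compact(history, budget=1000)
print(len(history), "->", len(compacted))  # 20 -> 5
```

Twenty turns become one summary plus four verbatim recent turns: the predictable and the stale are compressed away, the working set survives.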

Tests, type checks, assertions, output validation. These are channel coding. They produce no new functionality. A program that passes all its tests does not have a single line of business logic more than one without tests. But they inject structured redundancy that lets the system detect errors introduced during execution. Every green CI run is a successful decode: the signal passed through the noisy channel and the check bits confirmed it arrived undistorted.
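A validation gate as check bits, in sketch form. The schema here (a `path` field, a boolean `tests_passed`) is invented for illustration; any output contract works the same way. The checks add no functionality, but a failure is a detected transmission error.

```python
import json

def validate(output: str) -> list[str]:
    """Run structured checks on a model's output; return the list of failures."""
    try:
        data = json.loads(output)                 # well-formedness check
    except json.JSONDecodeError:
        return ["not valid JSON"]
    failures = []
    if "path" not in data:
        failures.append("missing required field: path")
    if not isinstance(data.get("tests_passed"), bool):
        failures.append("tests_passed must be a boolean")
    return failures

good = '{"path": "src/main.py", "tests_passed": true}'
bad = '{"path": "src/main.py", "tests_passed": "yes"}'

print(validate(good))  # []
print(validate(bad))   # ['tests_passed must be a boolean']
```

An empty failure list is the "check bits confirmed it" case; a non-empty one is the signal to correct before the error propagates downstream.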

The two kinds of coding cooperate inside the harness the same way they do in Shannon's framework: compaction compresses context down to essentials; validation protects those essentials from being swallowed by noise across multiple execution steps. Drop the first, and context bloats until attention dilutes. Drop the second, and errors cascade with nobody watching.

Feedback loops as anti-entropy

The cybernetics chapter described four nested feedback loops: token-level, turn-level, session-level, alignment-level. The language was cybernetic: observer detects deviation, controller applies correction, negative feedback counteracts positive feedback.

Now put on a different lens.

The fastest loop is token-level. The autoregressive mechanism is itself a correction circuit: every token generated immediately becomes input conditioning the next. The model uses preceding output to calibrate subsequent generation on a millisecond timescale, essentially a channel decoder with an extremely short time constant, counteracting randomness in the token distribution.

One layer up, turn-level feedback operates on a scale of seconds to minutes. The model calls a tool; the tool returns a signal from the real world: file contents, test results, compilation errors. That signal gets spliced back into context and calibrates the next decision. Each tool call is one sample of external reality; each result returned is one injection of error-correction signal. Output parsing, result validation, sanity checks: all channel coding at this layer.
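The turn-level loop reduces to a simple skeleton. `run_model` and `run_tests` below are hypothetical stand-ins (the fake test scripted to pass on the third attempt), but the wiring is the real structure: act, sample reality, splice the signal back into context, repeat.

```python
def run_model(context):
    """Hypothetical stand-in for a model call: propose a patch given context."""
    return f"patch attempt {len(context)}"

def run_tests(patch):
    """Hypothetical stand-in for a tool call: one sample of external reality."""
    ok = patch.endswith("3")  # pretend the third attempt is the one that passes
    return ok, "all tests green" if ok else "AssertionError in test_parse"

def turn_loop(max_turns=5):
    context = ["task: fix the parser"]
    for _ in range(max_turns):
        patch = run_model(context)
        ok, signal = run_tests(patch)            # signal from the real world
        context.append(f"{patch} -> {signal}")   # inject the correction signal
        if ok:
            return patch, context
    return None, context

patch, context = turn_loop()
print(patch)  # patch attempt 3
```

Each iteration is one decode-and-correct cycle: the tool result is the error-correction signal that steers the next generation.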

Higher still, session-level. After a session ends, CI results, user feedback, and progress-file diffs provide a coarser-grained check: not just "was this step correct" but "is the overall direction right." Context resets, prompt template switches, tool-set adjustments: correction operations that counteract drift accumulated across many turns.

The slowest loop is alignment-level. Human preference signals altering model weights through training, on a cycle measured in weeks and months. It corrects not the deviation of a single output but systematic bias in the model's probability distribution.

Four loops, four timescales, all counteracting the same thing: entropy increase. Fast loops catch small deviations before they amplify. Slow loops catch systematic bias before it calcifies. Cybernetics describes them as "nested Observer-Controller-Plant circuits." Information theory describes them as "error-correction coding layers operating at different timescales." Both descriptions point at the same structure.

Cybernetics and entropy are not two independent theoretical frameworks. Cybernetics describes the topology of feedback: what connects to what, how signals flow, how positive and negative feedback interweave. Entropy and information theory explain why those loops must exist, because a system's default trajectory is toward disorder, and error correction is the only mechanism that maintains order. Topology and dynamics. Structure and rationale. Two sides of the same coin.

Ashby's Law reread through information theory

When the cybernetics chapter introduced Ashby's Law, the language was cybernetic: the controller's variety must be no less than the variety of disturbances it needs to counteract. V(C) >= V(D).

Information theory offers a second reading.

What does "variety" measure? The number of possible states a system can occupy. Take the logarithm of that count and you have entropy. Ashby's variety and Shannon's entropy are the same quantity in different notation.

The controller's variety V(C) is the number of distinct control signals it can emit, i.e. the capacity of the control channel. The disturbance variety V(D) is the number of distinct perturbations the environment can impose, i.e. the bandwidth of the noise channel.

V(C) >= V(D), translated into information-theoretic terms: the capacity of the error-correction channel must be no less than the bandwidth of the noise channel. The controller needs enough channel capacity to transmit correction information; otherwise, some fraction of the noise "leaks" past the correction net. Whatever leaks through is the system's blind spot.
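In bits, the arithmetic is direct. The numbers below are invented for illustration: an environment with 16 distinct disturbances against a controller with only 8 distinct responses.

```python
import math

disturbances = 16      # distinct perturbations the environment can impose
control_signals = 8    # distinct correction signals the controller can emit

v_d = math.log2(disturbances)     # 4.0 bits: noise "bandwidth"
v_c = math.log2(control_signals)  # 3.0 bits: correction capacity

leak = max(0.0, v_d - v_c)
print(leak)  # 1.0 bit of disturbance slips past the correction net
```

One bit short of capacity means half the disturbance states are indistinguishable to the controller: the blind spot, quantified.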

Back in the harness context: the "noise bandwidth" an agent system faces (context rot rate, error cascade amplification factor, external environment uncertainty) determines how much "correction capacity" the harness needs to provide. Tool variety and precision, validation coverage and accuracy, feedback loop time resolution. These constitute the harness's error-correction channel. Ashby's Law says: if the correction channel's capacity falls short, noise wins.

This also explains an observation from the end of ch-02: why simple harnesses often win. Not because simplicity has intrinsic magic, but because reducing the plant's effective variety (structured output, tool constraints, task decomposition) is equivalent to narrowing the noise channel's bandwidth. Once bandwidth narrows, a simple correction channel is enough to cover it. Correction capacity matched precisely to noise bandwidth, no more, no less. The optimum that both Ashby and Shannon would recognize.

Not fighting, but understanding

The Second Law of Thermodynamics is a constraint at the scale of the universe. Total entropy does not decrease. No engineering trick bypasses this; it is physical law.

But Shannon's proof reveals a crucial operating space: the Second Law constrains total entropy, not local structure. Against the backdrop of total entropy increasing, you can build and maintain local order — at the cost of producing more disorder elsewhere. An air conditioner cools the room but heats the outdoors. A refrigerator preserves food but warms the kitchen. Maxwell's demon could theoretically sort molecules, but Landauer's principle says erasing the demon's memory dissipates energy.

A harness does exactly this. Compaction lowers entropy locally in the context (compressing away redundancy and noise), but it consumes token budget and compute. Tests and validation detect and correct errors at the output boundary, but they consume execution time and compute cycles. Every feedback layer counteracts entropy locally, but maintaining those layers is itself system overhead.

The goal was never "zero entropy." A zero-entropy agent system is a fully deterministic one — no uncertainty whatsoever, and therefore no flexibility. It degrades into a hardcoded script.

The real goal is a sustainable balance between the rate of entropy increase and the rate of error correction. Context rot rate, error cascade amplification: these define the entropic pressure the system faces. Compaction frequency and quality, validation coverage and precision, feedback loop response speed: these define the system's correction capacity. When correction rate keeps pace with entropy rate, the system maintains a dynamic order over time, not static equilibrium, but a continuous expenditure of energy to sustain a low-entropy state.
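A toy dynamic makes the balance visible. The model below is an illustrative assumption, not a measurement: entropy accrues at a constant rate each step, and correction fires periodically, removing a fixed amount.

```python
def simulate(entropy_rate, correction_amount, correction_every, steps=100):
    """Entropy accrues every step; correction fires every `correction_every` steps."""
    level, peak = 0.0, 0.0
    for t in range(1, steps + 1):
        level += entropy_rate
        peak = max(peak, level)
        if t % correction_every == 0:
            level = max(0.0, level - correction_amount)
    return level, peak

# Correction keeps pace with entropy: disorder stays bounded.
print(simulate(entropy_rate=1.0, correction_amount=5.0, correction_every=5))
# -> (0.0, 5.0)

# Correction falls behind: disorder grows without bound.
print(simulate(entropy_rate=1.0, correction_amount=3.0, correction_every=5))
# -> (40.0, 43.0)
```

Same entropy rate in both runs; only the correction capacity differs. Bounded oscillation versus unbounded drift is the whole distinction between dynamic order and decay.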

Thermodynamics has a name for this: a dissipative structure. An open system far from equilibrium, maintaining order through continuous energy input.

From demon to operating system

The threads of three chapters can now converge.

Orthogonality decomposed two independent forces: model capability and harness engineering, each doing its own work. Cybernetics gave the skeleton: feedback loop topology, OCP role separation, Ashby's variety constraint. Entropy gave the dynamics: why entropy increase is the default direction, how noise propagates and amplifies through a system, why error-correction coding is the only mechanism that sustains order.

Stack the three perspectives, and the harness's shape comes into relief. It is a Maxwell's demon — reading system state (observer), making sorting decisions (controller), separating fast molecules from slow ones (maintaining low entropy). Compaction is the demon sorting signal from noise in the context. Validation is the demon sorting correct outputs from erroneous ones. Feedback loops are the demon's sense-act cycle.

But the demon has a problem: it is individual, manual, ad hoc. One demon sorting molecules by hand has limited throughput and does not scale.

Write the sorting rules into policy — fast molecules go left, slow molecules go right, clear the chamber every minute, quarantine anomalies for inspection — and you have an operating system.

Memory management is institutionalized context management: page replacement policies, cache logic, garbage collection cycles replacing case-by-case judgment about what to keep and what to discard. Process isolation is institutionalized error boundaries, with architecture-level guarantees that one process crashing does not infect another. The scheduler writes resource allocation into priority queues and time slices, eliminating ad hoc decisions about who runs next.
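The "policy instead of judgment" move is exactly what a replacement policy is. A minimal LRU sketch, with context items standing in for pages (the item names are invented for illustration):

```python
from collections import OrderedDict

class ContextCache:
    """Keep at most `capacity` items; evict the least recently used."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def touch(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)     # recently used moves to the back
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # the policy evicts; nobody deliberates

cache = ContextCache(capacity=3)
for key in ["spec", "diff", "logs", "spec", "tests"]:
    cache.touch(key, f"content of {key}")
print(list(cache.items))  # ['logs', 'spec', 'tests']: 'diff' was evicted
```

No case-by-case judgment about what to keep: the rule fires the same way every time, which is what makes it an institution rather than a demon.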

From demon to OS is the leap from case-by-case judgment to systematic mechanism. That is the story ch-04 will tell.


Further reading

  • Prigogine, I. & Stengers, I. (1984). Order Out of Chaos: Man's New Dialogue with Nature. — A non-technical introduction to dissipative structure theory: how far-from-equilibrium systems maintain order through continuous energy input.