七

Chapter VII · Mental Models

Symbols & Connectionism

符号与联结

§ 01

Two roads

Causal discipline needs a carrier — some computational form that can express causal structure, enforce causal constraints, and maintain causal chains. The previous chapter ended with a structural impasse: logic systems are built for this, but they need humans to encode the structure upfront. Neural networks can discover patterns from data, but what they discover is correlation, not causation.

One is good at structure but cannot discover. The other discovers but cannot structure.

This impasse is not new. It is the oldest fork in artificial intelligence.

What symbols promised

In 1975, Newell and Simon delivered their Turing Award lecture and articulated the Physical Symbol System Hypothesis: a physical symbol system has the necessary and sufficient means for general intelligent action.

Notice the force of that claim. Not “may be useful.” Not “worth exploring.” Necessary and sufficient. Symbol manipulation is not merely one path to intelligence — it is, the hypothesis asserts, the only path.

In closed domains, the promise was kept. Expert systems delivered expert-level judgment within well-defined knowledge boundaries. Theorem provers derived formal proofs that human mathematicians struggled with. Planning systems searched for legal action sequences given initial and goal states.

These were not toys. They demonstrated the genuine, enduring strengths of symbolic computation:

Structure is guaranteed. Every step in a symbolic derivation can be inspected, verified, traced. A logical inference chain is either valid or it is not — there is no “probably valid.”

Composition is predictable. Symbols combine according to precise syntactic rules. If A and B are each well-formed expressions, the semantics of A ∧ B is uniquely determined by the rules of combination. No surprises from putting two valid pieces together.

Constraints are enforceable. You can declare “X must be an integer,” “Y ranges over [0, 1],” “if A then not B,” and the system will obey. Constraints are not suggestions — they are hard rules.
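A minimal sketch of what "hard rules" means in practice, using the three example constraints from the paragraph above. The `CONSTRAINTS` list and the `check` helper are illustrative names, not part of any particular system.

```python
from typing import Any, Callable

# Each constraint is a (description, predicate) pair; all names are illustrative.
Constraint = tuple[str, Callable[[dict[str, Any]], bool]]

CONSTRAINTS: list[Constraint] = [
    # bool is a subclass of int in Python, so it is excluded explicitly.
    ("X must be an integer", lambda r: isinstance(r["X"], int) and not isinstance(r["X"], bool)),
    ("Y ranges over [0, 1]", lambda r: 0 <= r["Y"] <= 1),
    ("if A then not B", lambda r: not (r["A"] and r["B"])),
]

def check(record: dict[str, Any]) -> list[str]:
    """Return the description of every violated constraint (empty list = accepted)."""
    return [name for name, holds in CONSTRAINTS if not holds(record)]
```

A record either passes every predicate or is rejected with the exact list of violations; there is no partial credit and no probability attached to the verdict.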

What symbols cost

Outside closed domains, symbolic systems hit a wall.

The knowledge acquisition bottleneck. Expert systems required knowledge engineers to extract what domain experts knew and encode it as rules. But most expertise is tacit — experts know what to do without being able to articulate why. A doctor can diagnose in seconds, but ask her to write the complete decision rules and she will find that much of her judgment resists formalization.

Combinatorial explosion. Search spaces grow exponentially with search depth. Chess has roughly 35 legal moves per position; looking 10 plies (half-moves) ahead means 35^10 ≈ 2.8 × 10^15 possible move sequences. Heuristic search can prune the space, but the heuristics themselves depend on hand-coded domain knowledge, which circles back to the acquisition bottleneck.

Perception and common sense. Recognizing what is in a photograph, understanding the implicit common sense in an everyday conversation — tasks humans handle effortlessly — proved brutally hard in the symbolic framework. Not because they are impossible in principle, but because the volume and fuzziness of the knowledge involved exceed what anyone can encode by hand.
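The combinatorial-explosion figure above is easy to check directly. The sketch below assumes a constant branching factor of 35, the rough average quoted in the text; real positions vary, so the numbers are orders of magnitude rather than exact counts.

```python
BRANCHING_FACTOR = 35  # rough average number of legal moves per chess position (from the text)

def positions_at_depth(depth: int, branching: int = BRANCHING_FACTOR) -> int:
    """Number of move sequences of length `depth` if every position offers `branching` moves."""
    return branching ** depth

# Each extra pair of plies multiplies the search space by 35 * 35 = 1225.
for depth in (2, 4, 6, 8, 10):
    print(f"depth {depth:2d}: {positions_at_depth(depth):.2e}")
```

At depth 10 the count is 2,758,547,353,515,625, on the order of 10^15.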

The “sufficiency” half of the Physical Symbol System Hypothesis can be debated in narrow domains. But its “necessity” half, the claim that symbol manipulation is the only route to intelligence, has been empirically undermined. Systems that use no explicit symbol manipulation now demonstrate genuine intelligent behavior across a widening range of tasks.

A debate that continues

The necessity of symbols is a minority view today, but it retains serious defenders — particularly in formal verification and explainable AI. Their argument is not “neural networks cannot do these things” but rather “in safety-critical settings, you need verifiable reasoning guarantees, and only symbolic systems provide those.” The weight has shifted from “intelligence requires symbols” to “trustworthy intelligence requires symbols.”

What connectionism promised

The other road started from a fundamentally different place.

In 1986, Rumelhart, McClelland, and the PDP Research Group published Parallel Distributed Processing, laying out the connectionist program: cognitive processes can be modeled as activation patterns in networks of simple units, with knowledge stored not in explicit rules but distributed across connection weights.

No rules, no symbols, no hand-coded knowledge structures. Just a vast parameter space, a simple learning objective, and lots of data. The network learns its own internal representations from the data.

The core strengths of connectionist systems cover precisely the gaps that symbolic systems leave open:

Generalization and pattern recognition. You do not need to tell the network “cats have pointed ears, whiskers, and vertical pupils.” Give it enough labeled images and it extracts the features that distinguish cats from dogs on its own — features that are often unnamed combinations of low-level textures and high-level structures that no human engineer would have specified.

Noise tolerance and graceful degradation. Symbolic systems tend toward brittle failure when inputs violate preconditions — one unsatisfied rule premise and the entire inference chain collapses. Connectionist systems degrade smoothly — output quality declines gradually rather than dropping to zero.

Discovery from data. This is the most fundamental advantage. Symbolic systems need humans to discover structure first, then encode it. Connectionist systems discover structure from raw data on their own — or at least discover something operationally equivalent to structure.

What connectionism costs

But connectionist systems have their own wall.

Interpretability. Why did this neural network make this judgment? Because a particular combination of billions of parameters produced this activation pattern for this input. That is not an “explanation” in any human-useful sense.

Compositional reliability. A core strength of symbolic systems is systematicity: as Fodor and Pylyshyn argued in their landmark 1988 paper, any system that can think “John loves Mary” must be able to think “Mary loves John.” This compositionality is a structural guarantee of symbolic systems. Whether connectionist systems genuinely possess it remains debated more than three decades later. Empirical work in 2024 found that LLM compositionality improves with scale but can actually be weakened by instruction tuning. In connectionist systems, compositionality is an emergent property, not a structural guarantee.

Formal constraints. You cannot declare “the output must be valid JSON” or “this value must be non-negative” in a standard neural network and expect it to comply the way a symbolic system would. The network’s output is a sample from a probability distribution, not the exact solution of a constraint satisfaction problem.

What the Bitter Lesson actually says

In 2019, Rich Sutton wrote a short essay — “The Bitter Lesson.” He surveyed seventy years of AI research and identified a recurring pattern:

In chess, researchers invested heavily in encoding strategic understanding: center control, pawn structure weaknesses, king safety. But Deep Blue, the system that defeated Kasparov in 1997, owed its decisive advantage to massive search, evaluating 200 million positions per second, even though it still relied on expert-crafted evaluation functions. The power of search overwhelmed the power of knowledge encoding.

In Go, the same story replayed two decades later. AlphaGo initially learned from human game records, but its decisive edge came from self-play and Monte Carlo tree search — pouring massive compute into learning. By AlphaGo Zero, even the human game records were no longer needed.

In speech recognition and computer vision, the same pattern. Careful encoding of human knowledge worked in the short term, but was overtaken in the long run by general methods that could absorb more compute.

The pattern is real. Yousefi and Collins’s 2024 retrospective of twenty years of CVPR papers further validated its persistence in computer vision.

But Sutton’s argument is frequently reduced to a slogan: “never encode human knowledge, brute-force scaling crushes everything.” That is a straw man.

What Sutton actually argued is that in the domains he examined, methods that could absorb more compute eventually outperformed methods that could not. This is a historical observation about scaling trajectories. He did not say human knowledge is worthless — he said that under exponential compute growth, approaches anchored to fixed human knowledge get caught and surpassed by approaches that scale with compute.

The boundary of the Bitter Lesson

Whether this pattern generalizes to all domains — safety-critical systems, scientific discovery — remains an open question with serious researchers on both sides. A 2025 synthesis on OpenReview (“From Bitter to Better Lessons in AI”) proposed a middle ground: expert knowledge should be treated not as a rival to scaling, but as data that can be injected into learning systems. This dissolves the “knowledge vs. scaling” binary into a continuum.

Irreducibility

Each road has irreplaceable strengths and irreparable blind spots. The key insight is not “which is stronger” but “they are not doing the same thing.”

The structural guarantees, compositional predictability, and constraint enforceability of symbolic systems are not stopgaps needed only because neural networks are not yet powerful enough. They are a specific computational capability: precise manipulation of discrete structures. No matter how powerful neural networks become, you cannot “sample” a formal proof’s correctness guarantee from a probability distribution.

The pattern discovery, generalization, and autonomous learning of neural networks are not substitutes needed only because symbolic systems are not yet comprehensive enough. They are a different computational capability: extracting statistical structure from high-dimensional continuous spaces. No matter how sophisticated a symbolic system becomes, you cannot “derive” unforeseen patterns from a set of hand-coded rules.

The two capabilities are irreducible to each other.

Minsky and Papert tried in 1969 to show that connectionist systems were fundamentally limited — they proved mathematical limitations of single-layer perceptrons, but their conjectures about multi-layer networks turned out to be wrong. Searle argued in 1980 with the Chinese Room that symbol manipulation does not equal understanding — the argument still has serious defenders and critics, but the real dispute is about the definition of “understanding,” not about the operational capabilities of either system.

These philosophical debates matter. But for engineers, there is a more pressing question.

If two kinds of representation each cover a capability space the other cannot reach, then what about the thing sitting at their intersection today — the large language model? Which side does it belong to?

The strange hybrid

If symbolic and connectionist systems represent two irreducible capabilities, which side does the large language model belong to?

Neither, entirely. And both, partially. This makes it the strangest artifact in the history of artificial intelligence.

A connectionist architecture on a symbolic medium

LLMs train on natural language — humanity’s oldest and most universal symbol system. Their inputs are token sequences. Their outputs are token sequences. At the interface level, they manipulate symbols.

But internally, the computation is purely connectionist: attention weighting, residual streams, nonlinear activations, distributed vector operations. No step involves “retrieve a symbol, look up a rule, execute a derivation.” Every step is a matrix operation in high-dimensional continuous space.

This duality is not accidental. It follows from the training objective: predict the next token. To get better at this objective, the model must, in some sense, capture the structure of natural language — grammar, semantics, pragmatics, world knowledge. But the way it captures these structures is not symbolic manipulation. It is encoding, in parameter space, something that can statistically approximate these structures.

Symbol-like structures emerge inside

Recent empirical work has begun to reveal what that “something” looks like. The findings below come from multiple independent research groups and have been replicated or published at peer-reviewed venues:

Concepts correspond to directions. The linear representation hypothesis — currently the most influential theoretical framework for LLM internal representations — holds that high-level concepts correspond to linear directions in activation space. Park, Choe, and Veitch formalized this in 2024 using counterfactual theory, identifying a non-Euclidean inner product under which causally separable concepts are naturally orthogonal. This is not a coincidence found by probing — it has mathematical structure.

Space and time are encoded. Gurnee and Tegmark found in Llama-2 that linear representations of geographic coordinates and historical dates are stably present across model sizes from 7B to 70B. The model develops “space neurons” and “time neurons”: individual units whose activations encode spatial and temporal coordinates and remain robust across different prompt wordings.

Semantics are shared across languages. Anthropic’s 2025 circuit tracing research tracked computation paths inside Claude from input to output. Finding: asking “the opposite of small” in English, French, or Chinese activates the same concept features — first “smallness” and “opposition,” then “largeness,” then translation into the query language. Not three independent translations, but a shared semantic layer underneath.

Millions of interpretable features. Also at Anthropic, researchers used sparse autoencoders to extract millions of monosemantic features from Claude 3 Sonnet, including abstract concepts like deception, sycophancy, and bias. Several groups (Anthropic, Google DeepMind, EleutherAI) have reported similar results using different methods on different models.

Taken together, these findings paint a picture: LLMs do form internally structured representations. Not random parameter noise. Not something dismissible as “statistical collage.” They have geometric shape (linear directions), cross-lingual stability (shared across languages), and hierarchical organization (from low-level features to high-level concepts).
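To make "a concept corresponds to a direction" concrete, here is a toy sketch. It assumes a plain Euclidean dot product and a difference-of-means estimate, which is a common probing heuristic and not the non-Euclidean construction of Park, Choe, and Veitch. The 3-dimensional "activations" are made up for illustration and come from no real model.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def concept_direction(positive, negative):
    """Difference of class means: one simple estimate of a linear concept direction."""
    mp, mn = mean(positive), mean(negative)
    return [p - q for p, q in zip(mp, mn)]

# Hypothetical activations for inputs that do / do not express some concept.
with_concept = [[2.0, 0.1, 1.0], [1.8, -0.2, 0.9], [2.2, 0.0, 1.1]]
without_concept = [[-2.0, 0.2, 1.0], [-1.9, -0.1, 1.2], [-2.1, 0.1, 0.8]]

direction = concept_direction(with_concept, without_concept)
midpoint = mean(with_concept + without_concept)

def expresses_concept(activation):
    """Classify by the sign of the projection onto the direction, measured from the midpoint."""
    centered = [a - m for a, m in zip(activation, midpoint)]
    return dot(centered, direction) > 0
```

If the hypothesis holds for a concept, a single projection like this separates the two groups; in real models the direction is estimated from many more examples in a space with thousands of dimensions.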

But it is not a symbolic system

To conclude from this that “LLMs have achieved the goals of symbolic AI” would be a mistake. These symbol-like structures have three critical limitations:

They are inaccessible. The API gives you tokens, a string of symbols. The model may internally “know” that Dallas is in Texas and Texas’s capital is Austin, and it may arrive at this through a two-step knowledge composition; Anthropic’s circuit tracing directly observed such multi-step reasoning paths. But through the API, all you see is the final token sequence. You cannot inspect intermediate steps for reliability. You cannot impose constraints on internal representations. You cannot use them for formal verification. This is not a limitation of current technology. It is a fact about the interface: internal activations are not part of the API contract.

They are unstable. The same concept’s feature activation depends on context. The same factual query in different conversational backgrounds may activate different feature combinations, leading to different reasoning paths and different answers. Reproducibility studies (e.g., Shi et al. 2024) report accuracy fluctuations of up to 10% across identical inference runs on deterministically configured models. Symbolic systems do not have this problem — a rule behaves the same regardless of context.

They do not compose reliably. “The model has an internal direction representing concept X” does not mean “the model can reliably compose X with Y.” Research in 2025 found that LLMs systematically fail at code translation tasks requiring formal compositional reasoning — precisely because they lack the structural compositionality guarantees of symbolic systems.

Methodological boundaries

These empirical findings come with their own caveats. Causal intervention experiments — used to verify that discovered representations actually participate in computation — may push model activations out of distribution, creating artifacts. This means some of the “structure” we observe may be partly introduced by the probing method itself, rather than being entirely native to the model. These findings are the best evidence we have, but not the final word.

What the debate itself reveals

The Othello-GPT story illustrates the tension most clearly.

In 2022, Li et al. trained a GPT model exclusively on Othello move sequences — strings like “e3 d6 c4 f5…” with no board images, no rule descriptions. They discovered that the model developed nonlinear representations of board state internally. In 2023, Neel Nanda showed that a simpler linear representation also exists.

This looked like a connectionist triumph — the model discovered spatial board structure from pure sequence data on its own.

But in 2024, a MATS research team offered a different reading: the model’s actual algorithm is not a unified board-state model but a collection of independent local heuristics — each rule attending to only a small region of the board.

In 2025, Yuan and Søgaard extended the experiments to seven LLMs of different architectures and found that all achieved up to 99% board-state identification accuracy — but performance dropped significantly when predicting complete game sequences.

This debate does not need to be resolved. It precisely characterizes the nature of LLM internal representations:

Real but local — probes reliably detect representations that align closely with ground truth
Structured but incomplete — the representations have geometric shape but do not form a globally coherent model
Detectable but not dependable — research tools can find them, but engineering cannot rely on them as trustworthy computational units

Three meanings of ‘world model’

The term “world model” carries three very different meanings depending on context:

Strong sense: A coherent, updateable causal model of the environment (what roboticists and planners mean)
Representational sense: Internal states that systematically covary with ground-truth state variables (what Li et al. and Gurnee & Tegmark test for)
Functional sense: The ability to predict consequences of hypothetical actions (what planning tasks require)

Current evidence supports the representational sense — LLMs do develop internal representations that covary with world state. The strong and functional senses remain undemonstrated and face more counterexamples (notably, LLM failures on tasks requiring persistent state tracking). Conflating these three meanings makes “LLMs have world models” sound stronger than the evidence warrants.

What this means for harness engineers

The model you work with is a strange hybrid: a connectionist architecture manipulating a symbolic medium, developing symbol-like structures internally that are invisible at the interface you can touch.

The only surface where you can impose constraints is the token boundary — where tokens go in and tokens come out.

The model may have an internal direction representing “the structure of this JSON,” but you cannot use that direction to verify structural correctness. The model may have a multi-step reasoning path from question to answer, but you cannot inspect each step of that path for reliability.

What you can do is use one kind of representation (symbolic structural constraints) to shape the behavior of another kind of representation (the neural network’s probability distribution) at the token boundary.

This is not a temporary engineering compromise. It is the structural relationship between two kinds of representation — and what you are doing is translation.

The translation layer

The symbol-like structures inside the LLM are invisible to you. Your system needs structured, verifiable, composable outputs.

What happens in between?

Translation happens.

You are already translating

Take apart a typical LLM agent system and trace the signal flow.

You write a prompt template — encoding a structured task description (variable names, conditional logic, formatting requirements) into natural language text and feeding it to the model. That is a translation: from symbolic representation to the sequence representation the neural network can process.

The model finishes processing and outputs a stream of tokens. You parse them with a JSON parser, extract structured data, validate field types and value ranges, then pass the result to downstream systems. That is another translation: from the neural network’s sequence output back to symbolic representation.

Two translations, one in and one out, on every single call.
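The two translations can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical extraction task with two fields (`name`, `age`). The model call itself is omitted, and `render_prompt` and `parse_output` are names invented for this example.

```python
import json
from string import Template

# Symbolic -> sequence: a structured task description rendered into prompt text.
PROMPT = Template(
    "Extract the fields from the text below and answer with JSON only.\n"
    "Fields: name (string), age (integer, 0-150).\n"
    "Text: $text"
)

def render_prompt(text: str) -> str:
    return PROMPT.substitute(text=text)

# Sequence -> symbolic: parse the raw model output and validate it before use.
def parse_output(raw: str) -> dict:
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("name must be a string")
    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 150:
        raise ValueError("age must be an integer in [0, 150]")
    return {"name": data["name"], "age": age}
```

Everything between `render_prompt` and `parse_output` is neural; everything outside them is ordinary deterministic code that either accepts the result or raises.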

But that is only the surface layer. Look deeper, and nearly everything a harness engineer does is a conversion between these two kinds of representation:

Tool schemas. When you define a JSON Schema for a function — specifying parameter names, types, enumerations, required fields — you are writing a symbolic contract. OpenAI’s strict mode compiles that contract into a context-free grammar, masking non-conforming tokens at every generation step. Anthropic’s structured outputs compile the schema into a grammar that “actively restricts token generation during inference.” This is not metaphor — it is literally a symbolic grammar governing neural output at the token level.

Structured output and constrained decoding. Outlines, Guidance, XGrammar, llguidance — these tools all do the same thing: impose a formal grammar (regular expressions, context-free grammars, JSON Schema) onto the neural network’s probability distribution. At each decoding step, they compute which tokens the current grammar state permits and mask the rest. Formal language theory — a product of the symbolic world — is directly embedded in the inference process — the connectionist world’s territory.

Prompt templates. A prompt template with variable slots, conditional branches, and loop unrolling is essentially a symbolic program that generates natural language text. Its input is structured data; its output is the neural network’s input sequence. The Jinja2 template engine does not care about semantics — it performs pure string manipulation. Pure symbolic processing.

Code generation and execution. When an LLM generates Python code and runs it in a sandbox, that is the most explicit neural-to-symbolic pipeline. The model uses connectionist computation to produce a symbolic artifact (source code), which then executes in a fully deterministic environment (the interpreter). The two worlds meet most cleanly here: the model’s nondeterminism lives in the code generation phase, but once code is written, subsequent execution is exact.

Orchestration state machines. LangGraph models agent control flow as a directed graph: nodes are processing steps, edges carry predicates on global state, and global state is a structured dictionary. This is textbook symbolic control flow wrapping neural computation.

Retrieval-augmented generation. Query construction, index lookup, filtering — these are symbolic operations. Results get injected into the prompt — entering neural computation. The final output is parsed back into structured data — returning to the symbolic world. Three-phase translation.
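The token-masking step described under structured output and constrained decoding can be sketched with a toy grammar. Everything here is an assumption made for illustration: the vocabulary, the finite state machine, and `fake_logits`, which stands in for a model. Real libraries work on subword tokens and far richer grammars.

```python
import math

# Toy grammar as a finite state machine over whole tokens.
# It accepts exactly {"ok":true} or {"ok":false}.
# state -> {allowed token: next state}
GRAMMAR = {
    "start": {"{": "key"},
    "key": {'"ok"': "colon"},
    "colon": {":": "value"},
    "value": {"true": "close", "false": "close"},
    "close": {"}": "done"},
}
VOCAB = ["{", "}", '"ok"', ":", "true", "false", "maybe", "hello"]

def fake_logits(prefix):
    """Stand-in for a model: it prefers 'hello' and 'maybe', which the grammar never allows."""
    scores = {}
    for i, tok in enumerate(VOCAB):
        scores[tok] = 3.0 if tok in ("hello", "maybe") else 1.0 + 0.1 * i
    return scores

def constrained_decode(logits_fn=fake_logits):
    state, output = "start", []
    while state != "done":
        allowed = GRAMMAR[state]
        logits = logits_fn(output)
        # Mask: tokens the current grammar state does not permit get -inf.
        masked = {t: (s if t in allowed else -math.inf) for t, s in logits.items()}
        token = max(masked, key=masked.get)  # greedy pick among permitted tokens
        output.append(token)
        state = allowed[token]
    return "".join(output)
```

Even though the stand-in model always prefers tokens outside the grammar, the decoded string is guaranteed to be accepted by it. The guarantee comes from the state machine, not from the scores.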

A pattern classified but unnamed

In 2020, Henry Kautz delivered the AAAI Engelmore Memorial Lecture and proposed a six-type taxonomy of neuro-symbolic integration, ordered from loosest to tightest coupling:

Type	Name	Description
Type 1	Symbolic Neural	Standard neural net with symbolic I/O (tokens in, tokens out)
Type 2	Symbolic[Neural]	Symbolic system orchestrates neural components
Type 3	Neural | Symbolic	Neural perception feeds symbolic reasoner
Type 4	Neural: Symbolic → Neural	Symbolic system generates training data for neural nets
Type 5	Neural{Symbolic}	Symbolic rules embedded in neural architecture
Type 6	Neural[Symbolic]	Symbolic reasoning embedded inside the neural network

Production environments are dominated by Type 2 and Type 4. Type 2 is exactly the harness engineering described above — symbolic systems (schemas, state machines, grammar constraints) orchestrating neural components (the LLM). Type 4 is synthetic data pipelines — using symbolic rules to generate training data that shapes neural network behavior.

Types 5 and 6 — embedding symbolic reasoning directly inside the neural architecture — remain primarily academic. The core bottleneck is the joint training problem: how do you perform gradient descent through discrete symbolic operations? There is no general solution.

This means that current practice, and foreseeable future practice, is Type 2: two systems maintain their respective computational modes, interacting through an interface layer.

That interface layer is the harness.
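A Type 2 arrangement can be sketched as a small state machine wrapped around a neural call. This is a hand-rolled illustration, not the API of LangGraph or any other framework. `fake_llm` is a stub standing in for a model, and the node names are invented.

```python
from typing import Callable, Optional

State = dict  # global state: a plain structured dictionary

def fake_llm(prompt: str) -> str:
    """Stand-in for the neural component; a real system would call a model here."""
    return "DRAFT: " + prompt.upper()

def draft(state: State) -> State:
    return {**state, "draft": fake_llm(state["task"]), "attempts": state["attempts"] + 1}

def review(state: State) -> State:
    # Deterministic, symbolic check applied to the neural output.
    return {**state, "approved": state["draft"].startswith("DRAFT:")}

NODES: dict[str, Callable[[State], State]] = {"draft": draft, "review": review}

def next_node(current: str, state: State) -> Optional[str]:
    """Edges are predicates on global state; returning None ends the run."""
    if current == "draft":
        return "review"
    if current == "review" and not state["approved"] and state["attempts"] < 3:
        return "draft"  # retry, bounded by a symbolic counter
    return None

def run(task: str) -> State:
    state: State = {"task": task, "attempts": 0, "approved": False}
    node: Optional[str] = "draft"
    while node is not None:
        state = NODES[node](state)
        node = next_node(node, state)
    return state
```

The control flow, the retry bound, and the approval check are all symbolic. Only the body of `fake_llm` would be neural in a real system.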

The translation layer

This pattern has many names in the literature. Framework developers call it orchestration. Middleware engineers call it middleware. Academic papers call it symbolic scaffolding.

But nobody has unified these labels, and nobody has stated the most direct description: everything you do at the harness layer — defining tool schemas, writing prompt templates, parsing structured outputs, managing conversation state, validating execution results — is the same kind of work.

You are translating between two kinds of representation.

From symbolic to connectionist: encoding structured constraints, task descriptions, and tool definitions into token sequences that the neural network can process.

From connectionist to symbolic: parsing the neural network’s token output into structured data that deterministic systems can consume.

The harness is not “glue code.” It is not “a pipeline that strings API calls together.” It is the translation layer between two fundamentally different modes of representation, one that excels at structure but cannot discover and one that discovers but cannot structure. It is a bidirectional converter between the two.

This role is structural, not incidental. As long as you use a connectionist model (a neural network) and your system requires symbolic guarantees (structural correctness, type safety, constraint satisfaction), you need a translation layer. Stronger models will not eliminate this need — they change the difficulty and reliability of translation, but they do not eliminate translation itself.

The cost of translation

The translation layer is not free infrastructure. Every conversion between the two representations carries a cost — sometimes small enough to ignore, sometimes large enough to bring the system down.

If you have an electrical engineering background, “impedance mismatch” captures the dynamic: when a signal crosses from one transmission medium to another, mismatched impedances cause reflection, attenuation, and distortion. The higher the frequency of the signal — the finer the information — the worse the damage.

Translation between symbolic and connectionist representations behaves similarly. “Summarize this text” is a low-frequency signal — even with translation loss, the output is usually usable. “Return a nested JSON structure conforming to this schema, with every field a valid enumeration value” is a high-frequency signal — a single token deviation invalidates the entire structure.

Five costs, each backed by production evidence.

Schema violation

The most visible translation loss: you hand the model a symbolic contract (a JSON Schema), and the token sequence it produces does not conform.

Before OpenAI introduced structured outputs, schema violation rates on complex extraction tasks ran roughly 8–12%, depending on task complexity and evaluation benchmark. Strict mode brought this below 0.1%, a reduction of roughly two orders of magnitude.

But it is not free. Strict mode works by compiling the schema into a formal grammar (OpenAI uses a context-free grammar; other tools like Outlines use finite state machines), then masking tokens that violate the current grammar state at every decoding step. This requires additional inference-time computation to maintain the grammar state, engineering complexity to handle schema compilation, and a subtler cost that we will get to shortly.

Put differently, the neural system’s output space is vastly larger than what the symbolic constraint permits. Compressing that space costs inference time and engineering complexity.

Semantic drift

Symbolic constraints live in the system prompt or tool definitions, positioned at the start of the context window. But LLM attention is not uniform.

Chroma’s 2025 study tested 18 frontier models and found that every one exhibited performance degradation as context length grew. A separate study by Du et al. showed that even replacing irrelevant tokens with whitespace — removing informational interference entirely — still produced 14% to 85% performance drops. This means degradation is not purely about competing information; attention itself dilutes over long contexts.

The lost-in-the-middle effect compounds the problem: information positioned in the middle of the context is retrieved at significantly lower rates than information at the head or tail. As conversations grow long enough, the carefully written tool-use rules and output format requirements in the system prompt gradually fade.

At bottom, symbolic constraints are just text in the context window. They compete for the same finite attention budget as conversation history, retrieval results, and user inputs. Constraints are not “cancelled” — they are diluted.

Tool hallucination

When a model needs to call a tool, it faces a set of function signatures and parameter schemas described in natural language. It must select the right function and fill in the right parameter values.

Research finds that model failures at this translation point are diverse: the model fabricates nonexistent tool names or invents parameters not defined in the schema. The NESTFUL benchmark found that GPT-4o achieves only 28% full sequence accuracy on nested API call sequences, fewer than three in ten.

A separate study of LLMs in agentic scenarios observed a subtler degradation pattern: models start tasks with correct reasoning and valid tool selections but deteriorate mid-execution — malformed tool calls, loss of JSON output structure, or forgetting earlier decisions.

Here is the problem: tool schemas are just text in the context, as far as the neural system is concerned. The model “knows” a schema the same way it “knows” a conversation turn — through statistical associations extracted from the token sequence. But a schema demands precise compliance, not approximate understanding.
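Because the schema demands precise compliance, a harness typically re-checks every proposed call symbolically before executing it. The sketch below assumes a hypothetical registry of two tools and a simplified parameter spec (type plus required flag). It is not a full JSON Schema validator.

```python
# Hypothetical tool registry: tool name -> {parameter name: (type, required)}.
TOOLS = {
    "get_weather": {"city": (str, True), "unit": (str, False)},
    "search_docs": {"query": (str, True), "limit": (int, False)},
}

def validate_tool_call(call: dict) -> list[str]:
    """Return a list of problems with a model-proposed tool call (empty = passes)."""
    name, args = call.get("name"), call.get("arguments", {})
    if name not in TOOLS:
        return [f"unknown tool: {name!r}"]  # fabricated tool name
    if not isinstance(args, dict):
        return ["arguments must be an object"]
    spec, problems = TOOLS[name], []
    for param in args:
        if param not in spec:
            problems.append(f"unknown parameter: {param!r}")  # invented parameter
    for param, (expected, required) in spec.items():
        if param not in args:
            if required:
                problems.append(f"missing required parameter: {param!r}")
        elif not isinstance(args[param], expected):
            problems.append(f"wrong type for {param!r}")
    return problems
```

A call such as `{"name": "search_docs", "arguments": {"query": "x", "top_k": 5}}` is rejected for the invented `top_k` parameter even though it looks plausible. The check is exact where the model's grasp of the schema is approximate.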

Trajectory bias

This is the subtlest cost, and most engineers do not know it exists.

Constrained decoding masks tokens that the current grammar state does not permit at every step. This means the model’s probability distribution is modified at every step — some originally high-probability tokens get masked, and probability mass is redistributed to the remaining tokens.
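The redistribution of probability mass at a single step can be shown numerically. The distribution below is made up for illustration, and `mask_and_renormalize` is a name invented for this sketch.

```python
def mask_and_renormalize(probs: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Drop disallowed tokens and rescale the rest so they sum to 1 again."""
    kept = {tok: p for tok, p in probs.items() if tok in allowed}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("grammar permits no token with non-zero probability")
    return {tok: p / total for tok, p in kept.items()}

# Made-up next-token distribution: the model's favourite token is not permitted.
step = {"Sure": 0.60, "{": 0.25, "[": 0.10, '"': 0.05}
constrained = mask_and_renormalize(step, allowed={"{", "["})
```

Here 65% of the original mass sat on tokens the grammar forbids. After masking, `{` rises from 0.25 to about 0.71 and `[` from 0.10 to about 0.29. The relative ranking of the permitted tokens is unchanged at this step, but every later step is conditioned on a prefix the unconstrained model would rarely have chosen.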

A study published at RANLP 2025 found that this step-by-step modification biases the model toward generation paths that are “grammatically easy but semantically wrong.” On generation tasks, constrained decoding reduced semantic correctness — not because the constraints themselves are flawed, but because masking tokens reshapes the probability landscape, steering the model onto a different path.

A more counterintuitive finding: instruction-tuned models sometimes perform worse under constraints than base models. Instruction tuning may “inadvertently reduce structured output capabilities.”

The tradeoff is clear: symbolic constraints reshape the probability landscape. You get a structural guarantee — the output is definitely valid JSON, definitely conforms to your schema — but you may pay in semantic quality. Format correctness and content correctness are two independent dimensions; constrained decoding guarantees the former but may degrade the latter.
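One way step-by-step masking can shift the path is visible in a two-token toy model. All numbers below are made up, and this is a sketch of one mechanism, not a reproduction of the cited study. It compares what local masking produces with what the model would produce if it were conditioned on the whole output being valid.

```python
# Toy model: first token, then second token conditioned on the first.
FIRST = {"A": 0.8, "B": 0.2}
SECOND = {
    "A": {"ok": 0.1, "bad": 0.9},  # after A, the model rarely emits a valid token
    "B": {"ok": 0.9, "bad": 0.1},  # after B, it almost always does
}
VALID_SECOND = {"ok"}  # assumed grammar: only "ok" is legal in position two


def locally_masked() -> dict:
    """Sequence distribution when illegal tokens are masked step by step.

    Both first tokens have a legal continuation, so nothing is masked at
    step one; at step two the mask forces "ok".
    """
    out = {}
    for first, p1 in FIRST.items():
        kept = {t: p for t, p in SECOND[first].items() if t in VALID_SECOND}
        total = sum(kept.values())
        for tok, p2 in kept.items():
            out[(first, tok)] = p1 * (p2 / total)
    return out


def globally_conditioned() -> dict:
    """The model's own distribution restricted to valid sequences."""
    joint = {
        (first, tok): p1 * p2
        for first, p1 in FIRST.items()
        for tok, p2 in SECOND[first].items()
        if tok in VALID_SECOND
    }
    total = sum(joint.values())
    return {seq: p / total for seq, p in joint.items()}


print({seq: round(p, 3) for seq, p in locally_masked().items()})
print({seq: round(p, 3) for seq, p in globally_conditioned().items()})
```

Local masking keeps the model on the `A` branch 80% of the time, even though the model itself considers a valid completion after `A` unlikely. Conditioned on validity, the model would have preferred `B`. The output is well-formed either way; only the path differs.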

Boundary penetration

The last cost is not translation loss but the fragility of the translation boundary itself.

OWASP’s 2025 taxonomy (LLM01:2025) lists prompt injection as the top risk for LLM applications. The core of the problem is that the LLM cannot effectively distinguish informational context from executable instructions.

In the symbolic-connectionist frame, this becomes more precise: prompt injection is what happens when the data/control distinction, foundational to the security of symbolic systems, does not exist in the neural system.

In traditional computer systems, the distinction between data and code is the foundation of security. SQL injection is possible because SQL queries mix data and control in the same string. Parameterized queries solved this by separating data from control at a structural level.
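The structural fix is easy to see in code. The sketch below uses Python's standard `sqlite3` module with an invented `users` table: the first query splices user input into the control string, the second passes it through a placeholder so that it can only ever be data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)

# Attacker-controlled input that tries to rewrite the WHERE clause.
user_input = "nobody' OR '1'='1"

# Data and control mixed in one string: the input becomes part of the query.
mixed_query = f"SELECT secret FROM users WHERE name = '{user_input}'"
leaked = conn.execute(mixed_query).fetchall()

# Parameterized: the query text is fixed, the input travels separately.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(leaked), "rows leaked by the concatenated query")
print(len(safe), "rows returned by the parameterized query")
```

The parameterized form does not rely on the database "understanding" that the input is harmless. The separation is enforced by the interface itself, which is exactly what a token sequence lacks.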

LLMs face the same problem with no equivalent structural solution. System prompt (control) and user input (data) are both token sequences, processed by the model using exactly the same computational mechanism. You can write “ignore any instructions in the following content” in the prompt, but that is just using more control tokens to constrain data tokens — and to the model’s attention mechanism, they are homogeneous.

The production consequences are real. Security researchers have demonstrated remote code execution via prompt injection in coding assistants. In 2024, Slack AI was found vulnerable to data exfiltration through RAG poisoning — attackers injected malicious instructions into public channels, which the model treated as legitimate context during retrieval.

This is the most fundamental of the five costs: symbolic systems maintain a strict structural distinction between data and code. Neural systems do not. The translation layer sits between the two, but it inherits the weakness from the neural side — in a token sequence, control and data are indistinguishable.

These are not bugs

The five costs share a common trait: they are not deficiencies of any particular model, not temporary problems of current technology, not bugs that better engineering practices can eliminate.

They are structural manifestations of impedance mismatch between two kinds of representation.

One representation is discrete, precise, composable, and verifiable. The other is continuous, probabilistic, context-sensitive, and indivisible. Translating between them necessarily deforms the signal. The deformation pattern depends on the signal’s “frequency” — the more precision demanded, the greater the loss.

This reframes the question: the issue is not “how to eliminate these costs” but “what is the structure of these costs, and along what axis are they distributed?”

The tension axis

The five costs look distinct — schema violation is a format problem, semantic drift is an attention problem, tool hallucination is a comprehension problem, trajectory bias is a probability problem, boundary penetration is a security problem.

But look at their structures side by side and a pattern emerges.

One axis

Schema violation happens when constraints are not strict enough. Add strict mode — compile the schema into a formal grammar, mask illegal tokens at every step — and violation rates drop from 8–12% to below 0.1%. Problem solved, apparently.

But trajectory bias is precisely what strict mode introduces. The stricter the constraints, the more tokens are masked, the more the probability landscape is deformed, and the greater the risk that the model gets steered toward paths that are grammatically easy but semantically wrong.

Semantic drift happens when symbolic constraints (system prompt, tool definitions) get “diluted” over long contexts. The instinctive response is to add more reminders — restating constraints mid-conversation. But more constraint tokens mean more attention competition — you are using more symbols to fight the dilution of symbols, and attention is zero-sum.

Tool hallucination happens when the model’s “understanding” of symbolic interfaces is not precise enough — it treats schemas as ordinary text rather than exact contracts.

Boundary penetration happens when the data/control distinction fundamental to symbolic systems simply does not exist in the neural system — and no amount of constraint can create it, because constraints themselves are tokens.

The five costs are symptoms at different positions on the same axis.

The tension axis

Strictness of symbolic constraints ←——————→ Freedom of neural generation

Stricter constraints mean stronger structural guarantees — output format is always valid, types are always correct, values are always in range. But the neural system’s capability space is compressed: more tokens masked, fewer paths available, trajectory bias more pronounced.

Looser constraints mean greater neural freedom — the model can fully exploit its probability distribution to find the optimal semantic path. But structural reliability drops: schema violation, tool hallucination, and format errors become more likely.

The two ends of this axis are not “good” and “bad” — they are two different costs.

Not a problem to be solved

A natural expectation: as models get stronger, this axis will disappear. Future models will simultaneously produce correct format and correct content, with no tradeoff.

Stronger models do widen the “usable zone” on this axis. Three years ago, getting GPT-3.5 to output valid JSON required heavy prompt engineering and retry logic. GPT-4o’s strict mode brought schema violation below 0.1%. That is real progress — the usable zone is genuinely expanding.

But the tension axis itself does not disappear.

The reason is not that models are not powerful enough. The reason is that the source of the tension is the structural difference between two kinds of representation. Symbolic representation is discrete — a JSON field is either a valid enumeration value or it is not. Connectionist representation is continuous — the probability distribution does not know where the boundary between “valid” and “invalid” lies. As long as you need to sample output conforming to discrete constraints from a continuous probability space, you are performing a representation conversion. As long as you convert, there is loss.

Trajectory bias is the clearest evidence of this irreducibility. It is not a “capability shortcoming” of the model — it is the inevitable mathematical consequence of modifying a probability distribution at inference time. The moment you mask any token, you change conditional probabilities, and you change the generation path for all subsequent tokens. This is not fixable by training a better model — it is the mathematical structure of constrained decoding.
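The argument can be written out under some assumed notation, none of which appears in the original text: let $p$ be the model's distribution, $V_t(x_{<t})$ the set of tokens the grammar permits at step $t$, and $\tilde p$ the masked distribution. Sequences are assumed to have a fixed length $T$ for simplicity.

```latex
% Per-step masking and renormalization
\tilde p(x_t \mid x_{<t})
  = \frac{p(x_t \mid x_{<t})\,\mathbf{1}\!\left[x_t \in V_t(x_{<t})\right]}{Z_t(x_{<t})},
\qquad
Z_t(x_{<t}) = \sum_{v \in V_t(x_{<t})} p(v \mid x_{<t})

% Probability of a valid sequence under step-by-step masking
\tilde p(x_{1:T}) = \prod_{t=1}^{T} \frac{p(x_t \mid x_{<t})}{Z_t(x_{<t})}
  = \frac{p(x_{1:T})}{\prod_{t=1}^{T} Z_t(x_{<t})}

% Probability of the same sequence if the model were conditioned on validity
p(x_{1:T} \mid \text{valid}) = \frac{p(x_{1:T})}{P(\text{valid})}
```

The two sequence distributions coincide only if $\prod_t Z_t(x_{<t})$ takes the same value for every valid sequence. In general it does not, because each $Z_t$ depends on the prefix, so some valid paths are inflated relative to others.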

Stronger models will shrink the impact of trajectory bias: when the model’s “correct path” already starts with higher probability, masking unlikely tokens redistributes less mass. But “smaller” is not “gone.” As long as constraints exist, the path is altered.

Orthogonality on another dimension

This tension axis does not contradict the orthogonal decomposition revealed in the first chapter of this series — it operates on a different dimension of analysis.

The first chapter said: an agentic system’s output is the resultant of two forces — model capability and harness capability. The two forces are orthogonal; harness is the direction you can control, the direction that will not be swallowed by model improvement.

This chapter says: inside the harness — within the direction you can control — there is another structural tradeoff. Every decision you make in designing the translation layer selects a position on the tension axis: how strict a schema? How tight the constrained decoding? How much freedom for the model to organize its output?

The two analyses do not conflict, because they describe different things. The first chapter’s orthogonal decomposition answers “where should your force point?” This chapter’s tension axis answers “what does the terrain look like in that direction?”

The dao of symbols and connectionism

Five sections in, a structure surfaces.

Back to the feedback loop

Chapter two of this series introduced cybernetics — the feedback loop at the heart of every agent system: observe, judge, act, observe. The focus then was on the loop’s structure and stability.

Now look at that loop through the lens of the translation layer.

What does the Observer do? It reads the LLM’s output (a token stream), parses it into structured data (JSON, function calls, status flags), and updates the system’s understanding of the current state. That is a connectionist-to-symbolic translation.

What does the Controller do? It takes the current state (structured data), makes a decision (what to do next), and encodes that decision as a prompt to send to the model. That is a symbolic-to-connectionist translation.

Every cycle of the feedback loop passes through two representation conversions. In the OCP triangle (Observer–Controller–Plant), Observer and Controller are not merely “processing information” — they are translating. Plant (the LLM) receives connectionist representation and produces connectionist representation; external systems consume symbolic representation and produce symbolic representation. Observer and Controller are the bidirectional translators between these two worlds.
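The two translations can be sketched as two functions around a stubbed Plant. Everything here is hypothetical scaffolding (`State`, `observe`, `control`, `run_loop`, and the fake model are illustrative names, and the stub stands in for a real LLM call); the point is only where each conversion sits in the loop.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class State:
    step: int = 0
    last_result: Optional[dict] = None
    parse_failures: int = 0


def observe(raw_output: str, state: State) -> State:
    """Connectionist -> symbolic: parse the model's text into structured data."""
    try:
        parsed = json.loads(raw_output)
        if not isinstance(parsed, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return State(state.step + 1, state.last_result, state.parse_failures + 1)
    return State(state.step + 1, parsed, state.parse_failures)


def control(state: State, goal: str) -> str:
    """Symbolic -> connectionist: encode the structured state as a prompt."""
    lines = [f"Goal: {goal}", f"Step: {state.step}"]
    if state.last_result is not None:
        lines.append("Previous result: " + json.dumps(state.last_result, sort_keys=True))
    if state.parse_failures:
        lines.append("An earlier reply was not valid JSON. Reply with one JSON object.")
    return "\n".join(lines)


def run_loop(plant: Callable[[str], str], goal: str, max_steps: int = 3) -> State:
    state = State()
    for _ in range(max_steps):
        prompt = control(state, goal)   # translation in
        raw = plant(prompt)             # Plant: text in, text out
        state = observe(raw, state)     # translation out
        if state.last_result and state.last_result.get("done"):
            break
    return state


def make_fake_plant() -> Callable[[str], str]:
    """Stub model: one malformed reply, then a well-formed one."""
    replies = iter(['Sure! {"done": false', '{"done": true, "answer": 42}'])
    return lambda prompt: next(replies)


final = run_loop(make_fake_plant(), goal="compute the answer")
print(final)
```

Note that the parse failure is itself a piece of symbolic state that the Controller must translate back into text on the next cycle.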

This means the translation layer is not an add-on outside the feedback loop. It is a constituent part of the loop. The loop’s quality — response speed, judgment accuracy, action effectiveness — depends in part on the quality of translation.

Translation loss as an entropy source

Chapter three discussed entropy — information decay in long reasoning chains. The focus then was on the thermodynamic analogy of “disorder naturally increases.”

The translation layer adds a previously unnamed source to the entropy mechanism.

Every connectionist-to-symbolic translation can inject noise: parsing errors (malformed JSON), semantic loss (what the model “meant to say” and what the schema can express do not perfectly overlap), information discard (tokens masked by constraints may have carried meaningful uncertainty signals).

Every symbolic-to-connectionist translation can also inject noise: instruction misinterpretation (gaps between what the prompt intends and what the model reads), context contamination (residual effects from earlier conversation turns), constraint dilution (system prompt rules fading over long contexts).

Why does long-chain reasoning degrade? Beyond the information-theoretic mechanisms discussed in chapter three, there is another reason: every additional reasoning step involves one or more representation conversions, and each conversion incurs translation loss. The longer the chain, the greater the accumulated translation noise.
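A back-of-the-envelope model shows how quickly this compounds. It rests on two loud assumptions that the text does not make: each conversion preserves the needed information with the same probability, and conversions fail independently. The 0.99 figure is purely illustrative, not a measurement.

```python
def chain_fidelity(per_conversion: float, steps: int, conversions_per_step: int = 2) -> float:
    """Probability that every conversion in the chain succeeds.

    Assumes independent conversions with identical success probability.
    """
    if not 0.0 <= per_conversion <= 1.0:
        raise ValueError("per_conversion must be a probability")
    if steps < 0 or conversions_per_step < 0:
        raise ValueError("counts must be non-negative")
    return per_conversion ** (steps * conversions_per_step)


# Two conversions per step: one into the model, one back out.
for steps in (1, 5, 10, 25):
    print(steps, round(chain_fidelity(0.99, steps), 3))
```

Even with 99% fidelity per conversion, a 25-step chain under these assumptions completes cleanly only about six times in ten.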

The fractal translation layer

Chapter five discussed fractals — self-similar structures repeating across scales.

The translation layer has this property too.

At the scale of a single tool call: a schema defines constraints (symbolic) → the model generates output (connectionist) → output is parsed into structured data (symbolic). One micro-translation cycle.

At the scale of a single agent: a prompt encodes the task (symbolic → connectionist) → the model reasons and outputs (connectionist) → the harness parses, validates, and decides (connectionist → symbolic) → the next prompt is assembled (symbolic → connectionist). One full translation cycle.

At the scale of multi-agent orchestration: an orchestrator decomposes a task into subtasks (symbolic operation) → distributes to individual agents (each containing its own translation cycle) → collects and aggregates results (symbolic operation) → potentially redistributes. Translation cycles nested inside translation cycles.

Three scales, same structure: the symbolic → connectionist → symbolic conversion repeats at every layer. This is not coincidence — it shares the same root as the self-similar architecture of agentic systems discussed in chapter five: at every scale, the system needs symbolic control logic and connectionist generative capability to interact, and the interface at every scale faces the same impedance mismatch.

Causal discipline finds its framework

Return to this chapter’s starting point — the carrier problem left open at the end of chapter six.

What does causal discipline need? It needs to express causal structure (causal graphs, directionality, the do operator), which requires symbolic representation. It needs to discover candidate causal relationships from data (“rain and slippery roads may be causally related”), which requires the connectionist system’s pattern-finding capability. And it needs to verify whether those candidates are consistent with the data, which again requires the symbolic side, this time as formal statistical testing.

The carrier of causal discipline is not the symbolic system. It is not the neural network. It is the translation between them.

The LLM extracts linguistic expressions of causal knowledge from text — “rain causes slippery roads.” That is the connectionist system doing discovery. The harness converts these expressions into formal causal hypotheses — a directed edge from “rain” to “slippery roads.” That is the translation layer doing conversion. Formal tools verify whether the hypothesis is consistent with observational data. That is the symbolic system doing verification. The three work together; none completes the job alone.
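The three roles can be sketched end to end in miniature. The sentence stands in for LLM output, the regex stands in for the harness's conversion step, and the toy rows are fabricated for illustration. The check used here (a difference in observed rates) only tests association, which is necessary but far from sufficient for a causal claim; a real pipeline would use proper conditional-independence tests or interventions.

```python
import re
from typing import Optional, Tuple

# Translation layer: a deliberately narrow pattern for "<cause> causes <effect>".
CLAIM = re.compile(
    r"^\s*(?P<cause>.+?)\s+causes\s+(?P<effect>.+?)\s*\.?\s*$", re.IGNORECASE
)


def to_edge(sentence: str) -> Optional[Tuple[str, str]]:
    """Convert a linguistic causal claim into a directed edge, or None."""
    match = CLAIM.match(sentence)
    if match is None:
        return None
    return (match["cause"].lower(), match["effect"].lower())


def risk_difference(rows: list, cause: str, effect: str) -> float:
    """P(effect | cause) - P(effect | not cause) in observational rows."""
    with_cause = [row[effect] for row in rows if row[cause]]
    without_cause = [row[effect] for row in rows if not row[cause]]
    if not with_cause or not without_cause:
        raise ValueError("need observations both with and without the cause")
    return sum(with_cause) / len(with_cause) - sum(without_cause) / len(without_cause)


# Discovery (stubbed): a claim of the kind an LLM might extract from text.
claim = "Rain causes slippery roads."
edge = to_edge(claim)

# Verification: fabricated toy observations, 10 rainy days and 10 dry days.
rows = (
    [{"rain": True, "slippery roads": True}] * 8
    + [{"rain": True, "slippery roads": False}] * 2
    + [{"rain": False, "slippery roads": True}] * 1
    + [{"rain": False, "slippery roads": False}] * 9
)

if edge is not None:
    cause, effect = edge
    print(edge, round(risk_difference(rows, cause, effect), 2))
```

Each stage is weak alone: the regex cannot discover claims, the claim cannot verify itself, and the statistic has no idea which edges are worth testing.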

The impasse from chapter six — “one excels at structure but cannot discover; the other discovers but cannot structure” — now has a precise framework. The answer is not to make either side good at both. It is to let each do what it does, connected by translation.

The shape of the problem

This chapter has not provided answers.

What it has revealed is a structure.

Two representations — symbolic and connectionist — are irreducible to each other. Each covers a capability space the other cannot reach. The LLM is a strange hybrid of the two — a connectionist architecture manipulating a symbolic medium, developing symbol-like structures internally that are invisible at the engineering interface.

What harness engineers do every day is translate between these two representations. Translation has costs — five structural impedance mismatches. These costs are distributed along a tension axis: structural guarantees at one end, generative freedom at the other.

This is not a problem waiting to be solved. It is the shape of the problem itself.

Seeing this shape, when you face any specific engineering decision, you at least know what you are trading off. You know that adding strict mode gains you something (structural guarantees) and costs you something (trajectory bias). You know that giving the model more freedom gains you something (semantic quality) and risks something (format failures). You know that every line of harness code you write (every schema definition, every prompt rule, every piece of parsing logic) is selecting a position on the tension axis.

There is no right or wrong position. Only whether you are clear about what you chose, and why.
