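Spatiotemporal World Model

Spatiotemporal world model: linear representations of geographic coordinates and historical time that form spontaneously inside LLMs; a necessary ingredient of a world model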


CONCEPT · SPATIOTEMPORAL WORLD MODEL · GURNEE & TEGMARK 2024 · LINEAR-PROBE EVIDENCE

Spatiotemporal World Model: LLMs spontaneously develop linearly decodable representations of space and time

Gurnee & Tegmark (ICLR 2024) show, on the Llama-2 family, that next-token training produces linear representations of geographic space and historical time without any explicit supervision on geography or history. Linear probes on Llama-2-70B reach R² = 0.911 on world place coordinates and R² = 0.835 on the dates of historical figures. This is evidence against the “stochastic parrots” view: rather than only replaying token sequences, the model organizes real-world spatial and temporal structure internally.

Key experimental evidence

R² = 0.911 · Spatial task · Linear probe on world place coordinates (Llama-2-70B)

R² = 0.835 · Temporal task · Linear probe on the dates of historical figures (Llama-2-70B)

Causal intervention · Pinning or ablating individual space/time neurons shifts the model’s predictions for locations and dates accordingly, so the representations are used by the model rather than being merely correlated with its inputs.

Implications for AI system design

Against stochastic parrots

Statistical pattern-matching alone does not readily explain linearly decodable, high-R² world representations; the internal structure is real

Continuation of Othello-GPT

Othello board-state representations (Li et al. 2022) → geography and time (Gurnee & Tegmark 2024): the same kind of finding in an ever larger task domain

Probe limits

Linear probes only detect linear decodability; more complex nonlinear representations may be missed

Llama-2 generality

The same pattern appears across Llama-2 model sizes, but generality beyond the tested model families still needs verification

→ Othello World Model · Probing Classifiers · Mechanistic Interpretability

Gurnee & Tegmark (2024)

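Spatiotemporal World Model

Definition

A spatiotemporal world model is a coherent, structured representation of real-world spatial and temporal coordinates that forms spontaneously inside a neural network. “Spontaneously” means the representation is not obtained through supervised training on explicit geographic or historical knowledge; it emerges as a by-product of the next-token prediction language-modeling objective.

Spatiotemporal representations are a necessary ingredient of a broader world model, but they are not sufficient: a complete world model also needs dynamic causal relations, physical constraints, and state tracking across time.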

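Core evidence: Gurnee & Tegmark (2023)

Language Models Represent Space and Time (ICLR 2024) provides the most systematic empirical evidence to date:

Experimental scale

Models: Llama-2 family (7B/13B/70B) plus Pythia family (160M to 6.9B)
Datasets: 6 spatiotemporal datasets across scales (world / United States / New York, plus historical figures / artworks / news)
Method: at each layer, extract the residual-stream activation at the last entity token and train a linear probe to predict the true spatial or temporal coordinate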
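The probing method can be sketched as a regression from activations to coordinates. The following is a minimal sketch on synthetic data, not the paper's code: it assumes the activations have already been extracted into a matrix, uses a ridge-regularized least-squares probe as one common choice, and the names `fit_ridge_probe`, `r2_score`, and the value of `lam` are hypothetical.

```python
import numpy as np

def fit_ridge_probe(X, Y, lam=1.0):
    """Fit W, b minimizing ||XW + b - Y||^2 + lam * ||W||^2 in closed form."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean          # centering handles the intercept
    d = X.shape[1]
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

def r2_score(Y_true, Y_pred):
    """Coefficient of determination, averaged over target dimensions."""
    ss_res = ((Y_true - Y_pred) ** 2).sum(axis=0)
    ss_tot = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))

# Synthetic stand-in (assumption): in a real run, X would hold one layer's
# residual-stream activations at the last entity token, and Y the true
# (latitude, longitude) of each entity.
rng = np.random.default_rng(0)
n, d_model = 2000, 64
Y = rng.uniform(-1.0, 1.0, size=(n, 2))            # fake normalized coordinates
true_dirs = rng.normal(size=(2, d_model))           # fake linear encoding
X = Y @ true_dirs + 0.5 * rng.normal(size=(n, d_model))

train, test = slice(0, 1600), slice(1600, None)
W, b = fit_ridge_probe(X[train], Y[train], lam=1.0)
test_r2 = r2_score(Y[test], X[test] @ W + b)
print(f"held-out R^2: {test_r2:.3f}")
```

Repeating the fit once per layer gives an R² curve over depth, which is the kind of measurement the findings in this section summarize.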

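Main findings

Finding	Key number
Spatial representation (world places), R²	0.911 (Llama-2-70B, linear probe)
Temporal representation (historical figures), R²	0.835 (Llama-2-70B, linear probe)
Gain from nonlinear probes	<0.02 (negligible)
Layer depth where the representation forms	Saturates at about 60% of layer depth
Generalization across entity types	Cities/natural landmarks and songs/movies/books share a single direction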

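Space/time neurons

The study further locates individual “space neurons” and “time neurons”: single neurons with very high cosine similarity to the linear probe directions. Causal interventions verify their function:

Pinning time neuron L19.3610 to a specific value → the model’s probability distribution over year-prediction tokens shifts in a controllable direction
Ablating space neuron L20.7573 → loss on geography-related tokens increases markedly (for example, Köppen climate classification terms)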
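The neuron search and the interventions can be illustrated with a small sketch. It assumes the comparison is between each neuron's weight vector and a probe direction, and it uses random weights with one planted neuron instead of a real model; `top_aligned_neurons`, `pin_neuron`, and the index `137` are hypothetical names and values chosen for illustration.

```python
import numpy as np

def top_aligned_neurons(neuron_weights, probe_direction, k=5):
    """Rank neurons by |cosine similarity| between weight rows and a probe direction."""
    w_unit = neuron_weights / np.linalg.norm(neuron_weights, axis=1, keepdims=True)
    p_unit = probe_direction / np.linalg.norm(probe_direction)
    cos = w_unit @ p_unit
    order = np.argsort(-np.abs(cos))[:k]
    return order, cos[order]

def pin_neuron(activations, index, value):
    """Copy MLP activations with one neuron fixed to `value` (0.0 gives zero-ablation)."""
    out = activations.copy()
    out[..., index] = value
    return out

rng = np.random.default_rng(0)
n_neurons, d_model = 512, 64
probe_direction = rng.normal(size=d_model)
neuron_weights = rng.normal(size=(n_neurons, d_model))   # one row per neuron

# Plant a neuron that writes almost exactly along the probe direction.
planted = 137
neuron_weights[planted] = 3.0 * probe_direction + 0.1 * rng.normal(size=d_model)

idx, sims = top_aligned_neurons(neuron_weights, probe_direction, k=3)
print("most aligned neurons:", idx, "cosine:", np.round(sims, 3))

# Pinning the planted neuron moves the probe readout of the layer's output.
acts = np.abs(rng.normal(size=(4, n_neurons)))
base = (acts @ neuron_weights) @ probe_direction
pinned = (pin_neuron(acts, planted, 10.0) @ neuron_weights) @ probe_direction
print("readout shift per example:", np.round(pinned - base, 1))
```

In a real model the effect of pinning or ablating would be measured on the output token distribution or the loss, as in the two interventions listed above; the probe readout here is only a stand-in for that downstream effect.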

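Structural properties

Multi-scale

The representations can be found at different spatial scales (world → United States → New York City), with accuracy dropping as the scale becomes finer. This is consistent with the “discretized hierarchical mesh” structure conjectured by the paper’s authors.

Unified

The same probe direction works across different entity types (cities and natural landmarks, historical figures and artworks), which indicates that the model uses a unified coordinate system in a geometric sense rather than encoding each entity type separately.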
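One way to test for a unified direction is to fit a probe on one entity type and evaluate it on a type it never saw. The sketch below uses synthetic activations to contrast two hypothetical worlds: one where both types encode the year along a shared direction, and one where the held-out type uses its own direction. All names and data are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def fit_probe(X, y):
    """Least-squares linear probe with an intercept term."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def probe_r2(coef, X, y):
    """R^2 of a fitted probe on (X, y)."""
    y_hat = np.hstack([X, np.ones((len(X), 1))]) @ coef
    return float(1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum())

def make_entities(rng, n, direction, noise=0.3):
    """Synthetic activations that encode a normalized year along `direction`."""
    year = rng.uniform(-1.0, 1.0, size=n)
    X = np.outer(year, direction) + noise * rng.normal(size=(n, len(direction)))
    return X, year

rng = np.random.default_rng(1)
d_model = 32
shared_dir = rng.normal(size=d_model)   # direction used by the training type
other_dir = rng.normal(size=d_model)    # unrelated direction (the counterfactual)

# Fit on "historical figures" only.
X_fig, y_fig = make_entities(rng, 1000, shared_dir)
coef = fit_probe(X_fig, y_fig)

# Evaluate on held-out "artworks" under each hypothesis.
X_art_shared, y_art_shared = make_entities(rng, 400, shared_dir)
X_art_sep, y_art_sep = make_entities(rng, 400, other_dir)

r2_shared = probe_r2(coef, X_art_shared, y_art_shared)
r2_separate = probe_r2(coef, X_art_sep, y_art_sep)
print(f"held-out type, shared direction:   R^2 = {r2_shared:.3f}")
print(f"held-out type, separate direction: R^2 = {r2_separate:.3f}")
```

High R² on the unseen type is what a shared coordinate system predicts; a per-type encoding would make the transferred probe fail.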

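Robustness to prompts

The quality of the spatiotemporal representations is largely unaffected by prompt variations (explicitly asking for coordinates, adding contextual hints, and so on), which suggests that the representations are fixed during pretraining rather than activated by the prompt.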

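Theoretical standing

Contrast with the “stochastic parrots” view

The “stochastic parrots” argument of Bender et al. holds that LLMs are only statistical machines without understanding. The discovery of linear spatiotemporal representations offers a counterexample: the model does organize the geometric structure of the real world internally.

Continuation of Othello-GPT

Li et al. (2022) and Nanda et al. (2023) found similar linear state representations on the Othello board, but in a simple synthetic environment. Gurnee & Tegmark extend the finding to:

A real LLM (rather than a task-specific model)
Real-world coordinates (rather than game states)
Continuous quantities (rather than discrete board squares)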

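Limitations

The representations are static, not a dynamic causal model
Probe generalization loses some accuracy under cross-region (block holdout) evaluation
The alternative explanation of “a weighted sum of country/era membership features” cannot be ruled out (although the cross-type generalization experiments provide evidence against it)
The data skews toward the English-speaking world (it is based on English Wikipedia)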
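The block-holdout idea and the membership-feature alternative can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not the paper's evaluation code: it builds hypothetical "membership-only" features (a noisy one-hot code per group, standing in for country membership) and shows that a probe on such features does well under a random split but fails when whole groups are held out. `block_holdout_split` and `probe_r2` are hypothetical helper names.

```python
import numpy as np

def block_holdout_split(groups, test_fraction=0.2, seed=0):
    """Split indices so every group lands entirely in train or entirely in test."""
    rng = np.random.default_rng(seed)
    unique = np.unique(groups)
    n_test = max(1, int(round(test_fraction * len(unique))))
    test_groups = rng.choice(unique, size=n_test, replace=False)
    test_mask = np.isin(groups, test_groups)
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def add_bias(X):
    """Append a constant column so the probe has an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])

def probe_r2(X_tr, y_tr, X_te, y_te):
    """Fit a least-squares probe on the train split; return held-out R^2."""
    coef, *_ = np.linalg.lstsq(add_bias(X_tr), y_tr, rcond=None)
    resid = y_te - add_bias(X_te) @ coef
    return float(1.0 - (resid ** 2).sum() / ((y_te - y_te.mean()) ** 2).sum())

rng = np.random.default_rng(0)
n_groups, per_group = 20, 50
groups = np.repeat(np.arange(n_groups), per_group)           # e.g. country ids
centroid = rng.uniform(-1.0, 1.0, size=n_groups)
y = centroid[groups] + 0.05 * rng.normal(size=len(groups))   # e.g. longitude

# Membership-only features: they identify the group but carry no geometry.
X = np.eye(n_groups)[groups] + 0.05 * rng.normal(size=(len(groups), n_groups))

perm = rng.permutation(len(groups))
rand_tr, rand_te = perm[:800], perm[800:]
blk_tr, blk_te = block_holdout_split(groups, test_fraction=0.2, seed=0)

r2_random = probe_r2(X[rand_tr], y[rand_tr], X[rand_te], y[rand_te])
r2_block = probe_r2(X[blk_tr], y[blk_tr], X[blk_te], y[blk_te])
print(f"random split R^2: {r2_random:.3f}")
print(f"block holdout R^2: {r2_block:.3f}")
```

Under this reading, a probe that keeps most of its accuracy on held-out blocks is harder to explain by membership features alone, which is why block holdout is the more demanding test.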

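Future directions (discussed in the paper)

Tracking across training checkpoints: when do discrete “country membership” features evolve into a coherent geometric structure?
Sparse autoencoders (SAE): extract representations in the model’s own coordinate system rather than forcing a mapping onto human coordinates
Dialogue with biological neuroscience: do place cells and grid cells offer inspiration?
Scaling with model size: do larger models develop finer-grained spatial meshes (for example, block-level structure within a city)?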

References

sources/arxiv_papers/2310.02207-language-models-represent-space-and-time.md: Gurnee & Tegmark (2023), core source
