Seven Mental · 心智七篇

Wes Gurnee

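Wes Gurnee: MIT interpretability researcher and a core contributor to sparse probing methods and to research on spatiotemporal representations in LLMs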
ENTITY · WES GURNEE · MIT · LLM INTERPRETABILITY · SPATIOTEMPORAL WORLD MODEL · SPARSE PROBING

MIT interpretability researcher whose work relies on systematic validation that combines large-scale empirical datasets, linear probing, and causal intervention

With Max Tegmark, Gurnee used 6 spatiotemporal datasets (over 180k samples in total) to show systematically that Llama-2 carries linear representations of real-world geographic and historical-time coordinates, and to localize individual “space neurons” and “time neurons”. Reported linear-probe fits: spatial R² = 0.911, temporal R² = 0.835.

Key Spatiotemporal Results (Llama-2 family)
R² = 0.911 · Spatial Representation · Linear probe on geographic coordinates, validated on the spatial datasets among the 6 spatiotemporal datasets
R² = 0.835 · Temporal Representation · Linear probe on historical time coordinates, validated on chronological sequences
Neuron ablation supports causality: intervening on “space neurons” changes spatial predictions, which indicates a causal role rather than mere correlation
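
To make the R² figures above concrete, here is a minimal sketch of how a linear probe is fit and scored. It does not use Llama-2: the "activations" are synthetic NumPy arrays in which two coordinates are linearly embedded with noise, and the names `fit_ridge_probe`, `r2_score`, and the ridge penalty `alpha` are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def fit_ridge_probe(X, Y, alpha=1.0):
    """Closed-form ridge regression from activations X to targets Y."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    d = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) W = Xc^T Yc for the probe weights
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

def r2_score(Y_true, Y_pred):
    """Coefficient of determination, pooled over all output dimensions."""
    ss_res = ((Y_true - Y_pred) ** 2).sum()
    ss_tot = ((Y_true - Y_true.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n, d = 2000, 64
coords = rng.uniform(-1, 1, size=(n, 2))      # stand-in for (latitude, longitude)
direction = rng.normal(size=(2, d))           # hypothetical linear embedding
acts = coords @ direction + 0.5 * rng.normal(size=(n, d))  # synthetic activations

train, test = slice(0, 1500), slice(1500, n)
W, b = fit_ridge_probe(acts[train], coords[train])
probe_r2 = r2_score(coords[test], acts[test] @ W + b)  # held-out fit quality
```

A high held-out R² here only says that the coordinates are linearly decodable from the activations; it says nothing about whether the model uses them, which is why the ablation result matters.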
Methodological Signature
Large-Scale Datasets
180k+ samples across 6 spatiotemporal datasets, moving beyond small-sample case studies
Full-Layer Probe Sweep
Linear probes applied to every transformer layer — identifies the layer-wise distribution of representations
Sparse Probing (2023)
Finding Neurons in a Haystack — uses a tiny set of neurons to localize where specific information is encoded
→ Spatiotemporal World Model · Linear Representation · Max Tegmark
ICLR 2024 · arXiv:2310.02207
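
The full-layer probe sweep listed under Methodological Signature can be sketched as a loop that fits one probe per layer and records held-out R². The example below is a toy under stated assumptions: the per-layer activations are synthetic, the signal is made to grow with depth by construction, and `probe_r2` is a hypothetical helper that uses plain least squares rather than whatever regularization the paper used.

```python
import numpy as np

def probe_r2(acts, targets, n_train):
    """Fit a least-squares linear probe on the first n_train rows; return held-out R^2."""
    X = np.hstack([acts, np.ones((len(acts), 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X[:n_train], targets[:n_train], rcond=None)
    pred = X[n_train:] @ W
    y = targets[n_train:]
    ss_res = ((y - pred) ** 2).sum()
    ss_tot = ((y - y.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n, d, n_layers = 1200, 32, 6
years = rng.uniform(-1, 1, size=(n, 1))       # stand-in for a normalized time target
direction = rng.normal(size=(1, d))

# Toy assumption: the linear signal strengthens with depth (0 at layer 0, 1 at the last layer)
layer_acts = [
    (layer / (n_layers - 1)) * years @ direction + rng.normal(size=(n, d))
    for layer in range(n_layers)
]

sweep = [probe_r2(a, years, n_train=900) for a in layer_acts]  # one R^2 per layer
best_layer = int(np.argmax(sweep))
```

Plotting `sweep` against the layer index gives the layer-wise profile of where the representation is most linearly decodable.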

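Affiliation: Massachusetts Institute of Technology (MIT) · Research areas: LLM interpretability, internal representations of neural networks, sparse probing

Main Contributions

Wes Gurnee is one of the core researchers in LLM interpretability.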

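Language Models Represent Space and Time (2023)

Written in collaboration with Max Tegmark and published at ICLR 2024. The paper is the first to demonstrate systematically that Llama-2 internally forms linear representations of real-world geographic coordinates and historical time coordinates, and it localizes individual “space neurons” and “time neurons”.

See: Spatiotemporal World Model · Linear Representation Hypothesis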

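Finding Neurons in a Haystack (2023)

Proposes the sparse probing method, which uses a very small number of neurons to localize specific information encoded inside the model (such as gender, occupation, or year).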
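
As a rough illustration of the sparse probing idea, the sketch below selects k neurons and fits a probe on those alone. It is a simplified stand-in, not the paper's procedure: neurons are ranked by the absolute difference of class means, the classifier is a least-squares fit to ±1 targets, and the data, the planted `feature_neurons`, and all function names are hypothetical.

```python
import numpy as np

def top_k_neurons(acts, labels, k):
    """Rank neurons by absolute difference of class means; return the top-k indices."""
    diff = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
    return np.argsort(-np.abs(diff))[:k]

def fit_linear_classifier(X, labels):
    """Least-squares fit to +/-1 targets (a simple stand-in for logistic regression)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, 2.0 * labels - 1.0, rcond=None)
    return w

def accuracy(w, X, labels):
    """Fraction of rows whose predicted sign matches the binary label."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return float(((Xb @ w > 0).astype(int) == labels).mean())

rng = np.random.default_rng(0)
n, d = 2000, 256
labels = rng.integers(0, 2, size=n)           # binary feature, e.g. a category flag
acts = rng.normal(size=(n, d))                # synthetic neuron activations
feature_neurons = [17, 101, 200]              # hypothetical neurons carrying the feature
acts[:, feature_neurons] += 2.0 * labels[:, None]

train, test = slice(0, 1500), slice(1500, n)
selected = top_k_neurons(acts[train], labels[train], k=3)
w = fit_linear_classifier(acts[train][:, selected], labels[train])
sparse_acc = accuracy(w, acts[test][:, selected], labels[test])
```

Sweeping k from 1 upward and watching where accuracy saturates is one way to judge how localized a feature is in this toy setting.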

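Research Style

Gurnee favors building large-scale empirical datasets and validating findings systematically by combining linear probes with causal intervention, rather than leading with theory. Representative approach: 6 spatiotemporal datasets (over 180k samples in total) + a full-layer probe sweep + neuron ablation validation.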
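
The neuron ablation step in that pipeline can be illustrated with a toy example: clamp one neuron, rerun the downstream computation, and measure how much the prediction moves. Everything here is an assumption made for illustration, including the linear `readout`, the planted neuron at index 3, and the choice of mean ablation; a real experiment would intervene inside the transformer's forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 16
hidden = rng.normal(size=(n, d))              # toy hidden activations
readout = np.zeros(d)
readout[3] = 1.5                              # hypothetical "space neuron" at index 3
readout[7] = 0.1                              # a weakly used neuron

def predict(h):
    """Toy downstream computation: a fixed linear readout of the hidden layer."""
    return h @ readout

def ablate(h, neuron, value=None):
    """Return a copy of h with one neuron clamped to `value` (default: its mean)."""
    out = h.copy()
    out[:, neuron] = h[:, neuron].mean() if value is None else value
    return out

def ablation_effect(h, neuron):
    """Mean absolute change in the prediction after ablating `neuron`."""
    return float(np.abs(predict(h) - predict(ablate(h, neuron))).mean())

# Compare the planted neuron, a weakly used neuron, and an unused neuron
effects = {i: ablation_effect(hidden, i) for i in (3, 7, 0)}
```

Mean ablation is used instead of zeroing so that the clamped value stays within the neuron's usual range; comparing against unused neurons gives a baseline for how large an effect counts as meaningful.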

References

  • sources/arxiv_papers/2310.02207-language-models-represent-space-and-time.md