Factory.ai
AI coding-agent company — a primary source of empirical work on context compression
On 36,000+ production session messages, Factory compared three compression strategies (Factory, OpenAI, Anthropic) and built a probe-based functional-quality evaluation framework that directly measures an agent’s ability to continue its task after compression. Headline finding: artifact tracking is a universal weakness across compression methods — regardless of strategy, post-compression ability to track code artifacts and file state drops sharply.
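The probe-based idea can be illustrated with a minimal Python sketch. It assumes a hypothetical `Probe` schema (a question plus the facts a correct answer should mention) and an `agent(context, question)` callable. It also uses a substring-matching grader as a stand-in for whatever judge Factory actually uses. The probe kinds are illustrative labels, not Factory's taxonomy. Scores are averaged per probe kind so that a drop in one capability, such as artifact recall, stays visible instead of being hidden in an overall mean.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Probe:
    """One question asked after compression (hypothetical schema)."""
    kind: str            # illustrative label, e.g. "recall" or "artifact"
    question: str
    expected: List[str]  # facts a correct answer should mention


def keyword_grade(answer: str, expected: List[str]) -> float:
    """Stand-in grader: fraction of expected facts found in the answer.

    Substring matching keeps the sketch self-contained; a production
    framework would likely use a stronger judge.
    """
    if not expected:
        return 1.0
    text = answer.lower()
    return sum(fact.lower() in text for fact in expected) / len(expected)


def evaluate(agent: Callable[[str, str], str],
             compressed_context: str,
             probes: List[Probe]) -> Dict[str, float]:
    """Ask every probe against the compressed context; average scores per kind."""
    scores: Dict[str, List[float]] = {}
    for probe in probes:
        answer = agent(compressed_context, probe.question)
        scores.setdefault(probe.kind, []).append(keyword_grade(answer, probe.expected))
    return {kind: sum(vals) / len(vals) for kind, vals in scores.items()}


if __name__ == "__main__":
    # Toy agent: answers by echoing whatever survived compression.
    summary = "Goal: fix the login bug. Next step: add a regression test."
    probes = [
        Probe("recall", "What is the goal?", ["login bug"]),
        Probe("artifact", "Which files were edited?", ["auth.py", "session.py"]),
    ]
    print(evaluate(lambda ctx, q: ctx, summary, probes))
```

With this toy agent, a summary that kept the goal but dropped the file names scores 1.0 on the recall probe and 0.0 on the artifact probe. That is the kind of gap the headline finding describes.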
Compression research contributions
- Anchored iterative summarization: structured sections and incremental merging prevent information loss, unlike naive truncation.
- Probe-based evaluation: a functional-quality framework that measures whether a compressed agent can still complete its task, not just textual similarity.
- 36K production messages: real production data rather than a synthetic test set, which gives the results practical engineering weight.
- Three-way comparison: Factory vs. OpenAI vs. Anthropic, surfacing the real-world tradeoffs of each approach in production.
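One way to picture anchored iterative summarization is sketched below. The sketch assumes hypothetical section names and an `extract` callable; in a real system, that callable would be an LLM call that turns new messages into per-section updates. The anchor records how far the conversation has already been folded in. Each compression therefore summarizes only the new span and merges it into the existing sections instead of regenerating the whole summary. This illustrates the described idea and is not Factory's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical section names; the source only says "structured sections".
SECTIONS = ("intent", "files", "decisions", "next_steps")

Extractor = Callable[[List[str]], Dict[str, List[str]]]


@dataclass
class AnchoredSummary:
    sections: Dict[str, List[str]] = field(
        default_factory=lambda: {name: [] for name in SECTIONS})
    anchor: int = 0  # index of the first message not yet folded into the summary

    def merge(self, updates: Dict[str, List[str]]) -> None:
        """Add new items to existing sections; earlier items are never dropped."""
        for name, items in updates.items():
            bucket = self.sections.setdefault(name, [])
            for item in items:
                if item not in bucket:
                    bucket.append(item)

    def compress(self, messages: List[str], extract: Extractor) -> None:
        """Summarize only the span after the anchor, merge it, then advance the anchor."""
        self.merge(extract(messages[self.anchor:]))
        self.anchor = len(messages)

    def render(self) -> str:
        """Structured text that replaces the compressed history in the context."""
        return "\n".join(
            f"[{name}] " + "; ".join(items) for name, items in self.sections.items())


def toy_extract(span: List[str]) -> Dict[str, List[str]]:
    """Hypothetical stand-in for an LLM call: parses 'section: text' messages."""
    updates: Dict[str, List[str]] = {}
    for msg in span:
        name, _, text = msg.partition(": ")
        updates.setdefault(name, []).append(text)
    return updates


if __name__ == "__main__":
    history = ["intent: fix login bug", "files: auth.py"]
    summary = AnchoredSummary()
    summary.compress(history, toy_extract)
    history += ["files: auth.py", "files: session.py", "next_steps: add test"]
    summary.compress(history, toy_extract)
    print(summary.render())
```

Merging never discards earlier items. As a result, the `intent` recorded in the first pass survives a second pass whose span never mentions it, whereas a summary regenerated from only the latest span would lose it.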
Universal weakness uncovered
- Artifact-tracking failure: common to all methods; after compression, models lose track of code artifacts and file state.
- Engineering implication: externalized artifact tracking (feature tracking plus progress files) is a necessary compensating mechanism at the harness layer.
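The harness-layer compensation can be sketched as a progress file that lives outside the model context. Everything here is an assumption made for illustration: the `ProgressFile` class, its JSON layout, and the `reinject` text format are hypothetical, because the source only names feature tracking and progress files. In this sketch, the harness updates the file whenever a tool touches an artifact. It then prepends `reinject()` to the context after every compression, so file state no longer depends on what the summary happened to retain.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict


@dataclass
class ProgressFile:
    """Hypothetical harness-side artifact record, stored outside the model context."""
    path: Path
    files: Dict[str, str] = field(default_factory=dict)     # file path -> last action
    features: Dict[str, str] = field(default_factory=dict)  # feature -> status

    def record_file(self, file_path: str, action: str) -> None:
        """Call from the harness whenever a tool touches a file."""
        self.files[file_path] = action
        self.save()

    def set_feature(self, name: str, status: str) -> None:
        self.features[name] = status
        self.save()

    def save(self) -> None:
        payload = {"files": self.files, "features": self.features}
        self.path.write_text(json.dumps(payload, indent=2))

    @classmethod
    def load(cls, path: Path) -> "ProgressFile":
        """Restore state from disk; a missing file yields an empty tracker."""
        data = json.loads(path.read_text()) if path.exists() else {}
        return cls(path, data.get("files", {}), data.get("features", {}))

    def reinject(self) -> str:
        """Text block to prepend to the context after each compression."""
        lines = ["[artifact state]"]
        lines += [f"file {p}: {a}" for p, a in sorted(self.files.items())]
        lines += [f"feature {n}: {s}" for n, s in sorted(self.features.items())]
        return "\n".join(lines)
```

Writing to disk on every update trades a little I/O for durability: the artifact state survives even if the session itself is restarted.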
→ Context Compression · Harness Engineering · Anthropic · Factory (2025)
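AI coding-agent company focused on automating software engineering.
Relevance to this wiki
Factory has contributed key empirical research on evaluating context compression:
- Built a probe-based functional-quality evaluation framework that directly measures how well an agent can continue its task after compression
- Proposed Anchored Iterative Summarization, which uses structured sections and incremental merging to prevent information loss
- Compared three compression strategies (Factory, OpenAI, Anthropic) on 36,000+ production session messages
- Showed that artifact tracking is a universal weakness of all compression methods
Related entities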
References
sources/factory-evaluating-context-compression.md