机器学习与 AI 系列参考资料总索引

说明#

这份索引整理了两部分内容：

src/data/blog/zh/machine-learning 下各系列文章中出现的显式参考资料、参考资源、外部资源，以及少量承担关键论证作用的正文外链
参考系列 AI 是怎么回事 16 篇文章中的主要参考资料与关键外部链接

这不是标准学术参考文献格式，而是一份面向技术写作者和知识库维护者的 来源地图。每一项尽量包含：

来源名称
URL
来源类型
来源所在文章
1-3 句概述：它讲什么、为什么重要、在文章里承担什么作用

来源类型说明#

论文：原始研究论文、技术报告、系统综述
官方文档：API 文档、模型文档、SDK 文档、规范文档
官方博客/公告：厂商公告、研究博客、发布说明
代码仓库：GitHub 仓库、示例项目、实现代码
数据集/基准：benchmark、leaderboard、数据集官网
教程/课程：解释型长文、课程页面、可视化教程
新闻报道：媒体报道、案例追踪、事件复盘
机构报告/政策：标准、法规、行业报告、安全指南
百科/维基：Wikipedia 等背景资料页

第一部分：本地 `machine-learning` 目录参考资料索引#

0.1 总览与 README#

`00-系列导读.md`#

OpenAI API 文档
- URL: https://platform.openai.com/docs
- 类型：官方文档
- 概述：作为整套学习路线的 API 入门入口，支撑“从模型调用到应用开发”的实践方向。
Anthropic Claude 文档
- URL: https://docs.anthropic.com/
- 类型：官方文档
- 概述：为 Claude 模型、工具调用和 Agent 能力提供官方来源，是多模型对比时的重要基线。
Google Gemini 文档
- URL: https://ai.google.dev/docs
- 类型：官方文档
- 概述：补足 Gemini 模型在多模态、上下文窗口和 API 使用上的官方信息。
LangChain 文档
- URL: https://python.langchain.com/
- 类型：官方文档
- 时间：持续更新
- 层次：L4-知识与工具层
- 机制：工具调用机制、检索机制
- Tags: topic:agent topic:rag layer:knowledge mechanism:tool-use mechanism:retrieval type:doc status:engineering
- 概述：承接路线图中 RAG、Agent、应用开发的工程部分，是框架层的重要实践入口。
Hugging Face 文档
- URL: https://huggingface.co/docs
- 类型：官方文档
- 概述：作为模型、数据集和推理部署生态的总入口，适合做更深入的工具链延伸阅读。
Attention Is All You Need
- URL: https://arxiv.org/abs/1706.03762
- 类型：论文
- 时间：2017
- 层次：L1-模型机制层
- 机制：注意力机制、架构机制
- Tags: topic:architecture topic:transformer layer:model mechanism:attention mechanism:architecture type:paper time:2017 status:foundational
- 概述：Transformer 开山之作，也是整个 LLM 学习路线最重要的原点之一。
GPT-3
- URL: https://arxiv.org/abs/2005.14165
- 类型：论文
- 时间：2020
- 层次：L0-范式层
- 机制：架构机制
- Tags: topic:transformer topic:inference layer:paradigm mechanism:architecture type:paper time:2020 status:foundational
- 概述：用来说明 few-shot learning 与大规模预训练如何推动通用大模型进入主流。
InstructGPT
- URL: https://arxiv.org/abs/2203.02155
- 类型：论文
- 时间：2022
- 层次：L2-训练与对齐层
- 机制：对齐机制
- Tags: topic:alignment topic:rlhf layer:training mechanism:alignment type:paper time:2022 status:foundational
- 概述：解释“从会续写到会听指令”的关键转折，是 RLHF 进入产品化的代表作。
Chain of Thought
- URL: https://arxiv.org/abs/2201.11903
- 类型：论文
- 时间：2022
- 层次：L3-推理与解码层
- 机制：推理机制
- Tags: topic:reasoning topic:cot layer:inference mechanism:reasoning type:paper time:2022 status:foundational
- 概述：是推理增强路线的代表论文，为后续 Prompt、推理模型和 Agent 奠定背景。
DeepSeek R1
- URL: https://arxiv.org/abs/2501.12948
- 类型：论文
- 时间：2025 Q1
- 层次：L2-训练与对齐层
- 机制：对齐机制、推理机制
- Tags: topic:reasoning topic:alignment layer:training mechanism:alignment mechanism:reasoning type:paper time:2025Q1 status:frontier
- 概述：作为开源推理模型的代表，为这套目录补上近年的重要技术节点。
The Illustrated Transformer
- URL: https://jalammar.github.io/illustrated-transformer/
- 类型：教程/课程
- 概述：用可视化方式帮助读者快速建立 Transformer 的结构理解，是极高频的辅助资料。
Attention? Attention!
- URL: https://lilianweng.github.io/posts/2018-06-24-attention/
- 类型：教程/课程
- 概述：用更细的解释补充注意力机制的概念背景，适合作为阅读论文前的预热材料。
Stanford CS224N
- URL: https://web.stanford.edu/class/cs224n/
- 类型：课程
- 概述：作为 NLP 系统学习的经典课程，用于给整套知识库提供学院派背景资源。
Fast.ai
- URL: https://www.fast.ai/
- 类型：课程
- 概述：补充偏实践导向的机器学习学习路径，适合与论文和官方文档形成互补。
LangChain
- URL: https://github.com/langchain-ai/langchain
- 类型：代码仓库
- 时间：2022-2026
- 层次：L4-知识与工具层
- 机制：工具调用机制、检索机制
- Tags: topic:agent topic:rag layer:knowledge mechanism:tool-use mechanism:retrieval type:repo status:engineering
- 概述：LLM 应用开发框架的代表仓库，承接应用开发和 Agent 系列的工程实践。
AutoGen
- URL: https://github.com/microsoft/autogen
- 类型：代码仓库
- 概述：微软的多 Agent 框架仓库，用于展示对话式 Agent 编排路线。
LlamaIndex
- URL: https://github.com/run-llama/llama_index
- 类型：代码仓库
- 时间：2023-2026
- 层次：L4-知识与工具层
- 机制：检索机制、索引机制
- Tags: topic:rag topic:retrieval layer:knowledge mechanism:retrieval mechanism:indexing type:repo status:engineering
- 概述：RAG 和文档索引方向的代表框架，是知识增强应用的高频基础设施。
Hugging Face Transformers
- URL: https://github.com/huggingface/transformers
- 类型：代码仓库
- 概述：现代模型实验、推理与微调的核心库之一，也是论文与工程之间的重要桥梁。

`agent-guide/00-系列导读.md`#

OpenAI Agents SDK
- URL: https://openai.github.io/openai-agents-python/
- 类型：官方文档
- 时间：2025-2026
- 层次：L4-知识与工具层
- 机制：工具调用机制、环境交互机制
- Tags: topic:agent topic:tool-use layer:knowledge mechanism:tool-use mechanism:environment-interaction type:doc status:engineering
- 概述：作为 Agent SDK 与 handoff 能力的直接参考，帮助读者理解现代 Agent 工程接口。
LangChain Agents
- URL: https://python.langchain.com/docs/concepts/agents/
- 类型：官方文档
- 概述：提供 Agent 抽象、类型与工作方式的框架级解释，是对 OpenAI 路线的一个重要对照。
AutoGen 文档
- URL: https://microsoft.github.io/autogen/
- 类型：官方文档
- 概述：对应多 Agent 会话式协作路线，适合用于比较框架哲学差异。
CrewAI 文档
- URL: https://docs.crewai.com/
- 类型：官方文档
- 概述：强调 role-based 的团队式 Agent 设计，在多角色协作场景下很有代表性。
MCP SDK
- URL: https://modelcontextprotocol.io/
- 类型：官方文档
- 时间：2024-2026
- 层次：L4-知识与工具层
- 机制：协议机制
- Tags: topic:mcp topic:agent layer:knowledge mechanism:protocol type:doc time:2025 status:foundational
- 概述：作为 MCP 协议入口，支撑“标准化工具接入”这一核心工程主题。
ReAct
- URL: https://arxiv.org/abs/2210.03629
- 类型：论文
- 时间：2022
- 层次：L3-推理与解码层
- 机制：推理机制、工具调用机制
- Tags: topic:reasoning topic:agent layer:inference mechanism:reasoning mechanism:tool-use type:paper time:2022 status:foundational
- 概述：整套 Agent 指南最核心的方法论来源，定义了“推理 + 行动”的基本闭环。
Generative Agents
- URL: https://arxiv.org/abs/2304.03442
- 类型：论文
- 时间：2023
- 层次：L4-知识与工具层
- 机制：记忆机制、Agent 机制
- Tags: topic:agent topic:memory layer:knowledge mechanism:memory mechanism:agent type:paper time:2023 status:foundational
- 概述：为记忆、反思和虚拟社会行为提供经典案例，是 Agent 研究中的高辨识度论文。
Attention Residuals
- URL: https://arxiv.org/abs/2603.15031
- 类型：论文
- 概述：作为前沿架构文章的代表，承接 Agent 系列对最新底层模型进展的覆盖。
DeepSeek-R1
- URL: https://arxiv.org/abs/2501.12948
- 类型：论文
- 概述：体现了推理模型如何影响 Agent 设计和执行能力，是应用层和模型层的连接点。
MRKL Systems
- URL: https://arxiv.org/abs/2205.00445
- 类型：论文
- 时间：2022
- 层次：L4-知识与工具层
- 机制：工具调用机制、模块化机制
- Tags: topic:agent topic:tool-use layer:knowledge mechanism:tool-use mechanism:modularity type:paper time:2022 status:foundational
- 概述：把 LLM 与工具系统的组合上升为模块化神经符号架构，是 Tool Use 和 Agent 设计的思想前身。
LangChain
- URL: https://github.com/langchain-ai/langchain
- 类型：代码仓库
- 概述：提供可复用的 Agent、Chain、Retriever 组件，是很多实践教程的实现基础。
AutoGen
- URL: https://github.com/microsoft/autogen
- 类型：代码仓库
- 概述：用于展示多 Agent 协作的另一条落地路线。
CrewAI
- URL: https://github.com/crewAI/crewAI
- 类型：代码仓库
- 概述：为 role-based Agent 编排提供直接实现入口。
Attention-Residuals (Kimi)
- URL: https://github.com/MoonshotAI/Attention-Residuals
- 类型：代码仓库
- 概述：让读者从论文进一步查看实验与实现，是前沿模型演进的工程入口。
Kimi 发布 Attention Residuals：颠覆十年残差连接
- URL: http://www.itsolotime.com/archives/26372
- 类型：新闻报道
- 概述：作为中文世界的传播性材料，帮助读者理解这项技术在社区里的讨论热度。
OpenAI o3 发布公告
- URL: https://openai.com/index/introducing-o3-and-o4-mini/
- 类型：官方博客/公告
- 概述：支撑推理模型时代的产品节点叙述。
Anthropic Claude 4 发布
- URL: https://www.anthropic.com/news/claude-4
- 类型：官方博客/公告
- 概述：为 Claude 系列能力更新提供直接来源。
Google Gemini 2.0 发布
- URL: https://blog.google/innovation-and-ai/models-and-research/google-deepmind/google-gemini-ai-update-december-2024/
- 类型：官方博客/公告
- 概述：补充 Gemini 进入 Agent / multimodal 时代的产品背景。
DeepSeek R1 开源公告
- URL: https://siliconangle.com/2025/01/20/deepseek-open-sources-r1-reasoning-model-series/
- 类型：新闻报道
- 概述：作为外部媒体视角的新闻材料，补充开源推理模型扩散的社区语境。

`llm-paper-history/00-系列导读.md`#

GPT-1
- URL: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- 类型：论文
- 概述：标记生成式预训练路线的关键起点。
BERT
- URL: https://arxiv.org/abs/1810.04805
- 类型：论文
- 概述：代表双向预训练路线，是 GPT 路线的重要对照组。
PaLM
- URL: https://arxiv.org/abs/2204.02311
- 类型：论文
- 概述：用来解释大规模训练与 scaling law 对能力提升的推动。
LLaMA
- URL: https://arxiv.org/abs/2302.13971
- 类型：论文
- 概述：代表开源基础模型生态的爆发点。
Mixtral
- URL: https://arxiv.org/abs/2401.04088
- 类型：论文
- 概述：作为 MoE 架构的代表，为“更高效率的大模型设计”提供具体样本。
DeepSeek R1
- URL: https://arxiv.org/abs/2501.12948
- 类型：论文
- 概述：开源推理模型的，为推理时代章节提供终点坐标。
The Illustrated GPT-2
- URL: https://jalammar.github.io/illustrated-gpt2/
- 类型：教程/课程
- 概述：帮助读者理解从 Transformer 到 GPT 生成式架构的过渡。

`llm-security/00-系列导读.md`#

GCG: Greedy Coordinate Gradient
- URL: https://arxiv.org/abs/2307.04757
- 类型：论文
- 概述：代表自动化越狱攻击方法，是安全系列中最重要的攻击研究来源之一。
Prompt Injection Attacks
- URL: https://www.jailbreaksearch.com/
- 类型：案例库/安全站点
- 概述：汇集越狱与注入攻击样例，帮助读者理解攻击面是如何演化和传播的。
Cursor Security Advisory
- URL: https://cursor.sh/security
- 类型：官方文档
- 概述：把安全话题从论文攻击带入真实产品漏洞与安全通告层面。

`llm/index.md`#

ChatGPT背后的语言模型简史
- URL: https://bmpi.dev/dev/deep-learning/nlp-language-models
- 类型：教程/课程
- 概述：提供一条中文世界里较友好的语言模型演进导读。
happy-llm
- URL: https://github.com/datawhalechina/happy-llm
- 类型：代码仓库
- 概述：兼具教程与实践示例功能，适合从目录页直接延伸到动手学习。
LangChain 官方文档
- URL: https://python.langchain.com/
- 类型：官方文档
- 概述：把基础概念引向应用开发的直接工程入口。

0.2 `llm-guide`#

`llm-guide/01-ai-history.md`#

ChatGPT 背后的语言模型简史
- URL: https://bmpi.dev/dev/deep-learning/nlp-language-models
- 类型：教程/课程
- 概述：提供 AI 演进叙事的中文友好入口，用来补充历史脉络。
Attention Is All You Need
- URL: https://arxiv.org/abs/1706.03762
- 类型：论文
- 概述：作为 Transformer 时代起点，被用来解释 LLM 为什么在近年迎来质变。
GPT-1
- URL: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- 类型：论文
- 概述：说明生成式预训练的真正起点。
BERT
- URL: https://arxiv.org/abs/1810.04805
- 类型：论文
- 概述：作为双向预训练路线代表，帮助文章构建完整历史坐标系。
GPT-3
- URL: https://arxiv.org/abs/2005.14165
- 类型：论文
- 概述：支撑 few-shot 与规模化训练带来能力跃迁的叙述。
InstructGPT
- URL: https://arxiv.org/abs/2203.02155
- 类型：论文
- 概述：解释从续写模型走向对话助手的关键训练变化。
Emergent Abilities of Large Language Models
- URL: https://arxiv.org/abs/2206.07682
- 类型：论文
- 概述：为“涌现能力”这一流行概念提供学术出处。
LLaMA: Open and Efficient Foundation Language Models
- URL: https://arxiv.org/abs/2302.13971
- 类型：论文
- 概述：作为开源模型路线的代表，让历史线不只停留在闭源模型。
happy-llm 教程
- URL: https://github.com/datawhalechina/happy-llm
- 类型：代码仓库
- 概述：为读者继续系统学习提供实践型延伸入口。
DeepSeek-V3 Technical Report
- URL: https://arxiv.org/abs/2412.19437
- 类型：论文
- 概述：让历史线自然延伸到近年的国产开源模型。

`llm-guide/02-how-llm-works.md`#

Attention Is All You Need
- URL: https://arxiv.org/abs/1706.03762
- 类型：论文
- 概述：是解释 token、embedding、attention、生成流程的核心理论来源。
Tokenizer Tool
- URL: https://platform.openai.com/tokenizer
- 类型：官方文档
- 概述：帮助读者把 token 化从抽象概念变成可感知操作。
The Illustrated Transformer
- URL: https://jalammar.github.io/illustrated-transformer/
- 类型：教程/课程
- 概述：用可视化方式解释 Transformer 的层结构与注意力传播。
The Curious Case of Neural Text Degeneration
- URL: https://arxiv.org/abs/1904.09751
- 类型：论文
- 概述：Top-p 采样的重要论文，用来解释生成策略为何影响输出质量。
FlashAttention
- URL: https://arxiv.org/abs/2205.14135
- 类型：论文
- 概述：把“模型原理”延伸到“推理优化”，帮助读者理解现代大模型为什么能更高效运行。
Fast Inference from Transformers via Speculative Decoding
- URL: https://arxiv.org/abs/2211.17192
- 类型：论文
- 概述：作为推测解码起点，用于说明大模型生成速度并非只能靠更强硬件提升。
LLM.int8()
- URL: https://arxiv.org/abs/2208.07339
- 类型：论文
- 概述：支撑量化技术的基本介绍，是模型部署与成本优化的重要桥梁。

`llm-guide/03-choose-model.md`#

API Pricing
- URL: https://openai.com/api/pricing/
- 类型：官方文档
- 概述：作为成本维度比较的直接来源，是选型框架中最现实的约束之一。
Claude Models Overview
- URL: https://docs.anthropic.com/en/docs/about-claude/models
- 类型：官方文档
- 概述：提供 Claude 系列能力、上下文窗口和定位信息。
Gemini API Pricing
- URL: https://ai.google.dev/pricing
- 类型：官方文档
- 概述：补足 Gemini 在价格和能力上的官方说明。
DeepSeek-V3 Technical Report
- URL: https://arxiv.org/abs/2412.19437
- 类型：论文
- 概述：作为开源模型代表，为中文和高性价比路线提供参考依据。
Llama 3.1 Model Card
- URL: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md
- 类型：官方文档
- 概述：用于补充开放权重模型的规格、局限与许可信息。
Qwen2.5 Technical Report
- URL: https://arxiv.org/abs/2412.15115
- 类型：论文
- 概述：为中文场景与国产模型选型提供技术背景。
MMLU
- URL: https://arxiv.org/abs/2009.03300
- 类型：论文
- 概述：作为通用知识 benchmark 的经典来源，支撑“能力边界”维度。
HumanEval
- URL: https://arxiv.org/abs/2107.03374
- 类型：论文
- 概述：代码评测基准，用于比较模型在编程场景下的能力。
PagedAttention
- URL: https://arxiv.org/abs/2309.06180
- 类型：论文
- 概述：帮助文章把模型选型扩展到部署与服务效率层面。
生成式人工智能服务管理暂行办法
- URL: http://www.cac.gov.cn/2023-07/13/c_1690898327029107.htm
- 类型：机构报告/政策
- 概述：提醒模型选型不仅是能力和成本问题，还受到合规约束。

`llm-guide/04-prompt-engineering.md`#

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- URL: https://arxiv.org/abs/2201.11903
- 类型：论文
- 概述：CoT 的奠基论文，是整篇提示工程文章最核心的理论来源。
Tree of Thoughts
- URL: https://arxiv.org/abs/2305.10601
- 类型：论文
- 概述：把线性思维链扩展到搜索式推理，为高级提示技巧提供理论支撑。
Self-Consistency
- URL: https://arxiv.org/abs/2203.11171
- 类型：论文
- 概述：说明多路径采样投票为何能提升复杂推理任务的稳定性。
ReAct
- URL: https://arxiv.org/abs/2210.03629
- 类型：论文
- 概述：用来说明提示词不仅能引导推理，还能引导行动。
OpenAI Prompt Engineering Guide
- URL: https://platform.openai.com/docs/guides/prompt-engineering
- 类型：官方文档
- 概述：提供面向工程实践的提示设计指南。
Anthropic Prompt Engineering Overview
- URL: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
- 类型：官方文档
- 概述：提供 Claude 场景下的提示写作原则，与 OpenAI 指南形成互补。
Large Language Models are Zero-Shot Reasoners
- URL: https://arxiv.org/abs/2205.11916
- 类型：论文
- 概述：说明 zero-shot CoT 为什么会对推理表现产生显著影响。
Language Models are Few-Shot Learners
- URL: https://arxiv.org/abs/2005.14165
- 类型：论文
- 概述：作为 few-shot 学习的里程碑来源，为示例驱动型提示提供理论基础。

`llm-guide/05-rag.md`#

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- URL: https://arxiv.org/abs/2005.11401
- 类型：论文
- 时间：2020
- 层次：L4-知识与工具层
- 机制：检索机制
- Tags: topic:rag topic:retrieval layer:knowledge mechanism:retrieval type:paper time:2020 status:foundational
- 概述：RAG 原始论文，定义了“检索增强生成”的标准起点。
GraphRAG
- URL: https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
- 类型：官方博客/公告
- 概述：说明 RAG 已从纯向量检索发展到图结构检索与关系建模。
Introducing Contextual Retrieval
- URL: https://www.anthropic.com/news/contextual-retrieval
- 类型：官方博客/公告
- 概述：用来补充“如何提升检索质量”这一工程问题。
LangChain RAG 文档
- URL: https://python.langchain.com/docs/tutorials/rag/
- 类型：官方文档
- 概述：提供从理论到实践的主工程入口。
LlamaIndex 文档
- URL: https://docs.llamaindex.ai/
- 类型：官方文档
- 概述：补充索引、检索和文档处理层的实现材料。

`llm-guide/06-function-calling.md`#

OpenAI Function Calling 文档
- URL: https://platform.openai.com/docs/guides/function-calling
- 类型：官方文档
- 时间：持续更新
- 层次：L4-知识与工具层
- 机制：工具调用机制
- Tags: topic:tool-use layer:knowledge mechanism:tool-use type:doc status:engineering
- 概述：定义模型如何通过结构化参数调用函数，是工具调用章节的核心来源。
Anthropic Tool Use 文档
- URL: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
- 类型：官方文档
- 时间：持续更新
- 层次：L4-知识与工具层
- 机制：工具调用机制
- Tags: topic:tool-use layer:knowledge mechanism:tool-use type:doc status:engineering
- 概述：作为 Tool Use 方向的官方实现文档，与 OpenAI 路线形成对照。
Anthropic Computer Use
- URL: https://docs.anthropic.com/en/docs/agents-and-tools/computer-use
- 类型：官方文档
- 时间：2025
- 层次：L4-知识与工具层
- 机制：工具调用机制、环境交互机制
- Tags: topic:agent topic:computer-use layer:knowledge mechanism:tool-use mechanism:environment-interaction type:doc time:2025 status:frontier
- 概述：把工具调用扩展到 GUI 操作层，代表工具使用能力的新边界。
MCP 官方文档
- URL: https://modelcontextprotocol.io/
- 类型：官方文档
- 时间：2024-2026
- 层次：L4-知识与工具层
- 机制：协议机制
- Tags: topic:mcp topic:agent layer:knowledge mechanism:protocol type:doc time:2025 status:foundational
- 概述：说明工具接入进一步标准化后的协议方向。

`llm-guide/07-ai-agent.md`#

ReAct
- URL: https://arxiv.org/abs/2210.03629
- 类型：论文
- 概述：定义 Agent 的最小核心循环，是整篇文章的方法论基础。
LangGraph 文档
- URL: https://langchain-ai.github.io/langgraph/
- 类型：官方文档
- 概述：帮助把 Agent 架构从概念层落到状态图和工作流实现。
OpenAI Agents SDK
- URL: https://openai.github.io/openai-agents-python/
- 类型：官方文档
- 概述：提供现代平台原生 Agent SDK 的实践入口。
CrewAI 文档
- URL: https://docs.crewai.com/
- 类型：官方文档
- 概述：支撑文章对多角色协作式 Agent 框架的介绍。
A2A Protocol
- URL: https://google.github.io/A2A/
- 类型：官方文档
- 概述：把 Agent 能力扩展到 Agent-to-Agent 协同通信。
Devin
- URL: https://www.cognition.ai/blog/introducing-devin
- 类型：官方博客/公告
- 概述：作为软件工程 Agent 的标志性案例，为文章提供现实想象锚点。

`llm-guide/08-memory-mcp-ecosystem.md`#

MemGPT: Towards LLMs as Operating Systems
- URL: https://arxiv.org/abs/2310.08560
- 类型：论文
- 概述：用“像操作系统管理内存”这一框架解释长短期记忆分层问题。
MCP 官方文档
- URL: https://modelcontextprotocol.io/
- 类型：官方文档
- 概述：支撑工具、资源、prompt 模板统一接入的协议部分。
A2A Protocol
- URL: https://google.github.io/A2A/
- 类型：官方文档
- 概述：扩展生态层的通信视角，说明未来不仅是工具接入，还有 Agent 互联。
DPO
- URL: https://arxiv.org/abs/2305.18290
- 类型：论文
- 概述：为“对齐技术”部分提供偏好优化的代表性方法来源。
EU AI Act
- URL: https://artificialintelligenceact.eu/
- 类型：机构报告/政策
- 概述：补足生态系统建设中的治理、合规与风险框架。

`llm-guide/09-ai-programming.md`#

Cursor 官方文档
- URL: https://docs.cursor.com/
- 类型：官方文档
- 概述：作为 AI 原生 IDE 的代表资源，用于支撑工具对比。
Claude Code
- URL: https://docs.anthropic.com/en/docs/claude-code
- 类型：官方文档
- 概述：对应 CLI / agentic coding 的工具路线。
GitHub Copilot
- URL: https://github.com/features/copilot
- 类型：官方文档
- 概述：作为“代码补全式 AI 编程”的典型产品页。
Vibe Coding
- URL: https://en.wikipedia.org/wiki/Vibe_coding
- 类型：百科/维基
- 概述：为文章使用的新概念提供背景定义和外部语境。

`llm-guide/10-build-ai-app.md`#

LangChain 官方文档
- URL: https://python.langchain.com/
- 类型：官方文档
- 概述：承接 AI 应用编排与组件化开发。
LangGraph 文档
- URL: https://langchain-ai.github.io/langgraph/
- 类型：官方文档
- 概述：支撑复杂工作流和 Agent 编排部分。
LlamaIndex 文档
- URL: https://docs.llamaindex.ai/
- 类型：官方文档
- 概述：补充文档索引、检索与知识层应用实现。
Vercel AI SDK
- URL: https://sdk.vercel.ai/
- 类型：官方文档
- 概述：为 Web / 前端 AI 应用提供直接实现路径。
LangSmith 文档
- URL: https://docs.smith.langchain.com/
- 类型：官方文档
- 概述：用于可观测性、trace 和 prompt / chain 调试。
Langfuse
- URL: https://langfuse.com/
- 类型：官方文档
- 概述：作为开源 AI 可观测性平台的代表，支撑“上线后如何持续观察系统”的部分。

`llm-guide/11-fine-tuning.md`#

LoRA
- URL: https://arxiv.org/abs/2106.09685
- 类型：论文
- 时间：2021
- 层次：L2-训练与对齐层
- 机制：微调机制
- Tags: topic:finetuning topic:lora layer:training mechanism:finetuning type:paper time:2021 status:foundational
- 概述：参数高效微调的经典方法，是整篇微调文章的理论支点。
QLoRA
- URL: https://arxiv.org/abs/2305.14314
- 类型：论文
- 时间：2023
- 层次：L2-训练与对齐层
- 机制：微调机制、量化机制
- Tags: topic:finetuning topic:qlora layer:training mechanism:finetuning mechanism:quantization type:paper time:2023 status:foundational
- 概述：说明在有限显存环境下微调大模型的可行路线。
DPO
- URL: https://arxiv.org/abs/2305.18290
- 类型：论文
- 时间：2023
- 层次：L2-训练与对齐层
- 机制：对齐机制
- Tags: topic:alignment topic:dpo layer:training mechanism:alignment type:paper time:2023 status:foundational
- 概述：为偏好优化和对齐方法部分提供更简洁的训练范式。
Axolotl
- URL: https://github.com/OpenAccess-AI-Collective/axolotl
- 类型：代码仓库
- 概述：常见开源微调工具链之一，适合实践导向读者继续深入。
LLaMA-Factory
- URL: https://github.com/hiyouga/LLaMA-Factory
- 类型：代码仓库
- 概述：中文社区常用微调工具，适合多模型、多方法的统一训练入口。
Unsloth
- URL: https://github.com/unslothai/unsloth
- 类型：代码仓库
- 概述：强调更快更省显存的训练体验，是近年常见微调实践资源。
OpenAI Fine-tuning 文档
- URL: https://platform.openai.com/docs/guides/fine-tuning
- 类型：官方文档
- 概述：为“云平台微调”提供官方操作入口。

0.3 `agent-guide`#

`agent-guide/14-Agent架构模式.md`#

OpenAI Agents SDK Handoffs
- URL: https://docs.google.com/document/d/1mWMD5jVJw9WkB5J-impwnNP3jdk8c2Dc6NTHTiqJLPg
- 类型：官方文档
- 概述：作为 handoff 模式的直接来源，用来解释多 Agent 间任务交接的设计思路。
LangChain Agent Patterns
- URL: https://python.langchain.com/docs/concepts/agent_types/
- 类型：官方文档
- 概述：提供不同 Agent 类型与使用场景的框架视角分类。

`agent-guide/15-Agent评估体系.md`#

GAIA Benchmark
- URL: https://huggingface.co/benchmarks/gaia
- 类型：数据集/基准
- 概述：用于评估面向真实任务的 Agent 综合能力，强调不只是问答准确率。
API-Bank
- URL: https://arxiv.org/abs/2304.09142
- 类型：论文
- 概述：把工具调用成功率纳入评估视角，是 Agent 测评的重要补充。
AgentBench
- URL: https://arxiv.org/abs/2308.03688
- 类型：论文
- 概述：这里改为 AgentBench 论文入口，用来补齐早期体系化 Agent 评测的代表来源，避免保留无效链接。

`agent-guide/16-Agent可观测性与调试.md`#

LangSmith
- URL: https://smith.langchain.com
- 类型：官方文档
- 概述：为 trace、run 对比、prompt 版本调试提供平台级支持。
Langfuse
- URL: https://langfuse.com
- 类型：官方文档
- 概述：适合做开源、可自部署的可观测性方案。
OpenTelemetry for AI
- URL: https://opentelemetry.io
- 类型：官方文档
- 概述：说明 AI 应用也可以接入通用可观测规范，而不是只能依赖专有平台。

`agent-guide/17-Agent成本优化.md`#

LangChain Cost Optimization
- URL: https://python.langchain.com
- 类型：官方文档
- 概述：用来补充链路级成本优化思路，如缓存、压缩和分级调用。
OpenAI Token Counting
- URL: https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
- 类型：官方文档
- 概述：为 token 计费与上下文成本分析提供基础入口。

`agent-guide/18-Agent可靠性设计.md`#

Tenacity Retry
- URL: https://tenacity.readthedocs.io
- 类型：官方文档
- 概述：支撑重试机制实现，是把 Agent 从“能跑”变成“更稳”的基础工具。
Circuit Breaker Pattern
- URL: https://martinfowler.com/bliki/CircuitBreaker.html
- 类型：教程/课程
- 概述：把传统分布式系统里的熔断模式迁移到 AI/Agent 可靠性设计语境中。

`agent-guide/19-Agent测试策略.md`#

LangChain Testing
- URL: https://python.langchain.com/docs/how-to/testing
- 类型：官方文档
- 概述：提供 Agent/chain 组件测试的框架层入口。
Agent Evaluation Methods
- URL: https://arxiv.org/abs/2309.11235
- 类型：论文
- 概述：用来补充更系统化的 Agent 测评方法论。

`agent-guide/20-Agent安全防御.md`#

LangChain Security
- URL: https://python.langchain.com/docs/security
- 类型：官方文档
- 概述：提供 Agent 应用层的安全实践建议，如输入隔离、工具边界与输出验证。
Agent Security Patterns
- URL: https://arxiv.org/abs/2309.11235
- 类型：论文
- 概述：帮助把单点防护经验上升为可复用的安全模式。

0.4 `llm-paper-history`#

`llm-paper-history/24-SpeculativeDecoding推理加速.md`#

Fast Speculative Decoding
- URL: https://arxiv.org/abs/2311.04981
- 类型：论文
- 概述：是推测解码方向的重要起点，用于解释如何在不显著损失质量的前提下加速推理。
Medusa
- URL: https://arxiv.org/abs/2401.10774
- 类型：论文
- 概述：展示推理加速的另一条实现路线，强调多 token 预测。
EAGLE
- URL: https://arxiv.org/abs/2402.12726
- 类型：论文
- 概述：补充投机式采样在更大模型推理中的优化思路。

`llm-paper-history/25-T5与FLAN指令微调.md`#

T5
- URL: https://arxiv.org/abs/1910.10683
- 类型：论文
- 概述：通过统一 text-to-text 格式重塑任务表达，是指令微调前史的重要节点。
FLAN
- URL: https://arxiv.org/abs/2109.01652
- 类型：论文
- 概述：把指令微调推向主流，是现代 instruction tuning 的早期代表。
UL2
- URL: https://arxiv.org/abs/2205.05131
- 类型：论文
- 概述：说明 Google 路线如何继续统一多种预训练范式。

`llm-paper-history/26-Qwen与InternLM开源模型.md`#

Qwen 2.5
- URL: https://arxiv.org/abs/2407.21783
- 类型：论文
- 概述：用于说明国产开源模型在能力与生态上的快速提升。
InternLM2
- URL: https://arxiv.org/abs/2403.17297
- 类型：论文
- 概述：作为另一条国产开源路线的代表，形成对照视角。
Qwen GitHub
- URL: https://github.com/QwenLM
- 类型：代码仓库
- 概述：提供模型、工具和生态入口。
InternLM GitHub
- URL: https://github.com/InternLM/InternLM
- 类型：代码仓库
- 概述：补充国产模型生态与实现入口。

`llm-paper-history/27-PaLM2技术报告.md`#

PaLM 2 Technical Report
- URL: https://ai.google/static/documents/palm2techreport.pdf
- 类型：论文
- 概述：是 PaLM 2 架构与能力描述的直接来源。
Google AI Blog
- URL: https://blog.google/technology/ai/google-palm-2-ai-large-language-model/
- 类型：官方博客/公告
- 概述：补充 PaLM 2 的发布语境与产品化方向。

`llm-paper-history/28-AlphaCode编程竞赛.md`#

Competition-level code generation with AlphaCode
- URL: https://arxiv.org/abs/2203.07814
- 类型：论文
- 概述：用于说明大模型在竞赛级代码生成任务上的突破。
AlphaCode GitHub
- URL: https://github.com/deepmind/code_contests
- 类型：代码仓库
- 概述：提供数据与实现入口，适合继续查看评测和训练材料。

`llm-paper-history/29-Mistral7B小而美.md`#

Mistral 7B 论文
- URL: https://arxiv.org/abs/2310.06825
- 类型：论文
- 概述：用来说明“小模型也能通过架构优化取得强性能”。
Mistral AI 官网
- URL: https://mistral.ai/news/mistral-7b/
- 类型：官方博客/公告
- 概述：补充发布背景与生态定位。

`llm-paper-history/30-Grok与LLaMA3开源新星.md`#

LLaMA 3 论文
- URL: https://arxiv.org/abs/2407.21783
- 类型：论文
- 概述：代表新一代开放权重模型能力基线。
Command R 技术报告
- URL: https://cohere.com/command
- 类型：官方博客/公告
- 概述：作为企业向模型路线的补充样本。
Grok-1 公告
- URL: https://x.ai/grok-1
- 类型：官方博客/公告
- 概述：对应 xAI 路线的公开发布入口。

`llm-paper-history/31-RAG与LongContext知识增强.md`#

Retrieval-Augmented Generation for Knowledge-Intensive NLP
- URL: https://arxiv.org/abs/2005.11401
- 类型：论文
- 概述：定义 RAG 起点，是知识增强路线的关键来源。
Longformer
- URL: https://arxiv.org/abs/2004.05150
- 类型：论文
- 概述：代表长文档建模路线，用于与 RAG 形成方法论对比。
CoQA
- URL: https://arxiv.org/abs/1812.09596
- 类型：数据集/基准
- 概述：作为对话式问答任务背景，支撑知识增强场景说明。

`llm-paper-history/32-o1o3推理时代.md`#

OpenAI o1 发布公告
- URL: https://openai.com/index/introducing-openai-o1
- 类型：官方博客/公告
- 概述：标记推理模型产品化节点。
Learning to Reason with LLMs
- URL: https://openai.com/index/learning-to-reason-with-llms
- 类型：官方博客/公告
- 概述：解释 o 系列背后的训练和评估思想，是推理模型路线的重要公开材料。

`llm-paper-history/33-PromptEngineering提示工程.md`#

Prompt Engineering Guide
- URL: https://promptguide.github.io
- 类型：教程/课程
- 概述：是提示工程领域较系统的外部实践资料站。
Anthropic Prompt Engineering
- URL: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
- 类型：官方文档
- 概述：提供更贴近 Claude 系统的提示实践入口。

0.5 `llm-security`#

`llm-security/01-提示注入与越狱攻击.md`#

Prompt Injection Attacks
- URL: https://www.jailbreaksearch.com/
- 类型：案例库/安全站点
- 概述：汇总注入与越狱样例，帮助建立攻击面直觉。

`llm-security/02-系统提示词泄露与数据提取.md`#

Cursor Security Advisory
- URL: https://cursor.sh/security
- 类型：官方文档
- 概述：适合补充现代 AI 工具在系统提示与权限边界上的现实风险。

`llm-security/03-代码执行与基础设施攻击.md`#

Cursor Security Advisory
- URL: https://cursor.sh/security
- 类型：官方文档
- 概述：作为产品级漏洞例子，说明代码代理场景的攻击后果更接近传统安全问题。

`llm-security/04-对抗性自动化攻击.md`#

GCG
- URL: https://arxiv.org/abs/2307.04757
- 类型：论文
- 概述：自动化越狱攻击的重要代表方法。

`llm-security/05-数据泄露与供应链攻击.md`#

Cursor Security Advisory
- URL: https://cursor.sh/security
- 类型：官方文档
- 概述：适合作为供应链与开发工具链风险的现实例子。

`llm-security/06-特定领域高危漏洞.md`#

GCG
- URL: https://arxiv.org/abs/2307.04757
- 类型：论文
- 概述：可迁移到高风险垂直场景中的攻击方法代表。

`llm-security/07-AI驱动的自动化攻击.md`#

Prompt Injection Attacks
- URL: https://www.jailbreaksearch.com/
- 类型：案例库/安全站点
- 概述：在 AI 自动化攻击时代仍然是重要的样例来源。

0.6 `llm` 旧系列#

`llm/004-calculate-llm-cost.md`#

Tiktokenizer
- URL: https://tiktokenizer.vercel.app/
- 类型：工具
- 概述：提供替代 token 估算器，让读者更方便地做成本估算。
Prompt Caching
- URL: https://www.anthropic.com/news/prompt-caching
- 类型：官方博客/公告
- 概述：为“如何节省上下文成本”提供厂商级实践思路。

`llm/005-master-prompt-engineering.md`#

Learn Prompting
- URL: https://learnprompting.org/
- 类型：教程/课程
- 概述：较系统的提示工程学习站点，适合作为入门和练习材料。

`llm/006-rag-knowledge-injection.md`#

Microsoft GraphRAG
- URL: https://github.com/microsoft/graphrag
- 类型：代码仓库
- 概述：提供图增强检索的实现入口，适合在传统向量检索之上继续深入。
Rerank Guide
- URL: https://www.cohere.com/documents/rerank-guide.pdf
- 类型：教程/课程
- 概述：补足召回后重排序这一常被忽略但很关键的工程环节。

`llm/009-boost-dev-efficiency.md`#

AGENTS.md Guide
- URL: https://github.com/anthropics/claude-code/blob/main/AGENTS.md
- 类型：代码仓库
- 概述：解释代理式编程如何通过仓库规范增强协作效果，是 AI 编程实践中很实用的资料。

`llm/010-build-ai-application.md`#

Building LLM Applications
- URL: https://www.pinecone.io/learn/series/llm-projects/
- 类型：教程/课程
- 概述：用项目化方式补充“怎么真正做出一个 AI 应用”。

0.7 独立文章#

`ai-vup.md`#

node-chatgpt-api
- URL: https://github.com/waylaidwanderer/node-chatgpt-api
- 类型：代码仓库
- 概述：提供 AI VTuber 方案中的对话接口实现参考。
ms-ra-forwarder
- URL: https://github.com/wxxxcxx/ms-ra-forwarder
- 类型：代码仓库
- 概述：用于补充 TTS 服务代理与中间层实现思路。
Live2D
- URL: https://www.live2d.com/
- 类型：官方文档
- 概述：对应 2D 虚拟形象驱动技术入口。
VRM
- URL: https://vrm.dev/
- 类型：官方文档
- 概述：对应 3D 虚拟人建模标准，是多模态角色系统的重要底座。
Whisper
- URL: https://github.com/openai/whisper
- 类型：代码仓库
- 概述：为语音识别链路提供开源模型入口。

`machine-learning-101.md`#

scikit-learn 官网
- URL: https://scikit-learn.org/stable/
- 类型：官方文档
- 概述：作为传统机器学习实战工具链的权威入口，适合做进一步练习与项目实验。

`ML系统设计.md`#

未发现显式 参考资料 区或稳定外部链接。
- 概述：这篇文章更偏内部方法总结和架构梳理，不以外部资料为主要论证方式。后续如果要长期维护，建议补上特征存储、模型注册、监控、漂移检测等方向的官方与开源参考链接。

第二部分：`AI 是怎么回事` 系列参考资料索引#

0.8 第 1 篇：AI 到底聪明在哪——从手机人脸识别说起#

Sobel operator - Wikipedia
- URL: https://en.wikipedia.org/wiki/Sobel_operator
- 类型：百科/维基
- 概述：用于说明 Sobel 算子这一经典边缘检测方法的历史来源和基本定位，支撑文章从“边缘检测”切入解释计算机视觉。
Artificial neuron - Wikipedia
- URL: https://en.wikipedia.org/wiki/Artificial_neuron
- 类型：百科/维基
- 概述：作为 McCulloch & Pitts 人工神经元思想的入口资料，帮助交代“神经网络”名称和早期思想来源。
ImageNet Classification with Deep Convolutional Neural Networks
- URL: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
- 类型：论文
- 时间：2012
- 层次：L1-模型机制层
- 机制：架构机制
- Tags: topic:architecture layer:model mechanism:architecture type:paper time:2012 status:foundational
- 概述：AlexNet 原始论文，用来证明 2012 年图像识别突破的技术细节和量化结果。
AlexNet - Wikipedia
- URL: https://en.wikipedia.org/wiki/AlexNet
- 类型：百科/维基
- 概述：补充 AlexNet 的层数、参数规模与硬件背景，帮助读者建立工程尺度感。
FaceNet
- URL: https://arxiv.org/abs/1503.03832
- 类型：论文
- 时间：2015
- 层次：L1-模型机制层
- 机制：架构机制
- Tags: topic:architecture layer:model mechanism:architecture type:paper time:2015 status:foundational
- 概述：作为人脸识别代表工作，用来说明“人脸向量”与高准确率识别的关系。
LFW
- URL: https://vis-www.cs.umass.edu/lfw/
- 类型：数据集/基准
- 时间：2007
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval layer:system mechanism:evaluation type:benchmark time:2007 status:foundational
- 概述：给出人脸识别测试集的权威出处，支撑 FaceNet 在真实数据上的效果描述。
Learning representations by back-propagating errors
- URL: https://www.nature.com/articles/323533a0
- 类型：论文
- 时间：1986
- 层次：L2-训练与对齐层
- 机制：对齐机制
- Tags: topic:architecture layer:training mechanism:alignment type:paper time:1986 status:foundational
- 概述：作为反向传播奠基论文，支撑文章对“模型如何学习”的解释。
ImageNet: A Large-Scale Hierarchical Image Database
- URL: https://www.image-net.org/static_files/papers/imagenet_cvpr09.pdf
- 类型：论文
- 时间：2009
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval layer:system mechanism:evaluation type:paper time:2009 status:foundational
- 概述：为海量标注数据在视觉突破中的作用提供正式学术来源。
The data that transformed AI research — and possibly the world
- URL: https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world
- 类型：新闻报道
- 概述：补充 ImageNet 众包标注背后的历史和劳动规模，让技术叙事更完整。
Large-scale Deep Unsupervised Learning using Graphics Processors
- URL: https://robotics.stanford.edu/~ang/papers/icml09-LargeScaleUnsupervisedDeepLearningGPU.pdf
- 类型：论文
- 概述：提供 GPU 对深度学习训练的速度提升证据，解释为什么 2012 年前后算力成为关键变量。
NVIDIA GeForce GTX 580 - VideoCardz
- URL: https://videocardz.com/nvidia/geforce-500/geforce-gtx-580
- 类型：新闻报道
- 概述：给出 AlexNet 训练所用 GPU 的具体硬件背景，增强现实感。

0.9 第 2 篇：AI 怎么读懂文字——国王减去男人等于什么#

Efficient Estimation of Word Representations in Vector Space
- URL: https://arxiv.org/abs/1301.3781
- 类型：论文
- 时间：2013
- 层次：L1-模型机制层
- 机制：架构机制
- Tags: topic:architecture layer:model mechanism:architecture type:paper time:2013 status:foundational
- 概述：Word2Vec 的关键论文，用来引出词向量与语义空间。
Distributed Representations of Words and Phrases and their Compositionality
- URL: https://proceedings.neurips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf
- 类型：论文
- 时间：2013
- 层次：L1-模型机制层
- 机制：架构机制
- Tags: topic:architecture layer:model mechanism:architecture type:paper time:2013 status:foundational
- 概述：补充 Word2Vec 的训练方式与组合性，强化“语义可计算”的核心论点。
GloVe
- URL: https://nlp.stanford.edu/projects/glove/
- 类型：论文
- 时间：2014
- 层次：L1-模型机制层
- 机制：架构机制
- Tags: topic:architecture layer:model mechanism:architecture type:paper time:2014 status:foundational
- 概述：说明词向量并非单一路线，而是 NLP 领域普遍方向。
word2vec-google-news-300
- URL: https://code.google.com/archive/p/word2vec/
- 类型：代码仓库
- 概述：作为经典预训练词向量实例，用来支撑 300 维语义空间的具体感知。
tiktoken
- URL: https://github.com/openai/tiktoken
- 类型：代码仓库
- 概述：用于说明现代大模型的分词实现，让 token 化更接近真实工程。
OpenAI 模型文档
- URL: https://platform.openai.com/docs/models
- 类型：官方文档
- 概述：支撑上下文窗口等现代模型规格的现实说明。
King - Man + Woman = King?
- URL: https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935a85
- 类型：教程/课程
- 概述：用于纠正常见的词向量神话表达，增加文章严谨性。
What are tokens and how to count them?
- URL: https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
- 类型：官方文档
- 概述：为 token 换算与实际使用体验提供官方支撑。

0.10 第 3 篇：AI 是怎么突然变厉害的——2012 所有人以为他们作弊了#

ImageNet - Wikipedia
- URL: https://en.wikipedia.org/wiki/ImageNet
- 类型：百科/维基
- 概述：提供 ImageNet 的规模和比赛背景，是整篇故事的环境说明。
ImageNet Classification with Deep Convolutional Neural Networks
- URL: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
- 类型：论文
- 概述：AlexNet 原始论文，证明 2012 年成绩断层式领先并非神话。
ILSVRC 2012 Results
- URL: https://image-net.org/challenges/LSVRC/2012/results.html
- 类型：数据集/基准
- 概述：直接给出比赛结果，是“所有人以为他们作弊了”这一戏剧化叙事背后的硬证据。
ILSVRC 2015 Results
- URL: https://image-net.org/challenges/LSVRC/2015/results
- 类型：数据集/基准
- 概述：说明 AlexNet 之后深度学习路线仍在快速演进。
ILSVRC 2017 Results
- URL: https://image-net.org/challenges/LSVRC/2017/results
- 类型：数据集/基准
- 概述：用于展示 ImageNet 在深度学习时代被迅速“做穿”的阶段性终点。
What I learned from competing against a ConvNet on ImageNet
- URL: http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
- 类型：教程/课程
- 概述：提供人类对照成绩，让“超越人类”更具象。
Why the deep learning boom caught almost everyone by surprise
- URL: https://www.understandingai.org/p/why-the-deep-learning-boom-caught
- 类型：教程/课程
- 概述：为 2012 年突破的历史观感提供研究者回忆与后见分析。

0.11 第 4 篇：神经网络到底是什么——6000 万个旋钮的真相#

Artificial neuron
- URL: https://en.wikipedia.org/wiki/Artificial_neuron
- 类型：百科/维基
- 概述：解释人工神经元与人脑之间只是粗糙类比，而非真正仿生复制。
Vanishing gradient problem
- URL: https://en.wikipedia.org/wiki/Vanishing_gradient_problem
- 类型：百科/维基
- 概述：用于解释深层网络训练困难与激活函数选择的重要性。
Equal numbers of neuronal and nonneuronal cells make the human brain
- URL: https://pubmed.ncbi.nlm.nih.gov/19226510/
- 类型：论文
- 概述：用来给出人脑神经元数量的估计，强化“人工神经网络不等于大脑”的论点。
Deep Learning
- URL: https://www.deeplearningbook.org/
- 类型：教程/课程
- 概述：作为系统教材，为网络结构和激活函数等概念提供学理后盾。

0.12 第 5 篇：AI 是怎么学会的——从做错一道题说起#

Learning representations by back-propagating errors
- URL: https://www.nature.com/articles/323533a0
- 类型：论文
- 概述：解释“从错误中学习”的数学基础，是本篇最关键的理论来源。
GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE
- URL: https://www.semianalysis.com/p/gpt-4-architecture-infrastructure
- 类型：教程/课程
- 概述：用于估算 GPT-4 的训练规模、硬件与成本，把训练原理延伸到工业级现实。
Demystifying GPT-3
- URL: https://lambdalabs.com/blog/demystifying-gpt-3
- 类型：教程/课程
- 概述：用行业视角帮助读者理解 GPT-3 训练成本和硬件消耗。
Sam Altman Says the Age of Giant AI Models Is Already Over
- URL: https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/
- 类型：新闻报道
- 概述：作为厂商和行业视角补充“规模训练”的现实边界与争议。

0.13 第 6 篇：ChatGPT 为什么能对话——一篇引用 17 万次的论文#

Attention Is All You Need
- URL: https://arxiv.org/abs/1706.03762
- 类型：论文
- 时间：2017
- 层次：L1-模型机制层
- 机制：注意力机制、架构机制
- Tags: topic:architecture topic:transformer layer:model mechanism:attention mechanism:architecture type:paper time:2017 status:foundational
- 概述：解释 Transformer 架构、注意力机制和现代大模型的起点。
Scaling Laws for Neural Language Models
- URL: https://arxiv.org/abs/2001.08361
- 类型：论文
- 时间：2020
- 层次：L0-范式层
- 机制：架构机制
- Tags: topic:scaling topic:architecture layer:paradigm mechanism:architecture type:paper time:2020 status:foundational
- 概述：说明为什么增加模型、数据和算力会带来可预测收益。
GPT-2 太危险，不公开？
- URL: https://slate.com/technology/2019/02/openai-gpt2-text-generating-algorithm-ai-dangerous.html
- 类型：新闻报道
- 概述：让 GPT-2 从技术论文进入社会反应层面，增加叙事张力。
Training language models to follow instructions with human feedback
- URL: https://arxiv.org/abs/2203.02155
- 类型：论文
- 时间：2022
- 层次：L2-训练与对齐层
- 机制：对齐机制
- Tags: topic:alignment topic:rlhf layer:training mechanism:alignment type:paper time:2022 status:foundational
- 概述：解释 ChatGPT 为什么比 GPT-3 更会“对话”和“听话”。
ChatGPT Release Date and Timeline
- URL: https://www.ofzenandcomputing.com/chatgpt-release-date/
- 类型：新闻报道
- 概述：用用户增长与时间线说明产品发布的社会影响力。
Attention Is All You Need - Semantic Scholar
- URL: https://www.semanticscholar.org/paper/Attention-is-All-you-Need-Vaswani-Shazeer/204e3073870fae3d05bcbc2f6a8e263d9b72e776
- 类型：论文数据库
- 概述：用引用次数来量化这篇论文的学术影响力。

0.14 第 7 篇：AI 为什么会撒谎——一个律师被 ChatGPT 骗了#

Lawyer apologizes for fake court citations from ChatGPT
- URL: https://www.cnn.com/2023/05/27/business/chat-gpt-avianca-mata-lawyers
- 类型：新闻报道
- 概述：是 Schwartz 律师伪造判例事件的核心媒体来源，帮助文章从真实事故切入。
Judge sanctions lawyers for brief written by A.I. with fake citations
- URL: https://www.cnbc.com/2023/06/22/judge-sanctions-lawyers-whose-ai-written-filing-contained-fake-citations.html
- 类型：新闻报道
- 概述：补充法律处罚后果，让“幻觉”从技术问题变成制度风险。
Mata v. Avianca, Inc. - Wikipedia
- URL: https://en.wikipedia.org/wiki/Mata_v._Avianca,_Inc.
- 类型：百科/维基
- 概述：汇总案件时间线和关键细节，适合作为事件索引页。
Inceptionism: Going Deeper into Neural Networks
- URL: https://research.google/blog/inceptionism-going-deeper-into-neural-networks/
- 类型：官方博客/公告
- 概述：用 DeepDream 的诡异图像说明模型会放大统计共现模式，而非真正理解对象。
Explaining and Harnessing Adversarial Examples
- URL: https://arxiv.org/abs/1412.6572
- 类型：论文
- 时间：2014
- 层次：L6-安全与治理层
- 机制：安全攻击机制
- Tags: topic:security layer:security mechanism:security-attack type:paper time:2014 status:foundational
- 概述：以“熊猫被加噪后变长臂猿”为例说明模型识别的脆弱性。
What Are AI Hallucinations?
- URL: https://www.ibm.com/think/topics/ai-hallucinations
- 类型：官方文档
- 概述：为幻觉定义与行业性解释提供补充材料。
Why Language Models Hallucinate
- URL: https://openai.com/index/why-language-models-hallucinate/
- 类型：官方博客/公告
- 时间：2025
- 层次：L6-安全与治理层
- 机制：安全防御机制
- Tags: topic:security layer:security mechanism:security-defense type:doc time:2025 status:reference
- 概述：呈现模型开发方对幻觉成因的解释，用来支撑“这不是偶发 bug，而是架构后果”。

0.15 第 8 篇：ChatGPT 回答你的三秒钟里，发生了什么？#

前 1-7 篇文章站内交叉引用
- 类型：站内交叉引用
- 概述：这篇文章本身几乎不新增外部来源，而是综合前 1-7 篇已经建立的来源体系，完成第一章的总收束。

0.16 第 9 篇：AI 到底有多聪明？——一份让 AI 研究者也困惑的成绩单#

GPT-4 Technical Report
- URL: https://openai.com/index/gpt-4-research/
- 类型：官方博客/公告
- 时间：2023
- 层次：L0-范式层
- 机制：架构机制
- Tags: topic:architecture topic:reasoning layer:paradigm mechanism:architecture type:doc time:2023 status:foundational
- 概述：用作 GPT-4 在标准化考试中表现的官方来源，是“高分表现”部分的核心证据。
Introducing GPT-5.2
- URL: https://openai.com/index/introducing-gpt-5-2/
- 类型：官方博客/公告
- 概述：用于更新更近代模型在数学竞赛等场景的官方成绩描述。
GSM-Symbolic
- URL: https://machinelearning.apple.com/research/gsm-symbolic
- 类型：官方博客/公告
- 时间：2025
- 层次：L5-系统工程层
- 机制：评测机制、推理机制
- Tags: topic:reasoning topic:eval layer:system mechanism:evaluation mechanism:reasoning type:paper time:2025 status:frontier
- 概述：用来说明模型可能是在匹配题型，而不是真正掌握推理规则。
The Illusion of Thinking
- URL: https://machinelearning.apple.com/research/illusion-of-thinking
- 类型：官方博客/公告
- 时间：2025
- 层次：L5-系统工程层
- 机制：评测机制、推理机制
- Tags: topic:reasoning topic:eval layer:system mechanism:evaluation mechanism:reasoning type:paper time:2025 status:frontier
- 概述：支撑“复杂度稍高就崩溃”的批判性论证，是这一篇的关键反例来源。
ARC Prize 2025 Results Analysis
- URL: https://arcprize.org/blog/arc-prize-2025-results-analysis
- 类型：数据集/基准
- 时间：2025
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval topic:reasoning layer:system mechanism:evaluation type:benchmark time:2025 status:frontier
- 概述：用于说明当前模型在抽象推理与组合泛化上仍明显落后。
MMLU-Pro Leaderboard
- URL: https://artificialanalysis.ai/evaluations/mmlu-pro
- 类型：数据集/基准
- 时间：2025
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval topic:reasoning layer:system mechanism:evaluation type:benchmark time:2025 status:frontier
- 概述：给出更难 benchmark 上的横向比较结果。
Humanity’s Last Exam
- URL: https://agi.safe.ai/
- 类型：数据集/基准
- 时间：2025
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval topic:reasoning layer:system mechanism:evaluation type:benchmark time:2025 status:frontier
- 概述：用来强调“更难评测”的提出本身，就是对旧 benchmark 被做穿的回应。

0.17 第 10 篇：AI 能“创造”吗？——从一团噪声到一幅画#

Denoising Diffusion Probabilistic Models
- URL: https://arxiv.org/abs/2006.11239
- 类型：论文
- 时间：2020
- 层次：L1-模型机制层
- 机制：架构机制
- Tags: topic:multimodal layer:model mechanism:architecture type:paper time:2020 status:foundational
- 概述：本篇最核心的技术来源，用来解释 AI 绘画本质上是“去噪”而不是“凭空作画”。
Latent Diffusion Models
- URL: https://arxiv.org/abs/2112.10752
- 类型：论文
- 时间：2021
- 层次：L1-模型机制层
- 机制：架构机制
- Tags: topic:multimodal layer:model mechanism:architecture type:paper time:2021 status:foundational
- 概述：解释 Stable Diffusion 这类模型如何在潜在空间中高效生成图像。
CLIP
- URL: https://openai.com/index/clip/
- 类型：官方博客/公告
- 时间：2021
- 层次：L1-模型机制层
- 机制：多模态机制
- Tags: topic:multimodal layer:model mechanism:multimodal type:doc time:2021 status:foundational
- 概述：用于说明文本如何成为图像生成的指导信号。
Why AI hands are nightmares
- URL: https://www.sciencefocus.com/future-technology/why-ai-generated-hands-are-the-stuff-of-nightmares-explained-by-a-scientist
- 类型：新闻报道
- 概述：用“画不好手”这一失败案例具体说明生成模型并不真正理解三维结构。
Stable Diffusion with Diffusers
- URL: https://huggingface.co/blog/stable_diffusion
- 类型：教程/课程
- 概述：补充开发者视角的工程实现路径。

0.18 第 11 篇：为什么 AI 能赢世界冠军，却开不好车？#

AlphaGo versus Lee Sedol
- URL: https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
- 类型：百科/维基
- 时间：2016
- 层次：L0-范式层
- 机制：搜索机制
- Tags: topic:reasoning layer:paradigm mechanism:search type:wiki time:2016 status:reference
- 概述：提供围棋世界冠军对战的背景与结果，是“封闭规则任务中 AI 极强”的代表案例。
Highly accurate protein structure prediction with AlphaFold
- URL: https://www.nature.com/articles/s41586-021-03819-2
- 类型：论文
- 时间：2021
- 层次：L0-范式层
- 机制：架构机制
- Tags: topic:architecture topic:multimodal layer:paradigm mechanism:architecture type:paper time:2021 status:foundational
- 概述：支撑 AlphaFold 在蛋白质结构预测上的突破性表现，是“结构明确任务中 AI 极强”的另一支点。
Nobel Prize in Chemistry 2024
- URL: https://www.nobelprize.org/prizes/chemistry/2024/press-release/
- 类型：机构报告/政策
- 概述：证明 AlphaFold 类工作的科学影响力已获最高级别认可。
Tesla Vehicle Safety Report
- URL: https://www.tesla.com/VehicleSafetyReport
- 类型：官方文档
- 概述：提供自动驾驶在常规场景下的安全性数据背景。
Waymo crash data comparison
- URL: https://waymo.com/research/comparison-of-waymo-rider-only-crash-data-to-human/
- 类型：官方文档
- 概述：用于对比人类驾驶与自动驾驶系统在现实道路中的表现。
List of Tesla Autopilot crashes
- URL: https://en.wikipedia.org/wiki/List_of_Tesla_Autopilot_crashes
- 类型：百科/维基
- 概述：作为长尾失败案例入口，帮助文章从“平均安全”转向“极端风险”。
skin cancer diagnosis meta-analysis
- URL: https://www.nature.com/articles/s41746-024-01103-x
- 类型：论文
- 概述：说明医疗影像类标准化任务为什么是 AI 的强项。
Whisper
- URL: https://arxiv.org/abs/2212.04356
- 类型：论文
- 时间：2022
- 层次：L1-模型机制层
- 机制：语音理解机制
- Tags: topic:multimodal topic:speech layer:model mechanism:speech-understanding type:paper time:2022 status:foundational
- 概述：作为语音识别强场景的代表，补充“AI 擅长什么”的能力谱系。

0.19 第 12 篇：这个框架会过时吗？——AI 的天花板和你的判断力#

Performance of ChatGPT on USMLE
- URL: https://pmc.ncbi.nlm.nih.gov/articles/PMC9931230/
- 类型：论文
- 时间：2023
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval topic:reasoning layer:system mechanism:evaluation type:paper time:2023 status:reference
- 概述：支撑“AI 能通过医学考试”这一热点叙事，但也为后续反思提供前提。
AI tools tackle paper mill fraud
- URL: https://www.chemistryworld.com/features/ai-tools-tackle-paper-mill-fraud-overwhelming-peer-review/4022253.article
- 类型：新闻报道
- 概述：说明 AI 在论文文本和评审场景中的双刃剑效应。
The case against predicting tokens to build AGI
- URL: https://the-decoder.com/the-case-against-predicting-tokens-to-build-agi/
- 类型：新闻报道
- 概述：引入 Yann LeCun 对“预测下一个 token”路线的根本性质疑。
The Great AI Retrenchment has begun
- URL: https://garymarcus.substack.com/p/the-great-ai-retrenchment-has-begun
- 类型：教程/课程
- 概述：作为“统计模式匹配无法产生真正理解”路线的批评代表。
AGI by 2032 is extremely unlikely
- URL: https://forum.effectivealtruism.org/posts/sQSCqpm9Ymwiu8rdb/agi-by-2032-is-extremely-unlikely
- 类型：教程/课程
- 概述：用于平衡激进 AGI 时间线的流行叙事。
ARC Prize 2025 Results Analysis
- URL: https://arcprize.org/blog/arc-prize-2025-results-analysis
- 类型：数据集/基准
- 概述：作为“是否越过真正抽象推理门槛”的探针 benchmark。

0.20 第 13 篇：怎么让 AI 听懂你的话——同一个 AI 为什么他用得比你好 10 倍#

Chain-of-Thought Prompting
- URL: https://arxiv.org/abs/2201.11903
- 类型：论文
- 时间：2022
- 层次：L3-推理与解码层
- 机制：推理机制
- Tags: topic:reasoning topic:cot layer:inference mechanism:reasoning type:paper time:2022 status:foundational
- 概述：支撑“显式展示中间步骤会提升推理表现”的主论点。
Language Models are Few-Shot Learners
- URL: https://arxiv.org/abs/2005.14165
- 类型：论文
- 概述：为 few-shot prompting 的有效性提供历史来源。
Self-Consistency
- URL: https://arxiv.org/abs/2203.11171
- 类型：论文
- 时间：2022
- 层次：L3-推理与解码层
- 机制：推理机制、搜索机制
- Tags: topic:reasoning topic:cot layer:inference mechanism:reasoning mechanism:search type:paper time:2022 status:foundational
- 概述：说明多次推理取一致答案的价值。
The Prompt Report
- URL: https://arxiv.org/abs/2406.06608
- 类型：论文
- 概述：作为提示工程综述，为整篇文章提供系统化背景。
When “A Helpful Assistant” Is Not Really Helpful
- URL: https://arxiv.org/abs/2311.10054
- 类型：论文
- 概述：提醒角色设定和风格提示并不必然提升事实准确性。
Playing Pretend: Expert Personas Don’t Improve Factual Accuracy
- URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5879722
- 类型：论文
- 概述：进一步支持“角色提示影响风格多于事实正确性”的判断。
The Impact of Prompt Bloat on LLM Output Quality
- URL: https://mlops.community/the-impact-of-prompt-bloat-on-llm-output-quality/
- 类型：教程/课程
- 概述：用工程视角说明“提示词不是越长越好”。

0.21 第 14 篇：怎么跟 AI 协作不翻车——AI 说的话你该信几分#

Legal Dive: fake legal cases generated by ChatGPT
- URL: https://www.legaldive.com/news/chatgpt-fake-legal-cases-generative-ai-hallucinations/651557/
- 类型：新闻报道
- 概述：作为“AI 不能盲信”的法律案例入口。
As more lawyers fall for AI hallucinations
- URL: https://cronkitenews.azpbs.org/2025/10/28/lawyers-ai-hallucinations-chatgpt/
- 类型：新闻报道
- 概述：说明法律场景中的幻觉事故并非一次性事件。
AI Hallucination Rates by Model
- URL: https://www.visualcapitalist.com/sp/ter02-ranked-ai-hallucination-rates-by-model/
- 类型：数据集/基准
- 概述：用模型横向比较帮助读者理解幻觉是系统性问题。
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
- URL: https://dho.stanford.edu/wp-content/uploads/Legal_RAG_Hallucinations.pdf
- 类型：论文
- 概述：说明即便专业工具也无法保证无幻觉。
Delayed diagnosis caused by ChatGPT
- URL: https://pmc.ncbi.nlm.nih.gov/articles/PMC11006786/
- 类型：论文
- 概述：把协作风险延伸到医疗高风险场景。
automation bias
- URL: https://link.springer.com/article/10.1007/s00146-025-02422-7
- 类型：论文
- 概述：解释为什么人类即使知道 AI 可能出错，也仍会倾向于相信它。
MIT Sloan: humans and AI work best together
- URL: https://mitsloan.mit.edu/ideas-made-to-matter/when-humans-and-ai-work-best-together-and-when-each-better-alone
- 类型：机构报告/政策
- 概述：帮助文章形成“AI 是强助手，但不该替你做最终判断”的协作框架。

0.22 第 15 篇：AI 写代码有多厉害？——快了 55%，但错多了 75%#

Karpathy: The hottest new programming language is English
- URL: https://x.com/karpathy/status/1617979122625712128
- 类型：社交媒体/公开发言
- 概述：作为自然语言编程文化转向的标志性表达。
Karpathy: vibe coding
- URL: https://x.com/karpathy/status/1886192184808149383
- 类型：社交媒体/公开发言
- 概述：是本篇文章的文化中心，标记 AI 编程新范式的命名。
The Impact of AI on Developer Productivity: Evidence from GitHub Copilot
- URL: https://arxiv.org/abs/2302.06590
- 类型：论文
- 概述：为“快了 55%”提供学术来源。
State of AI vs Human Code Generation Report
- URL: https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report
- 类型：机构报告/政策
- 概述：为“逻辑错误高出 75%”提供关键行业数据来源。
State of AI Code Quality in 2025
- URL: https://www.qodo.ai/reports/state-of-ai-code-quality/
- 类型：机构报告/政策
- 概述：说明 AI 代码质量的核心问题往往不是语法，而是项目上下文缺失。
METR open-source developer productivity study
- URL: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
- 类型：机构报告/政策
- 概述：提供“真实大型项目里未必更快”的重要反例。
TechCrunch: YC current cohort AI-generated codebases
- URL: https://techcrunch.com/2025/03/06/a-quarter-of-startups-in-ycs-current-cohort-have-codebases-that-are-almost-entirely-ai-generated/
- 类型：新闻报道
- 概述：用创业公司案例说明 AI 代码生成已经进入真实生产链。
BLS employment projections
- URL: https://www.bls.gov/opub/ted/2025/ai-impacts-in-bls-employment-projections.htm
- 类型：机构报告/政策
- 概述：用于讨论 AI 对软件岗位结构的现实影响。

0.23 第 16 篇：AI 会取代我们吗——它不懂孤独是什么意思#

Chinese room
- URL: https://en.wikipedia.org/wiki/Chinese_room
- 类型：百科/维基
- 概述：作为哲学锚点，支撑“会生成语言不等于理解语言”的命题。
Hubert Dreyfus' views on artificial intelligence
- URL: https://en.wikipedia.org/wiki/Hubert_Dreyfus%27s_views_on_artificial_intelligence
- 类型：百科/维基
- 概述：引入具身认知和反符号主义批评，帮助文章上升到哲学层面。
Embodied Cognition and AI
- URL: https://royalsocietypublishing.org/doi/10.1098/rstb.2019.0694
- 类型：论文
- 概述：从认知科学角度支撑“人类理解与身体经验深度绑定”的论点。
Large Language Models are Geographically Biased
- URL: https://arxiv.org/abs/2310.12931
- 类型：论文
- 概述：说明语言模型并未真正建立世界模型，而是在语言分布上拟合规律。
The Future of Jobs Report 2025
- URL: https://www.weforum.org/publications/the-future-of-jobs-report-2025/
- 类型：机构报告/政策
- 概述：把哲学层讨论落回到就业结构变化。
Generative AI and the future of work in America
- URL: https://www.mckinsey.com/mgi/our-research/generative-ai-and-the-future-of-work-in-america
- 类型：机构报告/政策
- 概述：支撑“AI 替代的是工作中的部分任务，而非整个职业”的现实判断。
The debate over understanding in AI’s large language models
- URL: https://www.pnas.org/doi/10.1073/pnas.2215907120
- 类型：论文
- 概述：为“LLM 是否真正理解”的高层学术争论提供权威入口。

第三部分：标准化核心参考图谱（按方向 / 时间线 / 层次 / 机制）#

这一部分是整份文件的 规范层。与前两部分“按文章回溯来源”不同，这里把关键资源统一标准化，所有关键条目都尽量包含：时间 / 层次 / 机制 / Tags。

0.24 标准字段说明#

时间：优先使用 年份 或 年份 + 季度，例如 2017、2025 Q1、2026 Q1
层次：
- L0-范式层：整体范式、任务框架、技术路线
- L1-模型机制层：模型内部结构、注意力、位置编码、架构创新
- L2-训练与对齐层：预训练、后训练、偏好优化、微调
- L3-推理与解码层：CoT、搜索、采样、解码、test-time compute
- L4-知识与工具层：RAG、Tool Use、MCP、A2A、外部系统接入
- L5-系统工程层：serving、部署、观测、评测、性能优化、本地运行
- L6-安全与治理层：攻击、防御、合规、组织治理
机制：对资源主要作用点的归类，如 注意力机制、推理机制、检索机制、协议机制
Tags：统一使用：
- topic:*
- layer:*
- mechanism:*
- type:*
- time:*
- status:*

0.25 基础架构与模型机制#

时间线#

2017 Transformer
2019-2023 MQA / GQA / 稀疏与高效注意力
2021-2024 RoPE / 长上下文位置编码扩展
2022 DeepNet / 深层稳定训练
2026 Q1 Attention Residuals

核心资源#

Attention Is All You Need
- URL: https://arxiv.org/abs/1706.03762
- 类型：论文
- 时间：2017
- 层次：L1-模型机制层
- 机制：注意力机制、架构机制
- Tags: topic:architecture topic:transformer layer:model mechanism:attention mechanism:architecture type:paper time:2017 status:foundational
- 概述：Transformer 原点，定义了自注意力与并行化建模，是整个现代 LLM 体系的起点。
RoFormer: Enhanced Transformer with Rotary Position Embedding
- URL: https://arxiv.org/abs/2104.09864
- 类型：论文
- 时间：2021
- 层次：L1-模型机制层
- 机制：位置编码机制
- Tags: topic:architecture topic:long-context layer:model mechanism:position-encoding type:paper time:2021 status:foundational
- 概述：RoPE 是近年主流开源与商用模型广泛采用的位置编码机制，也是长上下文扩展技术的重要基底。
YaRN: Efficient Context Window Extension of Large Language Models
- URL: https://arxiv.org/abs/2309.00071
- 类型：论文
- 时间：2023
- 层次：L1-模型机制层
- 机制：位置编码机制、长上下文机制
- Tags: topic:long-context layer:model mechanism:position-encoding mechanism:long-context type:paper time:2023 status:frontier
- 概述：代表 RoPE 外推与长上下文扩展的一条重要工程路线，适合与 Long Context 相关文章配套阅读。
DeepNet: Scaling Transformers to 1,000 Layers
- URL: https://arxiv.org/abs/2203.00555
- 类型：论文
- 时间：2022
- 层次：L1-模型机制层
- 机制：架构机制、深层稳定性机制
- Tags: topic:architecture topic:scaling layer:model mechanism:architecture mechanism:stability type:paper time:2022 status:foundational
- 概述：深层 Transformer 稳定训练谱系中的关键论文，适合与 AttnRes 一起建立“为什么深层会稀释”的背景。
Fast Transformer Decoding: One Write-Head is All You Need
- URL: https://arxiv.org/abs/1911.06507
- 类型：论文
- 时间：2019
- 层次：L1-模型机制层
- 机制：注意力机制、推理加速机制
- Tags: topic:architecture topic:inference layer:model mechanism:attention mechanism:inference-acceleration type:paper time:2019 status:foundational
- 概述：MQA 的代表性来源，直接影响后续长上下文和推理性能优化。
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
- URL: https://arxiv.org/abs/2305.13245
- 类型：论文
- 时间：2023
- 层次：L1-模型机制层
- 机制：注意力机制、推理加速机制
- Tags: topic:architecture topic:inference layer:model mechanism:attention mechanism:inference-acceleration type:paper time:2023 status:foundational
- 概述：GQA 是 MHA 与 MQA 的折中路线，已经成为很多现代模型的默认工程配置。
FlashAttention
- URL: https://arxiv.org/abs/2205.14135
- 类型：论文
- 时间：2022
- 层次：L1-模型机制层
- 机制：注意力机制、推理加速机制
- Tags: topic:attention topic:serving layer:model mechanism:attention mechanism:inference-acceleration type:paper time:2022 status:foundational
- 概述：把 IO-aware 思想带入注意力计算，是训练与推理性能优化的关键工作。
Attention Residuals
- URL: https://arxiv.org/abs/2603.15031
- 类型：论文
- 时间：2026 Q1
- 层次：L1-模型机制层
- 机制：注意力机制、架构机制、深层稳定性机制
- Tags: topic:architecture topic:reasoning layer:model mechanism:attention mechanism:architecture mechanism:stability type:paper time:2026Q1 status:frontier
- 概述：Kimi 团队提出的 AttnRes 用内容相关的深度注意力替代固定残差累加，目标是缓解 PreNorm 深层网络中的信息稀释问题。
MoonshotAI/Attention-Residuals
- URL: https://github.com/MoonshotAI/Attention-Residuals
- 类型：代码仓库
- 时间：2026 Q1
- 层次：L1-模型机制层
- 机制：注意力机制、架构机制
- Tags: topic:architecture topic:kimi layer:model mechanism:attention mechanism:architecture type:repo time:2026Q1 status:frontier
- 概述：AttnRes 官方仓库，包含论文 PDF、结构示意图和 Block AttnRes 伪代码，是跟进这条新架构路线的第一手工程入口。

0.26 推理机制与测试时计算#

时间线#

2022 CoT / Zero-shot CoT / Self-Consistency / ReAct
2023 Tree of Thoughts / verifier / deliberate reasoning 扩展
2024-2025 o 系列、R1、reasoning models 商用化
2025-2026 reasoning 批判、互动式 reasoning benchmark、新的评测框架

核心资源#

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- URL: https://arxiv.org/abs/2201.11903
- 类型：论文
- 时间：2022
- 层次：L3-推理与解码层
- 机制：推理机制
- Tags: topic:reasoning topic:cot layer:inference mechanism:reasoning type:paper time:2022 status:foundational
- 概述：思维链奠基论文，是 prompting、reasoning、agent planning 三条线共同的前史。
Large Language Models are Zero-Shot Reasoners
- URL: https://arxiv.org/abs/2205.11916
- 类型：论文
- 时间：2022
- 层次：L3-推理与解码层
- 机制：推理机制
- Tags: topic:reasoning topic:cot layer:inference mechanism:reasoning type:paper time:2022 status:foundational
- 概述：证明即使不给示例，简单触发语句也能显著提升推理质量，是 reasoning 技巧普及的重要起点。
Self-Consistency Improves Chain of Thought Reasoning in Language Models
- URL: https://arxiv.org/abs/2203.11171
- 类型：论文
- 时间：2022
- 层次：L3-推理与解码层
- 机制：推理机制、搜索机制
- Tags: topic:reasoning topic:cot layer:inference mechanism:reasoning mechanism:search type:paper time:2022 status:foundational
- 概述：把“多次推理再投票”系统化，是 test-time compute 的早期代表思路。
Tree of Thoughts
- URL: https://arxiv.org/abs/2305.10601
- 类型：论文
- 时间：2023
- 层次：L3-推理与解码层
- 机制：推理机制、搜索机制
- Tags: topic:reasoning topic:search layer:inference mechanism:reasoning mechanism:search type:paper time:2023 status:foundational
- 概述：把线性思维链扩展到树搜索，是结构化 deliberation 的经典入口。
ReAct
- URL: https://arxiv.org/abs/2210.03629
- 类型：论文
- 时间：2022
- 层次：L3-推理与解码层
- 机制：推理机制、工具调用机制
- Tags: topic:reasoning topic:agent layer:inference mechanism:reasoning mechanism:tool-use type:paper time:2022 status:foundational
- 概述：把 reasoning 与 acting 结合起来，是 Agent 路线最重要的原型论文之一。
Learning to Reason with LLMs
- URL: https://openai.com/index/learning-to-reason-with-llms/
- 类型：官方博客/公告
- 时间：2024-2025
- 层次：L3-推理与解码层
- 机制：推理机制、测试时计算机制
- Tags: topic:reasoning topic:test-time-compute layer:inference mechanism:reasoning mechanism:search type:doc time:2025 status:frontier
- 概述：OpenAI 对 reasoning models 路线的公开解释材料，适合连接 CoT 方法史与产品化 reasoning models。
Introducing o3 and o4-mini
- URL: https://openai.com/index/introducing-o3-and-o4-mini/
- 类型：官方博客/公告
- 时间：2025
- 层次：L3-推理与解码层
- 机制：推理机制、测试时计算机制
- Tags: topic:reasoning topic:o-series layer:inference mechanism:reasoning mechanism:test-time-compute type:doc time:2025 status:frontier
- 概述：推理模型商用化的重要节点，适合作为 reasoning 时代的产品化时间戳。
DeepSeek-R1
- URL: https://arxiv.org/abs/2501.12948
- 类型：论文
- 时间：2025 Q1
- 层次：L2-训练与对齐层
- 机制：对齐机制、推理训练机制
- Tags: topic:reasoning topic:alignment layer:training mechanism:alignment mechanism:reasoning type:paper time:2025Q1 status:frontier
- 概述：是开源推理模型与后训练强化学习结合的重要节点，也适合作为 reasoning 训练路线的核心材料。
GSM-Symbolic
- URL: https://machinelearning.apple.com/research/gsm-symbolic
- 类型：官方博客/公告
- 时间：2025
- 层次：L5-系统工程层
- 机制：评测机制、推理批判机制
- Tags: topic:reasoning topic:eval layer:system mechanism:evaluation mechanism:reasoning type:paper time:2025 status:frontier
- 概述：提醒 reasoning benchmark 的高分并不必然意味着真正的抽象理解，是思维链路线的重要反思资料。
The Illusion of Thinking
- URL: https://machinelearning.apple.com/research/illusion-of-thinking
- 类型：官方博客/公告
- 时间：2025
- 层次：L5-系统工程层
- 机制：评测机制、推理批判机制
- Tags: topic:reasoning topic:eval layer:system mechanism:evaluation mechanism:reasoning type:paper time:2025 status:frontier
- 概述：用更严苛任务设定重新审视 reasoning models 的边界，是 2025 年必须保留的批判性材料。
ARC Prize
- URL: https://arcprize.org/
- 类型：数据集/基准
- 时间：持续更新
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval topic:reasoning layer:system mechanism:evaluation type:benchmark status:frontier
- 概述：抽象推理与组合泛化 benchmark，是 reasoning 讨论里最值得长期跟踪的 harder benchmark 之一。
ARC-AGI-3
- URL: https://arcprize.org/arc-agi/3
- 类型：数据集/基准
- 时间：2026 Q1
- 层次：L5-系统工程层
- 机制：评测机制、交互式推理机制
- Tags: topic:eval topic:interactive-reasoning layer:system mechanism:evaluation mechanism:reasoning type:benchmark time:2026Q1 status:frontier
- 概述：ARC-AGI-3 把 benchmark 从静态题目推进到交互式推理环境，是 agentic reasoning 时代值得重点关注的新评测方向。

0.27 RAG、长上下文与知识增强#

时间线#

2020 RAG / Longformer
2021-2023 RoPE / YaRN / 长上下文扩展工程化
2023-2024 long context 使用边界开始被系统评测，Self-RAG / CRAG 出现
2024 GraphRAG / Contextual Retrieval
2025-2026 long context 与 RAG 的重新分工、rerank / hybrid retrieval 再受重视

核心资源#

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- URL: https://arxiv.org/abs/2005.11401
- 类型：论文
- 时间：2020
- 层次：L4-知识与工具层
- 机制：检索机制
- Tags: topic:rag topic:retrieval layer:knowledge mechanism:retrieval type:paper time:2020 status:foundational
- 概述：RAG 的起点，定义了“先查资料再生成”的标准框架。
Longformer
- URL: https://arxiv.org/abs/2004.05150
- 类型：论文
- 时间：2020
- 层次：L1-模型机制层
- 机制：长上下文机制、注意力机制
- Tags: topic:long-context layer:model mechanism:attention mechanism:long-context type:paper time:2020 status:foundational
- 概述：代表长文档建模路线的早期经典工作，适合和 RAG 起点形成对照。
Lost in the Middle: How Language Models Use Long Contexts
- URL: https://arxiv.org/abs/2307.03172
- 类型：论文
- 时间：2023
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:long-context topic:eval layer:system mechanism:evaluation type:paper time:2023 status:foundational
- 概述：这是长上下文能力讨论里最常被引用的反例论文之一，说明“上下文窗口很大”不等于“模型能稳定利用上下文中间信息”。
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- URL: https://arxiv.org/abs/2310.11511
- 类型：论文
- 时间：2023
- 层次：L4-知识与工具层
- 机制：检索机制、推理机制
- Tags: topic:rag topic:reasoning layer:knowledge mechanism:retrieval mechanism:reasoning type:paper time:2023 status:foundational
- 概述：Self-RAG 让模型在检索、生成和自我批判之间形成闭环，是 RAG 从“外接资料”走向“带反思控制的知识增强系统”的代表工作。
Corrective Retrieval Augmented Generation
- URL: https://arxiv.org/abs/2401.15884
- 类型：论文
- 时间：2024
- 层次：L4-知识与工具层
- 机制：检索机制、重排机制
- Tags: topic:rag topic:retrieval layer:knowledge mechanism:retrieval mechanism:reranking type:paper time:2024 status:frontier
- 概述：CRAG 把“检索结果本身可能有问题”纳入系统设计，是 RAG 从简单拼接资料走向检索纠错和质量控制的重要节点。
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
- URL: https://arxiv.org/abs/2308.14508
- 类型：论文
- 时间：2023
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:long-context topic:eval layer:system mechanism:evaluation type:paper time:2023 status:foundational
- 概述：LongBench 是长上下文能力评测的高频基准之一，适合用来补齐“长 context 能力不能只看厂商公告”的验证层。
GraphRAG
- URL: https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
- 类型：官方博客/公告
- 时间：2024
- 层次：L4-知识与工具层
- 机制：检索机制、图结构知识机制
- Tags: topic:rag topic:graphrag layer:knowledge mechanism:retrieval mechanism:knowledge-graph type:doc time:2024 status:frontier
- 概述：把关系结构与叙事文档引入检索层，代表 RAG 从“向量召回”走向“结构化知识发现”。
Introducing Contextual Retrieval
- URL: https://www.anthropic.com/news/contextual-retrieval
- 类型：官方博客/公告
- 时间：2024
- 层次：L4-知识与工具层
- 机制：检索机制、重排机制
- Tags: topic:rag topic:retrieval layer:knowledge mechanism:retrieval mechanism:reranking type:doc time:2024 status:frontier
- 概述：强调 chunk 之外的上下文增强，是提升 RAG 召回质量的工程型参考。
LangChain RAG 文档
- URL: https://python.langchain.com/docs/tutorials/rag/
- 类型：官方文档
- 时间：持续更新
- 层次：L4-知识与工具层
- 机制：检索机制
- Tags: topic:rag layer:knowledge mechanism:retrieval type:doc status:engineering
- 概述：适合把 RAG 从论文概念推进到应用原型。
LlamaIndex 文档
- URL: https://docs.llamaindex.ai/
- 类型：官方文档
- 时间：持续更新
- 层次：L4-知识与工具层
- 机制：检索机制、索引机制
- Tags: topic:rag layer:knowledge mechanism:retrieval mechanism:indexing type:doc status:engineering
- 概述：更偏文档工程与索引层，是知识增强应用的重要实现资料。
Cohere Rerank Guide
- URL: https://www.cohere.com/documents/rerank-guide.pdf
- 类型：教程/课程
- 时间：2024-2025
- 层次：L4-知识与工具层
- 机制：重排机制
- Tags: topic:rag topic:reranking layer:knowledge mechanism:reranking type:doc status:engineering
- 概述：适合补齐“召回之后如何排序”这一 RAG 实践里常被忽视但很关键的层次。

0.28 Agent、工具调用与协议#

时间线#

2022 MRKL / ReAct
2023 Generative Agents / Toolformer / Memory 扩展
2024 MCP
2025-2026 A2A / Computer Use / tool-native agent protocols

核心资源#

MRKL Systems
- URL: https://arxiv.org/abs/2205.00445
- 类型：论文
- 时间：2022
- 层次：L4-知识与工具层
- 机制：工具调用机制、模块化机制
- Tags: topic:agent topic:tool-use layer:knowledge mechanism:tool-use mechanism:modularity type:paper time:2022 status:foundational
- 概述：把 LLM 与工具系统组合成模块化架构，是 Agent 工程很早的思想来源。
Toolformer
- URL: https://arxiv.org/abs/2302.04761
- 类型：论文
- 时间：2023
- 层次：L4-知识与工具层
- 机制：工具调用机制
- Tags: topic:tool-use topic:agent layer:knowledge mechanism:tool-use type:paper time:2023 status:foundational
- 概述：说明模型如何在训练层面学会使用工具，是 Tool Use 方向的重要早期论文。
Generative Agents
- URL: https://arxiv.org/abs/2304.03442
- 类型：论文
- 时间：2023
- 层次：L4-知识与工具层
- 机制：记忆机制、Agent 机制
- Tags: topic:agent topic:memory layer:knowledge mechanism:memory mechanism:agent type:paper time:2023 status:foundational
- 概述：记忆、反思、行为连续性在 Agent 系统中的经典代表。
OpenAI Function Calling
- URL: https://platform.openai.com/docs/guides/function-calling
- 类型：官方文档
- 时间：持续更新
- 层次：L4-知识与工具层
- 机制：工具调用机制
- Tags: topic:tool-use layer:knowledge mechanism:tool-use type:doc status:engineering
- 概述：主流平台的结构化工具调用接口，是“模型做决策，应用做执行”的基础工程资料。
Anthropic Tool Use
- URL: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
- 类型：官方文档
- 时间：持续更新
- 层次：L4-知识与工具层
- 机制：工具调用机制
- Tags: topic:tool-use layer:knowledge mechanism:tool-use type:doc status:engineering
- 概述：Claude 体系下的 Tool Use 文档，是 OpenAI 路线的重要补充。
Anthropic Computer Use
- URL: https://docs.anthropic.com/en/docs/agents-and-tools/computer-use
- 类型：官方文档
- 时间：2025
- 层次：L4-知识与工具层
- 机制：工具调用机制、环境交互机制
- Tags: topic:agent topic:computer-use layer:knowledge mechanism:tool-use mechanism:environment-interaction type:doc time:2025 status:frontier
- 概述：把工具使用从 API 层推进到 GUI/桌面环境，是 agentic systems 的重要边界扩展。
Model Context Protocol (MCP)
- URL: https://modelcontextprotocol.io/
- 类型：官方文档
- 时间：2024-2026
- 层次：L4-知识与工具层
- 机制：协议机制
- Tags: topic:mcp topic:agent layer:knowledge mechanism:protocol type:doc time:2025 status:foundational
- 概述：MCP 已成为 AI 应用连接外部工具、资源和工作流的重要标准接口，是协议层必须保留的核心资源。
A2A Protocol
- URL: https://google.github.io/A2A/
- 类型：官方文档
- 时间：2025
- 层次：L4-知识与工具层
- 机制：协议机制、Agent 协作机制
- Tags: topic:a2a topic:agent layer:knowledge mechanism:protocol mechanism:agent-collaboration type:doc time:2025 status:frontier
- 概述：把“工具协议”进一步扩展到“Agent 与 Agent 通信”，是多智能体标准化方向的重要资料。

0.29 推理系统、部署工程与本地运行#

时间线#

2023 PagedAttention / vLLM
2024 SGLang / llama.cpp 工程成熟
2025 TensorRT-LLM、SGLang、vLLM 在生产级场景进一步稳定
2026 Q1 本地模型管理、长上下文 serving、推理模型 serving、多模态 serving 继续融合

核心资源#

Efficient Memory Management for Large Language Model Serving with PagedAttention
- URL: https://arxiv.org/abs/2309.06180
- 类型：论文
- 时间：2023
- 层次：L5-系统工程层
- 机制：部署服务机制、KV cache 机制
- Tags: topic:serving topic:vllm layer:system mechanism:serving mechanism:kv-cache type:paper time:2023 status:foundational
- 概述：PagedAttention 是现代 LLM serving 栈的标志性工作，也是 vLLM 理论核心。
vLLM
- URL: https://github.com/vllm-project/vllm
- 类型：代码仓库
- 时间：2023-2026
- 层次：L5-系统工程层
- 机制：部署服务机制、KV cache 机制、连续批处理机制
- Tags: topic:serving topic:vllm layer:system mechanism:serving mechanism:kv-cache mechanism:continuous-batching type:repo status:foundational
- 概述：当前最重要的开源 LLM serving 引擎之一，已经把 PagedAttention、prefix caching、LoRA、reasoning outputs、多模态和 OpenAI-compatible serving 融合为一套成熟工程栈。
vLLM Docs
- URL: https://docs.vllm.ai/
- 类型：官方文档
- 时间：2026
- 层次：L5-系统工程层
- 机制：部署服务机制、可观测机制、多模态 serving 机制
- Tags: topic:serving topic:vllm layer:system mechanism:serving mechanism:observability mechanism:multimodal-serving type:doc time:2026 status:frontier
- 概述：到 2026 年，vLLM 文档已经覆盖 reasoning outputs、MCP tools、OpenTelemetry、多模态和分布式 serving，是非常值得长期跟踪的工程资料源。
SGLang
- URL: https://github.com/sgl-project/sglang
- 类型：代码仓库
- 时间：2024-2026
- 层次：L5-系统工程层
- 机制：部署服务机制、结构化解码机制、前缀缓存机制
- Tags: topic:serving topic:sglang layer:system mechanism:serving mechanism:structured-decoding mechanism:prefix-caching type:repo status:frontier
- 概述：SGLang 已从高性能 serving engine 发展成覆盖 reasoning parser、structured outputs、RL rollout backend、多模态 serving 的综合 runtime。
SGLang Docs
- URL: https://docs.sglang.io/
- 类型：官方文档
- 时间：2026
- 层次：L5-系统工程层
- 机制：部署服务机制、结构化解码机制、多模态 serving 机制
- Tags: topic:serving topic:sglang layer:system mechanism:serving mechanism:structured-decoding mechanism:multimodal-serving type:doc time:2026 status:frontier
- 概述：到 2026 年，SGLang 文档已经把长上下文、多模态、reasoning parser、observability、RL 系统与 serving 主栈连成一个完整体系。
TensorRT-LLM
- URL: https://github.com/NVIDIA/TensorRT-LLM
- 类型：代码仓库
- 时间：2023-2026
- 层次：L5-系统工程层
- 机制：部署服务机制、推理加速机制、量化机制
- Tags: topic:serving topic:tensorrt-llm layer:system mechanism:serving mechanism:inference-acceleration mechanism:quantization type:repo status:frontier
- 概述：生产级 GPU 集群部署的重要路线，尤其适合补齐 Blackwell、MoE、长上下文和 inference-time compute 的优化资料。
llama.cpp
- URL: https://github.com/ggml-org/llama.cpp
- 类型：代码仓库
- 时间：2023-2026
- 层次：L5-系统工程层
- 机制：本地推理机制、量化机制、边缘部署机制
- Tags: topic:local-llm topic:serving layer:system mechanism:edge-inference mechanism:quantization mechanism:serving type:repo status:foundational
- 概述：本地推理生态的，适合补齐 GGUF、低比特量化、边缘部署与跨平台本地运行这一条路线。
Ollama
- URL: https://github.com/ollama/ollama
- 类型：代码仓库
- 时间：2023-2026
- 层次：L5-系统工程层
- 机制：本地模型管理机制、部署服务机制
- Tags: topic:local-llm topic:ollama layer:system mechanism:model-management mechanism:serving type:repo status:engineering
- 概述：Ollama 显著降低了本地模型管理、API 暴露与开发者接入成本，已经成为本地模型工程的默认入口之一。
Ollama API
- URL: https://docs.ollama.com/api
- 类型：官方文档
- 时间：2026
- 层次：L5-系统工程层
- 机制：部署服务机制、本地 API 机制
- Tags: topic:local-llm topic:ollama layer:system mechanism:serving mechanism:api type:doc time:2026 status:engineering
- 概述：适合作为本地模型 API 化的标准入口，尤其适合搭配 Open WebUI、Continue、Open WebUI 等生态工具使用。
Open WebUI
- URL: https://github.com/open-webui/open-webui
- 类型：代码仓库
- 时间：2024-2026
- 层次：L5-系统工程层
- 机制：本地部署机制、RAG 集成机制、可观测机制
- Tags: topic:local-llm topic:ui layer:system mechanism:serving mechanism:retrieval mechanism:observability type:repo status:engineering
- 概述：Open WebUI 已成为本地模型、RAG、Ollama/OpenAI-compatible API 结合的高频 UI 层方案，也适合补入“本地部署实践”方向。

0.30 训练、微调与对齐#

时间线#

2021 LoRA
2022 RLHF / InstructGPT / Constitutional AI
2023 QLoRA / DPO
2025 DeepSeek R1 / reasoning post-training 成为主线

核心资源#

Training language models to follow instructions with human feedback
- URL: https://arxiv.org/abs/2203.02155
- 类型：论文
- 时间：2022
- 层次：L2-训练与对齐层
- 机制：对齐机制
- Tags: topic:alignment topic:rlhf layer:training mechanism:alignment type:paper time:2022 status:foundational
- 概述：RLHF 进入产品主线的标志性论文。
Constitutional AI
- URL: https://arxiv.org/abs/2212.08073
- 类型：论文
- 时间：2022
- 层次：L2-训练与对齐层
- 机制：对齐机制
- Tags: topic:alignment topic:constitutional-ai layer:training mechanism:alignment type:paper time:2022 status:foundational
- 概述：Anthropic 路线的核心来源，适合补齐“对齐不只是 RLHF”这一视角。
LoRA
- URL: https://arxiv.org/abs/2106.09685
- 类型：论文
- 时间：2021
- 层次：L2-训练与对齐层
- 机制：微调机制
- Tags: topic:finetuning topic:lora layer:training mechanism:finetuning type:paper time:2021 status:foundational
- 概述：参数高效微调起点，仍是理解 PEFT 体系的第一篇必读论文。
QLoRA
- URL: https://arxiv.org/abs/2305.14314
- 类型：论文
- 时间：2023
- 层次：L2-训练与对齐层
- 机制：微调机制、量化机制
- Tags: topic:finetuning topic:qlora layer:training mechanism:finetuning mechanism:quantization type:paper time:2023 status:foundational
- 概述：把量化与微调结合起来，显著降低大模型微调门槛。
DPO: Direct Preference Optimization
- URL: https://arxiv.org/abs/2305.18290
- 类型：论文
- 时间：2023
- 层次：L2-训练与对齐层
- 机制：对齐机制
- Tags: topic:alignment topic:dpo layer:training mechanism:alignment type:paper time:2023 status:foundational
- 概述：偏好优化的重要简化路线，是很多现代对齐实践的共同基础。

0.31 评测、可观测性与可靠性#

时间线#

2020-2021 MMLU / HumanEval
2024-2025 GAIA / agent benchmark / observability 进入主流
2024-2025 应用级 eval、RAG eval、software engineering benchmark 快速丰富
2024-2025 tool-agent-user 交互 benchmark 与浏览器/企业任务 benchmark 继续补强
2026 interactive reasoning eval 与 production tracing 深度结合

核心资源#

MMLU
- URL: https://arxiv.org/abs/2009.03300
- 类型：论文
- 时间：2020
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval layer:system mechanism:evaluation type:paper time:2020 status:foundational
- 概述：通用知识 benchmark 经典来源，但更适合作为基础 benchmark，而非最终能力结论。
HumanEval
- URL: https://arxiv.org/abs/2107.03374
- 类型：论文
- 时间：2021
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval topic:coding layer:system mechanism:evaluation type:paper time:2021 status:foundational
- 概述：代码生成 benchmark 的经典来源，是 AI 编程讨论的常用底座。
GAIA
- URL: https://huggingface.co/benchmarks/gaia
- 类型：数据集/基准
- 时间：2024-2025
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval topic:agent layer:system mechanism:evaluation type:benchmark status:frontier
- 概述：更贴近真实任务和工具使用，是 Agent 评测里很值得保留的 benchmark。
API-Bank
- URL: https://arxiv.org/abs/2304.09142
- 类型：论文
- 时间：2023
- 层次：L5-系统工程层
- 机制：评测机制、工具调用评测机制
- Tags: topic:eval topic:agent layer:system mechanism:evaluation mechanism:tool-use type:paper time:2023 status:frontier
- 概述：把工具调用正确性纳入 benchmark 视角，是 Agent 时代很重要的评测补充。
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
- URL: https://arxiv.org/abs/2310.06770
- 类型：论文
- 时间：2023
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval topic:coding topic:agent layer:system mechanism:evaluation type:paper time:2023 status:foundational
- 概述：SWE-bench 已成为软件工程 Agent 评测的核心基准之一，适合把 AI 编程讨论从 demo 级任务推进到真实仓库 issue 修复场景。
WebArena: A Realistic Web Environment for Building Autonomous Agents
- URL: https://arxiv.org/abs/2307.13854
- 类型：论文
- 时间：2023
- 层次：L5-系统工程层
- 机制：评测机制、环境交互机制
- Tags: topic:eval topic:agent layer:system mechanism:evaluation mechanism:environment-interaction type:paper time:2023 status:foundational
- 概述：WebArena 是网页环境 Agent 的代表 benchmark，适合补齐浏览器操作、跨页面导航和复杂任务分解这一类真实交互评测。
tau-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
- URL: https://arxiv.org/abs/2406.12045
- 类型：论文
- 时间：2024
- 层次：L5-系统工程层
- 机制：评测机制、工具调用机制
- Tags: topic:eval topic:agent topic:tool-use layer:system mechanism:evaluation mechanism:tool-use type:paper time:2024 status:frontier
- 概述：tau-bench 把 agent 评测推进到“工具、用户、业务规则”三者同时存在的真实交互语境，是企业型 agent 评测的重要补充。
OpenAI Evals
- URL: https://github.com/openai/evals
- 类型：代码仓库
- 时间：2023-2026
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval layer:system mechanism:evaluation type:repo status:engineering
- 概述：OpenAI Evals 是应用侧自定义评测与回归测试的重要开源入口，适合连接模型 benchmark 与产品级质量验证。
HELM: Holistic Evaluation of Language Models
- URL: https://arxiv.org/abs/2211.09110
- 类型：论文
- 时间：2022
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval layer:system mechanism:evaluation type:paper time:2022 status:foundational
- 概述：HELM 强调“评测不只看一个分数”，而要同时关注准确性、鲁棒性、公平性、校准等多维指标，是评测方法论的骨架级资源。
RAGAS
- URL: https://github.com/explodinggradients/ragas
- 类型：代码仓库
- 时间：2024-2026
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval topic:rag layer:system mechanism:evaluation type:repo status:engineering
- 概述：RAGAS 已成为 RAG 应用评测的高频工具，适合补齐回答质量、上下文利用、faithfulness 等应用层指标。
DeepEval
- URL: https://github.com/confident-ai/deepeval
- 类型：代码仓库
- 时间：2024-2026
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval topic:agent layer:system mechanism:evaluation type:repo status:engineering
- 概述：DeepEval 适合把 LLM 应用测试做成更接近单元测试和 CI 的流程，是应用层 eval 工程化的重要补充。
promptfoo
- URL: https://github.com/promptfoo/promptfoo
- 类型：代码仓库
- 时间：2024-2026
- 层次：L5-系统工程层
- 机制：评测机制
- Tags: topic:eval topic:security layer:system mechanism:evaluation type:repo status:engineering
- 概述：promptfoo 同时覆盖 prompt 测试、红队测试和回归比较，适合连接质量评测与安全评测两条线。
LangSmith
- URL: https://docs.smith.langchain.com/
- 类型：官方文档
- 时间：持续更新
- 层次：L5-系统工程层
- 机制：可观测机制
- Tags: topic:observability topic:agent layer:system mechanism:observability type:doc status:engineering
- 概述：适合追踪链路、prompt、tool 调用与运行轨迹，是 AI 应用调试的主流平台之一。
Langfuse
- URL: https://langfuse.com/
- 类型：官方文档
- 时间：持续更新
- 层次：L5-系统工程层
- 机制：可观测机制
- Tags: topic:observability layer:system mechanism:observability type:doc status:engineering
- 概述：开源可自部署，适合构建自有 AI 观测与标注分析体系。
OpenTelemetry
- URL: https://opentelemetry.io/
- 类型：官方文档
- 时间：持续更新
- 层次：L5-系统工程层
- 机制：可观测机制
- Tags: topic:observability layer:system mechanism:observability type:doc status:foundational
- 概述：把 AI 系统观测纳入通用 tracing 标准，是生产级统一观测栈的关键底座。

0.32 安全、攻击与治理#

时间线#

2023 prompt injection / jailbreak 体系化
2024-2025 tool / plugin / agent attack surface 成为主线
2025-2026 governance、risk management、supply chain 与 overreliance 更系统化

核心资源#

GCG: Greedy Coordinate Gradient
- URL: https://arxiv.org/abs/2307.04757
- 类型：论文
- 时间：2023
- 层次：L6-安全与治理层
- 机制：安全攻击机制
- Tags: topic:security topic:jailbreak layer:security mechanism:security-attack type:paper time:2023 status:foundational
- 概述：自动化越狱攻击的重要代表工作，是从手工 prompt 越狱走向搜索式攻击的关键节点。
Prompt Injection Attacks
- URL: https://www.jailbreaksearch.com/
- 类型：案例库/安全站点
- 时间：持续更新
- 层次：L6-安全与治理层
- 机制：安全攻击机制
- Tags: topic:security topic:prompt-injection layer:security mechanism:security-attack type:benchmark status:engineering
- 概述：适合长期追踪提示注入与越狱样式，是安全专题的实战样例库。
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- URL: https://arxiv.org/abs/2302.12173
- 类型：论文
- 时间：2023
- 层次：L6-安全与治理层
- 机制：安全攻击机制
- Tags: topic:security topic:prompt-injection layer:security mechanism:security-attack type:paper time:2023 status:foundational
- 概述：这是现实世界间接提示注入讨论里最关键的论文之一，明确说明攻击者可以借助外部内容污染模型行为，而不需要直接接触系统提示词。
OWASP Top 10 for Large Language Model Applications / OWASP GenAI Security Project
- URL: https://owasp.org/www-project-top-10-for-large-language-model-applications/
- 类型：机构报告/政策
- 时间：2023-2025
- 层次：L6-安全与治理层
- 机制：安全防御机制、治理合规机制
- Tags: topic:security topic:governance layer:security mechanism:security-defense mechanism:governance type:policy status:foundational
- 概述：最适合开发者快速建立 LLM 应用安全面地图的开放指南之一，应作为安全章节的基础框架资源。
NIST AI Risk Management Framework
- URL: https://www.nist.gov/itl/ai-risk-management-framework
- 类型：机构报告/政策
- 时间：2023-2025
- 层次：L6-安全与治理层
- 机制：治理合规机制
- Tags: topic:governance topic:risk-management layer:security mechanism:governance type:policy status:foundational
- 概述：适合把 AI 风险讨论从技术漏洞扩展到组织治理、流程控制与生命周期管理。
EU AI Act
- URL: https://artificialintelligenceact.eu/
- 类型：机构报告/政策
- 时间：2024-2025
- 层次：L6-安全与治理层
- 机制：治理合规机制
- Tags: topic:governance topic:compliance layer:security mechanism:governance type:policy status:frontier
- 概述：全球 AI 合规讨论中最值得长期跟踪的法规资源之一。

0.33 多模态、实时交互与本地应用层#

时间线#

2022 Whisper 等单模态基础能力成熟
2024 Gemini 2.0 / agentic multimodal 叙事
2024-2025 realtime voice / video agent 基础设施快速成熟
2025 Claude 4 / coding + reasoning + agent workflows
2026 本地多模态与实时交互工程逐渐主流化

核心资源#

Whisper
- URL: https://arxiv.org/abs/2212.04356
- 类型：论文
- 时间：2022
- 层次：L1-模型机制层
- 机制：语音理解机制
- Tags: topic:multimodal topic:speech layer:model mechanism:speech-understanding type:paper time:2022 status:foundational
- 概述：语音识别的代表模型，是实时语音 Agent / 多模态助手的重要基础来源。
Introducing Gemini 2.0: our new AI model for the agentic era
- URL: https://blog.google/innovation-and-ai/models-and-research/google-deepmind/google-gemini-ai-update-december-2024/
- 类型：官方博客/公告
- 时间：2024 Q4
- 层次：L0-范式层
- 机制：多模态机制、工具调用机制、Agent 机制
- Tags: topic:multimodal topic:agent topic:gemini layer:paradigm mechanism:multimodal mechanism:tool-use type:doc time:2024Q4 status:frontier
- 概述：Gemini 2.0 明确把多模态、原生工具使用和 agentic experience 绑定在一起，是“agentic era”叙事的重要官方材料。
LiveKit Agents
- URL: https://docs.livekit.io/agents/
- 类型：官方文档
- 时间：2025-2026
- 层次：L5-系统工程层
- 机制：多模态 serving 机制、环境交互机制
- Tags: topic:multimodal topic:agent topic:speech layer:system mechanism:multimodal-serving mechanism:environment-interaction type:doc status:engineering
- 概述：LiveKit Agents 是实时语音、视频和多模态 agent 的高频基础设施入口，适合补齐“LLM 应用不只是一问一答，还包括实时交互系统”的工程主线。
Introducing Claude 4
- URL: https://www.anthropic.com/news/claude-4
- 类型：官方博客/公告
- 时间：2025 Q2
- 层次：L0-范式层
- 机制：推理机制、工具调用机制、记忆机制
- Tags: topic:reasoning topic:agent topic:claude layer:paradigm mechanism:reasoning mechanism:tool-use mechanism:memory type:doc time:2025Q2 status:frontier
- 概述：Claude 4 把 extended thinking with tool use、memory files、Claude Code 和 MCP connector 串到一起，是 2025 年 agent workflow 方向的重要产品节点。

0.34 2025-2026 前沿增补速查#

`2026 Q1`#

Attention Residuals：深层 Transformer 稳定性与架构创新
ARC-AGI-3：从静态 reasoning benchmark 走向交互式 reasoning benchmark
vLLM Docs (2026 dev preview)：reasoning outputs、OpenTelemetry、多模态、MCP tools
SGLang Docs (Mar 25, 2026)：reasoning parser、structured outputs for reasoning models、HiCache、多模态 serving
llama.cpp b8508：本地推理、多模态、GGUF、本地 OpenAI-compatible API 强化
Ollama v0.18.x：本地模型 API、集成本地 agent 工具生态扩张

`2025`#

DeepSeek-R1 / DeepSeek-V3：开源推理与后训练强化学习主线
A2A Protocol：多 Agent 标准化通信
Anthropic Computer Use / Claude 4：action-oriented agents 与 coding agents
o3 / o4-mini：推理模型产品化节点
OWASP GenAI Security：安全与治理框架升级
Apple reasoning critique：reasoning 热潮中的关键反思材料

0.35 反向 Tag 索引建议#

后续继续维护时，建议新增一个按 Tag 检索的小索引，至少保留这些主键：

topic:reasoning
topic:rag
topic:agent
topic:serving
topic:local-llm
topic:security
topic:multimodal
mechanism:attention
mechanism:retrieval
mechanism:tool-use
mechanism:observability
layer:model
layer:training
layer:inference
layer:knowledge
layer:system
layer:security

第四部分：高频核心来源清单#

下面这些来源在两套内容体系里反复出现，已经构成你的 AI 知识库的“参考主骨架”：

Attention Is All You Need：几乎所有原理线、论文线、入门线都会回到它
GPT-3 / InstructGPT / Chain-of-Thought / ReAct：分别对应规模化、对齐、推理、Agent 四条主线
RAG、LoRA、QLoRA、DPO：对应知识增强、微调、对齐三大应用实践路线
OpenAI / Anthropic / Google 官方文档：构成工程实践部分最稳定的产品与接口依据
LangChain / LangGraph / LlamaIndex / AutoGen / CrewAI / MCP：构成应用架构与 Agent 工程的基础设施层
vLLM / SGLang / TensorRT-LLM / llama.cpp / Ollama：构成 2026 时点最值得持续维护的推理部署工程主栈
The Illustrated Transformer、happy-llm、Prompt Engineering Guide：构成教程型、解释型材料的高频辅助层
ARC Prize、MMLU、HumanEval、GAIA、API-Bank：构成能力和系统评估层的 benchmark 骨架
Cursor Security Advisory、GCG、Prompt Injection Attacks、OWASP GenAI Security：构成安全专题中最有现实感的攻击与治理来源

第五部分：如何使用这份索引#

如果你后续要继续建设 machine-learning 知识库，我建议这样用这份索引：

写 入门系列 时，优先使用：论文 + 官方文档 + 高质量教程
写 工程系列 时，优先使用：官方文档 + 开源仓库 + benchmark + 事故案例
写 论文解读系列 时，优先使用：原论文 + 官方实现 + 后续工作 + 批评文章
写 安全与可靠性系列 时，优先使用：攻击论文 + 安全通告 + CVE + 事故复盘 + 防御文档
写 行业案例系列 时，优先使用：产品公告 + 架构文档 + benchmark + 行业报告 + 法规/合规材料

一篇高质量技术文章，最好同时包含这四层来源：

原始来源：论文、技术报告、官方规范
实现来源：官方文档、SDK、代码仓库
验证来源：benchmark、数据集、评测报告
现实来源：新闻、案例、事故、法规、行业报告

这样写出来的文章，会比单纯“转述概念”更稳、更耐看，也更适合长期维护。

参考#

AI 是怎么回事

说明#

来源类型说明#

第一部分：本地 machine-learning 目录参考资料索引#

0.1 总览与 README#

00-系列导读.md#

agent-guide/00-系列导读.md#

llm-paper-history/00-系列导读.md#

llm-security/00-系列导读.md#

llm/index.md#

0.2 llm-guide#

llm-guide/01-ai-history.md#

llm-guide/02-how-llm-works.md#

llm-guide/03-choose-model.md#

llm-guide/04-prompt-engineering.md#

llm-guide/05-rag.md#

llm-guide/06-function-calling.md#

llm-guide/07-ai-agent.md#

llm-guide/08-memory-mcp-ecosystem.md#

llm-guide/09-ai-programming.md#

llm-guide/10-build-ai-app.md#

llm-guide/11-fine-tuning.md#

0.3 agent-guide#

agent-guide/14-Agent架构模式.md#

agent-guide/15-Agent评估体系.md#

agent-guide/16-Agent可观测性与调试.md#

agent-guide/17-Agent成本优化.md#

agent-guide/18-Agent可靠性设计.md#

agent-guide/19-Agent测试策略.md#

agent-guide/20-Agent安全防御.md#

0.4 llm-paper-history#

llm-paper-history/24-SpeculativeDecoding推理加速.md#

llm-paper-history/25-T5与FLAN指令微调.md#

llm-paper-history/26-Qwen与InternLM开源模型.md#

llm-paper-history/27-PaLM2技术报告.md#

llm-paper-history/28-AlphaCode编程竞赛.md#

llm-paper-history/29-Mistral7B小而美.md#

llm-paper-history/30-Grok与LLaMA3开源新星.md#

llm-paper-history/31-RAG与LongContext知识增强.md#

llm-paper-history/32-o1o3推理时代.md#

llm-paper-history/33-PromptEngineering提示工程.md#

0.5 llm-security#

llm-security/01-提示注入与越狱攻击.md#

llm-security/02-系统提示词泄露与数据提取.md#

llm-security/03-代码执行与基础设施攻击.md#

llm-security/04-对抗性自动化攻击.md#

llm-security/05-数据泄露与供应链攻击.md#

llm-security/06-特定领域高危漏洞.md#

llm-security/07-AI驱动的自动化攻击.md#

0.6 llm 旧系列#

llm/004-calculate-llm-cost.md#

llm/005-master-prompt-engineering.md#

llm/006-rag-knowledge-injection.md#

llm/009-boost-dev-efficiency.md#

llm/010-build-ai-application.md#

0.7 独立文章#

ai-vup.md#

machine-learning-101.md#

ML系统设计.md#

第二部分：AI 是怎么回事 系列参考资料索引#

0.8 第 1 篇：AI 到底聪明在哪——从手机人脸识别说起#

0.9 第 2 篇：AI 怎么读懂文字——国王减去男人等于什么#

0.10 第 3 篇：AI 是怎么突然变厉害的——2012 所有人以为他们作弊了#

0.11 第 4 篇：神经网络到底是什么——6000 万个旋钮的真相#

0.12 第 5 篇：AI 是怎么学会的——从做错一道题说起#

0.13 第 6 篇：ChatGPT 为什么能对话——一篇引用 17 万次的论文#

0.14 第 7 篇：AI 为什么会撒谎——一个律师被 ChatGPT 骗了#

0.15 第 8 篇：ChatGPT 回答你的三秒钟里，发生了什么？#

0.16 第 9 篇：AI 到底有多聪明？——一份让 AI 研究者也困惑的成绩单#

0.17 第 10 篇：AI 能“创造”吗？——从一团噪声到一幅画#

0.18 第 11 篇：为什么 AI 能赢世界冠军，却开不好车？#

0.19 第 12 篇：这个框架会过时吗？——AI 的天花板和你的判断力#

0.20 第 13 篇：怎么让 AI 听懂你的话——同一个 AI 为什么他用得比你好 10 倍#

0.21 第 14 篇：怎么跟 AI 协作不翻车——AI 说的话你该信几分#

0.22 第 15 篇：AI 写代码有多厉害？——快了 55%，但错多了 75%#

0.23 第 16 篇：AI 会取代我们吗——它不懂孤独是什么意思#

第三部分：标准化核心参考图谱（按方向 / 时间线 / 层次 / 机制）#

0.24 标准字段说明#

0.25 基础架构与模型机制#

时间线#

核心资源#

第一部分：本地 `machine-learning` 目录参考资料索引#

`00-系列导读.md`#

`agent-guide/00-系列导读.md`#

`llm-paper-history/00-系列导读.md`#

`llm-security/00-系列导读.md`#

`llm/index.md`#

0.2 `llm-guide`#

`llm-guide/01-ai-history.md`#

`llm-guide/02-how-llm-works.md`#

`llm-guide/03-choose-model.md`#

`llm-guide/04-prompt-engineering.md`#

`llm-guide/05-rag.md`#

`llm-guide/06-function-calling.md`#

`llm-guide/07-ai-agent.md`#

`llm-guide/08-memory-mcp-ecosystem.md`#

`llm-guide/09-ai-programming.md`#

`llm-guide/10-build-ai-app.md`#

`llm-guide/11-fine-tuning.md`#

0.3 `agent-guide`#

`agent-guide/14-Agent架构模式.md`#

`agent-guide/15-Agent评估体系.md`#

`agent-guide/16-Agent可观测性与调试.md`#

`agent-guide/17-Agent成本优化.md`#

`agent-guide/18-Agent可靠性设计.md`#

`agent-guide/19-Agent测试策略.md`#

`agent-guide/20-Agent安全防御.md`#

0.4 `llm-paper-history`#

`llm-paper-history/24-SpeculativeDecoding推理加速.md`#

`llm-paper-history/25-T5与FLAN指令微调.md`#

`llm-paper-history/26-Qwen与InternLM开源模型.md`#

`llm-paper-history/27-PaLM2技术报告.md`#

`llm-paper-history/28-AlphaCode编程竞赛.md`#

`llm-paper-history/29-Mistral7B小而美.md`#

`llm-paper-history/30-Grok与LLaMA3开源新星.md`#

`llm-paper-history/31-RAG与LongContext知识增强.md`#

`llm-paper-history/32-o1o3推理时代.md`#

`llm-paper-history/33-PromptEngineering提示工程.md`#

0.5 `llm-security`#

`llm-security/01-提示注入与越狱攻击.md`#

`llm-security/02-系统提示词泄露与数据提取.md`#

`llm-security/03-代码执行与基础设施攻击.md`#

`llm-security/04-对抗性自动化攻击.md`#

`llm-security/05-数据泄露与供应链攻击.md`#

`llm-security/06-特定领域高危漏洞.md`#

`llm-security/07-AI驱动的自动化攻击.md`#

0.6 `llm` 旧系列#

`llm/004-calculate-llm-cost.md`#

`llm/005-master-prompt-engineering.md`#

`llm/006-rag-knowledge-injection.md`#

`llm/009-boost-dev-efficiency.md`#

`llm/010-build-ai-application.md`#

`ai-vup.md`#

`machine-learning-101.md`#

`ML系统设计.md`#

第二部分：`AI 是怎么回事` 系列参考资料索引#

`2026 Q1`#

`2025`#