2026-05-11
1. GPipe: Naive Pipelining and Micro-batches
2026-05-10
3. Interleaved 1F1B
2026-05-10
RL Chapter10 Exploration: From ε-greedy to Curiosity
2026-05-10
RL Chapter11 Offline Reinforcement Learning: BCQ, CQL, IQL, and Their Ties to DPO
2026-05-10
RL Chapter12 Imitation Learning and Inverse RL: BC, DAgger, GAIL, IRL
2026-05-10
RL Chapter13 RL × LLM Survey: A Unified View of PPO, DPO, and GRPO
2026-05-10
RL Chapter2 Dynamic Programming (DP): Policy Evaluation, Policy Iteration, Value Iteration
2026-05-10
RL Chapter4 Q-Learning and SARSA: From Evaluation to Control
2026-05-10