One Table Covering the Correct and Incorrect Uses of KL Divergence Regularization in LLM-RL
3 min read · February 11, 2026
2026 · reinforcement-learning
One Figure Covering a Dozen-Plus Mainstream GRPO Variants
2 min read · February 10, 2026