Olmo 3 Pre-training Collection All artifacts related to Olmo 3 pre-training • 10 items • Updated 5 days ago • 31
Olmo 3.1 Collection The latest members of the Olmo 3 family: another 3 weeks of RL for 32B Think, the 32B Instruct model, large post-training research datasets... • 9 items • Updated 5 days ago • 36
Olmo 3 Post-training Collection All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them. • 32 items • Updated 5 days ago • 46
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning Paper • 2512.05591 • Published 23 days ago • 16
The Path Not Taken: RLVR Provably Learns Off the Principals Paper • 2511.08567 • Published Nov 11 • 33
AEPO Collection The official datasets and model checkpoints of AEPO • 5 items • Updated 8 days ago • 4
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning Paper • 2509.20712 • Published Sep 25 • 19
Finedeep: Mitigating Sparse Activation in Dense LLMs via Multi-Layer Fine-Grained Experts Paper • 2502.12928 • Published Feb 18 • 1
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization Paper • 2508.07629 • Published Aug 11 • 43
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts Paper • 2410.16077 • Published Oct 21, 2024 • 1
MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts Paper • 2407.09816 • Published Jul 13, 2024 • 1