Step-DPO

Resources for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

- xinlai/DeepSeekMath-RL-Step-DPO (Text Generation, 7B, updated Jun 28, 2024)
- xinlai/Qwen2-7B-Instruct-Step-DPO (Text Generation, 8B, updated Jun 29, 2024)
- xinlai/Qwen2-72B-Instruct-Step-DPO (Text Generation, 73B, updated Jun 28, 2024)
- xinlai/DeepSeekMath-Base-SFT-Step-DPO (Text Generation, 7B, updated Jun 28, 2024)