modify readme
@@ -1,6 +1,6 @@
 # Search-R1: Train your LLMs to reason and call a search engine with reinforcement learning

-<strong>Search-R1</strong> is a reproduction of <strong>DeepSeek-R1(-Zero)</strong> methods for <em>training reasoning and searching (tool-call) interleaved LLMs</em>. We built upon [veRL](https://github.com/volcengine/verl).
+<strong>Search-R1</strong> is an extension of <strong>DeepSeek-R1(-Zero)</strong> methods for <em>training reasoning and searching (tool-call) interleaved LLMs</em>. We built upon [veRL](https://github.com/volcengine/verl).

 Through RL (rule-based outcome reward), the 3B **base** LLM (both Qwen2.5-3b-base and Llama3.2-3b-base) develops reasoning and search engine calling abilities all on its own.