robgreenberg3 committed · Commit 3364b5a · verified · 1 Parent(s): 042ce98

Update README.md

Files changed (1): README.md (+49 -46)
README.md CHANGED
@@ -67,51 +67,6 @@ Both models were trained on our [harmony response format](https://github.com/openai/harmony)
 
 # Inference examples
 
-## Transformers
-
-You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template, or use our [openai-harmony](https://github.com/openai/harmony) package.
-
-To get started, install the necessary dependencies to set up your environment:
-
-```
-pip install -U transformers kernels torch
-```
-
-Once set up, you can run the model with the snippet below:
-
-```py
-from transformers import pipeline
-import torch
-
-model_id = "openai/gpt-oss-120b"
-
-pipe = pipeline(
-    "text-generation",
-    model=model_id,
-    torch_dtype="auto",
-    device_map="auto",
-)
-
-messages = [
-    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
-]
-
-outputs = pipe(
-    messages,
-    max_new_tokens=256,
-)
-print(outputs[0]["generated_text"][-1])
-```
-
-Alternatively, you can run the model via [`Transformers Serve`](https://huggingface.co/docs/transformers/main/serving) to spin up an OpenAI-compatible webserver:
-
-```
-transformers serve
-transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-120b
-```
-
-[Learn more about how to use gpt-oss with Transformers.](https://cookbook.openai.com/articles/gpt-oss/run-transformers)
-
 ## vLLM
 
 vLLM recommends using [uv](https://docs.astral.sh/uv/) for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server.
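The section being moved notes that calling `model.generate` directly requires applying the harmony format manually via the chat template. A minimal sketch of that path, using the standard Transformers `apply_chat_template` API (illustrative only; this snippet is not part of the commit):

```py
# Illustrative sketch (not part of this commit): applying the chat template
# manually so the harmony format is used with model.generate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-120b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

# apply_chat_template renders the messages in the model's chat format
# (harmony, for gpt-oss) and returns the input token IDs as a tensor.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:]))
```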
@@ -262,11 +217,57 @@ See [Red Hat Openshift AI documentation](https://docs.redhat.com/en/documentatio
 
 [Learn more about how to use gpt-oss with vLLM.](https://cookbook.openai.com/articles/gpt-oss/run-vllm)
 
+## Transformers
+
+You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template, or use our [openai-harmony](https://github.com/openai/harmony) package.
+
+To get started, install the necessary dependencies to set up your environment:
+
+```
+pip install -U transformers kernels torch
+```
+
+Once set up, you can run the model with the snippet below:
+
+```py
+from transformers import pipeline
+import torch
+
+model_id = "openai/gpt-oss-120b"
+
+pipe = pipeline(
+    "text-generation",
+    model=model_id,
+    torch_dtype="auto",
+    device_map="auto",
+)
+
+messages = [
+    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
+]
+
+outputs = pipe(
+    messages,
+    max_new_tokens=256,
+)
+print(outputs[0]["generated_text"][-1])
+```
+
+Alternatively, you can run the model via [`Transformers Serve`](https://huggingface.co/docs/transformers/main/serving) to spin up an OpenAI-compatible webserver:
+
+```
+transformers serve
+transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-120b
+```
+
+[Learn more about how to use gpt-oss with Transformers.](https://cookbook.openai.com/articles/gpt-oss/run-transformers)
+
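Because `transformers serve` exposes an OpenAI-compatible webserver, any OpenAI-style client can query it. A hedged sketch (assumes the server listens on `localhost:8000`, the address used by the `transformers chat` command above, and the third-party `openai` client package; not part of this commit):

```py
# Illustrative sketch (not part of this commit): querying the
# OpenAI-compatible server started by `transformers serve`,
# assuming it listens on localhost:8000 as in the commands above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Explain quantum mechanics clearly and concisely."}
    ],
)
print(response.choices[0].message.content)
```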
 ## PyTorch / Triton
 
 To learn about how to use this model with PyTorch and Triton, check out our [reference implementations in the gpt-oss repository](https://github.com/openai/gpt-oss?tab=readme-ov-file#reference-pytorch-implementation).
 
-## Ollama
+<details>
+<summary><strong>Ollama</strong></summary>
 
 If you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after [installing Ollama](https://ollama.com/download).
 
@@ -278,6 +279,8 @@ ollama run gpt-oss:120b
 
 [Learn more about how to use gpt-oss with Ollama.](https://cookbook.openai.com/articles/gpt-oss/run-locally-ollama)
 
+</details>
+
 #### LM Studio
 
 If you are using [LM Studio](https://lmstudio.ai/) you can use the following commands to download.
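For the Ollama section above: once `ollama run gpt-oss:120b` is serving, the model can also be queried over Ollama's local HTTP chat API. A hedged sketch (assumes Ollama's default port 11434 and the `requests` package; not part of this commit):

```py
import requests

# Illustrative sketch (not part of this commit): calling a local Ollama
# server's chat endpoint, assuming Ollama's default port 11434.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gpt-oss:120b",
        "messages": [
            {"role": "user", "content": "Explain quantum mechanics clearly and concisely."}
        ],
        "stream": False,  # return one JSON response instead of a stream
    },
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```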