robgreenberg3 committed · Commit 3364b5a · verified · 1 Parent(s): 042ce98

Update README.md

Files changed (1): README.md (+49 -46)
README.md CHANGED
@@ -67,51 +67,6 @@ Both models were trained on our [harmony response format](https://github.com/openai/harmony)
 
 # Inference examples
 
-## Transformers
-
-You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template, or use our [openai-harmony](https://github.com/openai/harmony) package.
-
-To get started, install the necessary dependencies to set up your environment:
-
-```
-pip install -U transformers kernels torch
-```
-
-Once set up, you can run the model with the snippet below:
-
-```py
-from transformers import pipeline
-import torch
-
-model_id = "openai/gpt-oss-120b"
-
-pipe = pipeline(
-    "text-generation",
-    model=model_id,
-    torch_dtype="auto",
-    device_map="auto",
-)
-
-messages = [
-    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
-]
-
-outputs = pipe(
-    messages,
-    max_new_tokens=256,
-)
-print(outputs[0]["generated_text"][-1])
-```
-
-Alternatively, you can run the model via [`Transformers Serve`](https://huggingface.co/docs/transformers/main/serving) to spin up an OpenAI-compatible webserver:
-
-```
-transformers serve
-transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-120b
-```
-
-[Learn more about how to use gpt-oss with Transformers.](https://cookbook.openai.com/articles/gpt-oss/run-transformers)
-
 ## vLLM
 
 vLLM recommends using [uv](https://docs.astral.sh/uv/) for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server.
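The section being moved notes that calling `model.generate` directly requires applying the harmony format manually via the chat template. A minimal sketch of that path, using the standard Transformers `apply_chat_template` API (illustrative only; this snippet is not part of the commit):

```py
# Illustrative sketch (not part of this commit): applying the chat template
# manually so the harmony format is used with model.generate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-120b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

# apply_chat_template renders the messages in the model's chat format
# (harmony, for gpt-oss) and returns the input token IDs as a tensor.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:]))
```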
@@ -262,11 +217,57 @@ See [Red Hat Openshift AI documentation](https://docs.redhat.com/en/documentatio
 
 [Learn more about how to use gpt-oss with vLLM.](https://cookbook.openai.com/articles/gpt-oss/run-vllm)
 
+## Transformers
+
+You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template, or use our [openai-harmony](https://github.com/openai/harmony) package.
+
+To get started, install the necessary dependencies to set up your environment:
+
+```
+pip install -U transformers kernels torch
+```
+
+Once set up, you can run the model with the snippet below:
+
+```py
+from transformers import pipeline
+import torch
+
+model_id = "openai/gpt-oss-120b"
+
+pipe = pipeline(
+    "text-generation",
+    model=model_id,
+    torch_dtype="auto",
+    device_map="auto",
+)
+
+messages = [
+    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
+]
+
+outputs = pipe(
+    messages,
+    max_new_tokens=256,
+)
+print(outputs[0]["generated_text"][-1])
+```
+
+Alternatively, you can run the model via [`Transformers Serve`](https://huggingface.co/docs/transformers/main/serving) to spin up an OpenAI-compatible webserver:
+
+```
+transformers serve
+transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-120b
+```
+
+[Learn more about how to use gpt-oss with Transformers.](https://cookbook.openai.com/articles/gpt-oss/run-transformers)
+
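Because `transformers serve` exposes an OpenAI-compatible webserver, any OpenAI-style client can query it. A hedged sketch (assumes the server listens on `localhost:8000`, the address used by the `transformers chat` command above, and the third-party `openai` client package; not part of this commit):

```py
# Illustrative sketch (not part of this commit): querying the
# OpenAI-compatible server started by `transformers serve`,
# assuming it listens on localhost:8000 as in the commands above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Explain quantum mechanics clearly and concisely."}
    ],
)
print(response.choices[0].message.content)
```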
 ## PyTorch / Triton
 
 To learn about how to use this model with PyTorch and Triton, check out our [reference implementations in the gpt-oss repository](https://github.com/openai/gpt-oss?tab=readme-ov-file#reference-pytorch-implementation).
 
-## Ollama
+<details>
+<summary><strong>Ollama</strong></summary>
 
 If you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after [installing Ollama](https://ollama.com/download).
 
@@ -278,6 +279,8 @@ ollama run gpt-oss:120b
 
 [Learn more about how to use gpt-oss with Ollama.](https://cookbook.openai.com/articles/gpt-oss/run-locally-ollama)
 
+</details>
+
 #### LM Studio
 
 If you are using [LM Studio](https://lmstudio.ai/) you can use the following commands to download.
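For the Ollama section above: once `ollama run gpt-oss:120b` is serving, the model can also be queried over Ollama's local HTTP chat API. A hedged sketch (assumes Ollama's default port 11434 and the `requests` package; not part of this commit):

```py
import requests

# Illustrative sketch (not part of this commit): calling a local Ollama
# server's chat endpoint, assuming Ollama's default port 11434.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gpt-oss:120b",
        "messages": [
            {"role": "user", "content": "Explain quantum mechanics clearly and concisely."}
        ],
        "stream": False,  # return one JSON response instead of a stream
    },
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```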