QuantTrio/DeepSeek-V3.2-AWQ

Text Generation

4-bit precision


tclf90 committed 7 days ago

Commit 340023c · verified · 1 Parent(s): 591a8e2

Update README.md

[DOC] update vllm startup command
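
The exports this commit adds to the startup command can be sanity-checked before launching the server. The following is a minimal sketch, assuming a POSIX-compatible shell; the variable names and values are copied from the README diff in this commit, while `check_env` is a hypothetical helper that is not part of vLLM.

```shell
#!/bin/sh
# Values copied verbatim from the README diff in this commit.
export VLLM_USE_DEEP_GEMM=0
export TORCH_ALLOW_TF32_CUBLAS_OVERRIDE=1
export VLLM_USE_FLASHINFER_MOE_FP16=1
export VLLM_USE_FLASHINFER_SAMPLER=0
export OMP_NUM_THREADS=4

# Hypothetical helper (not provided by vLLM): returns non-zero if any
# of the named variables is unset or empty, and names each one on stderr.
check_env() {
    missing=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_env VLLM_USE_DEEP_GEMM TORCH_ALLOW_TF32_CUBLAS_OVERRIDE \
    VLLM_USE_FLASHINFER_MOE_FP16 VLLM_USE_FLASHINFER_SAMPLER \
    OMP_NUM_THREADS && echo "environment ready"
```

Running the check in the same shell session that later runs `vllm serve` confirms the exports are visible to the server process; exports made in a different session or subshell would not be inherited.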

Files changed (1)

README.md +5 -0


@@ -58,6 +58,11 @@ see [Official vLLM Deepseek-V3.2 Guide](https://docs.vllm.ai/projects/recipes/en
 ```
 export VLLM_USE_DEEP_GEMM=0  # ATM, this line is a "must" for Hopper devices
 CONTEXT_LENGTH=32768
 vllm serve \
     __YOUR_PATH__/QuantTrio/DeepSeek-V3.2-AWQ \

 ```
 export VLLM_USE_DEEP_GEMM=0  # ATM, this line is a "must" for Hopper devices
+export TORCH_ALLOW_TF32_CUBLAS_OVERRIDE=1
+export VLLM_USE_FLASHINFER_MOE_FP16=1
+export VLLM_USE_FLASHINFER_SAMPLER=0
+export OMP_NUM_THREADS=4
 CONTEXT_LENGTH=32768
 vllm serve \
     __YOUR_PATH__/QuantTrio/DeepSeek-V3.2-AWQ \