Kebob
/

DeepSeek-V3.2-CPU-NUMA2-AMXINT4

Model card Files Files and versions

Kebob commited on 7 days ago

Commit

8defcde

·

verified ·

1 Parent(s): c8106bb

Update README.md

Files changed (1) hide show

README.md +57 -3

README.md CHANGED Viewed

@@ -1,3 +1,57 @@
----
-license: mit
----

+---
+license: mit
+base_model:
+  - deepseek-ai/DeepSeek-V3.2
+tags:
+  - ktransformers
+  - amx
+  - amxint4
+  - cpu
+  - numa2
+---
+## `ktransformers` CPU NUMA2 AMXINT4 quantizations of DeepSeek-V3.2
+Quantized using `ktransformers` (06982524842b20590bbb4c36204b7333ab760448) using:
+```bash
+kt-kernel/scripts/convert_cpu_weights.py \
+  --input-path /DeepSeek-V3.2/ \
+  --input-type fp8 \
+  --output /DeepSeek-V3.2-CPU-NUMA2-AMXINT4/ \
+  --quant-method int4 \
+  --cpuinfer-threads 56 \
+  --threadpool-count 2
+```
+## Running on sm120 (RTX 6000 Pro Blackwell)
+Differences from the [official instructions](https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/kt-kernel/deepseek-v3.2-sglang-tutorial.md):
+  - DeepGEMM does not support sm_120 right now so it must be disabled, see https://github.com/deepseek-ai/DeepGEMM/issues/236
+  - Cannot get FP8 KV-cache to work, so need to use BF16.
+  - triton crashes with out of shared memory so using flashinfer
+```bash
+SGLANG_ENABLE_JIT_DEEPGEMM=false \
+CUDA_VISIBLE_DEVICES=0 \
+    uv run -m \
+        sglang.launch_server \
+        --host 0.0.0.0 --port 60000 \
+        --model /DeepSeek-V3.2/ \
+        --kt-weight-path /DeepSeek-V3.2-CPU-NUMA2-AMXINT4/ \
+        --kt-cpuinfer 56 --kt-threadpool-count 2 --kt-num-gpu-experts 24 --kt-method AMXINT4 \
+        --attention-backend flashinfer \
+        --trust-remote-code \
+        --mem-fraction-static 0.98 \
+        --chunked-prefill-size 4096 \
+        --max-running-requests 32 \
+        --max-total-tokens 131072 \
+        --enable-mixed-chunk \
+        --tensor-parallel-size 1 \
+        --enable-p2p-check \
+        --disable-shared-experts-fusion \
+        --tool-call-parser deepseekv32 \
+        --reasoning-parser deepseek-v3 \
+        --kv-cache-dtype bf16
+```