Kebob commited on
Commit
8defcde
·
verified ·
1 Parent(s): c8106bb

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +57 -3
README.md CHANGED
@@ -1,3 +1,57 @@
1
- ---
2
- license: mit
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ base_model:
4
+ - deepseek-ai/DeepSeek-V3.2
5
+ tags:
6
+ - ktransformers
7
+ - amx
8
+ - amxint4
9
+ - cpu
10
+ - numa2
11
+ ---
12
+
13
+ ## `ktransformers` CPU NUMA2 AMXINT4 quantizations of DeepSeek-V3.2
14
+
15
+ Quantized using `ktransformers` (06982524842b20590bbb4c36204b7333ab760448) using:
16
+
17
+ ```bash
18
+ kt-kernel/scripts/convert_cpu_weights.py \
19
+ --input-path /DeepSeek-V3.2/ \
20
+ --input-type fp8 \
21
+ --output /DeepSeek-V3.2-CPU-NUMA2-AMXINT4/ \
22
+ --quant-method int4 \
23
+ --cpuinfer-threads 56 \
24
+ --threadpool-count 2
25
+ ```
26
+
27
+ ## Running on sm120 (RTX 6000 Pro Blackwell)
28
+
29
+ Differences from the [official instructions](https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/kt-kernel/deepseek-v3.2-sglang-tutorial.md):
30
+
31
+ - DeepGEMM does not support sm_120 right now so it must be disabled, see https://github.com/deepseek-ai/DeepGEMM/issues/236
32
+ - Cannot get FP8 KV-cache to work, so need to use BF16.
33
+ - triton crashes with out of shared memory so using flashinfer
34
+
35
+ ```bash
36
+ SGLANG_ENABLE_JIT_DEEPGEMM=false \
37
+ CUDA_VISIBLE_DEVICES=0 \
38
+ uv run -m \
39
+ sglang.launch_server \
40
+ --host 0.0.0.0 --port 60000 \
41
+ --model /DeepSeek-V3.2/ \
42
+ --kt-weight-path /DeepSeek-V3.2-CPU-NUMA2-AMXINT4/ \
43
+ --kt-cpuinfer 56 --kt-threadpool-count 2 --kt-num-gpu-experts 24 --kt-method AMXINT4 \
44
+ --attention-backend flashinfer \
45
+ --trust-remote-code \
46
+ --mem-fraction-static 0.98 \
47
+ --chunked-prefill-size 4096 \
48
+ --max-running-requests 32 \
49
+ --max-total-tokens 131072 \
50
+ --enable-mixed-chunk \
51
+ --tensor-parallel-size 1 \
52
+ --enable-p2p-check \
53
+ --disable-shared-experts-fusion \
54
+ --tool-call-parser deepseekv32 \
55
+ --reasoning-parser deepseek-v3 \
56
+ --kv-cache-dtype bf16
57
+ ```