Qwen3.5-397B on a 128 GB Mac — a first.
JANG_1L at 112 GB disk (120 GB GPU peak) fits on a 128 GB MacBook Pro and scores 86.5% MMLU with reasoning.
MLX at 2-bit and 3-bit produces NaN — the model is too complex for standard quantization at low bit widths.
MLX 4-bit runs at 94% but needs ~280 GB, far beyond any laptop. JANG_2L at 187 GB hits 92% on an M4 Ultra 256 GB.
First working Nemotron-H quantization for Apple Silicon.
NVIDIA’s hybrid architecture combines Mamba-2 SSM, Latent MoE, and standard attention —
MLX 3-bit is broken on it. JANG_4M at 63 GB scores 93% MMLU with reasoning at 55 tok/s.
JANG_2L fits on a 64 GB Mac at 43 GB with 86% MMLU.
MiniMax-M2.5 (230B) — JANG vs MLX
JANG_2L: 82.5 GB · 2.10 bits · 0.9 s per question · MMLU (200q) 74.0% (148/200) · +47.5 points; MLX broken at ALL bit levels
MLX 4-bit: 119.8 GB · 4.0 bits · 0.9 s per question · MMLU (200q) 26.5% (53/200)
MLX is completely broken on MiniMax at every bit level — 4-bit (26.5%), 3-bit (24.5%), and 2-bit (25%) all score near random. JANG_2L at just 2.10 bits is the only way to run MiniMax quantized on Apple Silicon.
Per-subject breakdown — MiniMax-M2.5 (230B) — all methods
Subject             JANG_2L   MLX 4-bit   MLX 3-bit   MLX 2-bit
Abstract Algebra    10/20     3/20        2/20        5/20
Anatomy             15/20     7/20        5/20        5/20
Astronomy           20/20     7/20        6/20        4/20
College CS          13/20     4/20        5/20        6/20
College Physics     13/20     8/20        6/20        6/20
HS Biology          18/20     4/20        5/20        6/20
HS Chemistry        18/20     4/20        5/20        5/20
HS Mathematics      8/20      6/20        6/20        3/20
Logical Fallacies   18/20     5/20        4/20        5/20
World Religions     15/20     5/20        5/20        5/20
Total               148/200 (74%)   53/200 (26.5%)   49/200 (24.5%)   50/200 (25%)
JANG wins all 10 subjects against all MLX methods. MLX 4-bit, 3-bit, and 2-bit all score near random (25%). Root cause: MLX generates meta-commentary instead of direct answers on this model.
Qwen3.5-122B-A10B — ~4 bits
JANG_4K: 71 GB · 3.99 bits · ~40 tok/s · MMLU (200q) 86% (172/200) · +1 point vs MLX 4-bit
MLX 4-bit: 64 GB · 4.0 bits · ~50 tok/s · MMLU (200q) 85% (170/200)
Per-subject breakdown — 122B ~4 bits
Subject             JANG_4K   MLX 4-bit
Abstract Algebra    16/20     15/20
Anatomy             19/20     18/20
Astronomy           19/20     19/20
College CS          15/20     15/20
College Physics     14/20     14/20
HS Biology          19/20     19/20
HS Chemistry        18/20     18/20
HS Mathematics      14/20     14/20
Logical Fallacies   19/20     19/20
World Religions     19/20     19/20
Total               172/200 (86%)   170/200 (85%)
JANG wins 2 subjects, ties 8. Neck-and-neck at ~4 bits.
Qwen3.5-122B-A10B — ~2 bits
JANG_2S: 44 GB · 2.11 bits · ~45 tok/s · MMLU (200q) 79% (158/200) · +22.5 points vs MLX 2-bit
MLX 2-bit: 36 GB · 2.0 bits · ~52 tok/s · MMLU (200q) 56.5% (113/200)
Per-subject breakdown — 122B ~2 bits
Subject             JANG_2S   MLX 2-bit
Abstract Algebra    9/20      9/20
Anatomy             18/20     11/20
Astronomy           20/20     16/20
College CS          14/20     8/20
College Physics     15/20     10/20
HS Biology          19/20     15/20
HS Chemistry        18/20     13/20
HS Mathematics      11/20     4/20
Logical Fallacies   16/20     13/20
World Religions     18/20     14/20
Total               158/200 (79%)   113/200 (56.5%)
JANG wins 9 of 10 subjects, ties 1 (Abstract Algebra).
Qwen3.5-35B-A3B — ~4 bits
JANG_4K: 20.1 GB · 3.99 bits · ~100 tok/s · MMLU (200q) 77.5% (155/200) · +2 points vs MLX 4-bit
MLX 4-bit: 18.2 GB · 4.0 bits · ~110 tok/s · MMLU (200q) 75.5% (151/200)
Per-subject breakdown — 35B ~4 bits
Subject             JANG_4K   MLX 4-bit
Abstract Algebra    12/20     10/20
Anatomy             17/20     16/20
Astronomy           18/20     18/20
College CS          14/20     15/20
College Physics     14/20     13/20
HS Biology          18/20     18/20
HS Chemistry        17/20     17/20
HS Mathematics      10/20     8/20
Logical Fallacies   18/20     19/20
World Religions     17/20     17/20
Total               155/200 (77.5%)   151/200 (75.5%)
JANG wins 4 subjects, loses 2 (College CS, Logical Fallacies), ties 4.
Qwen3.5-35B-A3B — ~2 bits
JANG_2S: 12.8 GB · 2.17 bits · fits in 16 GB RAM · MMLU (200q) 65.5% (131/200) · +25 points
MLX 2-bit: 12.8 GB · ~2.5 bits · MMLU ~40% (estimated from 34% at 50 questions)
Per-subject breakdown — 35B ~2 bits (JANG only)
Subject             JANG_2S   MLX 2-bit
Abstract Algebra    8/20      —
Anatomy             14/20     —
Astronomy           19/20     —
College CS          14/20     —
College Physics     11/20     —
HS Biology          16/20     —
HS Chemistry        14/20     —
HS Mathematics      5/20      —
Logical Fallacies   14/20     —
World Religions     16/20     —
Total               131/200 (65.5%)   ~40% (est.)
MLX 2-bit has not yet been run on the full 200-question set; the ~40% figure is extrapolated from 34% at 50 questions.
Test methodology & conditions
MMLU: 200-question subset (10 subjects × 20 questions each), thinking disabled, temperature 0.0.
Hardware: Apple M4 Max, 128 GB unified memory.
Quantization: MLX affine quantization, group_size=64. JANG uses variable bit widths via quant_predicate.
Models: all methods quantize the same base model weights.
Inference: JANG stays quantized in GPU memory using MLX’s native quantized_matmul — no float16 expansion.
Reproducibility: all scores verified from HuggingFace model cards. Code at github.com/jjang-ai/jangq.
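The group-wise affine scheme referenced above can be sketched in a few lines of numpy. This is a simplified illustration of the idea (one scale and one offset per 64-weight group), not MLX's exact packed layout; it shows why reconstruction error grows sharply as the bit width drops.

```python
import numpy as np

def affine_quantize(w, bits=4, group_size=64):
    """Group-wise affine quantize-dequantize round trip.
    Each group of `group_size` weights along the flattened last axis
    shares one scale and one offset (simplified sketch, not MLX's
    exact packed format)."""
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    levels = 2**bits - 1
    scale = (hi - lo) / levels
    scale[scale == 0] = 1.0          # guard against constant groups
    q = np.clip(np.round((g - lo) / scale), 0, levels)
    return (q * scale + lo).reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
for bits in (2, 3, 4):
    err = np.abs(affine_quantize(w, bits) - w).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

At 2 bits each group has only 4 representable values, so outlier-heavy tensors (routing, normalization) lose far more information than the mean error over random weights suggests — which is why mixed-precision assignment matters.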
MLX’s mixed_2_6 mode protects select v_proj and down_proj layers at 6-bit, but its heuristic targets only standard transformer layers: it does not account for GatedDeltaNet linear-attention layers, MoE expert-routing tensors, or other hybrid-architecture components that are critical here. As a result, on this hybrid MoE model mixed_2_6 does not improve over plain 2-bit. JANG’s tier system classifies these architecture-specific tensors explicitly.
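A tier system of this kind amounts to a per-tensor bit-width predicate keyed on the tensor's architectural role. The sketch below is hypothetical (the pattern lists and tier assignments are illustrative, not JANG's actual rules), but it shows the shape of the idea: tiny accuracy-critical tensors get high precision, protected projections get medium precision, and the bulk expert/MLP weights take the low bit width.

```python
# Hypothetical tier-based bit-width predicate. Pattern lists and
# bit assignments are illustrative only, not JANG's real rules.
SENSITIVE_PATTERNS = ("router", "norm", "embed")     # tiny but accuracy-critical
PROTECTED_PATTERNS = ("v_proj", "down_proj")         # medium protection

def tier_bits(tensor_name: str, default_bits: int = 2) -> int:
    """Assign a bit width per tensor based on its role in the architecture."""
    if any(p in tensor_name for p in SENSITIVE_PATTERNS):
        return 8
    if any(p in tensor_name for p in PROTECTED_PATTERNS):
        return 6
    return default_bits  # bulk MLP / expert weights take the low bit width

for name in ("layers.0.mlp.experts.3.up_proj",
             "layers.0.moe.router.weight",
             "layers.2.self_attn.v_proj.weight"):
    print(name, "->", tier_bits(name), "bits")
```

Because routing and normalization tensors hold a negligible fraction of the parameters, keeping them at 8-bit barely moves the effective bit width while avoiding the near-random collapse seen in the MiniMax results above.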
Qwen3.5-122B-A10B — 122B parameters, head-to-head
MoE: 256 experts, top-8 routing, 10B active · JANG_2L vs MLX 2-bit · M4 Max 128 GB
JANG_2L (2.19 bits): 45.3 GB RAM · 38–49 tok/s
MLX 2-bit (2.0 bits): 35.6 GB RAM · 52–65 tok/s
“What is photosynthesis?”
JANG_2L: “process by which green plants, algae, and some bacteria convert light energy into chemical energy in the form of glucose”
MLX 2-bit: outputs “Photos-sense”, then degenerates into “y = y = y”
“Three planets larger than Earth?”
JANG_2L: uses <think> reasoning tags; lists Jupiter with details
MLX 2-bit: misreads the question as “larger than Earth’s moon” and rambles
“Capital of France?”
JANG_2L: “Paris”, with government details
MLX 2-bit: “Paris, on the banks of the River Seine” — both correct
Test conditions: Apple M4 Max 128 GB / M4 Ultra 256 GB · MMLU: 200 questions (10 subjects × 20), reasoning enabled for 397B and Nemotron, thinking disabled for others · 2026-03
Qwen3.5-397B: JANG_1L at 112 GB (120 GB GPU peak) fits on 128 GB Macs — 86.5% MMLU with reasoning, 36 tok/s. JANG_2L at 187 GB hits 92% on M4 Ultra 256 GB. MLX 2/3-bit: NaN. MLX 4-bit: 94% but ~280 GB.
Nemotron-3-Super-120B: JANG_4M at 63 GB scores 93% MMLU, 55 tok/s. JANG_2L at 43 GB scores 86%, fits 64 GB Macs. MLX 3-bit: broken. First working Nemotron-H quantization for Apple Silicon.
MiniMax-M2.5 (230B): JANG_2L scores 74% MMLU at 82.5 GB vs MLX 4-bit at 26.5% (119.8 GB). MLX broken at ALL bit levels (26.5%, 24.5%, 25%). JANG is the only way to run MiniMax quantized.
Pipeline verification: JANG_4S matches MLX 4-bit exactly on 35B MMLU (82% = 82%), confirming the quantization pipeline is lossless at matched bit widths.
Hybrid: 24 linear-attention + 8 full-attention layers · JANG_2S · 2.5 effective bits · M4 Max · 107 GB
At 2.5 effective bits, JANG_2S answers 6/6 prompts correctly while MLX 2-bit gets 0/6. JANG protects the 8 critical full-attention layers at 6-bit while compressing the 24 linear-attention layers and all MLP weights at 2-bit.
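"Effective bits" here is just the parameter-weighted average of the per-tensor bit widths. The parameter fractions below are illustrative (the real split depends on the model's tensor sizes, which aren't given above); they show how a small 6-bit slice lifts mostly-2-bit storage to about 2.5 effective bits.

```python
# Effective bit width = parameter-weighted average of per-tensor bits.
# The fractions below are hypothetical, chosen to illustrate the math.
def effective_bits(allocation):
    """allocation: list of (fraction_of_parameters, bits) pairs summing to 1."""
    assert abs(sum(f for f, _ in allocation) - 1.0) < 1e-9
    return sum(f * b for f, b in allocation)

alloc = [
    (0.125, 6.0),  # protected full-attention layers (illustrative share)
    (0.875, 2.0),  # linear-attention layers + all MLP weights
]
print(f"effective bits: {effective_bits(alloc):.2f}")  # 2.50
```

The design point is that protection is cheap when the protected tensors are a small fraction of total parameters: here an eighth of the weights at 6-bit adds only 0.5 bits to the average.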
“What is 2+2?”
JANG: “The answer is 4.”
2-bit: “2+2? 2+2? 2+2?”
“Is a tomato a fruit?”
JANG: “A tomato is a fruit, not a vegetable.”
2-bit: “1 1 1 1 1 1 1 1”
“Who wrote Romeo and Juliet?”
JANG: Answers correctly
2-bit: “10, 10, 10, 10”
“What is photosynthesis?”
JANG: Correct definition
2-bit: Garbled text
“How many legs does a spider have?”
JANG: Answers correctly
2-bit: “10, 10, 10”
“Largest ocean on Earth?”
JANG: “The Pacific Ocean.”
2-bit: Infinite loop
Highlights — 7B models
Mistral-7B-v0.3
Mistral GQA 4:1 · JANG_3M · 3.4 bits · M4 Max
"光合作用是什么?"
JANG_3M (3.4 bits): “Photosynthesis is the process by which green plants and some other organisms use sunlight to synthesize foods from carbon dioxide and water. It generally involves the green pigment chlorophyll and generates oxygen as a byproduct.”