# JANGQ: Smaller Models, Bigger MLX Wins

> Public AI-readable summary for https://jangq.ai.
> The benchmark page is intentionally strict: it only lists cases where JANG is smaller than the closest MLX baseline and wins by a large quality margin.
> Open source, Apache 2.0. GitHub: https://github.com/jjang-ai/jangq. Models: https://huggingface.co/JANGQ-AI.

## Listing Policy

JANGQ does not list every tested model on the public homepage. Included results must satisfy all of the following:

1. JANG uses less memory, less disk, or a lower average bit width than the closest MLX comparison.
2. JANG wins by a decisive MMLU margin, coherency margin, or measured error margin.
3. The comparison uses the same base model family and a nearby MLX quantization target.
4. The result is a proven benchmark or prompt-level failure comparison, not a speculative claim.

Close wins, same-size-only wins, larger JANG profiles, and broad recent uploads are excluded from the visible model list.

## Proven Smaller-Win Set

### MiniMax-M2.5: JANG_2L vs MLX 4-bit

The largest public win on the page.

| Method | Avg bits | GPU memory | Disk | MMLU |
| --- | ---: | ---: | ---: | ---: |
| JANG_2L | 2.10 | 82.5 GB | 89 GB | 74.0% |
| MLX 4-bit | 4.00 | 119.8 GB | 120 GB | 26.5% |

Result: JANG is 37.3 GB smaller in GPU memory and wins by +47.5 MMLU points.

Curated release: https://huggingface.co/JANGQ-AI/MiniMax-M2.5-JANG_2L

### Qwen3.5-122B-A10B: JANG_2M vs MLX mixed_2_6

A same-neighborhood comparison, retained because JANG is slightly smaller and substantially better.

| Method | GPU memory | MMLU |
| --- | ---: | ---: |
| JANG_2M | 44.7 GB | 79% |
| MLX mixed_2_6 | 45.0 GB | 46% |

Result: JANG is 0.3 GB smaller and wins by +33 MMLU points.

### Mistral-7B: Lower-Bit Coherency Wins

| Prompt family | JANG result | MLX comparison | Outcome |
| --- | --- | --- | --- |
| Photosynthesis | JANG_3M at 3.4 bits | 3.5-bit MLX | JANG gives a coherent explanation; MLX degenerates into number sequences. |
| Arithmetic | JANG_4S at 4.1 bits | 4.5-bit MLX | JANG answers directly; MLX loops the prompt. |

Result: JANG uses fewer bits and keeps coherent output.

### Qwen2.5-3B: Lower-Bit Coherency And MSE Proof

| Comparison | JANG result | MLX comparison | Outcome |
| --- | --- | --- | --- |
| Translation / factual QA | JANG_4S at about 4.1 bits | 4.5-bit MLX | JANG answers directly; MLX echoes or repeats the prompt. |
| Logit MSE | JANG at 3.37 bits | 4.00-bit MLX | JANG has lower measured error while using fewer bits. |

Result: JANG is smaller and stronger on both prompt behavior and measured error.

### SmolLM2-1.7B: Lower-Bit Direct Answer

| Method | Avg bits | Prompt outcome |
| --- | ---: | --- |
| JANG_3M | 3.4 | Answers the spider-legs question directly. |
| MLX 3-bit | 3.5 | Produces number-sequence output. |

Result: JANG uses fewer bits and answers correctly.

### TinyLlama-1.1B: Lower-Bit Topic Retention

| Method | Avg bits | Prompt outcome |
| --- | ---: | --- |
| JANG_4S | 4.1 | Stays on the water-formula question. |
| MLX 4-bit | 4.5 | Derails to a different chemistry question. |

Result: JANG uses fewer bits and avoids topic derailment.

## Removed From Public Model Listing

The page no longer uses a broad “recent uploads” model feed as its primary proof list. Models are shown only when they pass the stricter smaller-and-decisive-win filter above.

## Runtime Notes

JANG models are designed for native MLX execution on Apple Silicon: compressed weights stay quantized in GPU memory and are dequantized on the fly during inference.

Example:

```bash
# Install the JANG command-line tooling
pip install jang
# Quantize the base model with the JANG_4S profile
jang convert --model Qwen/Qwen2.5-3B --profile JANG_4S --output ./Qwen2.5-3B-JANG_4S/
# Run the converted model
jang run --model Qwen2.5-3B-JANG_4S.jang
```

## Links

- Homepage: https://jangq.ai/
- Curated model page: https://jangq.ai/models/
- GitHub: https://github.com/jjang-ai/jangq
- Hugging Face account: https://huggingface.co/JANGQ-AI
- vMLX Engine: https://vmlx.net/
- MLX Studio: https://mlx.studio/
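
## Checking The Reported Margins

The memory and MMLU margins quoted for the two MMLU-based results (MiniMax-M2.5 and Qwen3.5-122B-A10B) are plain differences between table rows, so they can be rechecked directly. The following is a minimal sketch and not part of the JANG tooling: the `Comparison` dataclass, its field names, and its methods are hypothetical helpers, and the numbers are copied from the two tables on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """One JANG-vs-MLX row pair (hypothetical helper, not a JANG API)."""

    model: str
    jang_gpu_gb: float
    mlx_gpu_gb: float
    jang_mmlu: float
    mlx_mmlu: float

    def memory_saved_gb(self) -> float:
        # Positive when the JANG profile needs less GPU memory than MLX.
        return round(self.mlx_gpu_gb - self.jang_gpu_gb, 1)

    def mmlu_margin(self) -> float:
        # Positive when JANG scores higher on MMLU (percentage points).
        return round(self.jang_mmlu - self.mlx_mmlu, 1)

    def is_smaller_and_better(self) -> bool:
        # Sign check only; no numeric "decisive" threshold is assumed.
        return self.memory_saved_gb() > 0 and self.mmlu_margin() > 0


# Values copied from the MiniMax-M2.5 and Qwen3.5-122B-A10B tables.
COMPARISONS = [
    Comparison("MiniMax-M2.5 (JANG_2L vs MLX 4-bit)", 82.5, 119.8, 74.0, 26.5),
    Comparison("Qwen3.5-122B-A10B (JANG_2M vs MLX mixed_2_6)", 44.7, 45.0, 79.0, 46.0),
]

for c in COMPARISONS:
    print(
        f"{c.model}: {c.memory_saved_gb()} GB smaller, "
        f"+{c.mmlu_margin()} MMLU points, passes={c.is_smaller_and_better()}"
    )
```

The check tests only the sign of each margin. The page does not state a numeric threshold for a "decisive" win, so the sketch does not assume one.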