Run Qwen3.5-397B-A17B-NVFP4 on Copilot+ PC

The fastest way to get this model running locally is via Optional Features.

Follow the on-screen instructions below.

A background process then downloads the required model files automatically.

The script runs a quick hardware check and adjusts its runtime parameters to the detected hardware.

🔐 Hash sum: 7a31a953b86deecc6554b59043e0c49a | 📅 Last update: 2026-06-29



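  • CPU: 8 cores / 16 threads recommended for orchestration
  • RAM: high-speed DDR5 preferred, since layers offloaded to the CPU are served from system memory
  • Storage: enough free space for the model weights (roughly 200 GB at 4 bits per parameter), plus room for future model updates and datasets
  • Graphics: a GPU supported by the TensorRT-LLM or vLLM inference engine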
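
A quick check of the CPU and storage items in the list above might look like the following. This is a minimal sketch using only the Python standard library: the function name `check_host` and the free-space threshold are hypothetical, and it does not inspect RAM or the GPU, since the standard library has no portable call for either.

```python
import os
import shutil

# The thread count comes from the list above. The disk figure is an
# assumed placeholder, not a published requirement.
RECOMMENDED_THREADS = 16
ASSUMED_MIN_FREE_GB = 250


def check_host(path=".", min_threads=RECOMMENDED_THREADS,
               min_free_gb=ASSUMED_MIN_FREE_GB):
    """Report whether this host meets the given CPU and disk thresholds."""
    threads = os.cpu_count() or 0  # cpu_count() can return None
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "threads": threads,
        "threads_ok": threads >= min_threads,
        "free_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for key, value in check_host().items():
        print(f"{key}: {value}")
```

Pass `path` to point the disk check at the drive where the weights would be stored.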

The Qwen3.5-397B-A17B-NVFP4 model targets large language model efficiency by combining a 397-billion-parameter architecture with NVFP4, a 4-bit floating-point data type.

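NVFP4 quantization cuts the memory footprint to roughly a quarter of a 16-bit version while aiming to preserve near-full-precision quality. At about 4 bits per parameter the weights still occupy on the order of 200 GB, so running on consumer-grade GPUs depends on offloading part of the model to system RAM.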
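
The size of that reduction can be estimated with simple arithmetic. The sketch below assumes the commonly described NVFP4 layout, in which each block of 16 four-bit values shares one 8-bit scale factor (about 4.5 bits per parameter). It also assumes every parameter is quantized, and it ignores the per-tensor scale, activations, and the KV cache, so treat the result as a rough lower bound rather than a measured figure.

```python
def weight_gb(num_params, bits_per_param):
    """Size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def nvfp4_bits_per_param(block_size=16, scale_bits=8):
    """4-bit values plus one shared scale per block (assumed layout)."""
    return 4 + scale_bits / block_size


PARAMS = 397e9  # parameter count taken from the model name

fp16_gb = weight_gb(PARAMS, 16)
nvfp4_gb = weight_gb(PARAMS, nvfp4_bits_per_param())

if __name__ == "__main__":
    print(f"FP16 weights:  {fp16_gb:.1f} GB")
    print(f"NVFP4 weights: {nvfp4_gb:.1f} GB")
    print(f"Reduction:     {fp16_gb / nvfp4_gb:.2f}x")
```

Under these assumptions the weights come to about 223 GB in NVFP4 against 794 GB in FP16.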

Reported benchmarks put inference latency below 50 ms and throughput above 200 tokens per second on standard hardware, ahead of previous 400B-scale models.
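
The two figures measure different things, and a short calculation shows how they relate. In this sketch the function names are hypothetical, and reading the 50 ms figure as time to first token is an assumption, since the text does not say which latency is meant.

```python
def ms_per_token(tokens_per_second):
    """Average decode time per token implied by a throughput figure."""
    return 1000.0 / tokens_per_second


def generation_time_s(num_tokens, tokens_per_second, first_token_latency_ms=0.0):
    """Rough end-to-end time: initial latency plus steady-state decoding."""
    return first_token_latency_ms / 1000.0 + num_tokens / tokens_per_second


if __name__ == "__main__":
    print(ms_per_token(200))  # 5.0 ms per token at 200 tokens/s
    # Assumed scenario: a 500-token reply with 50 ms before the first token
    print(f"{generation_time_s(500, 200, 50):.2f} s")
```

At 200 tokens per second each token takes 5 ms on average, so a 50 ms figure is more plausibly an initial latency than a per-token decode time.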

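The model uses a mixture-of-experts architecture whose routing scheme balances load across experts. Following the Qwen naming convention, "A17B" denotes roughly 17 billion parameters activated per token, not an accelerator cluster. The routing is credited with stable convergence and robust multilingual capabilities.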
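
The general idea of mixture-of-experts routing can be illustrated with a generic top-k router. This is a minimal sketch of the standard technique, not the model's actual routing scheme: the expert count, the value of k, and the random router scores are all assumptions chosen for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def expert_load(batch_logits, k):
    """Count how many tokens each expert receives across a batch."""
    counts = [0] * len(batch_logits[0])
    for logits in batch_logits:
        for idx, _ in route_top_k(logits, k):
            counts[idx] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    num_experts, k = 8, 2  # assumed values for illustration
    batch = [[rng.gauss(0, 1) for _ in range(num_experts)] for _ in range(1000)]
    print(expert_load(batch, k))
```

Only the k selected experts run for each token, which is why the activated parameter count is much smaller than the total. The per-expert counts from `expert_load` are the quantity a load-balancing scheme tries to keep even.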

The table below summarizes the figures quoted above: parameter count, precision, latency, and throughput.

Model                      Parameters   Precision   Latency (ms)   Throughput (tokens/s)
Qwen3.5-397B-A17B-NVFP4    397B         NVFP4       <50            >200
