According to the official Microsoft Blog, the company has introduced Maia 200, a new AI accelerator chip built for inference workloads. The chip is manufactured on TSMC's 3nm process and contains more than 140 billion transistors. Microsoft says it delivers 30% better performance per dollar than the current hardware in its fleet.
Technical Design and Performance
Maia 200 delivers more than 10 petaFLOPS of 4-bit (FP4) compute and more than 5 petaFLOPS at 8-bit (FP8) precision, all within a 750W power envelope. The chip includes native FP8 and FP4 tensor cores designed for low-precision compute tasks.
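To give a feel for what 4-bit precision means in practice, here is a minimal sketch of FP4 quantization. It assumes the common E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) used by several low-precision tensor formats; Microsoft has not published Maia 200's exact FP4 encoding, so the value grid below is illustrative, not chip-specific.

```python
# Illustrative FP4 quantization sketch, assuming the widely used E2M1
# layout. The 15 distinct representable magnitudes (zero plus seven
# positive/negative values) form a very coarse grid.
FP4_GRID = [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5,
            0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable FP4 (E2M1) value."""
    return min(FP4_GRID, key=lambda v: abs(v - x))

weights = [0.27, -1.8, 3.3, 5.1, -0.04]
quantized = [quantize_fp4(w) for w in weights]
print(quantized)  # each value snaps to the coarse 4-bit grid
```

The coarseness of this grid is why low-precision inference typically pairs 4-bit storage with per-block scaling factors, and why hardware vendors advertise FP4 throughput specifically for inference rather than training.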
The chip features 216 GB of HBM3e memory delivering 7 TB/s of bandwidth, alongside 272 MB of on-chip SRAM. Microsoft designed the memory subsystem with a specialized DMA engine and a network-on-chip (NoC) fabric to improve data movement and token throughput.
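The published compute and bandwidth figures imply a simple roofline-style balance point, which helps explain the emphasis on the memory subsystem. The sketch below uses only the numbers quoted above; the roofline model itself is a standard analysis tool, not something Microsoft has published for this chip.

```python
# Back-of-envelope roofline arithmetic from the article's figures:
# peak FP4 compute and HBM3e bandwidth determine how many operations
# a kernel must perform per byte fetched to stay compute-bound.
PEAK_FP4_FLOPS = 10e15   # >10 petaFLOPS at 4-bit precision
HBM_BANDWIDTH = 7e12     # 7 TB/s of HBM3e bandwidth

# Machine balance: FLOPs per byte of memory traffic at the ridge point.
machine_balance = PEAK_FP4_FLOPS / HBM_BANDWIDTH
print(f"{machine_balance:.0f} FLOPs/byte")  # ≈ 1429

def attainable_flops(intensity_flops_per_byte: float) -> float:
    """Roofline model: min(peak compute, bandwidth × arithmetic intensity)."""
    return min(PEAK_FP4_FLOPS, HBM_BANDWIDTH * intensity_flops_per_byte)
```

Decode-heavy inference kernels sit far below ~1,400 FLOPs per byte, meaning they are memory-bound on a chip like this, which is consistent with the investment in HBM3e bandwidth, a large on-chip SRAM, and a dedicated DMA engine.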
Network Architecture
The system uses a two-tier network design built on standard Ethernet. Each accelerator provides 2.8 TB/s of bidirectional scale-up bandwidth, and the architecture supports clusters of up to 6,144 accelerators with predictable performance.
Four Maia chips connect directly within each tray using non-switched links. The same Maia AI transport protocol works for intra-rack and inter-rack networking. This design reduces network hops and simplifies programming across nodes.
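The stated figures support some quick topology arithmetic. Note that "four chips per tray" is the only grouping the article specifies, so the aggregate-bandwidth number below is just the sum of per-chip scale-up links, an upper bound rather than a measured cluster figure.

```python
# Topology arithmetic from the figures above; tray grouping is the only
# granularity the article states, so this is an illustrative upper bound.
CLUSTER_ACCELERATORS = 6144   # maximum stated cluster size
CHIPS_PER_TRAY = 4            # directly connected via non-switched links
SCALE_UP_BW_TBPS = 2.8        # bidirectional scale-up bandwidth per chip

trays = CLUSTER_ACCELERATORS // CHIPS_PER_TRAY
aggregate_bw_tbps = CLUSTER_ACCELERATORS * SCALE_UP_BW_TBPS

print(trays)                            # trays at full cluster scale
print(f"{aggregate_bw_tbps:.0f} TB/s")  # summed per-chip scale-up bandwidth
```

A full 6,144-accelerator cluster works out to 1,536 four-chip trays, and using one transport protocol within and across racks means software sees the same communication model at every tier.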
Deployment and Availability
Microsoft deployed Maia 200 in its US Central datacenter near Des Moines, Iowa. The US West 3 region near Phoenix, Arizona will come next. More regions will follow later.
The chip will serve multiple models, including GPT-5.2 from OpenAI. Microsoft Foundry and Microsoft 365 Copilot will use the hardware. The Superintelligence team plans to use Maia 200 for synthetic data generation and reinforcement learning.
Microsoft released the Maia SDK as a preview. The kit includes PyTorch integration, a Triton compiler, and an optimized kernel library. Developers can use a low-level programming language for fine-grained control, and the SDK also offers a simulator and a cost calculator for optimization.
Microsoft says AI models were running on Maia 200 silicon within days of the first packaged parts arriving. The company cut the time from first silicon to datacenter deployment by more than half compared with similar programs.