xAI has launched Grok-4-Fast, a cost-optimized successor to Grok-4 with a 2M-token context window. The model unifies “reasoning” and “non-reasoning” behaviors in a single set of weights, with the system prompt selecting the mode. It targets high-throughput search, coding, and Q&A, and was trained with tool-use reinforcement learning so that it can browse the web, execute code, and call other tools natively.
According to Marktechpost, Grok-4-Fast aims to cut latency and token use while keeping accuracy close to Grok-4.
Unified model for real-time work
Earlier Grok releases split long “reasoning” and short “non-reasoning” responses across separate models. Grok-4-Fast uses a unified weight space and steers behavior with system prompts. This reduces end-to-end latency and token count in real-time applications such as search and interactive coding.
xAI trained the model end-to-end with tool-use RL. The company reports gains on search agent tasks, including 44.9% on BrowseComp, 95.0% on SimpleQA, and 66.0% on Reka Research. Scores on the Chinese-language variants were higher, for example 51.2% on BrowseComp-zh.
Benchmarks and “intelligence density”
xAI cites pass@1 results of 92.0% on AIME 2025 and 93.3% on HMMT 2025 without tools. It reports 85.7% on GPQA Diamond and 80.0% on LiveCodeBench Jan–May. The company says the model uses about 40% fewer “thinking” tokens than Grok-4 at similar accuracy.
On LMArena, grok-4-fast-search (codename “menlo”) ranks #1 in the Search Arena with an Elo of 1163. The text variant (codename “tahoe”) sits at #8 in the Text Arena, which the report describes as close to grok-4-0709.
Availability and pricing
The model is generally available in Grok’s Fast and Auto modes on web and mobile. Auto will select Grok-4-Fast on hard queries to improve latency and keep quality. For the first time, free users can access xAI’s latest model tier.
Developers get two SKUs, grok-4-fast-reasoning and grok-4-fast-non-reasoning, both with 2M context. xAI lists API pricing as $0.20 / 1M input tokens (<128k), $0.40 / 1M input tokens (≥128k), $0.50 / 1M output tokens (<128k), $1.00 / 1M output tokens (≥128k), and $0.05 / 1M cached input tokens.
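To make the tiered rates concrete, here is a minimal Python sketch of a per-request cost estimate. The rates are the ones listed above; everything else is an assumption for illustration: the function name estimate_cost is hypothetical, "128k" is taken to mean 128,000 tokens, the tier is assumed to be chosen by the request's input size, and cached tokens are assumed to be billed at the flat cached rate in either tier. xAI's actual billing rules may differ.

```python
from decimal import Decimal

# USD per 1M tokens, as listed in the article.
RATES = {
    "short": {"input": Decimal("0.20"), "output": Decimal("0.50")},  # <128k
    "long": {"input": Decimal("0.40"), "output": Decimal("1.00")},   # >=128k
}
CACHED_INPUT_RATE = Decimal("0.05")

# Assumption: "128k" means 128,000 tokens and the tier is decided
# by the number of input tokens in the request.
TIER_THRESHOLD = 128_000
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Return an estimated request cost in USD (hypothetical helper)."""
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    tier = "long" if input_tokens >= TIER_THRESHOLD else "short"
    uncached_tokens = input_tokens - cached_tokens

    total = (
        uncached_tokens * RATES[tier]["input"]
        + cached_tokens * CACHED_INPUT_RATE
        + output_tokens * RATES[tier]["output"]
    )
    return total / MILLION


if __name__ == "__main__":
    # A 100k-token prompt with a 10k-token answer (short tier).
    print(estimate_cost(100_000, 10_000))
    # A 200k-token prompt, 50k of it cached, with a 10k-token answer (long tier).
    print(estimate_cost(200_000, 10_000, cached_tokens=50_000))
```

Decimal is used instead of float so that small per-token amounts add up without rounding drift.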
xAI frames the efficiency as “intelligence density.” It claims that matching Grok-4 on benchmarks now costs about 98% less, once the lower token use is combined with the new per-token rates.
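The two figures in this claim can be related with simple arithmetic. The sketch below assumes, purely for illustration, that cost is proportional to tokens used times a single blended per-token price, and that the roughly 40% token saving applies to all billed tokens; the function name implied_price_ratio is hypothetical. Under those assumptions it computes what per-token price ratio would be needed for the two figures to agree.

```python
def implied_price_ratio(token_reduction: float, total_cost_reduction: float) -> float:
    """Per-token price ratio (new / old) implied by the two reductions.

    Assumes cost = tokens * price, so:
        (1 - total_cost_reduction) = (1 - token_reduction) * price_ratio
    """
    if not 0.0 <= token_reduction < 1.0:
        raise ValueError("token_reduction must be in [0, 1)")
    if not 0.0 <= total_cost_reduction <= 1.0:
        raise ValueError("total_cost_reduction must be in [0, 1]")
    remaining_tokens = 1.0 - token_reduction
    remaining_cost = 1.0 - total_cost_reduction
    return remaining_cost / remaining_tokens


if __name__ == "__main__":
    # Figures from the article: ~40% fewer tokens, ~98% lower total price.
    ratio = implied_price_ratio(0.40, 0.98)
    print(f"implied per-token price ratio: {ratio:.4f}")
    print(f"per-token price would be about {1 / ratio:.0f}x lower")
```

With the article's numbers the ratio comes out near 1/30, which suggests most of the claimed saving comes from the per-token rates and the token reduction adds a smaller share.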
Marktechpost links to xAI’s announcement and technical details at x.ai.