Grok API Pricing Explained: Every Model, Every Cost, and How It Compares (2026)
Grok 4.20 starts at $2/M tokens with a 2M context window. Here's a complete breakdown of xAI's pricing tiers, tools costs, and how it stacks up against GPT and Claude.
xAI's Grok API has emerged as one of the most aggressive pricing plays in the LLM space. With flagship models starting at $2 per million tokens, a 2-million-token context window, and a batch API that cuts costs in half, xAI is clearly going after developers who are tired of paying GPT-4o prices for workloads that don't need GPT-4o capability.
But the pricing structure has layers: there are fast and standard variants, reasoning and non-reasoning modes, tools with per-call fees stacked on top of token costs, voice APIs billed by the minute, and a batch discount buried in the docs that most developers miss. This article breaks all of it down in one place.
The Model Lineup: What You're Actually Choosing Between
xAI currently offers two active tiers of language models on its API, plus image generation, video generation, and voice models. Here's the full breakdown.
Grok 4.20: The Flagship
Grok 4.20 is xAI's current top-tier language model. It comes in two variants: grok-4.20-0309-reasoning and grok-4.20-0309-non-reasoning. Both share the same pricing:
- Input tokens: $2.00 per million
- Output tokens: $6.00 per million
- Context window: 2,000,000 tokens
- Rate limits: 10M tokens per minute, 1,800 requests per minute
The reasoning variant activates internal chain-of-thought processing before responding. It scores higher on complex reasoning benchmarks but will consume more output tokens in the process since those reasoning steps count toward your bill.
There's also grok-4.20-multi-agent-0309, the multi-agent variant that ships with a built-in four-agent collaborative architecture. It's priced identically at $2.00/$6.00 per million input/output tokens. xAI describes this as the differentiating architecture: four specialized agents (named Grok, Harper, Benjamin, and Lucas) collaborating on complex tasks internally. If you're building orchestration layers yourself, this is worth evaluating against the overhead of rolling your own multi-agent coordination.
Note for developers migrating from Grok 3: Grok 4 is a reasoning model and does not support presencePenalty, frequencyPenalty, or stop parameters. Sending these will return an error. There is also no reasoning_effort parameter. If you're porting prompts from earlier versions, clean those parameters out of your payloads before you switch.
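If you manage prompt configs shared across providers, a small sanitizer keeps those rejected fields out of your Grok payloads. This is a minimal sketch; the exact parameter casing depends on the SDK you use, so both camelCase and snake_case spellings are covered here as an assumption.

```javascript
// Parameter names rejected by Grok 4 reasoning models, per the migration
// note above. Both casings are included since SDK conventions vary.
const UNSUPPORTED = [
  'stop',
  'presencePenalty', 'presence_penalty',
  'frequencyPenalty', 'frequency_penalty',
];

// Return a copy of a request payload with the unsupported fields removed,
// so ported GPT/Claude configs don't trigger API errors.
function sanitizeForGrok4(payload) {
  const clean = { ...payload };
  for (const key of UNSUPPORTED) delete clean[key];
  return clean;
}
```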
Grok 4.1 Fast: The High-Volume Workhorse
The grok-4-1-fast-reasoning and grok-4-1-fast-non-reasoning models are where the pricing picture gets genuinely interesting for production workloads. At $0.20 per million input tokens and $0.50 per million output tokens, these are among the cheapest capable language models available from any major provider.
The context window is identical to the flagship: 2,000,000 tokens. Rate limits are the same: 10M tpm, 1,800 rpm.
If you're running high-volume classification, summarization, extraction, or routing tasks where you don't need frontier-level reasoning, Grok 4.1 Fast is the number to beat. The 10:1 price difference between Grok 4.20 and Grok 4.1 Fast gives you enormous headroom to route tasks intelligently.
Image Generation: Grok Imagine
xAI has two image generation models priced per output image:
| Model | Rate Limit | Price |
|---|---|---|
| grok-imagine-image-pro | 30 rpm, 1 rps | $0.07 / image |
| grok-imagine-image | 300 rpm, 1 rps | $0.02 / image |
At $0.02 per image for the standard model, this is competitive with Stable Diffusion API pricing and cheaper than DALL-E 3. The pro model at $0.07 is still reasonable for production image pipelines where quality matters.
Video Generation: Grok Imagine Video
grok-imagine-video is priced per second of output at $0.05/second, billed at 60 rpm and 1 rps. Note the "⊘" symbol on the pricing page, which indicates this model is currently in limited availability. Confirm access before building a pipeline that depends on it.
Voice and Audio
Voice Agent API (Realtime): $0.05 per minute ($3.00 per hour). This is a WebSocket-based realtime voice API with a maximum session duration of 30 minutes and 100 concurrent sessions per team. Function calling, web search, X search, collections, and MCP are all available during a voice session, but each tool invocation is billed separately on top of the per-minute rate.
Text to Speech: $4.20 per million input characters. Multiple voices, streaming and batch output, and support for MP3, WAV, PCM, μ-law, and A-law formats. Rate limit is 600 rpm and 10 rps.
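The per-minute and per-character rates quoted above translate into simple arithmetic. A quick sketch of both estimators, using the published rates as defaults:

```javascript
// Text to Speech: $4.20 per million input characters (rate from above).
function ttsCostUSD(inputChars, ratePerMillion = 4.2) {
  return (inputChars / 1_000_000) * ratePerMillion;
}

// Realtime Voice Agent API: $0.05 per minute ($3.00/hour), excluding
// any per-invocation tool fees billed on top of the session.
function voiceCostUSD(minutes, ratePerMinute = 0.05) {
  return minutes * ratePerMinute;
}
```

So a 10,000-character TTS script costs about $0.042, and a full hour of voice session time costs $3.00 before tool fees.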
Tools: The Pricing Layer Most Developers Miss
Token costs are the headline, but if you're using xAI's server-side tools, you're also paying per invocation. This is on top of the token cost for the model handling your request.
| Tool | Cost |
|---|---|
| Web Search | $5.00 per 1,000 calls |
| X Search | $5.00 per 1,000 calls |
| Code Execution | $5.00 per 1,000 calls |
| File Attachments | $10.00 per 1,000 calls |
| Collections Search (RAG) | $2.50 per 1,000 calls |
| Image Understanding | Token-based (no per-call fee) |
| X Video Understanding | Token-based (no per-call fee) |
| Remote MCP Tools | Token-based (no per-call fee) |
The key thing to understand here: in agentic workflows, the model decides how many tool calls to make. If you send a query that requires multiple web searches and a code execution step, you're paying for each invocation separately. On a complex agentic task that triggers 10 tool calls, you could easily add $0.05 to the base token cost of the request. At scale, tool invocation costs can rival or exceed token costs for certain workload types.
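To budget agentic workloads, it helps to express the per-1,000-call rates from the table as per-call fees and tally them per request. A minimal sketch, using the rates quoted above (the tally keys are illustrative names, not API identifiers):

```javascript
// Per-call tool fees derived from the per-1,000-call rates in the table above.
const TOOL_RATES = {
  web_search: 5.0 / 1000,
  x_search: 5.0 / 1000,
  code_execution: 5.0 / 1000,
  file_attachments: 10.0 / 1000,
  collections_search: 2.5 / 1000,
};

// Sum invocation fees for one agentic request given a tally of tool calls.
function toolCostUSD(calls) {
  return Object.entries(calls).reduce(
    (total, [tool, n]) => total + (TOOL_RATES[tool] ?? 0) * n,
    0
  );
}
```

For example, a request that triggers 8 web searches and 2 code executions adds about $0.05 on top of its token cost, matching the back-of-envelope math above.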
Collections search at $2.50 per 1,000 calls is the cheapest tool invocation available. If you're building RAG pipelines on Grok, this is the right way to attach your document corpus rather than stuffing everything into the context window.
File attachments at $10.00 per 1,000 calls is the most expensive. Use collections instead wherever your use case permits it.
The Batch API: 50% Off Everything
xAI offers a batch API that cuts all token costs in half. Input tokens, output tokens, reasoning tokens, cached tokens: all 50% off. Standard rates still apply to image and video generation in batch mode.
Batch requests are processed asynchronously, typically completing within 24 hours. Batch requests don't count against your per-minute rate limits, which means you can run very large jobs without throttling.
For offline workloads (document processing, large-scale classification, embedding generation, dataset annotation, anything where you don't need a response in real time), the batch API should be your default. Paying standard rates for async workloads is leaving significant money on the table.
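The savings are easy to quantify. A small sketch that applies the 50% batch discount to the token rates quoted in this article:

```javascript
// Token cost at standard vs batch (50% off) rates, per the section above.
function tokenCostUSD(inputTok, outputTok, inRate, outRate, batch = false) {
  const discount = batch ? 0.5 : 1.0;
  return ((inputTok / 1e6) * inRate + (outputTok / 1e6) * outRate) * discount;
}
```

A job of 100M input and 20M output tokens on Grok 4.1 Fast ($0.20/$0.50) runs $30.00 at standard rates and $15.00 via batch.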
Prompt Caching
xAI automatically caches prompt tokens for repeated or near-identical requests. Cached prompt tokens are billed at a lower rate than fresh input tokens. Caching is enabled by default and requires no configuration. You can check the "usage" object in the API response to see how many tokens were served from cache on any given request.
For applications with long system prompts, static context blocks, or document prefixes that stay constant across requests, prompt caching can meaningfully reduce costs. The exact cached token discount rate varies by model; check the model's detail page on the xAI console for specifics.
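To monitor how much of your spend caching is actually saving, inspect the usage object on each response. The field names below are an assumption based on the common OpenAI-compatible shape (`usage.prompt_tokens_details.cached_tokens`); verify the authoritative shape in the xAI docs before relying on it.

```javascript
// Summarize cache hits from a chat completion's usage object.
// NOTE: field paths are assumed OpenAI-compatible, not confirmed xAI names.
function cacheStats(usage) {
  const cached = usage?.prompt_tokens_details?.cached_tokens ?? 0;
  const prompt = usage?.prompt_tokens ?? 0;
  return {
    cached,
    fresh: prompt - cached,
    hitRate: prompt > 0 ? cached / prompt : 0,
  };
}
```

Logging `hitRate` per request quickly shows whether your system prompts and static prefixes are stable enough to benefit from caching.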
Model Aliases: Stable vs. Latest
xAI uses a three-tier alias system:
- `<modelname>`: Aliased to the latest stable version. Safe for production.
- `<modelname>-latest`: Aliased to the latest version, including pre-stable releases. Good for staying current but could introduce breaking changes.
- `<modelname>-<date>`: Pinned to a specific release. Use this when you need absolute consistency in behavior across deployments.
If you're running production workloads where unexpected behavior changes are costly, pin to dated model strings. If you're fine with automatic upgrades, the base alias is more convenient.
What Grok Doesn't Know Without Search
Worth calling out explicitly: Grok's knowledge cutoff is November 2024. Without web search or X search tools enabled, the model has no access to anything after that date. If you're building applications that require current events, real-time prices, live sports data, or any other time-sensitive information, you need to explicitly enable the search tools. This isn't unique to Grok, but it's something to design around from the start rather than discover in production.
The Competitive Landscape: How Grok API Pricing Stacks Up
This is where context matters. Grok's pricing is aggressive, but "cheap" only helps you if the model can do the job.
Token Pricing: Head-to-Head
| Provider | Model | Input (per 1M) | Output (per 1M) |
|---|---|---|---|
| xAI | Grok 4.1 Fast | $0.20 | $0.50 |
| xAI | Grok 4.20 | $2.00 | $6.00 |
| Google | Gemini 3.1 Pro | $2.00 | $12.00 |
| OpenAI | GPT-5.2 | $1.75 | $14.00 |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 |
| Anthropic | Claude Opus 4.6 | $5.00 | $25.00 |
Grok 4.1 Fast is in a league of its own on price. For high-volume workloads, a 10x cost reduction compared to GPT or Claude is not a rounding error. That's the difference between a product that's economically viable and one that isn't.
At the flagship level, Grok 4.20's $2.00/$6.00 is cheaper on output than both GPT-5.2 ($14.00) and Claude Sonnet 4.6 ($15.00). The output token price is often where your actual spend concentrates in production (models generate more tokens than you send), so this is the number that matters most in many workloads.
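A blended per-request calculation makes the output-weighting concrete. The 1,500-input/800-output token mix below is an illustrative assumption for a generation-heavy workload, with rates taken from the table above:

```javascript
// Blended per-request cost for a generation-heavy token mix.
// The default mix (1,500 in / 800 out) is illustrative, not a benchmark.
function requestCostUSD(inRate, outRate, inTok = 1500, outTok = 800) {
  return (inTok / 1e6) * inRate + (outTok / 1e6) * outRate;
}
```

At that mix, Grok 4.20 ($2.00/$6.00) comes to about $0.0078 per request versus roughly $0.0138 for GPT-5.2 ($1.75/$14.00) and $0.0165 for Claude Sonnet 4.6 ($3.00/$15.00): the output rate, not the input rate, drives the gap.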
Context Window
Every Grok language model offers a 2,000,000-token context window. That's competitive with Gemini's extended context offerings and larger than what most GPT and Claude tiers provide without moving to enterprise contracts. For document-heavy workloads, this is material.
Where Grok Doesn't Win
Benchmark performance tells a more nuanced story. On coding tasks (SWE-bench), Grok 4 and GPT-5.2 are essentially tied at around 75%. On scientific reasoning (GPQA Diamond), Gemini leads at 94.3%. Claude Opus 4.6 tops the pack on expert task benchmarks (1606 Elo) and is the choice for complex legal, compliance, and analytical workflows where output depth matters more than output price.
Grok's built-in X integration and real-time data access via X Search is genuinely differentiated: no other provider has this. For applications where live social data, trending topics, or X post analysis are core features, Grok is the only reasonable choice.
The Batch API Advantage
xAI's 50% batch discount aligns with Anthropic's but is more aggressively marketed. If you're comparing Grok 4.1 Fast batch ($0.10/$0.25 per million tokens) against any other provider's batch pricing, Grok wins on raw token cost without much competition.
Rate Limits: What's Available by Default
xAI's language models come with 10 million tokens per minute and 1,800 requests per minute as default limits. These are generous defaults for most workloads. If you need more, xAI offers provisioned throughput (dedicated capacity allocation) for teams that need predictable, high-volume throughput.
Voice Agent API is capped at 100 concurrent sessions per team. Text to Speech allows 600 requests per minute. Image generation limits vary: 300 rpm for the standard model, 30 rpm for pro.
There's also a usage guideline violation fee worth being aware of: if your request is flagged as violating xAI's usage policies before generation starts, you're charged $0.05 per request anyway. Design your input validation accordingly.
Node.js Quickstart: Making Your First Grok API Call
xAI's API is OpenAI-compatible, which means you can use the OpenAI SDK pointed at a different base URL. If you're already using the OpenAI client in Node.js, switching to Grok requires changing two lines:
```javascript
const OpenAI = require('openai');

const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: 'https://api.x.ai/v1',
});

async function callGrok(prompt) {
  const response = await client.chat.completions.create({
    model: 'grok-4-1-fast-non-reasoning',
    messages: [
      { role: 'user', content: prompt }
    ],
    max_tokens: 1024,
  });
  return response.choices[0].message.content;
}

callGrok('Explain prompt caching in one paragraph.')
  .then(console.log)
  .catch(console.error);
```
For reasoning models, note that stop, presencePenalty, and frequencyPenalty are not supported. Strip those from any prompt configs you're porting from GPT or Claude.
For batch requests, use the Batch API endpoint. Responses come back via polling or webhook, not streaming.
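If you poll, a generic poll-until-done helper keeps the retry logic in one place. This is a sketch: `checkStatus` and the `status` values are placeholders for whatever the actual xAI Batch API returns; substitute the real calls from docs.x.ai.

```javascript
// Generic polling loop for an async batch job. checkStatus is a caller-
// supplied async function; its return shape here ({ status, error }) is
// a placeholder, not the confirmed xAI response schema.
async function pollBatch(checkStatus, { intervalMs = 30_000, timeoutMs = 24 * 3600 * 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const job = await checkStatus();
    if (job.status === 'completed') return job;
    if (job.status === 'failed') throw new Error(`Batch failed: ${job.error ?? 'unknown'}`);
    // Still running: wait before the next status check.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Batch polling timed out');
}
```

The 24-hour default timeout mirrors the typical completion window mentioned above; a webhook, where available, avoids polling entirely.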
Routing Strategy: Which Model for Which Workload
Given the 10x price spread between Grok 4.1 Fast and Grok 4.20, the right architecture for most production applications isn't "pick one model and use it everywhere." It's a routing layer.
A practical routing strategy:
Route to Grok 4.1 Fast (non-reasoning) for: Classification, entity extraction, summarization, short-form content generation, routing decisions themselves, any task where the input and expected output are well-structured and the failure mode is recoverable.
Route to Grok 4.1 Fast (reasoning) for: Multi-step logic problems where chain-of-thought helps but you don't need frontier capability. Code review with defined criteria. Structured analysis tasks.
Route to Grok 4.20 for: Complex code generation, tasks requiring the multi-agent architecture, workloads where output quality directly impacts user experience or business outcomes, anything that regularly fails on the faster models.
Use batch API for: All offline processing. Document ingestion, pre-computation, dataset labeling, analytics runs.
This routing pattern is not unique to Grok, but Grok's pricing makes the economics of getting it right particularly compelling. Getting 80% of your traffic onto Grok 4.1 Fast at $0.20/$0.50 instead of Grok 4.20 at $2.00/$6.00 is a 10x cost reduction on that traffic slice.
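The strategy above reduces to a lookup plus an escalation rule. A minimal sketch; the task categories are illustrative labels from this article, not an xAI feature, and the model strings follow the naming used earlier:

```javascript
// Illustrative task-to-model routing table following the strategy above.
const ROUTES = {
  classification: 'grok-4-1-fast-non-reasoning',
  extraction: 'grok-4-1-fast-non-reasoning',
  summarization: 'grok-4-1-fast-non-reasoning',
  code_review: 'grok-4-1-fast-reasoning',
  structured_analysis: 'grok-4-1-fast-reasoning',
  code_generation: 'grok-4.20-0309-reasoning',
};

function pickModel(taskType, { failedOnFast = false } = {}) {
  // Escalate to the flagship when the cheaper tier has already failed.
  if (failedOnFast) return 'grok-4.20-0309-reasoning';
  return ROUTES[taskType] ?? 'grok-4-1-fast-reasoning';
}
```

In production you would add per-task quality checks so the `failedOnFast` signal is driven by validation failures rather than manual tagging.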
Is the Grok API Worth It?
Here's the honest take.
If you're evaluating Grok purely on token cost, it's hard to beat for high-volume workloads. Grok 4.1 Fast at $0.20/$0.50 per million tokens is the cheapest capable API available from a major western provider. For businesses running millions of LLM calls per month, this is a legitimate cost-reduction opportunity.
If you need real-time X data or social media intelligence as part of your application, Grok is the only provider with native X Search integration. That's not a price argument; it's a capability argument.
If you need the absolute best performance on complex reasoning, coding, or scientific tasks, the benchmark data suggests Grok 4.20 is competitive with GPT-5.2 and Claude Sonnet 4.6 but not definitively better. Claude Opus 4.6 still leads on expert task benchmarks; Gemini leads on scientific reasoning. The "best model" question depends entirely on your specific task distribution.
What Grok is not: a replacement for every use case in your stack. It's a competitive option that makes particular sense for cost-sensitive production workloads, X-integrated applications, and teams that want a large context window without paying a premium for it.
Set up an API key, run your actual workloads through Grok 4.1 Fast, and benchmark against what you're paying today. The pricing is compelling enough that it's worth the hour to test.
xAI's pricing and model availability change frequently. Verify current rates at docs.x.ai/developers/models before making architectural decisions based on this article.