LLM Cost Calculator

Compare the monthly cost of cloud APIs vs self-hosting on your own GPU. Enter your token usage and find your break-even point.

Covers OpenAI, Anthropic, Google, Groq, Together AI, and DeepSeek APIs, plus RTX 4090, A100, and H100 local setups. Pricing verified 2026-06-06.

Plan LLM costs before they become infrastructure surprises

LLM Cost Calculator helps developers, founders, and operators compare hosted API usage with local GPU inference. The calculator combines monthly input tokens, output tokens, model pricing, GPU purchase or rental cost, electricity, and daily utilization into a single planning estimate. It is built for AI chatbots, RAG products, coding tools, internal automations, batch summarization jobs, and any product where token usage can become a meaningful operating cost.

Cloud API rates come from public provider pricing pages. GPU estimates use approximate market prices, rental rates, amortization periods, and power assumptions. The result is not a quote from any provider; it is a practical comparison to help you decide whether cloud APIs, rented GPUs, purchased hardware, or a hybrid deployment is worth evaluating next.


How to use this LLM cost calculator

1. Estimate usage

Start with expected monthly input and output tokens. Include hidden prompts, retrieval context, chat history, retries, and background jobs instead of only visible user messages.

2. Pick a model

Select a cloud model that matches the quality level your product needs. Smaller models can be cheaper, but failed retries and poor answers can raise the effective cost per successful task.

3. Compare GPU options

Choose whether you would buy or rent a GPU, then adjust amortization, power, and daily usage. A local setup only wins when utilization is high enough to cover fixed costs.
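As a rough illustration of step 3, the sketch below finds the daily usage at which buying a GPU becomes cheaper than renting one. It uses the page's RTX 4090 defaults ($1,800 purchase, $0.74/hr rental, 450W, $0.12/kWh, 24-month amortization). The 30-day month, the function names, and the assumption that the rental rate already includes power are illustrative choices, not the calculator's actual implementation.

```python
DAYS_PER_MONTH = 30  # assumption: fixed 30-day month


def buy_cost(hours_per_day, price_usd=1800.0, amortize_months=24,
             watts=450.0, usd_per_kwh=0.12):
    """Monthly cost of a purchased GPU: amortization plus electricity."""
    amortization = price_usd / amortize_months
    electricity = watts / 1000 * hours_per_day * DAYS_PER_MONTH * usd_per_kwh
    return amortization + electricity


def rent_cost(hours_per_day, usd_per_hour=0.74):
    """Monthly cost of a rented GPU (assumes the hourly rate includes power)."""
    return usd_per_hour * hours_per_day * DAYS_PER_MONTH


def breakeven_hours_per_day(price_usd=1800.0, amortize_months=24,
                            watts=450.0, usd_per_kwh=0.12, usd_per_hour=0.74):
    """Daily hours above which buying is cheaper than renting, or None."""
    fixed = price_usd / amortize_months
    power_per_hour = watts / 1000 * usd_per_kwh
    saving_per_hour = usd_per_hour - power_per_hour
    if saving_per_hour <= 0:
        return None  # renting is never more expensive per hour of use
    return fixed / (saving_per_hour * DAYS_PER_MONTH)


hours = breakeven_hours_per_day()
print(f"Buying beats renting above about {hours:.1f} h/day")
print(f"At 12 h/day: buy ${buy_cost(12):.2f}, rent ${rent_cost(12):.2f}")
```

Below the break-even usage, the fixed amortization dominates and renting is the cheaper option; above it, the purchased card's lower hourly cost wins.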

Token Usage

Monthly input tokens: 10M (range: 0.1M – 1,000M tokens)

Monthly output tokens: 3M (range: 0.1M – 300M tokens)

Cloud API

Model

Released 2026-06 · current

Local Setup

GPU

Buy $1,800 · Rent $0.74/hr

Amortize over: 24 months (range: 12 – 48 months)

GPU usage: 12h/day

Electricity: $0.12/kWh (US avg: ~$0.12/kWh)

Monthly Cost

Cloud API recommended

Cloud API

$8.80

per month

Self-hosted

$94.44

per month

Cloud API saves $85.64/month

Cost Breakdown

Input tokens (10M @ $0.4/1M): $4.00

Output tokens (3M @ $1.6/1M): $4.80

Cloud total: $8.80

GPU amortization (NVIDIA RTX 4090 ÷ 24mo): $75.00

Electricity (450W × 12h/day): $19.44

Local total: $94.44
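The breakdown above can be reproduced with a few lines of arithmetic. This is a minimal sketch rather than the calculator's own code: the function names are hypothetical, and the 30-day month is an assumption inferred from the $19.44 electricity figure (0.45 kW × 12 h × 30 days × $0.12/kWh).

```python
DAYS_PER_MONTH = 30  # assumption: inferred from the $19.44 electricity line


def cloud_breakdown(input_m, output_m, in_price, out_price):
    """Return (input_cost, output_cost, total) in USD; tokens in millions."""
    input_cost = input_m * in_price
    output_cost = output_m * out_price
    return input_cost, output_cost, input_cost + output_cost


def local_breakdown(gpu_price, amortize_months, watts, hours_per_day, usd_per_kwh):
    """Return (amortization, electricity, total) in USD per month."""
    amortization = gpu_price / amortize_months
    kwh = watts / 1000 * hours_per_day * DAYS_PER_MONTH
    electricity = kwh * usd_per_kwh
    return amortization, electricity, amortization + electricity


cloud_in, cloud_out, cloud_total = cloud_breakdown(10, 3, 0.40, 1.60)
amort, power, local_total = local_breakdown(1800, 24, 450, 12, 0.12)

print(f"Cloud: ${cloud_in:.2f} + ${cloud_out:.2f} = ${cloud_total:.2f}")
print(f"Local: ${amort:.2f} + ${power:.2f} = ${local_total:.2f}")
print(f"Cloud API saves ${local_total - cloud_total:.2f}/month")
```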

Rent NVIDIA RTX 4090

via RunPod — from $0.74/hr


LLM cost guides


2026-06-06 - 6 min read

Anthropic Claude API Pricing: Complete Cost Guide 2026

Exact pricing for every Claude model — Haiku, Sonnet, and Opus — with monthly cost estimates, a comparison to GPT-4 and Gemini, and guidance on which tier to choose.

2026-06-03 - 7 min read

How to Calculate LLM API Costs: A Practical Guide

Learn how input tokens, output tokens, context windows, caching, and traffic patterns affect monthly LLM API bills.

2026-06-03 - 8 min read

Self-Hosted GPU vs Cloud API: Cost, Reliability, and Operations

Compare buying or renting GPUs with using hosted LLM APIs across cost, uptime, maintenance, and scaling risk.

2026-06-03 - 7 min read

Claude vs GPT vs Gemini Pricing: What to Compare Before You Choose

A buyer's guide to comparing major LLM families beyond headline token prices.

Model Cost Comparisons

Claude Opus 4.8 vs GPT-5.5

Anthropic $5/1M input - OpenAI $5/1M input

GPT-5.5 vs Gemini 3.5 Flash

OpenAI $5/1M input - Google $1.5/1M input

Claude Sonnet 4.6 vs Gemini 3.5 Flash

Anthropic $3/1M input - Google $1.5/1M input

GPT-5.5 vs DeepSeek V4 Pro

OpenAI $5/1M input - DeepSeek $0.44/1M input

GLM-5.1 vs Claude Sonnet 4.6

Zhipu AI (GLM) $1.4/1M input - Anthropic $3/1M input

GLM-5.1 vs GPT-5.5

Zhipu AI (GLM) $1.4/1M input - OpenAI $5/1M input

GPT-5.4 Mini vs Gemini 3.1 Flash-Lite

OpenAI $0.75/1M input - Google $0.25/1M input

Llama 4 Scout (Groq) vs GPT-5.4 Mini

Groq $0.11/1M input - OpenAI $0.75/1M input


Cloud LLM API Pricing

OpenAI

The most widely-used LLM API. GPT-5.5 and GPT-5.4 are the current frontier tiers...

Anthropic

Claude 4 models excel at agentic coding, reasoning, and long documents. Claude O...

Google Gemini

Gemini 3.5 Flash is the latest frontier model. The 2.5 series offers excellent p...

Groq

Ultra-fast inference on dedicated LPU hardware — up to 1,000 tokens/sec. Best fo...

Together AI

Best prices for the latest open-source and frontier models. Supports Kimi K2.5, ...

DeepSeek

Chinese AI lab delivering frontier open-source models at industry-leading prices...

Qwen (Alibaba)

Alibaba's Qwen series via DashScope API. Qwen3.7 Max is the latest flagship; Qwe...

Doubao (ByteDance)

ByteDance's Doubao model family, served via Volcano Engine API. Seed 2.0 Pro is ...

Kimi (Moonshot AI)

Moonshot AI's Kimi series, known for exceptional long-context understanding and ...

Zhipu AI (GLM)

Tsinghua-affiliated AI lab. GLM-5.1 is the latest flagship; GLM-4.7 Flash delive...

Hunyuan (Tencent)

Tencent Cloud's Hunyuan models. TurboS is the high-speed inference variant; T1 a...

Frequently Asked Questions

How much does Claude API cost per million tokens in 2026?

Anthropic offers three Claude tiers: Claude Haiku 4.5 is the most affordable at $1.00/1M input and $5.00/1M output tokens. Claude Sonnet 4.6 sits in the mid-tier at $3.00/1M input and $15.00/1M output. Claude Opus 4.8 is the flagship at $5.00/1M input and $25.00/1M output. Use the calculator above to translate those rates into your expected monthly bill.

What is the cheapest LLM API in 2026?

The cheapest capable LLM APIs in 2026 come from both Chinese and Western providers. Doubao Seed 2.0 Mini (ByteDance) is the absolute lowest at $0.03/1M input tokens. OpenAI's open-source gpt-oss-120b costs $0.04/1M input. Several models sit at $0.05/1M input: Qwen3.5-Flash (Alibaba), GPT-5 Nano (OpenAI), and Llama 3.1 8B on Groq. Among mid-tier Western options, GPT-5.4 Nano ($0.20/1M input) and Gemini 2.5 Flash-Lite ($0.10/1M input) offer strong value for structured tasks.

How does Claude pricing compare to GPT-5 in 2026?

At the mid-tier, Claude Sonnet 4.6 ($3.00/$15.00 per 1M input/output) closely matches GPT-5.4 ($2.50/$15.00): the two are identical on output, and GPT-5.4 is slightly cheaper on input. At the budget tier, Claude Haiku 4.5 ($1.00/$5.00) is a bit more expensive than GPT-5.4 Mini ($0.75/$4.50). At the flagship tier, Claude Opus 4.8 ($5.00/$25.00) and GPT-5.5 ($5.00/$30.00) are priced identically on input, so Claude is cheaper on output-heavy tasks. Claude tends to excel at long-context and agentic workflows; OpenAI's reasoning models (o3, o4-mini) lead on math and code benchmarks. Use the compare pages for exact monthly estimates.
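To see how these per-token rates translate into a bill, the sketch below prices one workload against each model named in this answer. The rates are copied from the text above; the 10M input / 3M output workload matches the calculator's default sliders, and the `PRICES` table and `monthly_cost` helper are hypothetical names used only for illustration.

```python
# (input USD per 1M tokens, output USD per 1M tokens), as quoted in this FAQ
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.4 Mini": (0.75, 4.50),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def monthly_cost(model, input_m, output_m):
    """Monthly USD cost for token volumes given in millions."""
    in_price, out_price = PRICES[model]
    return input_m * in_price + output_m * out_price


# Example workload: 10M input and 3M output tokens per month
for name in sorted(PRICES, key=lambda m: monthly_cost(m, 10, 3)):
    print(f"{name:<18} ${monthly_cost(name, 10, 3):>7.2f}/month")
```

Changing the input-to-output ratio shifts the ranking only where output rates differ, which is why the flagship comparison favors Claude more as the workload becomes output-heavy.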

When is self-hosting an LLM cheaper than cloud APIs?

Self-hosting becomes cost-effective when your monthly cloud API spend exceeds the amortized GPU cost plus electricity. For most indie projects using under 100M tokens/month, cloud APIs are cheaper. At 500M+ tokens/month with a capable open-source model, a single RTX 4090 or A100 typically pays for itself within 6-12 months.
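The break-even rule can be turned into a token volume. The sketch below assumes the calculator's default scenario ($94.44/month local cost, a cloud model at $0.40 input and $1.60 output per 1M tokens, and a 10:3 input-to-output mix). It ignores GPU throughput limits and any quality gap between the hosted and self-hosted models, so treat the result as a lower bound for further evaluation rather than a decision. The function names are hypothetical.

```python
def blended_price_per_m(in_price, out_price, output_share):
    """Average USD per 1M tokens for a given share of output tokens (0..1)."""
    return in_price * (1 - output_share) + out_price * output_share


def breakeven_tokens_m(local_monthly_usd, in_price, out_price, output_share):
    """Total monthly tokens (in millions) at which cloud spend equals local cost."""
    return local_monthly_usd / blended_price_per_m(in_price, out_price, output_share)


# Defaults from the calculator: 10M input + 3M output => output share of 3/13
share = 3 / 13
volume = breakeven_tokens_m(94.44, 0.40, 1.60, share)
print(f"Blended price: ${blended_price_per_m(0.40, 1.60, share):.4f} per 1M tokens")
print(f"Break-even volume: about {volume:.1f}M tokens/month")
```

With a more expensive cloud model the blended price rises and the break-even volume falls proportionally, which is why the answer depends heavily on which hosted model you would otherwise use.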

How accurate are these cost estimates?

Cloud API pricing is sourced from official provider pricing pages and updated periodically. GPU prices reflect current market rates and may vary. Rental prices are based on average rates from Vast.ai, RunPod, and Lambda Labs. All estimates are for informational purposes only. Always verify with provider pricing before making purchasing decisions.

What does "monthly tokens" mean?

Monthly tokens are the total input and output tokens your application sends to and receives from a model in a month. LLM APIs charge per token. As a rough estimate for English text, one token is about 0.75 words. For context, a simple chatbot with 1,000 active users making 10 messages per day might use 10-30M tokens per month depending on prompt size and response length.
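A back-of-the-envelope version of that estimate, as a sketch: it applies the 0.75 words-per-token rule of thumb from this answer and a 30-day month. The 50 words per exchange is a hypothetical figure chosen only to land inside the 10-30M range quoted above; real exchanges with system prompts, retrieval context, or chat history will be considerably larger.

```python
WORDS_PER_TOKEN = 0.75  # rough English estimate quoted in this FAQ
DAYS_PER_MONTH = 30     # assumption: fixed 30-day month


def words_to_tokens(words):
    """Approximate token count for a given number of English words."""
    return words / WORDS_PER_TOKEN


def monthly_tokens(users, messages_per_day, words_per_exchange):
    """Approximate total tokens per month across all users."""
    exchanges = users * messages_per_day * DAYS_PER_MONTH
    return exchanges * words_to_tokens(words_per_exchange)


# 1,000 users, 10 messages/day, ~50 words per prompt-plus-reply (hypothetical)
total = monthly_tokens(1_000, 10, 50)
print(f"About {total / 1e6:.0f}M tokens per month")
```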

Why are output tokens more expensive than input tokens?

Output tokens are generated one at a time, and each new token requires another forward pass through the model, while input tokens can be processed together in a single parallel pass. As a result, output tokens often cost several times more than input tokens across major providers.

Can I run a 70B model on a single GPU?

A 70B model at full precision usually needs far more memory than a single consumer GPU provides. Quantized versions may fit on high-memory setups, but a single RTX 4090 is more comfortable for smaller 7B-13B class models and selected quantized workloads.
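The memory arithmetic behind this answer can be sketched as follows. It counts model weights only (parameters × bits ÷ 8) and adds a hypothetical 20% headroom for KV cache and runtime overhead; actual requirements depend on context length, batch size, and the inference runtime. The 24 GB figure is the RTX 4090's VRAM.

```python
def weights_gb(params_billion, bits_per_param):
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion, bits_per_param, vram_gb, overhead=0.20):
    """True if weights plus a rough overhead margin fit in the given VRAM.

    The 20% default overhead is an illustrative assumption, not a measured value.
    """
    return weights_gb(params_billion, bits_per_param) * (1 + overhead) <= vram_gb


RTX_4090_VRAM_GB = 24

for params, bits in [(70, 16), (70, 4), (13, 8), (7, 16)]:
    size = weights_gb(params, bits)
    verdict = "fits" if fits(params, bits, RTX_4090_VRAM_GB) else "does not fit"
    print(f"{params}B @ {bits}-bit: {size:.0f} GB of weights, {verdict} in 24 GB")
```

Under these assumptions a 70B model exceeds a single 24 GB card even at 4-bit quantization, while 7B at 16-bit and 13B at 8-bit leave room to spare.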

Pricing data is for estimation purposes only. Always verify costs directly with cloud providers before making decisions. Last verified: 2026-06-06.