LLM Cost Calculator


Compare the monthly cost of cloud APIs vs self-hosting on your own GPU. Enter your token usage and find your break-even point.

Covers OpenAI, Anthropic, Google, Groq, Together AI, DeepSeek — and RTX 4090, A100, H100 local setups. Pricing verified 2026-05-20.

Monthly Cost

Recommendation: Cloud API
Cloud API: $44.00 per month
Self-hosted: $94.44 per month
Cloud API saves $50.44/month

Cost Breakdown

Input tokens (10M @ $2/1M): $20.00
Output tokens (3M @ $8/1M): $24.00
Cloud total: $44.00
GPU amortization (NVIDIA RTX 4090 ÷ 24mo): $75.00
Electricity (450W × 12h/day): $19.44
Local total: $94.44
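
The arithmetic behind this breakdown can be reproduced with a short Python sketch. The page does not state the GPU purchase price, the electricity rate, or the month length; the values below ($1,800, $0.12/kWh, 30 days) are assumptions back-calculated from the $75.00 and $19.44 figures, and the function names are illustrative only.

```python
def cloud_monthly_cost(input_tokens_m, output_tokens_m, input_price, output_price):
    """Token counts are in millions; prices are USD per 1M tokens."""
    return input_tokens_m * input_price + output_tokens_m * output_price


def local_monthly_cost(gpu_price, amortization_months, watts, hours_per_day,
                       price_per_kwh, days_per_month=30):
    """Straight-line GPU amortization plus electricity for one month."""
    amortization = gpu_price / amortization_months
    kwh = watts / 1000 * hours_per_day * days_per_month
    return amortization + kwh * price_per_kwh


# Inputs from the breakdown above.
cloud = cloud_monthly_cost(10, 3, 2.00, 8.00)

# ASSUMPTIONS (not stated on the page): $1,800 GPU, $0.12/kWh, 30-day month.
local = local_monthly_cost(1800, 24, 450, 12, 0.12)

print(f"Cloud: ${cloud:.2f}  Local: ${local:.2f}  Difference: ${local - cloud:.2f}")
```

Changing the electricity rate or the amortization period shifts the local total directly, so it is worth substituting your own values before drawing conclusions.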
Rent NVIDIA RTX 4090
via RunPod — from $0.74/hr

Frequently Asked Questions

When is self-hosting an LLM cheaper than cloud APIs?

Self-hosting becomes cost-effective when your monthly cloud API spend exceeds the amortized GPU cost plus electricity. For most indie projects using under 100M tokens/month, cloud APIs are cheaper. At 500M+ tokens/month with a capable open-source model, a single RTX 4090 or A100 typically pays for itself within 6-12 months.
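
As a rough illustration of the payback reasoning, the sketch below divides the GPU purchase price by the monthly saving (cloud spend avoided minus electricity). It assumes the self-hosted GPU can absorb the whole workload and ignores maintenance time, cooling, and resale value. The function name and the example figures ($1,800 GPU, $200/month cloud spend, $19.44/month electricity) are hypothetical.

```python
def payback_months(gpu_price, monthly_cloud_spend, monthly_electricity):
    """Months until avoided cloud spend covers the GPU purchase.

    Returns None when electricity alone costs at least as much as the
    cloud bill, in which case the hardware never pays for itself.
    """
    monthly_savings = monthly_cloud_spend - monthly_electricity
    if monthly_savings <= 0:
        return None
    return gpu_price / monthly_savings


# Hypothetical example: $1,800 GPU replacing a $200/month cloud bill.
months = payback_months(1800, 200.0, 19.44)
print(f"Payback in about {months:.1f} months")
```

With these example inputs the payback lands at roughly 10 months. At a $44/month cloud bill the same hardware would take several years to recoup.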

How accurate are these cost estimates?

Cloud API pricing is sourced from official provider pricing pages and updated quarterly. GPU prices reflect current market rates and may vary. Rental prices are based on average rates from vast.ai, RunPod, and Lambda Labs. All estimates are for informational purposes only — always verify with provider pricing before making purchasing decisions.

What does 'monthly tokens' mean?

Monthly tokens is the total number of input and output tokens you process in a month. LLM APIs charge per token, and one token is roughly 0.75 words of English text, so 1M tokens ≈ 750,000 words, or about 1,500 pages. For context: a simple chatbot with 1,000 active users each sending 10 messages/day might use 10-30M tokens per month.
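
A minimal sketch of this estimate in Python, using the 0.75 words-per-token rule of thumb from the answer above. The `tokens_per_message` parameter is a hypothetical input covering prompt plus response, and the value of 50 is an assumption chosen so the result falls inside the 10-30M range quoted above.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text


def words_to_tokens(words):
    """Convert a word count to an approximate token count."""
    return words / WORDS_PER_TOKEN


def monthly_tokens(users, messages_per_day, tokens_per_message, days=30):
    """Rough monthly token volume for a chat workload."""
    return users * messages_per_day * tokens_per_message * days


# ASSUMPTION: about 50 tokens per message (prompt plus response).
estimate = monthly_tokens(users=1000, messages_per_day=10, tokens_per_message=50)
print(f"{estimate / 1e6:.0f}M tokens per month")
```

Real chat workloads often resend conversation history with each request, which can push the per-message figure well above this assumption.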

Why are output tokens more expensive than input tokens?

Output tokens are generated one at a time, and each one requires its own forward pass through the model, which is computationally intensive. Input tokens, by contrast, can all be processed in parallel in a single pass. As a result, output tokens typically cost 3-10x more per token than input tokens across most providers.

Can I run a 70B model on a single GPU?

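Only with quantization and a large-memory GPU. Running Llama 3 70B at 16-bit precision (FP16/BF16) requires ~140GB of VRAM for the weights alone. With 4-bit quantization (for example, in GGUF format), it fits in ~40GB of VRAM. That requires a single A100 80GB, or two RTX 3090s or RTX 4090s (24GB each). A single RTX 4090 (24GB) can run 7B-13B models comfortably.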
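
The VRAM figures follow from parameter count times bits per weight. The sketch below computes weight memory only; the ~40GB quoted for 4-bit is higher than the raw 35GB because it leaves room for the KV cache and runtime buffers. The helper names are illustrative, and the `fits` check is a simplification that ignores the overhead of splitting a model across GPUs.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(required_gb, gpu_vram_gb, gpu_count=1):
    """Naive check: does the requirement fit in the combined VRAM?"""
    return required_gb <= gpu_vram_gb * gpu_count


fp16_gb = weight_memory_gb(70, 16)  # 16-bit weights
q4_gb = weight_memory_gb(70, 4)     # 4-bit quantized weights

print(f"70B @ 16-bit: {fp16_gb:.0f} GB, @ 4-bit: {q4_gb:.0f} GB (weights only)")
print("One RTX 4090 (24GB):", fits(40, 24))
print("Two RTX 4090s (48GB):", fits(40, 24, gpu_count=2))
print("One A100 80GB:", fits(40, 80))
```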

Pricing data is for estimation purposes only. Always verify costs directly with cloud providers before making decisions. Last verified: 2026-05-20.