Compute Units Converter

Convert FLOPS, TFLOPS, and PFLOPS and line them up with GPU specs—pairs well with the VRAM estimator.

Compute Units Converter is a free browser-based Koverts calculator. Use it to convert FLOPS, TFLOPS, and PFLOPS and line them up with GPU specs; it pairs well with the VRAM estimator.

Citation: Koverts, Compute Units Converter, https://koverts.com/ai/compute-units/

All Units (example input: 1 TFLOPS)

Unit	Value
FLOPS	1 × 10¹²
KFLOPS	1 × 10⁹
MFLOPS	1 × 10⁶
GFLOPS	1,000
TFLOPS	1
PFLOPS	0.001
EFLOPS	0.000001
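The unit table above is a ladder of decimal SI prefixes, each step a factor of 1,000. A minimal Python sketch of the same conversion, assuming plain decimal prefixes (the names `PREFIX_EXPONENT` and `convert` are illustrative, not part of the Koverts tool):

```python
# Decimal exponent of each unit relative to plain FLOPS (steps of 1,000).
PREFIX_EXPONENT = {
    "FLOPS": 0,
    "KFLOPS": 3,
    "MFLOPS": 6,
    "GFLOPS": 9,
    "TFLOPS": 12,
    "PFLOPS": 15,
    "EFLOPS": 18,
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a throughput value from one FLOPS unit to another."""
    exponent = PREFIX_EXPONENT[from_unit] - PREFIX_EXPONENT[to_unit]
    return value * 10.0 ** exponent


# Rebuild the "All Units" table for an input of 1 TFLOPS.
for unit in PREFIX_EXPONENT:
    print(f"{unit:<7} {convert(1.0, 'TFLOPS', unit):g}")
```

FLOPS prefixes are decimal; the binary 1,024 steps sometimes used for memory sizes do not apply to throughput.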

GPU Reference (FP16)

GPU	FP16 TFLOPS	Your value ÷ GPU (1 TFLOPS input)
NVIDIA RTX 4090	82.6	0.0×
NVIDIA A100 (80GB)	77.97	0.0×
NVIDIA H100	204	0.0×
NVIDIA RTX 3090	35.6	0.0×
Apple M3 Max	14.2	0.1×
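The comparison column in the GPU table appears to divide your value by each GPU's FP16 TFLOPS: an input of 1 TFLOPS gives 1 ÷ 14.2 ≈ 0.07, shown as 0.1× for the M3 Max. A sketch of that calculation, assuming this interpretation (the dictionary and function names are hypothetical):

```python
# FP16 TFLOPS figures copied from the GPU Reference table.
GPU_FP16_TFLOPS = {
    "NVIDIA RTX 4090": 82.6,
    "NVIDIA A100 (80GB)": 77.97,
    "NVIDIA H100": 204.0,
    "NVIDIA RTX 3090": 35.6,
    "Apple M3 Max": 14.2,
}


def ratio_vs_gpu(your_tflops: float, gpu_tflops: float) -> float:
    """Express your value as a multiple of one GPU's throughput."""
    return your_tflops / gpu_tflops


your_value_tflops = 1.0
for name, tflops in GPU_FP16_TFLOPS.items():
    print(f"{name:<20} {ratio_vs_gpu(your_value_tflops, tflops):.1f}x")

# The same division compares two GPUs directly, e.g. H100 vs A100.
h100_vs_a100 = ratio_vs_gpu(
    GPU_FP16_TFLOPS["NVIDIA H100"], GPU_FP16_TFLOPS["NVIDIA A100 (80GB)"]
)
print(f"H100 / A100: {h100_vs_a100:.1f}x")
```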

Practical guide

Convert AI Compute Units: FLOPS, TFLOPS, PFLOPS

AI compute is measured in FLOPS (floating-point operations per second). A modern GPU like the NVIDIA H100 delivers 204 TFLOPS of FP16 performance. Training GPT-3 required approximately 3.14 × 10²³ FLOPs (a total count of operations, not a rate). These numbers are hard to grasp, so this converter puts them in context by comparing them to real hardware and historical AI milestones.

Hardware Comparison

Compare the compute of an RTX 4090 vs A100 vs H100 in a common unit to evaluate price-performance.

Training Cost Estimation

Convert published training compute (e.g. the 2×10²⁴ to 10²⁵ FLOPs estimated for GPT-4) to GPU-hours on specific hardware.
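Converting a training budget to GPU-hours is a single division by the GPU's sustained throughput. A minimal sketch, assuming a hypothetical `utilization` factor for the fraction of peak a real run achieves (the 40% used below is an illustrative guess, not a measured value):

```python
SECONDS_PER_HOUR = 3600


def gpu_hours(total_flops: float, gpu_tflops: float, utilization: float = 1.0) -> float:
    """GPU-hours needed to execute `total_flops` on GPUs rated at `gpu_tflops`.

    `utilization` is the assumed fraction of peak throughput actually sustained;
    1.0 means peak, which real training runs do not reach.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    sustained_flops_per_second = gpu_tflops * 1e12 * utilization
    return total_flops / sustained_flops_per_second / SECONDS_PER_HOUR


# GPT-3's estimated 3.14e23 FLOPs on H100s at the table's 204 TFLOPS FP16 figure.
peak_hours = gpu_hours(3.14e23, 204.0)
assumed_hours = gpu_hours(3.14e23, 204.0, utilization=0.4)  # hypothetical 40%
print(f"{peak_hours:,.0f} GPU-hours at peak, {assumed_hours:,.0f} at 40% utilization")
```

Dividing the result by 24 × 365.25 gives GPU-years; dividing it by the number of GPUs in a cluster gives approximate wall-clock hours.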

Research Papers

Understand compute requirements cited in ML papers and reproduce or compare them to your own hardware.

AI Infrastructure Planning

Plan data center GPU clusters by calculating total PFLOPS needed for your training and inference workloads.
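For cluster planning, the core arithmetic is the target throughput divided by what one GPU sustains, rounded up. A back-of-envelope sketch; `gpus_needed` is a hypothetical helper, and a single `utilization` factor stands in for all real-world losses (communication, memory stalls, scaling inefficiency):

```python
import math


def gpus_needed(target_pflops: float, gpu_tflops: float, utilization: float = 1.0) -> int:
    """Smallest GPU count whose combined sustained throughput reaches the target."""
    per_gpu_pflops = gpu_tflops * utilization / 1000  # 1 PFLOPS = 1,000 TFLOPS
    return math.ceil(target_pflops / per_gpu_pflops)


# 10 PFLOPS of sustained FP16 using the table's H100 figure.
print(gpus_needed(10, 204))                   # prints 50 (at peak)
print(gpus_needed(10, 204, utilization=0.5))  # prints 99 (hypothetical 50%)
```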

Quick fact: Training GPT-3 (175B parameters) required an estimated 3.14 × 10²³ FLOPs — equivalent to running an RTX 4090 at full speed for approximately 120 years.
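The quick fact can be checked with one division, using the 82.6 TFLOPS FP16 figure from the GPU Reference table and a 365.25-day year:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

gpt3_training_flops = 3.14e23
rtx4090_flops_per_second = 82.6e12  # FP16 figure from the GPU Reference table

years = gpt3_training_flops / rtx4090_flops_per_second / SECONDS_PER_YEAR
print(f"{years:.0f} years")  # prints "120 years"
```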

FAQ

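What is a FLOP in AI?

A FLOP is a single floating-point operation, such as one multiplication or addition. Note the capitalization: FLOPs (lowercase s) counts operations, while FLOPS means operations per second. AI hardware delivers trillions (TFLOPS) to quadrillions (PFLOPS) of operations per second, and training a large model takes on the order of 10²³ FLOPs or more.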

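What's the difference between FP32, FP16, and INT8 FLOPS?

Lower precision (FP16, INT8) allows more operations per second on the same hardware. An A100 delivers 19.5 TFLOPS in FP32 but 77.97 TFLOPS in FP16, 4× more, because its execution units can process more 16-bit operations per clock cycle. Each 16-bit value also takes half the memory and bandwidth of a 32-bit one. INT8 is integer math, so its throughput is quoted in TOPS (trillions of operations per second) rather than FLOPS.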

How much compute did it take to train GPT-4?

OpenAI hasn't disclosed exact numbers. Estimates range from 2×10²⁴ to 10²⁵ FLOPs, based on model size and training token count. At the H100's 204 TFLOPS FP16 peak, that's roughly 310–1,550 GPU-years; real training runs sustain only a fraction of peak throughput, so the actual GPU time is higher.

What is a GPU-hour?

A GPU-hour is one GPU running for one hour. It's a common billing unit for cloud GPU services. The compute it represents depends on the GPU: at peak FP16, 1 A100 GPU-hour ≈ 77.97 TFLOPS × 3,600 seconds = 280,692 TFLOPs, or about 2.81 × 10¹⁷ FLOPs.

How does AI compute growth compare to Moore's Law?

The compute used to train the largest AI models has grown roughly 2× every 6 months since 2012, much faster than Moore's Law (2× every 18–24 months). Specialized AI chips and architectural improvements account for part of this; the rest comes from training on ever-larger GPU clusters for longer.
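To see how far apart the two doubling rates drift, compound each over a decade. A short sketch taking the doubling times quoted above at face value (24 months for Moore's Law, the slow end of its range):

```python
def growth_factor(years: float, doubling_time_months: float) -> float:
    """Total growth multiple after `years` at a constant doubling time."""
    doublings = years * 12 / doubling_time_months
    return 2.0 ** doublings


decade = 10
print(f"AI training compute, 2x every 6 months: {growth_factor(decade, 6):,.0f}x")
print(f"Moore's Law, 2x every 24 months:        {growth_factor(decade, 24):,.0f}x")
```

Over ten years that is 2²⁰ (about a million-fold) against 2⁵ = 32×.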

What is a TFLOP in AI?

A TFLOP (teraFLOP) is one trillion (10¹²) floating-point operations; TFLOPS is one trillion such operations per second. TFLOPS is the most common unit for quoting AI hardware performance. For example, the NVIDIA H100 GPU delivers 204 TFLOPS in FP16, while the RTX 4090 delivers 82.6 TFLOPS.

How much compute is needed to train an LLM?

Training compute scales roughly with the product of model size (parameters) and training data (tokens). GPT-3 (175B parameters) required an estimated 3.14 × 10²³ FLOPs. Larger frontier models like GPT-4 are estimated to require 10²⁴–10²⁵ FLOPs. At the H100's 204 TFLOPS FP16 peak, that's roughly 155–1,550 GPU-years, and more in practice because real runs do not sustain peak throughput.
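A widely used rule of thumb from the scaling-law literature, not stated on this page, puts training compute at about 6 FLOPs per parameter per training token (C ≈ 6ND). A sketch of that estimate; the 300 billion training tokens for GPT-3 come from the GPT-3 paper rather than from this page:

```python
def training_flops(parameters: float, tokens: float) -> float:
    """Approximate training compute with the C ~= 6 * N * D rule of thumb."""
    return 6 * parameters * tokens


gpt3 = training_flops(175e9, 300e9)  # 175B parameters, 300B training tokens
print(f"{gpt3:.2e} FLOPs")
```

This gives about 3.15 × 10²³ FLOPs, in line with the 3.14 × 10²³ estimate above.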

What is the difference between FP16 and FP32 performance?

FP16 (16-bit floating point) often lets GPUs perform 2–4× more operations per second than FP32 (32-bit), depending on the chip, because the hardware can execute more 16-bit operations per clock cycle. Each 16-bit value also uses half the memory and bandwidth, which helps memory-bound workloads. AI training and inference have largely shifted to FP16 and BF16 to exploit this performance advantage.

How does H100 compare to A100?

The NVIDIA H100 SXM delivers approximately 204 TFLOPS in FP16, versus 77.97 TFLOPS for the A100, about 2.6× more raw compute. The H100 also has higher memory bandwidth (3.35 TB/s vs 2 TB/s) and NVLink interconnect improvements that benefit large model training.

What is a petaFLOP-day?

A petaFLOP-day is a unit of total compute equal to 10¹⁵ floating-point operations per second sustained for 24 hours, or 8.64 × 10¹⁹ total FLOPs. It's commonly used to measure AI training runs. GPT-3 required approximately 3,640 petaFLOP-days to train.
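Converting between total FLOPs and petaFLOP-days only needs the 8.64 × 10¹⁹ FLOPs per petaFLOP-day given above. A minimal sketch with illustrative function names:

```python
SECONDS_PER_DAY = 24 * 3600
FLOPS_PER_PETAFLOP_DAY = 1e15 * SECONDS_PER_DAY  # 8.64e19 FLOPs


def to_petaflop_days(total_flops: float) -> float:
    """Total FLOPs -> petaFLOP-days."""
    return total_flops / FLOPS_PER_PETAFLOP_DAY


def from_petaflop_days(pf_days: float) -> float:
    """petaFLOP-days -> total FLOPs."""
    return pf_days * FLOPS_PER_PETAFLOP_DAY


print(f"GPT-3: {to_petaflop_days(3.14e23):,.0f} petaFLOP-days")
```

From the 3.14 × 10²³ FLOPs estimate this returns about 3,634 petaFLOP-days, consistent with the roughly 3,640 quoted above.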