Based on the book by Philip Kiely (Baseten, 2026)

Learn Inference Engineering

The interactive guide to AI model inference, from GPU hardware and CUDA kernels to production autoscaling, with animated diagrams, calculators, and quizzes.

Not just a book

Learn by doing

Interactive Diagrams

Animated transformer blocks, GPU architecture explorers, and attention visualizers you can play with.

Hands-on Calculators

Calculators for VRAM, arithmetic intensity, and KV cache sizing let you run real inference math with instant feedback.

Progress Tracking

Track your reading progress, quiz scores, and exercise completion across all chapters.

Quizzes & Exercises

100+ questions with instant feedback. Test your understanding at the end of every section.

Choose your path

Guided learning tracks

Getting Started

Newcomers to inference engineering

Build foundational understanding of the inference engineering stack.

~3 hrs estimated

Infrastructure Architect

Engineers choosing hardware and frameworks

Learn to choose hardware, frameworks, and configurations for inference.

~4 hrs estimated

Performance Optimizer

Engineers optimizing existing deployments

Master optimization techniques: quantization, speculative decoding, KV caching, and parallelism.

~4 hrs estimated

Ready to master inference?

Start with Chapter 0 for a high-level map of inference engineering, or jump straight to the topic you need.