Learn Inference Engineering
The interactive guide to AI model inference — from GPU hardware and CUDA kernels to production autoscaling. With animated diagrams, calculators, and quizzes.
Not just a book
Learn by doing
Interactive Diagrams
Animated transformer blocks, GPU architecture explorers, and attention visualizers you can interact with.
Hands-on Calculators
Calculators for VRAM, arithmetic intensity, and KV cache sizing: run real inference math with instant feedback.
Progress Tracking
Track your reading progress, quiz scores, and exercise completion across all chapters.
Quizzes & Exercises
100+ questions with instant feedback. Test your understanding at the end of every section.
Choose your path
Guided learning tracks
8 Chapters
The complete inference stack
From prerequisites and model architecture through hardware, software, optimization techniques, multimodal inference, and production deployment.
Ready to master inference?
Start with Chapter 0 for a high-level map of inference engineering, or jump straight to the topic you need.