What is a diffusion model?

Diffusion model

diffusion model, diffusion-based model

A diffusion model generates images or video by learning to remove noise from random data step by step until a coherent result emerges. It is the approach behind most of today's image generators.

Builds an image by iteratively removing noise from a random starting point.
Trained on pairs: a noised image and the noise that was added to it, which the model learns to predict.
Forms the basis of most modern image and video generators.

A diffusion model is a type of generative AI model built by training a neural network to reverse a noising process. During training, random noise is added to images step by step, and the model learns to undo that process. To generate, the model starts from pure noise and removes it bit by bit until a coherent image emerges; when a text prompt is supplied, it steers the result toward the description.
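The training setup described above can be sketched in a few lines. This is a minimal illustration, not a production recipe: it assumes the common formulation in which a noised sample is a weighted mix of the clean sample and Gaussian noise, with a linear noise schedule. The function names (`make_alpha_bars`, `add_noise`, `mse`), the schedule values and the four-number stand-in "image" are all illustrative choices rather than anything fixed by the definition.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction per step for a linear noise schedule.

    Assumes num_steps >= 2. alpha_bars[t] shrinks toward 0 as t grows,
    meaning less of the clean image survives at later steps.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Return (x_t, eps): the noised sample and the noise that was mixed in."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(x0, eps)]
    return x_t, eps


def mse(pred, target):
    """Mean squared error between a noise prediction and the true noise."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)

x0 = [0.5, -0.25, 1.0, 0.0]  # stand-in for the pixel values of a clean image
x_t, eps = add_noise(x0, t=500, alpha_bars=alpha_bars, rng=rng)

# One training example is the pair (x_t, eps). A network would be asked to
# predict eps from x_t and t; here an untrained "all zeros" guess stands in.
zero_guess = [0.0] * len(eps)
loss = mse(zero_guess, eps)
```

Training repeats this for many images and many randomly chosen steps `t`, adjusting the network so that its predicted noise gets closer to `eps`.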

Unlike language models, which predict the next token of text, a diffusion model works on visual data and operates iteratively: a typical generation takes anywhere from a few to a few dozen denoising steps. It is often paired with a text encoder, yielding a multimodal system in which a written prompt steers what appears in the image.
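The iterative, prompt-steered generation loop might look roughly like the sketch below. It assumes a deterministic, DDIM-style update and a linear noise schedule, which is one option among several. Everything named here is hypothetical: `PROMPT_TARGETS` replaces a real text encoder with a lookup table, and `toy_noise_predictor` replaces a trained network with a formula that already knows the answer.

```python
import math
import random

# Hypothetical stand-in for a text encoder: each prompt maps straight to
# the clean 4-value "image" it describes.
PROMPT_TARGETS = {
    "bright": [1.0, 1.0, 1.0, 1.0],
    "dark": [-1.0, -1.0, -1.0, -1.0],
}


def alpha_bars_linear(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction per step for a linear noise schedule."""
    out, running = [], 1.0
    for t in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_train_steps - 1)
        running *= 1.0 - beta
        out.append(running)
    return out


def toy_noise_predictor(x_t, a_t, prompt):
    """Stand-in for a trained network: exact noise for a one-image dataset."""
    target = PROMPT_TARGETS[prompt]
    return [(x - math.sqrt(a_t) * y) / math.sqrt(1.0 - a_t)
            for x, y in zip(x_t, target)]


def sample(prompt, num_steps=20, seed=0):
    """Start from pure noise and denoise in num_steps steps (num_steps >= 2)."""
    rng = random.Random(seed)
    abar = alpha_bars_linear()
    last = len(abar) - 1
    # Evenly spaced timesteps, ordered from most noisy to least noisy.
    timesteps = [round(i * last / (num_steps - 1)) for i in range(num_steps)][::-1]

    x = [rng.gauss(0.0, 1.0) for _ in range(4)]  # pure noise
    for i, t in enumerate(timesteps):
        a_t = abar[t]
        # After the final step no noise remains, so the signal fraction is 1.
        a_prev = abar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = toy_noise_predictor(x, a_t, prompt)
        # Estimate the clean sample, then re-noise it to the next, lower level.
        x0_pred = [(v - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
                   for v, e in zip(x, eps)]
        x = [math.sqrt(a_prev) * p + math.sqrt(1.0 - a_prev) * e
             for p, e in zip(x0_pred, eps)]
    return x


bright = sample("bright", num_steps=20)
dark = sample("dark", num_steps=4)
```

Because the toy predictor is exact, even a single step would land on the target here. A real network only approximates the noise, which is why practical samplers spread the work over several steps and why the step count trades quality against compute.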

From a deployment standpoint, diffusion models are today's standard for generating marketing graphics, product visualizations and video material. They do have limits, though: generation is computationally expensive because every denoising step requires a full pass through the network, and control over details (text within an image, precise composition) requires careful prompting and can be unreliable.

Related terms