AI Glossary

AI terms in English — short, concrete and jargon-free. Each entry is one definition ready to quote.

AI fundamentals

Artificial general intelligence (AGI)

Artificial general intelligence (AGI) is a hypothetical system that matches humans in breadth — capable of any intellectual task. Today's AI systems are narrow: very good at specific tasks, but not universal.

Artificial intelligence (AI)

Artificial intelligence (AI) is the field of computer science concerned with systems that perform tasks usually requiring human thought: recognizing patterns, making decisions, and generating content from data.

Computer vision

Computer vision is the field of artificial intelligence that teaches machines to understand images and video — to detect objects, classify scenes or read text — rather than treat them as raw pixels.

Deep learning

Deep learning is a subfield of machine learning in which multi-layer neural networks automatically extract increasingly complex features from data. It is the technical foundation of today's language and generative models.

Foundation model

A foundation model is a large model pre-trained on broad, non-specialized data that serves as a base for tuning to many different tasks instead of building a separate model for each one.

Generative AI

Generative AI is a class of models that create new content — text, image, sound, video, or code — from patterns learned in data, rather than merely classifying or predicting values for existing data.

Inference

Inference is the phase in which a trained model produces a result for new input — for example, answering a question or classifying an image. It happens without changing the parameters, unlike training.

Machine learning (ML)

Machine learning (ML) is a branch of artificial intelligence in which a model detects patterns in training data instead of following hand-written rules, and uses them to predict outcomes for new cases.

Natural language processing (NLP)

Natural language processing (NLP) is the field of AI concerned with how machines understand, analyze, and generate human language — from text classification and translation to holding a conversation. Large language models are its modern peak.

Neural network

A neural network is a machine-learning model built from layers of connected units (neurons) that progressively transform the input data and learn relationships by tuning the connection weights during training.
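
As a minimal sketch of what "tuning the connection weights" means, the example below trains a single weight to fit the rule y = 2x. It assumes gradient descent on squared error, which this entry does not name; the learning rate, epoch count and sample data are illustrative choices.

```python
def train(samples, learning_rate=0.05, epochs=200):
    """Fit prediction = weight * x by nudging the weight against the error gradient."""
    weight = 0.0
    for _ in range(epochs):
        for x, target in samples:
            prediction = weight * x
            # Derivative of (prediction - target)^2 with respect to the weight.
            gradient = 2 * (prediction - target) * x
            weight -= learning_rate * gradient
    return weight

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # every target is 2 * x
weight = train(samples)
```

After training, the weight sits very close to 2.0. A real network repeats the same idea across many weights and layers.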

Training data

Training data is the set of examples a model learns patterns from during training. Its quality, quantity and representativeness directly determine how accurately the model performs on new data.

Models & architecture

Attention mechanism

The attention mechanism lets a model weigh which input tokens matter when generating each element of the output. It is the core of the transformer architecture and the foundation of today's language models.
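
The weighting can be sketched with scaled dot-product attention for a single query, written in plain Python. The vectors are made-up illustrative values, and real models run this on large matrices with learned projections.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Weighted average of the value vectors, one output dimension at a time.
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([1.0, 0.0], keys, values)
```

The first and third keys align with the query, so they receive equal and larger weights than the second, and the output leans toward their values.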

Context window

The context window is the maximum number of tokens a model can process in a single request, counting the instruction, attached data, and the response. Input beyond this limit is either rejected or truncated, depending on the system, so the model never sees it.

Diffusion model

A diffusion model generates images or video by learning to gradually remove noise from random data until a coherent result emerges. It is the architecture behind most of today's image generators.

LLM (large language model)

An LLM is a large language model trained on vast amounts of text that predicts the next token and, on that basis, generates answers, summaries or code in natural language.

Mixture of Experts (MoE)

Mixture of Experts is an architecture in which each token is routed to only a selected subset of specialized sub-networks (experts). It lets you grow a model's total parameter count while keeping the compute cost per token well below that of a dense model of the same size.
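
A minimal sketch of the routing step, assuming top-2 selection with softmax-renormalized gate weights (one common scheme; this entry does not specify one). The four "experts" are toy functions and the router scores are invented for illustration.

```python
import math

def top_k_route(router_logits, k=2):
    """Pick the k experts with the highest router scores and renormalize their weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    exps = [math.exp(router_logits[i]) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

# Four toy "experts": each is just a function of a scalar input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]

def moe_forward(x, router_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    routes = top_k_route(router_logits, k)
    mixed = sum(weight * experts[i](x) for i, weight in routes)
    return mixed, [i for i, _ in routes]

output, used = moe_forward(3.0, router_logits=[0.1, 2.0, -1.0, 2.0])
```

Only experts 1 and 3 run for this token. The other two cost nothing, which is where the per-token savings come from.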

Model distillation (knowledge distillation)

Model distillation is a technique for training a smaller model (the "student") to imitate a larger one (the "teacher"). It produces a model that is smaller and cheaper to run while retaining part of the original's quality.

Model parameters

Model parameters are the internal numerical values (weights) a model adjusts during training. They hold the learned knowledge, and their count is often given in billions.

Multimodality

Multimodality is a model's ability to process and combine different types of data — text, images, audio, or video — within a single request, instead of working with text alone.

Open model (open-weight / open-source)

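An open model is one whose weights are released publicly so you can download it and run it yourself. Open-weight means only the weights are published; open-source in the stricter sense also covers the training code and, ideally, the data. Either way, it is the opposite of closed models, available only through a vendor's API.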

Quantization

Quantization lowers the numerical precision of a model's weights (e.g. from 16 to 8 or 4 bits) to shrink its size and speed up inference. It usually comes at the cost of a small drop in quality, which grows as the bit width shrinks.
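
A minimal sketch of the idea, assuming symmetric 8-bit quantization with one shared scale for all weights. Production schemes typically use per-channel or per-group scales; the weight values here are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.003, 0.5]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The small weight 0.003 rounds to 0, which illustrates the quality cost: each restored value can be off by up to half the scale.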

RLHF

RLHF (reinforcement learning from human feedback) is a fine-tuning method in which people rate a model's responses and the model learns to prefer the higher-rated ones. This makes it more helpful and more aligned with users' expectations.

Reasoning model

A reasoning model is a language model that, before answering, devotes extra compute to internal, step-by-step reasoning. It excels at tasks requiring multi-step logic, such as mathematics or programming.

Small language model (SLM)

A small language model (SLM) is a compact language model that can run on modest hardware or locally. In exchange for fewer parameters, it offers lower cost and greater control than large models.

Token

A token is the smallest unit of text a language model works on, usually a fragment of a word, a whole word, or a character. The model reads and generates text as a sequence of tokens.

Tokenization

Tokenization is the process of splitting text into tokens — short fragments a language model can process. It's a preprocessing step that turns raw text into a sequence of the model's input units.
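
As a rough illustration, the sketch below splits text into word and punctuation tokens and maps each to an integer ID. This is a toy scheme assumed for clarity; production language models generally use learned subword vocabularies instead.

```python
import re

def tokenize(text):
    """Split text into lowercase word and punctuation tokens (toy scheme)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab

tokens = tokenize("Tokens in, tokens out.")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
```

Here the six tokens become the IDs [0, 1, 2, 0, 3, 4]; the repeated word "tokens" reuses ID 0.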

Transformer

A transformer is a neural network architecture built on the attention mechanism, which lets the model weigh the relationships between all the tokens in a sequence. It is the foundation of today's large language models.

Agents & automation

AI agent

An AI agent is a system that, given a goal, plans its own steps, uses tools (search, APIs, code, and so on) and carries out tasks in a loop — rather than simply answering a single question.
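
The loop can be sketched as follows. The "model" is a scripted stand-in so the example runs offline, and the calculator tool, action names and step limit are all hypothetical.

```python
def calculator(expression):
    """Toy tool: add two integers written as 'a+b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))

TOOLS = {"calculator": calculator}

def scripted_model(goal, observations):
    """Stand-in for a language model: picks the next action from what it has seen."""
    if not observations:
        return {"action": "calculator", "input": "19+23"}
    return {"action": "finish", "input": f"The answer is {observations[-1]}"}

def run_agent(goal, max_steps=5):
    """Ask the model for an action, run the tool, feed the result back, repeat."""
    observations = []
    for _ in range(max_steps):
        step = scripted_model(goal, observations)
        if step["action"] == "finish":
            return step["input"]
        observations.append(TOOLS[step["action"]](step["input"]))
    return "stopped: step limit reached"

answer = run_agent("What is 19 + 23?")
```

The step limit is a simple guard against an agent that never decides to finish.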

AI assistant

An AI assistant is an application built on a language model that holds a conversation, answers questions and helps with tasks — often with access to company data or tools through tool use.

AI copilot (built-in assistant)

An AI copilot is an assistant built into a tool (a code editor, a spreadsheet, a word processor) that makes suggestions in real time as the user works. The model usually runs as a remote service. The copilot only proposes the next step; the person still leads the work.

Agent memory

Agent memory is the mechanism by which an AI agent retains information across steps and sessions — from the short-term context of the current task to long-term knowledge stored outside the context window.

Agent orchestration

Agent orchestration is the coordination of multiple AI agents so they pursue a single goal together, with tasks divided, results handed off, and control points shared.

Agent planning

Agent planning is an AI agent's ability to break a goal into ordered steps and choose its tools before acting, then revise the plan along the way, instead of reacting blindly one step at a time.

Agentic AI

Agentic AI is a paradigm in which AI systems independently plan and carry out multi-step tasks toward a goal, using tools and evaluating results — rather than just responding to single queries.

Agentic workflow

An agentic workflow is a process in which an AI agent carries out a task in defined steps — planning, using tools, checking the result and adjusting its approach — instead of returning a single answer.

Coding agent

A coding agent is an AI agent that writes, runs and fixes code on its own, working in a loop with a developer's tools — it reads files, runs commands, reads errors and applies fixes until it reaches the goal.

Function calling

Function calling is a mechanism that lets a language model invoke a defined function or API by generating structured arguments that conform to its schema. It is the technical basis for tool use by a model.
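
A minimal sketch of the receiving side: the application parses the model's structured output, checks it against a schema, and only then runs the function. The `get_weather` tool, the schema layout and the JSON shape are assumptions for illustration, not any vendor's actual format.

```python
import json

def get_weather(city, unit="celsius"):
    """Hypothetical tool; returns canned data instead of calling a real service."""
    return {"city": city, "temperature": 21, "unit": unit}

TOOLS = {
    "get_weather": {
        "function": get_weather,
        "required": {"city": str},
        "optional": {"unit": str},
    }
}

def dispatch(model_output):
    """Parse the model's JSON tool call, validate the arguments, then run the tool."""
    call = json.loads(model_output)
    spec = TOOLS[call["name"]]
    args = call["arguments"]
    allowed = {**spec["required"], **spec["optional"]}
    for name in spec["required"]:
        if name not in args:
            raise ValueError(f"missing argument: {name}")
    for name, value in args.items():
        if name not in allowed or not isinstance(value, allowed[name]):
            raise ValueError(f"invalid argument: {name}")
    return spec["function"](**args)

model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
result = dispatch(model_output)
```

Validating before dispatch matters because the model's output is generated text and may not match the schema.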

MCP (Model Context Protocol)

MCP is an open protocol that connects language models to external tools, data and services in a standardized way — through a single interface instead of a separate integration for every application.

Multi-agent system

A multi-agent system is a setup in which several specialized AI agents collaborate on a single task, splitting roles and handing results to one another, rather than relying on one general-purpose agent.

Tool use

Tool use is a language model's ability to call external tools — a search engine, an API, a calculator or code — when generating text alone isn't enough to complete the task.

Data & retrieval

Chunking

Chunking is the splitting of documents into smaller pieces before they're turned into embeddings, so that the model receives coherent, relevant chunks of text — a key data-preparation step for RAG.
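
One simple approach is fixed-size chunks with a small overlap, so a sentence cut at a boundary still appears whole in one chunk. The sketch below counts words rather than tokens, and the sizes are arbitrary; both are assumptions.

```python
def chunk_words(text, size=8, overlap=2):
    """Split text into chunks of `size` words; neighbors share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

text = " ".join(f"w{i}" for i in range(1, 15))  # placeholder words w1 to w14
chunks = chunk_words(text, size=8, overlap=2)
```

With 14 words this yields two chunks, and the words w7 and w8 appear in both.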

Company Knowledge File (CKF)

A Company Knowledge File (CKF) is an organized, versioned body of a company's knowledge: data, a glossary of terms, sources and an audit trail in one portable package. It gives AI agents consistent context, and the organization a change history and documentation that can actually be maintained.

Data labeling

Data labeling is the practice of attaching labels or annotations to raw data to describe the correct answer, so the data can train or evaluate a model. It is the basis of supervised learning and reliable evaluation.

Data pipeline

A data pipeline is an ordered sequence of steps through which data flows from its source, via ingestion, cleaning, and processing, all the way to the model or the database powering RAG. Each stage passes its result to the next, making the flow repeatable.
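
The "each stage passes its result to the next" idea can be sketched as a list of functions applied in order. The three stages and the record shape are illustrative assumptions.

```python
def ingest(raw_rows):
    """Read raw rows and strip surrounding whitespace."""
    return [row.strip() for row in raw_rows]

def clean(rows):
    """Drop rows that are empty after stripping."""
    return [row for row in rows if row]

def process(rows):
    """Turn each row into a record ready for indexing."""
    return [{"id": i, "text": row.lower()} for i, row in enumerate(rows)]

PIPELINE = [ingest, clean, process]

def run_pipeline(data, steps=PIPELINE):
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data

records = run_pipeline(["  Hello ", "", " World"])
```

Because the steps are plain functions in a fixed order, the same input always produces the same records.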

Embedding

An embedding is a numerical representation of text (or an image) as a vector, where closeness between vectors signals similarity in meaning — the foundation of semantic search and RAG systems.
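
Closeness between vectors is commonly measured with cosine similarity. The sketch below uses hand-made three-dimensional vectors for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made vectors, not the output of any real embedding model.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}
cat_kitten = cosine_similarity(embeddings["cat"], embeddings["kitten"])
cat_invoice = cosine_similarity(embeddings["cat"], embeddings["invoice"])
```

The related pair scores close to 1, while the unrelated pair scores close to 0.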

Hybrid search

Hybrid search combines vector (semantic) search with keyword matching, so it captures both the intent of a query and the exact terms at once. Results from both methods are merged and often ordered by reranking.
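
One common way to merge the two result lists is reciprocal rank fusion (RRF), sketched below. This entry does not name a merging method, so RRF is an assumption here, as are the document IDs and the conventional constant k = 60.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each document earns 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

keyword_hits = ["doc-a", "doc-b", "doc-c"]  # from keyword matching
vector_hits = ["doc-c", "doc-a", "doc-d"]   # from vector search
merged = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents found by both methods (doc-a and doc-c) rise above those found by only one.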

Knowledge graph

A knowledge graph is an organized network of entities (people, products, documents) and the relationships between them. It can ground a model's answers — either complementing or replacing vector search.

RAG (Retrieval-Augmented Generation)

RAG is a technique in which a language model searches for relevant document passages before answering and grounds its generation in them — so it responds based on specific sources rather than memory alone.
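
The retrieve-then-generate flow can be sketched in two steps. To stay self-contained, the retriever below ranks by shared words, a stand-in for real vector search, and the documents and prompt wording are invented; the final model call is left out.

```python
def retrieve(question, documents, top_k=2):
    """Rank documents by how many question words they share (stand-in for vector search)."""
    question_words = set(question.lower().split())

    def overlap(doc):
        return len(question_words & set(doc.lower().split()))

    return sorted(documents, key=overlap, reverse=True)[:top_k]

def build_prompt(question, passages):
    """Put the retrieved passages in front of the question as numbered sources."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"

documents = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Refunds require the original receipt.",
]
question = "how many days for refunds"
passages = retrieve(question, documents)
prompt = build_prompt(question, passages)
```

The prompt sent to the model contains the two refund passages and omits the unrelated one.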

Reranking

Reranking is a second retrieval stage in which a separate model reorders the initial results by actual relevance before the best ones reach the language model. It improves answer quality in RAG.

Semantic search

Semantic search matches documents to a query by meaning rather than by exact words. It typically uses embeddings and a vector database, so it finds relevant content even when the wording differs.

Synthetic data

Synthetic data is artificially generated examples, used to train or evaluate models when real data is scarce or sensitive. It needs quality control, because it can reproduce and amplify the flaws of its source.

Vector database

A vector database is a system that stores embeddings and quickly finds the vectors closest to a query by semantic similarity — the foundation of semantic search and RAG systems.
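
A toy in-memory version shows the two core operations, adding vectors and querying for the nearest ones. It scans every stored vector, whereas real systems generally rely on approximate indexes for speed; the class name, document IDs and vectors are all hypothetical.

```python
import math

class TinyVectorStore:
    """In-memory store that compares the query against every stored vector."""

    def __init__(self):
        self.items = []  # list of (doc_id, unit-length vector)

    @staticmethod
    def _normalize(vector):
        length = math.sqrt(sum(x * x for x in vector))
        return [x / length for x in vector]

    def add(self, doc_id, vector):
        self.items.append((doc_id, self._normalize(vector)))

    def query(self, vector, top_k=2):
        """Return the IDs of the top_k most similar stored vectors."""
        target = self._normalize(vector)
        scored = [(sum(a * b for a, b in zip(target, stored)), doc_id)
                  for doc_id, stored in self.items]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]

store = TinyVectorStore()
store.add("refund-policy", [0.9, 0.1, 0.0])
store.add("shipping-times", [0.1, 0.9, 0.1])
store.add("returns-faq", [0.8, 0.3, 0.0])
nearest = store.query([1.0, 0.2, 0.0], top_k=2)
```

Normalizing on insert means the dot product at query time equals cosine similarity.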

Practice & quality

AI benchmark

An AI benchmark is a standardized set of tasks for comparing models on a single scale, for example in reasoning or programming. Scores can be inflated, for instance when test items leak into training data, and don't always reflect real-world use.

Chain of thought

Chain of thought is a technique in which a model works through a solution step by step before giving an answer. It helps with multi-step tasks such as arithmetic or logic.

Context engineering

Context engineering is the practice of selecting, ordering and trimming everything that enters a model's context window — instructions, data, conversation history and tool outputs — so the model has exactly what it needs for the task and nothing more.
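
One small piece of this practice, trimming conversation history to a token budget, might look like the sketch below. It counts whitespace-separated words as a crude stand-in for a real tokenizer, and the budget and messages are illustrative.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def trim_history(system_prompt, history, budget):
    """Keep the system prompt plus the most recent messages that fit the budget."""
    remaining = budget - count_tokens(system_prompt)
    kept = []
    for message in reversed(history):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    return [system_prompt] + kept[::-1]  # restore chronological order

history = ["first question here", "a long answer with many many words", "follow up"]
context = trim_history("You are helpful", history, budget=12)
```

With a budget of 12 the oldest message is dropped. Keeping the newest messages is one policy among several; summarizing older turns is another.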

Extended thinking (reasoning effort)

Extended thinking is a mode in which a model generates internal reasoning before giving its final answer. It trades higher latency and token usage for greater accuracy on hard tasks.

Few-shot (learning from a few examples)

Few-shot is a technique in which you show the model a few examples of correct answers inside the prompt itself, steering its behavior without any training. It works within a single request.
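
Assembling such a prompt is plain string building. The sentiment task, the "Review:" and "Sentiment:" labels and the examples below are assumptions chosen for illustration.

```python
def few_shot_prompt(instruction, examples, new_input):
    """Combine an instruction, worked examples, and the new case into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    # End with the unanswered case so the model completes the pattern.
    lines += [f"Review: {new_input}", "Sentiment:"]
    return "\n".join(lines)

examples = [("Loved it", "positive"), ("Broke in a day", "negative")]
prompt = few_shot_prompt(
    "Classify the sentiment of each review.", examples, "Works as described"
)
```

The prompt ends on an open "Sentiment:" line, which invites the model to answer in the same format as the examples.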

Fine-tuning

Fine-tuning is the further training of a ready-made model on your own set of examples, so it handles a specific task or style better. It changes the model's weights, unlike prompting.

Hallucination

A hallucination is when a language model produces an answer that sounds credible but does not match the facts or the sources. It stems from how the model works, not from a malfunction.

In-context learning

In-context learning is a model's ability to adapt to a task from the instructions and examples in the prompt alone, without updating its weights — the effect disappears once the conversation ends.

LLM-as-a-judge

LLM-as-a-judge uses a language model to score another model's answers against defined criteria. It is faster and cheaper than human evaluation, but carries its own errors and biases.

Model evaluation

Model evaluation is the systematic measurement of answer quality on a fixed set of cases and metrics. It lets you compare versions and catch regressions instead of judging by gut feel.
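
A minimal sketch of a fixed test set with one metric (exact-match accuracy) and a per-case regression check. The two "model versions" are lookup tables standing in for real model calls, and the cases are invented.

```python
CASES = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),
    ("opposite of hot", "cold"),
]

def accuracy(model, cases):
    """Share of cases where the model's answer exactly matches the expected one."""
    correct = sum(1 for question, expected in cases if model(question) == expected)
    return correct / len(cases)

def regressions(old, new, cases):
    """Cases the old version answered correctly and the new version gets wrong."""
    return [q for q, expected in cases if old(q) == expected and new(q) != expected]

def model_v1(question):
    return {"2+2": "4", "capital of France": "Paris", "3*3": "9"}.get(question, "")

def model_v2(question):
    return {"2+2": "4", "capital of France": "Paris", "opposite of hot": "cold"}.get(question, "")

score_v1 = accuracy(model_v1, CASES)
score_v2 = accuracy(model_v2, CASES)
broken = regressions(model_v1, model_v2, CASES)
```

Both versions score 0.75, yet the per-case check shows the newer one regressed on "3*3". An aggregate score alone would hide that.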

Overfitting

Overfitting is when a model memorizes its training data instead of learning general patterns, so it performs excellently on familiar data but poorly on new, previously unseen examples.

Prompt chaining

Prompt chaining breaks a task into a sequence of prompting steps, where the output of one prompt feeds the next. It lets you split a complex process into smaller stages that are easier to control.
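
A two-step chain can be sketched as follows. The `fake_model` function returns canned text so the example runs offline; in practice it would be replaced by a real model call. The prompts and the complaint are illustrative.

```python
def fake_model(prompt):
    """Stand-in for a model call; returns canned text based on the prompt's first word."""
    if prompt.startswith("Extract"):
        return "late delivery; damaged box"
    if prompt.startswith("Draft"):
        return "We are sorry about: " + prompt.split(": ", 1)[1]
    return ""

def run_chain(complaint, model=fake_model):
    """Step 1 extracts the issues; step 2 drafts a reply from step 1's output."""
    issues = model(f"Extract the issues from this complaint: {complaint}")
    reply = model(f"Draft an apology covering these issues: {issues}")
    return issues, reply

issues, reply = run_chain("My parcel came a week late and the box was crushed.")
```

Because the intermediate result is an ordinary value, it can be logged, validated or corrected before the next step runs.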

Prompt engineering

Prompt engineering is the practice of phrasing instructions for a language model so as to get accurate, repeatable responses. It covers the choice of instructions, examples, and output format.

System prompt

A system prompt is a fixed instruction, set before a conversation begins, that defines the model's role, rules, and boundaries. It holds for the entire session, regardless of the user's subsequent questions.

Temperature (generation parameter)

Temperature is a generation parameter that controls the randomness of a model's responses. A low value gives more predictable, repeatable results; a high one gives more varied and less predictable output.
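
In the usual formulation, the model's raw scores (logits) are divided by the temperature before being converted to probabilities. The sketch below assumes that formulation; the logits are made-up values.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by a positive temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.2)  # sharp: top token dominates
hot = softmax_with_temperature(logits, 2.0)   # flat: alternatives stay likely
```

At temperature 0.2 the top token takes over 99% of the probability. At 2.0 it drops below half, so sampling produces more varied output.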

Zero-shot (no examples)

Zero-shot is a way of prompting in which you ask the model to perform a task without showing a single worked example — you rely solely on the instruction itself and the model's knowledge from training.