AI Glossary
AI terms in English — short, concrete and jargon-free. Each entry is one definition ready to quote.
AI fundamentals
Artificial general intelligence (AGI)
Artificial general intelligence (AGI) is a hypothetical system that would match humans across the full range of intellectual tasks. Today's AI systems are narrow: very good at specific tasks, but not universal.
Artificial intelligence (AI)
Artificial intelligence (AI) is the field of computer science concerned with systems that perform tasks usually requiring human thought: recognizing patterns, making decisions, and generating content from data.
Computer vision
Computer vision is the field of artificial intelligence that teaches machines to understand images and video — to detect objects, classify scenes or read text — rather than treat them as raw pixels.
Deep learning
Deep learning is a subfield of machine learning in which multi-layer neural networks automatically extract increasingly complex features from data. It is the technical foundation of today's language and generative models.
Foundation model
A foundation model is a large model pre-trained on broad, non-specialized data that serves as a base for tuning to many different tasks instead of building a separate model for each one.
Generative AI
Generative AI is a class of models that create new content — text, image, sound, video, or code — from patterns learned in data, rather than merely classifying or predicting values for existing data.
Inference
Inference is the phase in which a trained model produces a result for new input — for example, answering a question or classifying an image. It happens without changing the parameters, unlike training.
Machine learning (ML)
Machine learning (ML) is a branch of artificial intelligence in which a model detects patterns in training data instead of following hand-written rules, and uses them to predict outcomes for new cases.
Natural language processing (NLP)
Natural language processing (NLP) is the field of AI concerned with how machines understand, analyze, and generate human language, from text classification and translation to holding a conversation. Large language models are its most capable approach today.
Neural network
A neural network is a machine-learning model built from layers of connected units (neurons) that progressively transform the input data and learn relationships by tuning the connection weights during training.
Training data
Training data is the set of examples a model learns patterns from during training. Its quality, quantity and representativeness directly determine how accurately the model performs on new data.
Models & architecture
Attention mechanism
The attention mechanism lets a model weigh which input tokens matter when generating each element of the output. It is the core of the transformer architecture and the foundation of today's language models.
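To make the weighing concrete, here is a minimal sketch of scaled dot-product attention for a single query, in plain Python. The vectors are made up for illustration; real models learn them and work on large matrices.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Illustrative (assumed) vectors: one key and value per input token.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([1.0, 0.0], keys, values)
```

Tokens whose keys point in the same direction as the query receive larger weights, so their values dominate the output.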
Context window
The context window is the maximum number of tokens a model can process at once in a single request, including the instruction, attached data, and the response. Content beyond this limit is truncated or the request is rejected, depending on the system, so the model never sees it.
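One common way to stay inside the limit is to keep the instruction and only the most recent messages that fit. The sketch below assumes one token per word, which is a rough stand-in for a real tokenizer; `fit_to_window` and its parameters are hypothetical names.

```python
def count_tokens(text):
    """Crude stand-in for a real tokenizer: one token per word."""
    return len(text.split())

def fit_to_window(instruction, history, window=10, reserve_for_reply=2):
    """Keep the instruction plus the newest messages that fit the budget."""
    budget = window - reserve_for_reply - count_tokens(instruction)
    kept = []
    for message in reversed(history):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break  # everything older is left out as well
        kept.append(message)
        budget -= cost
    return [instruction] + kept[::-1]  # restore chronological order

messages = fit_to_window("do it now", ["a b c d e", "f g h", "i j"])
```

Reserving part of the window for the reply matters because the response counts against the same limit.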
Diffusion model
A diffusion model generates images or video by learning to gradually remove noise from random data until a coherent result emerges. It is the architecture behind most of today's image generators.
LLM (large language model)
An LLM is a large language model trained on vast amounts of text that predicts the next token and, on that basis, generates answers, summaries or code in natural language.
Mixture of Experts (MoE)
Mixture of Experts is an architecture in which each token is routed to only a selected subset of specialized sub-networks (experts). It lets you grow a model's parameter count while keeping the compute cost per token lower.
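A toy sketch of top-k routing may help. The experts here are simple functions and the gate scores are given by hand; in a real model both the experts and the router that produces the scores are learned networks.

```python
import math

def route(gate_scores, k=2):
    """Pick the k highest-scoring experts and normalize their weights."""
    order = sorted(range(len(gate_scores)),
                   key=lambda i: gate_scores[i], reverse=True)
    top = order[:k]
    exps = [math.exp(gate_scores[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(x, experts, gate_scores, k=2):
    """Run only the selected experts and blend their outputs."""
    return sum(weight * experts[i](x) for i, weight in route(gate_scores, k))

# Four stand-in experts (assumed for illustration).
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x ** 2]
gate_scores = [0.1, 2.0, -1.0, 2.0]
selected = route(gate_scores, k=2)
blended = moe_layer(3.0, experts, gate_scores, k=2)
```

Only two of the four experts run for this input, which is why the compute per token stays low even as the total parameter count grows.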
Model distillation (knowledge distillation)
Model distillation is a technique for training a smaller model (the "student") to imitate a larger one (the "teacher"). It produces a model that is smaller and cheaper to run while retaining part of the original's quality.
Model parameters
Model parameters are the internal numerical values (weights) a model adjusts during training. They hold the learned knowledge, and their count is often given in billions.
Multimodality
Multimodality is a model's ability to process and combine different types of data — text, images, audio, or video — within a single request, instead of working with text alone.
Open model (open-weight / open-source)
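An open model is one whose weights are released publicly so you can download it and run it yourself. Open-weight models publish the weights only, while open-source models also release the code and sometimes the training data under an open license. Both contrast with closed models, which are available only through a vendor's API.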
Quantization
Quantization lowers the numerical precision of a model's weights (e.g. from 16 to 8 or 4 bits) to shrink its size and speed up inference. It usually comes at the cost of a small drop in quality, which tends to grow at lower bit widths.
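As an illustration, the sketch below applies symmetric 8-bit quantization with a single shared scale. This is one simple scheme among several, and the weights are made-up values; production tools typically use per-channel or per-group scales.

```python
def quantize(weights, bits=8):
    """Map floats to signed integers using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax  # assumes a nonzero weight
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.003, 1.0]
quantized, scale = quantize(weights)
restored = dequantize(quantized, scale)
```

The small weight 0.003 rounds to zero here, which shows where the quality loss comes from: values smaller than half the scale step cannot be represented.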
RLHF
RLHF (reinforcement learning from human feedback) is a model fine-tuning method in which people rate its responses and the model learns to prefer the higher-rated ones. This makes it more helpful and more aligned with users' expectations.
Reasoning model
A reasoning model is a language model that, before answering, devotes extra compute to internal, step-by-step reasoning. It excels at tasks requiring multi-step logic, such as mathematics or programming.
Small language model (SLM)
A small language model (SLM) is a compact language model that can run on modest hardware or locally. In exchange for fewer parameters, it offers lower cost and greater control than large models.
Token
A token is the smallest unit of text a language model works on: usually a fragment of a word, a whole word, or a character. The model reads and generates text as a sequence of tokens.
Tokenization
Tokenization is the process of splitting text into tokens — short fragments a language model can process. It's a preprocessing step that turns raw text into a sequence of the model's input units.
Transformer
A transformer is a neural network architecture built on the attention mechanism, which lets the model weigh the relationships between every token in a sequence. It is the foundation of today's large language models.
Agents & automation
AI agent
An AI agent is a system that, given a goal, plans its own steps, uses tools (search, APIs, code, and so on) and carries out tasks in a loop — rather than simply answering a single question.
AI assistant
An AI assistant is an application built on a language model that holds a conversation, answers questions and helps with tasks — often with access to company data or tools through tool use.
AI copilot (built-in assistant)
An AI copilot is an assistant built into a tool (a code editor, a spreadsheet, a word processor) that makes suggestions in real time as the user works. The model usually runs as a remote service. The copilot only proposes the next step; the person still leads the work.
Agent memory
Agent memory is the mechanism by which an AI agent retains information across steps and sessions — from the short-term context of the current task to long-term knowledge stored outside the context window.
Agent orchestration
Agent orchestration is the coordination of multiple AI agents so they pursue a single goal together — with tasks divided, results handed off, and shared control points.
Agent planning
Agent planning is an AI agent's ability to break a goal into ordered steps and choose its tools before acting, then revise the plan along the way, instead of reacting blindly one step at a time.
Agentic AI
Agentic AI is a paradigm in which AI systems independently plan and carry out multi-step tasks toward a goal, using tools and evaluating results — rather than just responding to single queries.
Agentic workflow
An agentic workflow is a process in which an AI agent carries out a task in defined steps — planning, using tools, checking the result and adjusting its approach — instead of returning a single answer.
Coding agent
A coding agent is an AI agent that writes, runs and fixes code on its own, working in a loop with a developer's tools — it reads files, runs commands, reads errors and applies fixes until it reaches the goal.
Function calling
Function calling is a mechanism that lets a language model invoke a defined function or API by generating structured arguments that conform to its schema. It is the technical basis for tool use by a model.
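The application side of function calling can be sketched as follows: the model emits a JSON call, and the program validates it against a schema before running anything. The tool `get_weather`, the `TOOLS` registry, and the JSON shape are assumptions for illustration, not any vendor's actual format.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather API."""
    return f"Sunny in {city}"

# Assumed registry: each tool lists its required arguments and their types.
TOOLS = {"get_weather": {"fn": get_weather, "required": {"city": str}}}

def dispatch(model_output: str) -> str:
    """Validate the model's structured call, then run the matching function."""
    call = json.loads(model_output)
    tool = TOOLS[call["name"]]
    args = call["arguments"]
    for key, expected in tool["required"].items():
        if not isinstance(args.get(key), expected):
            raise ValueError(f"argument {key!r} must be {expected.__name__}")
    return tool["fn"](**args)

result = dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
```

Validating before dispatch matters because the model's output is generated text and may not match the schema.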
MCP (Model Context Protocol)
MCP is an open protocol that connects language models to external tools, data and services in a standardized way — through a single interface instead of a separate integration for every application.
Multi-agent system
A multi-agent system is a setup in which several specialized AI agents collaborate on a single task, splitting roles and handing results to one another, rather than relying on one general-purpose agent.
Tool use
Tool use is a language model's ability to call external tools — a search engine, an API, a calculator or code — when generating text alone isn't enough to complete the task.
Data & retrieval
Chunking
Chunking is the splitting of documents into smaller pieces before they're turned into embeddings, so that the model receives coherent, relevant chunks of text — a key data-preparation step for RAG.
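A minimal sketch of one common strategy, fixed-size chunks with overlap, is shown below. It counts words rather than tokens, and the sizes are arbitrary; real pipelines often split on sentences or headings instead.

```python
def chunk_words(text, size=5, overlap=2):
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

chunks = chunk_words("a b c d e f g h")
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks.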
Company Knowledge File (CKF)
A Company Knowledge File (CKF) is an organized, versioned body of a company's knowledge: data, a glossary of terms, sources and an audit trail in one portable package. It gives AI agents consistent context, and the organization a change history and documentation that can actually be maintained.
Data labeling
Data labeling is the practice of attaching labels or annotations to raw data to describe the correct answer, so the data can train or evaluate a model. It is the basis of supervised learning and reliable evaluation.
Data pipeline
A data pipeline is an ordered sequence of steps through which data flows from its source, via ingestion, cleaning, and processing, all the way to the model or the database powering RAG. Each stage passes its result to the next, making the flow repeatable.
Embedding
An embedding is a numerical representation of text (or an image) as a vector, where closeness between vectors signals similarity in meaning — the foundation of semantic search and RAG systems.
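Closeness between vectors is often measured with cosine similarity. The sketch below uses hand-written three-dimensional vectors as stand-ins; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

close = cosine(cat, kitten)
far = cosine(cat, invoice)
```

Here `close` is much larger than `far`, which mirrors the idea that related meanings sit near each other in the vector space.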
Hybrid search
Hybrid search combines vector (semantic) search with keyword matching, so it captures both the intent of a query and the exact terms at once. Results from both methods are merged and often ordered by reranking.
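One widely used way to merge the two result lists is reciprocal rank fusion (RRF), sketched below. The source does not name a merging method, so treat RRF as one option; the document IDs and the constant `k=60` are illustrative.

```python
def rrf(rankings, k=60):
    """Score each document by the sum of 1 / (k + rank) over all rankings."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]  # assumed keyword-search results
vector_hits = ["doc_b", "doc_d", "doc_a"]   # assumed vector-search results
merged = rrf([keyword_hits, vector_hits])
```

Documents that appear high in both lists rise to the top, while documents found by only one method still make the merged list.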
Knowledge graph
A knowledge graph is an organized network of entities (people, products, documents) and the relationships between them. It can ground a model's answers — either complementing or replacing vector search.
RAG (Retrieval-Augmented Generation)
RAG is a technique in which a language model searches for relevant document passages before answering and grounds its generation in them — so it responds based on specific sources rather than memory alone.
Reranking
Reranking is a second retrieval stage in which a separate model reorders the initial results by actual relevance before the best ones reach the language model. It improves answer quality in RAG.
Semantic search
Semantic search matches documents to a query by meaning rather than by exact words — it uses embeddings and a vector database, so it finds relevant content even when the wording differs.
Synthetic data
Synthetic data is artificially generated examples, used to train or evaluate models when real data is scarce or sensitive. It needs quality control, because it can reproduce and amplify the flaws of its source.
Vector database
A vector database is a system that stores embeddings and quickly finds the vectors closest to a query by semantic similarity — the foundation of semantic search and RAG systems.
Practice & quality
AI benchmark
An AI benchmark is a standardized set of tasks for comparing models on a single scale, for example in reasoning or programming. Scores can be inflated, for instance when test items leak into training data, and they don't always reflect real-world use.
Chain of thought
Chain of thought is a technique in which a model works through a solution step by step before giving an answer. It helps with multi-step tasks such as arithmetic or logic.
Context engineering
Context engineering is the practice of selecting, ordering and trimming everything that enters a model's context window — instructions, data, conversation history and tool outputs — so the model has exactly what it needs for the task and nothing more.
Extended thinking (reasoning effort)
Extended thinking is a mode in which a model generates internal reasoning before giving its final answer. It trades higher latency and token usage for greater accuracy on hard tasks.
Few-shot (learning from a few examples)
Few-shot is a technique in which you show the model a few examples of correct answers inside the prompt itself, steering its behavior without any training. It works within a single request.
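A few-shot prompt is just text assembled before the request is sent. The sketch below builds one; the `Input:` and `Output:` labels and the sentiment task are arbitrary choices for illustration.

```python
def few_shot_prompt(task, examples, query):
    """Place worked examples before the real query, in a consistent format."""
    lines = [task, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("Great service!", "positive"), ("Never again.", "negative")],
    "The food was cold.",
)
```

Ending the prompt with an empty `Output:` invites the model to continue the pattern set by the examples.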
Fine-tuning
Fine-tuning is the further training of a ready-made model on your own set of examples, so it handles a specific task or style better. It changes the model's weights, unlike prompting.
Hallucination
A hallucination is when a language model produces an answer that sounds credible but does not match the facts or the sources. It stems from how the model works, generating plausible text rather than looking facts up, not from a malfunction.
In-context learning
In-context learning is a model's ability to adapt to a task from the instructions and examples in the prompt alone, without updating its weights — the effect disappears once the conversation ends.
LLM-as-a-judge
LLM-as-a-judge uses a language model to score another model's answers against defined criteria. It is faster and cheaper than human evaluation, but carries its own errors and biases.
Model evaluation
Model evaluation is the systematic measurement of answer quality on a fixed set of cases and metrics. It lets you compare versions and catch regressions instead of judging by gut feel.
Overfitting
Overfitting is when a model memorizes its training data instead of learning general patterns, so it performs excellently on familiar data but poorly on new, previously unseen examples.
Prompt chaining
Prompt chaining breaks a task into a sequence of prompting steps, where the output of one prompt feeds the next. It lets you split a complex process into smaller stages that are easier to control.
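The control flow of a chain is simple enough to sketch. Below, `fake_model` is a stand-in that only wraps the prompt in brackets so the flow is visible; a real chain would call a language model at that point.

```python
def run_chain(steps, model, initial):
    """Feed each step's output into the next step's prompt template."""
    result = initial
    for template in steps:
        result = model(template.format(previous=result))
    return result

def fake_model(prompt):
    """Stand-in for a real model call: echoes the prompt in brackets."""
    return f"[{prompt}]"

steps = ["Summarize: {previous}", "Translate to French: {previous}"]
final = run_chain(steps, fake_model, "long report")
```

Because each stage is a separate call, its intermediate output can be logged, checked, or retried on its own.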
Prompt engineering
Prompt engineering is the practice of phrasing instructions for a language model so as to get accurate, repeatable responses. It covers the choice of instructions, examples, and output format.
System prompt
A system prompt is a fixed instruction set before a conversation that defines the model's role, rules, and boundaries. It holds for the entire session, regardless of the user's subsequent questions.
Temperature (generation parameter)
Temperature is a generation parameter that controls the randomness of a model's responses. A low value gives more predictable, repeatable results; a high one gives more varied and less predictable output.
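Under the usual formulation, temperature divides the model's raw scores (logits) before they are turned into probabilities. The sketch below shows the effect on three made-up logits.

```python
import math

def token_probabilities(logits, temperature=1.0):
    """Softmax over logits divided by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # illustrative scores for three candidate tokens
cold = token_probabilities(logits, temperature=0.5)
hot = token_probabilities(logits, temperature=2.0)
```

At the low temperature most of the probability goes to the top token; at the high temperature the distribution flattens, so sampling picks the other tokens more often.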
Zero-shot (no examples)
Zero-shot is a way of prompting in which you ask the model to perform a task without showing a single worked example — you rely solely on the instruction itself and the model's knowledge from training.
Safety & oversight
AI audit
An AI audit is a structured review of an AI system: its data, behavior, risks and compliance with the rules in place. It checks whether the model does what was claimed and produces evidence for oversight purposes.
AI explainability (XAI)
AI explainability is the ability to explain why a model produced a particular result. It serves to build trust, diagnose errors and demonstrate compliance with oversight requirements.
AI governance
AI governance is the set of rules, roles, and processes that determine how an organization deploys and controls AI systems. It covers accountability, permitted uses, risk assessment, and the requirement for oversight.
AI observability
AI observability is the continuous monitoring of how an AI system behaves in production: cost, response quality, errors, and latency. It lets you detect model degradation and react before users feel it.
AI red teaming
AI red teaming is deliberately adversarial testing of a system, meant to find its weak points, safeguard bypasses and harmful outputs before it reaches users.
Data privacy in AI
Data privacy in AI is the set of rules and measures protecting personal and confidential data at every stage of working with a model: in training, in queries, and in responses. It defines what may be passed to the model and how long it is retained.
EU AI Act
The EU AI Act is a European Union regulation that classifies AI systems by their level of risk and imposes obligations matched to that category. It applies to AI systems placed on the market or used in the European Union, including those from providers based outside it.
Guardrails
Guardrails are rules and filters that constrain what a model may accept as input and return as output. They block disallowed content, enforce the answer's format, and keep behavior within set limits.
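A minimal sketch of both sides follows: an input check that blocks a disallowed pattern and an output check that enforces a format. The blocked pattern (a US-style social security number) and the required `answer` field are assumptions chosen for illustration.

```python
import json
import re

# Assumed rule: refuse input that contains something shaped like a US SSN.
BLOCKED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def check_input(text):
    """Input guardrail: reject text that matches a blocked pattern."""
    if BLOCKED.search(text):
        raise ValueError("input contains disallowed personal data")
    return text

def check_output(raw):
    """Output guardrail: require a JSON object with a string 'answer'."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        raise ValueError("output does not match the required format")
    return data

safe_question = check_input("What is our refund policy?")
parsed = check_output('{"answer": "Refunds within 30 days."}')
```

Real guardrails often combine such rules with classifier models, but the placement is the same: one check before the model and one after it.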
Human-in-the-loop
Human-in-the-loop is a way of working in which a person approves, corrects, or blocks the model's action at chosen points before the result moves on. It pairs the speed of AI with oversight on higher-risk decisions.
Jailbreak (bypassing safeguards)
A jailbreak is a prompt crafted to get around a model's rules and safeguards and coax it into answers it would normally refuse. The attacker manipulates the instruction itself, not the data the model is processing.
Model bias
Model bias is a systematic skew in an AI's outputs, inherited from the training data or design assumptions. It leads to unfair or wrong outcomes for some groups or cases.
Prompt injection
Prompt injection is an attack in which hidden instructions in the input hijack a model's behavior and coax it into breaking its own rules. It is especially dangerous for agents that read content from external sources.
Shadow AI
Shadow AI is the use of AI tools in a company outside the knowledge and control of IT and security teams. It creates risks of data leakage, compliance breaches, and a lack of oversight over what reaches external models.