.glossary-page { max-width: 900px; margin: 0 auto; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif; }
.glossary-page h1 { color: #1a1a2e; border-bottom: 3px solid #16213e; padding-bottom: 10px; }
.glossary-page h2 { color: #16213e; margin-top: 30px; background: #f0f4ff; padding: 10px 15px; border-left: 4px solid #16213e; }
.glossary-page dl { margin: 0 0 20px 0; }
.glossary-page dt { font-weight: 700; color: #0f3460; margin-top: 14px; font-size: 1.05em; }
.glossary-page dd { margin: 4px 0 0 20px; color: #333; line-height: 1.6; }
#glossarySearch { width: 100%; padding: 12px 16px; font-size: 16px; border: 2px solid #16213e; border-radius: 8px; margin-bottom: 24px; box-sizing: border-box; }
AI Glossary: 100+ Terms Every Practitioner Should Know
Reviewed: June 4, 2026
Comprehensive reference for AI, ML, and data science terminology — organized by category. Last updated: May 2026.
Foundation Models
- Foundation Model
- A large-scale AI model trained on broad data that can be adapted to a wide range of downstream tasks. Examples: GPT-4, Claude, Gemini, LLaMA.
- Large Language Model (LLM)
- A neural network trained on vast text corpora to understand and generate human language using transformer architectures.
- Transformer
- The neural network architecture powering modern LLMs, introduced in "Attention Is All You Need" (2017). Uses self-attention to process sequences in parallel.
- Attention Mechanism
- A component that weighs the importance of different parts of the input when producing output, enabling the model to focus on relevant context.
- Self-Attention
- Attention where the query, key, and value all come from the same sequence, allowing each position to attend to all other positions.
- Multi-Head Attention
- Running multiple attention mechanisms in parallel, each learning different types of relationships in the data.
- Token
- The basic unit of text processed by an LLM — a word, subword, or character. English averages roughly 1.3 tokens per word.
- Context Window
- The maximum tokens an LLM can consider at once when generating a response. Ranges from 4K to 1M+ in modern models.
- Temperature
- A parameter controlling output randomness. Lower values (0.1-0.3) produce focused, near-deterministic responses; higher values (0.7-1.5) produce more creative, diverse outputs.
- Top-p (Nucleus Sampling)
- Sampling from the smallest set of tokens whose cumulative probability exceeds p, balancing diversity and quality.
- Top-k Sampling
- Sampling from only the k most likely next tokens, filtering out low-probability options.
- Prompt Engineering
- The practice of crafting input text to elicit desired LLM outputs. Includes few-shot examples, role specification, and structured formatting.
- Chain of Thought (CoT)
- A prompting technique that asks the model to reason step-by-step before giving a final answer, improving performance on complex reasoning tasks.
- Few-Shot Learning
- Teaching a model to perform a task by providing a small number of examples in the prompt, without updating model weights.
- In-Context Learning
- The ability of LLMs to learn patterns from examples provided within the prompt itself, without any weight updates.
- Fine-tuning
- Further training a pre-trained model on a specific dataset to adapt it for particular tasks or domains.
- Pre-training
- The initial phase where a model learns general patterns from a large, diverse dataset before being fine-tuned.
- RLHF (Reinforcement Learning from Human Feedback)
- Training approach that uses human preference data to align model outputs with human values and preferences.
- DPO (Direct Preference Optimization)
- An alignment technique that directly optimizes a model on preference data without training a separate reward model.
- LoRA (Low-Rank Adaptation)
- A parameter-efficient fine-tuning method that adds small trainable matrices to frozen pre-trained weights.
- QLoRA
- Quantized LoRA — combines 4-bit quantization with LoRA for efficient fine-tuning on consumer hardware.
- Mixture of Experts (MoE)
- An architecture where different parts of the network (experts) handle different types of inputs, enabling large models with lower compute costs.
- Emergent Ability
- Capabilities that unpredictably arise in large models that were absent in smaller versions, such as multi-step reasoning.
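The Temperature, Top-k Sampling, and Top-p entries above describe three ways of reshaping a next-token distribution. The following is a minimal sketch, not any particular library's implementation: the function names and the four example logits are made up for illustration, and the top-p cutoff here stops as soon as the cumulative probability reaches p.

```javascript
// Illustrative sketch only: all names and numbers below are hypothetical.

// Convert raw logits to probabilities after dividing by the temperature.
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Top-k: keep only the indices of the k most likely tokens.
function topKIndices(probs, k) {
  return probs
    .map((p, i) => [p, i])
    .sort((a, b) => b[0] - a[0])
    .slice(0, k)
    .map(([, i]) => i);
}

// Top-p: keep the smallest set of tokens whose cumulative probability reaches p.
function topPIndices(probs, p) {
  const sorted = probs.map((q, i) => [q, i]).sort((a, b) => b[0] - a[0]);
  const kept = [];
  let cumulative = 0;
  for (const [q, i] of sorted) {
    kept.push(i);
    cumulative += q;
    if (cumulative >= p) break;
  }
  return kept;
}

const logits = [2.0, 1.0, 0.5, -1.0]; // made-up scores for four tokens
const sharp = softmaxWithTemperature(logits, 0.2); // low temperature
const flat = softmaxWithTemperature(logits, 1.5); // high temperature
console.log(sharp.map((x) => x.toFixed(3)), flat.map((x) => x.toFixed(3)));
console.log(topKIndices(flat, 2), topPIndices(flat, 0.8));
```

In this sketch the low-temperature distribution concentrates almost all probability on the top token, while the high-temperature one spreads it out, which is why top-p keeps more candidates there.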
Natural Language Processing
- NLP
- Natural Language Processing — the field focused on enabling computers to understand, interpret, and generate human language.
- Named Entity Recognition (NER)
- Identifying and classifying entities (people, organizations, locations, dates) in unstructured text.
- Sentiment Analysis
- Determining the emotional tone expressed in text: positive, negative, or neutral.
- Text Classification
- Categorizing text into predefined classes based on content, such as spam detection or topic labeling.
- Machine Translation
- Automatically translating text from one language to another using AI models.
- Text Summarization
- Generating a concise summary of a longer text while preserving key information and meaning.
- Question Answering
- Systems that automatically answer questions posed in natural language from knowledge bases or open-domain sources.
- RAG (Retrieval-Augmented Generation)
- Enhancing LLM responses by retrieving relevant information from external knowledge sources before generating an answer.
- Word Embedding
- Dense vector representations of words that capture semantic relationships, where similar words have similar vectors.
- Word2Vec
- An early word embedding algorithm that learns vector representations from word context patterns.
- BERT
- Bidirectional Encoder Representations from Transformers — reads text in both directions for deeper context understanding.
- GPT
- Generative Pre-trained Transformer — an autoregressive model that predicts the next token in a sequence.
- Seq2Seq
- Sequence-to-sequence architecture that maps an input sequence to an output sequence, used in translation and summarization.
- BLEU Score
- A metric for evaluating machine translation quality by comparing output to reference translations.
- ROUGE Score
- A metric for evaluating text summarization by measuring overlap between generated and reference summaries.
- Perplexity
- A measure of how well a language model predicts text. Lower perplexity indicates better prediction quality.
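As a rough illustration of the Perplexity entry, the sketch below computes perplexity as the exponential of the average negative log-probability a model assigned to each token. The function name and the probability values are invented for the example; a real evaluation would take these probabilities from a model.

```javascript
// Hypothetical helper: perplexity from per-token probabilities.
function perplexity(tokenProbs) {
  const avgNegLogProb =
    -tokenProbs.reduce((sum, p) => sum + Math.log(p), 0) / tokenProbs.length;
  return Math.exp(avgNegLogProb);
}

// Made-up probabilities a model might assign to the correct next tokens.
const confident = perplexity([0.9, 0.8, 0.95]); // model predicts the text well
const unsure = perplexity([0.2, 0.1, 0.25]); // model predicts the text poorly
console.log(confident.toFixed(2), unsure.toFixed(2));
```

A model that assigns probability 0.5 to every token gets a perplexity of 2, which is one way to read the number: roughly how many equally likely choices the model is hesitating between at each step.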
Computer Vision
- Computer Vision
- The field of AI focused on enabling computers to interpret and understand visual information from images and video.
- CNN
- Convolutional Neural Network — uses convolutional filters to detect visual features at different scales and locations.
- Object Detection
- Identifying and localizing objects within an image by drawing bounding boxes and classifying each one.
- Image Segmentation
- Partitioning an image into meaningful segments or regions at the semantic or instance level.
- Semantic Segmentation
- Classifying each pixel in an image by category without distinguishing between individual object instances.
- Instance Segmentation
- Identifying and segmenting each individual object instance in an image at the pixel level.
- OCR
- Optical Character Recognition — converting images of text into machine-readable character data.
- GAN
- Generative Adversarial Network — a generator creates samples while a discriminator evaluates their realism.
- Diffusion Model
- A generative model that learns to reverse a gradual noise-adding process to create realistic data from random noise.
- Vision Transformer (ViT)
- Applying transformer architecture to image patches for image classification and other vision tasks.
- CLIP
- Contrastive Language-Image Pre-training — OpenAI’s model connecting vision and language for zero-shot image understanding.
- Stable Diffusion
- A latent diffusion model for text-to-image generation that produces high-quality images from text descriptions.
- Image Classification
- Assigning a category label to an entire image based on its visual content.
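To make the CNN entry's "convolutional filters" concrete, here is a small sketch of sliding one filter over a tiny grayscale image (no padding, stride 1). The function name, the image, and the hand-written edge filter are all assumptions for illustration; in a trained CNN the filter values are learned rather than chosen by hand.

```javascript
// Hypothetical helper: slide a kernel over an image and sum the products.
function conv2d(image, kernel) {
  const kh = kernel.length;
  const kw = kernel[0].length;
  const outH = image.length - kh + 1;
  const outW = image[0].length - kw + 1;
  const out = [];
  for (let y = 0; y < outH; y++) {
    const row = [];
    for (let x = 0; x < outW; x++) {
      let sum = 0;
      for (let i = 0; i < kh; i++) {
        for (let j = 0; j < kw; j++) {
          sum += image[y + i][x + j] * kernel[i][j];
        }
      }
      row.push(sum);
    }
    out.push(row);
  }
  return out;
}

// Made-up 3x4 image: dark on the left, bright on the right.
const image = [
  [0, 0, 1, 1],
  [0, 0, 1, 1],
  [0, 0, 1, 1],
];
// Hand-written filter that responds to dark-to-bright vertical edges.
const verticalEdge = [
  [-1, 1],
  [-1, 1],
];
const featureMap = conv2d(image, verticalEdge);
console.log(featureMap);
```

The resulting feature map is non-zero only where the filter straddles the boundary between the dark and bright columns, which is the sense in which a filter "detects" a feature at a location.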
Reinforcement Learning
- Reinforcement Learning (RL)
- Learning through trial and error by maximizing cumulative rewards from environment interactions.
- Agent (RL)
- The decision-making entity that interacts with an environment by taking actions and receiving rewards.
- Environment (RL)
- The external system with which an RL agent interacts, providing states and rewards in response to actions.
- State
- A representation of the current situation of the environment that the agent uses to make decisions.
- Action
- A decision or move made by the agent that affects the environment and transitions to a new state.
- Reward
- A scalar feedback signal indicating how desirable an agent’s action was in a given state.
- Policy
- The strategy or function that maps states to actions, defining the agent’s behavior.
- Q-Learning
- A model-free RL algorithm that learns the value of actions in each state (Q-values).
- Deep Q-Network (DQN)
- Combining Q-learning with deep neural networks to handle high-dimensional state spaces.
- Policy Gradient
- Directly optimizing the policy by following the gradient of expected reward with respect to policy parameters.
- Actor-Critic
- An RL architecture combining a policy (actor) with a value function (critic) for more stable learning.
- PPO (Proximal Policy Optimization)
- A popular policy gradient method that constrains updates to prevent destructive large policy changes.
- Reward Modeling
- Training a model to predict human preferences, used as a reward signal for RLHF.
- Reward Hacking
- When an agent exploits loopholes in the reward function to achieve high scores without truly solving the task.
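The Q-Learning entry can be illustrated with a single tabular update step. This is a sketch under simple assumptions: a two-state, two-action table, a hypothetical `qUpdate` helper, and made-up values for the learning rate `alpha` and discount factor `gamma`.

```javascript
// Hypothetical helper: one tabular Q-learning update.
// q[state][action] holds the current estimate of that action's value.
function qUpdate(q, state, action, reward, nextState, alpha, gamma) {
  const bestNext = Math.max(...q[nextState]); // value of the best next action
  const target = reward + gamma * bestNext; // bootstrapped target
  q[state][action] += alpha * (target - q[state][action]);
  return q[state][action];
}

// Two states, two actions, all estimates start at zero.
const q = [
  [0, 0],
  [0, 0],
];
// The agent took action 1 in state 0, received reward 1, and landed in state 1.
const afterFirst = qUpdate(q, 0, 1, 1.0, 1, 0.5, 0.9);
const afterSecond = qUpdate(q, 0, 1, 1.0, 1, 0.5, 0.9);
console.log(afterFirst, afterSecond);
```

Each update moves the estimate part of the way toward the target, so repeating the same experience pulls the Q-value closer to the reward without jumping there in one step.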
MLOps
- MLOps
- Machine Learning Operations — practices for deploying and maintaining ML models in production reliably and efficiently.
- Model Registry
- A centralized repository for versioning, storing, and managing ML models throughout their lifecycle.
- Feature Store
- A centralized repository for storing and serving ML features consistently across training and inference.
- Model Drift
- When a model’s performance degrades over time as the real-world data distribution changes.
- Data Drift
- Changes in the statistical properties of input data over time that can degrade model performance.
- Concept Drift
- When the relationship between input features and the target variable changes over time.
- A/B Testing
- Comparing two versions of a model or feature by splitting traffic and measuring performance differences.
- Shadow Deployment
- Running a new model in parallel with the production model, comparing outputs without affecting users.
- Canary Deployment
- Gradually rolling out a new model to a small percentage of traffic before full deployment.
- Blue-Green Deployment
- Maintaining two identical production environments and switching traffic between them for zero-downtime updates.
- Model Serving
- The infrastructure and processes for making trained models available for inference requests.
- Model Monitoring
- Continuously tracking model performance, data quality, and system health in production.
- CI/CD for ML
- Continuous Integration and Continuous Deployment pipelines adapted for machine learning workflows.
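One common way to implement the traffic split behind a Canary Deployment is to hash each user into a stable bucket and send only the lowest buckets to the new model. The sketch below assumes that approach; `bucketFor`, `routeModel`, the hash itself, and the user IDs are all hypothetical, and production systems typically rely on their load balancer or serving platform for this.

```javascript
// Hypothetical helper: map a user ID to a stable bucket in [0, 100).
function bucketFor(userId) {
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return hash % 100;
}

// Send users whose bucket falls below canaryPercent to the new model.
function routeModel(userId, canaryPercent) {
  return bucketFor(userId) < canaryPercent ? "canary" : "stable";
}

const users = ["alice", "bob", "carol", "dave"]; // made-up user IDs
for (const user of users) {
  console.log(user, bucketFor(user), routeModel(user, 10));
}
```

Because the bucket depends only on the user ID, a given user keeps seeing the same model as the canary percentage is raised, which keeps comparisons between the two models cleaner.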
AI Safety
- AI Alignment
- Ensuring AI systems pursue goals that are beneficial to humans and aligned with human values and intentions.
- AI Safety
- The field focused on preventing harmful outcomes from AI systems, from immediate risks to catastrophic scenarios.
- Red Teaming
- Adversarial testing to identify vulnerabilities, harmful outputs, or failure modes in AI systems.
- Jailbreak
- Circumventing an AI system’s safety guardrails through carefully crafted prompts or inputs.
- Prompt Injection
- Manipulating an AI system by embedding malicious instructions in input data or documents.
- Hallucination
- When an AI generates plausible-sounding but factually incorrect or fabricated information.
- Bias (Algorithmic)
- Systematic errors that produce unfair outcomes for certain groups in AI systems.
- Fairness
- Ensuring AI systems treat different demographic groups equitably and without discrimination.
- Explainability (XAI)
- Making AI decision-making processes understandable and interpretable to humans.
- Interpretability
- The degree to which a human can understand the cause of a model’s decision without needing post-hoc explanations.
- Robustness
- An AI system’s ability to maintain performance under adversarial conditions, distribution shifts, or edge cases.
Agent Systems
- AI Agent
- An autonomous system that perceives its environment, makes decisions, and takes actions to achieve goals.
- Autonomous Agent
- An AI agent that operates independently, making and executing decisions without continuous human oversight.
- Multi-Agent System
- A system of multiple AI agents working together, communicating, and coordinating to solve complex tasks.
- Agent Orchestration
- The coordination of multiple AI agents, managing their interactions, task allocation, and workflow.
- Tool Use
- An AI agent’s ability to call external functions, APIs, or tools to extend its capabilities beyond text generation.
- Function Calling
- A structured format for LLMs to invoke external functions with specific parameters, enabling tool use.
- MCP (Model Context Protocol)
- An open standard for connecting AI systems to external tools, data sources, and services.
- A2A (Agent-to-Agent)
- Protocols and standards enabling AI agents to communicate, delegate, and coordinate with each other.
- ReAct
- Reasoning + Acting — an agent framework that interleaves thought steps with action execution.
- Planning (Agent)
- An agent’s ability to decompose complex goals into sequences of actionable steps.
- Short-Term Memory (Agent)
- An agent’s working memory for the current conversation or task context.
- Long-Term Memory (Agent)
- Persistent storage an agent uses across sessions, typically implemented via vector databases.
- Embedding
- A dense vector representation of data (text, images) that captures semantic meaning for similarity search.
- Vector Database
- A database optimized for storing and querying high-dimensional vectors for similarity search.
- Semantic Search
- Search that understands the meaning of queries rather than just matching keywords.
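The Embedding, Vector Database, and Semantic Search entries fit together as follows: documents and queries become vectors, and search ranks documents by vector similarity. The sketch below uses cosine similarity over hand-written three-dimensional vectors. The function names, document IDs, and vector values are invented for illustration; real embeddings come from a model and have hundreds or thousands of dimensions, and a vector database would replace the linear scan with an index.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: rank documents by similarity to the query vector.
function semanticSearch(queryVector, docs, topN) {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

// Made-up embeddings standing in for model output.
const docs = [
  { id: "cat care", vector: [0.9, 0.1, 0.0] },
  { id: "dog training", vector: [0.7, 0.3, 0.1] },
  { id: "gpu pricing", vector: [0.0, 0.2, 0.9] },
];
const queryVector = [1.0, 0.1, 0.0]; // pretend embedding of a pet-related query
const results = semanticSearch(queryVector, docs, 2);
console.log(results);
```

Nothing in the ranking looks at the words in the documents; the match comes entirely from the vectors pointing in similar directions.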
AI Hardware
- GPU
- Graphics Processing Unit — parallel processors originally designed for graphics, now essential for AI training and inference.
- TPU
- Tensor Processing Unit — Google’s custom chip optimized specifically for machine learning workloads.
- NPU
- Neural Processing Unit — specialized processors for neural network inference in consumer devices and edge hardware.
- CUDA
- NVIDIA’s parallel computing platform and programming model for GPU-accelerated computing.
- Tensor Core
- Specialized processing units in NVIDIA GPUs optimized for matrix multiplication operations used in deep learning.
- VRAM
- Video RAM — high-bandwidth memory on GPUs used to store model weights, activations, and training data.
- Memory Bandwidth
- The rate at which data can be read from or written to memory, a critical bottleneck in AI training.
- NVLink
- NVIDIA’s high-speed interconnect for fast GPU-to-GPU communication in multi-GPU systems.
- InfiniBand
- A high-performance networking standard used in AI clusters for low-latency communication between nodes.
- Quantization
- Reducing model precision (e.g., from 32-bit to 4-bit) to decrease model size and increase inference speed.
- GGUF
- GPT-Generated Unified Format — a file format for quantized LLM weights optimized for llama.cpp inference.
- GGML
- Tensor library for machine learning that powers llama.cpp and enables CPU-based LLM inference.
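The Quantization entry can be sketched with the simplest scheme, symmetric rounding to signed integers with one shared scale. This is an illustrative assumption, not how GGUF or any specific format stores weights; the helper names and weight values are made up, and practical schemes usually quantize in small blocks with a scale per block.

```javascript
// Hypothetical helper: symmetric quantization to signed integers of `bits` width.
function quantize(weights, bits) {
  const qmax = 2 ** (bits - 1) - 1; // 7 for 4-bit, 127 for 8-bit
  const maxAbs = Math.max(...weights.map((w) => Math.abs(w)));
  const scale = maxAbs === 0 ? 1 : maxAbs / qmax; // avoid dividing by zero
  const q = weights.map((w) => Math.round(w / scale));
  return { q, scale };
}

// Recover approximate floating-point weights from the integers.
function dequantize(quantized) {
  return quantized.q.map((v) => v * quantized.scale);
}

const weights = [0.7, -0.3, 0.12, 0.0]; // made-up weights
const packed = quantize(weights, 4);
const restored = dequantize(packed);
console.log(packed.q, packed.scale, restored);
```

Each weight is now one of a handful of integer levels plus a shared scale, so storage shrinks, at the cost of a rounding error of at most half a scale step per weight.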
function filterGlossary() {
  var input = document.getElementById('glossarySearch').value.toLowerCase().trim();
  var categories = document.querySelectorAll('.category');

  // Empty query: show every category, term, and definition again.
  if (!input) {
    categories.forEach(function(cat) { cat.style.display = ''; });
    document.querySelectorAll('.category dt, .category dd').forEach(function(el) { el.style.display = ''; });
    return;
  }

  categories.forEach(function(cat) {
    var items = cat.querySelectorAll('dt');
    var hasVisible = false;
    items.forEach(function(dt) {
      var term = dt.textContent.toLowerCase();
      var dd = dt.nextElementSibling;
      var def = dd && dd.tagName === 'DD' ? dd.textContent.toLowerCase() : '';
      // Keep the pair if the query matches the term or its definition.
      if (term.indexOf(input) !== -1 || def.indexOf(input) !== -1) {
        dt.style.display = '';
        if (dd) dd.style.display = '';
        hasVisible = true;
      } else {
        dt.style.display = 'none';
        if (dd) dd.style.display = 'none';
      }
    });
    // Hide the whole category when none of its terms match.
    cat.style.display = hasVisible ? '' : 'none';
  });
}
