AI/ML Glossary: 60+ Essential Terms Explained

Reviewed: June 4, 2026

Last updated: May 2026 — A comprehensive reference guide to the most important terms in artificial intelligence, machine learning, and large language models. Whether you’re a developer, product manager, or business leader, this glossary will help you speak the language of AI.

Quick Navigation: Foundation | LLM | Agents | Training | Infrastructure | Safety

Foundation Terms

Term | Definition | Example
Artificial Intelligence (AI) | The broad field of creating systems that can perform tasks typically requiring human intelligence, such as reasoning, perception, language, and decision-making. | Chatbots, self-driving cars, recommendation engines
Machine Learning (ML) | A subset of AI where systems learn patterns from data rather than following explicit rules. Performance typically improves with more (and better) data. | Spam filters, fraud detection, image recognition
Deep Learning | ML using neural networks with many layers ("deep" networks). Powers modern AI breakthroughs in vision, language, and audio. | GPT-4, Stable Diffusion, Whisper
Neural Network | A computational model loosely inspired by the brain. Consists of interconnected nodes (neurons) organized in layers that process information. | The foundational architecture behind most modern AI systems
Transformer | The neural network architecture (introduced in 2017) that powers nearly all modern LLMs. Uses "self-attention" to relate tokens to one another across long sequences. | GPT, Claude, Gemini, Llama: all transformer-based
Attention Mechanism | The core innovation of transformers. Allows the model to weight the relevant parts of the input when generating each output token. | When translating "the cat sat on the mat," the model attends to "cat" when generating the subject
Parameters | The learned weights in a neural network. More parameters generally mean more capacity to learn patterns (but also more compute needed). | GPT-4: reportedly ~1.8T params (not confirmed by OpenAI); Llama 4 Scout: 17B active (109B total, MoE)
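
To make the Attention Mechanism entry more concrete, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. The toy vectors and the function names (`softmax`, `attention`) are illustrative assumptions, not taken from any particular library; real transformers run this over batches of learned projections on a GPU.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector (toy sketch)."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    # The output is the attention-weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical 2-dimensional keys and values for three input tokens.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([1.0, 0.0], keys, values)
print(weights, output)
```

The query points in the same direction as the first key, so the first token receives more weight than the second, and the output leans toward the first value vector.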

Large Language Model (LLM) Terms

Term | Definition | Example
Large Language Model (LLM) | A transformer model trained on vast text data to understand and generate human language. "Large" typically means billions of parameters. | GPT-4.1, Claude 3.7, Llama 4, Gemini 2.5, Mistral Large 3
Token | The basic unit of text an LLM processes. In English, one token is roughly 4 characters, or about ¾ of a word, on average. | "Hello, world!" = 4 tokens with common GPT tokenizers (counts vary by tokenizer)
Context Window | The maximum number of tokens an LLM can process in a single request (typically input + output combined). | Claude 3.7: 200K tokens (~150K words); GPT-4.1: 1M tokens
Prompt | The input text that tells the LLM what to do. Can include instructions, examples, and context. | "Summarize this article in 3 bullet points"
Prompt Engineering | The craft of designing effective prompts to get the best outputs from an LLM. Includes techniques like few-shot, chain-of-thought, and role prompting. | Adding "Let’s think step by step" to improve reasoning
Temperature | A sampling parameter (commonly 0-2, depending on the provider) controlling randomness. Lower = more deterministic/focused; higher = more creative/random. | 0.0 for code/legal; 0.7 for chat; 1.2+ for creative writing
Hallucination | When an LLM generates plausible-sounding but factually incorrect information. One of the most significant reliability challenges. | LLM confidently states a false date or invents a fake source
RAG (Retrieval-Augmented Generation) | An architecture that augments an LLM with a retrieval system (usually vector search) so it can ground answers in retrieved, up-to-date documents. | Enterprise Q&A over internal documentation
Embedding | A numerical vector representation of text that captures semantic meaning. Similar texts have similar vectors. | "king" – "man" + "woman" ≈ "queen" (classic word2vec example)
Vector Database | A database optimized for storing and searching vector embeddings. Powers semantic search and RAG systems. | Pinecone, Weaviate, Milvus, Qdrant, pgvector
Fine-tuning | Further training a pre-trained base model on a specialized dataset to improve performance on specific tasks or domains. | Training Llama on medical papers for a healthcare chatbot
LoRA (Low-Rank Adaptation) | An efficient fine-tuning method that freezes the base weights and trains only small low-rank "adapter" matrices instead of all model parameters. Trains far fewer parameters (often well under 1% of the total), which substantially reduces memory and cost versus full fine-tuning. | Adapting a 7B model by training a few million adapter parameters instead of all 7 billion
QLoRA | Quantized LoRA: combines a 4-bit quantized base model with LoRA adapters for even more memory-efficient fine-tuning. | Fine-tuning a 7B model on a consumer GPU
GGUF | A file format for (typically quantized) models, designed for local CPU/GPU inference via llama.cpp. | llama-4-scout-17b-16e-instruct-Q4_K_M.gguf
Quantization | Reducing model weight precision (e.g., 16-bit → 4-bit) to reduce memory use and speed up inference, usually with modest quality loss. | Q4_K_M: a roughly 4-bit scheme that retains most of full-precision quality
MoE (Mixture of Experts) | Architecture where different "expert" sub-networks handle different inputs. Only a subset activates per token, reducing compute per token. | Llama 4 Scout: 16 routed experts; each token uses one routed expert plus a shared expert
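
The Temperature entry can be illustrated with a short sketch of how temperature rescales a model's raw scores (logits) before they become sampling probabilities. The logits below are made-up numbers and `softmax_with_temperature` is an illustrative name; hosted APIs usually treat a temperature of 0 as greedy decoding, which this sketch does not model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Assumption for this sketch: reject 0 instead of emulating greedy decoding.
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.2)  # sharply peaked on the top token
warm = softmax_with_temperature(logits, 1.0)  # unmodified softmax
hot = softmax_with_temperature(logits, 2.0)   # flatter, more random sampling
print(cold, warm, hot)
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures spread it more evenly across the alternatives.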

Agent Terms

Term | Definition | Example
AI Agent | An LLM-based system that can take actions (call tools, write code, make API calls) to accomplish goals autonomously. | A coding agent that reads a GitHub issue, writes code, and opens a PR
Tool Use / Function Calling | The ability of an LLM to invoke external functions (APIs, databases, code execution) as part of its reasoning process. | Agent calls weather API to answer "What’s the weather in Zurich?"
Chain-of-Thought (CoT) | Prompting technique that asks the LLM to reason step-by-step before giving the final answer. Often substantially improves performance on complex reasoning tasks. | "Let’s solve this math problem step by step…"
ReAct (Reasoning + Acting) | An agent pattern that alternates between reasoning (thinking) and acting (using tools) in a loop until the task is complete. | Think → Search → Think → Read → Think → Answer
Multi-Agent System | Multiple AI agents working together, each with different roles, expertise, or perspectives, coordinated by an orchestration layer. | Research agent + Writing agent + Review agent collaborating on a report
MCP (Model Context Protocol) | An open standard (introduced by Anthropic in 2024) for connecting LLMs to external tools and data sources. Aims to replace ad-hoc integrations. | Connecting Claude to your database, file system, or APIs via MCP servers
A2A (Agent-to-Agent) | A protocol introduced by Google for agents to communicate and collaborate across different platforms and frameworks. | A Google agent delegating a subtask to an Anthropic agent
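
The ReAct and Tool Use entries describe a loop that can be sketched in a few lines. Everything here is a stand-in: `fake_model` is a hypothetical scripted function in place of a real LLM call, and `get_weather` returns canned data instead of calling a weather API. The sketch only shows the control flow (think, act, observe, repeat).

```python
def fake_model(transcript):
    """Hypothetical stand-in for an LLM: decides the next step from the transcript."""
    observations = [line for line in transcript if line.startswith("Observation: ")]
    if not observations:
        # No tool result yet, so ask for one.
        return {"thought": "I need the current weather.",
                "action": "get_weather", "input": "Zurich"}
    # A tool result is available, so answer with it.
    return {"thought": "I have what I need.",
            "answer": observations[-1][len("Observation: "):]}

def get_weather(city):
    """Hypothetical tool returning canned data instead of calling a real API."""
    return f"cloudy, 18 degrees Celsius in {city}"

TOOLS = {"get_weather": get_weather}

def react_loop(question, model, tools, max_steps=5):
    """Alternate between reasoning and tool calls until the model answers."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(f"Thought: {step['thought']}")
        if "answer" in step:
            return step["answer"], transcript
        observation = tools[step["action"]](step["input"])
        transcript.append(f"Action: {step['action']}({step['input']})")
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("no answer within max_steps")

answer, transcript = react_loop("What's the weather in Zurich?", fake_model, TOOLS)
print("\n".join(transcript))
print("Answer:", answer)
```

The `max_steps` cap is a design choice in this sketch: without some limit, a model that never produces an answer would keep the loop running indefinitely.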

Training & Evaluation Terms

Term | Definition | Example
Pre-training | The initial training phase where a model learns general language patterns from massive text corpora (books, web, code). | GPT-4 was reportedly pre-trained on ~13T tokens (not confirmed by OpenAI)
RLHF (RL from Human Feedback) | Training technique where humans rank model outputs, a reward model is trained on those rankings, and the LLM is then optimized with reinforcement learning against that reward model to align it with human preferences. | ChatGPT’s helpfulness and safety alignment
DPO (Direct Preference Optimization) | A simpler alternative to RLHF that directly optimizes the model on preference data without a separate reward model. | Used to fine-tune Llama 3 and Mistral models
Benchmark | A standardized test to evaluate model performance on specific tasks (reasoning, coding, math, etc.). | MMLU (knowledge), HumanEval (coding), GSM8K (math)
Scaling Laws | Empirical relationships showing that model performance improves predictably with more compute, data, and parameters. | Chinchilla-optimal: roughly 20 training tokens per parameter
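
The Chinchilla rule of thumb in the Scaling Laws entry is easy to turn into arithmetic. The sketch below applies the roughly-20-tokens-per-parameter ratio from the table; the compute estimate uses the common approximation of about 6 FLOPs per parameter per training token, which is an assumption added here and not part of the glossary. Both function names are illustrative.

```python
TOKENS_PER_PARAM = 20  # rule-of-thumb ratio from the Scaling Laws entry

def chinchilla_tokens(n_params, tokens_per_param=TOKENS_PER_PARAM):
    """Approximate compute-optimal number of training tokens."""
    return n_params * tokens_per_param

def approx_training_flops(n_params, n_tokens):
    """Rough training compute, assuming ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

n_params = 70_000_000_000  # a 70B-parameter model
n_tokens = chinchilla_tokens(n_params)
flops = approx_training_flops(n_params, n_tokens)
print(f"{n_tokens:.2e} tokens, {flops:.2e} FLOPs")
```

For a 70B-parameter model this gives about 1.4 trillion training tokens. Treat both numbers as order-of-magnitude guides, since the optimal ratio depends on the data, the architecture, and how much inference the model will later serve.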

Infrastructure Terms

Term | Definition | Example
Inference | The process of running a trained model to generate outputs (as opposed to training). | Sending a prompt to GPT-4 and receiving a response
vLLM | A high-performance LLM serving engine that optimizes inference throughput using PagedAttention. | Serving Llama 4 at up to an order of magnitude higher throughput than naive inference, depending on the workload
KV Cache | A cache of key-value attention states from previous tokens, avoiding redundant computation during generation. | Each new token reuses the cached keys and values of the prefix instead of recomputing them; the savings grow with sequence length
Batch Inference | Processing multiple requests simultaneously to maximize GPU utilization and throughput. | Processing 100 classification requests in one GPU pass
GPU VRAM | Video RAM on a graphics card. Determines the maximum model size that can fit on a single GPU. | NVIDIA A100: 40GB or 80GB; RTX 5090: 32GB; Apple M2 Ultra: up to 192GB unified memory (shared with the CPU)
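
As a back-of-the-envelope companion to the GPU VRAM entry, the sketch below estimates how much memory a model's weights need at a given precision. It counts weights only and ignores the KV cache, activations, and runtime overhead, so real requirements are higher; practical 4-bit formats also tend to use somewhat more than 4 bits per weight. The function names are illustrative.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

def weights_fit(n_params, bits_per_weight, vram_gb):
    """True if the weights alone fit in the given VRAM (no headroom assumed)."""
    return weight_memory_gb(n_params, bits_per_weight) <= vram_gb

n_params = 70_000_000_000  # a 70B-parameter model
fp16_gb = weight_memory_gb(n_params, 16)  # 16-bit weights
int4_gb = weight_memory_gb(n_params, 4)   # idealized 4-bit weights
print(fp16_gb, int4_gb)
print(weights_fit(n_params, 4, 80), weights_fit(n_params, 4, 32))
```

At 16 bits a 70B model needs about 140 GB for weights alone, which is more than a single 80GB card holds. At an idealized 4 bits it needs about 35 GB, which fits in 80GB but not in 32GB.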

Safety & Alignment Terms

Term | Definition | Example
Alignment | The process of ensuring AI systems behave in accordance with human values, intentions, and safety requirements. | Training Claude to refuse harmful requests
Jailbreak | A technique to bypass an AI model’s safety guardrails through carefully crafted prompts. | "DAN" (Do Anything Now) prompts that bypass content filters
Guardrails | Safety mechanisms that constrain AI behavior: input filtering, output validation, content policies. | Refusing to generate PII, hate speech, or dangerous instructions
Red Teaming | Adversarial testing where experts try to find vulnerabilities, biases, or harmful outputs in an AI system. | Hiring security researchers to probe a new LLM before release
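
One small piece of the Guardrails entry, output validation, might look like the sketch below: a filter that redacts email addresses from model output before it is shown to a user. The regular expression is a deliberately simple assumption that will miss some address formats, and the names `redact_emails` and `output_guardrail` are illustrative. Production guardrails typically combine several checks and often use dedicated classifiers.

```python
import re

# Deliberately simple pattern; real PII detection needs broader coverage.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def redact_emails(text):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub("[REDACTED EMAIL]", text)

def output_guardrail(model_output):
    """Return the filtered text and whether the guardrail changed anything."""
    redacted = redact_emails(model_output)
    return {"text": redacted, "modified": redacted != model_output}

result = output_guardrail("Contact jane.doe@example.com for access.")
print(result)
```

Returning a `modified` flag alongside the text lets the calling application log or review cases where the guardrail intervened.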

About this glossary: This reference is maintained as part of the data-gate.ch knowledge base. Definitions reflect the state of the field as of May 2026. AI evolves rapidly — terms and meanings may shift.
