AI/ML Glossary — 100 Key Terms Explained

A comprehensive reference of essential artificial intelligence and machine learning terms, organized alphabetically with clear definitions and practical examples.

A

Activation Function
A mathematical function applied to a neuron’s output in a neural network (e.g., ReLU, sigmoid, tanh). Determines whether and how strongly a neuron should fire. ReLU (max(0, x)) is the most common in modern networks.
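As a minimal sketch, the three functions named above can be written in plain Python. The function names and sample inputs are illustrative choices, not taken from any particular library.

```python
import math

def relu(x: float) -> float:
    # Passes positive inputs through unchanged and clips negatives to zero.
    return max(0.0, x)

def sigmoid(x: float) -> float:
    # Squashes any real input into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x: float) -> float:
    # Squashes any real input into the open interval (-1, 1).
    return math.tanh(x)

if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        print(x, relu(x), round(sigmoid(x), 3), round(tanh(x), 3))
```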
Agent (AI Agent)
An autonomous system that perceives its environment, makes decisions, and takes actions to achieve goals. AI agents use LLMs for reasoning and can call tools, access memory, and interact with other agents.
AGI (Artificial General Intelligence)
A hypothetical AI system that matches or exceeds human cognitive abilities across all domains. Contrasted with narrow AI, which excels at specific tasks. Not yet achieved as of 2026.
Attention Mechanism
A technique that allows models to focus on relevant parts of the input when producing output. The foundation of the transformer architecture. Self-attention computes relationships between all positions in a sequence.
Augmented Generation
See RAG (Retrieval-Augmented Generation).

B

Backpropagation
The algorithm used to compute gradients when training neural networks. It applies the chain rule to obtain the gradient of the loss function with respect to each weight; an optimizer such as gradient descent then uses those gradients to update the weights and reduce the loss.
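To make the chain rule concrete, here is a small sketch for a single sigmoid neuron with a squared-error loss. The network shape, the loss, and all variable names are assumptions chosen for illustration. The analytic gradient is checked against a finite-difference estimate.

```python
import math

def forward(w: float, b: float, x: float) -> float:
    # One neuron: linear step followed by a sigmoid activation.
    z = w * x + b
    return 1.0 / (1.0 + math.exp(-z))

def loss(w: float, b: float, x: float, y: float) -> float:
    # Squared error between the neuron's output and the target y.
    return 0.5 * (forward(w, b, x) - y) ** 2

def gradients(w: float, b: float, x: float, y: float):
    # Chain rule: dL/dw = dL/da * da/dz * dz/dw, and likewise for b.
    a = forward(w, b, x)
    dL_da = a - y
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz    # (dL/dw, dL/db)

w, b, x, y = 0.5, -0.1, 2.0, 1.0
dw, db = gradients(w, b, x, y)

# Numerical check with a central finite difference.
eps = 1e-6
numeric_dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
numeric_db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
```

In a multi-layer network the same rule is applied layer by layer, from the output back to the input.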
Batch Normalization
A technique that normalizes the inputs to each layer across a mini-batch, stabilizing and accelerating training. It was originally motivated as a way to reduce internal covariate shift.
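A rough sketch of the normalization step for one feature across a mini-batch follows. The function name and the `gamma`, `beta`, and `eps` defaults are assumptions. Real implementations also track running statistics for use at inference time, which is omitted here.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize one feature across the batch to zero mean and unit variance,
    # then apply the learnable scale (gamma) and shift (beta).
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```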
Beam Search
A decoding algorithm for sequence generation that keeps the k most likely partial sequences (the beam) at each step. A wider beam explores more candidates than greedy decoding, at a higher compute cost.
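The sketch below runs beam search over a hypothetical toy model, written as a hand-made table of next-token probabilities. The table, the token names, and the function name are all invented for illustration. Scores are summed log-probabilities, and length normalization is left out for brevity.

```python
import math

# Hypothetical toy "model": next-token probabilities given the previous token.
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.4, "dog": 0.35, "</s>": 0.25},
    "a": {"dog": 0.9, "cat": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def beam_search(beam_width: int, max_len: int = 5):
    # Each beam entry is (log-probability, token sequence).
    beams = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "</s>":
                candidates.append((score, seq))  # finished sequences carry over
                continue
            for tok, p in NEXT[seq[-1]].items():
                candidates.append((score + math.log(p), seq + [tok]))
        # Keep only the top-k candidates by total log-probability.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == "</s>" for _, seq in beams):
            break
    return beams

greedy = beam_search(beam_width=1)[0][1]
wider = beam_search(beam_width=2)[0][1]
```

With this table, a beam width of 1 (greedy decoding) commits to "the" first and ends with "the cat" at probability 0.24, while a width of 2 keeps "a" alive and finds "a dog" at probability 0.36.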
BERT (Bidirectional Encoder Representations from Transformers)
A pre-trained language model by Google that reads text bidirectionally. Revolutionized NLP tasks like question answering and sentiment analysis. Largely superseded by decoder-only LLMs for generation tasks.

C

Chain-of-Thought (CoT)
A prompting technique that asks an LLM to reason step-by-step before giving a final answer. Significantly improves performance on complex reasoning tasks.
ChatGPT
OpenAI’s conversational AI chatbot, launched November 2022. Initially powered by GPT-3.5, later by GPT-4 and GPT-4o. Popularized LLMs for mainstream use.
CLIP (Contrastive Language-Image Pre-training)
OpenAI’s model that connects vision and language. Enables zero-shot image classification by comparing images with text descriptions.
Context Window
The maximum amount of text (in tokens) an LLM can process in a single request. GPT-4o: 128K tokens. Claude 3.5: 200K tokens. Gemini 1.5: 1M+ tokens.
Convolutional Neural Network (CNN)
A neural network architecture designed for processing grid-like data (images). Uses convolutional filters to detect spatial patterns. Foundation of computer vision.

D

Data Augmentation
Techniques to artificially increase training data diversity (e.g., image rotation, cropping, flipping). Reduces overfitting and improves model generalization.
Deep Learning
A subset of machine learning using neural networks with multiple layers. Enables learning of hierarchical feature representations from raw data.
Diffusion Model
A generative model that learns to reverse a noise-adding process. Powers image generators like Stable Diffusion and DALL·E. Produces high-quality, diverse outputs.
Distillation
Training a smaller "student" model to mimic a larger "teacher" model. Reduces inference cost while preserving most of the performance.

E

Embedding
A dense vector representation of data (text, images, etc.) that captures semantic meaning. Similar items have similar embeddings. Used for search, clustering, and RAG.
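"Similar items have similar embeddings" is usually measured with cosine similarity. The sketch below uses made-up three-dimensional vectors. Real embeddings come from a trained model and typically have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for illustration only.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

sim_cat_kitten = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["kitten"])
sim_cat_car = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])
```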
Epoch
One complete pass through the entire training dataset. Multiple epochs are typically needed for convergence.
Expert Model
See Mixture of Experts (MoE).

F

Few-Shot Learning
Training or prompting a model to perform a task with only a few examples. LLMs excel at few-shot learning through in-context examples.
Fine-Tuning
Adapting a pre-trained model to a specific task by continuing training on task-specific data. Requires less data than training from scratch.
Foundation Model
A large, general-purpose model trained on broad data that can be adapted to many downstream tasks. Examples: GPT-4, Claude, Llama, Gemini.

G

GAN (Generative Adversarial Network)
A framework where two networks compete: a generator creates fake data, a discriminator tries to detect fakes. Largely superseded by diffusion models for image generation.
GPT (Generative Pre-trained Transformer)
OpenAI’s family of large language models, including GPT-4 and GPT-4o. Decoder-only transformer architecture.
Gradient Descent
The optimization algorithm that iteratively adjusts model parameters to minimize the loss function. Variants: SGD, Adam, AdamW.
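A minimal sketch of plain gradient descent on a one-dimensional function, f(x) = (x - 3)^2, whose minimum is at x = 3. The learning rate, step count, and function names are illustrative assumptions.

```python
def gradient_descent(grad, x0: float, lr: float = 0.1, steps: int = 100) -> float:
    # Repeatedly step against the gradient to reduce the objective.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Gradient of f(x) = (x - 3)^2 is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Variants such as SGD and Adam change how the gradient is estimated or how the step size is adapted, but follow the same update loop.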
GPU (Graphics Processing Unit)
Hardware originally designed for graphics, now essential for training and running AI models. NVIDIA dominates with A100, H100, and B200 chips.

H

Hallucination
When an LLM generates plausible-sounding but factually incorrect information. A major challenge for production AI systems. Mitigated through RAG, fact-checking, and grounding.
Hidden Layer
Layers in a neural network between the input and output layers that learn intermediate representations.

I

Inference
The process of running a trained model to generate predictions or outputs. Contrasted with training. Inference on a single input is typically much cheaper and faster than training.
In-Context Learning
An LLM’s ability to learn from examples provided in the prompt without parameter updates. A key capability of large language models.

L

LangChain
A framework for building LLM-powered applications. Provides abstractions for chains, agents, memory, and tool integration. LangGraph extends it for graph-based workflows.
LLM (Large Language Model)
A language model with billions of parameters trained on massive text corpora. Capable of understanding and generating human-like text. Examples: GPT-4, Claude, Llama, Gemini.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning method that adds small trainable low-rank matrices alongside frozen weights. Often cuts the number of trainable parameters by well over 90% while maintaining quality.
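The savings can be seen with simple arithmetic. For a frozen weight matrix of shape d x k, LoRA trains two matrices of shapes d x r and r x k instead. The 4096 x 4096 layer size and rank 8 below are assumed values for illustration, not figures from a specific model.

```python
def full_params(d: int, k: int) -> int:
    # Parameters updated when fine-tuning the full d x k weight matrix.
    return d * k

def lora_trainable_params(d: int, k: int, r: int) -> int:
    # LoRA trains B (d x r) and A (r x k); the original matrix stays frozen.
    return r * (d + k)

d, k, r = 4096, 4096, 8  # assumed layer size and rank
full = full_params(d, k)
lora = lora_trainable_params(d, k, r)
fraction = lora / full
```

Under these assumptions the adapter trains 65,536 parameters instead of 16,777,216, about 0.39% of the layer.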
Loss Function
A function that measures how far the model’s predictions are from the true values. Training aims to minimize this function. Common: cross-entropy, MSE.
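The two common losses named above can be sketched as follows. The function names are illustrative, and the cross-entropy version assumes a single example with one correct class.

```python
import math

def mse(y_true, y_pred) -> float:
    # Mean squared error: average squared difference, used for regression.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(true_index: int, probs) -> float:
    # Negative log-probability assigned to the correct class.
    return -math.log(probs[true_index])

regression_loss = mse([0.0, 0.0], [1.0, 3.0])
confident_right = cross_entropy(0, [0.9, 0.1])
confident_wrong = cross_entropy(1, [0.9, 0.1])
```

Cross-entropy penalizes a confident wrong prediction much more heavily than a confident correct one.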

M

Mixture of Experts (MoE)
An architecture where different "expert" sub-networks handle different inputs. Only a subset of experts activate per input, reducing compute. Used in Mixtral, GPT-4 (rumored).
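A toy sketch of top-k routing: only the k experts with the highest gate scores run, and their outputs are mixed with softmax weights. The experts here are trivial hypothetical functions, and the gate logits are fixed by hand. In a real MoE layer a learned router computes them from the input.

```python
import math

def softmax(xs):
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_logits, k=2):
    # Pick the k experts with the highest gate scores; the rest are skipped.
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Hypothetical experts and hand-picked gate logits, for illustration only.
EXPERTS = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
GATE_LOGITS = [2.0, 0.1, 1.0, -1.0]

out, active = moe_forward(3.0, EXPERTS, GATE_LOGITS, k=2)
```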
ML (Machine Learning)
A subset of AI where systems learn from data rather than following explicit rules. Includes supervised, unsupervised, and reinforcement learning.
Model Drift
When a model’s performance degrades over time as the real-world data distribution changes from training data. Requires periodic retraining.

N

Neural Network
A computational model inspired by biological neurons. Consists of layers of connected nodes (neurons) with learnable weights. The foundation of deep learning.
NLP (Natural Language Processing)
The field of AI focused on understanding and generating human language. LLMs have transformed NLP by providing general-purpose language understanding.

O

Overfitting
When a model learns training data too well, including noise, and fails to generalize to new data. Detected when training loss decreases but validation loss increases.
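One common response to rising validation loss is early stopping. The sketch below is a simplified version with an assumed `patience` parameter and invented loss values. It reports the best epoch and the epoch at which training would stop.

```python
def early_stopping_epoch(val_losses, patience: int = 2):
    # Stop once validation loss has failed to improve for `patience` epochs.
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, val_loss in enumerate(val_losses):
        if val_loss < best_loss:
            best_loss, best_epoch, waited = val_loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Invented validation losses: improving at first, then rising.
best, stopped = early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.75, 0.9])
```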

P

Parameter
A learnable weight in a neural network. Unconfirmed reports estimate GPT-4 at roughly 1.8T parameters. More parameters generally mean more capability but higher cost.
Pre-training
The initial training phase where a model learns general patterns from a large, diverse dataset. Followed by fine-tuning for specific tasks.
Prompt Engineering
The craft of designing effective prompts to get desired outputs from LLMs. Includes techniques like CoT, few-shot examples, and role assignment.

Q

Quantization
Reducing the precision of model weights (e.g., 32-bit to 4-bit) to reduce memory and compute requirements. Enables running large models on consumer hardware.
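As a rough illustration, the sketch below applies symmetric 8-bit quantization to a short list of weights. The function names and sample weights are assumptions. Production schemes (including 4-bit formats) typically quantize per block or per channel and are more involved.

```python
def quantize_int8(weights):
    # Map floats to integers in [-127, 127] using one shared scale factor.
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    # Recover approximate float values from the stored integers.
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored value differs from the original by at most half the scale factor, which is the precision traded for the smaller memory footprint.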
QLoRA
Quantized LoRA: combines 4-bit quantization with LoRA fine-tuning. Enables fine-tuning a 65B-parameter model on a single 48 GB GPU.

R

RAG (Retrieval-Augmented Generation)
A technique that enhances LLM responses by retrieving relevant documents from a knowledge base before generation. Reduces hallucinations and enables up-to-date information.
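The retrieve-then-generate flow can be sketched without any model at all. The example below scores documents by simple word overlap (a stand-in for embedding search) and builds a prompt from the top matches. The function names, prompt wording, and documents are hypothetical, and the call to an LLM is deliberately left out.

```python
import re

def tokens(text: str) -> set:
    # Lowercased word set; a crude stand-in for an embedding.
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, documents, k: int = 2):
    # Rank documents by how many words they share with the query.
    q = tokens(query)
    ranked = sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents, k: int = 2) -> str:
    # Put the retrieved passages ahead of the question for the LLM to use.
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

DOCS = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
    "Berlin is the capital of Germany.",
]
prompt = build_prompt("What is the capital of France?", DOCS)
```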
RLHF (Reinforcement Learning from Human Feedback)
A training technique where models are fine-tuned using human preference data. Key to making LLMs helpful, harmless, and honest. Used in ChatGPT and Claude.
ReLU (Rectified Linear Unit)
The most common activation function: f(x) = max(0, x). Simple, fast, and helps mitigate the vanishing gradient problem.

S

Self-Attention
The mechanism in transformers where each token attends to all other tokens in the sequence. Enables understanding of long-range dependencies in text.
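A bare-bones sketch of scaled dot-product self-attention in plain Python follows. To keep it short, it assumes the queries, keys, and values are the input vectors themselves. A real transformer layer first applies learned projection matrices and uses multiple heads.

```python
import math

def softmax(xs):
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    # X is a list of token vectors; every token attends to every token.
    d = len(X[0])
    outputs = []
    for query in X:
        # Scaled dot product between this token and all tokens.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in X]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return outputs

attended = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```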
Supervised Learning
Learning from labeled examples where the correct output is provided. The most common form of ML training.

T

Temperature
A parameter controlling randomness in LLM output. Low (e.g., 0.1) = focused and nearly deterministic. High (1.0+) = more creative and diverse.
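Temperature is usually applied by dividing the model's logits by it before the softmax. The sketch below uses made-up logits for three candidate tokens. The function name is an illustrative choice.

```python
import math

def softmax_with_temperature(logits, temperature: float):
    # Dividing logits by the temperature sharpens (T < 1) or flattens (T > 1)
    # the resulting probability distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # invented scores for three candidate tokens
p_low = softmax_with_temperature(logits, 0.1)
p_high = softmax_with_temperature(logits, 2.0)
```

At low temperature nearly all probability mass lands on the top token. At high temperature the alternatives become much more likely to be sampled.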
Token
The basic unit of text processed by an LLM. Approximately 4 characters or 0.75 words in English. API pricing is typically per token.
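The 4-characters rule of thumb gives a quick estimate when no tokenizer is at hand. This sketch is only an approximation for English text. The function name is hypothetical, and exact counts require the model's own tokenizer.

```python
import math

def estimate_tokens(text: str) -> int:
    # Rule of thumb from above: roughly 4 characters per token in English.
    return math.ceil(len(text) / 4)

estimate = estimate_tokens("a" * 400)
```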
Transformer
The neural network architecture introduced in "Attention Is All You Need" (2017). Uses self-attention instead of recurrence. Foundation of nearly all modern LLMs.

V

Vector Database
A database optimized for storing and querying high-dimensional vectors (embeddings). Enables fast semantic search. Examples: Pinecone, Weaviate, ChromaDB, Qdrant.

Z

Zero-Shot Learning
Performing a task the model was never explicitly trained for, using only a natural language description. A key capability of large language models.
