CUDA¶

CUDA (Compute Unified Device Architecture) ist NVIDIAs Programmierplattform für GPU-Computing. Sie ist der De-facto-Standard für Deep Learning.

Warum CUDA dominiert¶

Früher Start: Seit 2007 verfügbar
Ökosystem: PyTorch, TensorFlow, alle Frameworks
cuDNN: Hochoptimierte Bibliothek für Deep Learning
Tooling: Profiler, Debugger, umfangreiche Docs

CUDA in der Praxis¶

import torch

# Tensor auf GPU verschieben
x = torch.randn(1000, 1000).cuda()  # oder .to('cuda')

# Modell auf GPU
model = MyModel().cuda()

# Alles auf GPU = schnell
output = model(x)

CUDA Compute Capability¶

Verschiedene GPU-Generationen haben verschiedene Features:

Architektur	Compute Capability	Features
Pascal (GTX 10xx)	6.x	FP16
Turing (RTX 20xx)	7.5	Tensor Cores
Ampere (RTX 30xx)	8.x	TF32, bessere Tensor Cores
Ada (RTX 40xx)	8.9	FP8
Blackwell (RTX 50xx)	10.0	Noch bessere Tensor Cores

Alternativen¶

ROCm (AMD)¶

Open-Source Alternative
Wachsende Unterstützung
Noch nicht so ausgereift

Metal (Apple)¶

Für Apple Silicon
MLX Framework
Begrenzte Framework-Unterstützung

Siehe auch¶

GPU – Die Hardware dahinter
VRAM – Der Speicher
Tensor – Was auf der GPU berechnet wird