Basic concept

ninja

`ninja` is a build system: a tool that actually runs the compile and link commands for a software project. Think of it as a faster, more streamlined replacement for GNU make.
Why it's alongside `cmake`: CMake generates build files, and one of the formats it can emit is for Ninja. Many projects prefer the CMake + Ninja combo because Ninja's focus on minimal bookkeeping and highly optimized parallel scheduling makes large builds noticeably quicker.

Key traits:
- Very small codebase and zero-configuration behavior.
- Designed for speed: only the bare information needed to decide what to rebuild is stored.
- Widely used in big projects (Chromium, LLVM, Android, etc.).
So in the Homebrew command `brew install cmake ninja git wget`, you're grabbing both the build-file generator (`cmake`) and the build executor (`ninja`), plus `git` for version-control checkouts and `wget` for downloading extra files.
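To make "build executor" concrete, here is a minimal, hypothetical `build.ninja` for a one-file C program (the file names and rule names are illustrative, not from any real project):

```
# Hypothetical build.ninja: compile main.c and link it into an executable.
rule cc
  command = cc -c $in -o $out

rule link
  command = cc $in -o $out

build main.o: cc main.c
build app: link main.o
```

In the CMake + Ninja workflow you never write this file by hand: `cmake -G Ninja` generates it, and running `ninja` executes it, rebuilding only the targets whose inputs changed.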
MPS
MPS = Metal Performance Shaders. In PyTorch's context, "MPS" refers to Apple's Metal Performance Shaders framework, which PyTorch exposes as the `mps` device. It's Apple's analogue to NVIDIA's CUDA or AMD's ROCm.
What it does in PyTorch
| Role | What happens |
|---|---|
| GPU backend for macOS | Lets PyTorch send tensor ops to the GPU on Apple-silicon (M1/M2/M3) and recent Intel/AMD Macs instead of running them on the CPU. |
| Bridges to the Metal API | Translates PyTorch kernels into Metal compute kernels that the Metal driver can schedule on Apple GPUs. |
| Drop-in usage | You just pick the device: `device = torch.device("mps")` (or move a tensor/model with `.to("mps")`); no code changes beyond what you'd do for CUDA. |
| Performance focus | MPS kernels are tuned for each Apple GPU family, often giving several-fold speed-ups for training and inference compared with the CPU on the same machine. |
Why you'll see "PyTorch 2.7 + MPS" in release notes or installer commands

- Binary wheels built with MPS support mean you don't have to compile PyTorch yourself; the backend is already compiled against Apple's Metal libraries.
- On a Mac, that combo is the recommended way to get GPU acceleration if you don't have an NVIDIA card.
Quick demo snippet

```python
import torch

# Fall back to CPU when the MPS backend is unavailable
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.randn(3, 3, device=device)
y = x @ x  # the matmul runs on the Apple GPU when device is "mps"
print(y.device)
```
When it's useful

- Training small- to medium-size models locally on a Mac laptop/desktop.
- Running inference pipelines (e.g. LLMs, Stable Diffusion) without an external GPU.
- Prototyping when you want portable code that runs on CUDA, ROCm, or Apple GPUs just by switching devices.
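The portability point above can be sketched with a small helper; this is a minimal illustration (the name `pick_device` is made up here, not a PyTorch API):

```python
import torch

def pick_device() -> torch.device:
    """Return the best available accelerator, falling back to CPU."""
    if torch.cuda.is_available():          # NVIDIA CUDA (or ROCm builds)
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Metal
        return torch.device("mps")
    return torch.device("cpu")

# The rest of the script is device-agnostic:
device = pick_device()
x = torch.ones(2, 2, device=device)
```

Because every tensor and model is created on (or moved to) the chosen device, the same script runs unmodified on an NVIDIA workstation or an M-series Mac.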
DALI
DALI = NVIDIA Data Loading Library
| What it is | What it does | Why it matters in Modulus / deep learning |
|---|---|---|
| A GPU-accelerated, open-source library from NVIDIA | Builds end-to-end data-processing pipelines (decode → crop/resize → augment → normalize → batch) that run largely on the GPU and overlap with training | Removes the CPU/IO bottleneck so your GPU stays busy and epoch times drop, especially for large image/video/audio/scientific datasets |
Key capabilities

- Pipeline API – you declare a directed graph of operators, and DALI's runtime executes it efficiently across CPU and GPU stages. (NVIDIA Docs)
- Turn-key operators – JPEG/PNG/WebP decoders, random crops, flips, color jitter, optical flow, audio resampling, etc. (NVIDIA Developer)
- Drop-in loaders – adapters for PyTorch, TensorFlow, and MXNet replace the framework's `DataLoader`/`Dataset` with a DALI iterator, so the rest of your code is unchanged. (GitHub)
Why you might skip it on Apple silicon

DALI relies on CUDA and NVIDIA GPUs; Apple M-series GPUs cannot run it, so "Path A – Install Modulus without DALI" is the pragmatic route on a Mac. You'll still get core Modulus functionality, but any Modulus helper utilities that wrap DALI (e.g., GIF creation or specialized loaders in `modulus.datapipes`) are unavailable. (NVIDIA Docs)
In one sentence
DALI is NVIDIA’s high-throughput, GPU-powered data-ingestion and augmentation engine; in projects like Modulus it’s optional but prized on NVIDIA hardware because it keeps the training pipeline from becoming CPU-bound.