Basic concept

ninja

`ninja` is a build system: a tool that actually runs the compile and link commands for a software project. Think of it as a faster, more streamlined replacement for GNU make.
Why it's alongside `cmake`: CMake generates build files, and one of the formats it can emit is for Ninja. Many projects prefer the CMake + Ninja combo because Ninja's focus on minimal bookkeeping and highly optimized parallel scheduling makes large builds noticeably quicker.

Key traits:
- Very small codebase and zero-configuration behavior.
- Designed for speed: only the bare information needed to decide what to rebuild is stored.
- Widely used in big projects (Chromium, LLVM, Android, etc.).
So in the Homebrew command `brew install cmake ninja git wget`, you're grabbing both the build-file generator (`cmake`) and the build executor (`ninja`), plus `git` for version-control checkouts and `wget` for downloading extra files.
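To make "build executor" concrete, here is a minimal, hypothetical `build.ninja` for a one-file C program (the file names and rule names are illustrative, not from any real project):

```
# Hypothetical build.ninja: compile main.c and link it into an executable.
rule cc
  command = cc -c $in -o $out

rule link
  command = cc $in -o $out

build main.o: cc main.c
build app: link main.o
```

In the CMake + Ninja workflow you never write this file by hand: `cmake -G Ninja` generates it, and running `ninja` executes it, rebuilding only the targets whose inputs changed.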
MPS
MPS = Metal Performance Shaders. In PyTorch's context, "MPS" refers to Apple's Metal Performance Shaders framework, which PyTorch exposes as the `mps` device. It's Apple's analogue to NVIDIA's CUDA or AMD's ROCm.
What it does in PyTorch
| Role | What happens |
|---|---|
| GPU backend for macOS | Lets PyTorch send tensor ops to the GPU on Apple-silicon (M1/M2/M3) and recent Intel/AMD Macs instead of running them on the CPU. |
| Bridges to the Metal API | Translates PyTorch kernels into Metal compute kernels that the Metal driver can schedule on Apple GPUs. |
| Drop-in usage | You just pick the device: `device = torch.device("mps")` (or move a tensor/model with `.to("mps")`); no code changes beyond what you'd do for CUDA. |
| Performance focus | MPS kernels are tuned for each Apple GPU family, often giving several-fold speed-ups for training and inference compared with the CPU on the same machine. |
Why you'll see "PyTorch 2.7 + MPS" in release notes or installer commands

- Binary wheels built with MPS support mean you don't have to compile PyTorch yourself; the backend is already compiled against Apple's Metal libraries.
- On a Mac, that combo is the recommended way to get GPU acceleration if you don't have an NVIDIA card.
Quick demo snippet

```python
import torch

# Fall back to CPU when the MPS backend is unavailable
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.randn(3, 3, device=device)
y = x @ x  # the matmul runs on the Apple GPU when device is "mps"
print(y.device)
```
When it's useful

- Training small- to medium-size models locally on a Mac laptop/desktop.
- Running inference pipelines (e.g. LLMs, Stable Diffusion) without an external GPU.
- Prototyping when you want portable code that runs on CUDA, ROCm, or Apple GPUs just by switching devices.
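The portability point above can be sketched with a small helper; this is a minimal illustration (the name `pick_device` is made up here, not a PyTorch API):

```python
import torch

def pick_device() -> torch.device:
    """Return the best available accelerator, falling back to CPU."""
    if torch.cuda.is_available():          # NVIDIA CUDA (or ROCm builds)
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Metal
        return torch.device("mps")
    return torch.device("cpu")

# The rest of the script is device-agnostic:
device = pick_device()
x = torch.ones(2, 2, device=device)
```

Because every tensor and model is created on (or moved to) the chosen device, the same script runs unmodified on an NVIDIA workstation or an M-series Mac.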
DALI
DALI = NVIDIA Data Loading Library
| What it is | What it does | Why it matters in Modulus / deep learning |
|---|---|---|
| A GPU-accelerated, open-source library from NVIDIA | Builds end-to-end data-processing pipelines (decode → crop/resize → augment → normalize → batch) that run largely on the GPU and overlap with training | Removes the CPU/IO bottleneck so your GPU stays busy and epoch times drop, especially for large image/video/audio/scientific datasets |
Key capabilities

- Pipeline API – you declare a directed graph of operators, and DALI's runtime executes it efficiently across CPU and GPU stages. (NVIDIA Docs)
- Turn-key operators – JPEG/PNG/WebP decoders, random crops, flips, color jitter, optical flow, audio resampling, etc. (NVIDIA Developer)
- Drop-in loaders – adapters for PyTorch, TensorFlow, and MXNet replace the framework's `DataLoader`/`Dataset` with a DALI iterator, so the rest of your code is unchanged. (GitHub)
Why you might skip it on Apple silicon

DALI relies on CUDA and NVIDIA GPUs; Apple M-series GPUs cannot run it, so "Path A – Install Modulus without DALI" is the pragmatic route on a Mac. You'll still get core Modulus functionality, but any Modulus helper utilities that wrap DALI (e.g., GIF creation or specialized loaders in `modulus.datapipes`) are unavailable. (NVIDIA Docs)
In one sentence
DALI is NVIDIA’s high-throughput, GPU-powered data-ingestion and augmentation engine; in projects like Modulus it’s optional but prized on NVIDIA hardware because it keeps the training pipeline from becoming CPU-bound.