SV Advanced Computing, LLC

Performance Engineering
for GPU & AI Systems

Independent research and consulting in HPC performance optimization, GPU computing, and AI infrastructure. We profile, analyze, and optimize workloads at the intersection of hardware and software.

I. Recent Investigations

Establishing a Baseline: AI Framework Profiling Methodology

This write-up documents the profiling methodology and toolchain we will use across our AI framework performance investigations. It covers Nsight Systems, Nsight Compute, PyTorch profiler integration, and custom instrumentation approaches.

II. Focus Areas

AI Framework Performance

Profiling and optimization of PyTorch, NeMo, Megatron-Core, and inference runtimes (vLLM, TRT-LLM). End-to-end training and serving pipeline analysis.

GPU Architecture Analysis

Deep performance characterization on NVIDIA and AMD GPU architectures. Memory hierarchy analysis, occupancy tuning, and kernel optimization using Nsight and ROCm profiling tools.

HPC & Distributed Systems

MPI and NCCL/RCCL communication optimization. Multi-node scaling analysis. Performance engineering for exascale-class systems.

Custom Instrumentation

Beyond vendor tools: custom profiling, roofline modeling, memory bandwidth characterization, and performance modeling for novel workloads.
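As a rough illustration of the roofline modeling mentioned above, the sketch below computes a kernel's arithmetic intensity and the performance ceiling a device allows at that intensity. It is a minimal sketch under stated assumptions, not our production tooling: the `Device` class, the function names, and the peak figures (10,000 GFLOP/s, 1,000 GB/s) are all hypothetical placeholders to be replaced with measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device ceilings; substitute measured values."""
    peak_gflops: float         # peak compute throughput, GFLOP/s
    peak_bandwidth_gbs: float  # peak memory bandwidth, GB/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) where the two ceilings meet."""
        return self.peak_gflops / self.peak_bandwidth_gbs


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to or from memory."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def attainable_gflops(device: Device, intensity: float) -> float:
    """Roofline ceiling: min(compute peak, intensity * bandwidth peak)."""
    return min(device.peak_gflops, intensity * device.peak_bandwidth_gbs)


def bound(device: Device, intensity: float) -> str:
    """Classify a kernel as memory-bound or compute-bound."""
    return "memory" if intensity < device.ridge_point else "compute"


if __name__ == "__main__":
    # Assumed, illustrative peaks; not figures for any real GPU.
    dev = Device(peak_gflops=10_000.0, peak_bandwidth_gbs=1_000.0)

    # float32 SAXPY (y = a*x + y): 2 FLOPs and 12 bytes per element
    # (read x, read y, write y), ignoring cache effects.
    n = 1_000_000
    ai = arithmetic_intensity(flops=2 * n, bytes_moved=12 * n)

    print(f"intensity   : {ai:.3f} FLOP/byte")
    print(f"ridge point : {dev.ridge_point:.1f} FLOP/byte")
    print(f"ceiling     : {attainable_gflops(dev, ai):.1f} GFLOP/s")
    print(f"bound       : {bound(dev, ai)}")
```

With these assumed peaks, the SAXPY example sits well below the ridge point, so the model classifies it as memory-bound. In practice the FLOP and byte counts would come from profiler counters rather than hand counting.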