KLA
We are seeking a Senior Algorithm Software Architect to lead the design and delivery of GPU‑accelerated, high‑performance computing software. You will set architectural direction, coach engineers, and partner with product and domain experts to deliver scalable, reliable systems for large‑scale compute and data workflows.
Responsibilities
Own the end‑to‑end software architecture for HPC/GPU platforms (services, libraries, data pipelines, deployment)
Lead technical strategy: define candidate architectures, maintain decision records, run design reviews, and make explicit trade‑offs among performance, reliability, and maintainability
Design and implement GPU kernels and frameworks (e.g., CUDA, OpenCL, NCCL), optimizing for throughput, latency, and memory use
Guide parallel and distributed computing patterns (MPI, multi‑GPU scaling, heterogeneous compute)
Establish performance engineering practices: profiling (Nsight Systems, Nsight Compute, nvprof), benchmarking, and performance regression gates
Collaborate across functions; convert requirements into clear technical plans, roadmaps, and measurable outcomes
Uphold engineering excellence: coding standards, code reviews, test strategies, observability, and security considerations
Mentor engineers; provide technical leadership on design, delivery, and career growth
Communicate architecture, risks, and status to executives and stakeholders with clarity and candor
Qualifications
10+ years in software engineering; 5+ years in software architecture for HPC or large‑scale systems
Expert in C++ (C++17/20) and at least one scripting language (Python preferred)
GPU programming expertise (CUDA, OpenCL); strong knowledge of GPU memory hierarchies, streams, occupancy
Hands‑on with parallel/distributed stacks (MPI, NCCL, gRPC) and Linux performance tooling
Experience with cluster orchestration (Slurm, Kubernetes), CI/CD, and containerization (Docker)
Track record of technical leadership and exceptional communication with cross‑functional teams
Preferred / Nice‑to‑Have
Multi‑node, multi‑GPU scaling; mixed precision; numerical methods and algorithms
Experience with H200/H100/A100/L40S‑class accelerators and modern profiling workflows