ML Framework (MetalLM) Engineer, Graphics, Game and ML

Apple

H1B ✓ | On-site | Senior level | Posted March 27, 2026

About the Role

The role involves working on high-performance, distributed inference for GenAI applications like LLMs on Private Cloud Compute using custom server hardware. Engineers will focus on building scalable, efficient, and production-grade solutions tailored for high-throughput GPU execution.

Requirements

Candidates must have at least 3 years of programming experience with C/C++/ObjC, along with experience in GPU kernel development and optimization using compute programming models such as Metal or CUDA. Experience with distributed training or inference techniques, system-level programming, and computer architecture is also required.

Full Job Description

Apple’s Server ML Frameworks team in GPU, Graphics and Machine Learning works on enabling Apple Intelligence through high-performance, distributed inference of GenAI applications (such as LLMs) on Private Cloud Compute. You will get to work on custom-built server hardware that brings the power and security of Apple silicon to the data center. We are looking for engineers with a systems background who are deeply passionate about building scalable, efficient, and production-grade solutions tailored for high-throughput GPU execution.

Description

Our team is seeking extraordinary machine learning and GPU programming engineers who are passionate about providing robust compute solutions for accelerating machine learning libraries on Apple silicon. The role offers the opportunity to influence the design of compute and programming models in next-generation GPU architectures.

Minimum Qualifications

3+ years of programming and problem-solving experience with C/C++/ObjC

Experience with GPU kernel development and optimization using compute programming models such as Metal or CUDA

Experience with distributed training or inference techniques

Experience with system-level programming and computer architecture

Preferred Qualifications

Experience with graph compilers such as CuTE, CuTile, Triton, OpenXLA or LLVM is a plus

Good understanding of LLM and diffusion-based model architectures

Compensation

AI Est. Total Comp

$411,500

Details

Location

Cupertino

Work Type

On-site

Seniority

Senior level

Experience

2-5 years

Key Skills

C++, ObjC, GPU Kernel Development, Metal, CUDA, Distributed Training, Distributed Inference, System Level Programming, Computer Architecture, Graph Compilers, LLVM, LLM, Diffusion Models