About the Role
The Machine Learning Engineer will build, fine-tune, and rigorously evaluate agentic vision-language models for novel product experiences. They will translate ambiguous product requirements into measurable evaluation criteria while working at the intersection of multimodal modeling and agentic AI.
Requirements
Candidates must have a Bachelor's or Master's degree in Computer Science or Machine Learning with at least 2 years of hands-on experience in generative AI or multimodal models. Proficiency in Python and ML frameworks like PyTorch or TensorFlow is required.
Full Job Description
Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or experience we deliver is the result of us making each other’s ideas stronger. The diversity of our people and their thinking inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you’ll do more than join something — you’ll add something.
Description
The Special Projects team at Apple is developing novel experiences powered by state-of-the-art agentic vision-language models that incorporate visual context into conversational interaction. We are looking for a Machine Learning Engineer to help us build, fine-tune, and rigorously evaluate these systems. A successful candidate has hands-on experience with vision-language models, knows how to translate ambiguous product requirements into measurable evaluation criteria, and is excited to work at the intersection of multimodal modeling and agentic AI.
Minimum Qualifications
Bachelor’s or Master’s degree in Computer Science or Machine Learning
2+ years of hands-on experience building and evaluating generative AI or multimodal models
Experience working with vision-language models or multimodal systems
Proficiency in Python and ML frameworks (PyTorch or TensorFlow)
Preferred Qualifications
PhD in Computer Science, Machine Learning, Statistics, or another STEM field
Prior industry internship or research experience applying ML to product use cases
Experience with video understanding, temporal reasoning, or activity recognition
Familiarity with agentic system design including tool use, grounding, or perceive-act loops
Experience building or working with large-scale multimodal data and annotation pipelines
Proficiency in training, fine-tuning, and evaluating foundation models and related frameworks
Publications or technical presentations in Machine Learning journals or conferences
Excellent communication and cross-functional collaboration skills