Full Job Description
As an MLOps Engineer, you will be the backbone of our machine learning infrastructure, ensuring that AI/ML systems are reliable, scalable, and continuously improving in production. You will bridge the gap between data science and engineering, driving operational excellence across the full ML lifecycle.
Description
The MLOps Engineer will drive end-to-end quality initiatives across data ingestion, model training, deployment pipelines, and MLOps tooling. This hire will build, deploy, and optimize AI/ML-based applications with a strong emphasis on scalable, production-ready systems. You will establish standard methodologies for model integration, deployment, and monitoring using CI/CD principles.
Minimum Qualifications
8 years of experience in software engineering, with demonstrated experience in large-scale software system design and implementation.
Bachelor's Degree in Software Engineering, Computer Science, Statistics, Data Mining, Machine Learning, Operations Research, or related field.
Proven track record of shipping and maintaining production-grade ML systems end-to-end.
Strong experience with distributed systems, databases (SQL/NoSQL), cloud platforms (AWS, Azure, or GCP), and container orchestration tools such as Kubernetes.
Hands-on experience with MLOps tooling and platforms such as Ray, MLflow, Kubeflow, SageMaker, Vertex AI, or similar.
Proficiency in Python and familiarity with ML frameworks such as TensorFlow, PyTorch, or scikit-learn.
Experience building and managing CI/CD pipelines for ML workflows using tools such as Jenkins, GitHub Actions, or ArgoCD.
Strong understanding of data pipeline orchestration tools such as Airflow, Prefect, or similar.
Preferred Qualifications
10 years of related experience building high-throughput, scalable applications or production machine learning systems.
Familiarity with model monitoring, drift detection, and observability practices in production environments.
Excellent cross-functional communication skills with the ability to collaborate effectively across engineering and data science teams.
Comfort using LLM-based tools such as Claude, Gemini, or ChatGPT to assist with code generation, documentation, debugging, and workflow automation.
Demonstrated ability to critically evaluate and validate LLM-generated outputs, ensuring accuracy and reliability before applying them in production contexts.
Experience incorporating AI-assisted tools into day-to-day engineering workflows, with an understanding of their limitations and appropriate use cases.