Title: Senior ML Ops Engineer (Machine Learning Infrastructure)
Employment Type: Full-Time
Compensation Range: $150,000 – $240,000 USD
Location: Remote — United States (Any U.S. Time Zone)
Work Schedule: Full-Time, Flexible Hours
Industry: Autonomous Transportation Technology
Work Authorization: Must be authorized to work in the United States. No visa sponsorship is available for this role.
Company Overview
The organization operates in the autonomous transportation sector, developing battery-electric rail vehicles designed to modernize freight logistics. Its mission centers on improving safety and efficiency and reducing environmental impact by shifting portions of long-haul freight movement from road to rail through advanced autonomous systems. The company is in a growth phase, building first-of-its-kind autonomous rail technology for real-world deployment.
Position Summary
The Senior ML Ops Engineer (Machine Learning Infrastructure) will lead the design, development, and operation of the scalable systems that power autonomy and perception machine learning pipelines. This role owns the ML infrastructure stack end to end, enabling efficient experimentation, distributed training, deployment, and monitoring of safety-critical ML models across R&D and production environments. The position requires strong system design skills, cloud-native expertise, and close collaboration with ML and robotics teams building real-time autonomous systems.
Key Responsibilities
- Design and implement robust MLOps solutions, including automated pipelines for data management, model training, deployment, and monitoring.
- Architect, deploy, and manage scalable infrastructure for distributed model training and inference.
- Partner with machine learning engineers to gather requirements and define strategies for data workflows, model development, and deployment.
- Build and operate cloud-based ML systems optimized for research, development, and production workloads.
- Develop scalable infrastructure supporting CI/CD, experiment tracking, and governance of models and datasets.
- Automate model evaluation, selection, and deployment workflows to ensure repeatable and reliable ML operations.
Required Qualifications
- Bachelor’s degree or higher in Computer Science, Machine Learning, or a relevant engineering discipline.
- Five or more years of experience building large-scale, reliable systems, with at least two years focused on ML infrastructure or MLOps.
- Demonstrated hands-on experience leading 0-to-1 builds of ML infrastructure platforms or MLOps systems in production environments.
- Proven experience architecting and deploying production-grade ML pipelines and platforms.
- Strong understanding of the full ML lifecycle, including data ingestion, training, evaluation, packaging, and deployment.
- Hands-on experience with MLOps tools such as MLflow, Kubeflow, SageMaker, Airflow, Metaflow, or similar technologies.
- Deep knowledge of CI/CD practices as applied to machine learning workflows.
- Proficiency in Python, Git, and modern software engineering and system design practices.
- Experience designing ML architectures on cloud platforms such as AWS, GCP, or Azure.
Preferred Qualifications
- Experience with deep learning architectures, including CNNs, RNNs, or Transformers.
- Hands-on experience with distributed training frameworks such as PyTorch DDP, Horovod, or Ray.
- Background in real-time ML systems and batch inference with CPU- and GPU-aware orchestration.
- Prior experience in autonomous vehicles, robotics, or other real-time, ML-driven systems.