Job Title: MLOps + DevOps (Platform) Engineer
Location: Remote / Hybrid
Job Type: Full-time
About the Role
Chapter 2 is working with a leading creative agency to develop scalable machine learning platforms for AI-driven content creation. This role is perfect for an MLOps + DevOps Engineer who thrives in fast-paced environments, takes ownership, and has experience building infrastructure for large-scale AI and ML applications. You’ll be instrumental in developing automated, scalable, and high-performance ML infrastructure to support generative AI workflows and large language models (LLMs) in production.
What You’ll Do
- Design, build, and maintain scalable ML platforms for model development, experimentation, and production workflows.
- Automate ML infrastructure provisioning and workflows, including data pipelines, model training, validation, and deployment.
- Manage the full ML lifecycle, from model versioning to deployment, monitoring, and retraining.
- Optimise LLM operations, ensuring efficient fine-tuning, deployment, and performance monitoring.
- Collaborate closely with data scientists and engineers to develop and deploy ML models at scale.
- Optimise performance for inference and training across GPUs and cloud-based architectures.
- Ensure security and compliance for ML platforms handling sensitive data.
- Evaluate and integrate MLOps tools (MLflow, Kubeflow, etc.) to enhance efficiency.
- Implement monitoring and alerting systems to detect anomalies and maintain model reliability.
What We’re Looking For
- 3+ years of experience in software engineering, infrastructure, or MLOps roles.
- Proven expertise in building and maintaining ML platforms at scale.
- Hands-on experience with cloud platforms (AWS, GCP, or Azure) for ML workloads.
- Strong proficiency with Docker, Kubernetes, and infrastructure automation (Terraform, CloudFormation).
- Solid programming skills in Python and familiarity with ML frameworks such as TensorFlow and PyTorch.
- Experience designing CI/CD pipelines for ML workflows and deployment automation.
- Exposure to LLMOps, including managing fine-tuning and deployment of large language models.
- Strong problem-solving skills and ability to troubleshoot complex ML infrastructure issues.
- Ability to work in a fast-paced, high-growth environment with a product-oriented mindset.
- Bonus: Experience with big data tools (Spark, Kafka) and feature stores.
Why Join Us?
- Work on cutting-edge AI and ML infrastructure supporting generative AI products.
- Be part of a high-impact, innovative team driving AI advancements.
- Competitive salary, benefits, and career growth opportunities.
- Collaborate with top-tier engineers and data scientists in the AI space.
Excited? Let’s talk. Apply now with your resume and portfolio!