Platform Engineer

Company: Chapter 2
Location: London
Job Description:

Job Title: MLOps + DevOps (Platform) Engineer

Location: Remote / Hybrid

Job Type: Full-time

About the Role

Chapter 2 is working with a leading creative agency to develop scalable machine learning platforms for AI-driven content creation. This role is perfect for an MLOps + DevOps Engineer who thrives in fast-paced environments, takes ownership, and has experience building infrastructure for large-scale AI and ML applications. You’ll be instrumental in developing automated, scalable, and high-performance ML infrastructure to support generative AI workflows and large language models (LLMs) in production.

What You’ll Do

  • Design, build, and maintain scalable ML platforms for model development, experimentation, and production workflows.
  • Automate the provisioning of ML infrastructure, including data pipelines and model training, validation, and deployment.
  • Manage the full ML lifecycle, from model versioning to deployment, monitoring, and retraining.
  • Optimise large language model (LLM) operations, ensuring efficient fine-tuning, deployment, and performance monitoring.
  • Collaborate closely with data scientists and engineers to develop and deploy ML models at scale.
  • Optimise performance for inference and training across GPUs and cloud-based architectures.
  • Ensure security and compliance for ML platforms handling sensitive data.
  • Evaluate and integrate MLOps tools (MLflow, Kubeflow, etc.) to enhance efficiency.
  • Implement monitoring and alerting systems to detect anomalies and maintain model reliability.

What We’re Looking For

  • 3+ years of experience in software engineering, infrastructure, or MLOps roles.
  • Proven expertise in building and maintaining ML platforms at scale.
  • Hands-on experience with cloud platforms (AWS, GCP, or Azure) for ML workloads.
  • Strong proficiency with Docker, Kubernetes, and infrastructure automation (Terraform, CloudFormation).
  • Solid programming skills in Python and familiarity with ML frameworks such as TensorFlow and PyTorch.
  • Experience designing CI/CD pipelines for ML workflows and deployment automation.
  • Exposure to LLMOps, including managing the fine-tuning and deployment of large language models.
  • Strong problem-solving skills and ability to troubleshoot complex ML infrastructure issues.
  • Ability to work in a fast-paced, high-growth environment with a product-oriented mindset.
  • Bonus: Experience with big data tools (Spark, Kafka) and feature stores.

Why Join Us?

  • Work on cutting-edge AI and ML infrastructure supporting generative AI products.
  • Be part of a high-impact, innovative team driving AI advancements.
  • Competitive salary, benefits, and career growth opportunities.
  • Collaborate with top-tier engineers and data scientists in the AI space.

Excited? Let’s talk. Apply now with your resume and portfolio!

Posted: April 17th, 2025