🚀 At A5 Labs, we are committed to creating cutting-edge, AI-driven experiences that redefine industry standards. If you’ve ever played an online casino game, you may have already encountered our technology and innovation.

📎 The Role

We’re seeking a Senior DevOps Engineer, ideally with expertise in MLOps and LLMOps, to join our team. In this role, you will help us build and operate the infrastructure behind our poker applications, ensuring that it is secure, scalable, and efficient.

You will work closely with product and engineering teams to enable a self-service approach, allowing developers to ship features faster and more reliably. From designing cloud infrastructure to automating deployments and establishing monitoring and incident management practices, you’ll be at the heart of how we scale our platform and teams.

You’ll also play a key role in supporting our MLOps and LLMOps workflows, helping scale AI model deployment and experimentation across our platform.

🚀 Key Objectives

Design and build cloud infrastructure to run poker applications at scale, optimised for learning and exploration by recreational players worldwide.
Optimise development workflows by automating builds, testing, and deployments while ensuring fast, reliable infrastructure to minimise friction and maximise developer focus.
Establish and maintain robust MLOps and LLMOps workflows to support the development, reliable deployment, and continuous optimisation of LLMs at scale.

What you bring to the table

🛠 Experience
7+ years in DevOps/Infrastructure Engineering, including AI/ML workloads in production.

☁️ Cloud & Efficiency

Strong AWS and Cloudflare skills with hands-on experience in EB, ECS, RDS, MSK/Kinesis, CloudWatch, IAM, Lambda, S3, Route 53, etc., and a proven track record in infrastructure cost optimisation.

🌍 Multi-region & Scaling

Experience designing highly available, scalable, multi-region systems with disaster recovery strategies and cost optimisation.

📦 Containerisation & Orchestration

Hands-on experience with Docker and orchestration platforms such as ECS, EKS, or Kubernetes.

🔒 Security & Reliability

Good understanding of cloud security best practices to ensure safe and resilient systems.

🔁 CI/CD & Observability

Experience with CI/CD pipelines, such as Bitbucket Pipelines or GitHub Actions, and with observability tools such as OpenTelemetry, Datadog, or similar.

Infrastructure as Code

Proficient with Terraform or Pulumi for managing infrastructure.

MLOps & LLMOps

Familiarity with machine learning operations is a plus. Experience supporting ML workflows and managing the model lifecycle using tools like MLflow or SageMaker is beneficial, but not required.
An understanding of concepts such as model versioning, experiment tracking, feature stores, scalable deployment, and the unique challenges of LLM (Large Language Model) inference, fine-tuning, and performance observability would be an advantage.

🚨 Incident Management

Experience setting up incident processes, participating in on-call rotations, and resolving production issues.

🤝 Collaboration & Enablement

Experience working closely with engineering teams to build tailored infrastructure, provide reusable blueprints and self-service tooling, and promote DevOps best practices.

🎁 What We Offer

A fast-moving environment with minimal bureaucracy and quick decision-making
The opportunity to work on cutting-edge AI products and services
A strong focus on high-quality technical solutions
High autonomy and rapid feedback cycles
A great chance to learn how to play poker
Remote-friendly work culture
Unlimited vacation policy
Close collaboration with engineering teams and meaningful contributions to a shared product vision

🎰 This role is part of AceGuardian, a cutting-edge team within A5 Labs. AceGuardian is focused on building advanced AI agents through reinforcement learning, game-solving, fine-tuning, and planning. These AI agents tackle challenges such as anti-cheat detection (including collusion and bots) and optimising gameplay across various games. The team operates in stealth mode and is composed of experts in AI, machine learning, and game development, all working together to revolutionise both gaming and real-world problem-solving. By joining this team, you’ll contribute to innovative projects that push the boundaries of AI in the gaming industry while working alongside some of the brightest minds in the field.

Senior DevOps Engineer (Remote)