TechBiz Global GmbH

Senior AI DevOps / LLMOps

Baden-Baden

About the job

At TechBiz Global, we provide recruitment services to the top clients in our portfolio. We are currently seeking a Senior AI DevOps / LLMOps specialist to join one of our clients' teams. If you're looking for an exciting opportunity to grow in an innovative environment, this could be the perfect fit for you.

Key Responsibilities

Automation of Build-to-Production

  • Design and implement robust CI/CD pipelines tailored for AI, covering model weights, dataset versioning, and application code.

  • Develop specialized workflows for PromptOps, ensuring that system prompts are version-controlled, tested for regressions, and deployed with the same rigor as traditional code.

  • Automate the deployment of agentic workflows, managing the complexities of stateful AI interactions and multi-agent handoffs.

AI Infrastructure as Code (IaC)
  • Provision and manage high-performance compute environments (GPU clusters, TPU pods) using Terraform, Pulumi, or Ansible.

  • Define and enforce Policy-as-Code for AI endpoints to ensure compliance with security, cost-usage limits, and data residency requirements.

  • Maintain a consistent environment across hybrid infrastructure, ensuring seamless parity between on-premises development and cloud production.

Safe Experimentation & Controlled Releases
  • Architect progressive delivery strategies for AI, including canary releases, blue-green deployments, and shadowing (where new models run in parallel with production to compare outputs).

  • Build “Evaluation-in-the-Loop” gates within the pipeline to automatically test for bias, hallucination, and performance degradation before a release.

  • Implement A/B testing frameworks specifically designed for LLM outputs and agentic behavior.

Monitoring & Observability
  • Establish deep observability into inference endpoints, tracking metrics like tokens-per-second, latency, and drift in model accuracy.

  • Integrate feedback loops that capture production “edge cases” to feed back into the training and fine-tuning pipelines.

Must-Have Technical Skills:

  • Orchestration: Advanced Kubernetes (K8s) skills, specifically with Kubeflow, Ray, or NVIDIA Triton.

  • CI/CD & IaC: Expertise in GitHub Actions/GitLab CI, and Terraform or Pulumi.

  • AI Tooling: Experience with Weights & Biases, MLflow, LangSmith, or Arize Phoenix.

  • Hardware: Understanding of GPU virtualization, CUDA drivers, and on-premises hardware management.

  • Security: Familiarity with Open Policy Agent (OPA) and secret management (Vault).

Experience:

  • 10+ years in DevOps, SRE, or Cloud Engineering.

  • 2+ years of hands-on experience in MLOps or LLMOps, specifically moving LLMs from notebook to production.

  • Proven experience managing hybrid cloud environments (e.g., AWS/Azure + private data center).


Stack / Tags

Go, AWS, Azure, Kubernetes, DevOps, Terraform, CI/CD, AI
