awesome-devops-ai

AI DevOps Quick Start Guide

Not sure where to start? This guide maps the landscape and recommends tools by role.

AI DevOps Landscape at a Glance

| Layer | Open Source | Commercial | CNCF Projects |
| --- | --- | --- | --- |
| Coding Agents | Aider, Cline, Continue | Claude Code, Cursor, Copilot | - |
| Kubernetes | K8sGPT, kubectl-ai, Headlamp | Komodor, Robusta | K8sGPT, Kagent, KAITO |
| IaC and Terraform | OpenTofu, Infracost, Checkov | Spacelift, Env0, Firefly | - |
| Incident Response | HolmesGPT, IncidentFox, Tracecat | Rootly, PagerDuty AIOps | HolmesGPT |
| Monitoring | Grafana, Prometheus | Datadog, Dynatrace, Splunk | Prometheus |
| Security | Trivy, Falco, Checkov, Semgrep | Snyk, Wiz, Prisma Cloud | Falco |
| Cost and FinOps | OpenCost, Kubecost | CAST AI, Vantage, CloudZero | OpenCost |
| MCP Servers | MCP Reference, Kubernetes MCP | AWS MCP, GitHub MCP | - |
| CI/CD | ArgoCD, Tekton, Dagger | GitLab Duo, Harness | ArgoCD, Tekton |
| Platform Engineering | Backstage, Kratix | Port, Humanitec, Cortex | Backstage |
| GitOps | Flux, Kustomize, Helm | Weave GitOps, Codefresh | Flux, Helm |
| Chaos Engineering | Chaos Mesh, Litmus | Gremlin, Steadybit | Chaos Mesh, Litmus |

Quick Start by Role

DevOps Engineer Starting with AI

  1. Daily IaC work: Start with Claude Code or GitHub Copilot for writing Terraform, Kubernetes manifests, and Dockerfiles.
  2. Cluster troubleshooting: Add K8sGPT to scan clusters and explain issues in plain English.
  3. Cost visibility: Use Infracost for cost estimates in Terraform PRs.
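
To make the cost-visibility step concrete, here is a minimal sketch of a PR check that reads a cost-estimate report and flags large increases. The report shape and the field names (`pastTotalMonthlyCost`, `totalMonthlyCost`, `currency`) are assumptions for illustration, as are the helper names and the threshold; verify them against the JSON your cost tool actually emits before relying on this.

```python
import json
from decimal import Decimal

# Hypothetical sample shaped like a cost-estimate JSON report. The field names
# are assumptions for illustration; check them against your tool's real output.
SAMPLE_REPORT = """
{
  "currency": "USD",
  "pastTotalMonthlyCost": "120.00",
  "totalMonthlyCost": "185.50"
}
"""


def monthly_cost_delta(report: dict) -> Decimal:
    """Return the change in estimated monthly cost (new minus previous)."""
    past = Decimal(report.get("pastTotalMonthlyCost") or "0")
    new = Decimal(report.get("totalMonthlyCost") or "0")
    return new - past


def review_comment(report: dict, threshold: Decimal) -> str:
    """Build a one-line PR comment, flagging increases above the threshold."""
    delta = monthly_cost_delta(report)
    currency = report.get("currency", "USD")
    status = "NEEDS REVIEW" if delta > threshold else "OK"
    return f"[{status}] Estimated monthly cost change: {delta:+.2f} {currency}"


if __name__ == "__main__":
    report = json.loads(SAMPLE_REPORT)
    # Hypothetical policy: ask for a review when the increase exceeds 50/month.
    print(review_comment(report, threshold=Decimal("50")))
```

In a real pipeline the sample string would be replaced by the report file produced in CI, and the comment would be posted through your code host's API.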

SRE Focused on Reliability

  1. Incident investigation: HolmesGPT combines observability telemetry with LLM reasoning for root cause analysis.
  2. Observability: Grafana AI provides AI-assisted query generation and SRE agents.
  3. Resilience testing: Chaos Mesh for fault injection in Kubernetes.

Platform Engineer Building Self-Service

  1. Developer portal: Backstage for service catalogs and templates.
  2. GitOps delivery: ArgoCD for continuous deployment to Kubernetes.
  3. Continuous reconciliation: Flux for automated image updates and Helm releases.

Security Engineer

  1. Vulnerability scanning: Trivy for containers, IaC, and code.
  2. Runtime security: Falco for threat detection in containers.
  3. Supply chain: Docker Scout for image analysis and CVE remediation.
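
Scan results only help if something acts on them. The sketch below shows one way to gate a build on a vulnerability report by counting findings per severity. The `Results` / `Vulnerabilities` / `Severity` layout, the placeholder CVE IDs, and the blocking policy are all assumptions for illustration; confirm the structure against your scanner's real JSON output.

```python
import json
from collections import Counter

# Hypothetical, trimmed-down scanner report. The layout and the placeholder
# IDs are assumptions; confirm them against your scanner's actual JSON output.
SAMPLE_REPORT = """
{
  "Results": [
    {
      "Target": "app-image",
      "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL"},
        {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"}
      ]
    },
    {"Target": "main.tf"}
  ]
}
"""

# Hypothetical policy: these severities fail the build.
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def severity_counts(report: dict) -> Counter:
    """Count findings per severity across all scanned targets."""
    counts = Counter()
    for result in report.get("Results") or []:
        # Targets with no findings may omit the list entirely.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def should_block(counts: Counter) -> bool:
    """Return True when any blocking severity has at least one finding."""
    return any(counts[severity] > 0 for severity in BLOCKING_SEVERITIES)


if __name__ == "__main__":
    counts = severity_counts(json.loads(SAMPLE_REPORT))
    print(dict(counts), "block build:", should_block(counts))
```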

FinOps and Cost Optimization

  1. Kubernetes costs: OpenCost for vendor-neutral cost monitoring.
  2. Terraform costs: Infracost for cost estimates in pull requests.
  3. Multi-cloud visibility: Vantage for recommendations across cloud providers.

Building AI Agents for Infrastructure

  1. Agent framework: LangChain or CrewAI for building custom DevOps agents.
  2. Tool integrations: Explore MCP (Model Context Protocol) servers for connecting AI to infrastructure tools.
  3. Orchestration: Temporal for durable execution of long-running workflows.
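
The pattern underneath these pieces is a loop that lets a model choose from a registry of tools and then runs the chosen calls. The sketch below illustrates only that pattern in plain Python. It does not use the LangChain, CrewAI, MCP, or Temporal APIs; the tool functions, the keyword-based `stub_planner`, and all names are hypothetical stand-ins for a real model and real integrations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    """A single planned tool invocation."""
    name: str
    argument: str


def list_pods(namespace: str) -> str:
    """Stub tool: a real agent would query the cluster here."""
    return f"2 pods running in namespace '{namespace}'"


def estimate_cost(path: str) -> str:
    """Stub tool: a real agent would run a cost estimator here."""
    return f"no cost change detected for '{path}'"


# Registry mapping tool names to callables (all hypothetical).
TOOLS: Dict[str, Callable[[str], str]] = {
    "list_pods": list_pods,
    "estimate_cost": estimate_cost,
}


def stub_planner(task: str) -> List[ToolCall]:
    """Stand-in for an LLM: picks tools with simple keyword rules."""
    lowered = task.lower()
    calls: List[ToolCall] = []
    if "pod" in lowered:
        calls.append(ToolCall("list_pods", "default"))
    if "cost" in lowered:
        calls.append(ToolCall("estimate_cost", "infra/"))
    return calls


def run_agent(task: str, tools: Dict[str, Callable[[str], str]] = TOOLS) -> List[str]:
    """Plan tool calls for the task, run each one, and collect the results."""
    results: List[str] = []
    for call in stub_planner(task):
        tool = tools.get(call.name)
        if tool is None:
            # Never execute a tool the registry does not know about.
            results.append(f"{call.name}: unknown tool, skipped")
            continue
        results.append(f"{call.name}: {tool(call.argument)}")
    return results


if __name__ == "__main__":
    for line in run_agent("Check pods and cost for this change"):
        print(line)
```

Keeping the registry explicit makes it easy to restrict an agent to read-only tools first and add write access later, once its behavior is trusted.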