Awesome DevOps AI 

A curated list of AI tools, agents, MCP servers, and resources for DevOps, SRE, and Platform Engineering.
The AI revolution is transforming how infrastructure is built, monitored, and operated. This list tracks every meaningful tool at the intersection of AI and DevOps, from coding agents that write Terraform to AI-powered incident response that pages you with a root cause already identified.
Why this list? Engineers are adopting AI tooling faster than they did in any earlier technology shift, but the landscape is fragmented across hundreds of repos, products, and frameworks. This is one place to find them all.
257 tools across 20 categories — updated March 2026. See the Quick Start Guide for role-based recommendations.
If this list is useful, please give it a star to help others find it.
AI Coding Agents for Infrastructure
AI-powered coding agents that help write, review, and maintain infrastructure code, including Terraform, Kubernetes manifests, Dockerfiles, and CI/CD pipelines.
- Aider - Terminal-based AI pair programming that works with any LLM, great for infrastructure repos with git-commit-per-change workflows.
- Amazon Q Developer - AWS-native AI assistant with deep CloudFormation, CDK, and AWS service knowledge.
- Clanker - Autonomous systems engineering CLI agent for any cloud environment including AWS, GCP, and Cloudflare.
- Claude Code - Anthropic’s agentic coding tool that excels at large-scale Terraform refactoring, multi-file Kubernetes manifest generation, and infrastructure debugging.
- Cline - Autonomous AI coding agent for VS Code that runs terminal commands, edits files, and handles complex infrastructure tasks.
- Codex - OpenAI’s autonomous coding agent with cloud sandbox execution, strong at generating IaC from natural language descriptions.
- Continue - Open-source AI code assistant for VS Code and JetBrains that supports custom models and local LLMs for air-gapped infrastructure work.
- Cursor - AI-first IDE with inline Terraform and YAML completions and multi-file editing capabilities.
- Devin - Autonomous AI software engineer by Cognition that can independently handle full infrastructure workflows from planning to deployment.
- GitHub Copilot - AI pair programmer integrated into VS Code, JetBrains, and CLI, with Copilot Workspace for multi-file infrastructure changes.
- JetBrains AI - AI assistant built into IntelliJ-based IDEs with context-aware infrastructure code completions and explanations.
- Replit Agent - AI agent that builds and deploys full-stack applications from natural language, useful for rapid prototyping of infrastructure dashboards.
- Sourcegraph Cody - AI coding assistant with full codebase context, ideal for navigating large monorepos with shared infrastructure modules.
- Tabnine - AI code completion that runs locally or in the cloud with enterprise-grade privacy for sensitive infrastructure code.
- Windsurf - AI IDE by Codeium with agentic Cascade mode for multi-step infrastructure tasks.
- Void - Open-source AI code editor forked from VS Code that supports local and remote LLMs for privacy-first infrastructure development.
- Zed AI - High-performance editor with built-in AI assistant, inline generation, and terminal integration for infrastructure workflows.
AI-Powered Kubernetes
AI tools specifically designed for Kubernetes cluster management, troubleshooting, and operations.
- Glasskube - Open-source Kubernetes package manager with AI-assisted package discovery and dependency resolution.
- Headlamp - Extensible Kubernetes web UI with plugin architecture that supports AI-powered cluster visualization and management.
- K8sGPT - AI-powered Kubernetes troubleshooting and diagnostics, a CNCF Sandbox project that scans clusters for issues and explains them in plain English.
- Kagent - CNCF Sandbox AI agent framework for DevOps and platform engineers to run agents inside Kubernetes clusters.
- KAITO - Kubernetes AI Toolchain Operator that simplifies LLM inference and fine-tuning workloads on clusters, a CNCF Sandbox project.
- Komodor - Kubernetes troubleshooting platform with AI-driven root cause analysis, change tracking, and automated remediation workflows.
- kubectl-ai - Google Cloud kubectl plugin that uses LLMs to generate and apply Kubernetes manifests from natural language.
- Kubernetes ChatGPT Bot - ChatGPT integration for Kubernetes troubleshooting via Slack notifications.
- Kubeshark - API traffic analyzer for Kubernetes providing real-time visibility into cluster network traffic for AI-powered anomaly detection.
- Robusta - Kubernetes monitoring and troubleshooting platform with AI root cause analysis and Holmes AI integration for automated diagnostics.
- ValidKube - Open-source tool that validates, cleans, and secures Kubernetes manifests in one interface.
- vCluster - Virtual Kubernetes clusters for development and testing that enable isolated AI workload experimentation.
AI for Infrastructure as Code
Tools that bring AI capabilities to Infrastructure as Code workflows.
- Atmos - Universal tool for DevOps workflows that provides a framework for managing Terraform configurations at scale with AI-assisted component discovery.
- AWS Terraform MCP Server - AWS MCP server with Terraform best practices, Checkov security scanning, and AWS provider documentation search.
- Brainboard - Visual Terraform designer with AI-powered architecture generation from cloud diagrams.
- Env0 - Self-service infrastructure platform with AI-assisted policy enforcement, cost estimation, and drift detection for Terraform.
- Firefly - Cloud asset management that uses AI to detect drift, generate Terraform from existing resources, and manage IaC coverage gaps.
- Infracost - Cloud cost estimates for Terraform in pull requests, supporting 1,100+ AWS, Azure, and GCP resources.
- OpenTofu - Open-source Terraform fork maintained by the Linux Foundation, the community-driven foundation for AI-enhanced IaC workflows.
- Pulumi AI - Generates Pulumi IaC programs from natural language using AI, supporting AWS, Azure, GCP, and Kubernetes.
- Spacelift AI - AI-enhanced IaC management platform with drift detection, policy-as-code, and automated remediation.
- Terraform Copilot Prompts - GitHub Copilot prompts for creating and converting Terraform configurations across cloud providers.
- Terrascan - Static code analyzer for IaC that detects compliance and security violations across Terraform, Kubernetes, and Helm.
- Terramate - Orchestration and code generation tool for Terraform that simplifies managing complex multi-stack infrastructure.
- tfswitch - Command-line tool for switching between Terraform versions, essential for managing multi-version IaC pipelines.
AI Incident Response and Troubleshooting
AI systems that detect, investigate, and remediate production incidents.
- BigPanda - AIOps platform for event correlation, automated root cause analysis, and intelligent incident management across hybrid environments.
- FireHydrant - Incident management platform with AI-powered retrospective generation, automated status pages, and runbook execution.
- GitHub Agentic Workflows - Run AI agents in GitHub Actions for automated issue triage, CI failure analysis, and PR review.
- HolmesGPT - Agentic AI troubleshooting for Kubernetes and cloud-native environments, a CNCF Sandbox project combining observability telemetry with LLM reasoning.
- IncidentFox - Open-source AI SRE platform for automated incident investigation, hypothesis formation, and fix suggestions with Slack and PagerDuty integration.
- Moogsoft - AIOps platform with AI-driven noise reduction, correlation, and situation awareness for reducing alert fatigue.
- Opsgenie - Atlassian's incident management platform with AI-powered alert routing, on-call scheduling, and intelligent escalation.
- PagerDuty AIOps - AI event correlation, noise reduction, and intelligent routing that reduces alert fatigue with ML-based grouping.
- Rootly - AI-powered incident management with automated timelines, AI-generated postmortems, and Slack-native workflows.
- Shoreline - AI-powered incident automation that converts runbooks into automated remediation executing across fleets.
- Tracecat - Open-source AI automation for security and reliability operations with 100+ integrations and sandboxed execution.
AI Monitoring and Observability
AI-enhanced monitoring, alerting, and observability platforms.
- Chronosphere - Cloud-native observability platform with AI-driven data optimization that reduces telemetry costs while preserving critical signals.
- Coralogix - Full-stack observability with AI-powered log analysis, anomaly detection, and cost-effective data management.
- Datadog Bits AI - AI assistant for natural language metric queries, root cause analysis, and automated investigation across infrastructure.
- Dynatrace Davis AI - Causal AI engine for automated root cause analysis, impact assessment, and predictive problem detection.
- Grafana - Open-source monitoring and observability platform that serves as the foundation for AI-powered monitoring workflows.
- Grafana AI - Built-in AI agents for observability including SRE agent for root cause analysis, adaptive telemetry for cost reduction, and AI-assisted query generation.
- Groundcover - eBPF-based observability platform with AI-powered root cause analysis that requires zero instrumentation.
- Metoro Guardian - AI observability copilot combining telemetry and code analysis for accurate root cause identification and auto-generated fix PRs.
- New Relic AI - AI monitoring assistant with natural language querying, anomaly explanation, and intelligent alert correlation.
- Prometheus Operator - Cloud-native monitoring foundation essential for AI-powered alerting pipelines, a CNCF project.
- Splunk AI - AI-powered analytics platform for natural language search, anomaly detection, and predictive insights across IT infrastructure.
- Sumo Logic - Cloud-native machine data analytics with AI-driven log analysis, threat detection, and infrastructure intelligence.
- Thanos - CNCF incubating highly available Prometheus setup with long-term storage and global query view for large-scale monitoring.
- VictoriaMetrics - Fast, cost-effective monitoring solution and time series database compatible with Prometheus and Grafana.
AI Security Scanning
AI-powered security tools for infrastructure, containers, and supply chain.
- Aqua Security - Cloud-native security platform with AI-powered runtime protection, image scanning, and compliance enforcement for containers.
- Checkov - Static analysis for IaC security that scans Terraform, CloudFormation, Kubernetes, Helm, and Dockerfile for misconfigurations.
- Falco - CNCF graduated cloud-native runtime security project for threat detection in containers and Kubernetes.
- GitGuardian - AI-powered secrets detection that scans Git repositories, CI/CD pipelines, and Docker images for exposed credentials.
- Lacework - Cloud security platform with behavioral AI that detects anomalies and threats across cloud workloads without rules.
- MCP-Scan - Open-source tool for analyzing Model Context Protocol security issues and auditing MCP servers for vulnerabilities.
- Orca Security - Agentless cloud security with AI-powered risk prioritization across workloads, configurations, and identities.
- Prisma Cloud - Comprehensive cloud-native application protection platform with AI-driven vulnerability prioritization and compliance.
- Semgrep - Fast open-source static analysis for finding bugs and enforcing code standards across 30+ languages including HCL and YAML.
- Snyk - AI-powered security platform with DeepCode AI engine that scans code, containers, IaC, and AI-generated code in real-time.
- SonarQube - Code quality and security analysis platform with AI-powered code smell detection and vulnerability identification.
- Terraform Sentinel - Policy-as-code framework by HashiCorp that enforces fine-grained, logic-based policies on Terraform infrastructure changes.
- tfsec - Security scanner for Terraform code that checks for security misconfigurations and compliance violations.
- Trivy - Comprehensive open-source vulnerability scanner for containers, IaC, Kubernetes, and code that is fast, accurate, and widely adopted.
- Wiz - Cloud security platform that unifies vulnerability findings with cloud context to prioritize exploitable risks.
- Kyverno - CNCF incubating Kubernetes-native policy engine for validating, mutating, and generating configurations.
- OPA Gatekeeper - Policy controller for Kubernetes based on Open Policy Agent for admission control and audit.
AI Cost Optimization
AI and automation tools for cloud cost management, FinOps, and resource optimization.
- Anodot - AI-powered cloud cost management with autonomous anomaly detection, optimization recommendations, and commitment management.
- CAST AI - AI-powered Kubernetes cost optimization with automated rightsizing, spot instance management, and cluster autoscaling.
- CloudZero - Cloud cost intelligence platform with AI-driven cost allocation, anomaly detection, and unit economics tracking.
- Finout - FinOps platform with AI-powered cost allocation across cloud, Kubernetes, and SaaS that combines billing data with observability.
- Kubecost - Real-time Kubernetes cost monitoring by service, deployment, namespace, and container with cloud billing integration.
- nOps - AWS cost optimization platform with AI-driven rightsizing, commitment management, and automated savings execution.
- OpenCost - CNCF incubating project for vendor-neutral, real-time Kubernetes cost monitoring and allocation.
- Spot by NetApp - AI-driven cloud infrastructure optimization using spot instances, autoscaling, and intelligent workload placement.
- Turbonomic - IBM's AI-powered application resource management platform that continuously optimizes compute, storage, and network allocation.
- Vantage - Cloud cost transparency platform with AI-powered recommendations across AWS, Azure, GCP, Kubernetes, and Datadog.
- Komiser - Open-source cloud cost management dashboard that analyzes spending across multi-cloud environments.
MCP Servers for DevOps
Model Context Protocol servers that give AI assistants like Claude, ChatGPT, and Cursor access to DevOps tools and infrastructure.
- Atlassian MCP Server - MCP server for Jira and Confluence integration enabling AI agents to query issues, create tickets, and search documentation.
- AWS MCP Servers - Official AWS MCP server suite covering Terraform, CDK, CloudFormation, Lambda, S3, CloudWatch, ECS, and more.
- Awesome MCP Servers - Comprehensive curated list of all MCP servers across every category.
- Cloudflare MCP Server - Official Cloudflare MCP server for managing Workers, KV, R2, and DNS from AI agents.
- Docker MCP Gateway - Docker-maintained MCP server for container management, image operations, and Docker Compose workflows.
- GitHub MCP Server - Official GitHub MCP server for repos, issues, PRs, Actions, and code search from AI agents.
- Kubernetes MCP Server - MCP server for kubectl operations, pod management, and cluster introspection.
- Linear MCP Server - MCP server for Linear project management enabling AI agents to manage issues, projects, and cycles.
- MCP Reference Servers - Official MCP reference implementations including filesystem, Git, GitHub, PostgreSQL, Puppeteer, and more.
- Sentry MCP Server - Official Sentry MCP server for error tracking, issue search, and event analysis from AI agents.
- Terraform MCP Server - Official HashiCorp MCP server for Terraform module search, provider documentation, and policy enforcement.
- Vercel MCP Server - MCP adapter by Vercel for integrating AI agents with serverless deployment and edge function management.
AI-Powered CI/CD
AI tools that enhance continuous integration and delivery pipelines.
- ArgoCD - CNCF GitOps continuous delivery for Kubernetes that serves as the foundation for AI-driven deployment workflows.
- CircleCI - Cloud CI/CD platform with AI-powered test splitting, flaky test detection, and pipeline optimization insights.
- Codefresh - GitOps CI/CD platform built on Argo with AI-assisted pipeline creation and deployment analytics.
- Dagger - Programmable CI/CD engine that runs pipelines in containers, enabling AI agents to compose and execute build workflows.
- GitLab Duo - AI across the GitLab DevSecOps platform with code suggestions, root cause analysis, vulnerability resolution, and CI/CD pipeline generation.
- Harness AIDA - AI Development Assistant for intelligent pipeline creation, failure analysis, and deployment optimization.
- Mergify - AI-powered merge queue and PR automation with intelligent batch merging and conflict resolution.
- PR-Agent - AI-powered pull request analysis that auto-describes, reviews, improves, and generates tests for GitHub, GitLab, and Bitbucket.
- Tekton - Cloud-native CI/CD building blocks for Kubernetes providing the foundation for AI-orchestrated build and deploy pipelines.
- Trunk - Developer experience platform with AI-powered code quality checks, merge queues, and flaky test management.
- Woodpecker CI - Community fork of Drone CI with a simple pipeline engine, container-native execution, and multi-platform support.
AI Log Analysis and Debugging
AI tools for log analysis, pattern detection, and debugging production systems.
- Axiom - Cloud-native log management with AI-powered query generation, anomaly detection, and unlimited data retention.
- Elasticsearch - Foundation for AI-powered log analysis with ES|QL, vector search, and ML anomaly detection.
- Grafana Loki - Log aggregation system designed for cloud-native environments that pairs with Grafana AI for intelligent log querying.
- LogAI - Salesforce’s open-source toolkit for AI-powered log analysis with ML algorithms for anomaly detection, clustering, and summarization.
- OpenTelemetry Collector - Vendor-agnostic telemetry collection that serves as the essential pipeline for feeding logs, metrics, and traces to AI analysis tools.
- Parseable - Cloud-native log storage and observability platform built in Rust with AI-powered log analysis and alerting.
- Vector - High-performance observability data pipeline for collecting, transforming, and routing logs, metrics, and traces to AI analysis backends.
- Zebrium - ML-powered root cause analysis from logs that automatically identifies incident root cause without manual queries.
- Fluentd - CNCF graduated unified logging layer for collecting, filtering, and routing logs from any source to any destination.
- Fluent Bit - Fast and lightweight log processor and forwarder for Linux, macOS, and embedded systems built for cloud-native environments.
AI Agent Frameworks for Infrastructure
General-purpose AI agent frameworks with strong infrastructure and DevOps use cases.
- AutoGen - Microsoft’s multi-agent framework supporting infrastructure workflows with tool use, code execution, and human-in-the-loop approvals.
- Claude Agent SDK - Anthropic’s framework for building agentic applications with tool use, orchestration, and guardrails for infrastructure automation.
- CrewAI - Multi-agent orchestration framework for building teams of AI agents that handle complex infrastructure tasks like migration planning.
- Dify - LLM application development platform with agent workflows, RAG, and model management for building custom DevOps chatbots.
- LangChain - Framework for building LLM-powered applications, widely used for building custom DevOps agents with tool integrations.
- LangGraph - Library for building stateful, multi-actor applications with LLMs, ideal for complex infrastructure orchestration workflows.
- n8n - Workflow automation platform with 400+ integrations and AI agent capabilities for low-code DevOps automation.
- OpenAI Agents SDK - OpenAI’s framework for building multi-agent systems with handoffs, guardrails, and tracing for infrastructure automation.
- Semantic Kernel - Microsoft’s SDK for integrating LLMs into applications with plugin architecture ideal for building infrastructure automation agents.
- Temporal - Durable execution platform for orchestrating long-running infrastructure workflows with built-in retry and failure handling.
- Wren AI - Open-source text-to-SQL AI agent that generates SQL queries from natural language for infrastructure analytics and reporting.
AI for Platform Engineering
AI tools for building internal developer platforms, service catalogs, and self-service infrastructure.
- Backstage - CNCF incubating project by Spotify for building developer portals with service catalogs, templates, and plugin-based extensibility.
- Cortex - Internal developer portal with AI-driven service maturity scorecards, ownership tracking, and reliability standards enforcement.
- Cycloid - Platform engineering solution with AI-powered infrastructure self-service, cost governance, and green IT scoring.
- Humanitec - Platform orchestrator that powers enterprise internal developer platforms with dynamic configuration management.
- Kratix - Open-source framework for building platforms-as-a-product on Kubernetes with composable promise-based abstractions.
- Mia-Platform - Internal developer platform with AI-powered microservice orchestration, API management, and developer self-service.
- OpsLevel - Service ownership platform with AI-powered maturity tracking, dependency mapping, and developer self-service.
- Port - Open internal developer portal with AI-powered software catalog, self-service actions, and scorecards for engineering standards.
- Qovery - Platform that provides production-like environments for developers with AI-assisted deployment and environment management.
- Roadie - Managed Backstage platform with AI-powered scaffolding, TechDocs hosting, and developer productivity insights.
- Upbound - Universal cloud platform built on Crossplane for building internal platforms with declarative infrastructure APIs.
- Score - Open-source workload specification that eliminates configuration drift between local and remote environments.
AI for Database Operations
AI tools for database management, query optimization, and data operations.
- Aiven AI - Managed database platform with AI-powered query optimization, anomaly detection, and automated performance tuning.
- Bytebase - Database DevOps and CI/CD platform with AI-assisted schema review, migration management, and SQL linting.
- CloudNativePG - Kubernetes operator for PostgreSQL that manages the full lifecycle of PostgreSQL clusters with automated failover.
- Metabase - Open-source BI platform with natural language querying that enables non-technical users to explore infrastructure databases.
- OtterTune - AI-powered database optimization that automatically tunes PostgreSQL, MySQL, and MariaDB configurations for performance.
- pganalyze - PostgreSQL performance monitoring with AI-powered query optimization recommendations and index advisor.
- PlanetScale - Serverless MySQL platform with AI-powered schema change management, query insights, and non-blocking deploys.
- SchemaHero - Kubernetes-native database schema management tool that applies declarative schema definitions as migrations.
- Vitess - CNCF graduated database clustering system for horizontal scaling of MySQL, essential for AI workloads needing distributed data.
AI for Networking and Service Mesh
AI tools for network management, service mesh, and traffic engineering.
- Calico - Cloud-native networking and security with AI-enhanced network policy management and threat detection for Kubernetes.
- Cilium - eBPF-based networking, security, and observability for Kubernetes with Hubble UI for AI-powered network flow analysis.
- Consul - Service mesh and service discovery by HashiCorp with intentions-based security and automated traffic management.
- Istio - CNCF graduated service mesh providing traffic management, security, and observability for microservices architectures.
- Linkerd - CNCF graduated ultralight service mesh for Kubernetes with automated mTLS, traffic splitting, and golden metrics.
- Ngrok - Unified ingress platform with AI-powered traffic inspection, policy enforcement, and API gateway capabilities.
- Traefik - Cloud-native application proxy with automatic service discovery, Let’s Encrypt integration, and observability features.
- Envoy - CNCF graduated high-performance edge and service proxy powering the data plane for Istio and other service meshes.
AI for Container Security and Supply Chain
AI tools for container image security, software supply chain, and build verification.
- Chainguard - Secure container images and supply chain tools with zero known CVEs, built to reduce vulnerability remediation work.
- Cosign - Container signing and verification tool from the Sigstore project for supply chain security.
- Docker Scout - Docker’s AI-powered supply chain security for image analysis, CVE remediation guidance, and base image recommendations.
- Grype - Fast open-source vulnerability scanner for container images and filesystems with SBOM-based analysis.
- Harbor - CNCF graduated cloud-native registry with vulnerability scanning, image signing, and policy-based image replication.
- Slim.AI - Container optimization platform with AI-powered image analysis and vulnerability reduction through minification.
- Syft - Open-source SBOM generator for container images and filesystems supporting multiple output formats.
- Wolfi - Community Linux distribution designed for building minimal container images with automated CVE patching.
AI for Chaos Engineering and Reliability
AI tools for chaos engineering, resilience testing, and reliability validation.
- Chaos Mesh - CNCF incubating cloud-native chaos engineering platform for Kubernetes with fault injection and workflow orchestration.
- Gremlin - Enterprise chaos engineering platform with AI-powered reliability recommendations and targeted failure testing.
- k6 - Modern load testing tool by Grafana with scriptable scenarios and AI-assisted test generation for infrastructure endpoints.
- Litmus - CNCF incubating chaos engineering framework for Kubernetes with a hub of prebuilt experiments and GitOps integration.
- Steadybit - Chaos engineering platform with AI-assisted experiment design and automated reliability validation.
- Testkube - Kubernetes-native test orchestration framework for running any testing tool inside clusters with CI/CD integration.
- Toxiproxy - TCP proxy by Shopify for simulating network conditions and testing system resilience to network failures.
AI for Cloud Migration and Modernization
AI tools that assist with cloud migration planning, execution, and application modernization.
- AWS Application Discovery Service - Automated discovery and planning for cloud migration with dependency mapping and utilization analysis.
- AWS Migration Hub - Central hub for tracking migrations across multiple AWS and partner tools with AI-powered progress tracking.
- Azure Migrate - Unified migration platform with AI-powered assessment, server migration, and database modernization tools.
- Google Cloud Migrate - Comprehensive migration platform with AI-driven assessment, workload discovery, and total cost of ownership analysis.
- Konveyor - Open-source migration toolkit for modernizing applications to Kubernetes with AI-assisted code transformation.
- Zerto - Disaster recovery and workload migration platform with AI-powered resilience and continuous data protection.
AI for GitOps
AI tools for GitOps workflows, declarative infrastructure, and continuous reconciliation.
- Flux - CNCF graduated GitOps toolkit for Kubernetes with automated image updates, Helm releases, and Kustomize reconciliation.
- Helm - CNCF graduated Kubernetes package manager essential for templating and deploying AI workloads and infrastructure components.
- Kargo - Continuous promotion and lifecycle orchestrator for Kubernetes applications across environments with GitOps principles.
- Kustomize - Kubernetes configuration customization tool that enables declarative management of manifests without template engines.
- Weave GitOps - Enterprise GitOps platform with progressive delivery, policy enforcement, and multi-cluster management.
- Crossplane - CNCF incubating project for building cloud-native control planes with declarative infrastructure APIs and GitOps workflows.
System Prompt and Config Templates
Ready-to-use AI agent configurations for infrastructure repositories.
- Awesome CursorRules - Community-curated .cursorrules files for various project types including infrastructure and DevOps.
- ChatGPT Prompts for DevOps - Community-curated prompt library that includes DevOps and system administration automation prompts.
- Claude Code DevOps Toolkit - Production-tested CLAUDE.md files, curated DevOps prompts, automation scripts, and project configs for infrastructure workflows.
- Free AI and DevOps Tools - Collection of 41 free browser-based AI and DevOps tools including prompt builder, system prompt generator, and token counter.
- GitHub Copilot Custom Instructions - Official guide for creating copilot-instructions.md to customize AI behavior per repository.
- Awesome Claude Code - Anthropic’s official Claude Code repository with documentation, examples, and tips for infrastructure workflows.
- DevOps GPT Prompts - Comprehensive prompt engineering guide with patterns applicable to DevOps automation.
Learning Resources
Courses, certifications, articles, and guides on AI for DevOps.
Articles and Guides
Books
Certifications
- AWS AI Practitioner - Foundational AI and ML certification with cloud infrastructure context.
- CKA and KCNA - CNCF Kubernetes certifications that provide an essential foundation before adding AI-powered Kubernetes tools.
- Google Cloud Professional ML Engineer - ML engineering certification focused on GCP infrastructure.
- Terraform Associate - HashiCorp IaC certification providing prerequisite knowledge for AI-assisted Terraform workflows.
Podcasts
- Kubernetes Podcast from Google - Weekly podcast covering Kubernetes ecosystem news, interviews, and AI tooling developments.
- Ship It! - Podcast about building and shipping software with coverage of AI-enhanced DevOps workflows.
- The CloudCast - Weekly cloud technology podcast covering AI, DevOps, and infrastructure trends.
Community and Newsletters
Communities, forums, and newsletters covering AI and DevOps.
- CNCF Slack - Cloud Native Computing Foundation community with channels for K8sGPT, HolmesGPT, and AI-native projects.
- DevOps Weekly - Weekly newsletter covering DevOps tooling and practices including AI adoption.
- KubeWeekly - Official CNCF newsletter covering Kubernetes ecosystem updates and AI-powered tooling announcements.
- Platformers Community - Platform engineering community with discussions on AI-powered developer experience and internal platforms.
- r/devops - Reddit community with 780k+ members actively discussing AI tool adoption in DevOps workflows.
- r/Kubernetes - Reddit community with 260k+ members discussing K8sGPT, KAITO, and AI-powered cluster management.
- r/Terraform - Active Reddit community discussing AI-assisted IaC and Terraform automation.
- The New Stack - Publication covering cloud-native, Kubernetes, and AI infrastructure developments.
- TLDR DevOps - Daily DevOps newsletter with AI and automation coverage.
Contributing
Contributions are welcome! Please read the contribution guidelines first. We especially welcome:
- New AI tools for DevOps workflows.
- Corrections to descriptions or broken links.
- New categories as the ecosystem evolves.
Join the discussion to suggest tools or ask questions.
Author
Hammad Haqqani - DevOps Architect and Cloud Engineer
Support
If you find this useful, consider buying me a coffee!
