devops-interview-handbook

Architecture Exercise: Design CI/CD Pipeline

Problem Statement

Design a comprehensive CI/CD pipeline for a microservices architecture with 20+ services, multiple environments (dev, staging, prod), and strict security and compliance requirements.

Requirements

Functional Requirements

  1. Build and Test
    • Build Docker images for each service
    • Run unit tests
    • Run integration tests
    • Run security scans
    • Generate test reports
  2. Deployment
    • Deploy to dev automatically
    • Deploy to staging after approval
    • Deploy to production with manual approval
    • Support blue-green and canary deployments
    • Rollback capability
  3. Quality Gates
    • Code quality checks (linting, formatting)
    • Test coverage thresholds
    • Security vulnerability scanning
    • Infrastructure validation
    • Performance testing (staging)

Non-Functional Requirements

  1. Speed: Pipeline should complete in < 15 minutes for typical changes
  2. Reliability: < 1% of pipeline runs fail for reasons unrelated to the change itself (runner outages, flaky tests)
  3. Security: No secrets in code, signed artifacts, audit logs
  4. Scalability: Support 100+ commits per day
  5. Cost: Optimize for cost-effectiveness
  6. Compliance: SOC 2, PCI DSS (if handling payments)

Constraints and Assumptions

Constraints

Assumptions

Reference Architecture

Pipeline Flow

Developer Push
     |
     v
[Source Control] GitLab/GitHub
     |
     v
[Trigger Pipeline]
     |
     ├──> [Lint & Format] (2 min)
     |         |
     |         v
     |    [Security Scan] (3 min)
     |         |
     |         v
     |    [Build Docker Image] (5 min)
     |         |
     |         v
     |    [Unit Tests] (5 min) ──┐
     |         |                 |
     |         v                 |
     |    [Integration Tests] (8 min) ──┐
     |         |                       |
     |         v                       |
     |    [Test Reports]              |
     |         |                       |
     |         v                       |
     |    [Quality Gates] ────────────┘
     |         |
     |         v
     |    [Push to ECR] (1 min)
     |         |
     |         v
     ├──> [Deploy to Dev] (Auto) (2 min)
     |         |
     |         v
     |    [Smoke Tests] (2 min)
     |         |
     |         v
     ├──> [Deploy to Staging] (Manual Approval) (2 min)
     |         |
     |         v
     |    [E2E Tests] (10 min)
     |         |
     |         v
     |    [Performance Tests] (5 min)
     |         |
     |         v
     └──> [Deploy to Prod] (Manual Approval) (2 min)
               |
               v
          [Post-Deploy Verification] (2 min)
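
Run strictly in sequence, the stages up to the registry push add up to 2 + 3 + 5 + 5 + 8 + 1 = 24 minutes, which misses the 15-minute target. One way to close the gap, sketched below for GitLab CI, is to let independent jobs start together with `needs` and make the dev deployment the gate. The scripts are abbreviated, and the sketch assumes enough runners are free to run the jobs concurrently.

```yaml
# Sketch only: scripts are abbreviated; job names follow the component sections.
stages:
  - validate
  - security
  - build
  - test
  - deploy

lint:
  stage: validate
  script:
    - npm run lint

security:scan:
  stage: security
  needs: []          # start immediately instead of waiting for lint
  script:
    - snyk test --severity-threshold=high

build:
  stage: build
  needs: []          # image build runs alongside lint and scanning
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .

test:unit:
  stage: test
  needs: []          # uses npm directly, so it does not wait for the image
  script:
    - npm run test:unit

test:integration:
  stage: test
  needs: []
  script:
    - npm run test:integration

deploy:dev:
  stage: deploy
  # Acts as the gate: every check must pass before anything is deployed.
  needs:
    - lint
    - security:scan
    - build
    - test:unit
    - test:integration
  script:
    - kubectl set image "deployment/$SERVICE_NAME" "$SERVICE_NAME=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
```

With the timings from the diagram, the pre-deploy phase is then bounded by the slowest single job (integration tests, 8 minutes) rather than by the sum of all jobs.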

Component Breakdown

1. Source Control and Triggers

Component: GitLab / GitHub

Configuration:

Pipeline Triggers:
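
A minimal sketch of trigger rules in GitLab CI, assuming pipelines should run for merge requests, for the `main` and `develop` branches used by the deploy jobs, and for tags. The branch names are taken from the job definitions in this exercise; adjust them to the repository's branching model.

```yaml
workflow:
  rules:
    # Merge request pipelines.
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    # Skip the duplicate branch pipeline when a merge request is already open.
    - if: $CI_COMMIT_BRANCH && $CI_OPEN_MERGE_REQUESTS
      when: never
    # Long-lived branches and release tags.
    - if: $CI_COMMIT_BRANCH == "main"
    - if: $CI_COMMIT_BRANCH == "develop"
    - if: $CI_COMMIT_TAG
```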

2. Lint and Format Stage

Purpose: Code quality checks

Tools:

Configuration:

lint:
  stage: validate
  script:
    - npm run lint
    - npm run format:check
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
  allow_failure: false

Quality Gates:

3. Security Scanning Stage

Purpose: Detect vulnerabilities early

Tools:

Configuration:

security:scan:
  stage: security
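  script:
    - snyk test --severity-threshold=high
    # --exit-code 1 makes trivy fail the job when a finding matches the severity filter
    - trivy fs --exit-code 1 --severity HIGH,CRITICAL .
    - git-secrets --scan
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  allow_failure: false  # Fail on high/critical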

Quality Gates:

4. Build Stage

Purpose: Build Docker images

Configuration:

build:
  stage: build
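  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login "$CI_REGISTRY" -u "$CI_REGISTRY_USER" --password-stdin
    # Pull earlier images so --cache-from has layers to reuse; a miss on the first build is fine
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker pull $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG || true
    - docker build
        --cache-from $CI_REGISTRY_IMAGE:latest
        --cache-from $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
        -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
        -t $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
        .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG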
  only:
    changes:
      - "services/service-name/**/*"
  cache:
    key: docker-$CI_COMMIT_REF_SLUG
    paths:
      - .docker-cache/

Optimizations:

5. Test Stage

Purpose: Run automated tests

Unit Tests:

test:unit:
  stage: test
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run test:unit -- --coverage --reporter junit
  # Optional decimal part so whole-number results such as "100%" also match
  coverage: '/Lines\s*:\s*(\d+\.?\d*)%/'
  artifacts:
    reports:
      junit: test-results.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      # npm ci deletes node_modules before installing, so only the npm cache is worth keeping
      - .npm/

Integration Tests:

test:integration:
  stage: test
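  services:
    - postgres:13
    - redis:6
  variables:
    # The postgres image will not start without a password; placeholder value for CI only
    POSTGRES_PASSWORD: ci-only-password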
  script:
    - npm ci
    - npm run test:integration
  only:
    - main
    - merge_requests

Quality Gates:
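
A coverage threshold can be enforced inside the job rather than only reported. The sketch below is a variant of the unit test job that reads the overall line rate from the Cobertura report the job already produces (`coverage.xml`) and fails when it falls below a minimum. The `MIN_LINE_RATE` variable and its value are assumptions; the exercise does not state a threshold.

```yaml
# Variant of the unit test job with an explicit coverage gate (sketch only).
test:unit:
  stage: test
  variables:
    MIN_LINE_RATE: "0.80"   # assumed threshold; set to the team's agreed value
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run test:unit -- --coverage --reporter junit
    - |
      # Cobertura records overall line coverage as a 0..1 fraction in the
      # line-rate attribute of the root <coverage> element.
      rate=$(sed -n 's/.*<coverage[^>]*line-rate="\([0-9.]*\)".*/\1/p' coverage.xml)
      echo "Line rate: ${rate} (minimum: ${MIN_LINE_RATE})"
      # awk exits non-zero, failing the job, when the rate is missing or too low.
      awk -v r="$rate" -v min="$MIN_LINE_RATE" 'BEGIN { exit !(r >= min) }'
```

Many test runners can enforce a threshold natively; parsing the report keeps the gate independent of the runner.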

6. Deploy to Dev

Purpose: Automatic deployment to development

Configuration:

deploy:dev:
  stage: deploy
  environment:
    name: development
    url: https://dev.example.com
  script:
    - kubectl config use-context dev
    # YAML folds the continuation line into the command, so no trailing backslash is needed
    - kubectl set image deployment/$SERVICE_NAME
        $SERVICE_NAME=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - kubectl rollout status deployment/$SERVICE_NAME
  only:
    - main
    - develop
  when: on_success

Features:

7. Deploy to Staging

Purpose: Pre-production testing

Configuration:

deploy:staging:
  stage: deploy
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - kubectl config use-context staging
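    # YAML folds the continuation line into the command, so no trailing backslash is needed
    - kubectl set image deployment/$SERVICE_NAME
        $SERVICE_NAME=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - kubectl rollout status deployment/$SERVICE_NAME
  when: manual
  allow_failure: false  # block later stages until the approval is given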
  only:
    - main

Features:

8. Deploy to Production

Purpose: Production deployment

Configuration:

deploy:production:
  stage: deploy
  environment:
    name: production
    url: https://example.com
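  script:
    - kubectl config use-context production
    # Blue-green deployment
    - ./scripts/blue-green-deploy.sh $SERVICE_NAME $CI_COMMIT_SHA
    - kubectl rollout status deployment/$SERVICE_NAME-green
    # Switch traffic
    - kubectl patch service $SERVICE_NAME -p '{"spec":{"selector":{"version":"green"}}}'
    # Verify
    - ./scripts/health-check.sh
  when: manual
  allow_failure: false  # keep the verify stage waiting for this approval
  only:
    - main
    - tags

# Cleanup runs as its own delayed job: a one-hour sleep inside the deploy job
# would hold a runner and can exceed the default job timeout.
cleanup:production:
  stage: cleanup  # assumes a cleanup stage declared after verify
  needs: ["deploy:production"]
  script:
    - kubectl config use-context production
    - kubectl delete deployment $SERVICE_NAME-blue
  when: delayed
  start_in: 1 hour
  only:
    - main
    - tags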

Features:
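
Rollback is listed as a requirement, and with the blue-green flow above it amounts to pointing the Service back at the previous color. The manual job below is a sketch under that assumption; it only works while the blue Deployment still exists, that is, before the cleanup step removes it.

```yaml
rollback:production:
  stage: deploy
  environment:
    name: production
  script:
    - kubectl config use-context production
    # Send traffic back to the previous (blue) pods.
    - kubectl patch service $SERVICE_NAME -p '{"spec":{"selector":{"version":"blue"}}}'
    - ./scripts/health-check.sh
  when: manual
  only:
    - main
    - tags
```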

9. Post-Deployment Verification

Purpose: Verify deployment success

Configuration:

verify:production:
  stage: verify
  script:
    - ./scripts/smoke-tests.sh
    - ./scripts/check-metrics.sh
    - ./scripts/verify-logs.sh
  when: on_success
  only:
    - main

Checks:

Discussion Points

Deployment Strategies

1. Blue-Green Deployment

How it Works:

Pros:

Cons:

2. Canary Deployment

How it Works:

Pros:

Cons:
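
A minimal canary sketch, assuming a separate `$SERVICE_NAME-canary` Deployment (hypothetical) whose pods carry the same labels the Service selects on, so that traffic is split roughly in proportion to replica counts. With nine stable replicas and one canary replica, about a tenth of requests would reach the new version.

```yaml
deploy:canary:
  stage: deploy
  environment:
    name: production
  script:
    - kubectl config use-context production
    # Roll the new image out to the canary Deployment only.
    - kubectl set image "deployment/$SERVICE_NAME-canary" "$SERVICE_NAME=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
    - kubectl scale "deployment/$SERVICE_NAME-canary" --replicas=1
    - kubectl rollout status "deployment/$SERVICE_NAME-canary" --timeout=300s
    # check-metrics.sh is the script used in the verification stage; a non-zero
    # exit fails the job before the full rollout is attempted.
    - ./scripts/check-metrics.sh
  when: manual
  only:
    - main
```

Replica-based splitting is coarse; finer percentages usually need an ingress controller or service mesh that supports weighted routing.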

3. Rolling Deployment

How it Works:

Pros:

Cons:
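
The dev and staging jobs in this exercise rely on this strategy implicitly: `kubectl set image` followed by `kubectl rollout status` triggers a rolling update, which is the default for a Kubernetes Deployment. The manifest below is a sketch of how the pace of the rollout can be controlled. The names, image, replica count, port, and health endpoint are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-name              # placeholder
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                 # at most one extra pod during the rollout
      maxUnavailable: 0           # never drop below the desired replica count
  selector:
    matchLabels:
      app: service-name
  template:
    metadata:
      labels:
        app: service-name
    spec:
      containers:
        - name: service-name
          image: registry.example.com/service-name:TAG   # placeholder image
          readinessProbe:         # new pods receive traffic only once ready
            httpGet:
              path: /healthz      # assumed health endpoint
              port: 8080
```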

Security Considerations

1. Secrets Management
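
One way to keep secrets out of the repository is to have the job fetch them from an external store at run time. The sketch below uses GitLab's `secrets` keyword with HashiCorp Vault. It assumes a GitLab tier that supports the keyword, a `VAULT_SERVER_URL` CI/CD variable, and a Vault path and mount (`production/db`, `ops`) that are purely illustrative.

```yaml
secrets:example:
  stage: deploy
  id_tokens:
    VAULT_ID_TOKEN:
      aud: https://vault.example.com     # placeholder Vault address
  secrets:
    DATABASE_PASSWORD:
      # Field "password" of the secret at "production/db" in the "ops" mount.
      vault: production/db/password@ops
      token: $VAULT_ID_TOKEN
  script:
    # By default the variable holds the path of a file containing the secret.
    - test -s "$DATABASE_PASSWORD"
```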

2. Artifact Signing
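
A sketch of image signing with cosign, assuming the tool is available in the job image and that `COSIGN_PRIVATE_KEY` and `COSIGN_PUBLIC_KEY` are file-type CI/CD variables (hypothetical names) holding a key pair. The `sign` stage is also an assumption; it would sit between the test and deploy stages.

```yaml
sign:image:
  stage: sign
  needs:
    - build
  script:
    - cosign login "$CI_REGISTRY" -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD"
    # The key passphrase is read from the COSIGN_PASSWORD environment variable.
    - cosign sign --yes --key "$COSIGN_PRIVATE_KEY" "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
    # Deploy jobs can run the same check before rolling out the image.
    - cosign verify --key "$COSIGN_PUBLIC_KEY" "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
```

Signing by image digest rather than by tag is generally preferred, since a tag can be moved after the signature is made.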

3. Access Control

4. Network Security

Cost Optimization

Strategies:

Estimated Costs:

Scalability Considerations

For 100+ Commits/Day:
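
At this commit rate, superseded pipelines and overlapping deployments become the main sources of waste and risk. The fragment below is a sketch of two GitLab CI settings that address them; it shows only the added keys and assumes the project's auto-cancel option for redundant pipelines is enabled.

```yaml
default:
  interruptible: true          # newer pipelines may cancel superseded jobs

deploy:production:
  interruptible: false         # never cancel a deployment midway
  resource_group: production   # at most one production deploy runs at a time
```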

For 20+ Services:
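
Repeating a full job definition per service does not scale to 20+ services. A hidden template plus path filters keeps each service's entry to a few lines and rebuilds only what changed. The service names below (`orders`, `payments`) are hypothetical, and the layout assumes each service lives under `services/<name>/`.

```yaml
.build-service:
  stage: build
  script:
    - docker build -t "$CI_REGISTRY_IMAGE/$SERVICE_NAME:$CI_COMMIT_SHA" "services/$SERVICE_NAME"
    - docker push "$CI_REGISTRY_IMAGE/$SERVICE_NAME:$CI_COMMIT_SHA"

build:orders:
  extends: .build-service
  variables:
    SERVICE_NAME: orders
  rules:
    - changes:
        - services/orders/**/*

build:payments:
  extends: .build-service
  variables:
    SERVICE_NAME: payments
  rules:
    - changes:
        - services/payments/**/*
```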

Monitoring and Observability

Pipeline Metrics:

Deployment Metrics:

Tools:

Alternative Approaches

1. GitOps (ArgoCD/Flux)
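
In a GitOps setup the pipeline stops at updating manifests in a Git repository, and a controller in the cluster applies them. The Argo CD `Application` below is a sketch of that pull-based model. The repository URL, path, and namespaces are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: service-name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitlab.example.com/platform/deploy-config.git   # placeholder
    targetRevision: main
    path: services/service-name/overlays/production                  # placeholder
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # remove resources that were deleted from Git
      selfHeal: true   # revert manual changes made in the cluster
```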

2. Spinnaker

3. Jenkins

Implementation Phases

Phase 1: Basic Pipeline

Phase 2: Quality Gates

Phase 3: Multi-Environment

Phase 4: Advanced Features

Key Takeaways

  1. Automation: Automate everything possible
  2. Security: Security scanning at every stage
  3. Quality Gates: Fail fast on issues
  4. Speed: Optimize for fast feedback
  5. Reliability: Test thoroughly before production
  6. Monitoring: Track metrics and improve
  7. Documentation: Document processes and runbooks

Follow-up Questions

  1. How do you handle database migrations in CI/CD?
  2. How do you test infrastructure changes?
  3. How do you handle secrets in pipelines?
  4. How do you implement feature flags?
  5. How do you handle rollbacks?