
HashiCorp Nomad - A Simple and Efficient Orchestrator

Introduction

HashiCorp Nomad is a flexible workload orchestrator that positions itself as a simpler alternative to Kubernetes for many use cases. Unlike K8s, Nomad can orchestrate not only containers but also native applications, VMs, and batch jobs.
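
To illustrate the multi-workload point, here is a minimal sketch of a service job that runs a plain binary with the isolated exec driver instead of a container; the job name and binary path are illustrative, not taken from a real deployment.

job "native-app" {
  datacenters = ["dc1"]
  type        = "service"

  group "app" {
    count = 1

    task "binary" {
      # exec runs an isolated native process - no container image required
      driver = "exec"

      config {
        command = "/usr/local/bin/my-app"  # hypothetical binary path
        args    = ["-port", "8080"]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}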

Nomad Philosophy

"Simple, flexible et production-ready dès l'installation - sans la complexité de Kubernetes."

Why Nomad Rather than Kubernetes?

Advantages of Nomad

  • Installation: single binary, no dependencies
  • Configuration: simple, readable HCL files
  • Learning curve: far more approachable
  • Maintenance: fewer components to manage
  • Multi-workload: containers, binaries, VMs, Java, etc.
  • Multi-platform: Linux, Windows, macOS
  • Multi-cloud: AWS, Azure, GCP, on-premise
  • Hybrid: mix of containers and VMs in the same cluster
  • Lightweight: low memory/CPU footprint
  • Fast: quicker deployments than K8s
  • Efficient: smart resource bin packing
  • Scalable: up to 10k nodes with ease
  • Stable: fewer breaking changes
  • Secure: native mutual TLS, built-in ACLs
  • Observable: native metrics and logs
  • Resilient: auto-healing and rolling updates

Detailed Comparison

Criterion | Nomad | Kubernetes | Docker Swarm
Complexity | ⭐⭐ | ⭐⭐⭐⭐⭐ |
Flexibility | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐
Ecosystem | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐
Performance | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐
Multi-workload | ⭐⭐⭐⭐⭐ | ⭐⭐ |
Learning curve | ⭐⭐ | ⭐⭐⭐⭐⭐ |

Nomad Architecture

graph TB
    subgraph "Nomad Cluster"
        subgraph "Server Nodes (3+)"
            Leader["Leader Server<br/>🎯 Scheduler"]
            Server2["Server 2<br/>🔄 Follower"]
            Server3["Server 3<br/>🔄 Follower"]

            Leader -.->|Raft Consensus| Server2
            Leader -.->|Raft Consensus| Server3
        end

        subgraph "Client Nodes"
            Client1["Client Node 1<br/>🖥️ Worker"]
            Client2["Client Node 2<br/>🖥️ Worker"]
            Client3["Client Node 3<br/>🖥️ Worker"]
        end

        subgraph "Jobs & Allocations"
            Job1["Web App Job<br/>📦 3 replicas"]
            Job2["API Job<br/>🔌 2 replicas"]
            Job3["Worker Job<br/>⚙️ 5 replicas"]
        end
    end

    subgraph "HashiCorp Stack Integration"
        Consul["Consul<br/>🔍 Service Discovery"]
        Vault["Vault<br/>🔐 Secrets Management"]
        Terraform["Terraform<br/>🏗️ Infrastructure"]
    end

    %% Connections
    Leader --> Client1
    Leader --> Client2
    Leader --> Client3

    Client1 --> Job1
    Client2 --> Job2
    Client3 --> Job3

    %% Stack integration
    Leader -.->|Service Registration| Consul
    Client1 -.->|Secret Injection| Vault
    Terraform -.->|Provision| Leader

    classDef server fill:#4f46e5,stroke:#3730a3,color:#fff
    classDef client fill:#10b981,stroke:#047857,color:#fff
    classDef job fill:#f59e0b,stroke:#d97706,color:#000
    classDef stack fill:#ef4444,stroke:#dc2626,color:#fff

    class Leader,Server2,Server3 server
    class Client1,Client2,Client3 client
    class Job1,Job2,Job3 job
    class Consul,Vault,Terraform stack
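
Once a cluster with this topology is running, the control plane and worker pool can be inspected straight from the CLI; a quick sketch (ACL tokens and TLS flags are omitted here):

# List server nodes and see which one is the Raft leader
nomad server members

# List client (worker) nodes and their status
nomad node status

# List jobs and their allocations
nomad job status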

Installation & Configuration

Installing Nomad

# Download
wget https://releases.hashicorp.com/nomad/1.7.2/nomad_1.7.2_linux_amd64.zip
unzip nomad_1.7.2_linux_amd64.zip
sudo mv nomad /usr/local/bin/

# Verify
nomad version

# Shell autocompletion
nomad -autocomplete-install

# Temporary run for testing (bind to 0.0.0.0 so the published port is reachable)
docker run -d --name nomad-dev \
  -p 4646:4646 \
  hashicorp/nomad:latest agent -dev -bind 0.0.0.0

# Web UI access: http://localhost:4646

# /etc/systemd/system/nomad.service
[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/
Wants=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/nomad.d/nomad.hcl

[Service]
Type=notify
User=nomad
Group=nomad
ExecStart=/usr/local/bin/nomad agent -config=/etc/nomad.d/
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
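
With the unit file in place, the agent is managed through systemd; a short sketch, assuming the nomad user and the /etc/nomad.d/ configuration described below already exist:

sudo systemctl daemon-reload
sudo systemctl enable --now nomad
sudo systemctl status nomad

# Confirm the local agent is up and serving the API
nomad agent-info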

Server Configuration

/etc/nomad.d/server.hcl
# Nomad server configuration
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
log_level  = "INFO"
node_name  = "nomad-server-1"
bind_addr  = "0.0.0.0"

server {
  enabled          = true
  bootstrap_expect = 3  # Number of servers in the cluster

  # Encrypt gossip communications
  encrypt = "cg8StVXbQJ0gPvMd9o7yrg=="

  # Server join configuration
  server_join {
    retry_join = ["nomad-server-1:4648", "nomad-server-2:4648", "nomad-server-3:4648"]
  }
}

# ACLs (optional but recommended)
acl {
  enabled = true
}

# TLS Configuration (production)
tls {
  http = true
  rpc  = true

  ca_file   = "/etc/nomad.d/certs/ca.pem"
  cert_file = "/etc/nomad.d/certs/server.pem"
  key_file  = "/etc/nomad.d/certs/server-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}

# Consul integration
consul {
  address = "127.0.0.1:8500"

  # Auto-advertise services
  auto_advertise      = true
  server_auto_join    = true
  client_auto_join    = true

  # Service tags
  tags = ["nomad", "server"]
}

# Metrics
telemetry {
  collection_interval = "10s"
  disable_hostname    = true
  prometheus_metrics  = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}

# Web UI
ui {
  enabled = true

  # Consul/Vault integration in UI
  consul {
    ui_url = "http://consul.service.consul:8500/ui"
  }

  vault {
    ui_url = "http://vault.service.consul:8200/ui"
  }
}
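
The encrypt value above is a base64 gossip key; one can be generated with the Nomad CLI, and quorum can be verified once the three servers have joined. A sketch, assuming a recent Nomad CLI and, with ACLs enabled, a management token in NOMAD_TOKEN:

# Generate a gossip encryption key for the encrypt parameter
nomad operator keygen

# After the servers have joined, check Raft quorum and membership
nomad operator raft list-peers
nomad server members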

Client Configuration

/etc/nomad.d/client.hcl
# Nomad client configuration
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
log_level  = "INFO"
node_name  = "nomad-client-1"
bind_addr  = "0.0.0.0"

client {
  enabled = true

  # Server addresses to join
  servers = ["nomad-server-1:4647", "nomad-server-2:4647", "nomad-server-3:4647"]

  # Node configuration
  node_class = "compute"

  # Metadata for job constraints
  meta {
    "type" = "compute"
    "zone" = "us-west-1a"
    "instance_type" = "m5.large"
  }

  # Resource configuration
  reserved {
    cpu    = 500    # MHz reserved for system
    memory = 512    # MB reserved for system
    disk   = 1024   # MB reserved for system
  }

  # Network configuration
  network_interface = "eth0"

  # Host volumes
  host_volume "docker-sock" {
    path      = "/var/run/docker.sock"
    read_only = true
  }

  host_volume "logs" {
    path      = "/var/log"
    read_only = false
  }
}

# Plugin configuration
plugin "docker" {
  config {
    allow_privileged = false
    allow_caps = ["audit_write", "chown", "dac_override"]

    # Resource limits
    gc {
      image       = true
      image_delay = "10m"
      container   = true
    }

    # Volume mounts
    volumes {
      enabled = true
    }
  }
}

plugin "raw_exec" {
  config {
    enabled = false  # Disabled by default for security
  }
}

# Consul integration
consul {
  address = "127.0.0.1:8500"

  auto_advertise = true
  client_auto_join = true

  tags = ["nomad", "client"]
}

# Vault integration
vault {
  enabled = true
  address = "http://vault.service.consul:8200"

  # Task identity
  task_token_ttl = "1h"
  create_from_role = "nomad-cluster"
}
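
The node_class and meta values declared above become targeting handles in job specifications; a small sketch of a job pinned to these clients (the job name and placeholder task are illustrative):

job "compute-only" {
  datacenters = ["dc1"]

  # Only place on clients registered with node_class = "compute"
  constraint {
    attribute = "${node.class}"
    value     = "compute"
  }

  # And only in the zone advertised via client meta
  constraint {
    attribute = "${meta.zone}"
    value     = "us-west-1a"
  }

  group "app" {
    task "placeholder" {
      driver = "docker"

      config {
        image   = "alpine:3.19"
        command = "sleep"
        args    = ["infinity"]
      }
    }
  }
}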

Jobs & Workloads

Web Application Job

webapp.nomad
job "web-app" {
  datacenters = ["dc1"]
  type        = "service"

  # Update strategy
  update {
    max_parallel      = 2
    min_healthy_time  = "10s"
    healthy_deadline  = "3m"
    progress_deadline = "10m"
    auto_revert       = true
    canary            = 2
  }

  group "frontend" {
    count = 3

    # Networking
    network {
      port "http" {
        to = 8080
      }
    }

    # Service discovery
    service {
      name = "web-app"
      port = "http"

      tags = [
        "frontend",
        "traefik.enable=true",
        "traefik.http.routers.webapp.rule=Host(`app.example.com`)"
      ]

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }
    }

    # Restart policy
    restart {
      attempts = 3
      interval = "5m"
      delay    = "25s"
      mode     = "fail"
    }

    # Task definition
    task "web" {
      driver = "docker"

      config {
        image = "nginx:alpine"
        ports = ["http"]

        mount {
          type   = "bind"
          source = "local/nginx.conf"
          target = "/etc/nginx/nginx.conf"
        }
      }

      # Template configuration
      template {
        data = <<EOH
events {
    worker_connections 1024;
}

http {
    upstream backend {
        {{range service "api"}}
        server {{.Address}}:{{.Port}};
        {{end}}
    }

    server {
        listen 8080;

        location /api/ {
            proxy_pass http://backend/;
        }

        location /health {
            return 200 "OK";
        }
    }
}
EOH
        destination = "local/nginx.conf"
        change_mode = "restart"
      }

      # Resources
      resources {
        cpu    = 100  # MHz
        memory = 128  # MB
      }

      # Environment variables from Vault
      vault {
        policies = ["web-app"]
      }

      template {
        data = <<EOH
{{with secret "secret/web-app"}}
API_KEY="{{.Data.api_key}}"
DB_PASSWORD="{{.Data.db_password}}"
{{end}}
EOH
        destination = "secrets/app.env"
        env         = true
      }
    }
  }
}
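
A typical workflow for this job file, including promotion of the canaries defined in the update stanza (the deployment and allocation IDs are placeholders):

# Dry run: show the scheduler's placement plan
nomad job plan webapp.nomad

# Submit the job
nomad job run webapp.nomad

# Follow the rollout; canaries must be promoted before the full rollout
nomad job status web-app
nomad deployment promote <deployment-id>

# Inspect the logs of the "web" task in one allocation
nomad alloc logs <alloc-id> web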

API Backend Job

api.nomad
job "api" {
  datacenters = ["dc1"]
  type        = "service"

  group "backend" {
    count = 2

    network {
      port "api" {
        to = 3000
      }
    }

    service {
      name = "api"
      port = "api"

      tags = ["backend", "api"]

      check {
        type     = "http"
        path     = "/health"
        interval = "30s"
        timeout  = "5s"
      }
    }

    task "api-server" {
      driver = "docker"

      config {
        image = "node:18-alpine"
        ports = ["api"]

        command = "node"
        args    = ["server.js"]

        work_dir = "/app"

        mount {
          type   = "bind"
          source = "local/app"
          target = "/app"
        }
      }

      # Artifact download
      artifact {
        source      = "https://github.com/company/api/archive/v1.2.3.tar.gz"
        destination = "local/app"
        options {
          checksum = "sha256:abc123..."
        }
      }

      # Database connection
      template {
        data = <<EOH
{{with service "postgres"}}
{{with index . 0}}
DATABASE_URL="postgres://user:pass@{{.Address}}:{{.Port}}/mydb"
{{end}}
{{end}}
NODE_ENV="production"
PORT="3000"
EOH
        destination = "local/app/.env"
        change_mode = "restart"
      }

      resources {
        cpu    = 500
        memory = 512
      }
    }
  }
}

Batch Processing Job

batch-job.nomad
job "data-processing" {
  datacenters = ["dc1"]
  type        = "batch"

  # Parameterized job
  parameterized {
    payload       = "required"
    meta_required = ["input_file", "output_bucket"]
  }

  group "processor" {
    count = 1

    restart {
      attempts = 2
      delay    = "30s"
      mode     = "fail"
    }

    task "process" {
      driver = "docker"

      config {
        image = "python:3.11-slim"

        command = "python"
        args    = ["process.py", "${NOMAD_META_input_file}"]
      }

      # Script artifact
      artifact {
        source      = "s3://my-bucket/scripts/process.py"
        destination = "local/"
      }

      # AWS credentials from Vault
      vault {
        policies = ["data-processor"]
      }

      template {
        data = <<EOH
{{with secret "aws/creds/data-processor"}}
AWS_ACCESS_KEY_ID="{{.Data.access_key}}"
AWS_SECRET_ACCESS_KEY="{{.Data.secret_key}}"
{{end}}
OUTPUT_BUCKET="${NOMAD_META_output_bucket}"
EOH
        destination = "secrets/aws.env"
        env         = true
      }

      resources {
        cpu    = 1000
        memory = 2048
      }
    }
  }
}
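
Because the job is parameterized, nothing runs until it is dispatched; each dispatch creates a new instance with its own metadata, and since payload = "required" an input payload must be attached as well. A sketch (file and bucket names are placeholders):

# Dispatch one processing run with the required metadata and a payload file
nomad job dispatch \
  -meta input_file=raw/2024-01-01.csv \
  -meta output_bucket=my-results-bucket \
  data-processing ./payload.json

# List dispatched instances and their status
nomad job status data-processing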

HashiCorp Stack Integration

Consul Service Discovery

consul-integration.nomad
job "microservice" {
  group "app" {
    network {
      # Consul Connect sidecars require bridge networking
      mode = "bridge"

      port "http" {}
      port "grpc" {}
    }

    # Main service, registered in Consul with a Connect sidecar proxy
    service {
      name = "user-service"
      port = "http"

      tags = [
        "api",
        "version-v1.2.0",
        "traefik.enable=true"
      ]

      meta {
        version = "1.2.0"
        team    = "platform"
      }

      # Multiple health checks
      check {
        name     = "HTTP Health"
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "3s"
      }

      check {
        name     = "gRPC Health"
        type     = "grpc"
        port     = "grpc"
        interval = "15s"
        timeout  = "3s"
      }

      # Consul Connect (service mesh): Nomad registers the
      # "user-service-sidecar-proxy" service automatically
      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "database"
              local_bind_port  = 5432
            }

            upstreams {
              destination_name = "auth-service"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "app" {
      driver = "docker"

      config {
        image = "user-service:v1.2.0"
        ports = ["http", "grpc"]
      }

      # Service discovery via DNS
      template {
        data = <<EOH
# Services available via Consul DNS
DATABASE_HOST="database.service.consul"
AUTH_SERVICE_URL="http://auth-service.service.consul:8080"
CACHE_HOSTS="{{range service "redis"}}{{.Address}}:{{.Port}},{{end}}"
EOH
        destination = "local/services.env"
        env         = true
      }
    }
  }
}
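
To confirm the registration on the Consul side, the service can be queried through the catalog or through Consul DNS; a sketch assuming Consul's default ports 8500/8600 on the local agent:

# List registered services with their tags
consul catalog services -tags

# Resolve the service through Consul DNS
dig @127.0.0.1 -p 8600 user-service.service.consul SRV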

Vault Secrets Management

vault-integration.nomad
job "secure-app" {
  group "app" {
    task "web" {
      driver = "docker"

      # Vault policy required
      vault {
        policies = ["app-policy"]

        # Signal the task (instead of restarting it) when secrets rotate
        change_mode   = "signal"
        change_signal = "SIGUSR1"
      }

      # Database credentials (dynamic)
      template {
        data = <<EOH
{{with secret "database/creds/app-role"}}
DB_USERNAME="{{.Data.username}}"
DB_PASSWORD="{{.Data.password}}"
{{end}}
EOH
        destination = "secrets/db.env"
        env         = true
      }

      # Static secrets
      template {
        data = <<EOH
{{with secret "secret/app/config"}}
API_KEY="{{.Data.api_key}}"
ENCRYPTION_KEY="{{.Data.encryption_key}}"
{{end}}
EOH
        destination = "secrets/app.env"
        env         = true
      }

      # TLS certificates
      template {
        data = <<EOH
{{with secret "pki/issue/app-role" "common_name=app.service.consul" "ttl=24h"}}
{{.Data.certificate}}
{{end}}
EOH
        destination = "secrets/tls.crt"
        perms       = "400"
      }

      template {
        data = <<EOH
{{with secret "pki/issue/app-role" "common_name=app.service.consul" "ttl=24h"}}
{{.Data.private_key}}
{{end}}
EOH
        destination = "secrets/tls.key"
        perms       = "400"
      }

      config {
        image = "secure-app:latest"

        mount {
          type   = "bind"
          source = "secrets/tls.crt"
          target = "/etc/ssl/certs/app.crt"
        }

        mount {
          type   = "bind"
          source = "secrets/tls.key"
          target = "/etc/ssl/private/app.key"
        }
      }
    }
  }
}
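
On the Vault side, the app-policy referenced above must grant access to each of those paths; a minimal sketch of such a policy (the paths mirror the templates above, the policy body itself is an assumption):

# app-policy.hcl - upload with: vault policy write app-policy app-policy.hcl
path "database/creds/app-role" {
  capabilities = ["read"]
}

path "secret/app/config" {
  capabilities = ["read"]
}

path "pki/issue/app-role" {
  capabilities = ["create", "update"]
}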

Practical Use Cases

1. Migration Kubernetes → Nomad

Context: a startup with 50 microservices on K8s and excessive operational complexity.

Migration benefits:

  • Ops team: 3 → 1 person
  • Infrastructure cost: -40% (less overhead)
  • Time to market: deployments 3x faster
  • Incidents: -60% (simpler architecture)

Migration strategy:

graph LR
    K8s["Kubernetes<br/>50 services"] --> Hybrid["Migration Hybride<br/>6 mois"]
    Hybrid --> Nomad["Nomad<br/>50 services"]

    Hybrid --> Phase1["Phase 1<br/>Services stateless"]
    Hybrid --> Phase2["Phase 2<br/>Bases de données"]
    Hybrid --> Phase3["Phase 3<br/>Services critiques"]

2. Multi-Cloud Architecture

Goal: uniform deployments across AWS + Azure + on-premise

multi-cloud.nomad
job "global-app" {
  # Multi-datacenter deployment
  datacenters = ["aws-us-east-1", "azure-west-europe", "on-prem-dc1"]

  # Per-region constraints
  constraint {
    attribute = "${meta.cloud_provider}"
    operator  = "set_contains_any"
    value     = "aws,azure,on-prem"
  }

  group "app" {
    # 3 instances in total, spread across the datacenters below
    count = 3

    # Spread across datacenters
    spread {
      attribute = "${node.datacenter}"
      weight    = 100
    }

    task "service" {
      driver = "docker"

      config {
        image = "app:v1.0.0"
      }

      # Per-provider configuration
      template {
        data = <<EOH
{{if eq (env "node.datacenter") "aws-us-east-1"}}
STORAGE_BACKEND="s3"
REGION="us-east-1"
{{else if eq (env "node.datacenter") "azure-west-europe"}}
STORAGE_BACKEND="blob"
REGION="west-europe"
{{else}}
STORAGE_BACKEND="nfs"
REGION="on-premise"
{{end}}
EOH
        destination = "local/config.env"
        env         = true
      }
    }
  }
}

3. Edge Computing & IoT

Scenario: deploying edge applications to remote sites

edge-deployment.nomad
job "edge-processor" {
  datacenters = ["edge-*"]  # Tous les datacenters edge

  # Edge hardware constraints
  constraint {
    attribute = "${meta.node_type}"
    value     = "edge"
  }

  constraint {
    attribute = "${node.class}"
    value     = "edge-compute"
  }

  group "processor" {
    # Single instance (for one copy per edge node, see the system job sketch after this job)
    count = 1

    # Local persistence
    volume "edge-data" {
      type            = "host"
      source          = "edge-storage"
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
    }

    task "data-processor" {
      driver = "docker"

      config {
        image = "edge-processor:arm64-v1.0"

        # Optimized for ARM64
        platform = "linux/arm64"
      }

      volume_mount {
        volume      = "edge-data"
        destination = "/data"
      }

      # Site-specific configuration
      template {
        data = <<EOH
SITE_ID="${meta.site_id}"
LAT="${meta.latitude}"
LON="${meta.longitude}"
SYNC_INTERVAL="300s"
CLOUD_ENDPOINT="https://central.company.com/api"
EOH
        destination = "local/site.env"
        env         = true
      }

      # Limited edge resources
      resources {
        cpu    = 200
        memory = 256
      }
    }
  }
}
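
When the goal really is one copy per edge node rather than a single allocation, Nomad's system scheduler places an instance on every eligible client; a minimal sketch (the job name is illustrative):

job "edge-agent" {
  datacenters = ["edge-*"]

  # The system scheduler runs one allocation on every client
  # that satisfies the job's constraints
  type = "system"

  constraint {
    attribute = "${node.class}"
    value     = "edge-compute"
  }

  group "agent" {
    task "agent" {
      driver = "docker"

      config {
        image = "edge-processor:arm64-v1.0"
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}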

Monitoring & Observability

Prometheus Metrics

monitoring.nomad
job "monitoring-stack" {
  group "prometheus" {
    network {
      port "prometheus" {
        static = 9090
      }
    }

    service {
      name = "prometheus"
      port = "prometheus"

      tags = [
        "monitoring",
        "traefik.enable=true",
        "traefik.http.routers.prometheus.rule=Host(`prometheus.company.com`)"
      ]
    }

    task "prometheus" {
      driver = "docker"

      config {
        image = "prom/prometheus:latest"
        ports = ["prometheus"]

        args = [
          "--config.file=/etc/prometheus/prometheus.yml",
          "--storage.tsdb.path=/prometheus",
          "--storage.tsdb.retention.time=30d",
          "--web.console.libraries=/etc/prometheus/console_libraries",
          "--web.console.templates=/etc/prometheus/consoles",
          "--web.enable-lifecycle"
        ]

        mount {
          type   = "bind"
          source = "local/prometheus.yml"
          target = "/etc/prometheus/prometheus.yml"
        }
      }

      # Prometheus configuration with Nomad auto-discovery via Consul
      template {
        data = <<EOH
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "nomad_rules.yml"

scrape_configs:
  # Nomad servers
  - job_name: 'nomad-servers'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: ['nomad']
        tags: ['server']
    relabel_configs:
      - source_labels: [__meta_consul_address]
        target_label: __address__
        replacement: '$${1}:4646'  # node address + Nomad HTTP port ($$ escapes HCL2 interpolation)
    metrics_path: /v1/metrics
    params:
      format: ['prometheus']

  # Nomad clients
  - job_name: 'nomad-clients'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: ['nomad-client']
    relabel_configs:
      - source_labels: [__meta_consul_address]
        target_label: __address__
        replacement: '$${1}:4646'  # node address + Nomad HTTP port ($$ escapes HCL2 interpolation)
    metrics_path: /v1/metrics
    params:
      format: ['prometheus']

  # Application services
  - job_name: 'services'
    consul_sd_configs:
      - server: 'localhost:8500'
    relabel_configs:
      - source_labels: [__meta_consul_service_metadata_metrics]
        action: keep
        regex: 'true'
      - source_labels: [__meta_consul_service_metadata_metrics_path]
        target_label: __metrics_path__
        regex: '(.+)'
        replacement: '$${1}'
EOH
        destination = "local/prometheus.yml"
        change_mode = "restart"
      }

      resources {
        cpu    = 500
        memory = 1024
      }
    }
  }

  group "grafana" {
    network {
      port "grafana" {
        static = 3000
      }
    }

    service {
      name = "grafana"
      port = "grafana"

      tags = [
        "monitoring",
        "dashboard",
        "traefik.enable=true",
        "traefik.http.routers.grafana.rule=Host(`grafana.company.com`)"
      ]
    }

    task "grafana" {
      driver = "docker"

      config {
        image = "grafana/grafana:latest"
        ports = ["grafana"]

        mount {
          type   = "bind"
          source = "local/grafana.ini"
          target = "/etc/grafana/grafana.ini"
        }
      }

      # Grafana configuration
      template {
        data = <<EOH
[server]
http_port = 3000

[database]
type = sqlite3
path = /var/lib/grafana/grafana.db

[security]
admin_user = admin
; the admin password is injected via the GF_SECURITY_ADMIN_PASSWORD
; environment variable rendered from Vault below

[auth.anonymous]
enabled = true
org_role = Viewer

[dashboards.json]
enabled = true
path = /var/lib/grafana/dashboards
EOH
        destination = "local/grafana.ini"
      }

      env {
        GF_INSTALL_PLUGINS = "grafana-clock-panel,grafana-simple-json-datasource"
      }

      vault {
        policies = ["grafana"]
      }

      template {
        data = <<EOH
{{with secret "secret/grafana"}}
GF_SECURITY_ADMIN_PASSWORD="{{.Data.admin_password}}"
{{end}}
EOH
        destination = "secrets/grafana.env"
        env         = true
      }

      resources {
        cpu    = 200
        memory = 512
      }
    }
  }
}
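
The telemetry stanza shown in the server configuration is what feeds these scrape jobs; Nomad's own metrics endpoint can be checked directly before wiring up Prometheus:

# Nomad agent metrics in Prometheus exposition format
curl -s "http://localhost:4646/v1/metrics?format=prometheus" | head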

Security & Production

TLS/mTLS Configuration

nomad-tls.hcl
# Full TLS configuration
datacenter = "dc1"
data_dir   = "/opt/nomad/data"

# TLS Configuration
tls {
  http = true
  rpc  = true

  ca_file   = "/etc/nomad.d/certs/ca.pem"
  cert_file = "/etc/nomad.d/certs/nomad.pem"
  key_file  = "/etc/nomad.d/certs/nomad-key.pem"

  # Mutual TLS
  verify_server_hostname = true
  verify_https_client    = true

  # Strong ciphers only
  tls_min_version = "tls12"
  tls_cipher_suites = [
    "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"
  ]
}

# ACLs Configuration
acl {
  enabled = true
  token_ttl = "30s"
  policy_ttl = "60s"

  # Bootstrap token (rotate it after setup)
  # nomad acl bootstrap
}

# Audit logging (Nomad Enterprise feature)
audit {
  enabled = true

  sink "file" {
    type   = "file"
    format = "json"
    path   = "/var/log/nomad/audit.log"

    delivery_guarantee = "enforced"

    rotate_bytes = 100000000  # 100MB
    rotate_max_files = 10
  }
}
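
The certificates referenced above can be generated with Nomad's built-in TLS helper (available since Nomad 1.4); a sketch, with the generated file names to be moved or renamed to match the paths in the configuration:

# Create a CA, then server, client and CLI certificates for region "global"
nomad tls ca create
nomad tls cert create -server -region global
nomad tls cert create -client
nomad tls cert create -cli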

ACL Policies

policies.hcl
# Developer policy (e.g. dev-policy.hcl)
namespace "dev" {
  policy = "write"

  # Limitations
  capabilities = ["submit-job", "dispatch-job", "read-logs"]
}

namespace "prod" {
  policy = "deny"
}

node {
  policy = "read"
}

agent {
  policy = "read"
}

# Admin policy (e.g. admin-policy.hcl, uploaded as a separate policy)
namespace "*" {
  policy = "write"
  capabilities = ["*"]
}

node {
  policy = "write"
}

agent {
  policy = "write"
}

operator {
  policy = "write"
}

quota {
  policy = "write"
}

plugin {
  policy = "write"
}
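
Each policy is uploaded as its own named policy and then attached to tokens; a sketch, with file names following the comments above:

# Bootstrap ACLs once and store the management token safely
nomad acl bootstrap

# Upload the two policies
nomad acl policy apply -description "Developer access" dev-policy dev-policy.hcl
nomad acl policy apply -description "Admin access" admin-policy admin-policy.hcl

# Issue a client token bound to the developer policy
nomad acl token create -name="dev-token" -type=client -policy=dev-policy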

Production Experience Report

Real-World Metrics

Production cluster:

  • Size: 50 nodes (3 servers + 47 clients)
  • Workloads: 200+ services, 50+ batch jobs
  • Uptime: 99.95% over 18 months
  • MTTR: 2.3 minutes on average
  • Deployments: 150+ per week

Before/after comparison of the K8s → Nomad migration:

Metric | Kubernetes | Nomad | Improvement
Cluster setup time | 2-3 days | 2-3 hours | 90%
Deployment time | 5-8 minutes | 30-60 seconds | 80%
RAM overhead | 4-6 GB | 500 MB | 85%
Config complexity | 200+ lines of YAML | 50 lines of HCL | 75%
Learning curve | 6 months | 2 weeks | 90%

Patterns & Anti-Patterns

✅ Best Practices:

  • Declarative jobs: everything in HCL, versioned in Git
  • Externalized secrets: systematic Vault integration
  • Multiple health checks: HTTP + TCP + custom
  • Resource constraints: always define CPU/memory
  • Gradual rollouts: canary + auto-revert

❌ Anti-patterns to avoid:

  • Raw exec driver: high security risk
  • Hardcoded secrets: in templates or configs
  • Jobs without resource limits: resource hogging
  • Overly aggressive update strategy: avoidable downtime
  • Insufficient monitoring: alerting on basic metrics only

Optimal Use Cases

Nomad excels at:

  • Mixed workloads: containers + VMs + binaries
  • Mid-sized teams: 5-50 developers
  • Multi-cloud/hybrid: uniform deployments
  • Edge computing: constrained resources
  • Operational simplicity: small ops team

Kubernetes remains the better fit for:

  • Rich ecosystem: Operators, Helm charts
  • Cloud-native applications: 12-factor apps
  • Large teams: 100+ developers
  • Specialized needs: complex service mesh, CRDs

Conclusion

HashiCorp Nomad is a solid, pragmatic alternative to Kubernetes for many use cases. Its simplicity of installation, configuration, and maintenance makes it a sound choice for teams that value operational efficiency over feature completeness.

Recommendations:

  1. Assess your actual needs: Nomad vs K8s depending on team size and complexity
  2. Start simple: a dev cluster in 30 minutes with -dev mode (see the sketch below)
  3. Integrate the HashiCorp stack: Consul + Vault + Terraform
  4. Invest in observability: metrics, logs, alerting
  5. Automate security: TLS, ACLs, secret rotation
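
As mentioned in recommendation 2, a throwaway dev cluster takes a couple of commands; a sketch (single node, no ACLs or TLS, not for production):

# Single-node dev agent (server + client) with the web UI on :4646
nomad agent -dev -bind 0.0.0.0

# In another terminal: scaffold and run an example job
nomad job init -short example.nomad
nomad job run example.nomad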

Nomad lets you focus on your applications rather than on the orchestrator - and that is precisely its greatest strength.