In a dynamic Kubernetes environment, keeping logs available while performing rolling upgrades or configuration changes is critical for observability and troubleshooting. This guide walks you through a concrete, step‑by‑step workflow that couples Fluent Bit and Promtail on Amazon Elastic Kubernetes Service (EKS) to deliver uninterrupted log ingestion into Loki. By pairing Fluent Bit, a lightweight data shipper, with the flexible Promtail agent, you can maintain a continuous data pipeline even when nodes are cordoned, drained, or replaced.
Why Combine Fluent Bit and Promtail?
Both Fluent Bit and Promtail excel at collecting logs, but each brings unique strengths that complement one another on EKS:
- Fluent Bit is a high‑performance, resource‑efficient agent that can be deployed as a DaemonSet to capture container logs and system metrics.
- Promtail is the official Loki client designed for Kubernetes. It auto‑discovers pods, attaches labels, and streams logs directly to Loki with minimal configuration.
- When paired, Fluent Bit can preprocess and filter logs, while Promtail enriches them with Kubernetes metadata, ensuring that every log line carries the necessary context.
The combination also mitigates the risk of log loss during node upgrades: if a node is taken offline, Fluent Bit can route its logs to a central location, and Promtail continues to emit new logs from healthy nodes without interruption.
Prerequisites
Before you begin, make sure you have the following:
- A functional EKS cluster (version 1.28 or later) with administrative access.
- kubectl configured to communicate with your cluster.
- Helm 3 installed on your local machine.
- A Loki instance running in the cluster, deployed either via the Grafana Loki Helm chart or your own manifests.
- Basic familiarity with Kubernetes resources such as DaemonSets, ConfigMaps, and Services.
Step 1: Deploy Loki with a High‑Availability Configuration
Start by installing Loki with multiple replicas so it can survive node drain events. The following Helm command installs the loki-stack chart in the monitoring namespace with persistent storage and two replicas; for true high availability, back Loki with object storage such as S3 or GCS. Value keys vary between chart versions, so confirm them with helm show values grafana/loki-stack:
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring --create-namespace \
  --set loki.persistence.enabled=true \
  --set loki.persistence.storageClassName=gp2 \
  --set loki.persistence.size=200Gi \
  --set loki.replicas=2
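The command above keeps chunks on an EBS-backed PersistentVolume. For the object-storage variant mentioned earlier, a values file is easier to manage. The sketch below is illustrative only: the bucket name is a placeholder and the exact keys should be checked against helm show values grafana/loki-stack for your chart version.

```yaml
# loki-values.yaml -- illustrative sketch; verify keys for your chart version
loki:
  replicas: 2
  persistence:
    enabled: true
    storageClassName: gp2
    size: 200Gi
  config:
    storage_config:
      aws:
        s3: s3://us-east-1/my-loki-chunks   # placeholder bucket
        s3forcepathstyle: false
```

Apply it with helm install loki grafana/loki-stack --namespace monitoring --create-namespace -f loki-values.yaml.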
After deployment, confirm that the Loki pods are running:
kubectl get pods -n monitoring -l app=loki
Step 2: Deploy Fluent Bit as a DaemonSet
Fluent Bit will handle raw log ingestion and basic filtering. The fluent/fluent-bit chart renders its ConfigMap from values under its config: section, so pass the configuration through a small values file rather than inline --set flags (value keys can differ between chart versions, so check helm show values fluent/fluent-bit):
helm repo add fluent https://fluent.github.io/helm-charts
helm install fluent-bit fluent/fluent-bit \
  --namespace kube-system \
  -f fluent-bit-values.yaml
In the values file, place the [SERVICE] section under config.service and the [INPUT] and [OUTPUT] sections under config.inputs and config.outputs. Example configuration:
[SERVICE]
    Flush        1
    Log_Level    info
    Parsers_File parsers.conf
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Tag               kube.*
    multiline.parser  docker, cri
    Skip_Long_Lines   On
[OUTPUT]
    Name    loki
    Match   *
    Host    loki.monitoring.svc
    Port    3100
    Labels  job=fluent-bit
Note that EKS nodes run containerd rather than Docker, so the CRI-aware multiline parser is used in place of the Docker-only Docker_Mode setting, and the built-in loki output plugin pushes to Loki's current /loki/api/v1/push endpoint by default rather than the legacy /api/prom/push path.
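If you want Fluent Bit itself to perform the preprocessing role described earlier, merging JSON payloads and attaching pod metadata, the built-in kubernetes filter can sit between the input and output sections. A minimal sketch:

```ini
[FILTER]
    Name      kubernetes
    Match     kube.*
    Kube_URL  https://kubernetes.default.svc:443
    Merge_Log On
    Keep_Log  Off
```

Merge_Log lifts JSON log bodies into structured fields, and the filter annotates each record with its namespace, pod, and container names.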
Fluent Bit’s output plugin streams logs directly to Loki, but we’ll still deploy Promtail for Kubernetes label enrichment in the next step.
Step 3: Install Promtail with Label Injection
Promtail enriches logs with Kubernetes metadata. Deploy it with Helm and point its client at the Loki service; the chart ships a default kubernetes-pods scrape configuration that discovers pods through the Kubernetes API and turns namespace, pod, and container names into Loki labels (check helm show values grafana/promtail for the exact keys in your chart version):
helm repo add grafana https://grafana.github.io/helm-charts
helm install promtail grafana/promtail \
  --namespace kube-system \
  --set "config.clients[0].url=http://loki.monitoring.svc:3100/loki/api/v1/push"
If you need custom pipeline stages or relabeling rules, express them in a values file rather than --set flags; the nested scrape_configs structure is awkward on the command line.
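For reference, a standalone promtail-config.yaml roughly equivalent to what the chart generates looks like this. It is a sketch: the chart's generated config is more complete, and paths and ports should be checked against your deployment.

```yaml
# promtail-config.yaml -- minimal sketch of the chart-generated config
server:
  http_listen_port: 9080
positions:
  filename: /run/promtail/positions.yaml
clients:
  - url: http://loki.monitoring.svc:3100/loki/api/v1/push
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
      # map each discovered pod to the log files it writes on the node
      - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
        separator: /
        replacement: /var/log/pods/*$1/*.log
        target_label: __path__
```

The final relabel rule is what ties service discovery to the files on disk: without a __path__ label, Promtail has nothing to tail.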
Promtail will now watch the /var/log/containers/*.log directory, attach labels such as namespace and pod_name, and push enriched logs to Loki. Because Fluent Bit already streams logs, Promtail’s role is to supplement with metadata for pods that Fluent Bit might miss during upgrades.
Step 4: Configure Zero‑Downtime Node Draining
With both agents running, you can safely perform rolling upgrades or maintenance. Here’s a recommended process:
- Mark the node as unschedulable using kubectl cordon <node-name>.
- Use kubectl drain <node-name> --ignore-daemonsets to evict regular pods; DaemonSet pods such as Fluent Bit and Promtail keep running.
- Verify that the DaemonSets are still healthy after the drain: kubectl get pods -n kube-system -l app.kubernetes.io/name=fluent-bit should show a ready pod on every remaining node.
- Once the node is drained, update the node image or apply security patches.
- After upgrading, run kubectl uncordon <node-name> to resume scheduling.
Because Fluent Bit captures logs from /var/log/containers regardless of pod status, and Promtail enriches labels on new pods, there is no gap in log ingestion during this process.
Step 5: Validate Continuous Log Flow
After deploying the agents and performing a test drain, you can confirm continuous logging with Grafana or Loki’s built‑in UI:
- Query Loki's range endpoint, http://loki.monitoring.svc:3100/loki/api/v1/query_range (port-forward the service if you are outside the cluster), with a query such as {job="fluent-bit"}.
- Check that logs appear with the expected timestamps even during the drain.
- Inspect Promtail's metrics endpoint, http://promtail.kube-system.svc:9080/metrics, to ensure it is collecting data from all pods.
Tuning for Performance and Reliability
For production workloads, consider the following optimizations:
- Buffering and Back‑pressure: Increase Fluent Bit's Buffer_Chunk_Size and Buffer_Max_Size in the ConfigMap to handle burst traffic.
- Compression: Enable gzip compression in the Loki output plugin to reduce network usage.
- Horizontal Scaling of Loki: Add more replicas to the Loki StatefulSet if you observe query latency spikes.
- Health Checks: Expose liveness and readiness probes for both agents to let Kubernetes restart them automatically on failure.
- Sidecar Mode: Enable sidecar mode for Promtail to reduce overhead when running on the same node as Fluent Bit.
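The buffering advice above can be made concrete with Fluent Bit's filesystem storage, which lets buffered chunks survive an agent restart during node maintenance. Paths and sizes below are illustrative, not tuned values:

```ini
[SERVICE]
    # persist buffered chunks to disk so they survive agent restarts
    storage.path               /var/log/flb-storage/
    storage.sync               normal
    storage.backlog.mem_limit  50M

[INPUT]
    Name               tail
    Path               /var/log/containers/*.log
    Buffer_Chunk_Size  1MB
    Buffer_Max_Size    5MB
    storage.type       filesystem
```

With storage.type filesystem on the input, back-pressure from a temporarily unreachable Loki spills to disk instead of dropping records.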
Monitoring the Log Pipeline
Use Prometheus and Grafana dashboards to monitor the health of Fluent Bit and Promtail. Key metrics include:
- fluentbit_output_proc_records_total – records successfully delivered to Loki by Fluent Bit's outputs.
- promtail_sent_entries_total and promtail_dropped_entries_total – entries Promtail has shipped to Loki or dropped on error.
- loki_ingester_memory_streams – active stream count, a proxy for label‑cardinality growth and potential retention issues.
Set up alerts for high back‑pressure or failed pushes to ensure you’re notified before log loss occurs.
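As a starting point, a Prometheus alerting rule along these lines catches sustained push failures. The metric name comes from Fluent Bit's built-in Prometheus exporter; the threshold and durations are illustrative:

```yaml
groups:
  - name: log-pipeline
    rules:
      - alert: FluentBitOutputErrors
        # fires when any Fluent Bit output has been erroring for 10 minutes
        expr: rate(fluentbit_output_errors_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Fluent Bit is failing to push logs to Loki"
```

A matching rule on promtail_dropped_entries_total covers the Promtail side of the pipeline.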
Conclusion
By orchestrating Fluent Bit as a lightweight collector and Promtail as a metadata injector, you create a resilient log aggregation stack on EKS that can survive node upgrades, maintenance windows, and unexpected failures without dropping logs. The combination offers a low‑overhead, Kubernetes‑native solution that scales with your cluster, ensuring that observability remains intact even in the most dynamic environments.
