In a dynamic Kubernetes environment, keeping logs available while performing rolling upgrades or configuration changes is critical for observability and troubleshooting. This guide walks you through a concrete, step‑by‑step workflow that couples Fluent Bit and Promtail on Amazon Elastic Kubernetes Service (EKS) to deliver uninterrupted log ingestion into Loki. By pairing Fluent Bit, a lightweight data shipper, with the flexible Promtail agent, you can maintain a continuous data pipeline even when nodes are cordoned, drained, or replaced.
Why Combine Fluent Bit and Promtail?
Both Fluent Bit and Promtail excel at collecting logs, but each brings unique strengths that complement one another on EKS:
- Fluent Bit is a high‑performance, resource‑efficient agent that can be deployed as a DaemonSet to capture container logs and system metrics.
- Promtail is the official Loki client designed for Kubernetes. It auto‑discovers pods, attaches labels, and streams logs directly to Loki with minimal configuration.
- When paired, Fluent Bit can preprocess and filter logs, while Promtail enriches them with Kubernetes metadata, ensuring that every log line carries the necessary context.
The combination also mitigates the risk of log loss during node upgrades: if a node is taken offline, Fluent Bit can route its logs to a central location, and Promtail continues to emit new logs from healthy nodes without interruption.
Prerequisites
Before you begin, make sure you have the following:
- A functional EKS cluster (version 1.28 or later) with administrative access.
- kubectl configured to communicate with your cluster.
- Helm 3 installed on your local machine.
- A Loki instance running in the cluster, deployed either via the Grafana Loki Helm chart or your own manifests.
- Basic familiarity with Kubernetes resources such as DaemonSets, ConfigMaps, and Services.
Step 1: Deploy Loki with a High‑Availability Configuration
Start by installing Loki with multiple replicas so it can survive node drain events. The following Helm command installs the loki-stack chart in the monitoring namespace with persistent storage and two replicas; for true high availability, back Loki with object storage such as S3 or GCS. Value keys vary between chart versions, so confirm them with helm show values grafana/loki-stack:
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring --create-namespace \
  --set loki.persistence.enabled=true \
  --set loki.persistence.storageClassName=gp2 \
  --set loki.persistence.size=200Gi \
  --set loki.replicas=2
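The command above keeps chunks on an EBS-backed PersistentVolume. For the object-storage variant mentioned earlier, a values file is easier to manage. The sketch below is illustrative only: the bucket name is a placeholder and the exact keys should be checked against helm show values grafana/loki-stack for your chart version.

```yaml
# loki-values.yaml -- illustrative sketch; verify keys for your chart version
loki:
  replicas: 2
  persistence:
    enabled: true
    storageClassName: gp2
    size: 200Gi
  config:
    storage_config:
      aws:
        s3: s3://us-east-1/my-loki-chunks   # placeholder bucket
        s3forcepathstyle: false
```

Apply it with helm install loki grafana/loki-stack --namespace monitoring --create-namespace -f loki-values.yaml.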
After deployment, confirm that the Loki pods are running:
kubectl get pods -n monitoring -l app=loki
Step 2: Deploy Fluent Bit as a DaemonSet
Fluent Bit will handle raw log ingestion and basic filtering. The fluent/fluent-bit chart renders its ConfigMap from values under its config: section, so pass the configuration through a small values file rather than inline --set flags (value keys can differ between chart versions, so check helm show values fluent/fluent-bit):
helm repo add fluent https://fluent.github.io/helm-charts
helm install fluent-bit fluent/fluent-bit \
  --namespace kube-system \
  -f fluent-bit-values.yaml
In the values file, place the [SERVICE] section under config.service and the [INPUT] and [OUTPUT] sections under config.inputs and config.outputs. Example configuration:
[SERVICE]
    Flush        1
    Log_Level    info
    Parsers_File parsers.conf
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Tag               kube.*
    multiline.parser  docker, cri
    Skip_Long_Lines   On
[OUTPUT]
    Name    loki
    Match   *
    Host    loki.monitoring.svc
    Port    3100
    Labels  job=fluent-bit
Note that EKS nodes run containerd rather than Docker, so the CRI-aware multiline parser is used in place of the Docker-only Docker_Mode setting, and the built-in loki output plugin pushes to Loki's current /loki/api/v1/push endpoint by default rather than the legacy /api/prom/push path.
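If you want Fluent Bit itself to perform the preprocessing role described earlier, merging JSON payloads and attaching pod metadata, the built-in kubernetes filter can sit between the input and output sections. A minimal sketch:

```ini
[FILTER]
    Name      kubernetes
    Match     kube.*
    Kube_URL  https://kubernetes.default.svc:443
    Merge_Log On
    Keep_Log  Off
```

Merge_Log lifts JSON log bodies into structured fields, and the filter annotates each record with its namespace, pod, and container names.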
Fluent Bit’s output plugin streams logs directly to Loki, but we’ll still deploy Promtail for Kubernetes label enrichment in the next step.
Step 3: Install Promtail with Label Injection
Promtail enriches logs with Kubernetes metadata. Deploy it with Helm and point its client at the Loki service; the chart ships a default kubernetes-pods scrape configuration that discovers pods through the Kubernetes API and turns namespace, pod, and container names into Loki labels (check helm show values grafana/promtail for the exact keys in your chart version):
helm repo add grafana https://grafana.github.io/helm-charts
helm install promtail grafana/promtail \
  --namespace kube-system \
  --set "config.clients[0].url=http://loki.monitoring.svc:3100/loki/api/v1/push"
If you need custom pipeline stages or relabeling rules, express them in a values file rather than --set flags; the nested scrape_configs structure is awkward on the command line.
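For reference, a standalone promtail-config.yaml roughly equivalent to what the chart generates looks like this. It is a sketch: the chart's generated config is more complete, and paths and ports should be checked against your deployment.

```yaml
# promtail-config.yaml -- minimal sketch of the chart-generated config
server:
  http_listen_port: 9080
positions:
  filename: /run/promtail/positions.yaml
clients:
  - url: http://loki.monitoring.svc:3100/loki/api/v1/push
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
      # map each discovered pod to the log files it writes on the node
      - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
        separator: /
        replacement: /var/log/pods/*$1/*.log
        target_label: __path__
```

The final relabel rule is what ties service discovery to the files on disk: without a __path__ label, Promtail has nothing to tail.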
Promtail will now watch the /var/log/containers/*.log directory, attach labels such as namespace and pod_name, and push enriched logs to Loki. Because Fluent Bit already streams logs, Promtail’s role is to supplement with metadata for pods that Fluent Bit might miss during upgrades.
Step 4: Configure Zero‑Downtime Node Draining
With both agents running, you can safely perform rolling upgrades or maintenance. Here’s a recommended process:
- Mark the node as unschedulable using kubectl cordon <node-name>.
- Use kubectl drain <node-name> --ignore-daemonsets to evict regular pods; DaemonSet pods such as Fluent Bit and Promtail keep running.
- Verify that the DaemonSets are still healthy after the drain: kubectl get pods -n kube-system -l app.kubernetes.io/name=fluent-bit should show a ready pod on every remaining node.
- Once the node is drained, update the node image or apply security patches.
- After upgrading, run kubectl uncordon <node-name> to resume scheduling.
Because Fluent Bit captures logs from /var/log/containers regardless of pod status, and Promtail enriches labels on new pods, there is no gap in log ingestion during this process.
Step 5: Validate Continuous Log Flow
After deploying the agents and performing a test drain, you can confirm continuous logging with Grafana or Loki’s built‑in UI:
- Query Loki's range endpoint, http://loki.monitoring.svc:3100/loki/api/v1/query_range (port-forward the service if you are outside the cluster), with a query such as {job="fluent-bit"}.
- Check that logs appear with the expected timestamps even during the drain.
- Inspect Promtail's metrics endpoint, http://promtail.kube-system.svc:9080/metrics, to ensure it is collecting data from all pods.
Tuning for Performance and Reliability
For production workloads, consider the following optimizations:
- Buffering and Back‑pressure: Increase Fluent Bit's Buffer_Chunk_Size and Buffer_Max_Size in the ConfigMap to handle burst traffic.
- Compression: Enable gzip compression in the Loki output plugin to reduce network usage.
- Horizontal Scaling of Loki: Add more replicas to the Loki StatefulSet if you observe query latency spikes.
- Health Checks: Expose liveness and readiness probes for both agents to let Kubernetes restart them automatically on failure.
- Sidecar Mode: Enable sidecar mode for Promtail to reduce overhead when running on the same node as Fluent Bit.
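The buffering advice above can be made concrete with Fluent Bit's filesystem storage, which lets buffered chunks survive an agent restart during node maintenance. Paths and sizes below are illustrative, not tuned values:

```ini
[SERVICE]
    # persist buffered chunks to disk so they survive agent restarts
    storage.path               /var/log/flb-storage/
    storage.sync               normal
    storage.backlog.mem_limit  50M

[INPUT]
    Name               tail
    Path               /var/log/containers/*.log
    Buffer_Chunk_Size  1MB
    Buffer_Max_Size    5MB
    storage.type       filesystem
```

With storage.type filesystem on the input, back-pressure from a temporarily unreachable Loki spills to disk instead of dropping records.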
Monitoring the Log Pipeline
Use Prometheus and Grafana dashboards to monitor the health of Fluent Bit and Promtail. Key metrics include:
- fluentbit_output_proc_records_total – records successfully delivered to Loki by Fluent Bit's outputs.
- promtail_sent_entries_total and promtail_dropped_entries_total – entries Promtail has shipped to Loki or dropped on error.
- loki_ingester_memory_streams – active stream count, a proxy for label‑cardinality growth and potential retention issues.
Set up alerts for high back‑pressure or failed pushes to ensure you’re notified before log loss occurs.
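As a starting point, a Prometheus alerting rule along these lines catches sustained push failures. The metric name comes from Fluent Bit's built-in Prometheus exporter; the threshold and durations are illustrative:

```yaml
groups:
  - name: log-pipeline
    rules:
      - alert: FluentBitOutputErrors
        # fires when any Fluent Bit output has been erroring for 10 minutes
        expr: rate(fluentbit_output_errors_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Fluent Bit is failing to push logs to Loki"
```

A matching rule on promtail_dropped_entries_total covers the Promtail side of the pipeline.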
Conclusion
By orchestrating Fluent Bit as a lightweight collector and Promtail as a metadata injector, you create a resilient log aggregation stack on EKS that can survive node upgrades, maintenance windows, and unexpected failures without dropping logs. The combination offers a low‑overhead, Kubernetes‑native solution that scales with your cluster, ensuring that observability remains intact even in the most dynamic environments.
