Falco Runtime Threat Detection
- Status: accepted
- Date: 2026-02-09
- Deciders: Billy
- Technical Story: Deploy runtime security monitoring to detect anomalous behavior and threats inside running containers
Context and Problem Statement
Admission policies (Gatekeeper) and vulnerability scanning (Trivy) operate at deploy-time and scan-time respectively, but neither detects runtime threats — a container executing unexpected commands, opening unusual network connections, or reading sensitive files after it has been admitted.
How do we detect runtime security threats in a Talos Linux environment where kernel module compilation is impossible?
Decision Drivers
- Detect runtime threats that admission policies can't prevent
- Must work on Talos Linux (immutable root filesystem, no kernel headers)
- Alert on suspicious activity without blocking legitimate workloads
- Stream alerts into the existing notification pipeline (Alertmanager → ntfy → Discord)
- Minimal performance impact on AI/ML GPU workloads
Considered Options
- Falco with modern eBPF driver — CNCF runtime security
- Tetragon — Cilium-based eBPF security observability
- Sysdig Secure — Commercial runtime security
- No runtime detection — Rely on admission policies only
Decision Outcome
Chosen option: Option 1 - Falco with modern eBPF, because it's CNCF graduated, supports the modern eBPF driver required for Talos, and integrates with Alertmanager via Falcosidekick.
Positive Consequences
- Detects container escapes, unexpected shells, and sensitive file reads at runtime
- modern_ebpf driver works on Talos without kernel module compilation
- Falcosidekick routes alerts to Alertmanager, integrating with existing pipeline
- JSON output enables structured log processing
- Runs on every node including control plane via tolerations
Negative Consequences
- eBPF instrumentation adds CPU/memory overhead on every node (each Falco pod requests 100m CPU and 512Mi memory)
- Tuning rules to reduce false positives requires ongoing attention
- The Falcosidekick Web UI adds a Redis dependency for storing events
Architecture
┌─────────────────────────────────────────────────────────────┐
│                     Every Cluster Node                      │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                  Falco (DaemonSet)                  │    │
│  │  Driver: modern_ebpf (least-privileged)             │    │
│  │  Runtime: containerd socket                         │    │
│  │                                                     │    │
│  │  Kernel syscalls → eBPF probes → Rule evaluation    │    │
│  │                                         │           │    │
│  │                                         ▼           │    │
│  │                                 JSON alert output   │    │
│  └─────────────────────────────────┬───────────────────┘    │
│                                    │ gRPC                   │
└────────────────────────────────────┼────────────────────────┘
                                     │
                                     ▼
                        ┌────────────────────────┐
                        │     Falcosidekick      │
                        │                        │
                        │  → Alertmanager        │
                        │  → Prometheus metrics  │
                        │  → Web UI              │
                        └────────────┬───────────┘
                                     │
                    ┌────────────────┼─────────────────┐
                    ▼                ▼                 ▼
            ┌──────────────┐  ┌────────────┐   ┌──────────────┐
            │ Alertmanager │  │ Prometheus │   │    Web UI    │
            │  → ntfy      │  │ (metrics)  │   │  (inspect)   │
            │  → Discord   │  └────────────┘   └──────────────┘
            └──────────────┘
Deployment Configuration
| Setting | Value |
|---|---|
| Chart | falco from https://falcosecurity.github.io/charts |
| Namespace | security |
| Driver | modern_ebpf with leastPrivileged: true |
| Container runtime | Containerd only (/run/containerd/containerd.sock) |
| Output format | JSON (json_output: true) |
| Minimum priority | warning |
| Log destination | stderr (syslog disabled) |
| Buffered outputs | false (immediate delivery) |
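As a rough sketch, the settings in the table above could map onto Helm values along these lines. The key names (`driver.modernEbpf`, `collectors.*`, and the `falco:` passthrough block) assume the layout of a recent `falco` chart and are not taken from this repository, so check them against the chart version actually pinned.

```yaml
# values.yaml for the falco chart
# Sketch only: key names are assumed from the upstream chart layout.
driver:
  enabled: true
  kind: modern_ebpf            # probe ships inside the Falco binary
  modernEbpf:
    leastPrivileged: true      # reduced capability set instead of a fully privileged pod

collectors:
  enabled: true
  docker:
    enabled: false
  containerd:
    enabled: true
    socket: /run/containerd/containerd.sock
  crio:
    enabled: false

falco:                         # rendered into falco.yaml
  json_output: true
  priority: warning
  log_stderr: true             # Falco's own logs go to stderr
  log_syslog: false
  syslog_output:
    enabled: false             # no syslog daemon on Talos
  buffered_outputs: false      # deliver alerts immediately
```

Disabling the Docker and CRI-O collectors is one way to express "Containerd only"; newer chart versions may group these settings differently.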
Resources
| CPU Request/Limit | Memory Request/Limit |
|---|---|
| 100m / 1000m | 512Mi / 1024Mi |
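In Helm values form, the figures above correspond to a standard Kubernetes `resources` block. Its placement at the top level of the chart values is an assumption.

```yaml
# Falco DaemonSet container resources (sketch)
resources:
  requests:
    cpu: 100m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1024Mi
```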
Node Coverage
Falco tolerates all taints (NoSchedule + NoExecute with Exists operator), ensuring it runs on every node including:
- Control plane nodes
- GPU worker nodes with dedicated taints
- ARM64 edge nodes
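The toleration behavior described above can be sketched as follows. A toleration with `operator: Exists` and no `key` matches every taint with the given effect, which is what lets the DaemonSet land on tainted control plane and GPU nodes. The top-level `tolerations` key is assumed from the chart layout.

```yaml
# Tolerate every NoSchedule and NoExecute taint, whatever its key (sketch)
tolerations:
  - effect: NoSchedule
    operator: Exists
  - effect: NoExecute
    operator: Exists
```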
Talos Linux Adaptations
| Challenge | Solution |
|---|---|
| No kernel headers/module compilation | modern_ebpf driver (the CO-RE probe is compiled when Falco is built and embedded in the Falco binary, so nothing is compiled on the node) |
| Immutable root filesystem | leastPrivileged: true — minimal host mounts |
| No syslog daemon | stderr-only logging, no syslog output |
| Containerd (not Docker/CRI-O) | Explicit containerd socket mount at /run/containerd/containerd.sock |
Falcosidekick (Alert Routing)
Falcosidekick receives Falco events via gRPC and fans them out to multiple targets:
| Target | Configuration | Minimum Priority |
|---|---|---|
| Alertmanager | http://alertmanager-operated.observability.svc.cluster.local:9093 | warning |
| Prometheus | Metrics exporter enabled | — |
| Web UI | Enabled (ClusterIP service) | — |
Redis persistence: 1Gi PVC on nfs-slow StorageClass (NFS chosen for ARM node compatibility).
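A minimal sketch of how the routing and Redis settings above might look when Falcosidekick is deployed as a subchart of `falco`. The key names under `falcosidekick:` are assumed from the upstream Falcosidekick chart and should be verified against the pinned version.

```yaml
# Falcosidekick subchart values (sketch; key names assumed)
falcosidekick:
  enabled: true
  config:
    alertmanager:
      hostport: http://alertmanager-operated.observability.svc.cluster.local:9093
      minimumpriority: warning   # drop events below warning for this target
  webui:
    enabled: true                # exposed through a ClusterIP service
    redis:
      storageEnabled: true
      storageSize: 1Gi
      storageClass: nfs-slow
```

Prometheus needs no per-target entry in this sketch, on the assumption that Falcosidekick serves its metrics endpoint without extra configuration.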
Detection Categories
Falco uses the default ruleset plus local overrides. Key detection categories include:
| Category | Example Rules |
|---|---|
| Container escape | ptrace attach, mount namespace changes |
| Unexpected shells | Shell spawned in non-shell container |
| Sensitive file access | Reading /etc/shadow, /etc/passwd |
| Network anomalies | Unexpected outbound connections |
| Privilege escalation | setuid/setgid calls, capability changes |
| Cryptomining | Known mining pool connections, CPU abuse patterns |
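To illustrate what a local override on top of the default ruleset could look like, here is a hedged sketch using the chart's `customRules` key. The namespaces `debug-tools` and `ai-ml` and the second rule are hypothetical and do not come from this cluster's configuration. The `override` syntax assumes Falco 0.36 or newer.

```yaml
# Local rule overrides (sketch; namespaces and the custom rule are hypothetical)
customRules:
  local-overrides.yaml: |-
    # Reduce false positives: exempt one namespace from a default rule
    - rule: Terminal shell in container
      condition: and not k8s.ns.name = "debug-tools"
      override:
        condition: append

    # Add a cluster-specific detection built from default macros
    - rule: Shell in GPU workload container
      desc: A shell was spawned inside a container in the ai-ml namespace
      condition: >
        spawned_process and container and shell_procs
        and k8s.ns.name = "ai-ml"
      output: >
        Shell spawned in GPU workload (user=%user.name
        container=%container.name image=%container.image.repository
        cmdline=%proc.cmdline)
      priority: WARNING
      tags: [container, shell]
```

Appending to a default rule's condition keeps the upstream rule text intact, so ruleset upgrades still apply and only the exception lives locally.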
Observability
ServiceMonitor: Enabled with label release: prometheus, scraping Falcosidekick metrics.
Alert flow: Falco → Falcosidekick → Alertmanager → ntfy → Discord (same pipeline as all other cluster alerts, documented in ADR-0039).
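The ServiceMonitor setting above might be expressed like this, assuming the Falcosidekick chart exposes `serviceMonitor.enabled` and `serviceMonitor.additionalLabels`. The `release: prometheus` label must match the Prometheus Operator's ServiceMonitor selector.

```yaml
# ServiceMonitor for Falcosidekick metrics (sketch; key names assumed)
falcosidekick:
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: prometheus
```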
Links
- Implements ADR-0018 (runtime detection component)
- Related to ADR-0039 (alerting pipeline)
- Falco Documentation
- Falcosidekick
- modern_ebpf driver