# Falco Runtime Threat Detection

* Status: accepted
* Date: 2026-02-09
* Deciders: Billy
* Technical Story: Deploy runtime security monitoring to detect anomalous behavior and threats inside running containers

## Context and Problem Statement

Admission policies (Gatekeeper) and vulnerability scanning (Trivy) operate at deploy time and scan time respectively. Neither detects runtime threats, such as a container that executes unexpected commands, opens unusual network connections, or reads sensitive files after it has been admitted.

How do we detect runtime security threats in a Talos Linux environment where kernel module compilation is impossible?

## Decision Drivers

* Detect runtime threats that admission policies can't prevent
* Must work on Talos Linux (immutable root filesystem, no kernel headers)
* Alert on suspicious activity without blocking legitimate workloads
* Stream alerts into the existing notification pipeline (Alertmanager → ntfy → Discord)
* Minimal performance impact on AI/ML GPU workloads

## Considered Options

1. **Falco with modern eBPF driver**: CNCF runtime security
2. **Tetragon**: Cilium-based eBPF security observability
3. **Sysdig Secure**: Commercial runtime security
4. **No runtime detection**: Rely on admission policies only

## Decision Outcome

Chosen option: **Option 1, Falco with modern eBPF**, because it is a CNCF graduated project, supports the modern eBPF driver required for Talos, and integrates with Alertmanager via Falcosidekick.
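In this design, Falcosidekick translates each Falco event into an alert that Alertmanager can route. The Python sketch below approximates that translation. It assumes Alertmanager's v2 HTTP API (`POST /api/v2/alerts`), which accepts a JSON array of objects carrying `labels`, `annotations` and `startsAt`. The following are all hypothetical: the sample event, the `FalcoEvent` alert name, the label keys, and the helper names `falco_event_to_alert` and `post_alerts`. Falcosidekick's actual label scheme may differ.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hand-written sample of a Falco JSON event (not captured from a real cluster).
SAMPLE_EVENT = json.dumps({
    "time": "2026-02-09T12:00:00.000000000Z",
    "rule": "Read sensitive file untrusted",
    "priority": "Warning",
    "source": "syscall",
    "hostname": "worker-1",
    "output": "Sensitive file opened for reading (file=/etc/shadow)",
})


def falco_event_to_alert(raw_event: str) -> dict:
    """Map one Falco JSON event onto an Alertmanager API v2 alert object.

    The label names below are illustrative assumptions, not Falcosidekick's
    exact label scheme.
    """
    event = json.loads(raw_event)
    return {
        "labels": {
            "alertname": "FalcoEvent",  # hypothetical alert name
            "rule": event["rule"],
            "priority": event["priority"].lower(),
            "source": event.get("source", "syscall"),
            "hostname": event.get("hostname", "unknown"),
        },
        "annotations": {"description": event["output"]},
        # Fall back to the current UTC time if the event carries no timestamp.
        "startsAt": event.get("time", datetime.now(timezone.utc).isoformat()),
    }


def post_alerts(alertmanager_url: str, alerts: list) -> int:
    """POST a batch of alerts to Alertmanager's v2 API and return the HTTP status."""
    request = urllib.request.Request(
        f"{alertmanager_url}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    alert = falco_event_to_alert(SAMPLE_EVENT)
    # Alertmanager expects a JSON array of alerts, even for a single alert.
    print(json.dumps([alert], indent=2))
    # In-cluster, the batch would be sent with something like:
    # post_alerts("http://alertmanager-operated.observability.svc.cluster.local:9093", [alert])
```

The sketch keeps `rule` and `priority` as labels rather than burying them in the description. That choice lets Alertmanager group, inhibit and route Falco alerts the same way it handles any other cluster alert.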
### Positive Consequences

* Detects container escapes, unexpected shells, and sensitive file reads at runtime
* The `modern_ebpf` driver works on Talos without kernel module compilation
* Falcosidekick routes alerts to Alertmanager, integrating with the existing pipeline
* JSON output enables structured log processing
* Runs on every node, including the control plane, via tolerations

### Negative Consequences

* eBPF instrumentation adds minor CPU/memory overhead per node
* Tuning rules to reduce false positives requires ongoing attention
* The Falcosidekick Web UI adds a Redis dependency for event storage

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│  Every Cluster Node                                         │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Falco (DaemonSet)                                  │    │
│  │  Driver: modern_ebpf (least-privileged)             │    │
│  │  Runtime: containerd socket                         │    │
│  │                                                     │    │
│  │  Kernel syscalls → eBPF probes → Rule evaluation    │    │
│  │                                         │           │    │
│  │                                         ▼           │    │
│  │                                 JSON alert output   │    │
│  └─────────────────────────────────────────┬───────────┘    │
│                                            │ HTTP           │
└────────────────────────────────────────────┼────────────────┘
                                             │
                                             ▼
                                 ┌────────────────────────┐
                                 │  Falcosidekick         │
                                 │                        │
                                 │  → Alertmanager        │
                                 │  → Prometheus metrics  │
                                 │  → Web UI              │
                                 └───────────┬────────────┘
                                             │
                           ┌─────────────────┼────────────────┐
                           ▼                 ▼                ▼
                    ┌──────────────┐  ┌────────────┐  ┌──────────────┐
                    │ Alertmanager │  │ Prometheus │  │    Web UI    │
                    │ → ntfy       │  │ (metrics)  │  │  (inspect)   │
                    │ → Discord    │  └────────────┘  └──────────────┘
                    └──────────────┘
```

## Deployment Configuration

| Setting | Value |
|---|---|
| **Chart** | `falco` from `https://falcosecurity.github.io/charts` |
| **Namespace** | `security` |
| **Driver** | `modern_ebpf` with `leastPrivileged: true` |
| **Container runtime** | Containerd only (`/run/containerd/containerd.sock`) |
| **Output format** | JSON (`json_output: true`) |
| **Minimum priority** | `warning` |
| **Log destination** | stderr (syslog disabled) |
| **Buffered outputs** | `false` (immediate delivery) |

### Resources

| CPU Request/Limit | Memory Request/Limit |
|-------------------|----------------------|
| 100m / 1000m | 512Mi / 1024Mi |

### Node Coverage

Falco tolerates **all** taints (`NoSchedule` and `NoExecute` with the `Exists` operator). This ensures it runs on every node, including:

- Control plane nodes
- GPU worker nodes with dedicated taints
- ARM64 edge nodes

## Talos Linux Adaptations

| Challenge | Solution |
|-----------|----------|
| No kernel headers/module compilation | `modern_ebpf` driver (CO-RE probe compiled at build time and bundled with Falco, loaded at runtime) |
| Immutable root filesystem | `leastPrivileged: true`; minimal host mounts |
| No syslog daemon | stderr-only logging, no syslog output |
| Containerd (not Docker/CRI-O) | Explicit containerd socket mount at `/run/containerd/containerd.sock` |

## Falcosidekick (Alert Routing)

Falcosidekick receives Falco events over HTTP and fans them out to multiple targets:

| Target | Configuration | Minimum Priority |
|--------|---------------|------------------|
| Alertmanager | `http://alertmanager-operated.observability.svc.cluster.local:9093` | `warning` |
| Prometheus | Metrics exporter enabled | n/a |
| Web UI | Enabled (ClusterIP service) | n/a |

**Redis persistence:** 1Gi PVC on the `nfs-slow` StorageClass (NFS chosen for ARM node compatibility).

## Detection Categories

Falco uses the default ruleset plus local overrides. Key detection categories include:

| Category | Example Detections |
|----------|--------------------|
| Container escape | ptrace attach, mount namespace changes |
| Unexpected shells | Shell spawned in non-shell container |
| Sensitive file access | Reading `/etc/shadow`, `/etc/passwd` |
| Network anomalies | Unexpected outbound connections |
| Privilege escalation | setuid/setgid calls, capability changes |
| Cryptomining | Known mining pool connections, CPU abuse patterns |

## Observability

**ServiceMonitor:** Enabled with label `release: prometheus`, scraping Falcosidekick metrics.
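Falco and Falcosidekick's Alertmanager output both use `warning` as their minimum priority. The Python sketch below shows what that threshold means. It ranks Falco's eight priority levels from most to least severe and keeps an event only if the event is at least as severe as the minimum. The helper name `meets_minimum_priority` is hypothetical. The code illustrates the ordering and is not Falco's own implementation.

```python
# Falco priority levels, ordered from most to least severe (syslog-style).
FALCO_PRIORITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]

# Lower rank = more severe.
_RANK = {name: index for index, name in enumerate(FALCO_PRIORITIES)}


def meets_minimum_priority(event_priority: str, minimum: str = "warning") -> bool:
    """Return True if event_priority is at least as severe as minimum.

    Hypothetical helper for illustration. Matching is case-insensitive because
    Falco's JSON output capitalises priorities (e.g. "Warning") while the
    configuration uses lower case (e.g. "warning").
    """
    return _RANK[event_priority.lower()] <= _RANK[minimum.lower()]


if __name__ == "__main__":
    for priority in ["Critical", "Warning", "Notice", "Debug"]:
        print(priority, meets_minimum_priority(priority))
```

With `minimum="warning"`, `Critical` and `Warning` events pass, while `Notice`, `Informational` and `Debug` events are dropped. Any rule whose priority is below `warning` therefore produces no alert in this pipeline.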

Alert flow: Falco → Falcosidekick → Alertmanager → ntfy → Discord. This is the same pipeline used for all other cluster alerts, documented in [ADR-0039](0039-alerting-notification-pipeline.md).

## Links

* Implements [ADR-0018](0018-security-policy-enforcement.md) (runtime detection component)
* Related to [ADR-0039](0039-alerting-notification-pipeline.md) (alerting pipeline)
* [Falco Documentation](https://falco.org/docs/)
* [Falcosidekick](https://github.com/falcosecurity/falcosidekick)
* [modern_ebpf driver](https://falco.org/docs/event-sources/kernel/modern-ebpf/)