
Architecture
Data flow from Cloudflare's GraphQL API through the collector into the observability stack
flowchart TD
CF["Cloudflare GraphQL API"]
POLL["Poll Scheduler"]
FW["Firewall Collector"]
HTTP["HTTP Collector"]
METRICS["Metrics Exporter"]
LOKIC["Loki Client"]
TRACE["Trace Context"]
SLOG["Structured Logger"]
LOKI["Loki"]
PROM["Prometheus"]
TEMPO["Tempo"]
GRAFANA["Grafana"]
CF -->|"GraphQL queries"| POLL
POLL --> FW
POLL --> HTTP
FW -->|"JSON log lines"| LOKIC
HTTP -->|"JSON log lines"| LOKIC
HTTP -->|"gauge updates"| METRICS
FW -->|"event counters"| METRICS
LOKIC -->|"POST /loki/api/v1/push"| LOKI
METRICS -->|"/metrics"| PROM
TRACE -->|"OTLP gRPC"| TEMPO
SLOG -->|"trace_id injection"| TRACE
LOKI --> GRAFANA
PROM --> GRAFANA
TEMPO --> GRAFANA
classDef source fill:#0c2d48,stroke:#38bdf8,color:#e0f2fe
classDef collector fill:#1e293b,stroke:#334155,color:#e2e8f0
classDef sink fill:#132a1f,stroke:#22c55e,color:#dcfce7
classDef viz fill:#2d2513,stroke:#f97316,color:#fef3c7
class CF source
class POLL,FW,HTTP,METRICS,LOKIC,TRACE,SLOG collector
class LOKI,PROM,TEMPO sink
class GRAFANA viz
Data Flow
Poll Cycle
- The poll scheduler triggers on a configurable interval (default 5 minutes)
- Two collectors run in parallel within each cycle:
  - Firewall collector queries `firewallEventsAdaptive` for individual WAF events
  - HTTP collector queries `httpRequestsAdaptiveGroups` for aggregated traffic stats
- Each collector is wrapped in an OpenTelemetry span for end-to-end trace visibility
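
The cycle described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the collector's actual code: `collectFunc`, `runParallel`, and `pollLoop` are hypothetical names, the way the query window is advanced between ticks is an assumption, and the OpenTelemetry span wrapping is left out to keep the example self-contained.

```go
package main

import (
	"context"
	"log"
	"sync"
	"time"
)

// collectFunc is a hypothetical signature for one collector run over a time window.
type collectFunc func(ctx context.Context, since, until time.Time) error

// runParallel runs every task in its own goroutine, waits for all of them,
// and returns the errors in task order (nil for tasks that succeeded).
func runParallel(tasks ...func() error) []error {
	errs := make([]error, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task func() error) {
			defer wg.Done()
			errs[i] = task()
		}(i, task)
	}
	wg.Wait()
	return errs
}

// pollLoop runs one cycle per tick until ctx is cancelled.
func pollLoop(ctx context.Context, interval time.Duration, firewall, httpTraffic collectFunc) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	since := time.Now() // assumption: the first window starts when the loop starts
	for {
		select {
		case <-ctx.Done():
			return
		case until := <-ticker.C:
			errs := runParallel(
				func() error { return firewall(ctx, since, until) },
				func() error { return httpTraffic(ctx, since, until) },
			)
			for _, err := range errs {
				if err != nil {
					log.Printf("collector failed: %v", err)
				}
			}
			since = until // the next cycle picks up where this one ended
		}
	}
}
```

With the default interval this would be started as something like `pollLoop(ctx, 5*time.Minute, firewallCollect, httpCollect)`, where both collector functions are placeholders. One collector failing does not cancel the other in this sketch; whether the real collector behaves that way is not stated here.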
Firewall Events
- Each event becomes a JSON log line pushed to Loki under `{job="cloudflare", type="firewall"}`
- Event counts are tracked as Prometheus counters broken down by action type (block, challenge, allow)
- Fields captured: action, client IP, host, method, path, query, ray name, rule ID, source, user agent, country
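
As an illustration of how one event could become a Loki log line, the sketch below builds the JSON body for Loki's push endpoint (`POST /loki/api/v1/push`), which expects a list of streams, each holding a label set and `[timestamp, line]` pairs with the timestamp given as Unix nanoseconds in a string. The struct name, the JSON key names, and the helper functions are assumptions for the example; only the field list and the stream labels come from the description above.

```go
package main

import (
	"encoding/json"
	"strconv"
)

// FirewallEvent mirrors the captured fields; the JSON key names are assumptions.
type FirewallEvent struct {
	Action    string `json:"action"`
	ClientIP  string `json:"client_ip"`
	Host      string `json:"host"`
	Method    string `json:"method"`
	Path      string `json:"path"`
	Query     string `json:"query"`
	RayName   string `json:"ray_name"`
	RuleID    string `json:"rule_id"`
	Source    string `json:"source"`
	UserAgent string `json:"user_agent"`
	Country   string `json:"country"`
}

// lokiStream and lokiPush follow the JSON shape of Loki's push API.
type lokiStream struct {
	Stream map[string]string `json:"stream"`
	Values [][2]string       `json:"values"`
}

type lokiPush struct {
	Streams []lokiStream `json:"streams"`
}

// lokiEntry turns one event into a [timestamp, line] pair.
// unixNano is the event time in nanoseconds since the Unix epoch.
func lokiEntry(unixNano int64, ev FirewallEvent) ([2]string, error) {
	line, err := json.Marshal(ev)
	if err != nil {
		return [2]string{}, err
	}
	return [2]string{strconv.FormatInt(unixNano, 10), string(line)}, nil
}

// buildFirewallPush wraps the entries in a single stream with the firewall labels.
func buildFirewallPush(entries [][2]string) ([]byte, error) {
	return json.Marshal(lokiPush{Streams: []lokiStream{{
		Stream: map[string]string{"job": "cloudflare", "type": "firewall"},
		Values: entries,
	}}})
}
```

The resulting bytes would be sent with `Content-Type: application/json`. The timestamp can be obtained from a `time.Time` via `UnixNano()`.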
HTTP Traffic
- Aggregated groups are pushed to Loki as JSON under `{job="cloudflare", type="http_traffic"}`
- Request counts are exposed as Prometheus gauges labeled by method, status code, and country
- Edge response bytes are tracked as a separate gauge
Observability
- Prometheus: 9 metric families covering poll health, firewall events, HTTP traffic, Loki push status, and build info
- Loki: Two structured log streams with distinct label sets for filtering
- Tempo: Full trace per poll cycle with child spans for each API call and Loki push
- Log-trace correlation: A custom slog handler injects `trace_id` and `span_id` into every log line, enabling one-click navigation between logs and traces in Grafana
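
A handler of this kind might look roughly like the sketch below. To stay within the standard library, the span identifiers are carried in the context through a hypothetical `spanIDs` value and `withSpan` helper; a real implementation would more likely read them from the OpenTelemetry span context (for example via `trace.SpanContextFromContext` in `go.opentelemetry.io/otel/trace`). `traceHandler` and `demoLine` are likewise illustrative names, not the project's own.

```go
package main

import (
	"bytes"
	"context"
	"log/slog"
)

// spanIDs is a hypothetical stand-in for an OpenTelemetry span context.
type spanIDs struct{ TraceID, SpanID string }

type spanKey struct{}

// withSpan stores the identifiers in the context (assumption for this sketch).
func withSpan(ctx context.Context, ids spanIDs) context.Context {
	return context.WithValue(ctx, spanKey{}, ids)
}

// traceHandler wraps another slog.Handler and adds trace_id and span_id
// to every record whose context carries span identifiers.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if ids, ok := ctx.Value(spanKey{}).(spanIDs); ok {
		r = r.Clone() // avoid sharing attribute storage with the caller's record
		r.AddAttrs(
			slog.String("trace_id", ids.TraceID),
			slog.String("span_id", ids.SpanID),
		)
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the derived handler so that loggers
// created with logger.With(...) keep the injection.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{Handler: h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{Handler: h.Handler.WithGroup(name)}
}

// demoLine logs one message through the wrapped JSON handler and returns the output.
func demoLine(traceID, spanID, msg string) string {
	var buf bytes.Buffer
	logger := slog.New(traceHandler{Handler: slog.NewJSONHandler(&buf, nil)})
	ctx := withSpan(context.Background(), spanIDs{TraceID: traceID, SpanID: spanID})
	logger.InfoContext(ctx, msg)
	return buf.String()
}
```

The injection only happens for context-aware calls such as `InfoContext`; a plain `logger.Info` passes a background context and would produce a line without the two fields.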
Resilience
- Both Cloudflare and Loki clients retry on transient failures (HTTP 429, 502, 503, 504) with exponential backoff, up to 3 attempts
- `Retry-After` headers are honored when present
- On startup, the collector backfills up to the configured window (default 1 hour) to catch events that occurred while it was down
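
The retry behavior can be sketched as below. This is a simplified illustration under several assumptions: the function names and the base delay parameter are hypothetical, only the delay-in-seconds form of `Retry-After` is parsed (the HTTP-date form falls back to the exponential delay), network errors are retried in the same way as the listed status codes, and the wait uses `time.Sleep` rather than a cancellable timer.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryable reports whether a status code is one of the transient failures.
func retryable(status int) bool {
	switch status {
	case http.StatusTooManyRequests, http.StatusBadGateway,
		http.StatusServiceUnavailable, http.StatusGatewayTimeout:
		return true
	}
	return false
}

// backoffDelay returns base * 2^attempt (attempt starts at 0), unless
// retryAfter holds a non-negative number of seconds, which takes precedence.
func backoffDelay(attempt int, base time.Duration, retryAfter string) time.Duration {
	if secs, err := strconv.Atoi(retryAfter); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	return base << attempt
}

// doWithRetry sends the request built by newReq up to maxAttempts times.
// A fresh request is built per attempt so that request bodies can be re-sent.
func doWithRetry(client *http.Client, newReq func() (*http.Request, error),
	maxAttempts int, base time.Duration) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		req, err := newReq()
		if err != nil {
			return nil, err
		}
		resp, err := client.Do(req)
		if err == nil && !retryable(resp.StatusCode) {
			return resp, nil // success or a non-transient error; the caller inspects it
		}
		retryAfter := ""
		if err != nil {
			lastErr = err
		} else {
			lastErr = fmt.Errorf("transient status %d", resp.StatusCode)
			retryAfter = resp.Header.Get("Retry-After")
			resp.Body.Close()
		}
		if attempt < maxAttempts-1 {
			time.Sleep(backoffDelay(attempt, base, retryAfter))
		}
	}
	return nil, lastErr
}
```

With `maxAttempts` set to 3 and a base of, say, 500 ms, the waits between attempts would be 500 ms and 1 s unless the server supplies its own delay.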