Monitoring & Observability
Structured logs
Section titled “Structured logs”Every log line is a JSON object. Key fields:
| Field | Type | Description |
|---|---|---|
msg | string | Event type — "request", "payment_settled", "settlement_failed", etc. |
timestamp | string | ISO 8601 timestamp |
level | string | "debug", "info", "warn", "error" |
method | string | HTTP method |
path | string | Request path |
route | string | Matched route pattern, e.g. "POST /v1/messages" |
status | number | HTTP status returned to client |
duration_ms | number | Total request duration |
price | string | Price charged, e.g. "$0.075" |
payer | string | Payer wallet address |
tx_hash | string | On-chain transaction hash |
error | string | Error message, if any |
Log levels: info (successful requests), warn (402s, rate limits), error (settlement/upstream failures), debug (verbose, disable in production).
Prometheus metrics
Section titled “Prometheus metrics”Enable with:
gateway: metrics: trueCounters
Section titled “Counters”| Metric | Labels | Description |
|---|---|---|
tollbooth_requests_total | route, method, status | Total requests |
tollbooth_payments_total | route, outcome | Payment attempts (success, rejected, missing) |
tollbooth_settlements_total | strategy, outcome | Settlement attempts (success, failure) |
tollbooth_cache_hits_total | route | Verification cache hits |
tollbooth_cache_misses_total | route | Verification cache misses |
tollbooth_rate_limit_blocks_total | route | Requests blocked by rate limiting |
tollbooth_upstream_errors_total | upstream, status | Non-2xx responses from upstreams |
tollbooth_revenue_usd_total | route | Cumulative revenue in USD |
Histograms
Section titled “Histograms”| Metric | Labels | Description |
|---|---|---|
tollbooth_request_duration_seconds | route, method | End-to-end request latency |
tollbooth_settlement_duration_seconds | strategy | Settlement latency |
tollbooth_upstream_duration_seconds | upstream | Upstream response latency |
Gauges
Section titled “Gauges”| Metric | Description |
|---|---|
tollbooth_active_requests | Currently in-flight requests |
Useful queries
Section titled “Useful queries”# p95 latency per routehistogram_quantile(0.95, sum by (le, route, method) (rate(tollbooth_request_duration_seconds_bucket[5m])))
# 402 ratesum(rate(tollbooth_requests_total{status="402"}[5m])) / sum(rate(tollbooth_requests_total[5m]))
# Settlement success raterate(tollbooth_settlements_total{outcome="success"}[5m]) / rate(tollbooth_settlements_total[5m])
# Revenue rate by routerate(tollbooth_revenue_usd_total[5m])Prometheus scrape config
Section titled “Prometheus scrape config”scrape_configs: - job_name: tollbooth metrics_path: /metrics static_configs: - targets: ["localhost:3000"]Troubleshooting
Section titled “Troubleshooting”High 402 rate — Check if clients are sending the payment-signature header. Verify route prices haven’t changed. Check the error field on status: 402 log lines.
Settlement failures — Check facilitator endpoint health. Look at duration_ms on settlement_failed log lines for timeouts.
High upstream latency — Compare duration_ms with upstream timing. If they’re close, tollbooth isn’t the bottleneck.