Hello!

We operate a backend API stack on GCP. Currently:
- Varnish handles caching + SourceIP authentication. (IP refers to the infrastructure servers’ IPs, while Source is a URL parameter passed in API calls that identifies the property.)
- Both Varnish and the backend run on the same GCP server.

As we scale for resilience and performance, we need to implement:

Authentication & Validation
- Route requests to the correct backend.
- Enforce API key + SourceIP-based auth.
- Validate identities via parameters:
  - Source (URL param, used in backend calls)
  - IP (frontend server IPs)
  - USR-ID (end-user ID, from URL param)
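
For illustration, a minimal Python sketch of the per-request auth and routing check described above. It assumes the URL parameters are literally named `source` and `usr_id`, and that each Source has one API key, a set of allowed IP ranges and one backend. `SourceConfig`, `SOURCES`, `authenticate`, `AuthError` and the backend URLs are hypothetical placeholders, not part of the current setup.

```python
import hmac
import ipaddress
from dataclasses import dataclass
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlparse


@dataclass(frozen=True)
class SourceConfig:
    api_key: str                        # API key issued for this Source (hypothetical)
    allowed_networks: Tuple[str, ...]   # frontend/infrastructure IP ranges (CIDR)
    backend: str                        # backend this Source is routed to


# Hypothetical registry; in practice this would come from config or a secret store.
SOURCES: Dict[str, SourceConfig] = {
    "prop-a": SourceConfig("key-a", ("10.0.1.0/24",), "http://backend-a:8080"),
    "prop-b": SourceConfig("key-b", ("10.0.2.0/24",), "http://backend-b:8080"),
}


class AuthError(Exception):
    """Raised when a request fails API key, Source, IP or USR-ID validation."""


def authenticate(url: str, client_ip: str, api_key: str) -> Tuple[str, str, str]:
    """Validate a request and return (source, usr_id, backend) for routing."""
    params = parse_qs(urlparse(url).query)
    source = params.get("source", [""])[0]   # assumed parameter name
    usr_id = params.get("usr_id", [""])[0]   # assumed parameter name

    cfg = SOURCES.get(source)
    if cfg is None:
        raise AuthError("unknown or missing source")
    if not usr_id:
        raise AuthError("missing usr_id")
    # Constant-time comparison so key checks do not leak timing information.
    if not hmac.compare_digest(api_key.encode(), cfg.api_key.encode()):
        raise AuthError("invalid API key")
    try:
        ip = ipaddress.ip_address(client_ip)
    except ValueError:
        raise AuthError("malformed client IP") from None
    # SourceIP check: the calling IP must belong to a range registered for this Source.
    if not any(ip in ipaddress.ip_network(net) for net in cfg.allowed_networks):
        raise AuthError("IP not allowed for this source")
    return source, usr_id, cfg.backend
```

The returned backend is what the routing step would forward to; every rejection path raises before any backend is contacted.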

Rate Limiting
- Based on Source + IP + USR-ID.
- Alert at x% of the threshold. Block at y% (API Gateway) or z% (Varnish).
- Configurable block durations.
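
To make the rate-limiting requirement concrete, a minimal in-memory fixed-window sketch in Python keyed on (Source, IP, USR-ID). `alert_pct` plays the role of x and `block_pct` the role of y or z; the fixed-window algorithm, the `RateLimitPolicy`/`RateLimiter` names and the example numbers are assumptions for illustration. State lives in a single process, which is the crux of the centralisation question: per-server counters each see only part of the traffic unless they share a store.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (Source, IP, USR-ID)


@dataclass(frozen=True)
class RateLimitPolicy:
    limit: int          # requests per window that count as 100% of the threshold
    window_s: float     # fixed-window length in seconds
    alert_pct: float    # "x": alert once usage reaches this % of the limit
    block_pct: float    # "y" (gateway) or "z" (Varnish): block once usage exceeds this %
    block_s: float      # configurable block duration in seconds


class RateLimiter:
    """In-memory fixed-window limiter keyed on (Source, IP, USR-ID)."""

    def __init__(self, policy: RateLimitPolicy,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.policy = policy
        self.clock = clock
        self.windows: Dict[Key, Tuple[float, int]] = {}  # key -> (window start, count)
        self.blocked_until: Dict[Key, float] = {}

    def check(self, source: str, ip: str, usr_id: str) -> str:
        """Return "allow", "alert" (allowed, but at or over x%) or "block"."""
        key = (source, ip, usr_id)
        now = self.clock()
        if self.blocked_until.get(key, float("-inf")) > now:
            return "block"
        self.blocked_until.pop(key, None)  # any earlier block has expired

        start, count = self.windows.get(key, (now, 0))
        if now - start >= self.policy.window_s:
            start, count = now, 0  # the previous window has expired
        count += 1
        self.windows[key] = (start, count)

        p = self.policy
        if count * 100 > p.block_pct * p.limit:
            self.blocked_until[key] = now + p.block_s
            del self.windows[key]  # start a fresh window once the block expires
            return "block"
        if count * 100 >= p.alert_pct * p.limit:
            return "alert"
        return "allow"
```

With `limit=10`, `alert_pct=80` and `block_pct=100`, the 8th to 10th requests in a window return `"alert"`, and the 11th is blocked for `block_s` seconds. Other (Source, IP, USR-ID) combinations are unaffected.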

Circuit Breaking / Backend Protection
- Stop routing to a backend when it fails.
- Note: in Varnish, we use a heartbeat health-check mechanism that is similar to a circuit breaker, but it is not a true implementation.
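
A heartbeat health check marks a backend up or down from periodic probes, whereas a circuit breaker reacts to failures of real requests and fails fast while open. A minimal, single-threaded Python sketch of the classic closed/open/half-open pattern; the class names, the consecutive-failure threshold and the timeout are illustrative assumptions, and the sketch is not thread-safe.

```python
import time
from typing import Any, Callable


class CircuitOpenError(Exception):
    """Raised when a call is refused because the circuit is open."""


class CircuitBreaker:
    """Trips after N consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("backend circuit is open")
            self.state = "half_open"  # cool-down elapsed: the next call is a trial
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial reopens immediately; otherwise wait for the threshold.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _record_success(self) -> None:
        self.state = "closed"
        self.failures = 0  # successes reset the consecutive-failure count
```

While the circuit is open, `call` raises `CircuitOpenError` without touching the backend; that is the point where a gateway could return an error immediately or, in a Varnish-style setup, fall back to stale cache.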

Logging & Observability
- Track rate-limit breaches, blocks, circuit-breaking events, and request metadata.
- Alert on abnormal traffic or backend failures.
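
For the observability side, emitting each rate-limit breach, block and circuit event as one structured JSON line makes those events easy to filter, count and alert on downstream. A minimal Python sketch; the logger name, the event names (`rate_limit_alert`, `rate_limit_block`, `circuit_open`) and the field names are hypothetical.

```python
import json
import logging
import time
from typing import Any, Optional

logger = logging.getLogger("edge.events")  # hypothetical logger name


def log_event(event: str, source: str, ip: str,
              usr_id: Optional[str] = None, **extra: Any) -> str:
    """Emit one JSON line per security/availability event and return it."""
    record = {
        **extra,               # request metadata, e.g. path, count, backend
        "ts": time.time(),     # Unix timestamp in seconds
        "event": event,        # e.g. "rate_limit_alert", "rate_limit_block", "circuit_open"
        "source": source,
        "ip": ip,
        "usr_id": usr_id,
    }
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line
```

Call `logging.basicConfig(level=logging.INFO, format="%(message)s")` (or attach your own handler) so the INFO-level lines are actually written; alerting on abnormal traffic then becomes a matter of counting events per Source or backend over time.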

Options We’re Considering

API Gateway (New Development)
- Central entry point for traffic.
- Handles auth, routing, rate limiting, logging, and observability.
- Centralised logic → easier management as APIs grow.
- Adds an extra hop.

Enhanced Varnish (Current, with Modifications)
- Already deployed per server.
- Would need manual updates for rate limiting (Source/IP/USR-ID), logging, and backend protection.
- No true circuit breaker, but can serve stale cache or block requests during backend downtime.

Key Questions for the Community
- Centralisation vs Distribution: Is it better to centralise controls in an API Gateway, or to enhance Varnish on each server?
- Performance & Maintenance: Does the extra hop an API Gateway adds outweigh its benefits in observability and control?
- Scaling with Varnish: How do you avoid config drift and manage scaling in a Varnish-based setup?
- Deployment Topology: Should Varnish and the backend run on the same GCP server, or be separated for resilience?
- Real-World Experiences: Has anyone migrated from Varnish-based controls to an API Gateway? What worked, and what didn’t?

Looking for:
- Real-world experiences
- Best practices
- Resource recommendations