API Gateway vs Varnish for API Security & Traffic Control - Seeking Advice

Hello!

We operate a backend API stack on GCP. Currently:

  • Varnish handles caching + SourceIP authentication.
    (IP refers to the infrastructure servers’ IPs, while Source is a URL parameter passed in API calls that identifies the property.)

  • Both Varnish and backend run on the same GCP server.

As we scale for resilience and performance, we need to implement:

  • Authentication & Validation

    • Route requests to the correct backend.

    • Enforce API key + SourceIP-based auth.

    • Validate identities via parameters:

      • Source (URL param, used in backend calls)

      • IP (frontend server IPs)

      • USR-ID (end-user ID, from URL param)

  • Rate Limiting

    • Based on Source + IP + USR-ID.

    • Alert at x% of the threshold. Block at y% (API Gateway) or z% (Varnish).

    • Configurable block durations.

  • Circuit Breaking / Backend Protection

    • Stop routing on backend failure.

    • Note: In Varnish, we use a heartbeat health check that behaves similarly to a circuit breaker, but it is not a true implementation.

  • Logging & Observability

    • Track rate-limit breaches, blocks, circuit-breaking events, and request metadata.

    • Alert on abnormal traffic or backend failures.
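
To make the Authentication & Validation requirement concrete, here is a minimal Python sketch of the check we have in mind. Everything in it is hypothetical: the `SOURCES` registry, the `source` and `usr_id` query-parameter names, and the example keys, CIDR ranges and backend URLs. It only shows the logic (look up the Source, verify the API key, confirm the caller's IP is allowed for that Source, then pick the backend), not how it would be wired into Varnish or a gateway.

```python
import hmac
import ipaddress
from dataclasses import dataclass
from typing import Tuple
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class SourceConfig:
    # Hypothetical per-property record: the API key issued to this Source,
    # the frontend networks allowed to call with it, and the backend to route to.
    api_key: str
    allowed_networks: tuple
    backend: str


# Hypothetical registry keyed by the Source URL parameter.
SOURCES = {
    "property-a": SourceConfig(
        api_key="key-a",
        allowed_networks=(ipaddress.ip_network("10.0.1.0/24"),),
        backend="http://backend-a.internal:8080",
    ),
}


class AuthError(Exception):
    pass


def authenticate(url: str, client_ip: str, api_key: str) -> Tuple[str, str, str]:
    """Validate Source + IP + API key; return (backend, source, usr_id)."""
    params = parse_qs(urlsplit(url).query)
    # The parameter names "source" and "usr_id" are assumptions.
    source = params.get("source", [None])[0]
    usr_id = params.get("usr_id", [None])[0]
    if not source or not usr_id:
        raise AuthError("missing source or usr_id parameter")

    cfg = SOURCES.get(source)
    if cfg is None:
        raise AuthError("unknown source")

    # Constant-time comparison so timing does not leak how much of the key matched.
    if not hmac.compare_digest(api_key.encode(), cfg.api_key.encode()):
        raise AuthError("invalid API key")

    # The caller's IP must fall inside one of the networks registered for this Source.
    ip = ipaddress.ip_address(client_ip)
    if not any(ip in net for net in cfg.allowed_networks):
        raise AuthError("IP not allowed for this source")

    return cfg.backend, source, usr_id
```

Storing allowed IPs as networks rather than single addresses makes it easy to allow a whole frontend subnet, and returning the backend from the same lookup covers the routing step.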
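
For the Rate Limiting requirement, here is a minimal in-memory sketch, assuming a fixed-window counter per (Source, IP, USR-ID) key. The class and its parameters (`limit`, `window_s`, `alert_pct` for x, `block_pct` for y or z, `block_s` for the block duration) are hypothetical, and fixed windows are only one option; a sliding window or token bucket would also fit.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

Key = Tuple[str, str, str]  # (Source, IP, USR-ID)


@dataclass
class RateLimiter:
    limit: int          # requests allowed per window (the "threshold")
    window_s: float     # window length in seconds
    alert_pct: float    # x: alert once usage reaches this % of the limit
    block_pct: float    # y or z: block once usage reaches this % of the limit
    block_s: float      # configurable block duration in seconds
    clock: Callable[[], float] = time.monotonic
    _counts: Dict[Key, Tuple[float, int]] = field(default_factory=dict)
    _blocked_until: Dict[Key, float] = field(default_factory=dict)
    _alerted: Set[Key] = field(default_factory=set)

    def check(self, source: str, ip: str, usr_id: str) -> str:
        """Return "allow", "alert" (allowed, first crossing of x%) or "block"."""
        key = (source, ip, usr_id)
        now = self.clock()
        if self._blocked_until.get(key, 0.0) > now:
            return "block"

        start, count = self._counts.get(key, (now, 0))
        if now - start >= self.window_s:  # window expired: start a new one
            start, count = now, 0
            self._alerted.discard(key)
        count += 1
        self._counts[key] = (start, count)

        usage_pct = 100.0 * count / self.limit
        if usage_pct >= self.block_pct:
            # Block for block_s and start from a clean counter afterwards.
            self._blocked_until[key] = now + self.block_s
            self._counts.pop(key, None)
            self._alerted.discard(key)
            return "block"
        if usage_pct >= self.alert_pct and key not in self._alerted:
            self._alerted.add(key)  # alert only once per window
            return "alert"
        return "allow"
```

Because the counters live in process memory, each server in a per-server Varnish setup would count independently, so one client spread across servers could exceed the intended total; a central gateway, or a shared store such as Redis, avoids that.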
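
For the Circuit Breaking requirement, here is a sketch of what a "true" circuit breaker adds over a heartbeat check: a heartbeat polls the backend on a schedule, whereas a breaker reacts to failures on real requests. It opens after `failure_threshold` consecutive errors, rejects calls during a cooldown, then lets a trial call through (half-open) before closing again. The class and parameter names are hypothetical, and the sketch is single-threaded (no locking).

```python
import time
from typing import Callable


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the backend."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], object]) -> object:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("backend circuit is open")
            self.state = "half_open"  # cooldown over: this call is the trial
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Any success closes the circuit and resets the consecutive-failure count.
        self.state = "closed"
        self.failures = 0
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial re-opens immediately; otherwise open at the threshold.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

While the circuit is open, the caller could fall back to serving stale cache or returning an error, matching the behaviour described for the Varnish option.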
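
For Logging & Observability, here is a sketch of emitting one structured JSON line per event, so that rate-limit breaches, blocks and circuit-breaking events can be filtered and alerted on. The event names, fields and the `log_event` helper are hypothetical. A `severity` field is included because Cloud Logging treats it specially in JSON payloads, assuming the logs are shipped there.

```python
import json
import logging
import time

logger = logging.getLogger("edge")

# Hypothetical event types matching the requirements above.
EVENT_SEVERITY = {
    "request": "INFO",
    "rate_limit_alert": "WARNING",
    "rate_limit_block": "ERROR",
    "circuit_open": "ERROR",
    "circuit_closed": "INFO",
}


def log_event(event: str, *, source: str, ip: str, usr_id: str = "",
              backend: str = "", **extra) -> str:
    """Build one JSON line with the request metadata, log it, and return it."""
    record = {
        "severity": EVENT_SEVERITY.get(event, "INFO"),
        "event": event,
        "timestamp": time.time(),
        "source": source,
        "ip": ip,
        "usr_id": usr_id,
        "backend": backend,
        **extra,  # e.g. block duration, usage percentage
    }
    line = json.dumps(record, sort_keys=True)
    # Map the severity string to the stdlib logging level (INFO=20, ERROR=40, ...).
    logger.log(getattr(logging, record["severity"]), line)
    return line
```

Alerts on abnormal traffic could then be log-based metrics on a filter such as `jsonPayload.event="rate_limit_block"`, assuming the lines arrive in Cloud Logging as JSON payloads.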


Options We’re Considering

  1. API Gateway (New Development)

    • Central entry point for traffic.

    • Handles auth, routing, rate limiting, logging, and observability.

    • Centralised logic, so easier to manage as the number of APIs grows.

    • Adds an extra network hop (and therefore some latency).

  2. Enhanced Varnish (Current, with Modifications)

    • Already deployed per server.

    • Would need manual updates for: rate limiting (Source/IP/USR-ID), logging, and backend protection.

    • No true circuit breaker, but can serve stale cache or block requests during backend downtime.


Key Questions for the Community

  • Centralisation vs Distribution: Is it better to centralise controls in an API Gateway or to enhance Varnish on each server?

  • Performance & Maintenance: Does the cost of the extra hop through an API Gateway outweigh its benefits in observability and control?

  • Scaling with Varnish: How do you avoid config drift and manage scaling in a Varnish-based setup?

  • Deployment Topology: Should Varnish and the backend run on the same GCP server, or be separated for resilience?

  • Real-World Experiences: Has anyone migrated from Varnish-based controls to an API Gateway? What worked, and what didn't?


Looking for:

  • Real-world experiences

  • Best practices

  • Resource recommendations
