
Caddy HTTPS Servers

What

The entry point for all requests that are eventually handled by farfalla is a load-balanced fleet of Caddy servers. These servers are hosted in AWS and managed via Laravel Forge.

This first public layer automatically issues and renews certificates for both our own wildcard subdomains, like alephdigital.publica.la, and third-party custom domains, like digital.revisbarcelona.com.

Request Lifecycle

This section explains the complete lifecycle of a request in our multi-tenant application architecture, covering all the key services involved.

Here is a simplified overview of the request flow:

Key Parts of the System

  • User: Located in Uruguay for this example.
  • User's Device: A browser-enabled device.
  • DNS: Resolves domain names to IP addresses.
  • AWS Global Accelerator: Routes requests to the nearest AWS region. It also gives us a single domain as the entry point to the infrastructure, regardless of where the request originates. This configuration is done entirely inside the AWS console. Global Accelerator connects the user to the server with the lowest latency.
  • AWS EC2: Hosts the Caddy server.
  • Caddy: Acts as a reverse proxy and handles TLS management.
  • ZeroSSL / Let's Encrypt: Provides HTTPS certificates.
  • DynamoDB: Centralized storage for certificates.
  • farfalla-https-guard: Microservice for validating domain and tenant information.
  • AWS API Gateway 2.0 (HTTP APIs): Fronts AWS Lambda.
  • AWS Lambda: Executes farfalla's logic.
  • farfalla: Monolith service managing tenant-specific logic and core product features.
  • PHP and Laravel: Language and framework behind farfalla.

Request Lifecycle Flow

This is an example using the URL https://digital.revisbarcelona.com/library.

  1. Initial Request from User The user in Uruguay wants to load https://digital.revisbarcelona.com/library.
  2. DNS Lookup The user's device performs a DNS lookup for digital.revisbarcelona.com, which resolves as:
    • digital.revisbarcelona.com -> barcelona-wfsrt-59-ytqs.app.publica.la (CNAME).
      • It's our customer's responsibility to set up a CNAME record pointing digital.revisbarcelona.com to barcelona-wfsrt-59-ytqs.app.publica.la.
    • barcelona-wfsrt-59-ytqs.app.publica.la -> ad83420ef3101bf80.awsglobalaccelerator.com (CNAME).
      • It's our responsibility to set up this CNAME record.
      • We manage it in Cloudflare with these 3 CNAME records:
        • *.app.publica.la -> ad83420ef3101bf80.awsglobalaccelerator.com
        • app.publica.la -> ad83420ef3101bf80.awsglobalaccelerator.com
        • *.publica.la -> ad83420ef3101bf80.awsglobalaccelerator.com
    • ad83420ef3101bf80.awsglobalaccelerator.com has two A records: 76.223.34.22 and 13.248.160.216.
      • These A records are AWS's responsibility. We treat these IPs as dynamic and do not depend on their specific values.
  3. Global Accelerator The device attempts to load the content from one of these IPs.
    • AWS Global Accelerator receives the request and checks its origin.
    • Global Accelerator routes the request to the nearest region, which is sa-east-1.
  4. Request Handling by EC2 and Caddy
    • The EC2 server in sa-east-1 receives the request.
    • Caddy, running in that EC2 server, inspects the domain digital.revisbarcelona.com, determines it's a custom domain, and executes the custom domain handler.
    • Caddy checks if it already has a valid certificate for the domain in its local cache. If not, it checks the centralized storage in DynamoDB.
    • If no certificate is found, Caddy calls the on_demand_tls.ask endpoint to verify whether it should generate a certificate.
  5. Domain Validation and Certificate Issuance
    • The on_demand_tls.ask endpoint is managed by farfalla-https-guard, hosted via Laravel Vapor and AWS Lambda.
    • Caddy receives a 200 response, confirming certificate generation is allowed.
    • Caddy creates an atomic lock in DynamoDB (LOCK-issue_cert_digital.revisbarcelona.com) to ensure no duplicate certificate issuance.
    • Caddy contacts ZeroSSL via API to generate a 30-day HTTPS certificate.
    • ZeroSSL requires domain validation using the HTTP_CSR_HASH challenge, which Caddy successfully completes.
    • ZeroSSL generates the certificate, which Caddy stores in DynamoDB:
      • certificates/zerossl/digital.revisbarcelona.com/digital.revisbarcelona.com.json certificate metadata
      • certificates/zerossl/digital.revisbarcelona.com/digital.revisbarcelona.com.key private key
      • certificates/zerossl/digital.revisbarcelona.com/digital.revisbarcelona.com.crt certificate file
  6. Certificate Distribution and Cache Management
    • Other Caddy instances across the fleet can now retrieve the certificate from DynamoDB.
    • Caddy removes the atomic lock from DynamoDB and caches the certificate locally for future use.
  7. Reverse Proxy to farfalla
    • Caddy forwards the request via reverse proxy to https://farfalla-entry-point.publica.la. This specific endpoint is proxied through Cloudflare before reaching its origin (an AWS API Gateway V2 endpoint: d-uynhasnc45.execute-api.us-east-1.amazonaws.com).
    • This roundtrip through Cloudflare allows us to leverage its security features, including custom rate limiting, Web Application Firewall (WAF) rules, and automated DDoS mitigation, adding an extra layer of protection before requests hit our core application infrastructure.
    • This endpoint handles both wildcard subdomains like alephdigital.publica.la and custom domains like digital.revisbarcelona.com.
    • Caddy adds the X-Forwarded-Host header to indicate the original domain (digital.revisbarcelona.com) to the upstream service.
  8. farfalla Request Handling
  9. Tenant Resolution
    • farfalla checks whether digital.revisbarcelona.com is a valid tenant subdomain or final_domain.
    • It confirms the tenant ID (tenant_id=2) and initializes the app for that tenant using the CurrentTenant service.
    • This setup enables tenant-specific logic like the global tenant() helper.
  10. Response Generation and Delivery
    • farfalla processes the /library route and generates the HTML response.
    • The response is passed back to AWS Lambda, which hands it off to AWS API Gateway 2.0.
    • AWS API Gateway returns the response to Caddy, which forwards it to AWS Global Accelerator.
    • AWS Global Accelerator sends the final response to the user's device.
  11. Rendering
    • The user's device renders the page at https://digital.revisbarcelona.com/library.
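The storage keys and the lock name from steps 5 and 6 follow a predictable pattern, which helps when checking by hand whether a certificate has reached DynamoDB. Below is a minimal shell sketch, not part of our tooling. The function names are hypothetical, and it assumes PrimaryKey is the partition key of the caddy_ssl_certificates table (both names are taken from the DynamoDB commands further down this page).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print the three DynamoDB keys Caddy writes for a ZeroSSL certificate.
cert_storage_keys() {
    base="certificates/zerossl/$1/$1"
    printf '%s\n' "$base.json" "$base.key" "$base.crt"
}

# Print the name of the lock Caddy holds while issuing a certificate.
cert_lock_key() {
    printf 'LOCK-issue_cert_%s\n' "$1"
}

# Report which keys exist for a domain (requires AWS credentials).
# Assumption: PrimaryKey is the table's partition key.
# A missing item is printed as "None" by the AWS CLI text output.
check_cert_in_dynamodb() {
    cert_storage_keys "$1" | while IFS= read -r key; do
        found=$(aws dynamodb get-item \
            --table-name caddy_ssl_certificates \
            --key "{\"PrimaryKey\": {\"S\": \"$key\"}}" \
            --projection-expression PrimaryKey \
            --query 'Item.PrimaryKey.S' --output text)
        printf '%s -> %s\n' "$key" "$found"
    done
}
```

For example, check_cert_in_dynamodb digital.revisbarcelona.com would look up the metadata, private key and certificate entries for that custom domain.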

Current servers and load balancing setup

We have separate fleets dedicated to staging and production.

IP              Name                          Location
34.229.139.178  custom-domains-prod-us-02     us-east-1 (USA, N. Virginia)
15.228.13.208   custom-domains-prod-br-02     sa-east-1 (South America, São Paulo)
52.30.112.138   custom-domains-prod-eu-01     eu-west-1 (Europe, Ireland)
---
18.209.57.166   custom-domains-staging-us-01  us-east-1 (USA, N. Virginia)
18.231.143.167  custom-domains-staging-br-02  sa-east-1 (South America, São Paulo)

At the moment we have the following architecture for both the Staging and Production environments:

Staging


Production


Caddy config

Automatic and On demand HTTPS

We use Caddy to manage almost all HTTPS certificates, except those managed by Cloudflare or AWS.

Specifically, these types of certificates:

  • First level wildcards of *.publica.la for production subdomains we create automatically for each new tenant
    • alephdigital.publica.la
    • fgilio.publica.la
  • Second level wildcards of *.app.publica.la for production subdomains we create automatically for each new tenant
  • Custom domains for tenants that decide to set up their own via a CNAME.
    • digital.revisbarcelona.com
    • kiosco.latercera.com
  • First level wildcards of *.publicala.me for staging subdomains we create automatically for each new tenant
    • reader-qa-staging.publicala.me
  • Second level wildcards of *.staging-farfalla.publica.la for staging subdomains we create automatically for each new tenant
    • demoreaderqastaging.staging-farfalla.publica.la

Caddy handles all of this automatically; see the Caddy documentation for more information.

Wildcards

For wildcard subdomains we use free Let's Encrypt certificates. We don't use Let's Encrypt for everything because it has a very low rate limit.

We use Let's Encrypt via its standard ACME protocol.

It uses the DNS challenge, so we include the dns.providers.cloudflare module and provide API keys so that Caddy can automatically manage the TXT DNS records during the challenge.

Custom domains

For custom domains we use a paid ZeroSSL account.

We use ZeroSSL via its proprietary REST API, which Caddy supports natively.

It uses the HTTP_CSR_HASH challenge, which does not require a custom module.

Trusted Proxies

Within the (defaultSiteConfig) reverse_proxy configuration block, we utilize the trusted_proxies directive. This directive is essential for ensuring Caddy correctly identifies the original client IP address when requests pass through intermediaries like Cloudflare.

We configure trusted_proxies with the official list of Cloudflare's IP ranges. Maintaining an accurate list ensures that headers like X-Forwarded-For are processed reliably, which is crucial for logging, rate limiting, and security features.

The IP ranges are sourced directly from Cloudflare:
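Cloudflare publishes its ranges as plain-text lists at https://www.cloudflare.com/ips-v4 and https://www.cloudflare.com/ips-v6. As a rough sketch of how the directive line could be regenerated from those lists (the function names are hypothetical and this is not how our Caddyfile is actually maintained):

```shell
#!/bin/sh
# Download Cloudflare's published IPv4 and IPv6 ranges, one CIDR per line.
# The extra "echo" guards against a list that lacks a trailing newline.
fetch_cloudflare_ranges() {
    curl -fsS https://www.cloudflare.com/ips-v4 && echo &&
    curl -fsS https://www.cloudflare.com/ips-v6 && echo
}

# Read CIDRs on stdin and print a single trusted_proxies line.
# The "|| [ -n ... ]" keeps a last line that has no trailing newline.
format_trusted_proxies() {
    line="trusted_proxies"
    while IFS= read -r cidr || [ -n "$cidr" ]; do
        if [ -n "$cidr" ]; then
            line="$line $cidr"
        fi
    done
    printf '%s\n' "$line"
}
```

Running fetch_cloudflare_ranges | format_trusted_proxies prints a line that can be compared against the one currently in the reverse_proxy block.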

Misc

  • Caddy adds a header called X-Caddy-Id, containing Forge's server ID, to both the response (to the user) and the request (to farfalla).
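This header is a quick way to find out which server in the fleet answered a request. A small sketch, using a hypothetical helper name, that extracts it from raw response headers:

```shell
#!/bin/sh
# Read HTTP response headers on stdin and print the X-Caddy-Id value.
# Header names are matched case-insensitively; CRs are stripped first.
caddy_id_from_headers() {
    tr -d '\r' | awk -F': *' 'tolower($1) == "x-caddy-id" { print $2; exit }'
}
```

For example: curl -sI https://alephdigital.publica.la/ | caddy_id_from_headers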

Monitoring and Maintenance

We can monitor all the servers and the renewal of each certificate type from this dashboard: https://ohdear.app/status-page/https-servers-and-certificates

Misc Caddy and servers

Use this command to switch to the root user and get access to Caddy's logs

sudo su - root

Use this command to tail Caddy's logs

journalctl -u caddy.service -b -f -n 10

Use this command to list Caddy's custom modules, such as DynamoDB storage

caddy list-modules --skip-standard

DynamoDB

You can run all these commands from AWS CloudShell.

COUNT LIKE %.publica.la% AND NOT LIKE %.app.publica.la%

aws dynamodb scan \
--table-name caddy_ssl_certificates \
--filter-expression "contains(PrimaryKey, :suffix) AND NOT contains(PrimaryKey, :exclude_suffix)" \
--expression-attribute-values '{":suffix": {"S": ".publica.la"}, ":exclude_suffix": {"S": ".app.publica.la"}}' \
--select "COUNT"

COUNT ocsp records

aws dynamodb scan \
--table-name caddy_ssl_certificates \
--filter-expression "contains(PrimaryKey, :suffix)" \
--expression-attribute-values '{":suffix": {"S": "ocsp"}}' \
--select "COUNT"
# -> 1053

COUNT .app.publica.la records

aws dynamodb scan \
--table-name caddy_ssl_certificates \
--filter-expression "contains(PrimaryKey, :suffix)" \
--expression-attribute-values '{":suffix": {"S": ".app.publica.la"}}' \
--select "COUNT"
# -> 3319

COUNT .staging-farfalla.publica.la records

aws dynamodb scan \
--table-name caddy_ssl_certificates \
--filter-expression "contains(PrimaryKey, :suffix)" \
--expression-attribute-values '{":suffix": {"S": ".staging-farfalla.publica.la"}}' \
--select "COUNT"
# -> 3319
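The substring counts above differ only in the value being searched for, so they can be wrapped in one helper. This is a sketch with hypothetical function names. It requests JSON output because, with text output, the AWS CLI applies --query to each page of a paginated scan separately.

```shell
#!/bin/sh
# Build the --expression-attribute-values JSON for a substring match.
count_filter_values() {
    printf '{":needle": {"S": "%s"}}' "$1"
}

# Count items in caddy_ssl_certificates whose PrimaryKey contains $1.
count_keys_containing() {
    aws dynamodb scan \
        --table-name caddy_ssl_certificates \
        --filter-expression "contains(PrimaryKey, :needle)" \
        --expression-attribute-values "$(count_filter_values "$1")" \
        --select COUNT \
        --output json --query Count
}
```

For example: count_keys_containing ocsp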

Recipes

Recipes are small Bash scripts that we use to run tasks across many servers.

We store recipes in our Forge account; see Forge's documentation for more details.

Recipes are named using this pattern: {create/update date} --- {target environment} --- {purpose}

These are the recipes we currently have:

  • 54551 2024.09.05 GENERAL Upgrade custom Caddy build
  • 34095 2023.03.11 STAGING Setup new Caddy server from scratch
  • 61607 2024.06.06 STAGING Upgrade custom Caddy build and update config
  • 63346 2024.09.05 STAGING Download manual HTTPS certificate 20240905_wilcard-cert_.staging-farfalla.publica.la
  • 41983 2024.09.05 STAGING Update Caddy config
  • 34096 2023.03.11 PRODUCTION Setup new Caddy server from scratch
  • 61609 2024.06.06 PRODUCTION Upgrade custom Caddy build and update config
  • 63347 2024.09.05 PRODUCTION Download manual HTTPS certificate 20240905_wilcard-cert_.publica.la
  • 41984 2024.09.05 PRODUCTION Update Caddy config

* The number at the beginning is the recipe ID in Forge.
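To keep new recipe names consistent with the pattern, a name can be generated instead of typed. A minimal sketch, assuming the only environments are the three that appear in the list above (the helper name is hypothetical):

```shell
#!/bin/sh
# Usage: recipe_name ENVIRONMENT PURPOSE [DATE]
# DATE defaults to today, formatted like the existing recipes (YYYY.MM.DD).
recipe_name() {
    case "$1" in
        GENERAL|STAGING|PRODUCTION) ;;
        *) echo "unknown environment: $1" >&2; return 1 ;;
    esac
    recipe_date="${3:-$(date +%Y.%m.%d)}"
    printf '%s --- %s --- %s\n' "$recipe_date" "$1" "$2"
}
```

For example: recipe_name STAGING "Update Caddy config"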

IMPORTANT

Recipes are updated almost every time we use them, because most of the time we're making at least a small change. This is why we consider Forge's version of the recipes the "source of truth"; the recipes you see here are only a reference.

Creating a new server

To set up a new Caddy server, follow these steps:

1. Create and configure the server using Forge

  1. Log in to Forge with your credentials.
  2. Go to the Servers page and press the button CREATE CIRCLE SERVER.
  3. Select the circle publica.la and the credential Staging or Production, depending on the case.
  4. Create a server with the following characteristics:
    • Type: Load Balancer
    • Name: Use the naming convention custom-domains-{staging or prod}-{location}-{number} (refer to previously created servers)
    • Region: Select the region of your choice
    • Server Size: Select the size of your choice
    • VPC: Select Create New
    • VPC Name: Give the new VPC a meaningful name
    • Post-Provision Recipe: Use the recipe according to your needs.
    • Make sure the option Add Server's SSH Key To Source Control Providers is checked.
  5. Use the recipe called "Setup new Caddy server from scratch".
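A server name can be checked against the naming convention before the server is created. The sketch below assumes, based on the servers listed earlier, that the location is a two-letter code and the number has two digits. The function name is hypothetical.

```shell
#!/bin/sh
# Succeeds when the name matches
# custom-domains-{staging or prod}-{location}-{number}.
# Assumption: two-letter location (us, br, eu) and two-digit number.
is_valid_server_name() {
    printf '%s\n' "$1" |
        grep -Eq '^custom-domains-(staging|prod)-[a-z]{2}-[0-9]{2}$'
}
```

For example, is_valid_server_name custom-domains-prod-eu-02 succeeds, while a name using "production" instead of "prod" is rejected.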

2. Connect the server to AWS Global Accelerator

After you have successfully created the new server, you need to add it to the AWS Global Accelerator (load balancer). Follow these steps:

  1. Enter the Global Accelerator Page in AWS Console
  2. Select CustomDomainsProduction
  3. You'll find one listener for each port, 443 and 80
  4. Select a listener and press the button Add endpoint group
  5. In region info select the region of the new server
  6. Expand Configure health checks
  7. Set Health check port to 80
  8. Set Health check protocol to HTTP
  9. Set Health check path to /health
  10. Set Health check interval to 10
  11. Set Threshold count to 2
  12. Click Next and add the caddy server as endpoint.
  13. Save your changes
note

Repeat steps 4.1 to 4.2 for each listener, 443 and 80
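The same health check settings can also be expressed with the AWS CLI, which is useful for reviewing them or scripting the step. This is only a sketch of the console steps above, not a procedure we currently use. The function name and the DRY_RUN switch are hypothetical, and the listener ARN and instance ID must be supplied by the caller. Global Accelerator API calls have to be sent to the us-west-2 region, regardless of where the endpoints live.

```shell
#!/bin/sh
# Usage: add_endpoint_group LISTENER_ARN ENDPOINT_REGION EC2_INSTANCE_ID
# With DRY_RUN set to any non-empty value the command is printed, not run.
add_endpoint_group() {
    listener_arn=$1 region=$2 instance_id=$3
    ${DRY_RUN:+echo} aws globalaccelerator create-endpoint-group \
        --region us-west-2 \
        --listener-arn "$listener_arn" \
        --endpoint-group-region "$region" \
        --endpoint-configurations "EndpointId=$instance_id" \
        --health-check-port 80 \
        --health-check-protocol HTTP \
        --health-check-path /health \
        --health-check-interval-seconds 10 \
        --threshold-count 2
}
```

As in the console, the command has to be run once per listener.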

IMPORTANT

Make sure to add the IP of the new instance to farfalla.

Be mindful: if you resize an instance or change its IP, you'll need to update the previous file in farfalla.

Upgrading Caddy build

1. Get custom build

If you want to upgrade Caddy to a new version, but not its config, follow these steps:

  • Visit Caddy Download page
  • Select platform Linux amd64
  • Select module caddy.storage.dynamodb
  • Select module dns.providers.cloudflare
  • Press button Download, wait for the binary to be built and downloaded
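Before uploading the binary, it may be worth confirming that both modules made it into the build. The sketch below reuses the list-modules command shown earlier on this page. The helper name is hypothetical.

```shell
#!/bin/sh
# Read the output of "caddy list-modules --skip-standard" on stdin and
# fail if either of the two modules we depend on is missing.
has_required_modules() {
    modules=$(cat)
    for required in caddy.storage.dynamodb dns.providers.cloudflare; do
        if ! printf '%s\n' "$modules" | grep -qF -- "$required"; then
            echo "missing module: $required" >&2
            return 1
        fi
    done
}
```

For example, after chmod +x caddy_linux_amd64_custom: ./caddy_linux_amd64_custom list-modules --skip-standard | has_required_modules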

2. Upload to S3

Once you've downloaded the binary caddy_linux_amd64_custom, upload it to our Caddy store bucket in S3: https://caddy-store.s3.amazonaws.com/.

The bucket is in the production account, the one with Account ID 375481448855.

Remember to also make the file publicly accessible.

important

Make sure to select the role publica.la - production to find the proper bucket

3. Upgrade servers using recipe

Now that you have uploaded the new version to the bucket, update the recipe "GENERAL Upgrade custom Caddy build" to point to the new build and run it on the intended servers.

HTTPS Guard - Manually test

How to validate a customer's domain?

Production

We have to execute the following URL:

https://farfalla-https-guard.publica.la/api/v1/caddy-check-BYZJVBNM8WUVXRDZ?domain=domain_client

Replacing domain_client with the domain you want to verify.

This test can yield two results:

  1. Status 200, with a message saying Domain Authorized.

  2. Status 503, if the entered domain does not exist.

Staging

https://staging-farfalla-https-guard.publica.la/api/v1/caddy-check-BYZJVBNM8WUVXRDZ?domain=domain_client

Replacing domain_client with the domain you want to verify.

This test can yield two results:

  1. Status 200, with a message saying Domain Authorized.

  2. Status 503, if the entered domain does not exist.

For more information visit the following link
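The same check can be run from a terminal. The sketch below builds the URL for either environment from the two URLs above and prints only the HTTP status code. The function names are hypothetical.

```shell
#!/bin/sh
# Usage: guard_url production|staging DOMAIN
guard_url() {
    case "$1" in
        production) host=farfalla-https-guard.publica.la ;;
        staging) host=staging-farfalla-https-guard.publica.la ;;
        *) echo "unknown environment: $1" >&2; return 1 ;;
    esac
    printf 'https://%s/api/v1/caddy-check-BYZJVBNM8WUVXRDZ?domain=%s\n' "$host" "$2"
}

# Print the status code returned by the guard: 200 or 503.
guard_status() {
    url=$(guard_url "$1" "$2") || return 1
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
}
```

For example: guard_status production digital.revisbarcelona.com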

Troubleshooting

This section provides guidance on common issues and how to resolve them. While our Caddy server setup is robust, occasional issues can arise.

Handling Unhealthy Servers

Our monitoring service, OhDear, checks the health of each Caddy server by sending a GET request to its public IP address on the /health path (e.g., http://18.209.57.166/health). If a server fails this health check, OhDear sends an alert through Squadcast. These occurrences are rare.

If you receive an alert for an unhealthy server, follow these steps to investigate and resolve the issue:

  1. Access AWS Console: Log in to the appropriate AWS account (e.g., staging or production).
  2. Navigate to Global Accelerator: Go to the Global Accelerator service page.
  3. Identify Unhealthy Listener: Check the listeners. An "Unhealthy endpoint" status indicates an issue. Note that since the same server handles traffic for both port 80 (HTTP) and port 443 (HTTPS), both listeners might show as unhealthy if a server is down.
  4. Inspect Endpoint Group: Navigate to the specific "Endpoint group" associated with the unhealthy listener. The health status should also be visible here.
  5. Locate the EC2 Instance: Identify the EC2 instance acting as the endpoint. The health status will be visible here as well. Note the EC2 Instance ID (e.g., i-04c795823b2c82eb7).
  6. Reboot the Instance: If the server is confirmed to be unresponsive or unhealthy, select the EC2 instance and choose the "Reboot" option. In most cases, a reboot resolves the issue.

It is unusual for these servers to become unresponsive. If rebooting does not resolve the issue, further investigation into Caddy logs or server metrics may be necessary.
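Before opening the AWS console, the same request OhDear makes can be reproduced from a terminal to confirm the alert. The sketch below uses hypothetical function names and assumes any 2xx response counts as healthy. curl reports 000 when it cannot connect at all.

```shell
#!/bin/sh
# Print the HTTP status code of a server's /health endpoint.
health_code() {
    curl -s -o /dev/null -m 5 -w '%{http_code}' "http://$1/health"
}

# Assumption: any 2xx status means the server is healthy.
is_healthy_code() {
    case "$1" in
        2??) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: check_servers IP [IP ...]
check_servers() {
    for ip in "$@"; do
        code=$(health_code "$ip") || true
        if is_healthy_code "$code"; then
            echo "$ip healthy ($code)"
        else
            echo "$ip UNHEALTHY ($code)"
        fi
    done
}
```

For example: check_servers 18.209.57.166 18.231.143.167. The reboot in step 6 also has a CLI equivalent, aws ec2 reboot-instances --instance-ids followed by the instance ID, run against the instance's region.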

Maintenance Log

Thursday 2024.09.09

Responsible: Franco Gilio and Ignacio Milano
Reason:

  • Continuation from Thursday 2024.09.05 maintenance tasks

Action:

Server(s): All servers

Thursday 2024.09.05

Responsible: Franco Gilio and Ignacio Milano
Reason:

  • A bug in Caddy, ZeroSSL or both caused errors during certificate renewal. Refs: 1, 2
  • Caddy's fallback to Let's Encrypt worked fine until it reached its rate limit.
  • At one point all certificates for *.publica.la failed renewal; custom domains were not affected because each has its own rate limit with Let's Encrypt.

Action:

  • Updated staging and production servers to use a wildcard certificate for *.publicala.me, *.staging-farfalla.publica.la and *.publica.la.
  • Took it as an opportunity to work on this task.
  • Took it as an opportunity to simplify the Caddyfiles, now reusing some portions.
  • Took it as an opportunity to improve the documentation of how our complete HTTPS system works.

Server(s): All servers


Thursday 2024.06.06 - 02

Responsible: Franco Gilio
Action:

  • Update /latest-issue-cover-image route to use micelio's new handler, which returns the image instead of a redirect

Server(s): All servers


Thursday 2024.06.06 - 01

Responsible: Franco Gilio
Action:

Server(s): All servers


Wednesday 2023.06.21

Responsible: Franco Gilio
Action:

  • Set up new server custom-domains-prod-eu-02, because AWS turned off the previous server

Server(s): All servers


Saturday 2023.03.11

Responsible: Franco Gilio
Action:

  • Upgrade Caddy to version v2.6.4 with DynamoDB storage package to 3.0.1
  • Create fresh servers for all regions, based on Ubuntu 22
  • Add Ireland as EU 01 region

Server(s): All servers


Friday 2022.12.16

Responsible: Franco Gilio
Action: Upgrade Caddy to version v2.6.2 and enable GZIP compression
Script: caddy_server_upgrade.sh

Server(s): All servers


Mon Jan 27, 2022

Responsible: Gonzalo Parra
Action: Upgrade Caddy to version v2.4.6
Script: caddy_server_upgrade.sh

Server(s): All servers


Mon Aug 2, 2021

Responsible: Franco Gilio & Ignacio Milano
Action: Add Let's Encrypt as issuer fallback in Caddy
Script: Add Let's Encrypt as issuer fallback in Caddy - PRODUCTION

Server(s): PRODUCTION Caddy servers


Mon Aug 2, 2021

Responsible: Franco Gilio & Ignacio Milano
Action: Add Let's Encrypt as issuer fallback in Caddy
Script: Add Let's Encrypt as issuer fallback in Caddy - STAGING

Server(s): STAGING Caddy servers


Thu Jul 29, 2021

Responsible: Gonzalo Parra
Action: Update DynamoDB configuration in Caddyfile
Script: dynamoDB_fix.sh

Server(s): All servers


Mon Jul 5, 2021

Responsible: Gonzalo Parra
Action: Upgrade Caddy to version v2.4.2
Script: caddy_server_upgrade.sh

Server(s): All servers

