Recently, I decided to put my own site behind Amazon’s CloudFront CDN service. One way they offer to accomplish this is via an origin pull. This means that the CloudFront service connects to your own server, pulls your content from it, and distributes it to its various edge locations.
In a nutshell, you create a distribution at AWS, which is essentially a series of rules that define which origin server to pull from, whether to pay attention to things like cookies and query strings, how long to cache things for, which data centers to cache the data at, etc. Amazon responds by creating a unique subdomain for your distribution on the cloudfront.net domain, and connects it transparently to a combination of EC2 instances and S3 storage containers (you don’t need to worry about these – they take care of it). At that point CloudFront opens persistent connections to your server to pull, cache, and distribute content to end users as needed. You can even point to your CloudFront domain via a CNAME record in your own domain.
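As a sketch, that CNAME record in a BIND-style zone file might look like this (the hostname and distribution subdomain below are made-up placeholders, not values from my setup):

```
; Point a subdomain of your own zone at the CloudFront distribution domain.
; d1234abcd.cloudfront.net is a hypothetical distribution subdomain.
cdn.example.com.    300    IN    CNAME    d1234abcd.cloudfront.net.
```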
This is quick and easy to set up, and works well.
After hunting through my logs, I noticed something about the requests from CloudFront to my origin server. Here’s an example line:
220.127.116.11 - - [06/Jan/2013:06:43:24 -0500] "GET /?feed=atom HTTP/1.0" 200 10084 "-" "Amazon CloudFront"
It turns out that the HTTP/1.0 protocol CloudFront uses for its requests was causing my content to be served uncompressed – by default, gzip_http_version in nginx is 1.1, so nginx only gzips responses to HTTP/1.1 requests. As long as it’s set that way, your content will get sent to CloudFront uncompressed, which can cost you more in bandwidth and result in longer wait times, as Amazon will also be sending more data to your users per request.
Remedy this by making sure that, for content you’re distributing with CloudFront from an origin server, your nginx configuration on that server includes:
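The key change is the `gzip_http_version` directive, which lowers the minimum protocol version at which nginx will compress a response, so CloudFront’s HTTP/1.0 origin pulls qualify:

```nginx
# Allow gzip for HTTP/1.0 requests (CloudFront's origin pulls use HTTP/1.0)
gzip_http_version 1.0;
```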
In the end, my gzip settings ended up being:
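The values below are a representative sketch rather than a verbatim copy of my configuration, but they show a common nginx gzip block with the fix applied:

```nginx
gzip              on;
# CloudFront requests arrive as HTTP/1.0, which nginx won't compress by default
gzip_http_version 1.0;
# Compress responses to proxied requests too (requests carrying a Via header)
gzip_proxied      any;
gzip_comp_level   5;
# Emit Vary: Accept-Encoding so caches store compressed and plain copies separately
gzip_vary         on;
gzip_types        text/plain text/css application/json application/javascript
                  application/xml application/atom+xml image/svg+xml;
```

You can verify the change from the command line by requesting a page over HTTP/1.0, e.g. `curl --http1.0 -sI -H 'Accept-Encoding: gzip' http://example.com/`, and checking that the response includes a `Content-Encoding: gzip` header.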
…and my content is once again compressed when transmitted through AWS.