AWS CloudFront
Folks working in web development are familiar with CDNs, or Content Delivery Networks. A CDN is a network of specialized servers that accelerates the delivery of static content like media, HTML, CSS, and JS files to end users. AWS CloudFront is a CDN service that does exactly this, while giving you more control over how your content is delivered.
We know by now that AWS data centers are located in regions and that every region has two or more Availability Zones. Well, that's not the whole picture. AWS also maintains Edge locations, which are data centers positioned closer to end users. Edge locations are used by content delivery services to minimize delivery latency.
These Edge locations are also known as Points of Presence (POPs) for AWS. Additionally, there are regional edge caches positioned between the origin server (or AWS data center) and the POPs, and they have more caching capacity than the POPs. Caching of the content plays a very important role here, but before we discuss that, let us first understand how CloudFront works.
When a user requests a static resource, CloudFront routes the request to the Edge location that can serve the user with the lowest latency. The Edge location checks whether the requested file is present in its cache. If it is, the file is returned to the user straight from the edge, providing optimal performance.
However, if the file is not present in its cache, the POP forwards the request to the regional edge cache, which repeats the same lookup. If the file is available in its cache, it returns the file to the POP, which in turn returns it to the user. If not, the regional edge cache requests the file from the origin server.
This way, the origin server is consulted only once (the first time) per regional edge cache. Subsequent requests for the same file via the same edge server are served directly from the edge server or the regional edge cache. Regional edge caches have more caching capacity, which matters because POPs periodically evict files to make room for more popular content; an object evicted from a POP can often still be served from the regional edge cache instead of the origin.
The origin server could be an S3 bucket, a web server running on an EC2 instance, and so on. To use CloudFront to serve our website's static content via Amazon's CDN network, we need to create a distribution. We specify the origin server in the distribution along with several other parameters like access, security, cache key, origin request settings, geo-restrictions, and logging.
Creating a CloudFront distribution gives us a CloudFront domain name (something like dxxxxxxxxxxxxx.cloudfront.net) that applications can use to request static content. However, if you wish to use your own domain name, e.g. example.com, you can do so by adding an alternate domain name to the distribution.
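Here is a minimal sketch of creating such a distribution with boto3, assuming a hypothetical S3 bucket (my-assets-bucket) as the origin; the managed "CachingOptimized" cache policy ID is taken from the AWS documentation:

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

response = cloudfront.create_distribution(
    DistributionConfig={
        "CallerReference": str(time.time()),  # unique token that makes the call idempotent
        "Comment": "Static assets for example.com",
        "Enabled": True,
        "Origins": {
            "Quantity": 1,
            "Items": [
                {
                    "Id": "my-s3-origin",
                    "DomainName": "my-assets-bucket.s3.amazonaws.com",  # hypothetical bucket
                    "S3OriginConfig": {"OriginAccessIdentity": ""},
                }
            ],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "my-s3-origin",
            "ViewerProtocolPolicy": "redirect-to-https",
            # Managed "CachingOptimized" cache policy ID from the AWS docs
            "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
        },
        # An alternate domain name would go under "Aliases" and also requires a
        # matching ACM certificate configured under "ViewerCertificate".
    }
)

# The generated CloudFront domain name, e.g. dxxxxxxxxxxxxx.cloudfront.net
print(response["Distribution"]["DomainName"])
```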
As mentioned before, caching plays a major role in delivering content to users as fast as possible. When a request reaches an Edge location, the Edge location looks up its cache using a cache key. If the key is found, i.e. the requested content is available and served from the cache, it is called a cache hit; if not, it is a cache miss. The proportion of requests served from the cache is the cache hit ratio, which largely determines the level of performance achieved.
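A simple way to see this in action is to request the same object twice and inspect the x-cache response header that CloudFront adds; the distribution domain below is a placeholder:

```python
import requests  # third-party: pip install requests

# Hypothetical distribution domain and object path; replace with your own.
url = "https://dxxxxxxxxxxxxx.cloudfront.net/images/logo.png"

for attempt in range(2):
    response = requests.get(url)
    # CloudFront reports "Miss from cloudfront" or "Hit from cloudfront" here;
    # the first request is typically a miss, the second a hit.
    print(attempt + 1, response.headers.get("x-cache"))
```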
With CloudFront policies, one can configure how cache keys are built as well as how origin requests are formed. Well-designed cache policies reduce the load on the origin server and reduce latency for end-users. The cache key is composed from values of the incoming request, such as the URL, query strings, HTTP headers, and cookies, and a cache policy controls which of these values are included.
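As a sketch, here is how a cache policy could be created with boto3 so that only the URL path and a single query string parameter form the cache key; the policy name and the "v" parameter are made up for illustration:

```python
import boto3

cloudfront = boto3.client("cloudfront")

# Cache key = URL path + the "v" query string only; headers and cookies are
# ignored, so responses that differ only in other values share one cached copy.
cache_policy = cloudfront.create_cache_policy(
    CachePolicyConfig={
        "Name": "static-assets-cache-policy",  # hypothetical name
        "MinTTL": 1,
        "DefaultTTL": 86400,
        "MaxTTL": 31536000,
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": True,
            "EnableAcceptEncodingBrotli": True,
            "HeadersConfig": {"HeaderBehavior": "none"},
            "CookiesConfig": {"CookieBehavior": "none"},
            "QueryStringsConfig": {
                "QueryStringBehavior": "whitelist",
                "QueryStrings": {"Quantity": 1, "Items": ["v"]},
            },
        },
    }
)

print(cache_policy["CachePolicy"]["Id"])  # reference this ID from a cache behavior
```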
Origin request policies, on the other hand, determine what information is included in the request that CloudFront forwards to the origin server. Request information such as query strings, HTTP headers, and cookies can be controlled in this type of policy (the URL path and request body are always forwarded). By default, the information used in the cache policy is also part of the origin request. This extra information can also be used for reporting and analytics.
In general, both types of policy control how request information is handled, but cache policies and origin request policies are kept separate for a reason. For optimal performance, the cache key should include as few values as possible, since a smaller cache key generally means a higher cache hit ratio. For origin requests, on the other hand, it is often useful to forward additional parameters for reporting and insights. The two policy types are therefore maintained separately even though they work on the same set of parameters.
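A matching origin request policy might forward more than the cache key does, for example all query strings plus a couple of headers useful for analytics; the policy name below is hypothetical:

```python
import boto3

cloudfront = boto3.client("cloudfront")

# Forward more information to the origin than is used in the cache key.
origin_request_policy = cloudfront.create_origin_request_policy(
    OriginRequestPolicyConfig={
        "Name": "reporting-origin-request-policy",  # hypothetical name
        "HeadersConfig": {
            "HeaderBehavior": "whitelist",
            "Headers": {
                "Quantity": 2,
                "Items": ["CloudFront-Viewer-Country", "User-Agent"],
            },
        },
        "CookiesConfig": {"CookieBehavior": "none"},
        "QueryStringsConfig": {"QueryStringBehavior": "all"},
    }
)

print(origin_request_policy["OriginRequestPolicy"]["Id"])
```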
CloudFront also lets us impose access restrictions on the content being served via Edge locations. At times, organizations want only specific users to have access to certain data. For example, a resource file that contains confidential data should only be accessible to authorized users.
There are several ways to impose these restrictions. One way is to use CloudFront signed URLs or signed cookies. When a file stored in an S3 bucket or on an EC2 instance is publicly available, it can be accessed directly using its URL. When the same content is served via CloudFront, signed URLs can be configured so that only specific sets of users can access it.
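As an illustration, a signed URL can be generated with botocore's CloudFrontSigner and the cryptography package, assuming a key pair ID and private key that have been registered as a trusted signer for the distribution; the values below are placeholders:

```python
from datetime import datetime, timedelta

from botocore.signers import CloudFrontSigner
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

# Hypothetical key pair ID and private key file registered with the distribution.
KEY_PAIR_ID = "KXXXXXXXXXXXXX"
PRIVATE_KEY_PATH = "private_key.pem"


def rsa_signer(message: bytes) -> bytes:
    """Sign the policy with the private key matching the public key in CloudFront."""
    with open(PRIVATE_KEY_PATH, "rb") as key_file:
        private_key = serialization.load_pem_private_key(key_file.read(), password=None)
    return private_key.sign(message, padding.PKCS1v15(), hashes.SHA1())


signer = CloudFrontSigner(KEY_PAIR_ID, rsa_signer)

# The URL is valid for one hour; after that CloudFront rejects the request.
signed_url = signer.generate_presigned_url(
    "https://dxxxxxxxxxxxxx.cloudfront.net/private/report.pdf",
    date_less_than=datetime.utcnow() + timedelta(hours=1),
)
print(signed_url)
```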
Similarly, there might be a situation where we don't want users from a particular region to have access to certain content. In that case, we can use geo-blocking to restrict users trying to access the files from specific locations. There are a few more methods to restrict access, such as restricting direct access to S3 buckets and ALBs, using AWS WAF, and using field-level encryption to protect sensitive data.
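Geo-blocking is simply part of the distribution configuration; a fragment like the following (with example country codes) could be included when creating or updating the distribution:

```python
# Fragment of a DistributionConfig showing geo-blocking. With "whitelist",
# only the listed countries can access the content; "blacklist" would instead
# deny the listed countries. Country codes are ISO 3166-1 alpha-2 examples.
geo_restriction = {
    "Restrictions": {
        "GeoRestriction": {
            "RestrictionType": "whitelist",
            "Quantity": 2,
            "Items": ["US", "CA"],
        }
    }
}
```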
There is a lot more you can do with CloudFront when it comes to configuring request and response behavior, generating custom responses, and supporting video on demand and live streaming. For example, CloudFront can be configured to serve a custom error page when the origin responds with a specific error code.
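A custom error response is also configured on the distribution; the fragment below, with assumed paths and TTLs, would make CloudFront serve /errors/404.html whenever the origin returns a 404:

```python
# Fragment of a DistributionConfig: when the origin returns a 404, CloudFront
# serves /errors/404.html from the distribution instead, keeps the 404 status,
# and caches that error response for five minutes.
custom_error_responses = {
    "CustomErrorResponses": {
        "Quantity": 1,
        "Items": [
            {
                "ErrorCode": 404,
                "ResponsePagePath": "/errors/404.html",
                "ResponseCode": "404",
                "ErrorCachingMinTTL": 300,
            }
        ],
    }
}
```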
However, I found Lambda@Edge the most interesting. The idea is to execute Lambda functions on the servers located in Edge locations. Lambda@Edge is an extension of AWS Lambda in which the functions are triggered by CloudFront events. The event can be one of the following (a minimal handler sketch follows the list):
When CloudFront receives a request from the user
Before CloudFront forwards a request to the origin
When CloudFront receives a response from the origin
Before CloudFront forwards the response to the user
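As a sketch, a viewer-request function (Python runtime) receives the request under event["Records"][0]["cf"]["request"] and can modify it before CloudFront consults its cache; the header name below is made up for illustration:

```python
# Minimal viewer-request Lambda@Edge handler sketch (Python runtime).
# It injects a header into the incoming request before CloudFront looks up its
# cache or forwards the request. Returning the request object lets processing
# continue; returning a response object instead would short-circuit it.

def handler(event, context):
    # CloudFront wraps the request in event["Records"][0]["cf"]
    request = event["Records"][0]["cf"]["request"]

    # Headers are a dict of lowercased names -> list of {"key", "value"} pairs
    request["headers"]["x-request-source"] = [
        {"key": "X-Request-Source", "value": "lambda-at-edge"}  # hypothetical header
    ]

    return request
```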
Lambda@Edge can be used to perform tasks similar to some of the other CloudFront configurations, such as generating error pages, and the functions can access request information like headers and the request body. Some of the situations where we may use Lambda@Edge are listed below (see the sketch after the list):
Overriding response headers
Generating static responses on the fly
Accessing query string parameters to generate headers
... and so on.
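For example, overriding a response header could look like the following origin-response sketch, which injects a Strict-Transport-Security header the origin may not set (the value is chosen for illustration):

```python
# Sketch of an origin-response Lambda@Edge handler that overrides a response
# header before the object is cached at the edge and returned to viewers.

def handler(event, context):
    response = event["Records"][0]["cf"]["response"]

    response["headers"]["strict-transport-security"] = [
        {
            "key": "Strict-Transport-Security",
            "value": "max-age=63072000; includeSubDomains",  # example policy value
        }
    ]

    return response
```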
That was a brief introduction to AWS CloudFront. Of course, the posts in this blog series are not meant to recreate the AWS documentation; they intend to provide a high-level overview of what each service is about. If you like the content, consider subscribing, following, and sharing this blog post!