AWS CloudFront

Introduction to CDN

Suppose a new website is gaining traction in its local area. Soon the organization behind it sees a surge in traffic from other regions and from other countries, and realizes that unless it serves the site's content with faster response times, this popularity will eventually fizzle out and it will miss the chance to cash in on the opportunity. Why is this important? Because websites that load faster provide a better user experience; no user today wants to wait longer than expected to consume content.

So, what's the solution to the above problem? Add more servers that host the web front-end content (static content such as HTML, CSS, JavaScript and images, as well as dynamically generated responses). That solves the problem for the small area in which those servers are placed. But what if the website starts receiving traffic from a brand new area? The organization can't keep adding servers, as this would be both costly and non-scalable. This is where a CDN comes to the rescue. A Content Delivery Network (CDN) provides a scalable way to serve web content faster by using a set of servers that are geographically distributed across different locations. Content is served from the CDN node nearest to the user. One can think of a CDN server as a remote cache of the website. CDNs reduce the distance between the user and the server delivering the content, which in turn reduces load times.

Beyond faster load times and a better user experience, CDNs use caching and other optimizations to save bandwidth and to increase content availability and redundancy against hardware failures of web servers by providing alternate servers. They may also help protect against DDoS attacks and provide up-to-date TLS/SSL certificates for the host. They can also compress and minify files. In addition, CDNs use load balancing, failover and anycast routing so that even if individual CDN servers or an entire data center go down, the experience remains seamless: the problem goes unnoticed by the end user and the website stays up. CDNs can also provide or enable web application firewalls (WAFs) to protect websites against vulnerabilities. Hence CDNs are important for any individual or organization with an internet property. CloudFront is AWS's CDN service.

Tech behind CloudFront

As explained on the AWS Global Infrastructure page, there are many edge locations and regional edge caches that cache content from the original website. These help because a regional edge cache can cache your content first, and edge locations can then fetch a copy from the regional edge cache instead of going to the origin directly. This is known as tiered caching.
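The tiered lookup described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions (no expiry, no eviction); the class `CacheTier`, the tier names and the `origin` function are hypothetical and do not reflect how CloudFront is implemented internally.

```python
from typing import Callable, Dict, List


class CacheTier:
    """One cache layer that asks its parent for content on a miss."""

    def __init__(self, name: str, fetch_from_parent: Callable[[str], bytes]):
        self.name = name
        self.fetch_from_parent = fetch_from_parent
        self.store: Dict[str, bytes] = {}

    def get(self, path: str) -> bytes:
        if path not in self.store:
            # Miss: fetch from the next tier up and keep a local copy.
            self.store[path] = self.fetch_from_parent(path)
        return self.store[path]


origin_requests: List[str] = []


def origin(path: str) -> bytes:
    """Stand-in for the origin server; records every request it receives."""
    origin_requests.append(path)
    return f"content of {path}".encode()


# Two edge locations share one regional edge cache in front of the origin.
regional = CacheTier("regional-edge-cache", origin)
edge_a = CacheTier("edge-a", regional.get)
edge_b = CacheTier("edge-b", regional.get)

edge_a.get("/index.html")  # misses at edge-a and regional, reaches the origin
edge_b.get("/index.html")  # misses at edge-b, but the regional tier has it
print(len(origin_requests))  # 1
```

Even though two different edge locations served the file, the origin was contacted only once, which is the load reduction that tiered caching aims for.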

CloudFront uses optimizations such as TCP Fast Open (TFO, which lets a repeat connection start sending data without first completing the full three-way handshake) and keep-alive connections to improve performance.

One important thing that defines a CDN is its cache behavior: a set of rules bound to an origin that defines how the CDN handles and processes incoming requests. This behavior is controlled by multiple configuration settings, such as:

  • Path Patterns (the cache behavior applied depends on which path pattern the request matches. Patterns are evaluated in the order listed, for example /images/*.jpg -> /images/* -> *.gif, so a request for /images/sample.gif matches the 2nd pattern first, and that cache behavior is used to cache the object from the origin server),

  • Multiple Origins (website components, or the entire website, can be kept at multiple origins, which may depend on the region, i.e. content may change depending on the country, as with Netflix),

  • Query Strings (for example, a search parameter in the GET URL can be part of what is cached: the first person to run a search is served directly from the origin server, the response is then cached, and subsequent identical searches are served from the cache, which helps when a certain search query surges),

  • Securing Objects through HTTPS (depending on whether the request uses HTTP or HTTPS, a different cached response can be served),

  • TTL (different TTLs, i.e. time-to-live values, for different cached elements).
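The "first matching pattern wins" rule from the Path Patterns item can be sketched as follows. This is an approximation, not CloudFront's matcher: it uses Python's `fnmatchcase`, whose `*` and `?` wildcards behave similarly but which also accepts `[...]` character classes. The function `pick_behavior` and the behavior names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def pick_behavior(path: str, behaviors: List[Tuple[str, str]], default: str) -> str:
    """Return the name of the first behavior whose pattern matches the path."""
    for pattern, name in behaviors:
        if fnmatchcase(path, pattern):
            return name
    # No pattern matched: fall back to the default behavior.
    return default


# Ordered exactly as in the example: /images/*.jpg -> /images/* -> *.gif
behaviors = [
    ("/images/*.jpg", "jpg-behavior"),
    ("/images/*", "images-behavior"),
    ("*.gif", "gif-behavior"),
]

print(pick_behavior("/images/sample.gif", behaviors, "default-behavior"))  # images-behavior
print(pick_behavior("/docs/anim.gif", behaviors, "default-behavior"))      # gif-behavior
print(pick_behavior("/index.html", behaviors, "default-behavior"))         # default-behavior
```

The order matters: /images/sample.gif would also match *.gif, but the earlier /images/* pattern is reached first.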

The settings above are complementary to other elements such as origin behavior, cache duration, cookie or query string forwarding, request headers, compression, encryption, etc.
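To make the effect of query string forwarding more concrete, here is a small sketch of how a cache key could be derived when only selected query parameters count towards caching. The function `cache_key`, the example domain and the parameter names are assumptions made for illustration; the real cache key is governed by the cache policy attached to the behavior.

```python
from typing import Iterable
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url: str, cached_params: Iterable[str]) -> str:
    """Build a cache key from host, path and only the listed query parameters."""
    parts = urlsplit(url)
    allowed = set(cached_params)
    # Keep the allowed parameters in their original order; drop everything else.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k in allowed]
    key = parts.netloc + parts.path
    query = urlencode(kept)
    return f"{key}?{query}" if query else key


first = cache_key("https://example.com/search?q=shoes&utm_source=mail", ["q"])
second = cache_key("https://example.com/search?utm_source=ad&q=shoes", ["q"])
other = cache_key("https://example.com/search?q=boots", ["q"])

print(first)            # example.com/search?q=shoes
print(first == second)  # True
print(first == other)   # False
```

Because the tracking parameter is left out of the key, both "shoes" searches map to the same cached object, while a different search term gets its own entry.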

CloudFront's cache retention is designed to keep objects in cache longer and to minimize cache churn, using techniques like tiered caching (explained earlier) and de-duplication. With de-duplication, instead of caching a full copy for every new URL that differs from previously requested URLs, the content from the origin server is divided into small elements, also known as chunks. Each chunk is stored under a unique reference, and a list of these references is enough to reassemble any piece of content. This removes redundant chunks: a new chunk is stored only when content that has not been seen before is requested.
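The chunk-and-reference idea can be sketched in a few lines. This illustrates the general de-duplication technique only, not CloudFront's internal storage format: the class `ChunkStore`, the fixed 4-byte chunk size and the use of SHA-256 digests as references are all assumptions chosen to keep the demo small.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # deliberately tiny so the demo is easy to follow


class ChunkStore:
    """Stores each distinct chunk once, keyed by the hash of its bytes."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        """Split data into chunks and return the list of chunk references."""
        refs = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            ref = hashlib.sha256(chunk).hexdigest()
            # Only previously unseen chunks take up new space.
            self.chunks.setdefault(ref, chunk)
            refs.append(ref)
        return refs

    def get(self, refs: List[str]) -> bytes:
        """Reassemble content from its chunk references."""
        return b"".join(self.chunks[ref] for ref in refs)


store = ChunkStore()
refs_v1 = store.put(b"AAAABBBBCCCC")
refs_v2 = store.put(b"AAAABBBBDDDD")  # shares its first two chunks with v1

print(len(store.chunks))   # 4
print(store.get(refs_v2))  # b'AAAABBBBDDDD'
```

Two objects of three chunks each end up occupying four stored chunks instead of six, because the shared chunks are kept only once.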

CloudFront Features

  • CloudFront supports a pay-as-you-go model. Hence one can use CloudFront to increase application availability when demand is anticipated, for example during a sale.

  • Data transfer from the origin to CloudFront is free if the origin is an AWS endpoint such as S3 or EC2, i.e. as long as the traffic stays within AWS.

  • CloudFront also supports multiple origins, which enables redundancy.

  • CloudFront can also sit in front of microservice-based backends, supporting the modern application tier.

  • CloudFront also supports access control, for example geographical restrictions on who can reach your website.

  • CloudFront supports WebSocket as well as the HTTP and HTTPS protocols.

  • CloudFront integrates easily with AWS Shield and AWS WAF to protect against a large variety of attacks, including DDoS.

  • CloudFront integrates easily with AWS Certificate Manager to automatically handle renewal of TLS certificates, and provides features such as OCSP stapling (OCSP is a real-time check of a certificate's validity that the browser normally performs when a user visits an HTTPS website, by asking the issuing CA whether the certificate has been revoked; OCSP stapling keeps a digitally signed, time-stamped copy of the OCSP response on the web server itself for faster TLS/SSL handshakes), session tickets (session tickets speed up resuming a TLS/SSL session by encrypting the session information and storing it in a ticket that the client can present to resume a secure connection instead of repeating the full handshake), perfect forward secrecy (forward secrecy is achieved by generating new session keys for each session, ensuring that past communications cannot be decrypted even if the long-term secret key is compromised), and field-level encryption (field-level encryption helps protect specific data that end users submit in POST requests. CloudFront encrypts the data at the edge location, using a public key that is provided to it, before forwarding the request to the origin. Only the specific application component that holds the matching private key can decrypt the data, so it remains protected as it passes through other parts of the system).

  • One can restrict access to content through a number of capabilities. For example, with signed URLs and signed cookies, one can support token authentication to restrict access to authenticated viewers only.

  • Through geo-restriction, one can prevent users in specific geographic locations from accessing certain content. With an origin access identity (OAI), one can restrict an S3 bucket so that it is only accessible through CloudFront.

  • In CloudFront, Lambda@Edge can help serve requests faster by running code at edge locations, closer to the user.

  • When a portion of the website (such as a CSS file) is updated, CloudFront supports invalidation so that the new file is re-cached. If files are updated frequently, file versioning is recommended instead, since it takes effect faster. CloudFront allows at most 3,000 file invalidations in progress at a time (in any combination: 3,000 requests for 1 file each, one request for 3,000 files, or 30 requests for 100 files each). The first 1,000 invalidation paths in a month are free; the rest are chargeable. This 1,000 is an overall limit: if an organization has more than one distribution, the free 1,000 applies to the sum across all distributions, not to each one individually.

  • CloudFront also supports error handling for a better user experience.

A few important pointers before moving ahead:

The flow of a request is as follows:

When the response is served from the cache, it is called a cache hit; when the CDN has to go to the origin to fetch the response, it is called a cache miss.
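The hit and miss outcomes, together with the TTL setting mentioned earlier, can be modelled with a very small cache. This is a simplified sketch: the class `TtlCache` is hypothetical, the clock is passed in explicitly to keep the example deterministic, and an expired object is simply fetched again from the origin.

```python
from typing import Callable, Dict, List, Tuple


class TtlCache:
    """Caches origin responses and reports whether each request hit or missed."""

    def __init__(self, fetch_origin: Callable[[str], bytes], ttl_seconds: float):
        self.fetch_origin = fetch_origin
        self.ttl_seconds = ttl_seconds
        self.entries: Dict[str, Tuple[float, bytes]] = {}  # path -> (expiry, body)

    def get(self, path: str, now: float) -> Tuple[str, bytes]:
        entry = self.entries.get(path)
        if entry is not None and now < entry[0]:
            return "hit", entry[1]
        # Not cached, or the TTL has run out: go back to the origin.
        body = self.fetch_origin(path)
        self.entries[path] = (now + self.ttl_seconds, body)
        return "miss", body


origin_calls: List[str] = []


def fetch_origin(path: str) -> bytes:
    origin_calls.append(path)
    return b"body of " + path.encode()


cache = TtlCache(fetch_origin, ttl_seconds=60)
print(cache.get("/style.css", now=0.0)[0])   # miss
print(cache.get("/style.css", now=30.0)[0])  # hit
print(cache.get("/style.css", now=61.0)[0])  # miss
print(len(origin_calls))                     # 2
```

The first request and the request after the 60-second TTL reach the origin; the request in between is answered from the cache.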

Setting up AWS CloudFront

Prerequisite: an application hosted on EC2 or Elastic Beanstalk, with or without an Application Load Balancer, is required.

There are mainly three configuration areas: Origin Settings, Cache Behavior Settings and Distribution Settings.

  • Let's begin with the first one: Origin Settings:

  • Cache Behavior Settings

For static content, a managed cache policy such as "CachingOptimized" can easily be used.

Please refer to this for more details about field-level encryption configuration.

  • Other Distribution Settings

Additional Configuration Options in CloudFront

The above snapshot makes one thing very clear: a single distribution can have multiple behaviors based on different URLs, headers, cookies, etc. Hence, a CloudFront distribution can be described as a collection of origins together with all the caching rules applied while handling their traffic.

Error Handling

Invalidations

An invalidation, as shown above, is basically a way for a developer or AWS admin to tell CloudFront that it needs to update its cache at the edge locations by fetching the content from the origin. The refresh happens when a viewer next requests the file: because of the invalidation, CloudFront treats the request as a cache miss and gets the file from the origin.
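Besides the console, an invalidation can also be requested programmatically. The sketch below assumes the boto3 SDK and its CloudFront `create_invalidation` call; the helper names `build_invalidation_batch` and `invalidate`, the caller reference "deploy-42" and the leading-slash check are choices made for this example. The boto3 import is kept inside `invalidate` so that the payload helper runs without AWS credentials or the SDK installed.

```python
import time
from typing import Any, Dict, List, Optional


def build_invalidation_batch(paths: List[str],
                             caller_reference: Optional[str] = None) -> Dict[str, Any]:
    """Build the InvalidationBatch structure for a list of paths."""
    if not paths:
        raise ValueError("at least one path is required")
    for path in paths:
        # This sketch insists on distribution-relative paths such as /css/site.css.
        if not path.startswith("/"):
            raise ValueError(f"path must start with '/': {path}")
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # A unique value per request; a timestamp is used when none is given.
        "CallerReference": caller_reference or str(time.time_ns()),
    }


def invalidate(distribution_id: str, paths: List[str]) -> Dict[str, Any]:
    """Submit the invalidation (requires boto3 and valid AWS credentials)."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )


batch = build_invalidation_batch(["/css/site.css", "/images/*"], "deploy-42")
print(batch["Paths"]["Quantity"])  # 2
```

Calling `invalidate` with a real distribution ID would submit the request; it is not invoked here because it needs a live AWS account.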

Geo Restrictions

WAF

Origin Access

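There are two ways to restrict access to an S3 origin: Origin Access Control (OAC) and Origin Access Identity (OAI). With OAI, a special CloudFront user (identified by its canonical user ID) is given permission to read the bucket. With OAC, the newer mechanism that AWS recommends over OAI, CloudFront signs its requests to the origin and the bucket policy allows only the CloudFront service principal for that distribution. Restricting which viewers can fetch content is a separate feature, handled with signed URLs and signed cookies.

  • Signed URLs require the use of an AWS library (SDK) during development of the application code that issues the URLs.

Create a new 2048-bit RSA key pair, for example online using: https://travistidwell.com/jsencrypt/demo/ (for production, generate the key pair locally so that the private key never leaves your machine).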

Save the above and users will be served with signed URLs of the format:
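As a hedged sketch of that format: a signed URL with a canned policy appends `Expires`, `Signature` and `Key-Pair-Id` query parameters to the resource URL. The code below assembles such a URL but leaves the actual RSA-SHA1 signing to a caller-supplied function. The domain, the key pair ID `K2EXAMPLE` and the `placeholder_sign` function are placeholders, so the resulting URL shows the structure only and would not be accepted by CloudFront.

```python
import base64
import json
from typing import Callable


def cloudfront_b64(data: bytes) -> str:
    """Base64-encode, then swap characters that are unsafe in URLs (+ = /)."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def canned_policy(url: str, expires_epoch: int) -> str:
    """Canned policy: one resource with an expiry time, without whitespace."""
    policy = {
        "Statement": [{
            "Resource": url,
            "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
        }]
    }
    return json.dumps(policy, separators=(",", ":"))


def signed_url(url: str, expires_epoch: int, key_pair_id: str,
               rsa_sha1_sign: Callable[[bytes], bytes]) -> str:
    """Append Expires, Signature and Key-Pair-Id to the URL."""
    signature = rsa_sha1_sign(canned_policy(url, expires_epoch).encode("utf-8"))
    separator = "&" if "?" in url else "?"
    return (f"{url}{separator}Expires={expires_epoch}"
            f"&Signature={cloudfront_b64(signature)}"
            f"&Key-Pair-Id={key_pair_id}")


def placeholder_sign(message: bytes) -> bytes:
    """NOT a real signature: stands in for RSA-SHA1 signing with the private key."""
    return b"not-a-real-signature"


example = signed_url("https://d111111abcdef8.cloudfront.net/image.jpg",
                     1700000000, "K2EXAMPLE", placeholder_sign)
print(example)
```

In practice the signing function would use the private key created earlier, and an SDK helper such as botocore's `CloudFrontSigner` can take care of the policy and encoding steps.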

  • OAI can be best explained with the following diagram:

Select the appropriate settings and fill in the necessary details to obtain the output.

Field Level Encryption

The use case was explained earlier, i.e. protecting POST body parameters. It is set up as follows:

After that, field-level encryption can be selected on the distribution's Behavior page.

Lambda@Edge

Lambda@Edge allows developers to deploy Node.js or Python code at the same edge locations where the content is cached. It's a fully managed solution, so there are no servers to manage, which helps with security and compliance and saves time and resources.

Points to Remember:

  • Only Node.js and Python are supported languages for Lambda@Edge. One can't deploy Lambda@Edge by itself; a function has to be associated with a CloudFront distribution in order to be deployed to the edge locations.

  • Use cases of Lambda@Edge: CloudFront can customize content delivery, for example by sending search engine crawlers to a static version of the content while directing real users to the dynamic version. Another use case is content security, by adding or altering headers.

  • Working of Lambda@Edge:
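To make the header use case above concrete, here is a minimal sketch of a Python handler intended for a response trigger (for example origin response). It relies on the CloudFront event layout in which the response sits under `Records[0].cf.response` and each header is stored under its lowercase name as a list of key/value pairs. The specific headers and values are illustrative choices, and the sample event at the bottom is a hand-written stand-in for what CloudFront would pass in.

```python
from typing import Any, Dict


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Add security headers to the response before it is returned to the viewer."""
    response = event["Records"][0]["cf"]["response"]
    headers = response["headers"]

    # Header map keys are lowercase; "key" carries the canonical header name.
    headers["strict-transport-security"] = [{
        "key": "Strict-Transport-Security",
        "value": "max-age=63072000; includeSubDomains",
    }]
    headers["x-content-type-options"] = [{
        "key": "X-Content-Type-Options",
        "value": "nosniff",
    }]
    return response


# Hand-written sample event for a local check; real events contain more fields.
sample_event = {
    "Records": [{
        "cf": {
            "response": {
                "status": "200",
                "headers": {
                    "content-type": [{"key": "Content-Type", "value": "text/html"}],
                },
            }
        }
    }]
}

result = lambda_handler(sample_event, None)
print(sorted(result["headers"]))
```

The handler returns the modified response object, and existing headers such as Content-Type are left untouched.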
