Web Fundamentals: HTTP Caching

Let’s walk through the mechanics of HTTP caching. HTTP caching reduces latency by delivering content from caches that are closer to the client, and it reduces bandwidth usage since no network traffic is required to serve a locally cached resource.

There are two types of caches: private caches and public caches.

A public cache is a shared cache which usually sits between the server and the user agent (browser). These public caches, or HTTP proxies, can usually be found at large corporations and ISPs. Public caches are not used for resources which require HTTP authentication. Furthermore, HTTPS-encrypted traffic cannot be cached by a proxy that merely forwards it, since the proxy cannot read the encrypted content.

A private cache is located at the client and cannot be used by other clients. It’s usually the browser’s cache. Authenticated and encrypted responses are also subject to private caching unless stated otherwise. If you don’t want sensitive information (e.g. credit card details) stored on the user’s client, caching should be disabled (see Cache-Control: no-store).

Controlling caching with HTTP headers

With the HTTP header Cache-Control you can specify the caching policies for requests and responses. A caching policy for a response could tell the user agent that the response should not be cached, or that caching is allowed by private caches only.

no-cache vs. no-store

The no-store directive tells the user agent and public caches not to store the response, so no copy of the response is kept locally.

The no-cache directive forces caches to send the request to the origin server for validation before releasing a cached copy.

Let’s look at common Cache-Control header directives to control response caching:

  • no-store – disable caching, no local copies are stored
  • no-cache – allows caching, but a request must be sent to the server for validation.
  • public – marks the response as cacheable by any cache, including authenticated responses. By default, authenticated responses are treated as private.
  • private – allows caching for single users, usually in the user agent’s cache.
  • must-revalidate – instructs the cache to follow the defined freshness rules without exception. In some circumstances, caches are allowed to serve stale content; this directive prevents that.
  • max-age=<seconds> – defines how long, in seconds, the response is considered fresh, counted from the time the response was generated.
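To make the directives above more concrete, here is a minimal Python sketch of how a cache might read a Cache-Control value and decide whether it may keep a copy. The function names parse_cache_control and may_store are made up for this illustration, and the parsing is deliberately simplified (it splits on commas and does not handle quoted arguments that themselves contain commas).

```python
from typing import Dict, Optional


def parse_cache_control(header_value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into a mapping of directive -> argument."""
    directives: Dict[str, Optional[str]] = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directives without an argument (e.g. no-store) map to None.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def may_store(directives: Dict[str, Optional[str]], shared_cache: bool) -> bool:
    """Simplified storage decision based only on no-store and private."""
    if "no-store" in directives:
        return False  # nobody may keep a copy
    if shared_cache and "private" in directives:
        return False  # only the user's own cache may keep a copy
    return True


policy = parse_cache_control("private, max-age=600, must-revalidate")
print(policy)
print("shared cache may store:", may_store(policy, shared_cache=True))
print("private cache may store:", may_store(policy, shared_cache=False))
```

A real cache takes more into account (request method, status code, the Authorization header), so treat this only as an illustration of how the directives combine.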

How not to control caching

  • HTML meta tags
  • Pragma HTTP Headers

Cache validation

ETags

An ETag is a fingerprint of the resource’s content. If the cached version of a response has expired, the user agent sends the cached fingerprint along with the request. The server compares the fingerprint and can skip the response body by returning a 304 “Not Modified” response instead of the actual (unchanged and thus already cached) content.

Request with If-None-Match header

To make a request conditional, the client sends the If-None-Match HTTP header with the cached ETag value. The server responds with a 200 “OK” if and only if the ETag sent with the request does not match the ETag of the current version of the resource.

If the If-None-Match condition fails, which means that the resource hasn’t changed, the HTTP server responds with a 304 “Not Modified” status.
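The conditional request described above can be sketched on the server side roughly as follows. This is only an illustration: make_etag and respond are hypothetical helpers, the ETag is assumed to be a truncated SHA-256 hash of the body, and weak validators (the W/ prefix) are ignored.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a fingerprint from the content; ETag values are quoted strings."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET on a resource with this body."""
    etag = make_etag(body)
    if if_none_match is not None:
        # The header may carry several ETags separated by commas, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # The client's copy is still current: send no body.
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


page = b"<h1>Hello</h1>"
status, headers, payload = respond(page, None)               # first visit
print(status, headers["ETag"], len(payload))
status, headers, payload = respond(page, headers["ETag"])    # revalidation
print(status, len(payload))
```

The first call stands for an initial request without a cached copy, the second for a revalidation that sends back the ETag received earlier.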

An interesting fact about ETags is that they can be abused for user tracking. You’ll find more details in the ETag Wikipedia article.

Invalidation and update of cached resources

Your users deserve fast loading times and thus you’re extensively using caching with long expiration times. That’s great, but how do you make sure that your users get the latest and greatest updates of your web application?

To profit from caching and also make sure that new resources get loaded, you can change the filename whenever the file’s content changes. Usually, a hash of the file’s content is computed and appended to the file name. Ideally, this happens at build time.

This approach can only work if the HTML document is re-validated on each request. Otherwise, the new URLs are not visible to the client.
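As a rough sketch of the renaming step described above, a build script could derive the file name from a hash of the content. The helper hashed_name, the choice of SHA-256, and the eight-character digest length are assumptions made for this example, not a prescribed scheme.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    # "static/app.js" becomes "static/app.<digest>.js"
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old = hashed_name("static/app.js", b"console.log('v1');")
new = hashed_name("static/app.js", b"console.log('v2');")
print(old)
print(new)
```

Because the name changes only when the content changes, such files can be served with a long max-age, while the HTML document that references them is revalidated on each request.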

HTTP/2 and caching

The major advantage of HTTP/2 is multiplexing: multiple resources can be transferred concurrently over a single TCP connection instead of opening several TCP connections in parallel.

Caching works as in HTTP/1.1 and is mainly controlled by the Cache-Control headers and by ETags with conditional requests. When it comes to web performance optimization, HTTP/2 introduced two new features which are not present in HTTP/1.1. Stream prioritization lets the user agent specify in what order it wants to receive resources. Server push sends extra resources to the user agent before it knows that they are needed.

Web Fundamentals: Overview

The web consists of three separate concepts:

  • URL – Uniform Resource Locator
    • a unique identifier for a resource on the web
  • HTTP – HyperText Transfer Protocol
    • a protocol to retrieve a representation of a resource through a URL
  • HTML – HyperText Markup Language
    • an HTML document can represent a resource and link to other resources through their URL

URL

A URL identifies and locates a resource anywhere on the web.

An identifier is unique if at most one entity corresponds to it. For example, a tax identification number uniquely identifies a person. It’s a unique identifier, but it’s not a locator since it won’t tell you where the person can be found.

A locator is unique if at most one location corresponds to it. For example, a postal address uniquely identifies a location. It points to one specific location, but it doesn’t allow you to identify a person.

So a well-defined URL combines identification and location. Let’s look at the following URL:

https://de.wikipedia.org/wiki/Tim_Berners-Lee

The URL identifies the Wikipedia article about Tim Berners-Lee, the inventor of HTML and the World Wide Web. It also allows us to locate the Wikipedia article. We can put the URL in our browser’s location bar to retrieve the article.

HTTP

HTTP is a protocol to transfer representations from a server to a client. The protocol standardizes how clients send a request for a representation of a resource through its URL.

HTTP standardizes how servers reply with a response that can contain a representation.

The client request

After resolving the server’s IP address, the client can send an HTTP request. A request consists of three parts:

  • request line – indicates a method, request URI and HTTP version
  • header fields – key-value-pairs including Host, Accept and User-Agent
  • body – an optional body

This is what an HTTP request for the Tim Berners-Lee article looks like:

GET /wiki/Tim_Berners-Lee HTTP/1.1
Host: de.wikipedia.org
User-Agent: curl
Accept: text/html

The request line starts with the HTTP method. It is case-sensitive. The most widely used methods are:

  • GET – transfer a representation
  • HEAD – transfer only status and headers (no body)
  • POST – perform a resource-specific operation
  • PUT – replace all representations
  • DELETE – remove all representations
  • OPTIONS – describe the communication options for the target resource

Read-only methods like GET, HEAD, and OPTIONS do not cause any state change or side effect on the server side. This is an important contract, and servers should never break it.

PUT and DELETE can change the state of a resource. Both methods are idempotent: repeating a request does not alter the outcome. The same request can be performed multiple times and the result remains the same.
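Idempotency is easy to see with a toy model. The sketch below uses a plain dictionary as a stand-in for server-side state; the names resources, put, and delete are hypothetical and not part of any real server API.

```python
from typing import Dict

# Hypothetical in-memory stand-in for the server's state, keyed by URL path.
resources: Dict[str, str] = {}


def put(path: str, representation: str) -> None:
    """Replace whatever is stored at the path with the given representation."""
    resources[path] = representation


def delete(path: str) -> None:
    """Remove the resource; removing something already gone changes nothing."""
    resources.pop(path, None)


put("/articles/1", "first draft")
state_after_one_put = dict(resources)
put("/articles/1", "first draft")          # same request repeated
print(resources == state_after_one_put)    # state is unchanged by the repeat

delete("/articles/1")
delete("/articles/1")                      # repeating DELETE is harmless too
print(resources)
```

A POST that appends a new entry on every call would not have this property, which is why clients can safely retry PUT and DELETE but not necessarily POST.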

After the method comes the request URI, which is the URL path, followed by the protocol version. The three components are separated by single space characters.

Following the request line, the header fields are specified. The request header fields allow the client to pass additional information about the request and about the client itself to the server.

A client must include a Host header in all HTTP/1.1 request messages. Although the client resolves the server’s hostname to an IP address, the hostname is still sent to the server. This has the benefit that one server can host multiple websites. The Host header tells the server which one to pick.
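To tie the pieces together, the following Python sketch derives the request line and the Host header from a URL. The function build_request and the User-Agent value example-client are made up for this illustration, and the URL is assumed not to contain user credentials.

```python
from urllib.parse import urlsplit


def build_request(url: str, method: str = "GET") -> str:
    """Assemble a minimal HTTP/1.1 request message (without a body)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"{method} {target} HTTP/1.1",   # request line: method, URI, version
        f"Host: {parts.netloc}",         # mandatory in HTTP/1.1
        "User-Agent: example-client",
        "Accept: text/html",
        "",                              # empty line ends the header section
        "",
    ]
    return "\r\n".join(lines)


print(build_request("https://de.wikipedia.org/wiki/Tim_Berners-Lee"))
```

Note that the host name moves from the URL into the Host header, while only the path (and query, if any) ends up in the request line.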

The server response

When a server receives a request, it generates a response. An HTTP response is structured as follows:

  • status line – indicates HTTP version, status code and reason phrase
  • header fields – additional information about the response, including Content-Type, Content-Length, etc.
  • body – an optional body containing the actual content

Let’s see what an HTTP response for our previous request could look like:

HTTP/1.1 200 OK
Date: Sat, 18 Apr 2020 23:47:12 GMT
Content-Type: text/html; charset=UTF-8
Last-Modified: Sun, 24 Jan 2016 17:12:34 GMT

<!DOCTYPE html>
<html lang="en">
...

An HTTP response has a status code which indicates how the request was handled. The status code is a 3-digit integer with a well-defined meaning. Status codes are classified into five categories. The first digit defines the class of the response:

  • 1xx – Informational – request received, continuing process
  • 2xx – Success – request understood and accepted
  • 3xx – Redirection – further action required to complete the request
  • 4xx – Client Error – request contains bad syntax or cannot be fulfilled
  • 5xx – Server Error – server failed to fulfill the request

In our example, the server responds with a status code 200, which means that the request was handled successfully.
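Since the class is determined by the first digit alone, a client can classify any status code with an integer division. The following is a small sketch; the names STATUS_CLASSES and status_class are chosen for this example.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Map a 3-digit status code to its class using the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 304, 404, 503):
    print(code, status_class(code))
```

This also means a client that does not know a specific code can still fall back to the general meaning of its class.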

HTML

The HyperText Markup Language is a markup language (not a programming language) that captures the structure of a document. HTML is accompanied by other technologies such as CSS, which describes the appearance, and JavaScript, which describes the behaviour.

The most powerful feature of HTML is hypertext: links that connect web pages to one another. Links can point to resources on the same website or to external ones. This feature is what makes the Web so powerful.

HTML divides a document into elements that are indicated by opening and closing tags, which consist of the element name surrounded by “<” and “>”. An element can have additional attributes which are located inside the opening tag. Here is an example of an image tag:

<img src="image.jpg" alt="an image" />

The image tag in this example uses a self-closing tag since it has no child nodes. An example of an element with child nodes is the paragraph tag:

<p class="summary">
  This is a paragraph<br/>
  with <em>emphasized</em> words.
</p>

The HTML specification defines a set of elements that have certain semantics and can be used in HTML. The specification also contains rules about the ways in which the elements can be nested. An HTML document consists of a tree of elements and text. Here is an example of a basic HTML document:

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Document</title>
  </head>
  <body>
    <h1>Headline</h1>
  </body>
</html>
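To make the tree structure visible, the document above can be fed to the HTMLParser class from Python’s standard library. The subclass OutlineBuilder below is a hypothetical helper written for this illustration; it records every element and text node, indented by its nesting depth, and treats a handful of elements without closing tags (such as img and br) specially.

```python
from html.parser import HTMLParser


class OutlineBuilder(HTMLParser):
    """Collect elements and text nodes, indented by their depth in the tree."""

    # A few elements that never have children or a closing tag.
    VOID = {"img", "br", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace between tags
            self.lines.append("  " * self.depth + repr(text))


document = """<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Document</title>
  </head>
  <body>
    <h1>Headline</h1>
  </body>
</html>"""

builder = OutlineBuilder()
builder.feed(document)
builder.close()
print("\n".join(builder.lines))
```

The printed outline shows html at the root, head and body as its children, and the text nodes as leaves below title and h1.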
