The web consists of three separate concepts:
- URL – Uniform Resource Locator
  - a unique identifier for a resource on the web
- HTTP – HyperText Transfer Protocol
  - a protocol to retrieve a representation of a resource through a URL
- HTML – HyperText Markup Language
  - an HTML document can represent a resource and link to other resources through their URLs
A URL identifies and locates a resource anywhere on the web.
An identifier is unique if at most one entity corresponds to it. For example, a sales tax identification number uniquely identifies a person. It’s a unique identifier, but it’s not a locator since it won’t tell you where the person can be found.
A locator is unique if at most one location corresponds to it. For example, a postal address uniquely identifies a location. It points to one specific location, but it doesn’t allow you to identify a person.
So a well-defined URL combines identification and location. Let’s look at the following URL:
The URL identifies the Wikipedia article about Tim Berners-Lee, the inventor of HTML and the World Wide Web. It also allows us to locate the article: we can put the URL in our browser’s location bar to retrieve it.
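To see which parts of a URL do what, here is a minimal Python sketch that splits a URL with the standard library's `urllib.parse.urlsplit`. The host `example.org` is a placeholder assumption, not the real article address; only the path matches the request shown further down.

```python
from urllib.parse import urlsplit

# Hypothetical URL: the host is a placeholder, not the real article address.
url = "https://example.org/wiki/Tim_Berners-Lee"

parts = urlsplit(url)

# Scheme and host say how and where to reach the server.
print(parts.scheme)  # https
print(parts.netloc)  # example.org
# The path names the resource on that server.
print(parts.path)    # /wiki/Tim_Berners-Lee
```

Loosely speaking, the scheme and host carry the "where", while the path tells the server which resource is meant.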
HTTP is a protocol to transfer representations from a server to a client. The protocol standardizes how clients send a request for a representation of a resource through its URL.
HTTP standardizes how servers reply with a response that can contain a representation.
The client request
After resolving the server’s IP address, the client can send an HTTP request. A request consists of three parts:
- request line – indicates a method, request URI and HTTP version
- header fields – key-value pairs such as Host, Accept and User-Agent
- body – an optional body
This is what an HTTP request for the Tim Berners-Lee article looks like:
GET /wiki/Tim_Berners-Lee HTTP/1.1
The request line starts with the HTTP method, which is case-sensitive. The most widely used methods are:
- GET – transfer a representation
- HEAD – transfer only status and headers (no body)
- POST – perform a resource-specific operation
- PUT – replace all current representations of the target resource
- DELETE – remove all current representations of the target resource
- OPTIONS – describe the communication options for the target resource
Read-only methods like GET, HEAD, and OPTIONS must not cause any state change or side effect on the server. This is an important contract, and servers should never break it.
PUT and DELETE can change the state of a resource. Both methods are idempotent: performing the same request multiple times leaves the server in the same state as performing it once.
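The difference between idempotent and non-idempotent methods can be illustrated with a toy in-memory store. This is only a sketch: `put`, `delete`, `post` and the `store` dictionary are hypothetical names that model the semantics, not a real server.

```python
# Toy in-memory "server state"; all names here are illustrative assumptions.
store = {}
next_id = 0

def put(path, representation):
    """Replace whatever is stored at path (idempotent)."""
    store[path] = representation

def delete(path):
    """Remove the resource at path if present (idempotent)."""
    store.pop(path, None)

def post(collection, representation):
    """Add a new resource under collection (not idempotent)."""
    global next_id
    next_id += 1
    store[f"{collection}/{next_id}"] = representation

put("/notes/1", "hello")
put("/notes/1", "hello")   # repeating PUT leaves the same state
state_after_put = dict(store)

delete("/notes/1")
delete("/notes/1")         # repeating DELETE leaves the same state
state_after_delete = dict(store)

post("/notes", "hello")
post("/notes", "hello")    # repeating POST creates a second resource
state_after_post = dict(store)

print(state_after_put)     # {'/notes/1': 'hello'}
print(state_after_delete)  # {}
print(state_after_post)    # {'/notes/1': 'hello', '/notes/2': 'hello'}
```

This is why a client can safely retry a PUT or DELETE after a timeout, whereas a retried POST may create a duplicate.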
After the method comes the request URI, which is the URL path, followed by the protocol version. The three components are separated by a single space character.
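As a rough illustration of this structure, the following Python sketch splits a request line into its three components. The function name `parse_request_line` and the `KNOWN_METHODS` set (limited to the methods listed in this article) are assumptions made for the example.

```python
# Only the methods listed in this article; a real server may accept more.
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS"}

def parse_request_line(line):
    """Split a request line into (method, request_uri, version)."""
    method, request_uri, version = line.split(" ")
    # Methods are case-sensitive, so "get" does not match "GET".
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown method: {method!r}")
    return method, request_uri, version

print(parse_request_line("GET /wiki/Tim_Berners-Lee HTTP/1.1"))
# ('GET', '/wiki/Tim_Berners-Lee', 'HTTP/1.1')
```

Calling `parse_request_line("get /wiki/Tim_Berners-Lee HTTP/1.1")` raises a `ValueError`, because the lowercase method is not recognized.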
Following the request line, the header fields are specified. The request header fields allow the client to pass additional information about the request and about the client itself to the server.
A client must include a Host header in all HTTP/1.1 request messages. Although the client resolves the server’s hostname to an IP address, the hostname is still sent to the server. This has the benefit that one server can host multiple websites. The Host header tells the server which one to pick.
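The following Python sketch shows how a server might use the Host header to choose between several websites. The hostnames, the `SITES` table, and the helpers `build_request` and `pick_site` are all hypothetical and exist only for illustration.

```python
# Hypothetical sites served from one IP address.
SITES = {
    "blog.example.org": "blog site",
    "shop.example.org": "shop site",
}

def build_request(host, path):
    """Build a minimal HTTP/1.1 GET request; lines end with CRLF."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    )

def pick_site(raw_request):
    """Return the site named by the Host header, or None if unknown."""
    header_lines = raw_request.split("\r\n")[1:]
    for line in header_lines:
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        # Header field names are case-insensitive.
        if name.strip().lower() == "host":
            return SITES.get(value.strip().lower())
    return None

print(pick_site(build_request("blog.example.org", "/")))  # blog site
print(pick_site(build_request("shop.example.org", "/")))  # shop site
```

Both requests would arrive at the same IP address; only the Host header distinguishes them.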
The server response
When a server receives a request, it generates a response. An HTTP response is structured as follows:
- status line – indicates HTTP version, status code and reason phrase
- header fields – additional information about the response, including Content-Type, Content-Length, etc.
- body – an optional body containing the actual content
Let’s see what an HTTP response to our previous request could look like:
HTTP/1.1 200 OK
Date: Sat, 18 Apr 2020 23:47:12 GMT
Content-Type: text/html; charset=UTF-8
Last-Modified: Sun, 24 Jan 2016 17:12:34 GMT
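To illustrate the three parts of a response, here is a small Python sketch that takes such a message apart. The function `parse_response` is a hypothetical helper, and the body in the sample is a placeholder, not the real article content.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    # A blank line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # The reason phrase may contain spaces, so split at most twice.
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body

raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Sat, 18 Apr 2020 23:47:12 GMT\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Last-Modified: Sun, 24 Jan 2016 17:12:34 GMT\r\n"
    "\r\n"
    "<!DOCTYPE html>..."  # placeholder body, not the real article
)

version, status, reason, headers, body = parse_response(raw)
print(version, status, reason)   # HTTP/1.1 200 OK
print(headers["content-type"])   # text/html; charset=UTF-8
```

In practice you would rely on an HTTP library rather than parsing by hand; the sketch only makes the message layout visible.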
An HTTP response has a status code that indicates how the request was handled. The status code is a three-digit integer, and the codes are fully defined by the HTTP specification. Status codes are classified into five categories. The first digit defines the class of the response:
- 1xx – Informational – request received, continuing process
- 2xx – Success – request understood and accepted
- 3xx – Redirection – further action required to complete the request
- 4xx – Client Error – request contains bad syntax or cannot be fulfilled
- 5xx – Server Error – server failed to fulfill the request
In our example, the server responds with a status code 200, which means that the request was handled successfully.
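The classification by first digit can be sketched in a few lines of Python. The names `STATUS_CLASSES` and `status_class` are assumptions made for this example.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def status_class(code):
    """Return the class name for a three-digit status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid status code: {code}")
    # Integer division by 100 yields the first digit.
    return STATUS_CLASSES[code // 100]

print(status_class(200))  # Success
print(status_class(404))  # Client Error
```

A client that does not know a specific code can still fall back on its class to decide how to react.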
The most powerful feature of HTML is hypertext. Hyperlinks connect web pages to one another and can point to resources on the same website or on external ones. This is what makes the web so powerful.
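As a small illustration of how links tie documents together, the following Python sketch collects the link targets from a fragment of HTML using the standard library's `html.parser`. The class name `LinkCollector` and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

# Hypothetical page with one internal and one external link.
page = (
    '<p>See the <a href="/about">about page</a> or '
    '<a href="https://example.org/">an external site</a>.</p>'
)

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/about', 'https://example.org/']
```

A crawler or browser follows exactly these URLs to move from one resource to the next.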
HTML divides a document into elements that are indicated by opening and closing tags, which consist of the element name surrounded by “<” and “>”. An element can have additional attributes, which are located inside the opening tag. Here is an example of an image tag:
<img src="image.jpg" alt="an image" />
The image element in this example is written as a self-closing tag since it has no child nodes. An example of an element with child nodes is the paragraph element:
<p class="summary"> This is a paragraph<br/> with <em>emphasized</em> words. </p>
The HTML specification defines a set of elements that have certain semantics and can be used in HTML. The specification also contains rules about the ways in which the elements can be nested. An HTML document consists of a tree of elements and text. Here is an example of a basic HTML document:
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Document</title>
  </head>
  <body>
    <h1>Headline</h1>
  </body>
</html>
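To make the tree structure visible, this Python sketch walks the basic document with the standard library's `html.parser` and records each element and text node at its nesting depth. The class name `TreePrinter` is hypothetical, and the sketch assumes every element has an explicit closing tag, which holds for this document but not for void elements such as `<br>`.

```python
from html.parser import HTMLParser

class TreePrinter(HTMLParser):
    """Record elements and text, indented by nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        # Assumes each start tag has a matching end tag.
        self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace between tags
            self.lines.append("  " * self.depth + repr(text))

document = (
    '<!DOCTYPE html> <html lang="en"> <head> <title>Document</title> </head> '
    "<body> <h1>Headline</h1> </body> </html>"
)

printer = TreePrinter()
printer.feed(document)
printer.close()
print("\n".join(printer.lines))
# html
#   head
#     title
#       'Document'
#   body
#     h1
#       'Headline'
```

The indentation in the output mirrors the nesting of the tags: `title` is a child of `head`, which in turn is a child of `html`.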