42-webserv

RFC2616 - summary

The Hypertext Transfer Protocol (HTTP) is an application-level
protocol for distributed, collaborative, hypermedia information
systems. HTTP has been in use by the World-Wide Web global
information initiative since 1990. The first version of HTTP,
referred to as HTTP/0.9, was a simple protocol for raw data transfer
across the Internet. HTTP/1.0, as defined by RFC 1945 [6], improved
the protocol by allowing messages to be in the format of MIME-like
messages. - June 1999

HTTP Basics

HTTP Roles:

Client: Initiates requests (e.g., browsers, robots).
Server: Receives and processes requests, sending responses.
Proxy: Intermediary that forwards requests and responses.
Gateway: Server that acts as an intermediary for another server, often translating requests from one protocol to another.
Tunnel: Relays connections without modifying messages.
Cache: Stores responses for faster retrieval.

HTTP Communication:

Typically uses TCP/IP on port 80.
Involves a request-response cycle:
Client sends a request. Server processes the request and sends a response.

HTTP Messages:

Consist of a start line, headers, and an optional body.
Start line: Request-Line (Method Request-URI HTTP-Version) in a request, Status-Line (HTTP-Version Status-Code Reason-Phrase) in a response.
Headers: Provide metadata about the request or response.
Body: Contains the actual data being transmitted.
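
As a minimal sketch of how a server might split a request's start line into its three parts, the function below uses a whitespace-splitting stream. The RequestLine struct and the parseRequestLine name are hypothetical, and the parsing is more lenient than the RFC grammar (any run of whitespace is accepted between tokens).

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical container for the three parts of a Request-Line.
struct RequestLine {
    std::string method;
    std::string uri;
    std::string version;
};

// Splits "Method Request-URI HTTP-Version" into its parts.
// Returns false unless the line has exactly three tokens
// and the last one starts with "HTTP/".
bool parseRequestLine(const std::string& line, RequestLine& out) {
    std::istringstream iss(line);
    std::string extra;
    if (!(iss >> out.method >> out.uri >> out.version))
        return false;  // fewer than three tokens
    if (iss >> extra)
        return false;  // more than three tokens
    return out.version.compare(0, 5, "HTTP/") == 0;
}
```

A real parser would also validate the method token and the version digits; this sketch only checks the overall shape.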

HTTP Methods:

These are the ones we will use the most in our webserv project

GET: Retrieves a resource.
POST: Submits data to a server.
DELETE: Deletes a resource.
PUT: Replaces a resource with a new one.
HEAD: Retrieves metadata about a resource.
OPTIONS: Requests information about the communication options available on a resource.
TRACE: Echoes the received request back to the client.
CONNECT: Establishes a tunnel to a server.

HTTP URLs:

Example: http://host:port/path?query

host: Domain name or IP address.
port: Optional port number (default: 80).
path: Path to the resource.
query: Optional query string parameters.
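
The URL layout above can be taken apart with a few string searches. This is a rough sketch, not a full URI parser: the HttpUrl struct and parseHttpUrl are hypothetical names, only the http scheme is handled, and IPv6 literals and percent-decoding are ignored.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical holder for the parts of http://host:port/path?query.
struct HttpUrl {
    std::string host;
    int port;
    std::string path;
    std::string query;
};

// Returns false for non-http schemes or an empty host.
bool parseHttpUrl(const std::string& url, HttpUrl& out) {
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0)
        return false;
    const std::string rest = url.substr(scheme.size());

    // The host[:port] part ends at the first '/' or '?'.
    const std::string::size_type authEnd = rest.find_first_of("/?");
    const std::string authority = rest.substr(0, authEnd);
    const std::string tail =
        (authEnd == std::string::npos) ? std::string() : rest.substr(authEnd);

    const std::string::size_type colon = authority.find(':');
    out.host = authority.substr(0, colon);
    out.port = 80;  // default port when none is given
    if (colon != std::string::npos && colon + 1 < authority.size())
        out.port = std::atoi(authority.c_str() + colon + 1);
    if (out.host.empty())
        return false;

    const std::string::size_type q = tail.find('?');
    out.path = tail.substr(0, q);
    if (out.path.empty())
        out.path = "/";  // an empty path is treated as "/"
    out.query = (q == std::string::npos) ? std::string() : tail.substr(q + 1);
    return true;
}
```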

HTTP Date/Time:

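All HTTP timestamps must be expressed in GMT (UTC). The preferred format is the fixed-length RFC 1123 form, e.g. Sun, 06 Nov 1994 08:49:37 GMT.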
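
A server usually needs to produce such a timestamp for the Date header. The sketch below formats a Unix timestamp in the RFC 1123 style with the standard C time functions; httpDate is a hypothetical helper name, and it assumes the default "C" locale so that day and month names come out in English.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Formats a Unix timestamp as an RFC 1123 date, always in GMT.
std::string httpDate(std::time_t t) {
    char buf[64];
    std::tm* gmt = std::gmtime(&t);  // uses a shared static buffer: not thread-safe
    if (gmt == 0)
        return std::string();
    // %a and %b depend on the locale; the default "C" locale gives English names.
    std::size_t n = std::strftime(buf, sizeof(buf),
                                  "%a, %d %b %Y %H:%M:%S GMT", gmt);
    return std::string(buf, n);
}
```

For example, httpDate(784111777) yields "Sun, 06 Nov 1994 08:49:37 GMT".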

HTTP Caching:

Caches store responses to reduce network traffic and improve performance.
Caching mechanisms include expiration times and validators (Last-Modified, ETag).
Caches can be configured to prioritize freshness, performance, or security.

what is MIME?

MIME (Multipurpose Internet Mail Extensions) is an Internet standard that extends the format of email to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets.

URI vs URL

A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. A Uniform Resource Locator (URL) is a specific type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network “location”). URIs are a superset of URLs: a URI may be a URL, but it may also identify something like a phone number (tel:) or a mail address (mailto:).

content-coding value tokens.

gzip: An encoding format produced by the file compression program “gzip” (GNU zip)
compress: The encoding format produced by the common UNIX file compression program “compress”.

Chunked Transfer Coding:

Chunked is the transfer coding you will meet most often in HTTP/1.1, and the only one every HTTP/1.1 application is required to understand. It lets the sender transmit the body as a series of chunks, which is useful when the total content length is not known when the message starts. Each chunk begins with its size in hexadecimal followed by a CRLF (carriage return and line feed), then the chunk data, then another CRLF.
The end of the message is indicated by a chunk of size zero.

Transfer-Encoding Header

The Transfer-Encoding header is used to specify the form of encoding used to safely transfer the payload body to the user.
It can take multiple values, but “chunked” is the most commonly used.

Here is an example of an HTTP response using chunked transfer coding:

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

7\r\n
Mozilla\r\n
9\r\n
Developer\r\n
7\r\n
Network\r\n
0\r\n
\r\n

7\r\nMozilla\r\n: The first chunk is 7 bytes long and contains “Mozilla”.
9\r\nDeveloper\r\n: The second chunk is 9 bytes long and contains “Developer”.
7\r\nNetwork\r\n: The third chunk is 7 bytes long and contains “Network”.
0\r\n\r\n: The final chunk is 0 bytes long, indicating the end of the message.

All HTTP/1.1 applications MUST be able to receive and decode the “chunked” transfer-coding, and MUST ignore chunk-extension extensions they do not understand.
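
The decoding side can be sketched as a loop over "size line, data, CRLF" until a zero-size chunk appears. This is a simplified illustration under stated assumptions: decodeChunked is a hypothetical name, the whole body is assumed to be in memory already, chunk extensions are skipped, and trailer headers after the last chunk are not parsed.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Decodes a complete chunked body held in memory.
// Returns false if the input is malformed or truncated.
bool decodeChunked(const std::string& in, std::string& out) {
    std::string::size_type pos = 0;
    out.clear();
    for (;;) {
        const std::string::size_type eol = in.find("\r\n", pos);
        if (eol == std::string::npos)
            return false;  // size line is incomplete
        std::string sizeLine = in.substr(pos, eol - pos);
        // Ignore any chunk-extension after ';'.
        sizeLine = sizeLine.substr(0, sizeLine.find(';'));
        char* end = 0;
        const unsigned long size = std::strtoul(sizeLine.c_str(), &end, 16);
        if (end == sizeLine.c_str())
            return false;  // no hexadecimal digits found
        pos = eol + 2;
        if (size == 0)
            return true;  // last chunk reached; trailers are not handled here
        const std::string::size_type left = in.size() - pos;
        if (size > left || left - size < 2)
            return false;  // chunk data plus its CRLF is not all there
        out.append(in, pos, size);
        if (in.compare(pos + size, 2, "\r\n") != 0)
            return false;  // chunk data must be followed by CRLF
        pos += size + 2;
    }
}
```

A production server would decode incrementally as bytes arrive on the socket and enforce a maximum body size; the loop structure stays the same.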

media types

HTTP uses Internet Media Types in the Content-Type and Accept header fields in order to provide open and extensible data typing and type negotiation.

media-type = type "/" subtype *( ";" parameter )
type = token
subtype = token

example

HTTP Request with Accept Header:
The Accept header in an HTTP request specifies the media types that the client can accept. The server uses this information to select an appropriate response format.

GET /example HTTP/1.1
Host: www.example.com
Accept: text/html, application/json;q=0.9, image/webp

text/html: The client prefers HTML content.
application/json;q=0.9: The client can also accept JSON content, with a quality factor of 0.9 (lower preference than HTML).
image/webp: The client can accept WebP images.
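
To act on an Accept header, a server first has to split it into media ranges and read each q value. The following is a small sketch of that step; MediaRange, trimSpaces and parseAccept are hypothetical names, parameters other than q are ignored, and malformed q values are not rejected.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// One entry of an Accept header.
struct MediaRange {
    std::string type;  // e.g. "text/html" or "*/*"
    double q;          // quality value; 1.0 when the header gives none
};

// Strips leading and trailing spaces and tabs.
std::string trimSpaces(const std::string& s) {
    const std::string::size_type b = s.find_first_not_of(" \t");
    if (b == std::string::npos)
        return std::string();
    const std::string::size_type e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Splits the header value on ',' and each item on ';'.
std::vector<MediaRange> parseAccept(const std::string& value) {
    std::vector<MediaRange> ranges;
    std::istringstream items(value);
    std::string item;
    while (std::getline(items, item, ',')) {
        std::istringstream parts(item);
        std::string part;
        MediaRange r;
        r.q = 1.0;
        bool first = true;
        while (std::getline(parts, part, ';')) {
            part = trimSpaces(part);
            if (first) {
                r.type = part;  // the media range itself
                first = false;
            } else if (part.compare(0, 2, "q=") == 0) {
                r.q = std::atof(part.c_str() + 2);
            }
        }
        if (!r.type.empty())
            ranges.push_back(r);
    }
    return ranges;
}
```

Choosing the response format is then a matter of picking the available representation whose matching range has the highest q.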

HTTP Response with Content-Type Header

The Content-Type header specifies the media type of the message body. It appears in responses, and in requests that carry a body (such as POST), where it describes the request body.

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Length: 138

<!DOCTYPE html>
<html>
<head>
    <title>Example</title>
</head>
<body>
    <h1>Hello, World!</h1>
</body>
</html>

Here is an example of an HTTP POST request with the Content-Type header:

POST /submit-form HTTP/1.1
Host: www.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 20

name=John+Doe&age=30

HTTP Message

4.1 Message Types

HTTP messages consist of requests from client to server and responses from server to client.

HTTP Request Example

An HTTP request message consists of a request line, headers, and an optional body.

Example: GET Request

GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

Request Line: GET /index.html HTTP/1.1
- GET: The HTTP method.
- /index.html: The requested resource.
- HTTP/1.1: The HTTP version.
Headers:
- Host: Specifies the domain name of the server.
- User-Agent: Provides information about the client software.
- Accept: Specifies the media types the client can accept.
- Accept-Language: Specifies the preferred languages for the response.
- Accept-Encoding: Specifies the content encodings the client can accept.
- Connection: Controls whether the network connection stays open after the current transaction.

Example: POST Request

POST /submit-form HTTP/1.1
Host: www.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 20

name=John+Doe&age=30

Request Line: POST /submit-form HTTP/1.1
- POST: The HTTP method.
- /submit-form: The requested resource.
- HTTP/1.1: The HTTP version.
Headers:
- Host: Specifies the domain name of the server.
- Content-Type: Specifies the media type of the request body.
- Content-Length: Specifies the length of the request body.
Body: name=John+Doe&age=30
- The data being sent to the server.

HTTP Response Example

An HTTP response message consists of a status line, headers, and an optional body.

Example: 200 OK Response

HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
Content-Length: 88
Content-Type: text/html
Connection: close

<!DOCTYPE html>
<html>
<head>
    <title>Example</title>
</head>
<body>
    <h1>Hello, World!</h1>
</body>
</html>

Status Line: HTTP/1.1 200 OK
- HTTP/1.1: The HTTP version.
- 200: The status code.
- OK: The reason phrase.
Headers:
- Date: The date and time the response was generated.
- Server: Information about the server software.
- Last-Modified: The date and time the resource was last modified.
- Content-Length: The length of the response body.
- Content-Type: The media type of the response body.
- Connection: Indicates whether the connection should be closed after the response.
Body:
- The HTML content of the response.
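
Putting a response like this together is mostly string assembly; the one thing that must be computed is Content-Length, which has to equal the exact byte count of the body. A minimal sketch, assuming a hypothetical buildResponse helper that emits only two headers (a real server would also add Date, Server, and so on):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Assembles a minimal HTTP/1.1 response: status line, two headers,
// an empty line, then the body.
std::string buildResponse(int code, const std::string& reason,
                          const std::string& contentType,
                          const std::string& body) {
    std::ostringstream out;
    out << "HTTP/1.1 " << code << ' ' << reason << "\r\n"
        << "Content-Type: " << contentType << "\r\n"
        << "Content-Length: " << body.size() << "\r\n"  // exact byte count
        << "\r\n"
        << body;
    return out.str();
}
```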

Example: 404 Not Found Response

HTTP/1.1 404 Not Found
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Content-Length: 230
Content-Type: text/html; charset=iso-8859-1
Connection: close

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
    <title>404 Not Found</title>
</head>
<body>
    <h1>Not Found</h1>
    <p>The requested URL /not-found was not found on this server.</p>
    <hr>
    <address>Apache/2.2.14 (Win32) Server at www.example.com Port 80</address>
</body>
</html>

Status Line: HTTP/1.1 404 Not Found
- HTTP/1.1: The HTTP version.
- 404: The status code.
- Not Found: The reason phrase.
Headers:
- Date: The date and time the response was generated.
- Server: Information about the server software.
- Content-Length: The length of the response body.
- Content-Type: The media type of the response body.
- Connection: Indicates whether the connection should be closed after the response.
Body:
- The HTML content of the response, indicating that the requested resource was not found.

Summary

HTTP Request: Consists of a request line, headers, and an optional body. Examples include GET and POST requests.
HTTP Response: Consists of a status line, headers, and an optional body. Examples include 200 OK and 404 Not Found responses.

Reading RFC 2616

RFC 2616: Resource Identification in HTTP Requests

When a client sends an HTTP request, the server needs to determine the exact resource being requested. This is done by examining two parts of the request:

Request-URI: This is the URL of the resource.
Host header field: This specifies the domain name of the server.

How the server determines the resource:

Absolute URI: If the Request-URI is an absolute URI (e.g., http://www.example.com/page.html), the server uses the host part of the URI and ignores any Host header field.
Relative URI with Host header: If the Request-URI is relative (e.g., /page.html) and a Host header is present, the server uses the host from the header.
Relative URI without Host header: For an HTTP/1.0 request, the server may use heuristics to guess the intended resource. An HTTP/1.1 request without a Host header must be rejected with 400 Bad Request.

Example:

If a client sends a request to http://www.example.com/page.html, the server will identify the resource as /page.html on the www.example.com domain.
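
The host-selection rules can be sketched as one function. The names here (resolveHost and its parameters) are hypothetical. The sketch assumes an absent Host header is passed as an empty string, and it follows RFC 2616 section 14.23 in treating an HTTP/1.1 request that names no host as an error the caller should answer with 400.

```cpp
#include <cassert>
#include <string>

// Decides which host a request targets.
// hostHeader is the Host field value, or empty if the field was absent.
// Returns false when no host can be determined and the request should
// be rejected; for HTTP/1.0 without Host it returns true with an empty
// host so the caller can fall back to a default server.
bool resolveHost(const std::string& requestUri, const std::string& hostHeader,
                 const std::string& version, std::string& host) {
    const std::string scheme = "http://";
    if (requestUri.compare(0, scheme.size(), scheme) == 0) {
        // Absolute URI: the host comes from the URI, Host is ignored.
        const std::string::size_type start = scheme.size();
        const std::string::size_type end = requestUri.find_first_of(":/?", start);
        host = requestUri.substr(
            start, end == std::string::npos ? std::string::npos : end - start);
        return !host.empty();
    }
    if (!hostHeader.empty()) {
        // Relative URI: take the host from the header, dropping any port.
        host = hostHeader.substr(0, hostHeader.find(':'));
        return !host.empty();
    }
    host.clear();
    return version == "HTTP/1.0";
}
```

In a webserv-style setup, the resulting host name would then be matched against the configured server names to pick a virtual server.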

Request Headers:

Request headers provide additional information about the client and the request itself. They are similar to function parameters in programming languages. Some common request headers include:

Accept: Specifies the media types the client can understand.
Authorization: Contains credentials for authentication.
Host: Specifies the server the client is requesting.
User-Agent: Provides information about the client software.

Response Headers:

Response headers provide information about the server and the response. The first line of a response is the Status-Line, which includes:

HTTP-Version: The protocol version used.
Status-Code: A numeric code indicating the status of the request (e.g., 200 for OK, 404 for Not Found).
Reason-Phrase: A textual phrase describing the status code.

Other common response headers include:

Content-Type: Specifies the media type of the response body.
Content-Length: Specifies the length of the response body.
Server: Provides information about the server software.
Date: Specifies the date and time the message was sent.

RFC 2616: Persistent Connections

Purpose: Persistent connections improve HTTP performance by reducing the overhead of establishing and tearing down TCP connections for each request. This leads to:

Reduced network congestion: Fewer TCP packets are sent.
Lower latency: Subsequent requests are faster as the connection is already established.
Efficient resource utilization: Servers and clients can handle more requests with fewer resources.

How it works:

Default behavior: HTTP/1.1 connections are persistent by default.
Connection closure: Clients and servers can signal the closure of a connection using the Connection header with the close token.
Negotiation: Clients and servers can negotiate whether to keep a connection open based on the presence or absence of the Connection header.
Message length: All messages in a persistent connection must have a defined length to prevent ambiguity.
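
A server has to make this keep-open-or-close decision after every exchange. The sketch below shows one way to do it; shouldKeepAlive and lowerAscii are hypothetical names. Two points go beyond the notes above and are assumptions: HTTP/1.0 clients are kept alive only when they send the conventional keep-alive token, and the header is searched by substring rather than split into proper tokens.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Lower-cases an ASCII string so tokens compare case-insensitively.
std::string lowerAscii(std::string s) {
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    return s;
}

// Returns true if the connection should stay open after this exchange.
// connectionHeader is the Connection field value, or empty if absent.
bool shouldKeepAlive(const std::string& version,
                     const std::string& connectionHeader) {
    const std::string c = lowerAscii(connectionHeader);
    if (c.find("close") != std::string::npos)
        return false;  // either side asked for closure
    if (version == "HTTP/1.1")
        return true;   // persistent by default
    // Assumption: older clients must opt in explicitly.
    return c.find("keep-alive") != std::string::npos;
}
```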

Practical Considerations for Persistent Connections

Timeouts:

Servers and proxies should have timeouts to avoid keeping idle connections open indefinitely.
Clients and servers should gracefully close connections when timing out.
Clients should be able to retry failed requests, especially for idempotent operations.

Connection Management:

Clients should limit the number of simultaneous connections to a server.
Servers should prefer flow control over abruptly closing connections.

The 100 (Continue) Status:

Clients can use the Expect: 100-continue header to check if the server is willing to accept a request body before sending it.
Servers should respond with 100 Continue or a final status code.

Client Behavior on Premature Connection Closure

If a client sends a request with a body and the connection closes before it receives a response, the client should:

Retry the request: Attempt to resend the request.
Implement exponential backoff: Increase the delay between retries to avoid overwhelming the server.
Stop retrying on error: If an error response is received, cease retries.
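
RFC 2616 section 8.2.4 describes the backoff as waiting T = R * 2^N seconds, where R is the estimated round-trip time to the server and N is the number of previous retries. A tiny sketch of that formula, with retryWaitSeconds as a hypothetical helper name:

```cpp
#include <cassert>

// Binary exponential backoff: wait R * 2^N seconds before retry N+1.
// roundTripSeconds is the caller's estimate of the round-trip time (R);
// previousRetries is how often this request was already retried (N).
double retryWaitSeconds(double roundTripSeconds, unsigned int previousRetries) {
    double wait = roundTripSeconds;
    for (unsigned int i = 0; i < previousRetries; ++i)
        wait *= 2.0;  // double the wait for every earlier retry
    return wait;
}
```

With a round-trip estimate of 0.5 seconds, the waits grow as 0.5, 1, 2, 4 seconds and so on; a client would normally cap the number of retries as well.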

HTTP Method Definitions

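Safe and Idempotent Methods:

GET: Retrieves a resource.
HEAD: Retrieves metadata about a resource.
OPTIONS: Requests information about the communication options available on a resource.
TRACE: Echoes the received request back to the client.

RFC 2616 names only GET and HEAD as safe, but OPTIONS and TRACE should not have side effects either, which makes them inherently idempotent.

Non-Safe, Idempotent Methods:

PUT: Replaces a resource with a new one.
DELETE: Deletes a resource.

Non-Safe, Non-Idempotent Methods:

POST: Submits data to a server (e.g., form data, file uploads).
CONNECT: Reserved for proxies that can switch to being a tunnel (e.g., SSL tunneling). RFC 2616 does not classify it, so treat it as neither safe nor idempotent.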

Method Descriptions

GET: Retrieves a representation of a specified resource. Should not have side effects.
HEAD: Similar to GET, but only returns header information. Used for checking resource metadata.
POST: Submits data to a server for processing. Often used for creating new resources or triggering actions.
PUT: Replaces a resource with a new representation. Typically used for updating existing resources or creating new ones.
DELETE: Removes a specified resource.
OPTIONS: Requests information about the communication options available for a resource. Used for determining supported HTTP methods and other capabilities.
TRACE: Echoes the received request back to the client. Used for debugging and testing.
CONNECT: Establishes a tunnel through a proxy to a server, typically to carry HTTPS (SSL/TLS) traffic.

Method Allowability

The Allow header field indicates the HTTP methods supported by a resource.

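405 Method Not Allowed: Returned when the server knows the method but it is not allowed for the requested resource. The response must include an Allow header listing the valid methods.
501 Not Implemented: Returned when the server does not recognize the method or cannot support it for any resource.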

Note: Method names are case-sensitive.
GET and HEAD methods are required for general-purpose servers.
Other methods are optional but must adhere to the specified semantics.
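
The choice between 405 and 501 can be sketched as two lookups: first against the methods the server implements at all, then against the methods allowed on the requested resource. The checkMethod function and its parameters are hypothetical; in a webserv-style server the per-resource list would presumably come from the configuration file.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns 200 if the method may proceed, 405 if the server knows the
// method but the resource does not allow it, 501 if the server does not
// implement the method at all. On 405, allowHeader receives the value
// for the Allow header. Comparison is case-sensitive, like method names.
int checkMethod(const std::string& method,
                const std::vector<std::string>& implemented,
                const std::vector<std::string>& allowedHere,
                std::string& allowHeader) {
    if (std::find(implemented.begin(), implemented.end(), method)
            == implemented.end())
        return 501;
    if (std::find(allowedHere.begin(), allowedHere.end(), method)
            != allowedHere.end())
        return 200;
    // Build "GET, POST, ..." for the Allow header.
    allowHeader.clear();
    for (std::vector<std::string>::size_type i = 0; i < allowedHere.size(); ++i) {
        if (i != 0)
            allowHeader += ", ";
        allowHeader += allowedHere[i];
    }
    return 405;
}
```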

HTTP Status Codes

Informational (1xx):

100 Continue: The initial part of the request has been received and is not rejected yet. The client should continue sending the request body.
101 Switching Protocols: The server understands the client’s request to switch protocols and will do so after the current response.

Successful (2xx):

200 OK: The request was successful.
201 Created: A new resource was created.
202 Accepted: The request was accepted for processing, but the processing has not been completed.
203 Non-Authoritative Information: The returned metainformation is not the definitive set.
204 No Content: The request was successful, but there is no content to return.
205 Reset Content: The user agent should reset the document view.
206 Partial Content: The server has fulfilled a partial GET request for the resource.

Redirection (3xx):

300 Multiple Choices: The requested resource has multiple possible representations. The client may choose one of them.
301 Moved Permanently: The requested resource has been permanently moved to a new location.
302 Found: The requested resource is temporarily located at a different URI.
303 See Other: The requested resource can be found under a different URI, and the client should use a GET method to retrieve it.
304 Not Modified: The resource has not been modified since the last request.
305 Use Proxy: The requested resource must be accessed through the specified proxy server.
306 (Unused): This status code is no longer used.
307 Temporary Redirect: The requested resource is temporarily located at a different URI. The client should continue to use the original URI for future requests.

Client Error (4xx):

400 Bad Request: The request was malformed and could not be understood.
401 Unauthorized: The request requires authentication.
403 Forbidden: The server understood the request but refuses to fulfill it.
404 Not Found: The requested resource could not be found.
405 Method Not Allowed: The specified method is not allowed for the requested resource.
406 Not Acceptable: The requested resource cannot generate a response that matches the Accept headers sent in the request.
407 Proxy Authentication Required: The client must first authenticate itself with the proxy.
408 Request Timeout: The client did not produce a request within the server’s timeout period.
409 Conflict: The request could not be completed due to a conflict with the current state of the resource.
410 Gone: The requested resource is no longer available.
411 Length Required: The server requires a Content-Length header field.
412 Precondition Failed: The precondition given in one or more of the request-header fields evaluated to false.
413 Request Entity Too Large: The request entity is larger than the server is willing or able to process.
414 Request-URI Too Long: The Request-URI is too long.
415 Unsupported Media Type: The server does not support the media type of the request entity.
416 Requested Range Not Satisfiable: The requested range is not satisfiable.
417 Expectation Failed: The server cannot meet the expectations specified in the Expect header.

Server Error (5xx):

500 Internal Server Error: The server encountered an unexpected condition.
501 Not Implemented: The server does not support the requested functionality.
502 Bad Gateway: The server received an invalid response from an upstream server.
503 Service Unavailable: The server is temporarily unavailable.
504 Gateway Timeout: The server, acting as a gateway or proxy, timed out waiting for a response from an upstream server.
505 HTTP Version Not Supported: The server does not support the HTTP version used in the request.
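
The first digit of a status code determines its class, and RFC 2616 asks applications to treat an unrecognized code as the x00 code of its class. A small sketch of that classification, using the hypothetical names statusClass and fallbackStatus:

```cpp
#include <cassert>
#include <string>

// Maps a status code to the name of its class via the first digit.
std::string statusClass(int code) {
    switch (code / 100) {
        case 1: return "Informational";
        case 2: return "Successful";
        case 3: return "Redirection";
        case 4: return "Client Error";
        case 5: return "Server Error";
        default: return "Unknown";
    }
}

// Code to fall back on when a status code is not recognized:
// the x00 code of the same class (e.g. 431 is handled like 400).
int fallbackStatus(int code) {
    return (code / 100) * 100;
}
```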

Server-Driven, Agent-Driven, and Transparent Negotiation

Server-Driven Negotiation:

The server selects the best representation based on information from the request (e.g., Accept headers).
Advantages: Efficient for the server, can provide a good initial guess.
Disadvantages: Less accurate in determining user preferences, can limit cache efficiency.

Agent-Driven Negotiation:

The client selects the best representation from a list provided by the server (e.g., using the 300 Multiple Choices status code).
Advantages: More accurate in matching user preferences.
Disadvantages: Requires an additional round trip to the server.

Transparent Negotiation:

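A combination of server-driven and agent-driven negotiation: a cache that has been given the list of available representations performs server-driven negotiation on behalf of the origin server.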

HTTP Caching

HTTP caching is a mechanism to improve performance and reduce network traffic. It involves storing responses and reusing them for subsequent requests.

Expiration: Caches store responses for a specific time period.
Validation: Caches can validate cached responses with the origin server to ensure freshness.
Semantic Transparency: Caches should ideally preserve the original semantics of the response.

Cache Correctness and Warnings

Cache Correctness:

A correct cache must respond with the most up-to-date response it holds that is appropriate to the request and meets one of these conditions:

It has been revalidated with the origin server.
It is “fresh enough” based on expiration times.
It is an appropriate 304 Not Modified, 305 Use Proxy, or error (4xx or 5xx) response.

If the cache cannot communicate with the origin server, it should return a cached response that meets these conditions, or otherwise an error.

Warnings:
Caches must attach warnings to responses that are not fresh or first-hand. Warnings are categorized by warn-codes:
1xx: Freshness or revalidation warnings.
2xx: Warnings about entity body or headers.
A response may carry more than one warning, including the same warning in several languages.
Caches may prioritize warnings based on their severity or type.

User Agent Behavior:

User Overrides: User agents may allow users to override default caching behavior, such as disabling validation or setting aggressive maximum stale times.
User Notifications: User agents should notify users when non-transparent caching occurs, especially when displaying stale content.

Cache Behavior:

Stale Responses: Caches may return stale responses in certain circumstances, but must mark them with a Warning header.
Client Requests: Caches should respect client requests for fresh or first-hand responses.
Technical Limitations: Caches may not be able to fulfill all client requests due to technical or policy constraints.

Client-Controlled Behavior:

Clients can use the Cache-Control header to:

Specify a maximum acceptable age for unvalidated responses.
Force revalidation of all responses.
Specify a minimum time remaining before expiration.
Accept stale responses up to a certain age.

Server-Specified Expiration:

Explicit Expiration: Origin servers can set explicit expiration times using the Expires header or the max-age directive in the Cache-Control header.
Forced Revalidation: To force revalidation, servers can set an expiration time in the past or use the must-revalidate cache-control directive.

Limitations: Expiration times only apply to cached responses, not first-hand responses.

Heuristic Expiration:

When explicit expiration times are not provided, caches can estimate expiration times based on headers like Last-Modified.
Heuristic expiration should be used cautiously as it can compromise semantic transparency.

Age Calculation:

Age Value: The time elapsed since a response was generated or revalidated.
Corrected Initial Age: Accounts for network delays and request initiation time.
Current Age: The sum of the corrected initial age and resident time in the cache.

Calculation Steps:

Calculate the apparent age based on the Date header and response time.
Compare the apparent age with the Age header value and take the maximum.
Correct the received age for network delay.
Add the resident time in the cache to the corrected initial age.
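
The calculation steps translate almost line by line into code. This sketch uses the variable names from RFC 2616 section 13.2.3; the AgeInputs struct and the function names are hypothetical, and all times are assumed to be whole seconds that have already been parsed from the headers or read from the local clock.

```cpp
#include <algorithm>
#include <cassert>

// Inputs to the age calculation, all in seconds.
struct AgeInputs {
    long dateValue;     // Date header of the response
    long ageValue;      // Age header of the response (0 if absent)
    long requestTime;   // local time when the request was sent
    long responseTime;  // local time when the response was received
    long now;           // current local time
};

long currentAge(const AgeInputs& in) {
    // Apparent age from the Date header, never negative.
    const long apparentAge = std::max(0L, in.responseTime - in.dateValue);
    // Take the larger of the apparent age and the reported Age.
    const long correctedReceivedAge = std::max(apparentAge, in.ageValue);
    // Correct for the time the request/response exchange took.
    const long responseDelay = in.responseTime - in.requestTime;
    const long correctedInitialAge = correctedReceivedAge + responseDelay;
    // Add the time the response has been sitting in the cache.
    const long residentTime = in.now - in.responseTime;
    return correctedInitialAge + residentTime;
}

// A response is fresh while its freshness lifetime exceeds its age.
bool isFresh(long freshnessLifetime, long age) {
    return freshnessLifetime > age;
}
```

For instance, with a Date of 1000, an Age of 5, a request sent at 1002, a response received at 1004, and a current time of 1064, the current age comes out as 67 seconds.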

Expiration Calculations:

Freshness Lifetime: Determined by the Expires header or max-age directive.
Age Calculation: Calculated based on the Date header, Age header, and request/response times.
Expiration Check: A response is fresh if its freshness lifetime is greater than its current age.
Disambiguating Expiration Values: If multiple fresh responses exist, the one with the newer Date header is preferred.

Validation Model:

Conditional Requests: Clients can make conditional requests using headers like If-Modified-Since and If-None-Match.
Server Validation: The server compares the validator (e.g., ETag or Last-Modified) in the request with the current validator of the resource.
Responses:
- If the validators match: The server sends a 304 Not Modified response.
- If the validators don’t match: The server sends a full response with a new validator.

Last-Modified:

A simple validator based on the last modification time of a resource.
Less precise, especially for resources that change frequently.

ETag (Entity Tag):

An opaque string that uniquely identifies a specific version of a resource. More precise and flexible than Last-Modified.
Can be used for both strong and weak validation:
Strong Validator: Changes whenever the resource content changes.
Weak Validator: Changes only when the semantic meaning of the resource changes.

Validation Process:

Conditional Request: A client sends a request with a validator (Last-Modified or ETag) in the If-Modified-Since or If-None-Match header.
Server Validation: The server compares the received validator with the current validator of the resource.
Response:
If the validators match: The server sends a 304 Not Modified response.
If the validators don’t match: The server sends a full response with a new validator.
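
On the server side, the validation step boils down to comparing the validators from the request with the current ones. The sketch below is deliberately simplified and its names are hypothetical: the If-None-Match value is assumed to hold a single entity tag (or *), weak tags (W/ prefix) get no special handling, and the dates are assumed to be already parsed into Unix timestamps, with 0 meaning the header was absent. When both validators are sent, it answers 304 only if both agree, which is the conservative reading of the RFC.

```cpp
#include <cassert>
#include <string>

// Returns 304 if the client's copy is still valid, 200 if the full
// response must be sent.
int evaluateConditional(const std::string& ifNoneMatch,
                        const std::string& currentEtag,
                        long ifModifiedSince, long lastModified) {
    const bool etagGiven = !ifNoneMatch.empty();
    const bool dateGiven = ifModifiedSince != 0;
    if (!etagGiven && !dateGiven)
        return 200;  // not a conditional request
    if (etagGiven && ifNoneMatch != "*" && ifNoneMatch != currentEtag)
        return 200;  // entity tag changed
    if (dateGiven && lastModified > ifModifiedSince)
        return 200;  // modified after the client's copy
    return 304;      // every validator that was sent still matches
}
```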

Key Points:

ETags are more flexible and precise than Last-Modified.
Weak ETags can be used for resources that change frequently but remain semantically equivalent.
Strong validators are required for sub-range requests.
Use the strong comparison function for conditional requests; the weak comparison is only allowed for full-body (non-subrange) GET requests.

Cache Validators:

Last-Modified: A simple validator based on the last modification time of a resource.
ETag: A more precise and flexible validator, often used for stronger validation.

Rules for Using Validators:

Origin Servers: Should send both ETag and Last-Modified headers if possible. May send a weak ETag instead of a strong one when performance considerations favor it or when a strong ETag is not feasible.
Clients: Should use ETags when available. Can use Last-Modified for non-subrange requests. Can use Last-Modified for subrange requests if it’s a strong validator.

Expiration:

Explicit Expiration: Servers can set explicit expiration times using Expires or max-age.
Heuristic Expiration: Caches can estimate expiration times based on Last-Modified if explicit times are not provided.
Age Calculation: Caches calculate the age of a response based on Date and Age headers, as well as request and response times.

Cache Control:

Cache-Control Headers: Allow fine-grained control over caching behavior.
Directives: Include max-age, s-maxage, must-revalidate, public, and private.

Key Points:

Clients and caches should prefer entity tags over Last-Modified dates when both are available; a Last-Modified time is implicitly weak unless it can be shown to be strong.
Clients and caches should follow specific rules for using validators and expiration times.

Constructing Responses from Caches

Header Types:

End-to-End Headers: Transmitted to the final recipient.
Hop-by-Hop Headers: Used for single connections, not stored or forwarded.

Header Modification Rules:

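Transparent Proxies: Must not modify end-to-end headers such as Content-Location, Content-MD5, ETag, and Last-Modified. An existing Expires header must not be changed, though one may be added if it is missing.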
Non-Transparent Proxies: May modify headers but must add a Warning header if transformations are made.

Combining Responses:

304 Not Modified: Cache uses the stored entity-body.
206 Partial Content: Cache combines the received subrange with the stored subranges if the validators match.

Header Combination Rules:

End-to-end headers from the new response replace those in the cache entry.
Warning headers with warn-code 1xx are removed.
Warning headers with warn-code 2xx are retained.

Combining Byte Ranges:

A cache can combine subranges only if:
Both the incoming response and the cache entry have validators.
The validators match using strong comparison.

Caching Negotiated Responses and Invalidations

Caching Negotiated Responses:

Caches must consider the Vary header when caching responses.
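If any request header named in Vary differs between the original request and a later one, the cache cannot use the stored response for the later request without revalidating it.
A Vary header of * never matches: the cache cannot tell from the request headers whether the stored response is appropriate, so it must forward every subsequent request to the origin server.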
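
The Vary check can be sketched as a comparison of the selecting request headers. The Headers typedef and the function names are hypothetical, and the sketch makes two simplifying assumptions: header names in the maps are already lower-cased, and header values are compared as exact strings, which is stricter than the whitespace-insensitive comparison the RFC permits.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Request headers, keyed by lower-cased field name (assumption).
typedef std::map<std::string, std::string> Headers;

// Removes spaces/tabs and lower-cases a field name taken from Vary.
std::string normalizeFieldName(const std::string& s) {
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c != ' ' && c != '\t')
            out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// Returns the header value, or an empty string if the header is absent.
std::string headerValue(const Headers& h, const std::string& name) {
    const Headers::const_iterator it = h.find(name);
    return it == h.end() ? std::string() : it->second;
}

// True if the response stored for cachedReq may be selected for newReq.
bool varyMatches(const std::string& vary, const Headers& cachedReq,
                 const Headers& newReq) {
    std::istringstream names(vary);
    std::string name;
    while (std::getline(names, name, ',')) {
        name = normalizeFieldName(name);
        if (name == "*")
            return false;  // "*" never matches
        if (headerValue(cachedReq, name) != headerValue(newReq, name))
            return false;  // a selecting header differs
    }
    return true;
}
```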

Invalidation:

Certain methods (PUT, DELETE, POST) require invalidation of the corresponding resource.
Caches should invalidate resources based on Location and Content-Location headers.
Caches should invalidate resources for methods they don’t understand.

Key Points:

Caches must be careful when handling negotiated responses to ensure correct behavior.
Invalidations are necessary to maintain data consistency.
Proper cache invalidation helps prevent stale data and security issues.

Write-Through Caching and History Mechanisms

Write-Through Caching:

All non-safe methods (POST, PUT, DELETE, etc.) must be written through to the origin server before responding to the client. This ensures data consistency and prevents potential data loss.

Cache Replacement:

When a new cacheable response is received, the cache can replace an existing entry.
The replacement decision should consider factors like expiration time, frequency of access, and cache capacity.

History Mechanisms:

User agents may store previously viewed resources for later access.
History mechanisms should not enforce expiration times.
User agents may warn users about the potential staleness of history items.

Key Points:

Write-through caching is essential for maintaining data consistency.
Caches should prioritize newer, more relevant content when replacing entries.
History mechanisms should provide a way for users to view past content, even if it’s stale.

The Accept Header

The Accept header field is used to specify the media types that a client is willing to accept in a response. This allows the server to tailor the response format to the client’s capabilities.

Key Points:

Media Ranges: The Accept header can list specific media types (text/html), type ranges (text/*), or the wildcard */* to accept any media type.
Quality Values: Quality values (qvalues) can be used to express preference for certain media types. A higher qvalue indicates a stronger preference.
Default Behavior: If no Accept header is present, the server may assume that the client accepts all media types.
Server Response: If the server cannot send a response in a format acceptable to the client, it should send a 406 Not Acceptable response.

Example:

Accept: text/html, application/xhtml+xml, */*

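All three ranges carry the default quality value of 1, so this header says the client accepts HTML, XHTML, or any other format. To express a real preference, the wildcard would need a lower q value (e.g., */*;q=0.1).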

Additional Considerations:

Character Sets: The Accept-Charset header can be used to specify preferred character sets.
Content Encoding: The Accept-Encoding header can be used to specify preferred content encodings (e.g., gzip, deflate).

Accept-Language Header

The Accept-Language header field is used to specify the preferred languages for the response. This allows the server to tailor the response content to the client’s language preferences.

Key Points:

Language Ranges: The header can specify specific languages or language ranges.
Quality Values: Quality values (qvalues) can be used to indicate the relative preference for each language.
Matching: The server matches the requested languages with the available languages and selects the best match based on the quality values.
Default Behavior: If no Accept-Language header is present, the server may assume that any language is acceptable.

Example:

Accept-Language: en-US, en;q=0.8, fr;q=0.5

This header indicates that the client prefers English (US), but is also willing to accept other English dialects or French.

Additional Considerations:

Privacy: Sending a detailed Accept-Language header might reveal user preferences. Clients should be cautious about the level of detail they include.
Language Matching: Language matching can be complex, especially when dealing with regional variants and dialects.

The Allow Header

The Allow header is used to specify the HTTP methods that are supported by a resource. This information is typically returned in a 405 Method Not Allowed response.

Key Points:

Method Listing: The header lists the allowed methods in a comma-separated list.
Informational Purpose: The Allow header informs the client of the valid methods, but it doesn’t enforce them.
PUT Requests: The Allow header can be used to recommend supported methods for a newly created or modified resource.

Example:

Allow: GET, HEAD, PUT, DELETE

This indicates that the resource supports GET, HEAD, PUT, and DELETE methods.
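
As a rough sketch of how a server might use Allow, the helper below builds a minimal 405 response. The function names and the empty-body response are assumptions made for illustration only.

```python
def method_not_allowed(allowed_methods):
    """Build a minimal 405 response that advertises the allowed methods."""
    lines = [
        "HTTP/1.1 405 Method Not Allowed",
        "Allow: " + ", ".join(allowed_methods),
        "Content-Length: 0",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


def check_method(method, allowed_methods):
    """Return a 405 response if the method is not allowed, else None."""
    if method not in allowed_methods:
        return method_not_allowed(allowed_methods)
    return None  # the caller would dispatch to the real handler here
```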

The Authorization Header

The Authorization header is used to provide authentication credentials for a request. It’s typically used in response to a 401 Unauthorized response from the server.

Key Points:

Authentication Schemes: HTTP supports various authentication schemes, such as Basic and Digest.
Credential Protection: Credentials should be sent securely to avoid interception.
Shared Cache Restrictions: Shared caches must be careful when handling requests with authorization headers to avoid privacy and security issues.

The Cache-Control Header

The Cache-Control header provides detailed instructions for caching behavior. It can be used by both clients and servers to control caching.

Key Directives:

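no-cache: Allows the response to be stored, but a cache must not reuse it without successfully revalidating it with the origin server. When field names are listed, the restriction applies only to those header fields.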
no-store: Prevents caching of any part of the response.
max-age: Specifies the maximum age of a cached response.
s-maxage: Similar to max-age, but applies only to shared caches.
must-revalidate: Requires revalidation with the origin server before using a stale response.
public: Makes the response cacheable by any cache.
private: Restricts caching to non-shared caches.
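
A cache has to split the header into directives before it can act on them. The sketch below shows one way to do that; the function names are hypothetical, and splitting on commas is a simplification that would break on quoted field lists such as no-cache="Set-Cookie, Date".

```python
def parse_cache_control(header):
    """Split a Cache-Control value into a {directive: value-or-None} dict."""
    directives = {}
    for item in header.split(","):
        name, sep, value = item.strip().partition("=")
        name = name.strip().lower()
        if not name:
            continue
        directives[name] = value.strip().strip('"') if sep else None
    return directives


def may_store(directives, shared_cache):
    """no-store forbids storing; private forbids storing in shared caches."""
    if "no-store" in directives:
        return False
    if shared_cache and "private" in directives:
        return False
    return True


def freshness_lifetime(directives, shared_cache):
    """Lifetime in seconds; s-maxage takes priority for shared caches."""
    names = (["s-maxage"] if shared_cache else []) + ["max-age"]
    for name in names:
        value = directives.get(name)
        if value is not None and value.isdigit():
            return int(value)
    return None  # no explicit lifetime given
```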

Cache-Control: No-Store Directive

The no-store directive is a powerful tool for preventing sensitive information from being cached. When this directive is included in a response, it instructs caches to not store any part of the response, including headers, body, or metadata.

Key Points:

Prevents Caching: The no-store directive is designed to prevent caching of sensitive information.
Applies to Both Requests and Responses: It can be used in both requests and responses.
Strict Enforcement: Caches must make a best effort to remove the information from volatile storage as soon as possible.
Limitations: While the no-store directive is a strong measure, it’s not foolproof. Malicious actors or compromised caches might still be able to store information.

Use Cases:

Sensitive Data: Protecting highly confidential information, such as passwords or credit card numbers.
Session-Specific Data: Preventing caching of session-specific information that could lead to security vulnerabilities.
Dynamic Content: Ensuring that dynamic content is always fetched from the origin server.

Note: While the no-store directive is a powerful tool, it’s important to use it judiciously. Overusing it can negatively impact performance, as it prevents caching of otherwise cacheable content.

Cache Revalidation and Reload Controls

HTTP provides mechanisms for clients to control how caches handle requests and responses. These mechanisms ensure data freshness and prevent stale content from being served.

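End-to-End Reload: The request carries no-cache, so no cache on the path may answer from a stored copy and the response comes from the origin server.
Specific End-to-End Revalidation: The request carries max-age=0 together with the client's own validator, so every cache on the path revalidates its entry against the origin server using that validator.
Unspecified End-to-End Revalidation: The request carries max-age=0 without a validator; the first cache on the path that holds an entry adds its own validator.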

Cache-Control Directives:

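no-cache: In a request, asks for an end-to-end reload; caches must not answer from a stored copy without going back to the origin server.
max-age=0: Forces caches to revalidate any stored copy with the origin server.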
only-if-cached: Instructs the cache to return a cached response or a 504 Gateway Timeout error.
must-revalidate: Requires caches to revalidate stale responses before using them.
proxy-revalidate: Similar to must-revalidate, but only applies to shared caches.
no-transform: Prevents intermediate caches from modifying the response.

Understanding the Directives:

no-cache forces a reload from the origin server, while max-age=0 forces revalidation of any cached copy.
only-if-cached is used to request a cached response without revalidation.
must-revalidate and proxy-revalidate ensure that stale responses are not used without revalidation.
no-transform prevents intermediate caches from modifying the response.

Cache-Control Extensions:

HTTP allows for extensions to the Cache-Control header to define custom caching behaviors. These extensions can be used to specify more granular control over caching, such as community-specific caching policies or other specific use cases.

Extensibility: The Cache-Control header is designed to be extensible.
Backward Compatibility: New extensions should be designed to work with older clients and servers.
Informational and Behavioral Extensions: Informational extensions provide additional information without affecting behavior, while behavioral extensions modify the standard caching behavior.

Connection Header:

The Connection header is used to specify options that are specific to a particular connection and should not be forwarded to subsequent connections.

Connection Options: Connection options are specified as tokens in the Connection header.
close Option: Indicates that the connection should be closed after the current request/response.
Header Removal: Proxies must remove headers listed in the Connection header before forwarding the request or response.
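
The header-removal rule can be sketched as follows. This is an illustrative helper, not part of any library; it assumes headers are held as a list of (name, value) pairs.

```python
def strip_connection_headers(headers):
    """Return the headers a proxy may forward.

    Drops the Connection header itself and every header it names.
    """
    listed = {"connection"}
    for name, value in headers:
        if name.lower() == "connection":
            for token in value.split(","):
                if token.strip():
                    listed.add(token.strip().lower())
    return [(n, v) for n, v in headers if n.lower() not in listed]


def wants_close(headers):
    """True if any Connection header carries the close option."""
    for name, value in headers:
        if name.lower() == "connection":
            tokens = [t.strip().lower() for t in value.split(",")]
            if "close" in tokens:
                return True
    return False
```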

Content-Language, Content-Length, Content-Location, and Content-MD5 Headers

Content-Language:

Specifies the natural language(s) of the content.
Helps clients select appropriate content based on language preferences.
Multiple languages can be specified.

Content-Length:

Indicates the size of the entity-body in bytes.
Used by clients to determine the expected data transfer size.

Content-Location:

Specifies the actual location of the resource, which may differ from the requested URI.
Helps clients identify the source of the content.

Content-MD5:

Carries the base64-encoded 128-bit MD5 digest of the entity-body.
Used as an end-to-end integrity check of the received data.
Detects accidental modification in transit, but is not proof against malicious attacks.

Example:

HTTP/1.1 200 OK
Content-Length: 1234
Content-MD5: Q2hlY2sgSW50ZWdyaXR5IQ==
Content-Language: en-US

<html>
...
</html>
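
For reference, a Content-MD5 value can be computed as the base64 encoding of the raw MD5 digest, which always yields 24 characters. The sketch below uses Python's standard hashlib and base64 modules; the function names are illustrative.

```python
import base64
import hashlib


def content_md5(body):
    """Return the Content-MD5 value for an entity-body given as bytes."""
    digest = hashlib.md5(body).digest()  # 16 raw bytes, not the hex string
    return base64.b64encode(digest).decode("ascii")


def verify_content_md5(body, header_value):
    """Check a received body against the Content-MD5 header value."""
    return content_md5(body) == header_value.strip()
```

A common mistake is to base64-encode the hexadecimal digest string instead of the raw digest bytes, which produces a longer value that does not verify.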

Content-Range header

The Content-Range header is used in HTTP to specify the range of bytes being sent in a partial content response. This is often used in response to a request with a Range header, where the client requests a specific range of bytes from a resource.

Example Request:


GET /large-file.txt HTTP/1.1
Range: bytes=500-999

Response:


HTTP/1.1 206 Partial Content
Content-Range: bytes 500-999/1234
Content-Length: 500
Content-Type: text/plain

The client requests bytes 500 to 999 of the file.
The server responds with a 206 Partial Content status code.
The Content-Range header specifies the range of bytes being sent (500-999 out of a total of 1234 bytes).
The Content-Length header indicates the length of the partial content being sent (500 bytes).
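
The arithmetic behind these headers is easy to get wrong by one byte, because both ends of a byte range are inclusive. The following sketch resolves a single byte range against a known resource size. The function names are hypothetical, multi-range requests are deliberately not handled, and a None result leaves the choice between a full response and an error to the caller.

```python
def resolve_range(range_header, total):
    """Return (first, last) for a single bytes range, or None."""
    unit, sep, spec = range_header.strip().partition("=")
    if unit.strip().lower() != "bytes" or not sep or "," in spec:
        return None  # assumption: only one bytes range is supported
    first_s, dash, last_s = spec.strip().partition("-")
    if not dash:
        return None
    first_s, last_s = first_s.strip(), last_s.strip()
    if first_s == "":
        # Suffix form "bytes=-N": the final N bytes of the resource.
        if not last_s.isdigit() or int(last_s) == 0 or total == 0:
            return None
        count = min(int(last_s), total)
        return total - count, total - 1
    if not first_s.isdigit():
        return None
    first = int(first_s)
    if first >= total:
        return None  # starts past the end of the resource
    if last_s == "":
        last = total - 1  # open-ended form "bytes=N-"
    elif last_s.isdigit():
        last = min(int(last_s), total - 1)
        if last < first:
            return None
    else:
        return None
    return first, last


def partial_headers(first, last, total):
    """Headers for a 206 response; both ends of the range are inclusive."""
    return {
        "Content-Range": "bytes %d-%d/%d" % (first, last, total),
        "Content-Length": str(last - first + 1),
    }
```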

Content-Type header

The Content-Type header specifies the media type of the entity-body in an HTTP response. This information allows the client to interpret the content correctly.

Example:


HTTP/1.1 200 OK
Content-Type: text/html

<html>
  <body>
    This is an HTML document.
  </body>
</html>

In this example, the Content-Type header indicates that the response body is an HTML document.

Date Header

The Date header specifies the date and time when the message was originated. It helps in tracking the freshness of the content.

Example:

HTTP/1.1 200 OK
Date: Mon, 13 Nov 2023 11:15:22 GMT
Content-Type: text/plain

Hello, world!

In this example, the Date header indicates that the response was generated on November 13, 2023 at 11:15:22 GMT.
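
Generating the date string by hand invites mistakes such as a wrong weekday. One way to produce the format, sketched here with Python's standard email.utils module, is shown below; the helper name http_date is illustrative and it assumes a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def http_date(moment):
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


# The date used as the reference example in RFC 2616.
example = http_date(datetime(1994, 11, 6, 8, 49, 37, tzinfo=timezone.utc))
```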

Clockless Origin Server Operation

If an origin server doesn’t have a clock, it can’t set accurate Expires or Last-Modified headers. To handle this, the server can:

Avoid Setting Expiration Headers: A server without a clock must not assign Expires or Last-Modified values unless they were associated with the resource by a system or user with a reliable clock. Without these headers, caches have no explicit expiration time to rely on.
Set a Past Expiration Time: The server may assign an Expires value that is known, at or before server configuration time, to be in the past. This pre-expires responses, forcing caches to revalidate them.

ETag Header

The ETag header provides an entity tag that uniquely identifies a specific version of a resource. Caches can use this tag to determine whether a cached copy is still valid.

Example:


HTTP/1.1 200 OK
ETag: "12345"

In this example, the ETag header indicates that the current version of the resource has the entity tag “12345”.

Expires Header

The Expires header tells a cache how long a response is considered fresh.

What a cache does: A cache stores copies of website resources (like HTML pages, images) to improve loading speed for users. If a user requests a resource that’s already in the cache, the cache can deliver it instead of fetching it from the server again.
How Expires works: The server sends an Expires header with a date and time in the future. The cache considers the response fresh until that time. Once it’s past the expiry, the cache should re-validate the resource with the server before using it again.

Here are some key points about Expires:

The presence of Expires doesn’t guarantee the resource will change at that time.
Invalid dates, including “0”, are treated as “already expired”.
A server can set Expires to a date in the past to mark a response as expired.
A server can mark a response as “never expires” by setting Expires approximately one year in the future; it should not send dates more than one year ahead.
The max-age directive of the Cache-Control header (covered elsewhere) overrides Expires.

Example:


HTTP/1.1 200 OK
Expires: Thu, 01 Dec 1994 16:00:00 GMT
Content-Type: text/html

<html>This is an HTML page</html>

In this example, the response is considered fresh until December 1st, 1994 at 4:00 pm GMT.
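
A cache's freshness check against Expires might look like the sketch below. It relies on Python's standard email.utils.parsedate_to_datetime; the function name is_fresh is hypothetical, and Cache-Control is ignored here for brevity.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_value, now):
    """Return True while 'now' is earlier than the Expires date.

    'now' must be a timezone-aware datetime.
    """
    try:
        expires = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError, IndexError):
        return False  # invalid dates such as "0" count as already expired
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)  # assume GMT
    return now < expires
```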

If-Match Header

The If-Match header is used to make a request conditional. It allows a client to specify a list of entity tags that the resource must match.

Key Points:

Conditional Requests: Used to prevent overwriting or modifying a resource that has changed since the client last accessed it.
Entity Tags: Entity tags are unique identifiers assigned to resources.
Matching: The server compares the provided entity tags with the current entity tag of the resource.
Response: If a match is found, the request is processed as usual. If no match is found, the server returns a 412 (Precondition Failed) response.

Example:


PUT /resource HTTP/1.1
If-Match: "12345"

This request will only be processed if the current entity tag of the resource is “12345”. If the resource has changed, the server will return a 412 status code.

If-Modified-Since Header

The If-Modified-Since header is used to make a request conditional based on a specific date and time. It allows the client to request a resource only if it has been modified since the specified time.

Conditional Requests: Used to avoid unnecessary data transfer for resources that haven’t changed.
Date Comparison: The server compares the specified date with the last modification date of the resource.
Response: If the resource has been modified since the specified date, the server returns the updated resource. If the resource hasn’t been modified, the server returns a 304 (Not Modified) response.

Example:


GET /resource HTTP/1.1
If-Modified-Since: Thu, 01 Dec 1994 16:00:00 GMT

This request will only return the resource if it has been modified after December 1, 1994 at 4:00 PM GMT.
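
On the server side, the date comparison could be sketched as follows. The function name is hypothetical, last_modified is assumed to be a timezone-aware datetime already truncated to whole seconds, and an unparsable header value is simply ignored.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def status_for_if_modified_since(header_value, last_modified):
    """Return 304 if the resource is unchanged since the given date, else 200."""
    try:
        since = parsedate_to_datetime(header_value)
    except (TypeError, ValueError, IndexError):
        return 200  # invalid date: answer as for a normal GET
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # assume GMT
    return 200 if last_modified > since else 304
```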

If-None-Match Header

The If-None-Match header is the counterpart of If-Match: the request is processed only if none of the listed entity tags matches the current entity tag of the resource.

Key Points:

Conditional Requests: Used to update cached copies efficiently (GET) and, with the value *, to avoid overwriting a resource that already exists (PUT).
Entity Tags: The server compares the provided entity tags with the current entity tag of the resource.
Response: If a match is found, the server returns 304 (Not Modified) for GET and HEAD, or 412 (Precondition Failed) for other methods. If no match is found, the request is processed as usual.

Example:


PUT /resource HTTP/1.1
If-None-Match: "12345"

This request will only modify the resource if its current entity tag is not “12345”; if the tag matches, the server returns a 412 status code.
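
The entity-tag checks for If-Match and If-None-Match can be combined in one small routine. The sketch below is a simplification under stated assumptions: the function names are hypothetical, header names are looked up with exact case, weak and strong tags are compared as plain strings, and tags containing commas are not supported.

```python
def parse_etags(value):
    """Return "*" or the list of entity tags in a header value."""
    value = value.strip()
    if value == "*":
        return "*"
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def etag_matches(tags, current_etag):
    """A missing resource (current_etag is None) never matches."""
    if current_etag is None:
        return False
    return tags == "*" or current_etag in tags


def precondition_status(method, headers, current_etag):
    """Return 412, 304, or None when the request may be processed."""
    if "If-Match" in headers:
        if not etag_matches(parse_etags(headers["If-Match"]), current_etag):
            return 412
    if "If-None-Match" in headers:
        if etag_matches(parse_etags(headers["If-None-Match"]), current_etag):
            return 304 if method in ("GET", "HEAD") else 412
    return None
```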

If-Unmodified-Since Header

The If-Unmodified-Since header is the counterpart of If-Modified-Since. The client specifies a date and time, and the server processes the request only if the resource has not been modified since then.

Conditional Requests: Used to prevent overwriting a resource that has changed since the client last retrieved it.
Date Comparison: The server compares the specified date with the last modification date of the resource.
Response: If the resource hasn’t been modified, the request is processed as usual. If the resource has been modified since the specified date, the server returns a 412 (Precondition Failed) response.

Example:


PUT /resource HTTP/1.1
If-Unmodified-Since: Thu, 01 Dec 1994 16:00:00 GMT

This request will only succeed if the resource hasn’t been modified after December 1, 1994 at 4:00 PM GMT.

Last-Modified Header

The Last-Modified header indicates the date and time a server believes the resource was last modified.

Informational Header: Provides an estimate of the last modification time.
Implementation Specific: The exact meaning depends on the server and resource type.
Origin Server: Set by the server that created the resource.

Example:


Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT

This header suggests the resource was last modified on November 15, 1994, at 12:45 PM GMT. The accuracy depends on the server implementation.

Proxy-Authenticate and Proxy-Authorization

These headers are used for proxy authentication, where a client needs to authenticate with a proxy server before accessing the target resource.

Proxy-Authenticate: Sent by the proxy server in a 407 (Proxy Authentication Required) response to challenge the client for authentication credentials.
Proxy-Authorization: Sent by the client to provide authentication credentials to the proxy server.

Range Header

The Range header is used to request specific parts of a resource, rather than the entire resource. This is often used for partial downloads or resuming interrupted downloads.

Byte Ranges: Specifies the byte range of the resource to be retrieved.
Partial Content: Server responds with a 206 (Partial Content) status code and the requested range.
Efficient Downloads: Allows for resuming interrupted downloads and downloading large files in chunks.

Example:


GET /large-file.txt HTTP/1.1
Range: bytes=500-999

This request asks for bytes 500 to 999 of the file large-file.txt.

Referer Header

The Referer header (itself a misspelling of “referrer” that was kept in the specification) allows a client to specify the URI of the webpage that linked to the current request.

Key Points:

Informational Header: Provides information about the source of the request.
Server Tracking: Used by servers to track backlinks, analyze user navigation, and optimize caching.
Security Considerations: Some clients may choose not to send the Referer header for privacy reasons.

Example:

GET /article.html HTTP/1.1
Referer: https://www.example.com/category/news

This request indicates that the user accessed article.html by clicking a link on the page https://www.example.com/category/news.

Retry-After Header

The Retry-After header is used in response messages to indicate how long a client should wait before retrying a request.

Key Points:

503 (Service Unavailable): Used to inform the client about the expected downtime of the service.
3xx (Redirection): Suggests a minimum waiting time before retrying the redirected request.
Value Format: Can be an HTTP date or the number of seconds to wait.

Example:


Retry-After: 120  (Wait 2 minutes)
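
Since the value can take two forms, a client has to try both. A possible sketch, with a hypothetical function name and using Python's standard date parser, is:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(value, now):
    """Return how many seconds to wait, or None if the value is unusable.

    'value' is the header value only; 'now' is a timezone-aware datetime.
    """
    value = value.strip()
    if value.isdigit():
        return int(value)  # delta-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError, IndexError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assume GMT
    return max(0, int((when - now).total_seconds()))
```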

Server Header

The Server header identifies the software used by the server to handle the request.

Key Points:

Informational Header: Identifies the server software and version.
Security Considerations: Revealing detailed server information might be a security risk. Some servers allow configuration to hide the version information.

Example:


Server: Apache/2.4.47 (Ubuntu)

This example shows the server is using Apache web server version 2.4.47 on an Ubuntu system.

TE Header

Note: The TE header is rarely used in modern HTTP communication. A client sends it to list the transfer codings, other than chunked, that it is willing to accept in the response, and to say whether it accepts trailer fields in a chunked transfer-coding.

Transfer Codings: Defines compression or framing mechanisms used to transfer data.
Trailers: Optional headers sent at the end of a chunked transfer.

Example Values:
TE: deflate (Accepts deflate compression)
TE: trailers (Accepts trailers in chunked transfer)

Trailer Header

The Trailer header is used in HTTP/1.1 to specify a list of header fields that will be sent in a trailer after the message body, specifically in chunked transfer-coding.

Chunked Transfer-Coding: Used for messages with unknown or variable length.
Trailer Fields: Additional header fields sent after the message body.
Purpose: Can be used to send information that depends on the message body, such as checksums. The Transfer-Encoding, Content-Length, and Trailer fields must not be sent as trailer fields.

Example:

Trailer: Content-MD5

In this example, the Content-MD5 header will be sent after the message body, allowing for the calculation of the MD5 hash of the entire message.

Transfer-Encoding Header

The Transfer-Encoding header specifies the encoding used for the message body. The most common value is chunked, which is used for messages with an unknown or variable length.

Chunked Transfer-Coding: Divides the message body into chunks, each preceded by its size in hexadecimal.
Other Encodings: Other encodings, such as gzip or compress, can be used to compress the message body.

Example:


Transfer-Encoding: chunked
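
The chunked framing can be illustrated with a small encoder and decoder. This is a sketch only: the function names are hypothetical, the decoder assumes the whole message is already in memory and well formed, chunk extensions are discarded, and trailer fields are ignored.

```python
def encode_chunked(body, chunk_size=8):
    """Encode bytes as a chunked message body."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        # Each chunk: size in hexadecimal, CRLF, data, CRLF.
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk (size 0) and end of the empty trailer
    return bytes(out)


def decode_chunked(data):
    """Decode a complete, well-formed chunked body back into bytes."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";", 1)[0]  # drop extensions
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            break  # anything after this would be trailer fields
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)
```

A real server reads from a socket incrementally, so the decoder would need to keep state between reads instead of indexing into one complete buffer.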

Upgrade Header

The Upgrade header allows a client to request a protocol upgrade to a different protocol, such as HTTP/2.

Key Points:

Protocol Upgrade: Used to switch to a different protocol during an existing connection.
Server Approval: The server must agree to the upgrade by sending a 101 (Switching Protocols) response.
Common Use Case: Upgrading from HTTP/1.1 to HTTP/2.

Example:


Upgrade: h2c

This request indicates that the client would like to upgrade the connection to HTTP/2.

User-Agent Header

The User-Agent header identifies the software application making the HTTP request.

Identification: Provides information about the browser, operating system, or other software making the request.
Statistical Purposes: Used for tracking usage statistics of different user agents.
Content Negotiation: Servers can tailor responses based on user agent capabilities.

Example:


User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36

This example shows a User-Agent string for a Chrome browser on Windows 10.

Vary Header

The Vary header is used by servers to indicate which request headers influence the selection of a response.

Cache Control: Helps caches determine if a cached response is still valid for a subsequent request.
Server Negotiation: Indicates factors considered by the server when selecting a response (e.g., user language, accept headers).

Example:


Vary: Accept-Language, User-Agent

This example indicates that the server considers the user’s language and browser type when selecting a response.
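
One way a cache could honor Vary is to fold the named request headers into its lookup key, as in the sketch below. The function name and the key layout are assumptions for illustration; a Vary value of * is treated as never matching a stored response.

```python
def cache_key(url, vary_value, request_headers):
    """Build a cache key from the URL and the request headers named by Vary.

    Returns None for "Vary: *", meaning the response cannot be matched.
    """
    names = sorted(n.strip().lower() for n in vary_value.split(",") if n.strip())
    if "*" in names:
        return None
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    # Headers absent from the request are recorded as empty strings.
    return (url,) + tuple((name, lowered.get(name, "")) for name in names)
```

Two requests map to the same stored response only if their keys are equal, so headers not listed in Vary do not fragment the cache.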

Via Header

The Via header is used by intermediaries (proxies and gateways) to track the path a request or response takes through the network.

Request Tracing: Shows the sequence of intermediaries that forwarded the request or response.
Protocol Information: Includes the protocol versions used by each intermediary.
Security: Sensitive information like internal hostnames may be masked.

Example:


Via: 1.1 loadbalancer.example.com (HAProxy/2.6.9), 2.0 internal-proxy.example.com (squid/3.5)

This example shows the request was forwarded by two intermediaries: a load balancer and an internal proxy.

Warning Header

The Warning header is used to convey additional information about a response, often related to potential issues or modifications made to the original response.

Key Points:

Informational: Provides additional context or warnings to the client.
Caching: Used to indicate potential issues with caching the response.
Transformation: Signals if the response has been modified or transformed by an intermediary.

Example:


Warning: 110 - "Response is stale"

This warning indicates that the response is stale and might not be the most recent version.

Common Warning Codes:

110: Response is stale.
111: Revalidation failed.
112: Disconnected operation.
113: Heuristic expiration.
214: Transformation applied.

Note: The exact meaning and handling of Warning headers can vary depending on the specific warn-code and the implementation of the client and server.

Security Considerations in HTTP

HTTP, while a powerful protocol, can pose security risks if not used carefully. Here are some key security considerations:

Privacy and Security Risks:

Sensitive Information Leakage: Care must be taken to avoid unintentionally exposing sensitive information like passwords, credit card numbers, or personal data in HTTP requests.
Server Log Analysis: Server logs can potentially reveal user browsing habits and preferences, which could be misused.
Referer Header Exploitation: The Referer header can expose the origin of a request, potentially revealing private information or sensitive resources.
Insecure HTTP Methods: Using the GET method for sensitive data submission can expose that data in the URL, which might be logged or cached.

Mitigation Strategies:

HTTPS: Use HTTPS to encrypt communication between the client and server, protecting sensitive data from eavesdropping.
Secure Form Submission: Use the POST method instead of GET for sensitive form submissions to avoid exposing data in the URL.
Careful Header Usage: Use the Referer header judiciously, and consider disabling it in certain situations to protect privacy.
Server Configuration: Configure servers to securely log and store sensitive information, and to minimize the exposure of sensitive details in server responses.
User Awareness: Educate users about the risks of exposing sensitive information and encourage them to use strong passwords and avoid sharing personal information unnecessarily.

Quoting from RFC 2616:

15.1.3 Encoding Sensitive Information in URI’s

Because the source of a link might be private information or might reveal an otherwise private information source, it is strongly recommended that the user be able to select whether or not the Referer field is sent. For example, a browser client could have a toggle switch for browsing openly/anonymously, which would respectively enable/disable the sending of Referer and From information.

Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol.

Authors of services which use the HTTP protocol SHOULD NOT use GET based forms for the submission of sensitive data, because this will cause this data to be encoded in the Request-URI. Many existing servers, proxies, and user agents will log the request URI in some place where it might be visible to third parties. Servers can use POST-based form submission instead

Privacy Issues Connected to Accept Headers

The Accept header can reveal information about the user’s preferences, such as language and content type. While this information can be useful for tailoring content, it can also compromise user privacy.

Key Privacy Concerns:

Language Preferences: Language preferences can reveal information about a user’s cultural background, geographic location, or personal interests.
Content Type Preferences: User preferences for specific content types can be used to profile user behavior and interests.
User Tracking: By analyzing the Accept header along with other information, servers can track user behavior across different websites.

Mitigation Strategies:

Minimal Header Information: Users can configure their browsers to send minimal header information, including the Accept header.
Privacy-Focused Browsers: Using privacy-focused browsers can help mitigate these issues by limiting the amount of information shared with websites.
Ad-Blockers and Privacy Extensions: These tools can block tracking scripts and limit the information shared with websites.
Server-Side Privacy Practices: Server administrators should be mindful of how they collect and use user data, including information derived from Accept headers.

Attacks Based on File and Path Names

HTTP servers can be vulnerable to attacks that exploit file system paths.

Key Vulnerabilities:

Directory Traversal: Attackers can use techniques like ../ to access files outside the intended directory.
File Inclusion: Attackers can inject malicious code into requests to include files from the server’s file system.

Mitigation Strategies:

Input Validation: Strictly validate and sanitize user input to prevent malicious input.
File System Permissions: Ensure that files and directories are configured with appropriate permissions to restrict access.
Web Application Firewalls (WAFs): Use WAFs to protect against common web application attacks, including directory traversal and file inclusion.
Keep Software Updated: Regularly update server software and applications to address known vulnerabilities.
Secure Configuration: Configure servers to use secure settings and disable unnecessary features.
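
As an illustration of input validation against directory traversal, the sketch below resolves a request path and rejects it if the result falls outside the document root. The function name is hypothetical, the request path is assumed to be percent-decoded already, and the check is written with POSIX paths in mind.

```python
import os


def resolve_under_root(root, request_path):
    """Map a URL path to a file path inside root, or None if it escapes."""
    root_real = os.path.realpath(root)
    # Strip leading slashes so join() cannot discard the root.
    relative = request_path.lstrip("/")
    candidate = os.path.realpath(os.path.join(root_real, relative))
    # realpath() has collapsed any ".." and followed symlinks by now.
    if os.path.commonpath([root_real, candidate]) != root_real:
        return None
    return candidate
```

Comparing resolved paths, rather than searching the raw string for "../", also covers symbolic links that point outside the document root.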

DNS Spoofing and Other Security Risks

DNS spoofing is a type of cyberattack where an attacker manipulates the DNS resolution process to redirect users to malicious websites. This can lead to various security risks, including:

Phishing Attacks: Users may be redirected to fake websites designed to steal personal information.
Malware Infection: Malicious websites can infect user devices with malware.
Data Theft: Sensitive information, such as login credentials, can be compromised.

Mitigation Strategies:

Strong DNS Security: Use DNSSEC to validate the authenticity of DNS responses.
Firewall Configuration: Configure firewalls to block suspicious traffic and filter DNS requests.
User Awareness: Educate users about the risks of DNS spoofing and how to identify suspicious websites.

Other Security Risks

HTTP Clients and Security:

Caching Hostname Lookups: Clients should be cautious about caching DNS resolution results, as IP addresses can change.
Sensitive Information Leakage: Clients should be careful to avoid exposing sensitive information in HTTP requests, such as passwords or credit card numbers.
Authentication Credentials: Clients should securely store and manage authentication credentials to prevent unauthorized access.

Proxy Servers and Security:

Man-in-the-Middle Attacks: Proxies can be vulnerable to man-in-the-middle attacks, where attackers intercept and manipulate traffic.
Data Privacy: Proxies should handle user data with care and implement appropriate security measures to protect privacy.
Denial of Service Attacks: Proxies can be targets of denial-of-service attacks, which can disrupt service and impact user experience.

Mitigation Strategies:

Secure Configuration: Configure proxies to use strong security practices, such as SSL/TLS encryption, strong authentication, and regular security updates.
Access Control: Implement strict access controls to limit access to sensitive information and administrative functions.
Logging and Monitoring: Monitor proxy logs for suspicious activity and security threats.
Regular Security Audits: Conduct regular security audits to identify and address vulnerabilities.