How Does the Internet Work?
Ever wondered what happens ‘under the hood’, when you open a website in your browser? Although we open countless websites every day, few of us understand the technical intricacies of what happens.
The description below explains what happens, starting from the press of the enter key on the keyboard. I’m writing this to structure my understanding about the web, and hope it helps you do the same.
The Start: A Keyboard Press
Key Down Event
Your finger hits the enter key, pushes it down, and it bottoms out. What’s next? A typical Universal Serial Bus (USB) keyboard is powered by the computer’s USB host controller. Usually, it has contacts arranged in layers with an insulating layer in between (see image below). The key press makes the layers connect, completing an electrical circuit. And this signals to the keyboard controller that the enter key has been pressed.
Keyboard Circuitry
Most keyboards have a circuit layout designed in the shape of a grid. That’s smart because each key needs to be uniquely identifiable which works well with rows and columns (similar to a chessboard). When the enter key bottoms out, the circuit between a specific row and column is completed. Here’s a simplified illustration of this:
Keycode Conversion
The controller on the keyboard detects the row and column information that corresponds to the enter key and converts it to a keycode integer. This code is stored on the keyboard, ready to be encoded for the USB cable transfer once the endpoint is polled by the host USB controller.
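To make the row-and-column idea concrete, here is a minimal sketch of a matrix scan in Python. The 2x3 layout and the names `KEYMAP` and `scan_matrix` are made up for illustration; real keyboard firmware is more involved (debouncing, ghosting protection, and so on). The numbers are USB HID usage IDs, where 0x28 stands for the enter key.

```python
# Hypothetical 2x3 key matrix: KEYMAP[row][col] holds the keycode for that position.
# Values are USB HID usage IDs (0x04 = a ... 0x08 = e, 0x28 = Enter).
KEYMAP = [
    [0x04, 0x05, 0x06],   # a, b, c
    [0x07, 0x08, 0x28],   # d, e, Enter
]

def scan_matrix(closed_contacts):
    """Return the keycodes of all (row, col) contacts that are currently closed."""
    pressed = []
    for row in range(len(KEYMAP)):            # a real controller drives one row at a time
        for col in range(len(KEYMAP[row])):   # and reads every column for that row
            if (row, col) in closed_contacts:
                pressed.append(KEYMAP[row][col])
    return pressed

# The enter key sits at row 1, column 2 in this made-up layout.
print([hex(code) for code in scan_matrix({(1, 2)})])
```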
From Keyboard to Computer
USB Cable
The USB cable connects the keyboard to the computer and facilitates the ‘serial’ (i.e. one bit at a time in sequence) data transfer. The diagram below shows the structure of a USB cable.
USB Cable Types
The most common USB cables (USB 1.x and 2.0) have four wires: two for power (VBUS and GND) and two for data transfer (D+ and D-). To increase the data transfer speed, USB 3.x cables include additional wires.
Differential Signaling
Inside of the USB cable, data is transferred using differential signaling over the D+ and D- wires. This signaling technique uses two complementary signals to transmit information. One wire carries the signal and the other carries the inverted signal as illustrated below.
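The benefit of sending a signal together with its inverse is easier to see in code. The sketch below is a deliberately simplified model: the voltage levels and function names are illustrative assumptions, and it leaves out the NRZI encoding and bit stuffing that real USB applies on top. It only shows why noise that hits both wires equally drops out when the receiver looks at the difference.

```python
def transmit(bits, high=3.3, low=0.0):
    """Encode each bit as a (D+, D-) voltage pair with complementary levels."""
    return [(high, low) if bit else (low, high) for bit in bits]

def add_noise(pairs, noise):
    """Add the same disturbance to both wires, as common-mode noise would."""
    return [(d_plus + noise, d_minus + noise) for d_plus, d_minus in pairs]

def receive(pairs):
    """Decode by comparing the two wires; the shared noise cancels in the difference."""
    return [1 if d_plus - d_minus > 0 else 0 for d_plus, d_minus in pairs]

bits = [1, 0, 1, 1, 0]
noisy = add_noise(transmit(bits), noise=1.7)
print(receive(noisy) == bits)
```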
Packet Transfer
The bits that travel sequentially through the USB cable are grouped into ‘packets’ that follow the low-level USB protocol. When the signal reaches the host USB controller, it is decoded and then interpreted by the keyboard device driver.
Key-Down Message Travels to the Application
Key Press Event
The key press is passed to Microsoft’s Keyboard Human Interface Device (HID) kernel-mode driver for USB devices, KBDHID.sys, which converts the HID usage (i.e., key event) into a scan code.
Drivers
In our ‘google.com’ example, the press of the enter key originates from a keyboard; therefore, the KBDHID.sys driver interfaces with the keyboard class driver KBDCLASS.sys. This class driver, in turn, communicates with Win32K.sys, the kernel-mode part of the Windows subsystem.
Active Window
To identify the currently active window, Win32K.sys uses the GetForegroundWindow() API. This ensures that the key press event is addressed to the desired target, which is the browser window.
Message Queue
Next, the message dispatcher, which manages the message flow between the operating system and open applications, delivers the key press as a message with the same parameters that SendMessage(hWnd, Msg, wParam, lParam) takes. Keyboard messages are placed in the message queue of the thread that owns the target window. From there, the application's message loop hands them to the message processing function WindowProc.
Parameter | Name | Description |
---|---|---|
hWnd | Window handle | Unique identifier assigned to each window |
Msg | Message | The message to be sent |
wParam | Word Parameter | Additional message-specific information (for key messages, the virtual-key code) |
lParam | Long Parameter | Info about the key press, including repeat count, scan code, extended key flag, and key context code |
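The lParam of a key-down message packs several fields into one 32-bit value. As a rough sketch, they can be pulled apart with shifts and masks. The function name is my own, and the example value is constructed by hand (repeat count 1, scan code 0x1C for the enter key) rather than captured from a real system.

```python
def decode_key_lparam(lparam):
    """Split the lParam of a key-down message into its documented bit fields."""
    return {
        "repeat_count": lparam & 0xFFFF,        # bits 0-15
        "scan_code": (lparam >> 16) & 0xFF,     # bits 16-23
        "extended": bool((lparam >> 24) & 1),   # bit 24: extended key flag
        "context_code": (lparam >> 29) & 1,     # bit 29: context code
    }

# Hand-built example value: enter key (scan code 0x1C), pressed once.
print(decode_key_lparam(0x001C0001))
```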
Browser’s URL Bar is Parsed and Request is Sent
Parsing the URL Bar
The input in the browser’s URL bar is parsed to check if it’s a URL or a search query. In our ‘google.com’ example, the input contains the domain ‘google’ and the top level domain (TLD) ‘.com’. The browser fills in the missing parts: the protocol (‘http’ or ‘https’) and the resource ‘/’ (i.e., the index). The ‘www’ prefix is typically supplied by a redirect from the server rather than by the browser itself.
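Here is a minimal sketch of that decision in Python. The rule for telling a URL from a search query is an assumption I picked for illustration (real browsers use far more elaborate heuristics), and `interpret_input` is a made-up name.

```python
from urllib.parse import urlsplit

def interpret_input(text, default_scheme="https"):
    """Classify URL-bar input as a search query or a URL and fill in missing parts."""
    text = text.strip()
    # Assumed heuristic: input with spaces or without a dot is treated as a search.
    if " " in text or "." not in text:
        return ("search", text)
    if "://" not in text:
        text = f"{default_scheme}://{text}"   # add the missing protocol
    parts = urlsplit(text)
    path = parts.path or "/"                  # an empty path means the index
    query = f"?{parts.query}" if parts.query else ""
    return ("url", f"{parts.scheme}://{parts.netloc}{path}{query}")

print(interpret_input("google.com"))          # ('url', 'https://google.com/')
print(interpret_input("how does dns work"))   # ('search', 'how does dns work')
```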
GET Request
To retrieve the Google website from Google’s servers, a GET request is sent. GET is one of the methods that HTTP defines for retrieving data. The next step is to choose between regular HTTP and encrypted HTTPS.
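On the wire, an HTTP/1.1 GET request is just a few lines of text. The sketch below builds a minimal one; the helper name `build_get_request` is hypothetical, and a real browser sends many more headers (User-Agent, Accept, cookies, and so on).

```python
def build_get_request(host, path="/"):
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, resource, protocol version
        f"Host: {host}",          # the Host header is mandatory in HTTP/1.1
        "Connection: close",
        "",                       # the empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

print(build_get_request("www.google.com").decode("ascii"))
```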
Choosing the Request Protocol: HTTP or HTTPS?
Checking the HSTS List
How exactly does the browser decide between HTTP and HTTPS? First, the browser looks at its built-in HSTS (HTTP Strict Transport Security) list. If a site is on this list, it receives an HTTPS request. If it isn’t, the browser uses HTTP for the initial request. A website that is not on the list but requires HTTPS can still tell the browser, via the Strict-Transport-Security response header, to use HTTPS on future visits.
Chrome HSTS
Chrome maintains an “HSTS Preload List” (and other browsers maintain lists based on the Chrome list). The listed domains are preconfigured with HSTS when Chrome is first installed. In our ‘google.com’ example, the first request uses HTTPS because (needless to say) Google is on the list.
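The lookup itself can be sketched in a few lines. Everything here is a stand-in: `HSTS_PRELOAD` is a one-entry toy set rather than the real list, `learned` represents hosts the browser has been told about on earlier visits, and I assume every entry also covers its subdomains, which the real list controls per entry.

```python
HSTS_PRELOAD = {"google.com"}   # toy stand-in for the browser's built-in list

def choose_scheme(host, preload=HSTS_PRELOAD, learned=frozenset()):
    """Return 'https' if the host or one of its parent domains is a known HSTS host."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])   # www.google.com, google.com, com
        if candidate in preload or candidate in learned:
            return "https"
    return "http"

print(choose_scheme("www.google.com"))   # https
print(choose_scheme("example.org"))      # http
```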
DNS Lookup: Domain Name to IP Address
Cache Check
Using the domain name, the Domain Name System (DNS) lookup finds the corresponding Internet Protocol (IP) address. First, the browser checks the local DNS cache. If the address is not found there, a request goes to the Internet Service Provider's (ISP) DNS server, which also has a cache. If none of these caches has the information, the recursive DNS server steps in.
Recursive DNS Server
The recursive DNS Server receives the request and follows a chain of referrals (hence the name ‘recursive’) until it reaches an authoritative DNS server that can provide the necessary information (i.e., the IP that belongs to the domain).
Top Level Domains (TLD)
Root name servers hold information about the name servers for top level domains (e.g., .com, .de, .uk, etc.). As ‘google.com’ ends with .com, the TLD name server that handles .com domains is queried. This server, in turn, knows the authoritative name server that stores the DNS record for the ‘google.com’ domain and refers the recursive server to it.
IP Address Identified
The authoritative name server identifies the IP address for the domain and sends a response containing the IP back. And, along the way, the caches are updated for future requests.
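The whole lookup chain can be simulated with a few dictionaries. This is a toy model, not a DNS client: the server names are invented, and the IP address comes from the 192.0.2.0/24 range reserved for documentation, so it is not Google's real address.

```python
# Toy data standing in for the three kinds of name servers.
ROOT = {"com": "tld-com"}                                     # root: TLD -> TLD server
TLD = {"tld-com": {"google.com": "ns-google"}}                # TLD server: domain -> authoritative server
AUTHORITATIVE = {"ns-google": {"google.com": "192.0.2.10"}}   # authoritative: domain -> IP

cache = {}

def resolve(domain):
    """Return (ip, source), checking the cache before walking the referral chain."""
    if domain in cache:
        return cache[domain], "cache"
    tld = domain.rsplit(".", 1)[-1]            # 'google.com' -> 'com'
    tld_server = ROOT[tld]                     # the root refers us to the TLD server
    auth_server = TLD[tld_server][domain]      # the TLD server refers us to the authoritative server
    ip = AUTHORITATIVE[auth_server][domain]    # the authoritative server answers
    cache[domain] = ip                         # remember the answer for future requests
    return ip, "authoritative"

print(resolve("google.com"))   # first lookup walks the chain
print(resolve("google.com"))   # second lookup is served from the cache
```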
Open a Socket
Sockets for Data Transfer
With the IP address and the port number (the default is 80 for HTTP and 443 for HTTPS) in hand, the browser calls the socket() function to create a ‘socket’ (i.e., a communication endpoint) and connects it to that address.
There are different types of sockets; however, for the transmission of data over the internet, the SOCK_STREAM type is used. This socket type uses the Transmission Control Protocol (TCP) and has the following characteristics:
- Sequenced: Recipient receives data in the order sent
- Reliable: There are methods in place to handle packet loss
- Bidirectional: Both endpoints can send and receive data
- Connection Mode: Before data is exchanged, a connection is established (in contrast to connectionless mode)
- Byte Stream: Data is transmitted as stream of bytes
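To see these properties in action without touching the network, the sketch below opens a SOCK_STREAM connection over the loopback interface. The local listener is only a stand-in for the web server, and `recv_exact` is a small helper I added because a byte stream may hand data over in smaller pieces than it was sent.

```python
import socket

def recv_exact(sock, n):
    """Read exactly n bytes (or fewer if the peer closes the connection)."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            break
        data += chunk
    return data

# Stand-in server on the loopback interface; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()

# Connection mode: the client must connect before any data is exchanged.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))
conn, _ = server.accept()

# Bidirectional: both endpoints send and receive.
client.sendall(b"ping")
received = recv_exact(conn, 4)
conn.sendall(b"pong")
reply = recv_exact(client, 4)

for s in (conn, client, server):
    s.close()

print(received, reply)
```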
OSI Model
The HTTP GET request sent by the browser is wrapped (encapsulated) with protocol-specific information while passing through the layers of the Open Systems Interconnection (OSI) model. This model describes the different layers of abstraction that computers use to communicate over a network (see image below).
High Level to Low Level
The HTTP request originates from the highest OSI model layer, the Application Layer (e.g., the browser). As the request traverses down the OSI layers, it undergoes further processing and encapsulation until it reaches the lowest layer, the Physical Layer, where it is transmitted as binary over the physical network medium.
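Encapsulation boils down to each layer putting its own header in front of whatever it receives from the layer above. The sketch below uses bracketed placeholder strings instead of real TCP, IP, or Ethernet header formats, and it covers only three layers, so treat it as an illustration of the wrapping idea only.

```python
# Placeholder headers, listed in the order the request passes down the stack.
LAYERS = [
    ("transport", b"[TCP]"),
    ("network", b"[IP]"),
    ("data link", b"[ETH]"),
]

def encapsulate(payload):
    """Wrap the payload with one header per layer on the way down."""
    for _, header in LAYERS:
        payload = header + payload
    return payload

def decapsulate(frame):
    """Strip the headers in reverse order on the receiving side."""
    for name, header in reversed(LAYERS):
        if not frame.startswith(header):
            raise ValueError(f"missing {name} header")
        frame = frame[len(header):]
    return frame

frame = encapsulate(b"GET / HTTP/1.1")
print(frame)                # b'[ETH][IP][TCP]GET / HTTP/1.1'
print(decapsulate(frame))   # b'GET / HTTP/1.1'
```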
Server Handles Request
Receiving the Request
The server listens for incoming requests on a specific port, typically port 80 for HTTP and 443 for HTTPS requests. Once a request arrives, it is parsed to extract information, e.g., requested URL, HTTP method, and query parameters.
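As a rough sketch of that parsing step, the function below extracts the method, path, query parameters, and headers from a raw request. `parse_request` is a made-up helper that assumes a well-formed request; production servers handle many more edge cases.

```python
from urllib.parse import urlsplit, parse_qs

def parse_request(raw):
    """Extract method, path, query parameters, and headers from a raw HTTP request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    url = urlsplit(target)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return {
        "method": method,
        "path": url.path,
        "query": parse_qs(url.query),
        "version": version,
        "headers": headers,
    }

raw = b"GET /search?q=dns HTTP/1.1\r\nHost: www.google.com\r\n\r\n"
print(parse_request(raw))
```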
Handling the GET Request
Servers often run Apache or nginx on Linux operating systems, and IIS on Windows. There are different types of requests that a server can handle, but by far the most common are:
- GET: Used to retrieve data from a server.
- POST: Used to submit data to be processed by a server.
As mentioned above, the browser sends a GET request to the server to open ‘google.com’. The server, in turn, returns an HTTP response containing the relevant HTML, JavaScript, CSS, images, etc.
The Browser
Browser Architecture
To understand how the browser processes the HTTP response from the server, it’s worth taking a look at its components:
- UI: Graphical interface through which users interact with the browser
- Browser Engine: Middleman between UI and rendering engine
- Rendering Engine: Interprets HTML, CSS, and other resources to render webpages visually
- Networking: Handles HTTP requests and responses
- JavaScript Interpreter: Executes JS code
- Data Persistence: Handles storage for cookies, cache, etc.
- UI Backend: Used for drawing basic windows, boxes, buttons in the UI
Document Object Model
The browser begins by parsing the HTML document. It turns the HTML markup into a hierarchical structure known as the Document Object Model (DOM). This parsing process involves identifying HTML tags, attributes, and text content to construct the DOM tree.
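Python's built-in `html.parser` is enough to sketch how markup turns into a tree. The `Node` and `DomBuilder` classes are my own minimal stand-ins; a real browser parser also recovers from malformed markup, which this sketch does not attempt.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}   # elements without a closing tag

class Node:
    def __init__(self, tag, attrs=None, text=""):
        self.tag = tag
        self.attrs = attrs or {}
        self.text = text
        self.children = []

class DomBuilder(HTMLParser):
    """Build a simple tree of Node objects from well-formed HTML."""
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]   # the last entry is the currently open element

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", text=data.strip()))

def dump(node, depth=0):
    """Print the tree with one level of indentation per nesting level."""
    print("  " * depth + (node.text if node.tag == "#text" else node.tag))
    for child in node.children:
        dump(child, depth + 1)

builder = DomBuilder()
builder.feed("<html><body><h1>Hi</h1><p>Text</p></body></html>")
dump(builder.root)
```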
Fetch External Resources
While parsing HTML, the browser encounters references to external resources such as CSS stylesheets, JavaScript files, images, fonts, and other files. It initiates separate network requests to fetch these resources from the server.
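Finding those references can be sketched with the same built-in parser. The tag-to-attribute table is a simplification I chose for the example (for instance, it treats every `link` element as a resource, not only stylesheets), and the page content is invented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ResourceCollector(HTMLParser):
    """Collect absolute URLs of resources referenced by script, img, and link tags."""
    SOURCES = {"script": "src", "img": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr = self.SOURCES.get(tag)
        value = dict(attrs).get(attr) if attr else None
        if value:
            # Relative references are resolved against the page's own URL.
            self.urls.append(urljoin(self.base_url, value))

page = '<link rel="stylesheet" href="/style.css"><script src="app.js"></script><img src="logo.png">'
collector = ResourceCollector("https://www.google.com/")
collector.feed(page)
print(collector.urls)
```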
Cascading Style Sheet Object Model
The browser proceeds to parse CSS stylesheets to create a CSS Object Model (CSSOM). The CSSOM represents the hierarchy of CSS rules and their associated properties, which should be applied to the corresponding HTML elements in the DOM tree.
Render Tree
After the DOM tree and CSSOM are fully constructed, the browser combines them to create a unified representation known as the render tree. This tree represents the final layout of the webpage, including the visual formatting of elements, their positions, and styles. The render tree is then used to paint the content onto the screen.
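A heavily simplified sketch of that combination step: the DOM is a nested tuple, the CSSOM is a dictionary keyed by tag name, and elements styled with `display: none` are left out of the render tree. Real browsers match full CSS selectors and handle inheritance and the cascade, none of which is modeled here.

```python
# Toy DOM: (tag, children). Toy CSSOM: tag -> style properties.
dom = ("body", [("h1", []), ("p", []), ("script", [])])
cssom = {
    "h1": {"color": "blue"},
    "script": {"display": "none"},
}

def build_render_tree(node, cssom):
    """Attach styles to each node and drop nodes that are not displayed."""
    tag, children = node
    style = cssom.get(tag, {})
    if style.get("display") == "none":
        return None   # invisible elements (and their children) are skipped
    rendered = [build_render_tree(child, cssom) for child in children]
    return (tag, style, [r for r in rendered if r is not None])

render_tree = build_render_tree(dom, cssom)
print(render_tree)
```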
Rendering to UI
Finally, the website is rendered to the display using the Graphics Processing Unit (GPU) or Central Processing Unit (CPU). And with that, google.com shows in your browser window and the curtains close.