How a Browser Works: From URL to Pixels (Beginner-Friendly Guide)

What happens after I type a URL and press Enter?

This is the exact question that made me curious about browsers.

Every single day I open Chrome, Edge, or Firefox, type a URL, press Enter, and within seconds a website appears. It feels instant. Almost magical.

But when I stopped and really thought about it, one thing bothered me:

What actually happens in between?

Not just networking. Not just HTML. But the entire journey from:

a URL → raw code → structured data → painted pixels on my screen

To understand this journey, I had to first clear one misconception.

A browser is not just a “website opener”

Most of us think:

Browser = app that opens websites

That description is technically correct… but very incomplete.

A browser is actually a complex software system whose job is to:

communicate with servers
download resources
understand HTML, CSS, and JavaScript
calculate layouts
draw pixels on the screen

All of this happens every single time a page loads.

So instead of thinking:

Browser = app that opens websites

I now think:

Browser = a machine that converts code into visuals

A browser is not one thing - it’s many parts working together

A browser is built from multiple components, and each component has a very specific responsibility.

At a very high level, these are the main parts:

User Interface
Browser Engine
Rendering Engine
Networking
JavaScript Engine
UI Backend

I don’t need to memorize these names. What matters is understanding what role each one plays.

Let’s go step by step.

User Interface (the part I interact with)

This is the visible part of the browser, the part I actually touch.

It includes:

address bar
tabs
back / forward buttons
refresh button
bookmarks

Here’s the important thing to understand:

The UI does NOT render websites.

Its job is only to:

take my input (like typing a URL)
pass it to the browser’s internal system

The moment I press Enter, the UI steps aside.

And now… the real work begins.

Browser Engine vs Rendering Engine (kept simple)

This part confused me a lot initially, so I’ll explain it the way it finally clicked for me.

Browser Engine - the manager

Think of the browser engine as a manager.

Its job is to:

coordinate between the UI and the rendering engine
handle navigation logic
decide when and what should be rendered

It doesn’t draw anything itself. It just makes sure everything happens in the correct order.

Rendering Engine - the artist

The rendering engine is the artist.

Its job is to:

read HTML
understand CSS
build internal structures
calculate layouts
paint pixels on the screen

Popular rendering engines you might hear about:

Blink (Chrome, Edge)
Gecko (Firefox)

I don’t need to dive into their internal code.

The only thing I really need to remember is:

Rendering engine = code → visuals

Networking: how the browser fetches data

Before anything can be rendered, the browser needs data.

This is where networking comes in.

When I enter a URL:

the browser asks DNS for the server’s IP address
it opens a TCP connection to that address (and, for HTTPS, sets up TLS on top of it)
it sends an HTTP request

The server responds with the requested resource. A typical page load fetches several kinds:

HTML
CSS
JavaScript
images

The networking layer does not care about layouts or visuals.

Its only responsibility is very simple:

Give me the files from the server.
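To make the DNS → TCP → HTTP steps concrete, here is a minimal sketch using only Python's standard library. It is an illustration, not how a real browser is built: `build_request` and `fetch` are names I made up, it assumes plain HTTP on port 80 (no HTTPS, redirects, or caching), and `example.com` is just a placeholder host.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def build_request(url: str) -> Tuple[str, int, bytes]:
    """Split a URL and build the raw HTTP/1.1 GET request for it."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port or 80              # assumption: plain HTTP only
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host_header = host if port == 80 else f"{host}:{port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return host, port, request.encode("ascii")


def fetch(url: str) -> bytes:
    """DNS lookup, TCP connection, HTTP request, then read the raw response."""
    host, port, request = build_request(url)
    ip = socket.gethostbyname(host)                      # 1. DNS: name -> IP
    with socket.create_connection((ip, port), timeout=5) as conn:  # 2. TCP
        conn.sendall(request)                            # 3. HTTP request
        chunks = []
        while chunk := conn.recv(4096):                  # read until the server closes
            chunks.append(chunk)
    return b"".join(chunks)


host, port, request = build_request("http://example.com/index.html?x=1")
print(request.decode("ascii"))
```

Calling `fetch("http://example.com/")` would perform the actual network round trip and return the status line, headers, and HTML as raw bytes. At that point they are still just text for the rest of the browser to interpret.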

HTML parsing and DOM creation

Now things start getting interesting.

The browser receives HTML, but HTML is just text.

The browser can’t work with plain text. It needs structure.

That’s why it starts parsing.

What does parsing mean?

Parsing simply means:

breaking something into meaningful pieces and understanding their relationships

From HTML to DOM

While parsing HTML, the browser:

reads tags one by one
understands parent–child relationships
builds a tree structure

This tree is called the DOM (Document Object Model).

Example:

<html> becomes the root
<body> becomes a child
<div>, <p>, <h1> become branches

The DOM is structure only.

No colors. No layout. No visuals.

Just what exists on the page.

CSS parsing and CSSOM creation

HTML gives structure. CSS gives style.

The browser parses CSS separately.

During CSS parsing, the browser:

reads selectors
understands rules
resolves conflicts between rules (using the cascade and specificity)
builds another tree

This tree is called the CSSOM (CSS Object Model).

CSSOM answers questions like:

what color?
what font?
what size?
what position?

So now the browser has:

DOM → what elements exist
CSSOM → how they should look
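Here is a toy version of the CSS parsing idea, just to show text becoming structure. It is a sketch under heavy assumptions: `parse_css` is a hypothetical helper, it only understands simple `selector { property: value; }` blocks, and it resolves conflicts with a bare "the later declaration wins" rule. A real browser builds a much richer CSSOM and applies the full cascade, including specificity.

```python
import re


def parse_css(css):
    """Parse `selector { prop: value; }` blocks into {selector: {prop: value}}."""
    rules = {}
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        declarations = rules.setdefault(selector.strip(), {})
        for declaration in body.split(";"):
            if ":" not in declaration:
                continue  # skip empty or malformed declarations
            prop, value = declaration.split(":", 1)
            # A later declaration for the same property overrides an earlier one.
            declarations[prop.strip()] = value.strip()
    return rules


stylesheet = """
h1 { color: red; font-size: 32px; }
p  { color: gray; }
h1 { color: blue; }
"""
cssom = parse_css(stylesheet)
print(cssom["h1"])  # {'color': 'blue', 'font-size': '32px'}
print(cssom["p"])   # {'color': 'gray'}
```

The second `h1` rule overrides only `color`, so `font-size` from the first rule survives. That is the "resolving conflicts" step in miniature.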

How DOM and CSSOM come together

DOM + CSSOM = Render Tree

The render tree:

contains only the elements that will actually be rendered
includes computed styles
represents what will actually be drawn

Important things to note:

elements with display: none are excluded
pseudo-elements may be included
structure and style are merged

This is the first time the browser knows:

Exactly what needs to be drawn.
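The merge of structure and style can be sketched in a few lines. This assumes a simplified DOM made of nested dictionaries and a style table keyed by tag name only. `build_render_tree` is a made-up name, and real engines also handle selectors, inheritance, and pseudo-elements.

```python
from typing import Optional


def build_render_tree(node: dict, styles: dict) -> Optional[dict]:
    """Merge a DOM node with its styles; drop subtrees with display: none."""
    computed = styles.get(node["tag"], {})
    if computed.get("display") == "none":
        return None  # not rendered, and neither are its descendants
    children = []
    for child in node.get("children", []):
        rendered = build_render_tree(child, styles)
        if rendered is not None:
            children.append(rendered)
    return {"tag": node["tag"], "style": computed, "children": children}


dom = {"tag": "body", "children": [
    {"tag": "h1"},
    {"tag": "script"},
    {"tag": "p"},
]}
styles = {
    "h1": {"color": "blue"},
    "script": {"display": "none"},
    "p": {"color": "gray"},
}
render_tree = build_render_tree(dom, styles)
print([child["tag"] for child in render_tree["children"]])  # ['h1', 'p']
```

The `script` element still exists in the DOM, but it never makes it into the render tree because its style says `display: none`.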

Layout (reflow): calculating positions

Now the browser asks:

Where should this element go?
How wide is it?
How tall is it?

This step is called layout (or reflow).

During layout, the browser:

calculates sizes
determines positions
resolves percentages and units
considers viewport size

Any change in:

window size
font size
content

can trigger reflow.

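One element’s size can push its neighbours, children, and parents around, so a small change may force the browser to recalculate large parts of the page. That’s why layout is considered expensive.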
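A tiny layout pass makes the "resolve percentages against the viewport" idea visible. This is only a sketch: `to_px` and `layout` are hypothetical helpers, every box is treated as a block stacked top to bottom, and heights are given directly in pixels. Real layout involves far more, such as inline text, floats, flexbox, and grid.

```python
def to_px(value, parent_size):
    """Resolve a CSS-like length such as '50%' or '200px' to pixels."""
    if value.endswith("%"):
        return parent_size * float(value[:-1]) / 100
    if value.endswith("px"):
        return float(value[:-2])
    raise ValueError(f"unsupported unit in {value!r}")


def layout(boxes, viewport_width):
    """Stack block boxes top to bottom; return (name, x, y, width, height)."""
    placed = []
    y = 0
    for box in boxes:
        width = to_px(box["width"], viewport_width)
        placed.append((box["name"], 0, y, width, box["height"]))
        y += box["height"]      # the next box starts below this one
    return placed


boxes = [
    {"name": "h1", "width": "100%", "height": 40},
    {"name": "p", "width": "50%", "height": 20},
]
print(layout(boxes, 800))  # [('h1', 0, 0, 800.0, 40), ('p', 0, 40, 400.0, 20)]
print(layout(boxes, 400))  # [('h1', 0, 0, 400.0, 40), ('p', 0, 40, 200.0, 20)]
```

Running `layout` again with a narrower viewport changes every percentage-based width. That second call is a reflow in miniature.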

Painting: turning layout into pixels

After layout comes painting.

Painting means:

drawing text
filling colors
drawing borders
placing images

The browser paints everything into layers.

Still… nothing is on the screen yet.

Display: pixels on the screen

Finally:

layers are composited
pixels are pushed to the screen

And I see the webpage.
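Painting into layers and then compositing them can be imitated with a grid of characters standing in for pixels. Everything here is an assumption for illustration: `new_layer`, `paint_rect`, and `composite` are made-up names, `None` stands for a transparent pixel, and real browsers typically do this work with the help of the GPU.

```python
def new_layer(rows, cols):
    """Create an empty (fully transparent) layer."""
    return [[None] * cols for _ in range(rows)]


def paint_rect(layer, x, y, width, height, color):
    """Fill a rectangle of 'pixels' in a layer (a list of rows)."""
    for row in range(y, y + height):
        for col in range(x, x + width):
            layer[row][col] = color


def composite(layers):
    """Flatten layers bottom to top; None counts as transparent."""
    rows, cols = len(layers[0]), len(layers[0][0])
    screen = [["." for _ in range(cols)] for _ in range(rows)]
    for layer in layers:                    # later layers sit on top
        for r in range(rows):
            for c in range(cols):
                if layer[r][c] is not None:
                    screen[r][c] = layer[r][c]
    return ["".join(row) for row in screen]


background = new_layer(3, 6)
paint_rect(background, 0, 0, 6, 3, "B")   # full-size background
overlay = new_layer(3, 6)
paint_rect(overlay, 2, 1, 3, 1, "T")      # a smaller box on top
for line in composite([background, overlay]):
    print(line)
# BBBBBB
# BBTTTB
# BBBBBB
```

Each layer is painted independently, and only the final compositing step decides which "pixel" wins at each position.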

All of this happens in milliseconds.

That’s why browsers are true engineering masterpieces.

Parsing explained using a simple math example

Before a browser can understand HTML or CSS, it first needs to parse them.

The word parsing sounds scary, so let’s remove the fear.

Consider this expression:

2 + 3 * 4

At first glance, it’s just text.

But to understand what it really means, the browser (or any program) needs structure, not just text.

Step 1: Breaking text into tokens

The parser first breaks the expression into smaller meaningful pieces called tokens:

[2] [+] [3] [*] [4]

Each token has a role:

numbers
operators

This is similar to how a browser breaks HTML into tags and text.
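The token step is easy to try out. This is a minimal sketch, assuming the expression contains only whole numbers and the four basic operators. `tokenize` and `TOKEN_PATTERN` are names I picked for the example.

```python
import re

# Optional whitespace, then either a run of digits or a single operator.
TOKEN_PATTERN = re.compile(r"\s*(\d+|[+\-*/])")


def tokenize(text):
    """Split an arithmetic expression into (kind, value) tokens."""
    tokens = []
    position = 0
    text = text.rstrip()
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise ValueError(f"unexpected character at position {position}")
        value = match.group(1)
        kind = "number" if value.isdigit() else "operator"
        tokens.append((kind, value))
        position = match.end()
    return tokens


print(tokenize("2 + 3 * 4"))
# [('number', '2'), ('operator', '+'), ('number', '3'), ('operator', '*'), ('number', '4')]
```

Notice that the tokenizer knows nothing about priority yet. It only labels the pieces.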

Step 2: Understanding rules and priority

Math has rules.

One important rule is:

multiplication happens before addition

So the parser already knows:

3 * 4 must be handled first

Browsers also follow rules while parsing HTML and CSS.

Step 3: Building a tree

Instead of calculating immediately, the parser builds a tree structure.

For our example, the tree looks like this:

+
├── 2
└── *
    ├── 3
    └── 4

This tree tells the computer:

+ is the main operation
left side is 2
right side is 3 * 4

Only after this structure is created does actual calculation happen.
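One common way to build such a tree while respecting priority is called precedence climbing. The sketch below is one possible approach, not the only one: `parse` and `PRECEDENCE` are hypothetical names, the input is assumed to be already split into tokens, and the tree is represented as nested tuples of the form `(operator, left, right)`.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse(tokens, min_precedence=1):
    """Precedence climbing: consume tokens and return a nested-tuple tree."""
    left = int(tokens.pop(0))                       # an operand: a number
    while tokens and PRECEDENCE[tokens[0]] >= min_precedence:
        operator = tokens.pop(0)
        # Operators that bind tighter than this one are grouped
        # into the right-hand side before we combine.
        right = parse(tokens, PRECEDENCE[operator] + 1)
        left = (operator, left, right)
    return left


tree = parse("2 + 3 * 4".split())
print(tree)  # ('+', 2, ('*', 3, 4))
```

The multiplication ends up deeper in the tree than the addition, which is how "multiplication happens first" gets encoded in structure rather than in the order of the text.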

Step 4: Evaluating using the tree

Now the computer evaluates the tree from bottom to top:

3 * 4 = 12
2 + 12 = 14
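Bottom-to-top evaluation is a short recursive function. This sketch assumes the tree is written as nested tuples of the form `(operator, left, right)`, which is a representation chosen for the example. `evaluate` and `OPERATIONS` are made-up names.

```python
import operator

OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node):
    """Evaluate a nested-tuple expression tree from the leaves upward."""
    if isinstance(node, (int, float)):
        return node                      # a leaf is already a value
    symbol, left, right = node
    # Both children are evaluated first, then combined by this node's operator.
    return OPERATIONS[symbol](evaluate(left), evaluate(right))


tree = ("+", 2, ("*", 3, 4))   # the tree for 2 + 3 * 4
print(evaluate(tree))          # 14
```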

Understanding comes before execution.

How this connects to browsers

Browsers apply the same basic idea to HTML and CSS, with one difference: an HTML parser is far more forgiving of mistakes than a math parser.

HTML parsing → DOM tree

HTML is not used as plain text.
The browser parses it into a DOM tree.

Example:

<body>
  <h1>Hello</h1>
  <p>World</p>
</body>

Becomes:

body
├── h1
│   └── "Hello"
└── p
    └── "World"
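Python's standard library ships an HTML tokenizer (`html.parser.HTMLParser`), so this tree can be reproduced with a small sketch. The `TreeBuilder` class and `show` function are names I made up, the nested dictionaries are a toy stand-in for real DOM nodes, and the code assumes well-formed HTML with every tag closed. A real browser parser also recovers from broken markup.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Build a nested-dict tree from well-formed HTML (a toy stand-in for a DOM)."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self.stack = [self.root]          # elements that are currently open

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)   # attach to the open parent
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()              # this element is now closed

    def handle_data(self, data):
        if data.strip():                  # ignore whitespace between tags
            self.stack[-1]["children"].append(data.strip())


def show(node, depth=0):
    """Return one indented line per node."""
    if isinstance(node, str):
        return ["  " * depth + repr(node)]
    lines = ["  " * depth + node["tag"]]
    for child in node["children"]:
        lines += show(child, depth + 1)
    return lines


builder = TreeBuilder()
builder.feed("<body>\n  <h1>Hello</h1>\n  <p>World</p>\n</body>")
builder.close()
body = builder.root["children"][0]
print("\n".join(show(body)))
# body
#   h1
#     'Hello'
#   p
#     'World'
```

The stack of open elements is what turns a flat stream of tags into parent and child relationships.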

CSS parsing → CSSOM tree

CSS is also parsed and converted into a structured tree called CSSOM, where styles are organized and ready to be applied.

Why parsing matters

The browser doesn’t guess.
It doesn’t randomly render things.

It:

parses text
builds trees
understands relationships
then renders pixels

So when I say:

The browser parses HTML

What I really mean is:

The browser converts raw text into a structured tree it can understand.

Final takeaway

Parsing is simply:

turning raw text into a meaningful tree structure

Once this idea clicks:

DOM stops feeling mysterious
CSSOM makes sense
layout and rendering feel logical

And suddenly, browsers don’t feel magical anymore.

They feel brilliantly engineered.
