
How a Browser Works: A Beginner-Friendly Guide to Browser Internals

Understanding what really happens after you press Enter


I build clean and simple web experiences and learn something new every day.

What happens after I type a URL and press Enter?

This is the exact question that made me curious about browsers.

Every single day I open Chrome, Edge, or Firefox, type a URL, press Enter, and within seconds a website appears. It feels instant. Almost magical.

But when I stopped and really thought about it, one thing bothered me:

What actually happens in between?

Not just networking. Not just HTML. But the entire journey from:

a URL → raw code → structured data → painted pixels on my screen

To understand this journey, I had to first clear one misconception.


A browser is not just a “website opener”

Most of us think:

Browser = app that opens websites

That description is technically correct… but very incomplete.

A browser is actually a complex software system whose job is to:

  • communicate with servers

  • download resources

  • understand HTML, CSS, and JavaScript

  • calculate layouts

  • draw pixels on the screen

All of this happens every single time a page loads.

So instead of thinking:

Browser = app that opens websites

I now think:

Browser = a machine that converts code into visuals


A browser is not one thing - it’s many parts working together

A browser is built from multiple components, and each component has a very specific responsibility.

At a very high level, these are the main parts:

  • User Interface

  • Browser Engine

  • Rendering Engine

  • Networking

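  • JavaScript Engine (runs the page’s JavaScript, for example V8 in Chrome and Edge, SpiderMonkey in Firefox)

  • UI Backend (draws basic widgets like drop-downs and windows using the operating system’s drawing methods)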

I don’t need to memorize these names. What matters is understanding what role each one plays.

Let’s go step by step.


User Interface (the part I interact with)

This is the visible part of the browser, the part I actually touch.

It includes:

  • address bar

  • tabs

  • back / forward buttons

  • refresh button

  • bookmarks

Here’s the important thing to understand:

The UI does NOT render websites.

Its job is only to:

  • take my input (like typing a URL)

  • pass it to the browser’s internal system

The moment I press Enter, the UI steps aside.

And now… the real work begins.


Browser Engine vs Rendering Engine (kept simple)

This part confused me a lot initially, so I’ll explain it the way it finally clicked for me.

Browser Engine - the manager

Think of the browser engine as a manager.

Its job is to:

  • coordinate between the UI and the rendering engine

  • handle navigation logic

  • decide when and what should be rendered

It doesn’t draw anything itself. It just makes sure everything happens in the correct order.

Rendering Engine - the artist

The rendering engine is the artist.

Its job is to:

  • read HTML

  • understand CSS

  • build internal structures

  • calculate layouts

  • paint pixels on the screen

Popular rendering engines you might hear about:

  • Blink (Chrome, Edge)

  • Gecko (Firefox)

  • WebKit (Safari)

I don’t need to dive into their internal code.

The only thing I really need to remember is:

Rendering engine = code → visuals


Networking: how the browser fetches data

Before anything can be rendered, the browser needs data.

This is where networking comes in.

When I enter a URL:

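  • the browser asks DNS (the Domain Name System) for the server’s IP address

  • it opens a TCP connection to that server (and, for HTTPS, sets up an encrypted TLS connection on top of it)

  • it sends an HTTP request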

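The server sends back the HTML first. As the browser reads it, it finds and requests the other files the page needs. Together, these resources include: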

  • HTML

  • CSS

  • JavaScript

  • images

The networking layer does not care about layouts or visuals.

Its only responsibility is very simple:

Give me the files from the server.
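To see these three steps in code, here’s a rough Python sketch using only the standard library. It’s a simplification under my own assumptions: plain HTTP/1.1, no redirects, caching, compression, or HTTP/2 and HTTP/3. The names `build_request` and `fetch` are hypothetical helpers for this example, not anything a real browser exposes.

```python
import socket
import ssl
from urllib.parse import urlsplit


def build_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request (real browsers send many more headers)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close the connection when it is done
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(url: str) -> bytes:
    """Fetch a URL and return the raw HTTP response (status line, headers, body)."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or (443 if parts.scheme == "https" else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    # Step 1: DNS lookup, turning the host name into an IP address.
    ip = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][0]

    # Step 2: open a TCP connection to that IP address.
    conn = socket.create_connection((ip, port), timeout=10)
    if parts.scheme == "https":
        # For HTTPS, a TLS handshake runs on top of the TCP connection.
        context = ssl.create_default_context()
        conn = context.wrap_socket(conn, server_hostname=host)

    # Step 3: send the HTTP request, then read until the server closes the connection.
    with conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while data := conn.recv(4096):
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("https://example.com/")` (this needs network access) should return the raw response: a status line, headers, a blank line, and then the HTML that the rest of the browser works with.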


HTML parsing and DOM creation

Now things start getting interesting.

The browser receives HTML, but HTML is just text.

The browser can’t work with plain text. It needs structure.

That’s why it starts parsing.

What does parsing mean?

Parsing simply means:

breaking something into meaningful pieces and understanding their relationships

From HTML to DOM

While parsing HTML, the browser:

  • reads tags one by one

  • understands parent–child relationships

  • builds a tree structure

This tree is called the DOM (Document Object Model).

Example:

  • <html> becomes the root element

  • <body> becomes a child

  • <div>, <p>, <h1> become branches

DOM is structure only.

No colors. No layout. No visuals.

Just what exists on the page.
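To make this tag-by-tag process concrete, here’s a toy DOM builder in Python. It uses the standard library’s `html.parser.HTMLParser`, which reports start tags, end tags, and text as it reads. The `Node` class, `DomBuilder`, and `dump` are hypothetical names for this sketch; a real browser follows the much more detailed HTML parsing algorithm, including its error-recovery rules.

```python
from html.parser import HTMLParser

# Elements that never have children or an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a simple tree from the tag and text events HTMLParser emits."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root  # the element that new nodes get attached to

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_ELEMENTS:
            self.current = node  # following content becomes this element's children

    def handle_endtag(self, tag):
        # Walk up to the matching open element and close it.
        node = self.current
        while node is not self.root and node.name != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text to keep the tree readable
            self.current.children.append(Node(f'"{text}"', self.current))


def dump(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


builder = DomBuilder()
builder.feed("<body>\n  <h1>Hello</h1>\n  <p>World</p>\n</body>")
builder.close()
print("\n".join(dump(builder.root.children[0])))
```

The output lists `body`, then `h1` and `p` one level deeper, and their text one level deeper still: the same parent–child structure described above.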


CSS parsing and CSSOM creation

HTML gives structure. CSS gives style.

The browser parses CSS separately.

During CSS parsing, the browser:

  • reads selectors

  • understands rules

  • resolves conflicts

  • builds another tree

This tree is called the CSSOM (CSS Object Model).

CSSOM answers questions like:

  • what color?

  • what font?

  • what size?

  • what position?

So now the browser has:

  • DOM → what elements exist

  • CSSOM → how they should look
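A heavily simplified version of this step can be sketched in Python. It assumes simple selectors only (like `h1` or `.title`), ignores comments, `@media`, and specificity, and resolves conflicts with just one rule: when the same property is set twice for the same selector, the later declaration wins. `parse_css` is a hypothetical helper, and the result is a plain dictionary rather than the browser’s real CSSOM objects.

```python
import re

# Matches "selector { declarations }" blocks (no nesting, no comments).
RULE_PATTERN = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_css(css: str) -> dict[str, dict[str, str]]:
    """Map each selector to its final property values."""
    rules: dict[str, dict[str, str]] = {}
    for match in RULE_PATTERN.finditer(css):
        declarations = rules.setdefault(match.group(1).strip(), {})
        for declaration in match.group(2).split(";"):
            if ":" not in declaration:
                continue  # skip empty pieces, such as the one after the last ";"
            prop, value = declaration.split(":", 1)
            # A later declaration for the same property overwrites the earlier one.
            declarations[prop.strip()] = value.strip()
    return rules


stylesheet = """
h1 { color: red; font-size: 32px; }
p  { color: gray; }
h1 { color: blue; }
"""
print(parse_css(stylesheet))
# {'h1': {'color': 'blue', 'font-size': '32px'}, 'p': {'color': 'gray'}}
```

Real cascading also weighs selector specificity, `!important`, and where each stylesheet came from; "later wins" is only the tie-breaker.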


How DOM and CSSOM come together

DOM + CSSOM = Render Tree

The render tree:

  • contains only elements that will actually be displayed (non-visual tags like <head> and <script> are left out)

  • includes computed styles

  • represents what will actually be drawn

Important things to note:

  • elements with display: none are excluded (elements with visibility: hidden stay in, because they still take up space)

  • pseudo-elements may be included

  • structure and style are merged

This is the first time the browser knows:

Exactly what needs to be drawn.
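Here’s a minimal Python sketch of that merge. It assumes the CSSOM has already been reduced to a dictionary of styles per tag name, treats only a few properties as inherited, and uses a made-up list of non-visual tags. `DomNode`, `RenderNode`, and `build_render_tree` are illustrative names, not browser APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Tags that never produce anything visible (a simplified, assumed list).
NON_VISUAL_TAGS = {"head", "title", "meta", "link", "script", "style"}
# A few properties that children inherit from their parent (simplified).
INHERITED_PROPERTIES = {"color", "font-family", "font-size"}


@dataclass
class DomNode:
    tag: str
    children: list[DomNode] = field(default_factory=list)


@dataclass
class RenderNode:
    tag: str
    style: dict[str, str]
    children: list[RenderNode] = field(default_factory=list)


def build_render_tree(node: DomNode, cssom: dict[str, dict[str, str]],
                      parent_style: dict[str, str] | None = None) -> RenderNode | None:
    """Merge DOM structure with styles, skipping nodes that will not be drawn."""
    if node.tag in NON_VISUAL_TAGS:
        return None
    parent_style = parent_style or {}
    # Start from inherited values, then apply this element's own rules (its computed style).
    style = {k: v for k, v in parent_style.items() if k in INHERITED_PROPERTIES}
    style.update(cssom.get(node.tag, {}))
    if style.get("display") == "none":
        return None  # the element and its whole subtree are left out
    # "visibility: hidden" is NOT skipped here: such elements still take up space.
    render_node = RenderNode(node.tag, style)
    for child in node.children:
        child_render = build_render_tree(child, cssom, style)
        if child_render is not None:
            render_node.children.append(child_render)
    return render_node


dom = DomNode("html", [
    DomNode("head", [DomNode("title")]),
    DomNode("body", [DomNode("h1"), DomNode("p"), DomNode("div")]),
])
cssom = {"body": {"color": "black"}, "h1": {"font-size": "32px"}, "div": {"display": "none"}}
tree = build_render_tree(dom, cssom)
body = tree.children[0]
print([child.tag for child in body.children])  # ['h1', 'p']
print(body.children[0].style)                  # {'color': 'black', 'font-size': '32px'}
```

The `<head>` and the hidden `<div>` disappear, and the `<h1>` ends up with both its own font size and the color inherited from `<body>`.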


Layout (reflow): calculating positions

Now the browser asks:

  • Where should this element go?

  • How wide is it?

  • How tall is it?

This step is called layout (or reflow).

During layout, the browser:

  • calculates sizes

  • determines positions

  • resolves percentages and units

  • considers viewport size

Any change in:

  • window size

  • font size

  • content

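can trigger reflow. And because changing one element’s size or position can push other elements around, a single change may force the browser to recalculate large parts of the page.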

That’s why layout is considered expensive.
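The core idea can be sketched in Python for the simplest possible case: block boxes stacked top to bottom, each with a fixed height and a width given in `px` or `%`. Real layout also handles inline text, floats, flexbox, grid, margins, and much more. `Box`, `resolve_length`, and `layout` are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    name: str
    width: str          # specified width, e.g. "100%" or "200px"
    height: float       # fixed height in pixels (a simplification)
    x: float = 0.0      # the fields below are filled in by layout()
    y: float = 0.0
    used_width: float = 0.0


def resolve_length(value: str, container: float) -> float:
    """Turn a CSS length into pixels; percentages are relative to the container."""
    if value.endswith("%"):
        return container * float(value[:-1]) / 100
    if value.endswith("px"):
        return float(value[:-2])
    raise ValueError(f"unsupported length: {value}")


def layout(boxes: list[Box], viewport_width: float) -> float:
    """Place block boxes one below another and return the total page height."""
    y = 0.0
    for box in boxes:
        box.used_width = resolve_length(box.width, viewport_width)
        box.x = 0.0
        box.y = y        # each block starts where the previous one ended
        y += box.height
    return y


page = [Box("header", "100%", 80), Box("main", "50%", 400), Box("footer", "300px", 60)]
print(layout(page, viewport_width=1000))  # 540.0
print(page[1].used_width, page[1].y)      # 500.0 80.0

# Resizing the window changes the container width, so layout has to run again (reflow).
layout(page, viewport_width=600)
print(page[1].used_width)                 # 300.0
```

Even in this toy version, changing the header’s height would move every box after it, which is a small taste of why reflow can get costly.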


Painting: turning layout into pixels

After layout comes painting.

Painting means:

  • drawing text

  • filling colors

  • drawing borders

  • placing images

The browser paints everything into layers.

Still… nothing is on the screen yet.
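One way to picture painting is as turning laid-out boxes into an ordered list of drawing commands (often called a display list), which is executed later to produce pixels. This Python sketch is an assumed simplification: the command names, the `PaintBox` type, and the fixed order (background, then border, then text) are illustrative, not any engine’s real API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaintBox:
    x: float
    y: float
    width: float
    height: float
    background: Optional[str] = None
    border: Optional[str] = None
    text: Optional[str] = None


def paint(boxes: list[PaintBox]) -> list[tuple]:
    """Turn laid-out boxes into drawing commands, back to front."""
    commands = []
    for box in boxes:  # earlier boxes are drawn first, so later ones appear on top
        rect = (box.x, box.y, box.width, box.height)
        if box.background:
            commands.append(("fill_rect", rect, box.background))
        if box.border:
            commands.append(("stroke_rect", rect, box.border))
        if box.text:
            commands.append(("draw_text", (box.x, box.y), box.text))
    return commands


boxes = [
    PaintBox(0, 0, 800, 80, background="navy", text="Hello"),
    PaintBox(0, 80, 400, 200, border="gray"),
]
for command in paint(boxes):
    print(command)
```

Keeping the commands as a list, instead of drawing immediately, is what lets the drawing happen later and separately for each layer.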


Display: pixels on the screen

Finally:

  • layers are composited

  • pixels are pushed to the screen

And I see the webpage.

All of this happens in milliseconds.

That’s why browsers are true engineering masterpieces.
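Compositing can be pictured as stacking the painted layers in order and letting upper layers cover lower ones wherever they are not transparent. In this toy Python sketch each “pixel” is a single character and `None` means transparent; the `composite` function and the `(z_index, layer)` format are assumptions for illustration. Real compositors typically do this on the GPU, with real colors and alpha blending.

```python
from typing import Optional

Layer = list[list[Optional[str]]]  # rows of pixels; None means transparent


def composite(layers: list[tuple[int, Layer]], width: int, height: int,
              background: str = ".") -> list[str]:
    """Combine (z_index, layer) pairs into the final screen, lowest z_index first."""
    screen = [[background] * width for _ in range(height)]
    for _, layer in sorted(layers, key=lambda pair: pair[0]):
        for y in range(height):
            for x in range(width):
                pixel = layer[y][x]
                if pixel is not None:  # opaque pixels cover whatever is below
                    screen[y][x] = pixel
    return ["".join(row) for row in screen]


page_layer = [["A", "A", "A"], ["A", "A", "A"]]
popup_layer = [[None, "B", None], [None, None, None]]
# The popup has the higher z-index, so it ends up on top even though it is listed first.
for row in composite([(10, popup_layer), (0, page_layer)], width=3, height=2):
    print(row)
```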


Parsing explained using a simple math example

Before a browser can understand HTML or CSS, it first needs to parse them.

The word parsing sounds scary, so let’s remove the fear.

Consider this expression:

2 + 3 * 4

At first glance, it’s just text.

But to understand what it really means, the browser (or any program) needs structure, not just text.

Step 1: Breaking text into tokens

The parser first breaks the expression into smaller meaningful pieces called tokens:

[2] [+] [3] [*] [4]

Each token has a role:

  • numbers

  • operators

This is similar to how a browser breaks HTML into tags and text.

Step 2: Understanding rules and priority

Math has rules.

One important rule is:

multiplication happens before addition

So the parser already knows:

3 * 4 must be handled first

Browsers also follow rules while parsing HTML and CSS.

Step 3: Building a tree

Instead of calculating immediately, the parser builds a tree structure. For our example, the tree looks like this:

        +
       / \
      2   *
         / \
        3   4

This tree tells the computer:

  • + is the main operation

  • left side is 2

  • right side is 3 * 4

Only after this structure is created does actual calculation happen.

Step 4: Evaluating using the tree

Now the computer evaluates the tree from bottom to top:

3 * 4 = 12
2 + 12 = 14

Understanding comes before execution
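The four steps can be written as a small Python program: a tokenizer, a recursive-descent parser that builds the tree (giving `*` and `/` higher priority than `+` and `-`), and an evaluator that walks the tree from the bottom up. It’s a sketch that assumes only non-negative integers, the four basic operators, and parentheses; the function and class names are made up for this example.

```python
import operator
import re
from dataclasses import dataclass
from typing import Union

# A number or one of the operator/parenthesis characters, with optional leading spaces.
TOKEN_PATTERN = re.compile(r"\s*(\d+|[-+*/()])")
OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[int, BinOp]  # a tree node is either a number (leaf) or an operation


def tokenize(text: str) -> list[str]:
    """Step 1: break the text into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens: list[str]) -> Expr:
    """Steps 2 and 3: build a tree where * and / bind tighter than + and -."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expression():  # lowest priority: + and -
        node = term()
        while peek() in ("+", "-"):
            op = take()
            node = BinOp(op, node, term())
        return node

    def term():  # higher priority: * and /
        node = factor()
        while peek() in ("*", "/"):
            op = take()
            node = BinOp(op, node, factor())
        return node

    def factor():  # a single number or a parenthesised sub-expression
        token = take()
        if token == "(":
            node = expression()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        return int(token)

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


def evaluate(node: Expr) -> float:
    """Step 4: evaluate both children first, then apply the operator (bottom to top)."""
    if isinstance(node, int):
        return node
    return OPERATIONS[node.op](evaluate(node.left), evaluate(node.right))


tree = parse(tokenize("2 + 3 * 4"))
print(tree)            # BinOp(op='+', left=2, right=BinOp(op='*', left=3, right=4))
print(evaluate(tree))  # 14
```

The printed tree has exactly the shape drawn above: `+` at the top, `2` on the left, and `3 * 4` on the right.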


How this connects to browsers

This is exactly how browsers handle HTML and CSS.

HTML parsing → DOM tree

HTML is not used as plain text.
The browser parses it into a DOM tree.

Example:

<body>
  <h1>Hello</h1>
  <p>World</p>
</body>

Becomes:

body
├── h1
│   └── "Hello"
└── p
    └── "World"

CSS parsing → CSSOM tree

CSS is also parsed and converted into a structured tree called CSSOM, where styles are organized and ready to be applied.


Why parsing matters

The browser doesn’t guess.
It doesn’t randomly render things, and even broken HTML is handled by well-defined rules.

It:

  • parses text

  • builds trees

  • understands relationships

  • then renders pixels

So when I say:

The browser parses HTML

What I really mean is:

The browser converts raw text into a structured tree it can understand.


Final takeaway

Parsing is simply:

turning raw text into a meaningful tree structure

Once this idea clicks:

  • DOM stops feeling mysterious

  • CSSOM makes sense

  • layout and rendering feel logical

And suddenly, browsers don’t feel magical anymore.

They feel brilliantly engineered.

codeXninjaDev
