How a Browser Works: A Beginner-Friendly Guide to Browser Internals
Understanding what really happens after you press Enter

What happens after I type a URL and press Enter?
This is the exact question that made me curious about browsers.
Every single day I open Chrome, Edge, or Firefox, type a URL, and press Enter, and within seconds a website appears. It feels instant. Almost magical.
But when I stopped and really thought about it, one thing bothered me:
What actually happens in between?
Not just networking. Not just HTML. But the entire journey from:
a URL → raw code → structured data → painted pixels on my screen
To understand this journey, I had to first clear one misconception.
A browser is not just a “website opener”
Most of us think:
Browser = app that opens websites
That description is technically correct… but very incomplete.
A browser is actually a complex software system whose job is to:
communicate with servers
download resources
understand HTML, CSS, and JavaScript
calculate layouts
draw pixels on the screen
All of this happens every single time a page loads.
So instead of thinking:
Browser = app that opens websites
I now think:
Browser = a machine that converts code into visuals
A browser is not one thing - it’s many parts working together
A browser is built from multiple components, and each component has a very specific responsibility.
At a very high level, these are the main parts:
User Interface
Browser Engine
Rendering Engine
Networking
JavaScript Engine
UI Backend

I don’t need to memorize these names. What matters is understanding what role each one plays.
Let’s go step by step.
User Interface (the part I interact with)
This is the visible part of the browser, the part I actually touch.
It includes:
address bar
tabs
back / forward buttons
refresh button
bookmarks
Here’s the important thing to understand:
The UI does NOT render websites.
Its job is only to:
take my input (like typing a URL)
pass it to the browser’s internal system
The moment I press Enter, the UI steps aside.
And now… the real work begins.
Browser Engine vs Rendering Engine (kept simple)
This part confused me a lot initially, so I’ll explain it the way it finally clicked for me.
Browser Engine - the manager
Think of the browser engine as a manager.
Its job is to:
coordinate between the UI and the rendering engine
handle navigation logic
decide when and what should be rendered
It doesn’t draw anything itself. It just makes sure everything happens in the correct order.
Rendering Engine - the artist
The rendering engine is the artist.
Its job is to:
read HTML
understand CSS
build internal structures
calculate layouts
paint pixels on the screen
Popular rendering engines you might hear about:
Blink (Chrome, Edge)
Gecko (Firefox)
I don’t need to dive into their internal code.
The only thing I really need to remember is:
Rendering engine = code → visuals
Networking: how the browser fetches data
Before anything can be rendered, the browser needs data.
This is where networking comes in.
When I enter a URL:
the browser talks to DNS to find the server’s IP
it opens a TCP connection (and, for https:// URLs, sets up TLS encryption on top of it)
it sends an HTTP request
The server responds with resources like:
HTML
CSS
JavaScript
images
The networking layer does not care about layouts or visuals.
Its only responsibility is very simple:
Give me the files from the server.
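To make these three steps concrete for myself, I wrote a small Python sketch. It is only an illustration, not how a real browser's network stack is built: it assumes plain HTTP on port 80 (no TLS, no caching, no redirects), and the names build_request and fetch are my own. The URL at the bottom is just an example.

```python
import socket
from urllib.parse import urlsplit


def build_request(url: str) -> tuple[str, int, bytes]:
    """Split a URL and build the raw HTTP/1.1 request text for it."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # plain HTTP only in this sketch
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return host, port, request


def fetch(url: str) -> bytes:
    """DNS lookup, then TCP connection, then HTTP request; returns raw bytes."""
    host, port, request = build_request(url)
    # 1. DNS: resolve the host name to an IP address
    ip = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][0]
    # 2. TCP: open a connection to that address
    with socket.create_connection((ip, port), timeout=5) as conn:
        # 3. HTTP: send the request and read until the server closes
        conn.sendall(request)
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


# Building the request needs no network access, so it is safe to run anywhere.
host, port, request = build_request("http://example.com/index.html")
print(host, port)
print(request.decode("ascii"))
```

Calling fetch("http://example.com/") would perform the actual lookup and connection. Notice that nothing in this code knows about layout or pixels: it only moves bytes.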
HTML parsing and DOM creation

Now things start getting interesting.
The browser receives HTML, but HTML is just text.
The browser can’t work with plain text. It needs structure.
That’s why it starts parsing.
What does parsing mean?
Parsing simply means:
breaking something into meaningful pieces and understanding their relationships
From HTML to DOM
While parsing HTML, the browser:
reads tags one by one
understands parent–child relationships
builds a tree structure
This tree is called the DOM (Document Object Model).
Example:
<html> becomes the root
<body> becomes a child
<div>, <p>, <h1> become branches
DOM is structure only.
No colors. No layout. No visuals.
Just what exists on the page.
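Here is a rough Python sketch of that idea, using the standard library's html.parser to turn tags into a tree. The Node and TreeBuilder classes and the outline helper are names I made up for illustration. A real HTML parser also handles text nodes, attributes, void elements like <br>, and broken markup, all of which this toy version ignores.

```python
from html.parser import HTMLParser


class Node:
    """One element in a toy DOM: a tag name plus child nodes."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)
        self.current.children.append(node)
        self.current = node                      # descend into the new element

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent   # climb back to the parent


def outline(node, depth=0):
    """Return the tree as indented lines, one per element."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed("<html><body><div><h1>Title</h1><p>Text</p></div></body></html>")
print("\n".join(outline(builder.root)))
```

The printed outline shows html at the top, body inside it, and div, h1, and p nested below: structure only, with no styling information anywhere.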
CSS parsing and CSSOM creation

HTML gives structure. CSS gives style.
The browser parses CSS separately.
During CSS parsing, the browser:
reads selectors
understands rules
resolves conflicts
builds another tree
This tree is called the CSSOM (CSS Object Model).
CSSOM answers questions like:
what color?
what font?
what size?
what position?
So now the browser has:
DOM → what elements exist
CSSOM → how they should look
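A tiny Python sketch helped me picture what "parsing CSS" means. It assumes very simple, flat CSS (no comments, no nesting, no @rules), and parse_css is a hypothetical helper of mine. It also resolves conflicts only by "the later rule wins", while real browsers follow the full cascade, including specificity.

```python
import re


def parse_css(text: str) -> dict[str, dict[str, str]]:
    """Parse flat 'selector { prop: value; }' rules into nested dicts."""
    rules: dict[str, dict[str, str]] = {}
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", text):
        declarations = rules.setdefault(selector.strip(), {})
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                # A later declaration for the same property replaces an earlier one
                declarations[prop.strip()] = value.strip()
    return rules


css = """
h1 { color: red; font-size: 32px; }
p  { color: gray; }
h1 { color: blue; }
"""
print(parse_css(css))
```

The result maps each selector to its final properties, which is a much simpler stand-in for the structured style data a real CSSOM holds.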
How DOM and CSSOM come together

DOM + CSSOM = Render Tree
The render tree:
contains only visible elements
includes computed styles
represents what will actually be drawn
Important things to note:
elements with display: none are excluded
pseudo-elements may be included
structure and style are merged
This is the first time the browser knows:
Exactly what needs to be drawn.
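As a minimal sketch of this merge, assume the DOM is a nested dictionary and styles are looked up by tag name only. Both are simplifications of mine, and build_render_tree is a hypothetical name.

```python
def build_render_tree(node, styles):
    """Return a copy of the DOM node with styles attached, or None if hidden."""
    style = styles.get(node["tag"], {})
    if style.get("display") == "none":
        return None                  # hidden elements never reach the render tree
    children = [build_render_tree(child, styles) for child in node.get("children", [])]
    return {
        "tag": node["tag"],
        "style": style,
        "children": [child for child in children if child is not None],
    }


dom = {"tag": "body", "children": [
    {"tag": "h1"},
    {"tag": "script"},
    {"tag": "p"},
]}
styles = {
    "h1": {"color": "blue"},
    "script": {"display": "none"},
    "p": {"color": "gray"},
}
tree = build_render_tree(dom, styles)
print([child["tag"] for child in tree["children"]])
```

The script element exists in the DOM but is missing from the render tree, because its style says it should not be displayed.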
Layout (reflow): calculating positions

Now the browser asks:
Where should this element go?
How wide is it?
How tall is it?
This step is called layout (or reflow).
During layout, the browser:
calculates sizes
determines positions
resolves percentages and units
considers viewport size
Any change in:
window size
font size
content
can trigger reflow.
Because a change to one element can force the browser to recalculate the size and position of many others, layout is considered expensive.
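To see why a viewport change matters, here is a toy layout pass in Python. It assumes every element is a block stacked vertically, with a fixed height and a width given either in pixels or as a percentage of the viewport. The layout function and the box fields are my own invention; real layout engines deal with far more cases.

```python
def layout(blocks, viewport_width):
    """Stack block boxes vertically and resolve their widths to pixels."""
    boxes = []
    y = 0
    for block in blocks:
        width = block["width"]
        if isinstance(width, str) and width.endswith("%"):
            # Percentages are resolved against the viewport in this sketch
            width = viewport_width * float(width[:-1]) / 100
        boxes.append({"name": block["name"], "x": 0, "y": y,
                      "width": width, "height": block["height"]})
        y += block["height"]         # the next block starts below this one
    return boxes


blocks = [
    {"name": "h1", "width": "100%", "height": 40},
    {"name": "p", "width": "50%", "height": 20},
]
for viewport in (800, 400):
    print(viewport, layout(blocks, viewport))
```

Running the same blocks through two viewport widths gives two different sets of boxes. That recalculation is the reflow.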
Painting: turning layout into pixels
After layout comes painting.
Painting means:
drawing text
filling colors
drawing borders
placing images
The browser paints everything into layers.
Still… nothing is on the screen yet.
Display: pixels on the screen
Finally:
layers are composited
pixels are pushed to the screen
And I see the webpage.
All of this happens in milliseconds.
That’s why browsers are true engineering masterpieces.
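Compositing is easier to picture with a toy example. In the Python sketch below, a "layer" is just a list of filled rectangles and the "screen" is a grid of characters. This is my own simplification and says nothing about how real compositors (often GPU-based) are implemented.

```python
def composite(layers, width, height):
    """Flatten layers (listed bottom to top) into one grid of characters."""
    screen = [[" "] * width for _ in range(height)]
    for layer in layers:
        for x, y, w, h, fill in layer:
            for row in range(y, min(y + h, height)):
                for col in range(x, min(x + w, width)):
                    screen[row][col] = fill   # upper layers overwrite lower ones
    return ["".join(row) for row in screen]


background = [(0, 0, 8, 3, ".")]   # one rectangle covering the whole screen
content = [(1, 1, 4, 1, "#")]      # a smaller rectangle drawn on top
for line in composite([background, content], width=8, height=3):
    print(line)
```

Each layer is painted on its own, and only the final merge decides which "pixel" ends up visible.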
Parsing explained using a simple math example
Before a browser can understand HTML or CSS, it first needs to parse them.
The word parsing sounds scary, so let’s remove the fear.
Consider this expression:
2 + 3 * 4
At first glance, it’s just text.
But to understand what it really means, the browser (or any program) needs structure, not just text.
Step 1: Breaking text into tokens
The parser first breaks the expression into smaller meaningful pieces called tokens:
[2] [+] [3] [*] [4]
Each token has a role:
numbers
operators
This is similar to how a browser breaks HTML into tags and text.
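A minimal tokenizer for this expression could look like the Python sketch below. The token names NUMBER and OPERATOR and the tokenize function are my own choices, and it only understands whole numbers and the four basic operators.

```python
import re

# Optional whitespace, then either a run of digits or a single operator
TOKEN = re.compile(r"\s*(?:(\d+)|([+\-*/]))")


def tokenize(text):
    """Break an arithmetic expression into (kind, value) tokens."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character near position {pos}")
        number, operator = match.groups()
        tokens.append(("NUMBER", int(number)) if number else ("OPERATOR", operator))
        pos = match.end()
    return tokens


print(tokenize("2 + 3 * 4"))
```

At this stage there is still no meaning attached: the tokenizer knows that * is an operator, but not that it should run before +.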
Step 2: Understanding rules and priority
Math has rules.
One important rule is:
multiplication happens before addition
So the parser already knows:
3 * 4 must be handled first
Browsers also follow rules while parsing HTML and CSS.
Step 3: Building a tree

Instead of calculating immediately, the parser builds a tree structure.
For our example, the tree looks like this:
    +
   / \
  2   *
     / \
    3   4
This tree tells the computer:
+ is the main operation
left side is 2
right side is 3 * 4
Only after this structure is created does actual calculation happen.
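Here is one way such a tree could be built, sketched in Python. It assumes the tokens are already separated, supports only + and *, and represents each tree node as a tuple of (operator, left, right). The parse function is a hypothetical helper, and the technique (one function per priority level) is a common textbook approach rather than anything specific to browsers.

```python
def parse(tokens):
    """Build a nested-tuple tree from tokens such as [2, '+', 3, '*', 4]."""
    pos = 0

    def term():
        # A term is numbers joined by '*', so it binds tighter than '+'
        nonlocal pos
        node = tokens[pos]
        pos += 1
        while pos < len(tokens) and tokens[pos] == "*":
            right = tokens[pos + 1]
            pos += 2
            node = ("*", node, right)
        return node

    def expression():
        # An expression is terms joined by '+'
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, term())
        return node

    return expression()


print(parse([2, "+", 3, "*", 4]))
```

Because expression() always asks term() for its operands, every multiplication is grouped before any addition sees it. That is how the priority rule ends up encoded in the shape of the tree.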
Step 4: Evaluating using the tree
Now the computer evaluates the tree from bottom to top:
3 * 4 = 12
2 + 12 = 14
Understanding comes before execution
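The bottom-up evaluation can be sketched in a few lines of Python, assuming the tree is stored as nested tuples of (operator, left, right). The evaluate function and the OPERATIONS table are illustrative names of mine.

```python
import operator

OPERATIONS = {"+": operator.add, "*": operator.mul}


def evaluate(node):
    """Evaluate a nested-tuple expression tree from the leaves upward."""
    if isinstance(node, tuple):
        symbol, left, right = node
        # Children are evaluated first, so the deepest operation runs first
        return OPERATIONS[symbol](evaluate(left), evaluate(right))
    return node                      # a bare number is already a value


tree = ("+", 2, ("*", 3, 4))
print(evaluate(tree))  # 14
```

The evaluator never looks at the original text. Everything it needs is already in the tree.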
How this connects to browsers
This is exactly how browsers handle HTML and CSS.
HTML parsing → DOM tree
HTML is not used as plain text.
The browser parses it into a DOM tree.
Example:
<body>
<h1>Hello</h1>
<p>World</p>
</body>
Becomes:
body
├── h1
│   └── "Hello"
└── p
    └── "World"
CSS parsing → CSSOM tree
CSS is also parsed and converted into a structured tree called CSSOM, where styles are organized and ready to be applied.
Why parsing matters
The browser doesn’t guess.
It doesn’t randomly render things.
It:
parses text
builds trees
understands relationships
then renders pixels
So when I say:
The browser parses HTML
What I really mean is:
The browser converts raw text into a structured tree it can understand.

Final takeaway
Parsing is simply:
turning raw text into a meaningful tree structure
Once this idea clicks:
DOM stops feeling mysterious
CSSOM makes sense
layout and rendering feel logical
And suddenly, browsers don’t feel magical anymore.
They feel brilliantly engineered.

