How the web works

In order to surf the World Wide Web, you need an application called a web browser. You're probably familiar with this, as you might already be using one, such as Google's Chrome, Microsoft's Internet Explorer, Mozilla's Firefox, or Apple's Safari. To fully appreciate how the web works, and to be prepared for web development, we'll briefly talk about URLs, domain names, IP addresses, the Domain Name System, and HTTP.

The World Wide Web

A domain name is the human-friendly part of a URL, and it stands in for a numeric IP address. Domain name servers connect domain names with IP addresses so you don't have to. Clients, such as the application called a web browser, communicate with servers using HTTP.

The big picture

What actually happens between typing in google.com and seeing Google's website? In short, a lot of complex things happen, and I'll describe the process briefly. Between each browser request and server response are one or more pieces of software, most of them speaking to each other using a common protocol called HTTP, or the Hypertext Transfer Protocol.

Let's start with an overview of how the Web works, and then explore each step individually. Every website you've visited started with a client sending a request to a server. A client is anything that can request a resource on the web and these include desktop computers, laptops, phones, and tablets. But even software programs themselves can be clients.

Be it a human user or a software program, the entity making the request is called a client. When you type a URL into the address bar, you are sending a request to a computer server, in particular, a web server.

The Internet is a network of computers all connected together by various cables. It's made up of wires, routers, switches and satellites that connect the network of servers together. These servers can perform a lot of tasks. Web servers host web sites, domain name servers connect domain names with IP addresses, and mail servers send and accept email messages.

All of these things are considered part of the Internet. The files, folders and media that make up webpages are housed on the servers. These files, folders and media are all software and are what we're talking about when we say the web. The web is the software that makes up websites, applications, games, wikis and videos that you can access on a web browser.

As I've already mentioned, there are a lot of different web browsers out there. There are also many other types of clients, such as mobile phones, apps, and games. In order for everyone to play nicely together, they have to speak the same language, and that language is called HTTP, or Hypertext Transfer Protocol.

A visit to a website often starts with you typing in a domain name like google.com. Servers, however, use numbers to locate each other, not words. These numbers are called IP addresses and work similarly to phone numbers, with each IP address pointing to a particular server on the Internet.

The service in the middle, between the domain name you type in and the IP address used by the server, is called the Domain Name System, or DNS. Once DNS has found the IP address that matches the domain name, it hands that address back to your browser, which then sends your request to the server hosting the website.

For now, we can think of this request as a stamped, self-addressed envelope sent to the website's IP address with a return address of your computer's IP address. When the envelope is delivered to the server hosting the website, the server sends the envelope back to you with the website you wanted. When this happens, the HTTP trip is over and you have the thing you asked for, which is the web page of the website that you requested. In our example, it is the Google home page but this could be anything such as an image, text, application or video.
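The round trip described above can be tried out on a single machine. The following is a minimal sketch, not how any real website is built: it assumes a tiny server running on your own computer (address 127.0.0.1) in place of a remote one, and the page content and the names HelloHandler and round_trip are made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<h1>Hello from the server</h1>"  # hypothetical page content


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the same small page."""

    def do_GET(self):
        self.send_response(200)  # status code: everything is OK
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)  # the envelope coming back to the client

    def log_message(self, format, *args):
        pass  # keep the example quiet


def round_trip():
    """Start a local server, send it one request, and return the reply."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")       # the client sends the envelope
        response = conn.getresponse()  # the server sends it back
        body = response.read()
        conn.close()
        return response.status, body
    finally:
        server.shutdown()
        server.server_close()


status, body = round_trip()
print(status, body.decode())
```

Here the same program plays both roles, which shows that a client does not have to be a browser: any software that sends a request and reads the response is a client.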

That took a long time to explain! But this happens so quickly that you don't even realize that the website you just visited has its server located on the other side of the planet! When you request a web page, it usually takes less than a couple of seconds and it's getting faster all the time. Now that we're somewhat familiar with the process of what happens when we surf the web, let's break down each step in more detail.

URLs

URLs, domain names, IP addresses and DNS are all quite closely tied to one another. The web consists of so many interconnections, and we will discuss each of the main parts in more detail. The term URL stands for Uniform Resource Locator.

Back in the late 1980s, Sir Tim Berners-Lee invented the concept of the World Wide Web and, in doing so, he standardized many of the mechanisms needed to make computers talk to each other. The URL is one of those standards, and it is the way that the web is interconnected. Whenever you want to access something on the web, be it on your mobile device or desktop, you will need to provide a URL for that resource. This resource could be text, audio, video, etc.

For instance, you might want to watch some videos on YouTube and, to get there, you will type http://www.youtube.com in your browser's address bar. This is a URL. It shows the method that your browser will use to retrieve the resource, which is HTTP.

It also shows the network location, in human friendly form, of the resource that you wish to access. One of the most wonderful parts of the web is how everything is connected. Web pages are connected using what's called hyperlinks, and behind every hyperlink is a URL. Hyperlinks often look like the blue, underlined words on the page, but they can also look like a lot of things these days, such as images or buttons.

If the URL starts with http, the next bit in the URL, after the colon and two slashes, is the domain name, as in http://www.youtube.com. Everything after the domain name describes the full path to the resource. However, the URLs you see are becoming shorter. Nowadays, nearly everything is omitted in the address bar, including the http://, and you are left with what is often called a ‘naked’ or ‘clean’ URL.
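To make the parts of a URL concrete, here is a short sketch that uses Python's standard urllib.parse module to split one up. The path and query in the example URL are illustrative assumptions, added only so that there is something after the domain name to look at.

```python
from urllib.parse import urlsplit

# Example URL: the path and query are illustrative, not taken from the text.
url = "http://www.youtube.com/results?search_query=cats"

parts = urlsplit(url)
print(parts.scheme)    # the method used to retrieve the resource
print(parts.hostname)  # the domain name, after the colon and two slashes
print(parts.path)      # the path to the resource on that server
print(parts.query)     # extra details passed along with the request
```

Even when the address bar hides the http:// part, the browser still works with all of these pieces behind the scenes.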

However, just because you can't see these details, it doesn't mean that they're not there. Modern browsers have built in features that hide these details from you, but in actuality, they are needed to access any resource on the web. The one detail that's always left is the domain name, like youtube.com. While this domain name will help users find resources on the web, computers don't use names to find things. They use numbers, and these numbers are called IP addresses.

IP Addresses

Since the Internet is composed of a global network of computers, there needs to be a way to identify each computer and device connected to the Internet. This is where the Internet Protocol address (or IP address) comes into play. Everything connected to the Internet has an IP address.

This includes computers, servers, cell phones and any other equipment that is connected to the Internet. All of these devices have a unique IP address that identifies them and enables them to connect to the Internet.

Think of an IP address like your home address: if someone wants to send you a letter, they need to know your full address to post it to you. Likewise, all devices that communicate with each other on the Internet use an IP address to send and receive data. In its most familiar form, known as IPv4, an IP address is made up of four numbers separated by dots. Each of these numbers must be between 0 and 255, for example, 77.121.147.234.
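The "four numbers between 0 and 255" rule is easy to check in code. This is a small sketch using Python's standard ipaddress module; the helper name is_valid_ipv4 is made up for this example, and it covers only the four-number (IPv4) form described here.

```python
import ipaddress


def is_valid_ipv4(text):
    """Return True if text is four dot-separated numbers, each 0 to 255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        return False


print(is_valid_ipv4("77.121.147.234"))  # four numbers, all in range
print(is_valid_ipv4("77.121.147.300"))  # 300 is larger than 255
print(is_valid_ipv4("77.121.147"))      # only three numbers
```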

The Domain Name System

The Domain Name System (DNS) is a service, run on special servers, that connects domain names with IP addresses. In other words, it's like a giant telephone book that matches a domain name to an IP address. This means that you don't have to keep your own address book of IPs for your favorite websites. Instead, all you have to do is remember the domain name, and DNS, which manages a massive database that maps domain names to IP addresses, does the rest.

As you might be aware, there are millions and millions of websites, and it is impractical for one computer server to hold the entire list of websites and their respective IP addresses. DNS is therefore a distributed database. This means that portions of the database are divided and spread across many different servers on the Internet. So, if a DNS server does not have a record for the requested domain, it passes the request on to another server.
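The idea of a database spread over several servers can be pictured with a toy model. The sketch below is an illustration only and leaves out almost everything real DNS does: there is no network traffic, and the class ToyNameServer, the server names, the domains and the addresses are all hypothetical (the 192.0.2.x range is set aside for use in documentation).

```python
# Toy model only: real DNS servers talk to each other over the network.

class ToyNameServer:
    """Holds one slice of the name database and knows one other server."""

    def __init__(self, name, records, fallback=None):
        self.name = name
        self.records = records    # the part of the database stored here
        self.fallback = fallback  # where to pass on unknown domains

    def resolve(self, domain):
        """Return (ip_address, name_of_the_server_that_answered)."""
        if domain in self.records:
            return self.records[domain], self.name
        if self.fallback is not None:
            return self.fallback.resolve(domain)  # pass the request on
        raise LookupError(f"no server knows {domain}")


upstream = ToyNameServer("upstream", {"example.org": "192.0.2.20"})
local = ToyNameServer("local", {"example.com": "192.0.2.10"}, fallback=upstream)

print(local.resolve("example.com"))  # answered by the local server
print(local.resolve("example.org"))  # passed on to the upstream server
```

No single server in this model holds every record, yet asking the local server is enough to reach any domain that some server knows about.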

As we've already seen, a web address is referred to as a URL, and the domain name inside it is divided into levels. These levels are the top-level domain, the domain name and the host name.

So, if you wanted to go to the Wikipedia website, you would type in www.wikipedia.org: the .org part is the top-level domain, wikipedia is the domain and www is the host name. There are several top-level domains that you've probably come across before, including .com, .edu, .gov and .me. The inner workings of DNS are quite complex, but what is important to understand is that DNS is the address book for the Internet, and it's what connects domain names with IP addresses.
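The three levels can be pulled apart with a few lines of code. This is a deliberately simple sketch: the function name split_levels is made up, and it assumes a name with exactly three dot-separated parts, as in the Wikipedia example. Names with more or fewer parts would need more careful handling.

```python
def split_levels(hostname):
    """Split a three-part name such as www.wikipedia.org into its levels."""
    host, domain, top_level = hostname.split(".")  # assumes exactly three parts
    return {
        "host name": host,
        "domain": domain,
        "top-level domain": "." + top_level,
    }


levels = split_levels("www.wikipedia.org")
for level, value in levels.items():
    print(level, "=", value)
```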

HTTP and HTTPS

Every request that you make on the web is sent using the Hypertext Transfer Protocol, more commonly called HTTP. HTTP is like the language that clients and servers use to speak to each other. Because there are so many different types of clients and servers, they need a standard way to communicate. HTTP defines the rules that every device, be it a client or server, must conform to in its communication.

In every HTTP exchange, two pieces of information are sent back to the client. The first is the status code, and the second is the requested resource, such as a web page or a file. The status code helps identify the cause of the problem when a web page or other resource doesn't load properly.

There is a whole host of status codes, each with a different meaning. When you request a web page and that page loads into your browser, the server will have sent a status code of 200. This means that everything is OK and the requested resource has been sent. A detailed breakdown of all the different status codes is available on Wikipedia.
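Python's standard library includes a list of these codes, which makes it easy to look up what a number means. The sketch below prints the short description for a handful of codes; the particular codes chosen and the helper name describe are just examples.

```python
from http import HTTPStatus


def describe(code):
    """Return the standard short phrase for an HTTP status code."""
    return HTTPStatus(code).phrase


for code in (200, 301, 404, 500):
    print(code, describe(code))
```

Codes in the 200s generally mean success, while codes in the 400s and 500s point to a problem with the request or with the server respectively.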

Sometimes, in your travels around the web, you'll see a lock icon in the address bar. The lock will usually be all the way to the left and the URL will start with https, not http. The s stands for secure, and it means that communication between the client and the server is private and encrypted. When surfing the web, don't enter sensitive information, like medical records, banking details or even just usernames and passwords, into sites that don't have the lock.

What is HTML & CSS?

As you have seen from how the web works, whenever you access a website, the web server will send you the resource that you requested. This means that you will typically receive at least two documents: an HTML file and a CSS file. The HTML file contains the structure of the web page. Just as you would give any document a structure, such as a title, headings and paragraphs, an HTML document contains that same kind of structure, which tells your browser how to render the web page.

The second file that you will almost certainly receive when you make a request for a website is a CSS file. Whilst the HTML file specifies the structure of the document, the CSS file tells the browser what that document looks like, from colors to layout. Many years ago, it was quite normal to receive just one file that contained both the HTML and CSS, but those days are long gone and it is best practice to separate the structure of a webpage from its styling.

This means that whenever you are looking at a website, your browser has received both HTML and CSS files from a web server that hosts the site being accessed.

The web browser, which could be any browser such as Google Chrome, Internet Explorer or Firefox, interprets the HTML and CSS code to create the page that you see. Small websites can be created using just HTML and CSS, but larger websites make use of other technologies such as databases and server-side programming languages. However, HTML and CSS are the gateway to any website or app online, as without these two technologies, no one would be able to use the website.
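To get a feel for what a browser does with these two files, here is a very rough sketch using Python's standard html.parser module. The page, the stylesheet name style.css and the class OutlineParser are all invented for this example, and a real browser does far more than this: the code only reads the HTML, lists the structural tags it finds and notes which CSS file the page asks for.

```python
from html.parser import HTMLParser

# Hypothetical two-file website: the names and content are made up.
HTML_FILE = """<!DOCTYPE html>
<html>
  <head>
    <title>My first page</title>
    <link rel="stylesheet" href="style.css">
  </head>
  <body>
    <h1>Welcome</h1>
    <p>This paragraph is structure; its look comes from the CSS file.</p>
  </body>
</html>"""

CSS_FILE = "h1 { color: navy; }\np { font-family: sans-serif; }"


class OutlineParser(HTMLParser):
    """Collects the tags used in a page and any linked stylesheets."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "stylesheet":
            self.stylesheets.append(attributes.get("href"))


parser = OutlineParser()
parser.feed(HTML_FILE)
print("Structure:", parser.tags)
print("Stylesheets to fetch:", parser.stylesheets)
print("Styling rules:", CSS_FILE.splitlines())
```

The HTML describes what is on the page, while the separate CSS text describes how the headings and paragraphs should look, which is the separation of structure and styling described above.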