Making a simple HTTP webserver in C

November 29, 2021 - Jan Pieter Bruins Slot

Introduction

While reading and working through Operating Systems: Three Easy Pieces [1], I started on one of the assignments: making a webserver concurrent. The webserver was graciously provided by the authors, who also recommend figuring things out yourself and doing some more in-depth research. So I thought it would be a cool idea to implement the webserver myself as the basis for making it concurrent, as the assignment [2] asks you to do, and in the process learn more about socket programming. Additionally, I thought it would be a good learning experience to write a post about the process and share it.

So, in this post we will go over creating a simple webserver in C, step-by-step. First, we’ll start with some background into webservers, and subsequently add more code to our program. I’ve tried to make it so that you’ll be able to figure out what you need to code before looking at the results. However, if you just want to look at a specific part or just at the end result, then check the Implementation sections throughout the article. You can also look at the resulting code in this repository.

So, let’s get to it!

Basics

First, it might be a good idea to get a feel for what we’re trying to create, and to investigate what it is that we actually want to make. So, let’s figure out what a webserver actually is and does.

webserver

A webserver, on the software side, is able to satisfy client requests over HTTP and several other related protocols. Its primary function is to store, process, and deliver files to that client. At minimum this is an HTTP server, which is a piece of software that understands URLs (Uniform Resource Locators) and HTTP (Hypertext Transfer Protocol).

A webserver can be a static webserver, a dynamic webserver, or a combination of both. A static webserver simply serves files “as-is”. A dynamic webserver runs an executable file on the server and returns its output to the client. It is dynamic because the webserver updates the hosted files “on-the-fly” before sending them to the client. [3]

HTTP

As mentioned above, a webserver in our case should be able to understand the HTTP protocol. What does a protocol mean in this context? Well, it is a set of rules for communication between two computers. In this case it specifies how to transfer hypertext documents, meaning documents that are interconnected by hyperlinks. The protocol is textual and stateless. Textual, because all the commands are plain text, so you’ll be able to read and inspect them. Stateless, because neither the client nor the server remembers previous communications. [4]

It means on the client side the application (for instance a web browser) needs to speak the same ‘language’ as the webserver in order to communicate. The ‘language’ that is used is HTTP.

A message is either a request sent by the client or a response sent by the server. This message needs to be transported, and that is where TCP comes in.

TCP

HTTP presumes an underlying transport layer protocol to establish host-to-host data transfer channels, and to manage the data exchange in a client-to-server or peer-to-peer networking model. The protocol that is commonly used for HTTP servers is TCP (Transmission Control Protocol), but HTTP can also be adapted to run over, for instance, UDP (User Datagram Protocol). However, because RFC 2616 [5] states that the transport layer should be reliable, we will be using TCP instead of UDP. [6][7]

TCP maintains communication between application processes on different hosts (client and server), and it uses port numbers to track sessions. HTTP and TCP are part of a suite of protocols, spread over the layers that a request/response cycle passes through. This suite is also known as TCP/IP.

Internet Protocol Suite a.k.a. TCP/IP

HTTP is part of the Internet Protocol Suite and it is called an application layer protocol. The Internet Protocol Suite is a model that is commonly known as TCP/IP because of its foundational protocols: the Transmission Control Protocol (TCP, on the transport layer) and the Internet Protocol (IP, on the internet layer). [8]

This suite is a conceptual model, and it consists of a set of protocols used on the internet and similar computer networks. It specifies how data should be packetized, addressed, transmitted, routed and received. The model is made up of 4 abstraction layers: the application, transport, internet and link layer.

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ APPLICATION LAYER                ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ http, ftp, smtp, ssh, etc.       │
└──────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ TRANSPORT LAYER                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ tcp, udp, etc.                   │
└──────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ INTERNET LAYER                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ ipv4, ipv6, etc.                 │
└──────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ LINK LAYER                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ arp, mac (ethernet, wifi, etc.)  │
└──────────────────────────────────┘

We stated above that: “HTTP presumes an underlying transport layer”. From the Internet Protocol Suite we can see that there are several available, and we already stated that TCP is commonly used. We can also see that HTTP is in the application layer, next to other protocols that you might be familiar with such as FTP, Telnet, SSH, SMTP, etc. In essence HTTP is layered over TCP and uses it to transport its message data. In turn TCP is layered over IP, to make sure the data ends up at the right location.

When an application (in the case of HTTP, a browser for instance) sends or receives data, it talks to the transport layer through a port. Ports let TCP tell apart the applications on a host, and well-known application layer protocols have a default port assigned to them. For HTTP this is port 80.

Request / Response

As mentioned above: “HTTP is a set of rules for communication”. These rules are implemented in the request and response messages. You’re probably already familiar with their structure. The request message consists of the following: a request line, the request header fields, an empty line, and an optional message body. In the following diagram you can see how a request and a response message are built up when we use curl to access the webserver we are going to create.

                                                                        
            ┌─┐                                        ┌─┐              
            └┬┘              ―――――――――――――▶            ╞ │              
            ▔▔▔              ◀―――――――――――――            └─┘              
    client request message                    server response message   
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓              ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ GET / HTTP/1.1            ┃ request line ┃ HTTP/1.0 200 OK           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━┩              ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Host: localhost:8080      │    headers   │ Server: webserver-c       │
│ User-Agent: curl/7.52.1   │              │ Content-type: text/html   │
│ Accept: */*               │              │                           │
└───────────────────────────┘              ├───────────────────────────┤
                                  body     │ <html>hello, world</html> │
                                           └───────────────────────────┘

We can even inspect what curl is sending and receiving, and we can see that everything is just in plain text. Pretty cool!

$ curl -vs http://localhost:8080

* Rebuilt URL to: http://localhost:8080/
*   Trying ::1...
* TCP_NODELAY set
* connect to ::1 port 8080 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.52.1
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: webserver-c
< Content-type: text/html
<
<html>hello, world</html>
* Curl_http_done: called premature == 0
* Closing connection 0
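Since the messages are just plain text, producing one from C comes down to formatting a string. The following is a minimal sketch of how the response from the diagram could be assembled. The helper name build_response is hypothetical and not part of the webserver we build in this article, and the sketch assumes that every line ends in CRLF (\r\n), as HTTP messages do.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the article's code): formats a minimal
 * HTTP/1.0 response, like the one in the diagram, into buf.
 * Returns the number of bytes written (excluding the terminating '\0'),
 * or -1 if buf is too small. */
int build_response(char *buf, size_t buflen, const char *body) {
    int n = snprintf(buf, buflen,
                     "HTTP/1.0 200 OK\r\n"         /* status line */
                     "Server: webserver-c\r\n"     /* header fields */
                     "Content-type: text/html\r\n"
                     "\r\n"                        /* empty line ends the headers */
                     "%s\r\n",                     /* message body */
                     body);
    if (n < 0 || (size_t)n >= buflen) {
        return -1; /* formatting failed or the output was truncated */
    }
    return n;
}
```

Calling build_response(buf, sizeof(buf), "<html>hello, world</html>") with a buffer of a few hundred bytes fills buf with the same text that curl prints after the `<` markers above.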

When a request is made by the application layer, the message passes down through the layers on one side, and back up through the layers on the other side (represented by the solid line). Logically, each layer talks to the corresponding layer on the other side (represented by the dashed line).

                                         
      ┌─┐                       ┌─┐      
      └┬┘                       ╞ │      
      ▔▔▔                       └─┘      
     client                    server    
┏━━━━━━━━━━━━━┓╷╷       ╭╮┏━━━━━━━━━━━━━┓
┃ APPLICATION ┃││ ◀---▶ ││┃ APPLICATION ┃
┗━━━━━━━━━━━━━┛││       ││┗━━━━━━━━━━━━━┛
┏━━━━━━━━━━━━━┓│▲       ▼│┏━━━━━━━━━━━━━┓
┃ TRANSPORT   ┃││ ◀---▶ ││┃ TRANSPORT   ┃
┗━━━━━━━━━━━━━┛││       ││┗━━━━━━━━━━━━━┛
┏━━━━━━━━━━━━━┓││       ││┏━━━━━━━━━━━━━┓
┃ INTERNET    ┃││ ◀---▶ ││┃ INTERNET    ┃
┗━━━━━━━━━━━━━┛▼│       │▲┗━━━━━━━━━━━━━┛
┏━━━━━━━━━━━━━┓││       ││┏━━━━━━━━━━━━━┓
┃ LINK        ┃││ ◀---▶ ││┃ LINK        ┃
┗━━━━━━━━━━━━━┛│╰───────╯│┗━━━━━━━━━━━━━┛
               ╰─────────╯               

So, to recap: we’re creating an application (a webserver) that is able to receive and send plain text messages that adhere to the rules of HTTP. Those messages are received through a transport layer. Our application will use TCP as this transport layer, so we need to set up the TCP side of the connection ourselves. TCP itself, like the rest of the Internet Protocol Suite (TCP/IP) below the application layer, is provided by the operating system. [9]

Implementation

We can use the man pages as a reference on how to start implementing this. Throughout this article we will be using man pages to get all the information we need to implement our webserver. The first man page we can look at is:

$ man 7 tcp

NAME
       tcp - TCP protocol

SYNOPSIS
       #include <sys/socket.h>
       #include <netinet/in.h>
       #include <netinet/tcp.h>

       tcp_socket = socket(AF_INET, SOCK_STREAM, 0);

DESCRIPTION

       ...

       A newly created TCP socket has no remote or local address and is
       not fully specified. To create an outgoing TCP connection use
       connect(2) to establish a connection to another TCP socket. To
       receive new incoming connections, first bind(2) the socket to a
       local address and port and then call listen(2) to put the socket
       into the listening state. After that a new socket for each
       incoming connection can be accepted using accept(2). A socket
       which has had accept(2) or connect(2) successfully called on it
       is fully specified and may transmit data. Data cannot be
       transmitted on listening or not yet connected sockets.  

       ...

NOTE: the 7 stands for the section number the page is from, and you can check what each section is by typing man man. Typically, man pages are referred to using the notation name(section), since the same name can be present in different sections. Throughout this document we will use this notation so that you’ll be able to inspect the man pages. If you’re trying to find a specific man page, you can use the apropos {name} command to search for name throughout the man pages.


From this man page we can read that we need to create a ‘socket’, then ‘bind’ the socket to a local address and port, and then put the socket in a ‘listen’ state. After that we’re able to ‘accept’ incoming connections. For each accepted connection a new socket will be created, and we will be able to read from and write to this socket. The following diagram gives an overview of what we need to implement.

                                                                                    
┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━━━┓
┃   SOCKET   ┃ ▶ ┃    BIND    ┃ ▶ ┃   LISTEN   ┃ ▶ ┃   ACCEPT   ┃ ▶ ┃  READ/WRITE  ┃
┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━━━┛

The tcp(7) man page also states, as we’ve uncovered from above, that it is layered upon ip(7), so let’s also take a look at the man page for that.

$ man 7 ip

NAME
       ip - Linux IPv4 protocol implementation

SYNOPSIS
       #include <sys/socket.h>
       #include <netinet/in.h>
       #include <netinet/ip.h> /* superset of previous */

       tcp_socket = socket(AF_INET, SOCK_STREAM, 0);
       udp_socket = socket(AF_INET, SOCK_DGRAM, 0);
       raw_socket = socket(AF_INET, SOCK_RAW, protocol);

Ok, cool! This gives us some more information on how to create sockets using other protocols, like udp, and raw. We’ll keep that in mind for further on in this article. But first let’s start with setting up our project.

Setting up

First, let’s start with setting up our environment in which we want to develop. We will start very simple, just to make sure everything is working:

// ./steps/step000.c
#include <stdio.h>

int main() {
    printf("hello, world\n");
    return 0;
}

Next we need to compile it, and we will be using gcc.

$ gcc -Wall webserver.c -o webserver

Let’s check if everything worked.

$ ./webserver
hello, world

Ok, now that we’re set up, let’s get started with implementing our socket.

Implement the socket

From what we’ve read from the man page tcp(7), we need to implement a tcp socket. But, let’s also inspect what a socket is, and we can also use the man pages for this.

$ man socket

NAME
       socket - create an endpoint for communication

SYNOPSIS
       #include <sys/types.h>          /* See NOTES */
       #include <sys/socket.h>

       int socket(int domain, int type, int protocol);

DESCRIPTION
       socket() creates an endpoint for communication and returns a file
       descriptor that refers to that endpoint. The file descriptor returned by
       a successful call will be the lowest-numbered file descriptor not
       currently open for the process.
       ...

So, a socket is an endpoint for communication. Furthermore, we can read that we need to include the <sys/socket.h> header file, and that we can create a socket endpoint by using the function socket(2). This function returns a file descriptor, which is an integer. The arguments that it accepts are: domain, type, protocol. We will look at the individual arguments and investigate how they need to be set in the following sections.

domain

The argument domain is an integer that specifies a communication domain, and it selects the protocol family that will be used for communication. These families are defined as constants in the header file <sys/socket.h>, so we can reference them by their name and use them as the domain argument.

See the man page for an overview of the families you’re able to choose from. Since we’re creating a webserver that uses TCP we will be using AF_INET, which uses the IPv4 Internet protocols.


NOTE: I wanted to know what this header file looks like, and I was able to inspect it further by installing the POSIX man pages. These are described as: “Manual pages about using a POSIX system for development”, and they give more information about the specification of a C standard library for POSIX systems. It’s a specification for a number of routines that should be available in a basic C standard library; how they are actually provided depends on how the standard C library is implemented on a system. The most commonly used implementation on Linux is the GNU C Library: glibc. With these man pages we can thus reference the specification. [10] You can also find these pages on the following site: The Open Group.

How I installed the POSIX man pages on a Debian-based Linux distribution:

$ sudo apt install manpages-posix-dev

After that I was able to reference the man page for <sys/socket.h> with the following command:

$ man sys_socket.h

type

The argument type specifies the ‘communication semantics’. So, which socket type do we need to use here? Well, we said we wanted to create a TCP webserver, so which of the options resembles that? Let’s refer back to man pages of tcp(7), and ip(7). There we can see that the valid socket type for a TCP socket is SOCK_STREAM. SOCK_STREAM is a full-duplex byte stream, and it is characterized as a type that ensures that data is not lost or duplicated.

protocol

protocol, according to the man page, is the particular protocol to be used with the socket. It is common that only one protocol exists to support a specific socket type within a given protocol family, in which case protocol can be specified as 0. In our case we are choosing SOCK_STREAM as the type, and the tcp(7) and ip(7) man pages both create the TCP socket with protocol set to 0. According to ip(7), the valid values for a TCP socket are 0 and IPPROTO_TCP; we will use 0.

return value

The int socket(int domain, int type, int protocol) function returns an integer, which is a file descriptor for the socket. [11] The file descriptor is a unique number that identifies an open file; in this case this is our socket, and just as with a regular file we will be able to read from and write to it. When an error occurs it will return the value -1 and set errno, which we can use to properly handle errors.

errno

From the last section we learned that when the socket function fails, errno will be set. So what is this errno? Let’s check if there is a man page about it.

$ man errno

NAME
       errno - number of last error

SYNOPSIS
       #include <errno.h>

DESCRIPTION
       The <errno.h> header file defines the integer variable errno, which is
       set by system calls and some library functions in the event of an error
       to indicate what went wrong.

errno is an integer variable that is set to signify what exactly has gone wrong. In order to inspect what kind of error was raised, we can use perror(3) to print the error: it translates the error code in the variable errno to a human-readable form. Let’s check the man page for perror(3).

$ man 3 perror

NAME
       perror - print a system error message

SYNOPSIS
       #include <stdio.h>

       void perror(const char *s);

DESCRIPTION
       The perror() function produces a message on standard error describing
       the last error encountered during a call to a system or library
       function.
       ...

We can pass a string as the argument s to perror(3); it prints that string, followed by a colon and the error message that corresponds with the current value of errno.
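To see errno and perror(3) in action without waiting for a real failure, we can provoke one. The sketch below passes an invalid domain (-1) to socket(2) on purpose and then reports the error in two ways. The function name show_socket_error is hypothetical, and the exact error code and message you get depend on your system.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: makes socket(2) fail on purpose by passing an invalid
 * domain, then reports the error. Returns the errno value that was set, or 0
 * if the call unexpectedly succeeded. */
int show_socket_error(void) {
    int fd = socket(-1, SOCK_STREAM, 0); /* -1 is not a valid domain */
    if (fd != -1) {
        close(fd);
        return 0;
    }

    /* Save errno right away: later library calls may overwrite it */
    int saved = errno;

    /* perror(3) prints our string, a colon, and the message for errno */
    perror("webserver (socket)");

    /* strerror(3) gives us the same message as a string */
    fprintf(stderr, "errno = %d (%s)\n", saved, strerror(saved));

    return saved;
}
```

Both lines are written to standard error, so they will show up in the terminal even when standard output is redirected to a file.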

Implementation

Now that we know how we should implement the socket(2) function, let’s update our file with what we have discussed above. It should resemble something like this:

// ./steps/step001.c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

int main() {
    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    return 0;
}

When we check our diagram from the section Basics, we can see that we’ve now created the socket. But we still need to bind it to an address, otherwise no connections can be received on this socket, as we have read in the man page tcp(7).

                                                                                  
┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓
┃   SOCKET   ┃ ▶ ┃    BIND    ┃ ▶ ┃   LISTEN   ┃ ▶ ┃   ACCEPT   ┃ ▶ ┃ READ/WRITE ┃
┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛
 ───────────▶                                                                     

So our next step will be to bind the socket to a local address and port.

Bind the socket to an address

The socket is created and exists in a namespace (an address family; the AF in AF_INET stands for address family), but it has no address yet. We need to bind the socket to a local address in order for it to receive connections. We need to use the bind(2) function for this, so let’s check out the bind(2) man page on how to implement this.

$ man bind

NAME
       bind - bind a name to a socket

SYNOPSIS
       #include <sys/types.h>          /* See NOTES */
       #include <sys/socket.h>

       int bind(int sockfd, const struct sockaddr *addr,
                socklen_t addrlen);

DESCRIPTION                                                              
       When a socket is created with socket(2), it exists in a name space
       (address family) but has no address assigned to it. bind() assigns the
       address specified by addr to the socket referred to by the file
       descriptor sockfd. addrlen specifies the size, in bytes, of the address
       structure pointed to by addr. Traditionally, this operation is called
       “assigning a name to a socket”.
    
       It is normally necessary to assign a local address using bind() before
       a SOCK_STREAM socket may receive connections (see accept(2)).

       ...

We can see that bind(2) is declared in the header file <sys/socket.h>, and that on success it will return zero. It accepts as arguments: sockfd, addr, and addrlen. Let’s go over the arguments, and make sense of what we need to do in order to implement it.

sockfd

This is the file descriptor that we’ve created with socket(2) in the last section. And we need to use that here as the first argument.

addr

This is the address structure to which we want to bind the socket, and its format depends on the address family we’re using. So let’s check what addr needs to look like. We can inspect the rules used in name binding by referencing the man page of the communication domain we’re using, AF_INET:

$ man 7 ip

From the section ‘Address format’ we can see an example. The address structure will look like the following:

struct sockaddr_in {
   sa_family_t    sin_family; /* address family: AF_INET */
   in_port_t      sin_port;   /* port in network byte order */
   struct in_addr sin_addr;   /* internet address */
};

/* Internet address. */
struct in_addr {
   uint32_t       s_addr;     /* address in network byte order */
};

sin_family is always set to AF_INET, and sin_port contains the port in network byte order. Network byte order defines how bytes are arranged when sending data over a network. An order must be chosen to make sure that the machines on both ends interpret the numbers the same way, independent of the CPU architecture. Network byte order is big endian.

For example, an integer value of 1 represented as 4 bytes would be stored on ‘big endian’ machines as 0 0 0 1, and on ‘little endian’ machines as 1 0 0 0. If the bytes 0 0 0 1 of the ‘big endian’ machine were read as-is by the ‘little endian’ machine, they would be interpreted as the value 16777216, and vice versa. [12][13]

And, as such, like the man page states, we need to call htons(3) on the number that is assigned to the port, like so: htons(8080). It converts a 16-bit value from host byte order to network byte order. See the man page for htons(3) for more information.
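The byte order story is easy to check for yourself. The following sketch looks at the bytes that htonl(3) and htons(3) produce in memory, and it reproduces the 16777216 from the example by reversing the bytes of the value 1. The helper names swap_bytes32, network_bytes32 and network_bytes16 are hypothetical, made up for this illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverses the byte order of a 32-bit value, which is
 * what happens when one machine reads the other machine's bytes as-is. */
uint32_t swap_bytes32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

/* Hypothetical helper: converts a 32-bit value to network byte order and
 * copies the resulting bytes, in memory order, into out. */
void network_bytes32(uint32_t host_value, unsigned char out[4]) {
    uint32_t net = htonl(host_value);
    memcpy(out, &net, sizeof(net));
}

/* Hypothetical helper: the same for a 16-bit value such as a port number. */
void network_bytes16(uint16_t host_value, unsigned char out[2]) {
    uint16_t net = htons(host_value);
    memcpy(out, &net, sizeof(net));
}
```

Whatever the CPU architecture, network_bytes32(1, out) yields the bytes 0 0 0 1, and network_bytes16(8080, out) yields 0x1F 0x90, because 8080 is 0x1F90 and the most significant byte goes first. swap_bytes32(1) returns 16777216.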

sin_addr contains the host interface address in network byte order, and it is of the type struct in_addr. The man page states that it should be assigned one of the INADDR_* values, which are defined as symbolic constants in the header file <netinet/in.h>. Alternatively, we can set it by using one of the inet_aton(3), inet_addr(3), or inet_makeaddr(3) library functions to specify a specific address. We can also inspect the POSIX man page to see how the header file should be implemented on systems: man netinet_in.h.

We will make use of the symbolic constant INADDR_ANY, which means ‘any address’ and translates to 0.0.0.0. Since its value is zero, INADDR_ANY is the same in host and network byte order, so we don’t strictly have to convert it. The man page advises us to convert anyway, so let’s just implement it. We do this by calling htonl(3) on the address. But why are we using 0.0.0.0 here? Your machine has one IP address for each network interface. When your machine has, for example, a Wi-Fi and an ethernet connection, it will have two addresses, one for each interface. When we don’t care which interface is going to be used, we bind to the special address 0.0.0.0, defined by the symbolic constant INADDR_ANY, and the socket will receive connections on all of them.
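To make the difference between ‘any address’ and a specific address a bit more concrete, here is a sketch of a helper that fills in a sockaddr_in either way. The name make_ipv4_addr is hypothetical. For the specific address it uses inet_pton(3), a POSIX relative of the inet_aton(3) function mentioned above; that is a choice made for this sketch, not something the ip(7) man page prescribes.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills out with an IPv4 address and port.
 * When ip is NULL the address is INADDR_ANY (0.0.0.0), otherwise ip must be
 * a dotted-decimal string such as "127.0.0.1".
 * Returns 0 on success, -1 when ip is not a valid IPv4 address. */
int make_ipv4_addr(const char *ip, uint16_t port, struct sockaddr_in *out) {
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port); /* port in network byte order */

    if (ip == NULL) {
        /* Any interface */
        out->sin_addr.s_addr = htonl(INADDR_ANY);
        return 0;
    }

    /* inet_pton() stores the address in network byte order already,
     * and returns 1 only when the string could be parsed */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1) {
        return -1;
    }
    return 0;
}
```

Binding to the result of make_ipv4_addr("127.0.0.1", 8080, &addr) would make the server reachable from the local machine only, while make_ipv4_addr(NULL, 8080, &addr) gives the 0.0.0.0 behaviour described above.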

addrlen

The addrlen argument specifies the size of the address structure addr in bytes. To get this we can use the sizeof operator (it looks like a function, but it is an operator like &&, ||, etc.). The argument is of type socklen_t, which is an integer type, and we can get some background on this specific type by inspecting the man page of accept(2).

In the original BSD sockets implementation (and on other older systems)
the third argument of accept() was declared as an int *. A POSIX.1g
draft standard wanted to change it into a size_t *C; later POSIX
standards and glibc 2.x have socklen_t *.

return value

On success the return value for bind(2) will be zero; when an error occurs it will return -1, and errno will be set.

Implementation

From what we have seen we’re able to implement the bind(2) function. First, we will create the address structure, and then we can bind it to the socket. The updated code will look something like this:

// ./steps/step002.c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

#define PORT 8080

int main() {
    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    return 0;
}

Note that we are casting &host_addr to the pointer type struct sockaddr * in the argument of the bind(2) function. Since host_addr is of the type struct sockaddr_in, we need to cast its address to struct sockaddr *, which is the type that bind(2) expects. From the man page bind(2) we can read: “The only purpose of this structure (sockaddr) is to cast the structure pointer passed in addr in order to avoid compiler warnings.” In essence what we are saying here is: whatever this pointer is pointing to, treat it like a sockaddr. [14]
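A way to check that bind(2) did what we asked is to read the address back from the socket with getsockname(2), which uses the same cast. The sketch below is an illustration under a few assumptions: the helper name bind_any is hypothetical, and it accepts port 0, which asks the kernel to pick a free port for us, so that the example does not fail when port 8080 is already taken.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates a TCP socket, binds it to INADDR_ANY on the
 * given port (0 lets the kernel choose), and reads the bound port back.
 * On success the socket is stored in *fd_out and the port is returned in
 * host byte order. Returns -1 on error. */
int bind_any(uint16_t port, int *fd_out) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        perror("socket");
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        perror("bind");
        close(fd);
        return -1;
    }

    /* Ask the kernel which address the socket is bound to now */
    struct sockaddr_in bound;
    socklen_t bound_len = sizeof(bound);
    if (getsockname(fd, (struct sockaddr *)&bound, &bound_len) != 0) {
        perror("getsockname");
        close(fd);
        return -1;
    }

    *fd_out = fd;
    return ntohs(bound.sin_port); /* convert back to host byte order */
}
```

With bind_any(8080, &fd) the returned port should simply be 8080 again; with bind_any(0, &fd) it is whichever free port the kernel selected. The caller is responsible for closing fd.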

                                                                                  
┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓
┃   SOCKET   ┃ ▶ ┃    BIND    ┃ ▶ ┃   LISTEN   ┃ ▶ ┃   ACCEPT   ┃ ▶ ┃ READ/WRITE ┃
┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛
 ───────────▶     ───────────▶                                                    

Referring back to our diagram, we’ve now also bound the socket to a specific address. Now we are ready to listen for incoming connections. So let’s implement that.

Listen

We’ve created a socket and bound it to a local address; now we need to make sure that the socket is listening for incoming connections. We do that by using the listen(2) function. This will make the socket available for incoming connections. Let’s see what the man pages can show us on how to use listen(2).

$ man 2 listen

NAME
       listen - listen for connections on a socket

SYNOPSIS
       #include <sys/types.h>          /* See NOTES */
       #include <sys/socket.h>

       int listen(int sockfd, int backlog);

DESCRIPTION
       listen() marks the socket referred to by sockfd as a passive socket,
       that is, as a socket that will be used to accept incoming connection
       requests using accept(2).
       ...

As we can see, the listen(2) function will put the socket into ‘passive’ mode. Stream sockets are often ‘active’ or ‘passive’.

  • When a socket is created with the socket(2) function, it is set to active. This socket can then be used in the connect(2) function to establish a connection to a ‘passive’ socket.

  • A socket becomes passive, and can allow incoming connections, when it is passed to the listen(2) function.

In most applications that use stream sockets, the server performs the so-called ‘passive socket open’, and the client an ‘active socket open’. Since we’re creating an HTTP webserver and using the listen(2) function to listen for incoming connections, the socket that we’ve created will be a passive socket, and it will be used to accept connections from other (active) sockets. [15]
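Whether a socket is currently passive can be queried with the SO_ACCEPTCONN socket option, which reports whether the socket is accepting connections. The following is a small sketch; the helper name is_passive is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: returns 1 when the socket has been put into the
 * listening (passive) state, 0 when it has not, and -1 on error. */
int is_passive(int fd) {
    int accepting = 0;
    socklen_t len = sizeof(accepting);
    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &accepting, &len) == -1) {
        return -1;
    }
    return accepting ? 1 : 0;
}
```

Called on a socket fresh from socket(2) this should return 0, and after a successful listen(2) on that same socket it should return 1.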

sockfd

Again, this is the file descriptor of the socket, and thus we will use the sockfd that we’ve created in the section Implement the socket.

backlog

This integer defines how many pending connections will be queued up for the sockfd socket. For now, we will set this to 128. When the queue is full, further connection requests may be refused or ignored until a pending connection is accepted (see listen(2)). So, it defines the number of connections that have been established but not yet handled by the application, until accept(2) takes them off the queue.

From the ‘NOTES’ section of listen(2):

...

The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now
it specifies the queue length for completely established sockets waiting to be
accepted, instead of the number of incomplete connection requests. The maximum
length of the queue for incomplete sockets can be set using
/proc/sys/net/ipv4/tcp_max_syn_backlog. When syncookies are enabled there is no
logical maximum length and this setting is ignored. See tcp(7) for more
information.

If the backlog argument is greater than the value in
/proc/sys/net/core/somaxconn, then it is silently truncated to that value;
the default value in this file is 128. In kernels before 2.4.25, this limit was
a hard coded value, SOMAXCONN, with the value 128.

...

The symbolic constant SOMAXCONN in <sys/socket.h> is defined by our system (128 on Linux at the time of writing; the value may differ on your system), and we can use it to set the backlog argument (man sys_socket.h).
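Both limits mentioned in the man page can be inspected on your own machine. The sketch below prints the compile-time constant SOMAXCONN and reads the runtime limit from /proc/sys/net/core/somaxconn. The function names read_somaxconn and print_backlog_limits are hypothetical, and the procfs path only exists on Linux, so the sketch returns -1 when it cannot be read.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: reads the system-wide backlog limit from procfs.
 * Returns the limit, or -1 when the file cannot be read or parsed
 * (for instance on a system that is not Linux). */
long read_somaxconn(void) {
    FILE *f = fopen("/proc/sys/net/core/somaxconn", "r");
    if (f == NULL) {
        return -1;
    }

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1) {
        value = -1;
    }
    fclose(f);
    return value;
}

/* Hypothetical helper: prints the compile-time and the runtime limit. */
void print_backlog_limits(void) {
    printf("SOMAXCONN constant: %d\n", SOMAXCONN);
    printf("net.core.somaxconn: %ld\n", read_somaxconn());
}
```

The two numbers do not have to be equal: the constant comes from the C library headers, while the procfs value is a kernel setting that an administrator can change.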

return value

On success, zero will be returned. On failure -1 will be returned, and as before errno will be set, so we can check and handle it accordingly.

Implementation

With the above information we are able to implement the listen(2) function, so let’s update our code:

// ./steps/step003.c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

#define PORT 8080

int main() {
    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    // Listen for incoming connections
    if (listen(sockfd, SOMAXCONN) != 0) {
        perror("webserver (listen)");
        return 1;
    }
    printf("server listening for connections\n");

    return 0;
}

Referring back to our diagram, we’ve created a socket, bound it to a local address, and we’ve put the socket into ‘passive’ mode. Now we can listen for incoming connections.

                                                                                  
┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓
┃   SOCKET   ┃ ▶ ┃    BIND    ┃ ▶ ┃   LISTEN   ┃ ▶ ┃   ACCEPT   ┃ ▶ ┃ READ/WRITE ┃
┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛
 ───────────▶     ───────────▶     ───────────▶                                   

So, on to accept those connections.

Accept

Now we’re ready to make sure the socket will accept connections. We need to use the accept(2) function, and let’s check the man pages again on how we need to implement this.

$ man 2 accept

NAME
       accept, accept4 - accept a connection on a socket

SYNOPSIS
       #include <sys/types.h>          /* See NOTES */
       #include <sys/socket.h>

       int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

DESCRIPTION
       The accept() system call is used with connection-based socket types
       (SOCK_STREAM, SOCK_SEQPACKET). It extracts the first connection
       request on the queue of pending connections for the listening socket,
       sockfd, creates a new connected socket, and returns a new file
       descriptor referring to that socket. The newly created socket is not in
       the listening state. The original socket sockfd is unaffected by this
       call.
       ...

So the accept(2) function takes the first connection from the queue of pending connections of the listening socket sockfd. It then creates a new connected socket and returns a file descriptor that refers to it. The new socket is not in a listening state, and the original socket is not affected by the call, so it can be used to accept further connections. If there are no pending connections when accept(2) is called, the call blocks until a new connection arrives.

Again, let’s look at the arguments that we need to provide to accept connections.

sockfd

Like before, we will use the original socket that was created in Implement the socket and here sockfd is the file descriptor of the socket.

addr

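The argument addr is a pointer to a sockaddr struct that accept(2) fills in with the address of the connecting peer, that is, the client. It is an output buffer, not the address of our listening socket. The code below passes the host_addr struct as that buffer, so accept(2) overwrites it with the client’s address; this is harmless here because host_addr is not needed again after bind(2).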

addrlen

The addrlen argument is a value-result argument: it points to the size of the buffer pointed to by the argument addr. Because accept(2) works with multiple protocol families, we need to provide the size of the address structure that we are using. A pointer is used because: “the caller must initialize it to contain the size (in bytes) of the structure pointed to by addr; on return it will contain the actual size of the peer address.” The kernel then knows how much space is available to return the socket address. Upon return from the accept(2) function, the value of addrlen is set to indicate the number of bytes of data actually stored by the kernel in the socket address structure. 16

When binding our socket (Bind the socket to an address), we’ve already created our addrlen variable with the size of the sockaddr struct, so we can just pass it to the accept(2) function. However, the original variable was an int, so we need to typecast it to socklen_t * to make it work.

return value

It will return a non-negative integer that is a file descriptor for the accepted socket. On error, it will return -1, and errno will be set.

Implementation

Because we want to continue accepting new connections we will put the accept(2) function in a continuous loop. Important to note is that we also need to close the file descriptor we’ve created by using accept(2). We can close the socket by calling the close(2) function.

$ man 2 close

NAME
       close - close a file descriptor

SYNOPSIS
       #include <unistd.h>

       int close(int fd);

DESCRIPTION
       close() closes a file descriptor, so that it no longer refers to any
       file and may be reused. Any record locks (see fcntl(2)) held on the file
       it was associated with, and owned by the process, are removed
       (regardless of the file descriptor that was used to obtain the lock).
       ... 

When we’re done with the connected socket, we pass its file descriptor as the fd argument to close(2). This closes the file descriptor, so that it no longer refers to any file and may be reused. After updating our code with the accept(2) function, it should resemble the following:

// ./steps/step004.c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define PORT 8080

int main() {
    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    // Listen for incoming connections
    if (listen(sockfd, SOMAXCONN) != 0) {
        perror("webserver (listen)");
        return 1;
    }
    printf("server listening for connections\n");

    for (;;) {
        // Accept incoming connections
        int newsockfd = accept(sockfd, (struct sockaddr *)&host_addr,
                               (socklen_t *)&host_addrlen);
        if (newsockfd < 0) {
            perror("webserver (accept)");
            continue;
        }
        printf("connection accepted\n");

        close(newsockfd);
    }

    return 0;
}

And when we check our diagram again, we can see that we’ve implemented the accept(2) function.

                                                                                  
┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓
┃   SOCKET   ┃ ▶ ┃    BIND    ┃ ▶ ┃   LISTEN   ┃ ▶ ┃   ACCEPT   ┃ ▶ ┃ READ/WRITE ┃
┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛
 ───────────▶     ───────────▶     ───────────▶     ───────────▶                  

Now we’re ready to start reading from and writing to the socket.

Read

If we think back to the man page of socket(2), we read the following:

...

A connection to another socket is created with a connect(2) call. Once
connected, data may be transferred using read(2) and write(2) calls or some
variant of  the send(2) and recv(2) calls.  When a session has been completed
a close(2) may be performed.

...

We can read and write by using the read(2) and write(2) functions, or some variant of send(2) and recv(2) calls. From the man page of send(2) we can read that send(2) provides extra flags that we might use. In this case we won’t be using those, and as such we can stick with read(2) and write(2). 17

Because we’ve set up a connection between the client and the server, we can read the client’s request. Since accept(2) gave us a file descriptor, we can use the read(2) function to read the data that the client has sent. Let’s check the man pages on how to use the read(2) function.

$ man 2 read

NAME
       read - read from a file descriptor

SYNOPSIS
       #include <unistd.h>

       ssize_t read(int fd, void *buf, size_t count);

DESCRIPTION:
       read() attempts to read up to count bytes from file descriptor fd into
       the buffer starting at buf.
       ...

The function read(2) will read up to count bytes from the file descriptor fd into the buffer pointed to by buf. On success it returns the number of bytes that were read, and the file position is advanced by this number. On error, -1 is returned and errno is set.

The file position keeps track of where in the file the next character is to be read or written. This is the ‘offset’ recorded by the kernel. 18 On all POSIX.1 systems, the file position is an integer representing the number of bytes from the beginning of the file. The file position is normally set to the beginning of the file when it is opened, and each time a character is read or written, the file position is incremented sequentially. 19 A socket has no file position: each read(2) simply returns the next bytes that have arrived on the connection.

fd

The fd argument needs to be the file descriptor of the new socket that was returned by the accept(2) function.

buf

The buf argument needs to be a pointer to the memory buffer into which the data from the file descriptor fd is read. This buffer must be at least count bytes long. In our case the buffer will be an array of type char. Because an array name is converted to a pointer to its first element when it is passed to a function, we can use the variable name of the buffer as the argument.

count

We need to provide how many bytes we want to read from the file descriptor fd into the buffer. This depends on how large a buffer you’re creating. In this example we’ll create an array of 1024 characters.

The type size_t is an unsigned integer type. It is commonly used by the standard library to represent sizes and counts. Its specific size is platform dependent.

return value

The return value is the number of bytes that were read into the buffer, or 0 if the end of the file has been reached. On error, -1 is returned, and errno is set appropriately. The ssize_t is a ‘signed’ integer type, again it is commonly used by the standard library to represent sizes and counts, and it holds the byte count of what was read into the buffer.

Implementation

First, we need to create the buffer; we will do this as soon as the program begins. For now, we will create an array of type char with a size of 1024 (each char is 1 byte). Don’t forget to include the header file unistd.h.

You’ll also be able to check the number of bytes that are read, and continue reading when the limit of the buffer is reached. However for now we’ll keep it simple.
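As a rough sketch of what “continue reading” could look like, the helper below keeps calling read(2) until the blank line that ends the HTTP headers has arrived, the peer closes the connection, or the buffer is full. The name read_request and the choice to stop at "\r\n\r\n" are assumptions made for illustration; the article’s code performs a single read(2).

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: read from fd until the blank line that ends the
// HTTP request headers ("\r\n\r\n") has arrived, the peer closes the
// connection, or the buffer is full. The result is always NUL-terminated.
// Returns the number of bytes stored, or -1 on a read(2) error.
ssize_t read_request(int fd, char *buf, size_t cap) {
    size_t used = 0;

    if (cap == 0) {
        errno = EINVAL;
        return -1;
    }
    buf[0] = '\0';

    while (used < cap - 1) {
        // Leave one byte free for the terminating NUL.
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n < 0) {
            if (errno == EINTR)
                continue; // interrupted by a signal, try again
            return -1;
        }
        if (n == 0)
            break; // peer closed the connection

        used += (size_t)n;
        buf[used] = '\0';

        if (strstr(buf, "\r\n\r\n") != NULL)
            break; // end of the header section
    }
    return (ssize_t)used;
}
```

Because the buffer is NUL-terminated, it can be handed to string functions afterwards. A request with a body (a POST, for instance) would need further reads after the headers, which this sketch does not attempt.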

Additionally, because the request is most likely sent using the HTTP protocol, we can expect it to follow a certain format, as we saw in the beginning. This is where you would inspect the contents of the request and respond accordingly: returning a specific requested html file, handling GET and POST requests, handling errors when the request does not follow the HTTP protocol, and so on.

// ./steps/step005.c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define PORT 8080
#define BUFFER_SIZE 1024

int main() {
    char buffer[BUFFER_SIZE];

    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    // Listen for incoming connections
    if (listen(sockfd, SOMAXCONN) != 0) {
        perror("webserver (listen)");
        return 1;
    }
    printf("server listening for connections\n");

    for (;;) {
        // Accept incoming connections
        int newsockfd = accept(sockfd, (struct sockaddr *)&host_addr,
                               (socklen_t *)&host_addrlen);
        if (newsockfd < 0) {
            perror("webserver (accept)");
            continue;
        }
        printf("connection accepted\n");

        // Read from the socket
        int valread = read(newsockfd, buffer, BUFFER_SIZE);
        if (valread < 0) {
            perror("webserver (read)");
            close(newsockfd); // avoid leaking the descriptor on error
            continue;
        }

        close(newsockfd);
    }

    return 0;
}

Write

Now that we are able to read the message that the client has sent to us, we also want to send something back to the client. Because we’re implementing a webserver, we’re going to return a simple webpage. And because it is an HTTP webserver, we need to adhere to the HTTP protocol, which means that we need to structure our response according to its rules.

We will be using the same socket, returned by the accept(2) function, that we’ve just read from. Because this socket is a file descriptor we will, just as with read(2), be able to write to it using the write(2) function.

$ man 2 write

NAME
       write - write to a file descriptor

SYNOPSIS
       #include <unistd.h>

       ssize_t write(int fd, const void *buf, size_t count);

       write() writes up to count bytes from the buffer starting at buf to the
       file referred to by the file descriptor fd.
       ...

The function write(2) will write bytes up to count from the buffer pointed to by buf to the file referenced by the file descriptor fd. On success it will return number of bytes written. On error -1 is returned and errno will be set appropriately.

fd

As mentioned above, the argument fd is the file descriptor that refers to the socket returned by the accept(2) function. This is also the same file descriptor from which we read the request with the function read(2).

buf

The buf argument needs to be a pointer to the data that we want to write as a response. For now we will use a pre-defined string that we will add to the server. We can eventually extend this webserver to serve actual html files. But for now we will add the following:

char resp[] = "HTTP/1.0 200 OK\r\n"
"Server: webserver-c\r\n"
"Content-type: text/html\r\n\r\n"
"<html>hello, world</html>\r\n";

Note that the string is formatted following the HTTP protocol (see: Basics). Because this is a response, we start with the status line, followed by the headers, and end with the body. The escape sequence \r\n terminates the status line and each header, and an empty line (a second \r\n) separates the headers from the body. The escape code \r stands for carriage return, which sets the cursor at the beginning of the line, and \n for new line, which moves the cursor to a new line.
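To make the layout of the response explicit, here is a small sketch that assembles the same status line and headers around an arbitrary body with snprintf(3). The helper name build_response is hypothetical, and the Content-Length header is an addition that the article’s fixed string does not include.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: format a complete HTTP/1.0 response around an HTML
// body. Returns the response length in bytes, or -1 if it does not fit.
int build_response(char *out, size_t cap, const char *body) {
    int n = snprintf(out, cap,
                     "HTTP/1.0 200 OK\r\n"          // status line
                     "Server: webserver-c\r\n"      // header
                     "Content-type: text/html\r\n"  // header
                     "Content-Length: %zu\r\n"      // header (an addition)
                     "\r\n"                         // blank line ends headers
                     "%s",                          // body
                     strlen(body), body);
    if (n < 0 || (size_t)n >= cap)
        return -1; // encoding error or output truncated
    return n;
}
```

The returned length can be passed directly as the count argument of write(2).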

count

The argument count is the number of bytes we need to write to the file descriptor fd from the buffer buf. Because we want to write the complete contents, we need to know how many bytes there are in the buffer. We do that by using strlen().

$ man 3 strlen

NAME
       strlen - calculate the length of a string

SYNOPSIS
       #include <string.h>

       size_t strlen(const char *s);

DESCRIPTION
       The  strlen() function calculates the length of the string pointed to by
       s, excluding the terminating null byte ('\0').

RETURN VALUE
       The strlen() function returns the number of characters in the string
       pointed to by s.

So, we can provide our response as the s argument, and we will get back the number of characters in the string pointed to by s.

return value

The return value of the write(2) function is the number of bytes written to the file, and its type is ssize_t. We noted before that size_t is used to represent sizes and counts; ssize_t is its signed counterpart, which means that it can hold values less than zero. Here, a return value of -1 means that an error occurred, and errno will be set appropriately. A return value of zero indicates that nothing was written. A return value that is smaller than the number of bytes requested is not an error: it means that only part of the buffer was written.
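Since write(2) may write fewer bytes than requested, a common pattern is to loop until everything has been sent. The following is a minimal sketch of that pattern; the name write_all is hypothetical, and the article’s code calls write(2) once.

```c
#include <errno.h>
#include <unistd.h>

// Hypothetical helper: keep calling write(2) until all count bytes have
// been written. Returns 0 on success, or -1 with errno set on error.
int write_all(int fd, const char *buf, size_t count) {
    size_t done = 0;

    while (done < count) {
        ssize_t n = write(fd, buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue; // interrupted by a signal, try again
            return -1;
        }
        done += (size_t)n; // a short write just advances the offset
    }
    return 0;
}
```

In the server this could replace the single call, for example write_all(newsockfd, resp, strlen(resp)).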

Implementation

We will implement this by first creating the response buffer, following the instructions above. Once we are done writing to the newly created socket we also need to close it. We already call close(2) from the last section, where we used the read(2) function, so we put the call to write(2) above the call to close(2).

#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PORT 8080
#define BUFFER_SIZE 1024

int main() {
    char buffer[BUFFER_SIZE];
    char resp[] = "HTTP/1.0 200 OK\r\n"
                  "Server: webserver-c\r\n"
                  "Content-type: text/html\r\n\r\n"
                  "<html>hello, world</html>\r\n";

    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    // Listen for incoming connections
    if (listen(sockfd, SOMAXCONN) != 0) {
        perror("webserver (listen)");
        return 1;
    }
    printf("server listening for connections\n");

    for (;;) {
        // Accept incoming connections
        int newsockfd = accept(sockfd, (struct sockaddr *)&host_addr,
                               (socklen_t *)&host_addrlen);
        if (newsockfd < 0) {
            perror("webserver (accept)");
            continue;
        }
        printf("connection accepted\n");

        // Read from the socket
        int valread = read(newsockfd, buffer, BUFFER_SIZE);
        if (valread < 0) {
            perror("webserver (read)");
            close(newsockfd); // avoid leaking the descriptor on error
            continue;
        }

        // Write to the socket
        int valwrite = write(newsockfd, resp, strlen(resp));
        if (valwrite < 0) {
            perror("webserver (write)");
            close(newsockfd); // avoid leaking the descriptor on error
            continue;
        }

        close(newsockfd);
    }

    return 0;
}

And that concludes the implementation of all the steps from the section Basics.

                                                                                  
┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓   ┏━━━━━━━━━━━━┓
┃   SOCKET   ┃ ▶ ┃    BIND    ┃ ▶ ┃   LISTEN   ┃ ▶ ┃   ACCEPT   ┃ ▶ ┃ READ/WRITE ┃
┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛   ┗━━━━━━━━━━━━┛
 ───────────▶     ───────────▶     ───────────▶     ───────────▶     ───────────▶ 

All that is left to do is to compile and run the program!

Let’s run it!

We’ll go to the command line and run the following commands:

$ gcc -Wall webserver.c -o webserver
$ ./webserver

Now, you should be able to open your browser and check: http://localhost:8080, and you should be greeted by the ‘hello, world’ message. Let’s also try to implement some logging to the terminal, so we can see who is making the request and what the request was.

Client address

To log address information about the connection we can use the function getsockname(2). Let’s see what the man page says:

NAME
       getsockname - get socket name

SYNOPSIS
       #include <sys/socket.h>

       int getsockname(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

DESCRIPTION
       getsockname() returns the current address to which the socket sockfd is
       bound, in the buffer pointed to by addr. The addrlen argument should be
       initialized to indicate the amount of space (in bytes) pointed to by
       addr. On return it contains the actual size of the socket address.

       The returned address is truncated if the buffer provided is too small;
       in this case, addrlen will return a value greater than was supplied to
       the call.

RETURN VALUE
       On success, zero is returned. On error, -1 is returned, and errno is
       set appropriately.

sockfd

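So from the man page we read that we can get the current address to which the socket sockfd is bound. We got a new connected socket from accept(2) that is called newsockfd, so we can use that in the getsockname(2) function. Note that for a connected socket this is the local end of the connection, that is, the address and port on the server side. The address of the remote end, the client, is what accept(2) stores through its addr argument, and what getpeername(2), which takes the same arguments, returns.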

addr

This argument should be a pointer to a struct sockaddr structure, which should look familiar since we used it before with bind(2). So we will use the same kind of structure.

addrlen

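This argument is a pointer to a socklen_t, which is an integer type rather than a struct. Unlike bind(2), which takes the length by value, getsockname(2) takes a pointer: just as with accept(2), it is a value-result argument that we initialize to the size of the address structure.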

Implementation

Once we’ve called getsockname(2), we will be able to print the IP address and port that it stored. These are available from the sin_addr and sin_port fields of the struct sockaddr_in structure.

We need to convert them to a representation that we can print. For that we will use the inet_ntoa(3) function (which converts an Internet host address, given in network byte order, to a string in IPv4 dotted-decimal notation) and the ntohs(3) function (which converts an unsigned short integer from network byte order to host byte order).
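The conversion step can be sketched on its own, as below. This version uses inet_ntop(3) rather than inet_ntoa(3) because it writes into a caller-supplied buffer, and it adds a second helper built on getpeername(2), which reports the remote end of a connected socket where getsockname(2) reports the local end. Both helper names (format_addr, format_peer) are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

// Hypothetical helper: render an IPv4 socket address as "ip:port".
// Returns 0 on success, -1 on failure.
int format_addr(const struct sockaddr_in *addr, char *out, size_t cap) {
    char ip[INET_ADDRSTRLEN];

    // inet_ntop(3) writes into a caller-supplied buffer, whereas
    // inet_ntoa(3) returns a pointer to a static buffer.
    if (inet_ntop(AF_INET, &addr->sin_addr, ip, sizeof(ip)) == NULL)
        return -1;

    // ntohs(3) converts the port from network to host byte order.
    int n = snprintf(out, cap, "%s:%u", ip, (unsigned)ntohs(addr->sin_port));
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

// Hypothetical helper: describe the remote end of a connected socket.
int format_peer(int connfd, char *out, size_t cap) {
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer);

    if (getpeername(connfd, (struct sockaddr *)&peer, &len) != 0)
        return -1;
    return format_addr(&peer, out, cap);
}
```

A buffer of 32 bytes is enough for any IPv4 "ip:port" string, so a call such as format_peer(newsockfd, text, sizeof(text)) with char text[32] would fit.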

// ./steps/step007.c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PORT 8080
#define BUFFER_SIZE 1024

int main() {
    char buffer[BUFFER_SIZE];
    char resp[] = "HTTP/1.0 200 OK\r\n"
                  "Server: webserver-c\r\n"
                  "Content-type: text/html\r\n\r\n"
                  "<html>hello, world</html>\r\n";

    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Create client address
    struct sockaddr_in client_addr;
    int client_addrlen = sizeof(client_addr);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    // Listen for incoming connections
    if (listen(sockfd, SOMAXCONN) != 0) {
        perror("webserver (listen)");
        return 1;
    }
    printf("server listening for connections\n");

    for (;;) {
        // Accept incoming connections
        int newsockfd = accept(sockfd, (struct sockaddr *)&host_addr,
                               (socklen_t *)&host_addrlen);
        if (newsockfd < 0) {
            perror("webserver (accept)");
            continue;
        }
        printf("connection accepted\n");

        // Get the address of the connected socket
        // Note: on a connected socket, getsockname(2) reports the local
        // (server-side) address; getpeername(2) reports the client's.
        int sockn = getsockname(newsockfd, (struct sockaddr *)&client_addr,
                                (socklen_t *)&client_addrlen);
        if (sockn < 0) {
            perror("webserver (getsockname)");
            close(newsockfd); // avoid leaking the descriptor on error
            continue;
        }

        // Read from the socket
        int valread = read(newsockfd, buffer, BUFFER_SIZE);
        if (valread < 0) {
            perror("webserver (read)");
            close(newsockfd);
            continue;
        }
        printf("[%s:%u]\n", inet_ntoa(client_addr.sin_addr),
               ntohs(client_addr.sin_port));

        // Write to the socket
        int valwrite = write(newsockfd, resp, strlen(resp));
        if (valwrite < 0) {
            perror("webserver (write)");
            close(newsockfd);
            continue;
        }

        close(newsockfd);
    }

    return 0;
}

Get request headers

We’ve read the client’s request from the socket with read(2), so now we can print the contents of the request message from the buffer and see what the client’s request looks like. Since the client sends a request that adheres to the HTTP protocol, the request line is the first line of the request and is structured as follows: <method> <path> <version>. We can therefore use sscanf(3) to parse the request line and get the method, path and version.

Let’s check the man page for sscanf(3):

NAME
       scanf, fscanf, sscanf, vscanf, vsscanf, vfscanf - input format conversion

SYNOPSIS
       #include <stdio.h>

       int sscanf(const char *str, const char *format, ...);

DESCRIPTION

        ...
        The scanf() function reads input from the standard input stream
        stdin, fscanf() reads input from the stream pointer stream, and
        sscanf() reads its input from the character string pointed to by
        str.
        ...

str

This argument is a pointer to a character string that contains the input data. We will thus use the buffer that we’ve read from the socket.

format

The format argument is a string that specifies the format of the input data. We can use conversion specifications such as %s to parse the request line.

return value

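On success, the number of input items successfully converted and assigned is returned; this can be fewer than the number requested if the input stops matching the format partway through. If the end of the input is reached before the first conversion, the return value is EOF, and nothing is assigned to the arguments.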
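Checking that return value is what tells a well-formed request line from a malformed one. The sketch below wraps sscanf(3) in a hypothetical helper, parse_request_line, and adds field widths so that an overlong token cannot overflow the destination arrays; the array sizes are arbitrary choices for illustration.

```c
#include <stdio.h>

#define METHOD_MAX 16
#define URI_MAX 256
#define VERSION_MAX 16

// Hypothetical helper: split "<method> <path> <version>" into three
// NUL-terminated strings. Returns 0 on success, -1 if fewer than three
// fields were found.
int parse_request_line(const char *request, char method[METHOD_MAX],
                       char uri[URI_MAX], char version[VERSION_MAX]) {
    // The field widths (15, 255, 15) are one less than the array sizes,
    // leaving room for the terminating NUL that sscanf(3) appends.
    int matched = sscanf(request, "%15s %255s %15s", method, uri, version);
    return matched == 3 ? 0 : -1;
}
```

The request string passed in must be NUL-terminated, which a buffer filled by read(2) is not unless the caller adds the terminator itself.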

Implementation

// webserver.c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PORT 8080
#define BUFFER_SIZE 1024

int main() {
    char buffer[BUFFER_SIZE];
    char resp[] = "HTTP/1.0 200 OK\r\n"
                  "Server: webserver-c\r\n"
                  "Content-type: text/html\r\n\r\n"
                  "<html>hello, world</html>\r\n";

    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Create client address
    struct sockaddr_in client_addr;
    int client_addrlen = sizeof(client_addr);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    // Listen for incoming connections
    if (listen(sockfd, SOMAXCONN) != 0) {
        perror("webserver (listen)");
        return 1;
    }
    printf("server listening for connections\n");

    for (;;) {
        // Accept incoming connections
        int newsockfd = accept(sockfd, (struct sockaddr *)&host_addr,
                               (socklen_t *)&host_addrlen);
        if (newsockfd < 0) {
            perror("webserver (accept)");
            continue;
        }
        printf("connection accepted\n");

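        // Get the address of the connected socket
        // Note: on a connected socket, getsockname(2) reports the local
        // (server-side) address; getpeername(2) reports the client's.
        int sockn = getsockname(newsockfd, (struct sockaddr *)&client_addr,
                                (socklen_t *)&client_addrlen);
        if (sockn < 0) {
            perror("webserver (getsockname)");
            close(newsockfd); // avoid leaking the descriptor on error
            continue;
        }

        // Read from the socket, leaving room for a terminating NUL
        int valread = read(newsockfd, buffer, BUFFER_SIZE - 1);
        if (valread < 0) {
            perror("webserver (read)");
            close(newsockfd);
            continue;
        }
        buffer[valread] = '\0'; // sscanf(3) needs a NUL-terminated string

        // Read the request
        char method[BUFFER_SIZE], uri[BUFFER_SIZE], version[BUFFER_SIZE];
        if (sscanf(buffer, "%s %s %s", method, uri, version) != 3) {
            fprintf(stderr, "webserver (sscanf): malformed request line\n");
            close(newsockfd);
            continue;
        }
        printf("[%s:%u] %s %s %s\n", inet_ntoa(client_addr.sin_addr),
               ntohs(client_addr.sin_port), method, version, uri);

        // Write to the socket
        int valwrite = write(newsockfd, resp, strlen(resp));
        if (valwrite < 0) {
            perror("webserver (write)");
            close(newsockfd);
            continue;
        }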

        close(newsockfd);
    }

    return 0;
}

Conclusion / Next Steps

And that concludes our implementation of a simple HTTP server. We’ve covered the basics of implementing a simple HTTP webserver, and in doing so we’ve also learned a bit about TCP/IP, HTTP, socket programming, and system calls. The resulting code gives you a basis to start implementing a more complex HTTP webserver that can handle multiple concurrent connections, and that can serve html files from the filesystem based on the path of the request.

References

Arpaci-Dusseau, Remzi H., and Andrea C. Arpaci-Dusseau. 2018. Operating Systems: Three Easy Pieces. Arpaci-Dusseau Books. http://pages.cs.wisc.edu/~remzi/OSTEP/.
Github. 2020. “Remzi-Arpacidusseau - Ostep Concurrency Webserver - Github.” 2020. https://github.com/remzi-arpacidusseau/ostep-projects/tree/master/concurrency-webserver.
Kerrisk, M. 2010. The Linux Programming Interface: A Linux and UNIX System Programming Handbook. No Starch Press.
Mozilla. 2020. “What Is a Webserver - Mozilla.” 2020. https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_web_server.
“The GNU C Library Reference Manual.” n.d. https://www.gnu.org/software/libc/manual/.
Wikipedia. 2020. “Webserver - Wikipedia, The Free Encyclopedia.” 2020. https://en.wikipedia.org/wiki/Web_server.

  1. Arpaci-Dusseau and Arpaci-Dusseau (2018)↩︎

  2. https://github.com/remzi-arpacidusseau/ostep-projects/tree/master/concurrency-webserver↩︎

  3. Github (2020); Mozilla (2020); Wikipedia (2020)↩︎

  4. Mozilla (2020)↩︎

  5. https://tools.ietf.org/html/rfc2616↩︎

  6. https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol↩︎

  7. https://en.wikipedia.org/wiki/Application_layer↩︎

  8. https://en.wikipedia.org/wiki/Internet_protocol_suite↩︎

  9. https://github.com/remzi-arpacidusseau/ostep-projects/tree/master/concurrency-webserver#http-background↩︎

  10. https://en.wikipedia.org/wiki/C_POSIX_library↩︎

  11. https://en.wikipedia.org/wiki/File_descriptor↩︎

  12. https://tools.ietf.org/html/draft-newman-network-byte-order-01↩︎

  13. The terms derive from Jonathan Swift’s 1726 satirical novel Gulliver’s Travels, in which the terms refer to opposing political factions who open their boiled eggs at opposite ends. [Kerrisk (2010); ch. 59.2]↩︎

  14. http://www.cplusplus.com/forum/general/14828/↩︎

  15. Kerrisk (2010); ch. 56.5↩︎

  16. Kerrisk (2010); ch. 56.5.2↩︎

  17. Kerrisk (2010); ch. 4, ch. 56.1↩︎

  18. Kerrisk (2010); ch. 4.7↩︎

  19. “The GNU C Library Reference Manual” (n.d.); ch. 11.1.2↩︎