Making a simple HTTP webserver in C
November 29, 2021 - Jan Pieter Bruins Slot
Introduction
While reading and working through Operating Systems: Three Easy Pieces1, I started on one of the assignments: making a webserver concurrent. The webserver is graciously provided by the authors, who also recommend figuring things out yourself and doing some more in-depth research. So I thought it would be a cool idea to implement the webserver myself as the basis for the concurrency assignment2, and to learn more about socket programming in the process. I also thought that writing a post about the process, and sharing it, would be a good learning experience.
So, in this post we will go over creating a simple webserver in C, step-by-step. First, we’ll start with some background into webservers, and subsequently add more code to our program. I’ve tried to make it so that you’ll be able to figure out what you need to code before looking at the results. However, if you just want to look at a specific part or just at the end result, then check the Implementation sections throughout the article. You can also look at the resulting code in this repository.
So, let’s get to it!
Basics
First, it might be a good idea to get a feel for what we’re trying to create, and to investigate what it is that we actually want to make. So, let’s figure out what a webserver actually is and does.
webserver
A webserver, on the software side, is a program that satisfies client requests over HTTP and several related protocols. Its primary function is to store, process, and deliver files to clients. At a minimum this is an HTTP server: a piece of software that understands URLs (Uniform Resource Locators) and HTTP (Hypertext Transfer Protocol).
A webserver can be static, dynamic, or a combination of the two. A static webserver simply serves files “as-is”. A dynamic webserver runs an executable on the server and returns its output to the client; it is dynamic because the hosted content is updated “on-the-fly” before it is sent to the client.3
HTTP
As mentioned above, a webserver in our case should be able to understand the HTTP protocol. What does a protocol mean in this context? Well, it is a set of rules for communication between two computers. In this case it specifies how to transfer hypertext documents, meaning documents that are interconnected by hyperlinks. The protocol is textual and stateless. Textual, because all the commands are plain text, so you’ll be able to read and inspect them. Stateless, because neither the client nor the server remembers previous communications.4
It means that on the client side the application (for instance a web browser) needs to speak the same ‘language’ as the webserver in order to communicate. The ‘language’ that is used is HTTP.
Each message is either a request from the client or a response from the server. This message needs to be transported, and that is where TCP comes in.
TCP
HTTP presumes an underlying transport layer protocol that establishes host-to-host data transfer channels and manages the data exchange in a client-to-server or peer-to-peer networking model. The protocol commonly used for HTTP servers is TCP (Transmission Control Protocol), although HTTP can also be adapted to run over, for instance, UDP (User Datagram Protocol). However, because RFC 2616 states that the transport layer should be reliable,5 we will be using TCP instead of UDP.67
TCP maintains communication between application processes on different hosts (client and server), and it uses port numbers to track sessions. HTTP and TCP are part of a suite of protocols that spans every layer of the request/response cycle. This suite is also known as TCP/IP.
Internet Protocol Suite a.k.a. TCP/IP
HTTP is part of the Internet Protocol Suite and it is called an application layer protocol. The Internet Protocol Suite is a model that is commonly known as TCP/IP because of the foundational protocols that make up the Internet Protocol Suite. Namely, the Transmission Control Protocol (TCP, present on the transport layer), and Internet Protocol (IP, present on the internet layer). 8
This suite is a conceptual model, and it consists of a set of protocols used on the internet and similar computer networks. It specifies how data should be packetized, addressed, transmitted, routed and received. The model is made up of 4 abstraction layers: the application, transport, internet and link layer.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ APPLICATION LAYER ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ http, ftp, smtp, ssh, etc. │
└──────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ TRANSPORT LAYER ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ tcp, udp, etc. │
└──────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ INTERNET LAYER ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ ipv4, ipv6, etc. │
└──────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ LINK LAYER ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ arp, mac (ethernet, wifi, etc.) │
└──────────────────────────────────┘
We stated above that “HTTP presumes an underlying transport layer”. From the Internet Protocol Suite we can see that there are several available, and we already stated that TCP is commonly used. We can also see that HTTP is in the application layer, next to other protocols that you might be familiar with, such as FTP, Telnet, SSH, and SMTP. In essence HTTP is layered over TCP and uses it to transport its message data. In turn TCP is layered over IP, which makes sure the data ends up at the right location.
When whichever program you are using (in the case of HTTP, a browser for instance) sends or receives data at the application layer, it talks to the transport layer through a port. Each port can be assigned to a different application layer protocol, so that TCP knows which application the data belongs to. For HTTP the default port is 80.
Request / Response
As mentioned above: “HTTP is a set of rules for communication”. These rules are implemented in the request and response messages. You’re probably already familiar with their structure. A request message consists of the following: a request line, the request header fields, an empty line, and an optional message body. In the following diagram you can see how the request and response messages are built up when we use curl to access the webserver we are going to create.
             ┌─┐                                          ┌─┐
             └┬┘              ―――――――――――――▶              ╞ │
             ▔▔▔              ◀―――――――――――――              └─┘
   client request message                       server response message
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓                ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ GET / HTTP/1.1            ┃  request line  ┃ HTTP/1.0 200 OK           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━┩                ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Host: localhost:8080      │    headers     │ Server: webserver-c       │
│ User-Agent: curl/7.52.1   │                │ Content-type: text/html   │
│ Accept: */*               │                │                           │
└───────────────────────────┘                ├───────────────────────────┤
                                   body      │ <html>hello, world</html> │
                                             └───────────────────────────┘
We can even inspect what curl is sending and receiving, and we can see that everything is just in plain text. Pretty cool!
$ curl -vs http://localhost:8080
* Rebuilt URL to: http://localhost:8080/
* Trying ::1...
* TCP_NODELAY set
* connect to ::1 port 8080 failed: Connection refused
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.52.1
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: webserver-c
< Content-type: text/html
<
<html>hello, world</html>
* Curl_http_done: called premature == 0
* Closing connection 0
When a request is made by the application layer, the message passes down through the layers on one side, and back up through the layers on the other side (represented by the solid line). Logically, each layer talks to the corresponding layer on the other side (represented by the dashed line).
┌─┐ ┌─┐
└┬┘ ╞ │
▔▔▔ └─┘
client server
┏━━━━━━━━━━━━━┓╷╷ ╭╮┏━━━━━━━━━━━━━┓
┃ APPLICATION ┃││ ◀---▶ ││┃ APPLICATION ┃
┗━━━━━━━━━━━━━┛││ ││┗━━━━━━━━━━━━━┛
┏━━━━━━━━━━━━━┓│▲ ▼│┏━━━━━━━━━━━━━┓
┃ TRANSPORT ┃││ ◀---▶ ││┃ TRANSPORT ┃
┗━━━━━━━━━━━━━┛││ ││┗━━━━━━━━━━━━━┛
┏━━━━━━━━━━━━━┓││ ││┏━━━━━━━━━━━━━┓
┃ INTERNET ┃││ ◀---▶ ││┃ INTERNET ┃
┗━━━━━━━━━━━━━┛▼│ │▲┗━━━━━━━━━━━━━┛
┏━━━━━━━━━━━━━┓││ ││┏━━━━━━━━━━━━━┓
┃ LINK ┃││ ◀---▶ ││┃ LINK ┃
┗━━━━━━━━━━━━━┛│╰───────╯│┗━━━━━━━━━━━━━┛
╰─────────╯
So, to recap: we’re creating an application (a webserver) that is able to receive and send plain text messages that adhere to the rules of HTTP. Those messages travel over a transport layer. Our application will use TCP as this transport layer, so we need to write the code that sets up TCP communication. The protocols themselves are part of the Internet Protocol Suite (TCP/IP), which is provided by the operating system.9
Implementation
We can use the man pages as a reference for how to start implementing this. Throughout this article we will be using man pages to get all the information we need to implement our webserver. The first man page we can look at is:
$ man 7 tcp
NAME
tcp - TCP protocol
SYNOPSIS
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
tcp_socket = socket(AF_INET, SOCK_STREAM, 0);
DESCRIPTION
...
A newly created TCP socket has no remote or local address and is
not fully specified. To create an outgoing TCP connection use
connect(2) to establish a connection to another TCP socket. To
receive new incoming connections, first bind(2) the socket to a
local address and port and then call listen(2) to put the socket
into the listening state. After that a new socket for each
incoming connection can be accepted using accept(2). A socket
which has had accept(2) or connect(2) successfully called on it
is fully specified and may transmit data. Data cannot be
transmitted on listening or not yet connected sockets.
...
NOTE: the 7 stands for the section number the page is from, and you can check what each section is by typing man man. Typically, man pages are referred to using the notation name(section), since the same name can be present in different sections. Throughout this document we will use this notation so that you’ll be able to inspect the man pages. If you’re trying to find a specific man page, you can use the apropos {name} command to search for name throughout the man pages.
From this man page we can read that we need to create a ‘socket’, then ‘bind’ it to a local address and port, and then put it into the ‘listen’ state. After that we’re able to ‘accept’ incoming connections. For each accepted connection a new socket is created, which we can read from and write to. The following diagram gives a bit of an overview of what we need to implement.
┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━━━┓
┃ SOCKET ┃ ▶ ┃ BIND ┃ ▶ ┃ LISTEN ┃ ▶ ┃ ACCEPT ┃ ▶ ┃ READ/WRITE ┃
┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━━━┛
The tcp(7) man page also states, as we’ve uncovered above, that it is layered upon ip(7), so let’s also take a look at the man page for that.
$ man 7 ip
NAME
ip - Linux IPv4 protocol implementation
SYNOPSIS
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h> /* superset of previous */
tcp_socket = socket(AF_INET, SOCK_STREAM, 0);
udp_socket = socket(AF_INET, SOCK_DGRAM, 0);
raw_socket = socket(AF_INET, SOCK_RAW, protocol);
Ok, cool! This gives us some more information on how to create sockets using other protocols, like udp, and raw. We’ll keep that in mind for further on in this article. But first let’s start with setting up our project.
Setting up
First, let’s start with setting up our environment in which we want to develop. We will start very simple, just to make sure everything is working:
// ./steps/step000.c
#include <stdio.h>

int main() {
    printf("hello, world\n");
    return 0;
}
Next we need to compile it, and we will be using gcc (the command below assumes the code is saved as webserver.c).
$ gcc -Wall webserver.c -o webserver
Let’s check if everything worked.
$ ./webserver
hello, world
Ok, now that we’re set up, let’s get started with implementing our socket.
Implement the socket
From what we’ve read in the tcp(7) man page, we need to implement a TCP socket. But let’s first inspect what a socket is; we can use the man pages for this as well.
$ man socket
NAME
socket - create an endpoint for communication
SYNOPSIS
#include <sys/types.h> /* See NOTES */
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
DESCRIPTION
socket() creates an endpoint for communication and returns a file
descriptor that refers to that endpoint. The file descriptor returned by
a successful call will be the lowest-numbered file descriptor not
currently open for the process.
...
So, a socket is an endpoint for communication. Furthermore, we can read that we need to include the <sys/socket.h> header file, and that we can create a socket endpoint by using the function socket(2). This function returns a file descriptor, which is an integer. The arguments that it accepts are domain, type, and protocol. We will look at the individual arguments, and investigate how they need to be set, in the following sections.
domain
The argument domain is an integer that specifies a communication domain; it selects the protocol family that will be used for communication. These families are defined as constants in the header file <sys/socket.h>, and we can reference them by name and use them as the domain argument.
See the man page for an overview of the families you’re able to choose from. Since we’re creating a webserver that uses TCP, we will be using AF_INET, which selects the IPv4 Internet protocols.
NOTE: I wanted to know what this header file looks like, and I was able to inspect it further by installing the POSIX man pages. These are described as “Manual pages about using a POSIX system for development”, and they give more information about the specification of a C standard library for POSIX systems. POSIX specifies a number of routines that should be available in a basic C standard library; how they are provided depends on how the standard C library is implemented on a system. The most commonly used implementation on Linux is the GNU C Library, glibc. With these man pages we can thus reference the specification.10 You can also find these pages on the site of The Open Group.
How I installed the POSIX man pages on a Debian-based Linux distribution:
$ sudo apt install manpages-posix-dev
I was then able to reference the man page for <sys/socket.h> with the following command:
$ man sys_socket.h
type
The argument type specifies the ‘communication semantics’. So, which socket type do we need to use here? Well, we said we wanted to create a TCP webserver, so which of the options resembles that? Let’s refer back to the man pages tcp(7) and ip(7). There we can see that the valid socket type for a TCP socket is SOCK_STREAM. SOCK_STREAM is a full-duplex byte stream, characterized as a type that ensures that data is not lost or duplicated.
protocol
protocol, according to the man page, is the particular protocol to be used with the socket. Commonly only a single protocol exists to support a particular socket type. We are choosing SOCK_STREAM as the type, and as the ip(7) man page states, protocol is the IP protocol in the IP header to be received or sent. For a TCP socket a valid value is 0, so that is what we will pass.
return value
The function int socket(int domain, int type, int protocol) returns an integer, which is a file descriptor for the socket.11 A file descriptor is a number that uniquely identifies an open file within the process; in this case the open file is our socket, and just as with a regular file we will be able to read from and write to it. When an error occurs, socket(2) returns -1 and sets errno, which we can use to handle errors properly.
errno
In the last section we saw that errno is set when the socket function fails. So what is this errno? Let’s check if there is a man page about it.
$ man errno
NAME
errno - number of last error
SYNOPSIS
#include <errno.h>
DESCRIPTION
The <errno.h> header file defines the integer variable errno, which is
set by system calls and some library functions in the event of an error
to indicate what went wrong.
errno is an integer variable that is set to signify what exactly has gone wrong. In order to inspect what kind of error was raised, we can use perror(3) to print the error; it translates the error code stored in the variable errno into a human-readable form. Let’s check the man page for perror(3).
$ man 3 perror
NAME
perror - print a system error message
SYNOPSIS
#include <stdio.h>
void perror(const char *s);
DESCRIPTION
The perror() function produces a message on standard error describing
the last error encountered during a call to a system or library
function.
...
We can call perror(3) with a string as the argument s. It prints that string, followed by a colon and the error message that corresponds to the current value of errno.
Implementation
Now that we know how we should use the socket(2) function, let’s update our file with what we have discussed above. It should resemble something like this:
// ./steps/step001.c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

int main() {
    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    return 0;
}
When we check our diagram from the section Basics, we can see that we’ve now created the socket. But we still need to bind it to an address; otherwise the socket cannot receive connections, as we have read in the man page tcp(7).
┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓
┃ SOCKET ┃ ▶ ┃ BIND ┃ ▶ ┃ LISTEN ┃ ▶ ┃ ACCEPT ┃ ▶ ┃ READ/WRITE ┃
┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛
───────────▶
So our next step will be to bind the socket to a local address and port.
Bind the socket to an address
The socket is created and exists in a namespace (an address family; the AF in AF_INET stands for address family), and we need to bind the socket to a local address in order for it to receive connections. We use the bind(2) function for this, so let’s check the bind(2) man page to see how we need to implement this.
$ man bind
NAME
bind - bind a name to a socket
SYNOPSIS
#include <sys/types.h> /* See NOTES */
#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *addr,
socklen_t addrlen);
DESCRIPTION
When a socket is created with socket(2), it exists in a name space
(address family) but has no address assigned to it. bind() assigns the
address specified by addr to the socket referred to by the file
descriptor sockfd. addrlen specifies the size, in bytes, of the address
structure pointed to by addr. Traditionally, this operation is called
“assigning a name to a socket”.
It is normally necessary to assign a local address using bind() before
a SOCK_STREAM socket may receive connections (see accept(2)).
...
We can see that bind(2) is declared in the header file <sys/socket.h>, and that on success it returns zero. It accepts as arguments sockfd, addr, and addrlen. Let’s go over the arguments, and make sense of what we need to do in order to implement it.
sockfd
This is the file descriptor that we created with socket(2) in the last section. We pass it here as the first argument.
addr
This defines the address structure that we want to bind the socket to, and its layout depends on the address family we’re using. So let’s check what addr needs to look like. We can inspect the rules used in name binding by referencing the man page of the communication domain we’re using, AF_INET.
$ man 7 ip
From the section ‘Address format’ we can see an example. The address structure will look like the following:
struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    in_port_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */
};

/* Internet address. */
struct in_addr {
    uint32_t       s_addr;     /* address in network byte order */
};
sin_family is always set to AF_INET, and sin_port contains the port in network byte order. Network byte order describes how bytes are arranged when data is sent over a network: one order must be chosen so that the machines on both ends interpret numbers the same way, independent of their CPU architecture. Network byte order is big endian.
For example, an integer value of 1 stored in 4 bytes is laid out on ‘big endian’ machines as 0 0 0 1, and on ‘little endian’ machines as 1 0 0 0. If the bytes 0 0 0 1 from the ‘big endian’ machine were read unchanged by the ‘little endian’ machine, it would interpret them as the value 16777216, and vice versa.1213
As such, like the man page states, we need to call htons(3) on the number that is assigned to the port, like so: htons(8080). It converts the number from host byte order to network byte order. See the man page for htons(3) for more information.
sin_addr contains the host interface address in network byte order, and it is a struct of type in_addr. The man page states that it should be assigned one of the INADDR_* values, which are defined as symbolic constants in the header file <netinet/in.h>, or be set using one of the inet_aton(3), inet_addr(3), or inet_makeaddr(3) library functions to specify a specific address. We can also inspect the POSIX man page to see how the header file should be implemented on systems: man netinet_in.h.
We will make use of the symbolic constant INADDR_ANY, which means ‘any address’ and translates to 0.0.0.0. Because INADDR_ANY is zero, it reads the same in host and network byte order, so converting it changes nothing. The man page advises us to convert anyway, so let’s just do that by calling htonl(3) on the address. But why are we using 0.0.0.0 here? Your machine has one IP address for each network interface. When your machine has, for example, both a Wi-Fi and an ethernet connection, it has two addresses, one for each interface. When we don’t care which interface is going to be used, we bind to the special address 0.0.0.0, defined by the symbolic constant INADDR_ANY, and the socket will accept connections on all of them.
addrlen
The addrlen argument specifies the size of the address structure addr in bytes. To get this we can use the sizeof operator (it looks like a function, but it is an operator like && and ||). The argument is of type socklen_t, which is an integer type, and we can get some background on this type by inspecting the man page of accept(2).
In the original BSD sockets implementation (and on other older systems)
the third argument of accept() was declared as an int *. A POSIX.1g
draft standard wanted to change it into a size_t *C; later POSIX
standards and glibc 2.x have socklen_t *.
return value
On success the return value of bind(2) is zero. When an error occurs it returns -1, and errno is set.
Implementation
From what we have seen, we’re able to implement the bind(2) function. First we create the address structure, and then we bind it to the socket. The updated code will look something like this:
// ./steps/step002.c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

#define PORT 8080

int main() {
    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    return 0;
}
Note that we are typecasting the address of host_addr to the pointer type struct sockaddr * in the addr argument of the bind(2) function. Since host_addr is of the type struct sockaddr_in, we need to cast its address to struct sockaddr *. From the man page bind(2) we can read: “The only purpose of this structure (sockaddr) is to cast the structure pointer passed in addr in order to avoid compiler warnings.” In essence what we are saying here is: whatever addr is pointing to, treat it like a sockaddr.14
┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓
┃ SOCKET ┃ ▶ ┃ BIND ┃ ▶ ┃ LISTEN ┃ ▶ ┃ ACCEPT ┃ ▶ ┃ READ/WRITE ┃
┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛
───────────▶ ───────────▶
Referring back to our diagram, we’ve now also bound the socket to a specific address. Now we are ready to listen for incoming connections. So let’s implement that.
Listen
We’ve created a socket and bound it to a local address; now we need to make sure that the socket is listening for incoming connections. We do that by using the listen(2) function, which makes the socket available for incoming connections. Let’s see what the man pages can show us on how to use listen(2).
$ man 2 listen
NAME
listen - listen for connections on a socket
SYNOPSIS
#include <sys/types.h> /* See NOTES */
#include <sys/socket.h>
int listen(int sockfd, int backlog);
DESCRIPTION
listen() marks the socket referred to by sockfd as a passive socket,
that is, as a socket that will be used to accept incoming connection
requests using accept(2).
...
As we can see, the listen(2) function will put the socket into ‘passive’ mode. Stream sockets are often described as either ‘active’ or ‘passive’.
When a socket is created with the socket(2) function, it is active. Such a socket can be used in the connect(2) function to establish a connection to a ‘passive’ socket.
A socket becomes passive when it is passed to the listen(2) function; a passive socket allows incoming connections.
In most applications that use stream sockets, the server performs the so-called ‘passive socket open’, and the client an ‘active socket open’. Since we’re creating an HTTP webserver and using the listen(2) function to listen for incoming connections, the socket that we’ve created will be a passive socket, used to accept connections from other (active) sockets.15
sockfd
Again, this is the file descriptor of the socket, so we will use the sockfd that we created in the section Implement the socket.
backlog
This integer defines how many pending connections can be queued up for the socket sockfd before further connections are refused. So, it limits the number of connections that have been established but not yet handled by the application; they wait in the queue until accept(2) takes them off. When the queue is full, further connection requests may be refused or ignored until a pending connection is accepted. For now, we will set this to 128.
From the ‘NOTES’ section of listen(2):
...
The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now
it specifies the queue length for completely established sockets waiting to be
accepted, instead of the number of incomplete connection requests. The maximum
length of the queue for incomplete sockets can be set using
/proc/sys/net/ipv4/tcp_max_syn_backlog. When syncookies are enabled there is no
logical maximum length and this setting is ignored. See tcp(7) for more
information.
If the backlog argument is greater than the value in
/proc/sys/net/core/somaxconn, then it is silently truncated to that value;
the default value in this file is 128. In kernels before 2.4.25, this limit was
a hard coded value, SOMAXCONN, with the value 128.
...
The symbolic constant SOMAXCONN in <sys/socket.h> is defined by our system (128 in the case of the Linux system used here), and we can use it to set the backlog argument (see man sys_socket.h).
return value
On success zero is returned. On failure -1 is returned and, as before, errno is set, so we can check and handle it accordingly.
Implementation
With the above information we are able to implement the listen(2) function, so let’s update our code:
// ./steps/step003.c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

#define PORT 8080

int main() {
    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    // Listen for incoming connections
    if (listen(sockfd, SOMAXCONN) != 0) {
        perror("webserver (listen)");
        return 1;
    }
    printf("server listening for connections\n");

    return 0;
}
Referring back to our diagram, we’ve created a socket, bound it to a local address, and we’ve put the socket into ‘passive’ mode. Now we can listen for incoming connections.
┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓
┃ SOCKET ┃ ▶ ┃ BIND ┃ ▶ ┃ LISTEN ┃ ▶ ┃ ACCEPT ┃ ▶ ┃ READ/WRITE ┃
┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛
───────────▶ ───────────▶ ───────────▶
So, on to accept those connections.
Accept
Now we’re ready to make sure the socket will accept connections. We need to use the accept(2) function, so let’s check the man pages again to see how we need to implement this.
$ man 2 accept
NAME
accept, accept4 - accept a connection on a socket
SYNOPSIS
#include <sys/types.h> /* See NOTES */
#include <sys/socket.h>
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
DESCRIPTION
The accept() system call is used with connection-based socket types
(SOCK_STREAM, SOCK_SEQ‐ PACKET). It extracts the first connection
request on the queue of pending connections for the listening socket,
sockfd, creates a new connected socket, and returns a new file
descriptor referring to that socket. The newly created socket is not in
the listening state. The original socket sockfd is unaffected by this
call.
...
So the accept(2) function takes the first connection from the queue of the listening socket sockfd. It then creates a new connected socket, and returns the file descriptor that refers to that socket. The newly created socket is not in a listening state; the original socket is not affected by this call and can be used to accept other connections. When there are no pending connections at the time accept(2) is called, the call blocks until a new connection arrives.
Again, let’s look at the arguments that we need to provide to accept connections.
sockfd
Like before, we will use the original socket that was created in Implement the socket; here sockfd is the file descriptor of that socket.
addr
The argument addr is a pointer to a sockaddr struct. In this article we pass a pointer to the address struct that we created when binding the original socket. Note that accept(2) uses this struct as an output buffer: it fills it in with the address of the connecting peer.
addrlen
The addrlen argument is a value-result argument: it points to the size of the buffer pointed to by the argument addr. Because accept(2) can be used with multiple protocol families, we need to provide the size of the address structure that we are using. A pointer is used because “the caller must initialize it to contain the size (in bytes) of the structure pointed to by addr; on return it will contain the actual size of the peer address.” This way the kernel knows how much space is available to return the socket address. Upon return from the accept(2) function, the value of addrlen is set to indicate the number of bytes of data actually stored by the kernel in the socket address structure.16
When binding our socket (Bind the socket to an
address), we’ve already created our addrlen
variable
with the size of the sockaddr
struct, so we can just pass
it to the accept(2)
function. However, the original
variable was an int
, so we need to typecast it to
socklen_t *
to make it work.
return value
It will return a non-negative integer that is a file descriptor for
the accepted socket. On error, it will return -1, and errno will be set.
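The value-result behaviour of addrlen can be sketched as follows. This is a minimal sketch, not the article's code: accept_client is a hypothetical helper name, and using a separate struct for the peer plus a socklen_t length (which removes the need for a cast) is an assumption about how one might arrange it.

```c
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical helper: accept one connection on listen_fd and store the
// address of the connecting client in *peer.
// Returns the connected socket, or -1 with errno set by accept(2).
int accept_client(int listen_fd, struct sockaddr_in *peer) {
    // Value-result argument: on entry it holds the size of *peer,
    // on return it holds the number of bytes the kernel stored there.
    socklen_t peer_len = sizeof(*peer);
    return accept(listen_fd, (struct sockaddr *)peer, &peer_len);
}
```

Keeping the peer address in its own struct leaves the address used for bind(2) untouched, at the cost of one extra variable.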
Implementation
Because we want to continue accepting new connections, we will put the
accept(2) call in an infinite loop. Note that we also need to close the
file descriptor that accept(2) returns. We can close the socket by
calling the close(2) function.
$ man 2 close
NAME
close - close a file descriptor
SYNOPSIS
#include <unistd.h>
int close(int fd);
DESCRIPTION
close() closes a file descriptor, so that it no longer refers to any
file and may be reused. Any record locks (see fcntl(2)) held on the file
it was associated with, and owned by the process, are removed
(regardless of the file descriptor that was used to obtain the lock).
...
When we’re done with the socket we can pass it as the argument fd to
close(2). This will close the file descriptor, so that it no longer
refers to any file and may be reused. When we update our code and
implement the accept(2) call, it should resemble the following:
// ./steps/step004.c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>
#define PORT 8080

int main() {
    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    // Listen for incoming connections
    if (listen(sockfd, SOMAXCONN) != 0) {
        perror("webserver (listen)");
        return 1;
    }
    printf("server listening for connections\n");

    for (;;) {
        // Accept incoming connections
        int newsockfd = accept(sockfd, (struct sockaddr *)&host_addr,
                               (socklen_t *)&host_addrlen);
        if (newsockfd < 0) {
            perror("webserver (accept)");
            continue;
        }
        printf("connection accepted\n");

        close(newsockfd);
    }

    return 0;
}
And when we check our diagram again, we can see that we’ve implemented
the accept(2) function.
┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓
┃ SOCKET ┃ ▶ ┃ BIND ┃ ▶ ┃ LISTEN ┃ ▶ ┃ ACCEPT ┃ ▶ ┃ READ/WRITE ┃
┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛
───────────▶ ───────────▶ ───────────▶ ───────────▶
We’re now ready to start reading from and writing to the socket.
Read
Recall that in the man page of socket(2) we read the following:
...
A connection to another socket is created with a connect(2) call. Once
connected, data may be transferred using read(2) and write(2) calls or some
variant of the send(2) and recv(2) calls. When a session has been completed
a close(2) may be performed.
...
We can read and write by using the read(2) and write(2) functions, or
some variant of the send(2) and recv(2) calls. From the man page of
send(2) we can read that send(2) provides extra flags that we might use.
In this case we won’t be using those, so we can stick with read(2) and
write(2). 17
Because we’ve set up a connection between the client and the server, we
can read the request of the client. Since we got a file descriptor from
the accept(2) function, we will be able to use the read(2) function to
read the data that has been sent by the client. Let’s check the man
pages on how we’re able to use the read(2) function.
$ man 2 read
NAME
read - read from a file descriptor
SYNOPSIS
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
DESCRIPTION:
read() attempts to read up to count bytes from file descriptor fd into
the buffer starting at buf.
...
The function read(2) will read up to count bytes from the file
descriptor fd into the buffer starting at buf. On success it will return
the number of bytes that were read, and the file position is advanced by
this number. On error, -1 is returned and errno will be set as well.
The file position keeps track of where in the file the next character is to be read or written. This is the ‘offset’ recorded by the kernel. 18 On all POSIX.1 systems, the file position is an integer representing the number of bytes from the beginning of the file. The file position is normally set to the beginning of the file when it is opened, and each time a character is read or written, the file position is incremented sequentially. 19
fd
This argument fd needs to be the file descriptor; this is the new
socket that was returned by the accept(2) function.
buf
The buf argument needs to be a pointer to the memory buffer that you
want the contents of the file descriptor fd to be read into as temporary
storage. This buffer must be at least count bytes long. In our case we
will be creating a buffer that is an array of the type char. And because
an array name is converted to a pointer to its first element, we can use
the variable name of the buffer as the argument.
count
We need to provide how many bytes we want to read from the file
descriptor fd into the buffer. This depends on how large a buffer
you’re creating. In this example we’ll create an array of 1024
characters (BUFFER_SIZE in the code).
The type size_t is an unsigned integer type. It is commonly used by the
standard library to represent sizes and counts. Its specific size is
platform dependent.
return value
The return value is the number of bytes that were read into the
buffer, or 0 if the end of the file has been reached. On error, -1 is
returned, and errno is set appropriately. The ssize_t type is a ‘signed’
integer type; again it is commonly used by the standard library to
represent sizes and counts, and it holds the byte count of what was read
into the buffer.
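Because read(2) may return fewer bytes than requested, a server that wants the complete request headers has to call it in a loop. The following is a minimal sketch under stated assumptions: read_headers is a hypothetical helper (it is not part of the article's code), and stopping at the first blank line ("\r\n\r\n") is an assumption that fits simple requests without a body.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: read from fd until the blank line that ends the HTTP
// headers is seen, the peer closes the connection, or the buffer is full.
// The buffer is always null-terminated.
// Returns the number of bytes stored, or -1 on error.
ssize_t read_headers(int fd, char *buf, size_t cap) {
    size_t used = 0;

    if (cap == 0) {
        return -1;
    }
    buf[0] = '\0';

    while (used < cap - 1) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n < 0) {
            if (errno == EINTR) {
                continue; // interrupted by a signal: retry
            }
            return -1;
        }
        if (n == 0) {
            break; // end of file: the peer closed the connection
        }
        used += (size_t)n;
        buf[used] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL) {
            break; // blank line found: the headers are complete
        }
    }
    return (ssize_t)used;
}
```

Reading at most cap - 1 bytes leaves room for the terminating null byte, so the result can be handed to string functions afterwards.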
Implementation
First, we need to implement the buffer; we will do this as soon as
the program begins. For now, we will create an array of the type char
with a size of 1024 (each char is 1 byte). Don’t forget to include the
header file unistd.h.
You’ll also be able to check the number of bytes that are read, and continue reading when the limit of the buffer is reached. However for now we’ll keep it simple.
Additionally, because the request is most likely sent using the HTTP
protocol, we can expect a certain format of the request, as we’ve
uncovered in the beginning. Here you’ll be able to inspect the contents
of the request and return data based on it, such as returning a specific
requested html file, handling GET and POST requests, and handling errors
for when the request wasn’t using the HTTP protocol, for instance.
// ./steps/step005.c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>
#define PORT 8080
#define BUFFER_SIZE 1024

int main() {
    char buffer[BUFFER_SIZE];

    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    // Listen for incoming connections
    if (listen(sockfd, SOMAXCONN) != 0) {
        perror("webserver (listen)");
        return 1;
    }
    printf("server listening for connections\n");

    for (;;) {
        // Accept incoming connections
        int newsockfd = accept(sockfd, (struct sockaddr *)&host_addr,
                               (socklen_t *)&host_addrlen);
        if (newsockfd < 0) {
            perror("webserver (accept)");
            continue;
        }
        printf("connection accepted\n");

        // Read from the socket
        int valread = read(newsockfd, buffer, BUFFER_SIZE);
        if (valread < 0) {
            perror("webserver (read)");
            continue;
        }

        close(newsockfd);
    }

    return 0;
}
Write
Now that we are able to read the message that the client has sent to us, we also want to send something back to the client. Because we’re implementing a webserver, we’re going to return a simple webpage. Again, because we are implementing an HTTP webserver we need to adhere to the HTTP protocol. That means that we need to structure our response according to its rules.
We will be using the same socket, returned by the accept(2) function,
that we’ve just read from. Because this socket is a file descriptor we
will, just as with read(2), be able to write to this socket using the
write(2) function.
$ man 2 write
NAME
write - write to a file descriptor
SYNOPSIS
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
write() writes up to count bytes from the buffer starting at buf to the
file referred to by the file descriptor fd.
...
The function write(2) will write up to count bytes from the buffer
pointed to by buf to the file referenced by the file descriptor fd. On
success it will return the number of bytes written. On error, -1 is
returned and errno will be set appropriately.
fd
As mentioned above, the argument fd is the file descriptor that
references the socket we’ve created by calling the accept(2) function.
This is also the same file descriptor from which we read the request
with the function read(2).
buf
The buf argument needs to be a pointer to the data that we want to
write as a response. For now we will use a pre-defined string that we
will add to the server. We can eventually extend this webserver to serve
actual html files. But for now we will add the following:
char resp[] = "HTTP/1.0 200 OK\r\n"
"Server: webserver-c\r\n"
"Content-type: text/html\r\n\r\n"
"<html>hello, world</html>\r\n";
Note that the string is formatted following the HTTP protocol (see:
Basics). We start with the status line, followed by the headers, and it
ends with the body. The escape sequence \r\n terminates the status line
and each header, and an empty line (a second \r\n) separates the headers
from the body. The escape code \r stands for carriage return and will
set the cursor at the beginning of the line, and \n stands for new line
and will move the cursor to a new line.
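To make the three parts of the response visible, the same string can be assembled from a body at run time. This is a minimal sketch, assuming a hypothetical helper named build_response that is not part of the article's code; the status line and headers are the ones used in this post.

```c
#include <stdio.h>

// Hypothetical helper: format a minimal HTTP/1.0 response around an HTML body.
// Returns the length of the response, or -1 if it does not fit in out.
int build_response(char *out, size_t cap, const char *body) {
    int n = snprintf(out, cap,
                     "HTTP/1.0 200 OK\r\n"          // status line
                     "Server: webserver-c\r\n"      // header
                     "Content-type: text/html\r\n"  // header
                     "\r\n"                         // empty line ends the headers
                     "%s",                          // body
                     body);
    if (n < 0 || (size_t)n >= cap) {
        return -1; // formatting failed or the output was truncated
    }
    return n;
}
```

The returned length can be passed on as the count of bytes to send, so the string does not have to be measured a second time.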
count
The argument count is the number of bytes we need to write to the
file fd from the buffer buf. Because we want to write the complete
contents, we need to know how many bytes there are in the buffer. We do
that by using strlen().
$ man 3 strlen
NAME
strlen - calculate the length of a string
SYNOPSIS
#include <string.h>
size_t strlen(const char *s);
DESCRIPTION
The strlen() function calculates the length of the string pointed to by
s, excluding the terminating null byte ('\0').
RETURN VALUE
The strlen() function returns the number of characters in the string
pointed to by s.
So, we can provide the s argument and we will get the number of
characters in the string pointed to by s.
return value
The return value of the write(2) function is the number of bytes
written to the file, and its type is ssize_t. We noted before that
size_t is used to represent sizes and counts; ssize_t is the signed
version of size_t, which means that it can hold values less than zero.
In this case a value of -1 means that an error occurred, and errno will
be set appropriately. When the return value is zero, it indicates that
nothing was written. A return value that is smaller than the number of
bytes requested is not an error; it means that only part of the buffer
was written.
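Since a short write is not an error, a caller that needs the whole response delivered has to retry with the remaining bytes. The following is a minimal sketch of that loop; write_all is a hypothetical helper name and is not part of the article's code, which calls write(2) once.

```c
#include <errno.h>
#include <unistd.h>

// Hypothetical helper: call write(2) repeatedly until len bytes of buf
// have been written to fd.
// Returns 0 on success, or -1 on error with errno set by write(2).
int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue; // interrupted by a signal: retry
            }
            return -1;
        }
        p += n;            // skip the bytes that were written
        len -= (size_t)n;  // and keep going with the rest
    }
    return 0;
}
```

With such a helper, the call in the server loop would become write_all(newsockfd, resp, strlen(resp)).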
Implementation
We will implement this by first creating the response buffer, following
the instructions above. Once we are done writing to the newly created
socket, we also need to close it. We’ve already called close(2) in the
last section when we used the read(2) function. Before we close the
socket, though, we want to write to it. So put the write(2) call above
the close(2) call.
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#define PORT 8080
#define BUFFER_SIZE 1024

int main() {
    char buffer[BUFFER_SIZE];
    char resp[] = "HTTP/1.0 200 OK\r\n"
                  "Server: webserver-c\r\n"
                  "Content-type: text/html\r\n\r\n"
                  "<html>hello, world</html>\r\n";

    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    // Listen for incoming connections
    if (listen(sockfd, SOMAXCONN) != 0) {
        perror("webserver (listen)");
        return 1;
    }
    printf("server listening for connections\n");

    for (;;) {
        // Accept incoming connections
        int newsockfd = accept(sockfd, (struct sockaddr *)&host_addr,
                               (socklen_t *)&host_addrlen);
        if (newsockfd < 0) {
            perror("webserver (accept)");
            continue;
        }
        printf("connection accepted\n");

        // Read from the socket
        int valread = read(newsockfd, buffer, BUFFER_SIZE);
        if (valread < 0) {
            perror("webserver (read)");
            continue;
        }

        // Write to the socket
        int valwrite = write(newsockfd, resp, strlen(resp));
        if (valwrite < 0) {
            perror("webserver (write)");
            continue;
        }

        close(newsockfd);
    }

    return 0;
}
And that concludes the implementation of all the steps from the section Basics.
┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓
┃ SOCKET ┃ ▶ ┃ BIND ┃ ▶ ┃ LISTEN ┃ ▶ ┃ ACCEPT ┃ ▶ ┃ READ/WRITE ┃
┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛
───────────▶ ───────────▶ ───────────▶ ───────────▶ ───────────▶
All that is left to do is to compile and run the program!
Let’s run it!
We’ll go to the command line and run the following command:
$ gcc -Wall webserver.c -o webserver
$ ./webserver
Now, you should be able to open your browser and check: http://localhost:8080, and you should be greeted by the ‘hello, world’ message. Let’s also try to implement some logging to the terminal, so we can see who is making the request and what the request was.
Client address
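In order to log address information for the connection we can use the
function getsockname(2). Note that getsockname(2) reports the local
address of the socket it is given; the address of the remote peer is
what accept(2) stores in its addr argument, and it can also be queried
with getpeername(2). Let’s see what the man page says: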
NAME
getsockname - get socket name
SYNOPSIS
#include <sys/socket.h>
int getsockname(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
DESCRIPTION
getsockname() returns the current address to which the socket sockfd is
bound, in the buffer pointed to by addr. The addrlen argument should be
initialized to indicate the amount of space (in bytes) pointed to by
addr. On return it contains the actual size of the socket address.
The returned address is truncated if the buffer provided is too small;
in this case, addrlen will return a value greater than was supplied to
the call.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is
set appropriately.
sockfd
So from the man page we read that we can get the current address to
which the socket sockfd is bound. accept(2) gave us a new connected
socket, called newsockfd, so we can use that in the getsockname(2) call.
addr
This argument should be a pointer to a struct sockaddr structure, and
should look familiar since we’ve used it before when we called bind(2).
So we will use the same kind of structure.
addrlen
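Like the addrlen argument we’ve used in the accept(2) function, this
argument is a pointer to a socklen_t, which is an integer type that
holds the size of the address structure (it is not a struct). Again we
will do the same as we did with accept(2): pass the address of a
variable holding that size, cast to socklen_t *.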
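The difference between the local and the remote end of a connected socket can be sketched with two small wrappers. This is a minimal sketch: local_address and peer_address are hypothetical helper names, not part of the article's code; getsockname(2) and getpeername(2) are the standard calls and share the same argument list.

```c
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical helper: address that our own end of the socket is bound to.
// Returns 0 on success, -1 on error with errno set.
int local_address(int fd, struct sockaddr_in *out) {
    socklen_t len = sizeof(*out); // value-result: size in, bytes stored out
    return getsockname(fd, (struct sockaddr *)out, &len);
}

// Hypothetical helper: address of the remote end (the client) of a
// connected socket. Returns 0 on success, -1 on error with errno set.
int peer_address(int fd, struct sockaddr_in *out) {
    socklen_t len = sizeof(*out);
    return getpeername(fd, (struct sockaddr *)out, &len);
}
```

For a socket returned by accept(2) on this server, local_address would be expected to report port 8080, while peer_address reports the port chosen by the client.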
Implementation
Once we’ve implemented the getsockname(2) call, we will be able to
print the IP address and port stored in the address struct. These are
available from the sin_addr and sin_port fields of the struct
sockaddr_in structure.
We need to convert them to a representation that we can print. For that
we will use the inet_ntoa(3) function (Internet host address, given in
network byte order, to a string in IPv4 dotted-decimal notation), and
the ntohs(3) function (network byte order to host byte order, for a
short integer).
// ./steps/step007.c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#define PORT 8080
#define BUFFER_SIZE 1024

int main() {
    char buffer[BUFFER_SIZE];
    char resp[] = "HTTP/1.0 200 OK\r\n"
                  "Server: webserver-c\r\n"
                  "Content-type: text/html\r\n\r\n"
                  "<html>hello, world</html>\r\n";

    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Create client address
    struct sockaddr_in client_addr;
    int client_addrlen = sizeof(client_addr);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    // Listen for incoming connections
    if (listen(sockfd, SOMAXCONN) != 0) {
        perror("webserver (listen)");
        return 1;
    }
    printf("server listening for connections\n");

    for (;;) {
        // Accept incoming connections
        int newsockfd = accept(sockfd, (struct sockaddr *)&host_addr,
                               (socklen_t *)&host_addrlen);
        if (newsockfd < 0) {
            perror("webserver (accept)");
            continue;
        }
        printf("connection accepted\n");

        // Get client address
        // Note: getsockname(2) reports the local end of newsockfd;
        // getpeername(2) would report the remote (client) end.
        int sockn = getsockname(newsockfd, (struct sockaddr *)&client_addr,
                                (socklen_t *)&client_addrlen);
        if (sockn < 0) {
            perror("webserver (getsockname)");
            continue;
        }

        // Read from the socket
        int valread = read(newsockfd, buffer, BUFFER_SIZE);
        if (valread < 0) {
            perror("webserver (read)");
            continue;
        }

        printf("[%s:%u]\n", inet_ntoa(client_addr.sin_addr),
               ntohs(client_addr.sin_port));

        // Write to the socket
        int valwrite = write(newsockfd, resp, strlen(resp));
        if (valwrite < 0) {
            perror("webserver (write)");
            continue;
        }

        close(newsockfd);
    }

    return 0;
}
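The inet_ntoa(3) and ntohs(3) conversions used for the log line can be isolated in a small function, which makes them easier to try out on their own. This is a minimal sketch; format_endpoint is a hypothetical helper name and is not part of the article's code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

// Hypothetical helper: render an IPv4 socket address as "a.b.c.d:port".
// Returns the length of the string, or -1 if it does not fit in out.
int format_endpoint(const struct sockaddr_in *addr, char *out, size_t cap) {
    // inet_ntoa(3) returns a pointer to a static buffer that the next call
    // overwrites, so the text is copied into out straight away.
    // ntohs(3) converts the port from network to host byte order.
    int n = snprintf(out, cap, "%s:%u", inet_ntoa(addr->sin_addr),
                     (unsigned)ntohs(addr->sin_port));
    if (n < 0 || (size_t)n >= cap) {
        return -1;
    }
    return n;
}
```

For an address holding the loopback address and port 8080, both stored in network byte order, the result is the string 127.0.0.1:8080.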
Get request headers
We’ve read the request from the client from the socket with read(2).
Now we can print the contents of the request message from the buffer,
and we should see what the request of the client looks like. Since the
client is sending the request adhering to the HTTP protocol, and the
request line is the first line of the request, structured as
<method> <path> <version>, we can use sscanf(3) to parse the request
line and get the method, path and version.
Let’s check the man page for sscanf(3):
NAME
scanf, fscanf, sscanf, vscanf, vsscanf, vfscanf - input format conversion
SYNOPSIS
#include <stdio.h>
int sscanf(const char *str, const char *format, ...);
DESCRIPTION
...
The scanf() function reads input from the standard input stream
stdin, fscanf() reads input from the stream pointer stream, and
sscanf() reads its input from the character string pointed to by
str.
...
str
This argument is a pointer to a character string that contains the
input data. We will thus use the buffer that we’ve read from the socket.
format
The format argument is a string that specifies the format of the input
data. We can use its conversion specifiers to parse the request line.
return value
On success, the number of input items successfully converted and assigned is returned. On failure, the return value is EOF, and no characters are assigned to the arguments.
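Checking that return value tells us whether all three parts of the request line were found. The following is a minimal sketch under stated assumptions: parse_request_line and the three size constants are hypothetical names (not part of the article's code), and the field widths in the format string are an addition that keeps sscanf(3) from writing past the end of the buffers.

```c
#include <stdio.h>

#define METHOD_MAX 16   // hypothetical limits, chosen for illustration
#define URI_MAX 256
#define VERSION_MAX 16

// Hypothetical helper: split "<method> <path> <version>" into three strings.
// Each width in the format is one less than the buffer size, which leaves
// room for the terminating null byte.
// Returns 0 when all three fields were converted, -1 otherwise.
int parse_request_line(const char *request, char method[METHOD_MAX],
                       char uri[URI_MAX], char version[VERSION_MAX]) {
    int fields = sscanf(request, "%15s %255s %15s", method, uri, version);
    return fields == 3 ? 0 : -1;
}
```

The %s conversion stops at whitespace, and \r counts as whitespace, so the version field ends before the \r\n that terminates the request line.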
Implementation
// webserver.c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#define PORT 8080
#define BUFFER_SIZE 1024

int main() {
    char buffer[BUFFER_SIZE];
    char resp[] = "HTTP/1.0 200 OK\r\n"
                  "Server: webserver-c\r\n"
                  "Content-type: text/html\r\n\r\n"
                  "<html>hello, world</html>\r\n";

    // Create a socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("webserver (socket)");
        return 1;
    }
    printf("socket created successfully\n");

    // Create the address to bind the socket to
    struct sockaddr_in host_addr;
    int host_addrlen = sizeof(host_addr);

    host_addr.sin_family = AF_INET;
    host_addr.sin_port = htons(PORT);
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Create client address
    struct sockaddr_in client_addr;
    int client_addrlen = sizeof(client_addr);

    // Bind the socket to the address
    if (bind(sockfd, (struct sockaddr *)&host_addr, host_addrlen) != 0) {
        perror("webserver (bind)");
        return 1;
    }
    printf("socket successfully bound to address\n");

    // Listen for incoming connections
    if (listen(sockfd, SOMAXCONN) != 0) {
        perror("webserver (listen)");
        return 1;
    }
    printf("server listening for connections\n");

    for (;;) {
        // Accept incoming connections
        int newsockfd = accept(sockfd, (struct sockaddr *)&host_addr,
                               (socklen_t *)&host_addrlen);
        if (newsockfd < 0) {
            perror("webserver (accept)");
            continue;
        }
        printf("connection accepted\n");

        // Get client address
        // Note: getsockname(2) reports the local end of newsockfd;
        // getpeername(2) would report the remote (client) end.
        int sockn = getsockname(newsockfd, (struct sockaddr *)&client_addr,
                                (socklen_t *)&client_addrlen);
        if (sockn < 0) {
            perror("webserver (getsockname)");
            continue;
        }

        // Read from the socket, leaving room for a terminating null byte
        int valread = read(newsockfd, buffer, BUFFER_SIZE - 1);
        if (valread < 0) {
            perror("webserver (read)");
            continue;
        }
        buffer[valread] = '\0'; // sscanf(3) needs a null-terminated string

        // Read the request
        char method[BUFFER_SIZE], uri[BUFFER_SIZE], version[BUFFER_SIZE];
        sscanf(buffer, "%s %s %s", method, uri, version);
        printf("[%s:%u] %s %s %s\n", inet_ntoa(client_addr.sin_addr),
               ntohs(client_addr.sin_port), method, version, uri);

        // Write to the socket
        int valwrite = write(newsockfd, resp, strlen(resp));
        if (valwrite < 0) {
            perror("webserver (write)");
            continue;
        }

        close(newsockfd);
    }

    return 0;
}
Conclusion / Next Steps
And that concludes our implementation of a simple HTTP server. We’ve covered the basics of implementing a simple HTTP web server, and in doing so we’ve also learned a bit about TCP/IP, HTTP, socket programming, and system calls. The resulting code gives you a basis to start implementing a more complex HTTP webserver that can handle multiple concurrent connections, and that can serve html files from the filesystem based on the path of the request.
References
https://github.com/remzi-arpacidusseau/ostep-projects/tree/master/concurrency-webserver
https://github.com/remzi-arpacidusseau/ostep-projects/tree/master/concurrency-webserver#http-background
https://tools.ietf.org/html/draft-newman-network-byte-order-01
The terms derive from Jonathan Swift’s 1726 satirical novel Gulliver’s Travels, in which the terms refer to opposing political factions who open their boiled eggs at opposite ends. [Kerrisk (2010); ch. 59.2]