NATWork

Kiran Ghag, Swapnil Kadam
v1.0, 6th August 2002

Index

  1. Introduction

  2. Problem Definition

  3. Network Address Translation made by Proxy Servers

    1. Fetching a page from webserver

    2. Fetching a page from a proxy web server

  4. Idea behind NATWork

  5. Known Bugs

  6. Download Source

  7. Authors


Introduction

NATWork is a program that lets a whole LAN share a single internet connection through an application-layer gateway. NATWork is written entirely in C on the Linux platform. It may also run on other compatible platforms.

NATWork runs on a multi-homed machine. One of its interfaces is connected to the internet while the other belongs to the local network. NATWork listens for incoming HTTP requests destined for internet web servers. It connects to the internet on behalf of the requesting clients and fetches the desired documents.

The process is not much different from that of traditional proxies, but the implementation follows a slightly different approach, as explained below.

Problem Definition

IP version 4 (IPv4) is currently used on the internet. IPv4 uses 32-bit addresses. Theoretically, this means there can be a total of 2^32 (4294967296) hosts on the internet. But many of these addresses are reserved for special purposes (broadcast, private networks, etc.). Also, addresses are assigned in contiguous blocks, which a company may not utilise completely. Such unutilised addresses cannot be assigned individually to others, as this would require cumbersome and inefficient routing procedures. And with the number of PCs on the internet increasing rapidly, we will soon run out of IP addresses that can be assigned.

HyperText Transfer Protocol (HTTP) is used on the internet to transfer webpages from a web server (e.g. www.kiranghag.com) to a client (web browser). HTTP belongs to the application layer of the TCP/IP protocol suite. Naturally, any machine that is connected to the internet and wants to retrieve webpages from a server requires a distinct IP address. From the paragraph above, we can readily see that there is a limit to the number of clients that can be connected to the internet.

The permanent solution to this problem is to use IPv6. But the switchover from IPv4 to IPv6 will take time, as the two are not directly compatible at various levels. There are many temporary solutions that help delay the moment the shortage starts. Network Address Translation (NAT) is one of them.

Network Address Translation made by Proxy Servers

Network Address Translation (NAT) works by aggregating more than one IP host under a single IP address. NAT can be implemented either with a hardware device or with software running on a multi-homed machine.

(Note: NAT and proxy servers are not limited to web proxying. They are used by many other applications, such as FTP and TELNET proxies. The discussion here is centered on web proxying since NATWork is a web proxy.)

NAT is often used by application-level gateways (proxy servers). Such servers allow clients on a LAN to use application-level services from internet hosts through just a single IP address. For example, an HTTP proxy server allows different internal clients to connect to the same (or different) web servers and fetch webpages. The clients are assigned private IP addresses, which cannot be used on the internet. But to the web servers, it appears as if all requests came from a single public address (the one assigned to the proxy server).
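The private addresses mentioned above come from the ranges reserved by RFC 1918 (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16). As a minimal sketch, a proxy could check whether a client address falls inside one of them as follows. The function name `is_private_ipv4` is an assumption made for illustration and is not taken from the NATWork source.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the dotted-quad string lies inside
 * one of the RFC 1918 private ranges. Returns false for public addresses
 * and for strings that do not parse as IPv4 addresses. */
bool is_private_ipv4(const char *dotted)
{
    struct in_addr addr;

    if (inet_pton(AF_INET, dotted, &addr) != 1)
        return false;

    uint32_t ip = ntohl(addr.s_addr);            /* host byte order */

    return (ip & 0xFF000000u) == 0x0A000000u     /* 10.0.0.0/8      */
        || (ip & 0xFFF00000u) == 0xAC100000u     /* 172.16.0.0/12   */
        || (ip & 0xFFFF0000u) == 0xC0A80000u;    /* 192.168.0.0/16  */
}
```

With this check, an intranet host such as 192.168.0.1 is reported as private, while a public address such as 200.1.1.18 is not.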

Various cases are now possible. viz.

The proxy server is responsible for separating the data streams of different clients. To understand how this is done, we first look at exactly how a page is fetched in an ideal environment.

Fetching a page from webserver

Typical Packet Exchange between web server and client

The process is similar to any typical TCP/IP connection. The following steps are carried out after a user types a URL in the browser's address bar or clicks a link.

  1. The browser first identifies the server name and the desired document to be retrieved
  2. DNS resolution is performed to obtain the IP address of the server
  3. A TCP connection is initiated using the following information
    1. IP address of the current machine
    2. IP address of the web server (resolved from the server name taken from the address bar or from an HTML HREF attribute within the current document)
    3. A randomly chosen TCP port for the web server to communicate back to this browser process
    4. Destination TCP port (HTTP servers typically run on port number 80)
  4. The destination web server participates in the TCP handshake and completes it to establish a TCP data stream
  5. Once the connection is established, the above-mentioned tuple of 4 entities uniquely identifies the stream on the internet
  6. The browser process now sends an HTTP request, which contains the filename and other protocol-specific information
  7. The server process parses the request, identifies the filename and checks whether it is available to serve. If it is, the contents of the file are transferred over the established connection
  8. After the file is retrieved, the browser formats the HTML document and presents the output to the user in the browser window
  9. The browser may now request another file from the same server or terminate the connection

The fact that a connection is identified by a 4-valued tuple is quite important here: it forms the basis of the Network Address Translation performed by HTTP proxy servers.
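
The 4-valued tuple can be made concrete with a short sketch. The type `conn_tuple` and the functions `tuple_equal` and `tuple_from_socket` are hypothetical names chosen for this illustration. Only `getsockname()` and `getpeername()` are standard socket calls. The sketch assumes an IPv4 TCP socket that is already connected.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* The four values that identify one TCP stream (hypothetical type). */
struct conn_tuple {
    uint32_t src_ip;     /* local address, host byte order  */
    uint16_t src_port;   /* randomly chosen local port      */
    uint32_t dst_ip;     /* remote address, host byte order */
    uint16_t dst_port;   /* remote port, e.g. 80 for HTTP   */
};

/* Two streams are the same only if all four values match. */
bool tuple_equal(const struct conn_tuple *a, const struct conn_tuple *b)
{
    return a->src_ip == b->src_ip && a->src_port == b->src_port
        && a->dst_ip == b->dst_ip && a->dst_port == b->dst_port;
}

/* Fills *out from a connected IPv4 TCP socket.
 * Returns 0 on success and -1 on error. */
int tuple_from_socket(int fd, struct conn_tuple *out)
{
    struct sockaddr_in local, remote;
    socklen_t len = sizeof local;

    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0)
        return -1;
    len = sizeof remote;
    if (getpeername(fd, (struct sockaddr *)&remote, &len) < 0)
        return -1;
    if (local.sin_family != AF_INET || remote.sin_family != AF_INET)
        return -1;

    out->src_ip   = ntohl(local.sin_addr.s_addr);
    out->src_port = ntohs(local.sin_port);
    out->dst_ip   = ntohl(remote.sin_addr.s_addr);
    out->dst_port = ntohs(remote.sin_port);
    return 0;
}
```

Two browser windows on the same machine fetching from the same server differ only in `src_port`, which is enough for `tuple_equal` to tell their streams apart.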

Fetching a page from a proxy web server

When an HTTP proxy server is installed on the network, a small change must be made to the browser's configuration. The browser must be told the IP address of the proxy server and the port number on which the server process is listening. Once this is done, the browser sends all HTTP requests to the specified IP and port pair, no matter where they are destined.

The process of retrieving a document using a proxy server is outlined below

  1. The client browser opens a connection to the proxy server on the known listening port. This information is specified by the user or system administrator in the browser configuration panel. The connection carries the client's source IP and a (randomly chosen) TCP source port.
  2. The proxy server accepts the connection and starts receiving data.
  3. Once the connection is accepted by the proxy server, the browser sends an HTTP GET request along with the URL of the desired file and other HTTP request headers.
  4. The proxy server parses this information and identifies the target host and filename.
  5. The proxy server is now ready to connect to the target host with the following information
    1. Source IP Address: This is the address of the proxy machine itself
    2. Destination IP Address: This is obtained from the client's HTTP headers in the TCP data payload
    3. Source TCP Port: This is changed by the proxy server. The original port number present in the client's frame is stored in the table explained below.
    4. Destination TCP Port (80 since the target is an HTTP server)
  6. The proxy server opens a connection to the web server and sends the same HTTP GET request (obtained from the client's connection) to the server.
  7. The web server in turn parses the request and sends the corresponding response
  8. The proxy server receives the response and diverts it to the corresponding client process
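
Step 4 above can be sketched as a small parsing routine. The function `parse_request` is a hypothetical name and is not taken from the NATWork source. The sketch assumes the simplified request form used in this document, in which the request line carries the path and a Host header carries the server name. A browser talking to a real proxy may instead put the full URL in the request line, which this sketch does not handle.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: copies the request path and the Host header value
 * out of a raw HTTP request. Returns 0 on success, or -1 if either part
 * is missing or does not fit in the supplied buffer. */
int parse_request(const char *req, char *host, size_t host_len,
                  char *path, size_t path_len)
{
    /* Request line: METHOD SP target [SP version] */
    size_t first_line = strcspn(req, "\r\n");
    const char *sp = memchr(req, ' ', first_line);
    if (sp == NULL)
        return -1;

    const char *target = sp + 1;
    size_t n = strcspn(target, " \r\n");
    if (n == 0 || n >= path_len)
        return -1;
    memcpy(path, target, n);
    path[n] = '\0';

    /* Walk the header lines until the Host header or the blank line. */
    const char *line = strchr(req, '\n');
    while (line != NULL) {
        line++;                                   /* start of next line */
        if (*line == '\r' || *line == '\n' || *line == '\0')
            break;                                /* end of headers     */
        if (strncasecmp(line, "Host:", 5) == 0) {
            const char *value = line + 5;
            value += strspn(value, " \t");
            n = strcspn(value, "\r\n");
            if (n == 0 || n >= host_len)
                return -1;
            memcpy(host, value, n);               /* may include ":port" */
            host[n] = '\0';
            return 0;
        }
        line = strchr(line, '\n');
    }
    return -1;
}
```

The host name obtained this way is what the proxy would resolve through DNS in step 5, and the path is what it would request from the web server in step 6.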

This last step needs explanation. The critical requirement here is to correctly determine which file/datagram belongs to which client. Difficulties arise for many reasons, including but not limited to the following...

To solve this problem, the proxy server maintains a State Table (or linked list) containing the following fields

Source IP      Original Source Port   New Source Port   Destination Server IP

The above table helps the proxy server correctly identify the intranet host to which an incoming packet (from the internet) belongs. When a packet comes in from a web server, the proxy server checks the destination port (from the web server's viewpoint). This port number and the web server's IP are looked up together in the 3rd and 4th fields of the above list. If a match is found, the packet is delivered to the IP address and port given in the 1st and 2nd columns of the same row.
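
The lookup described above can be sketched as a linear search over an array of rows. The names `nat_entry`, `nat_lookup` and `IPV4` are assumptions made for this illustration. A linked list, as mentioned earlier, would work the same way.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the state table (hypothetical layout). */
struct nat_entry {
    uint32_t client_ip;     /* Source IP, host byte order */
    uint16_t client_port;   /* Original Source Port       */
    uint16_t proxy_port;    /* New Source Port            */
    uint32_t server_ip;     /* Destination Server IP      */
};

/* Packs four dotted-quad octets into one 32-bit value. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8)  |  (uint32_t)(d))

/* Finds the row whose 3rd and 4th fields match the incoming packet's
 * destination port and source (web server) IP. Returns NULL if no row
 * matches, in which case the packet belongs to no known client. */
const struct nat_entry *nat_lookup(const struct nat_entry *table, size_t count,
                                   uint16_t proxy_port, uint32_t server_ip)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].proxy_port == proxy_port &&
            table[i].server_ip == server_ip)
            return &table[i];
    }
    return NULL;
}
```

For a row that maps client 192.168.0.1 port 3456 to new source port 5678 towards server 63.1.1.54, calling `nat_lookup(table, count, 5678, IPV4(63, 1, 1, 54))` returns that row, and its first two fields give the host and port to which the reply is forwarded.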

Let's see how the conversion is handled with an example. Here's a sample network configuration

Case: Host 192.168.0.1 wishes to fetch a page (home.htm) from www.kiranghag.com (63.1.1.54). The proxy web server for the intranet (192.168.0.x) has the intranet IP 192.168.0.254, where it listens on port 8080, and the internet IP 200.1.1.18.

1. The client opens a connection to the proxy server and sends an HTTP request packet similar to the following

Source IP      Destination IP   Source Port (randomly chosen)   Destination Port
192.168.0.1    192.168.0.254    3456                            8080

TCP Data Payload (simplified):

GET /home.htm
Host: www.kiranghag.com

2. The proxy server determines the hostname and filename from the TCP data payload. It also resolves the hostname to an IP address using its DNS server.

3. The proxy server assigns a new port number to this connection (say 5678) and creates a new entry in the State Table as follows

Source IP      Original Source Port   New Source Port   Destination Server IP
192.168.0.1    3456                   5678              63.1.1.54

4. The proxy server sends the HTTP request to www.kiranghag.com

Source IP      Destination IP   Source Port   Destination Port
200.1.1.18     63.1.1.54        5678          80

TCP Data Payload (simplified):

GET /home.htm
Host: www.kiranghag.com

5. The web server sees that it has the requested file. It responds with the data

Source IP      Destination IP   Source Port   Destination Port
63.1.1.54      200.1.1.18       80            5678

TCP Data Payload (truncated):

<html><head><title>Kir@Net - Kiran on the Web</title><meta http-equiv="Content-type" content="text/html; charset=iso-8859-1"></head><body bgcolor="#FFFFFF" text="...

6. The proxy server looks up the pair (5678, 63.1.1.54) in the 3rd and 4th columns of the state table. The matching row says that the packet has to be diverted to host 192.168.0.1 on port 3456. The above packet is now sent as follows

Source IP      Destination IP   Source Port   Destination Port
192.168.0.254  192.168.0.1      8080          3456

TCP Data Payload (truncated):

<html><head><title>Kir@Net - Kiran on the Web</title><meta http-equiv="Content-type" content="text/html; charset=iso-8859-1"></head><body bgcolor="#FFFFFF" text="...

In this way the transaction is complete. The client's browser now reads the HTML code and formats the stream for the user.

The modifications illustrated above are usually made to the IP and TCP headers of the packets moving to/from the proxy server's interfaces. Some modifications are also made in the TCP data payload to remove proxy-specific header fields. This usually requires a kernel-level implementation of the proxy server.

Idea Behind NATWork

NATWork differs in that it is written as user-space code. This means the code is written using the library routines and system calls that the kernel exposes to ordinary programs. The code does not modify the kernel in any way. The kernel cannot tell any difference between a normal networking program (e.g. a browser or telnet) and NATWork.

But writing the code in user space brings certain restrictions. The program cannot modify packet headers directly. If the code generates custom packets, they do not go through the kernel's connection tables. This causes a problem when the program opens a socket to a host with an arbitrary source port number. The replies are delivered to both the kernel and the program. Since the kernel is not aware of any process that owns the incoming packet's port number, it resets the connection. Consequently, such a user-space program cannot be used to write a true Network Address Translator.

NATWork overcomes this problem in a simple but interesting way. Instead of translating addresses on each packet, NATWork has been designed as a stream connector.

When a client connects to NATWork, it forks a new process to do the following jobs

Here, each process is aware of the incoming client connection. The kernel is also aware of the process that owns the new port used for the internet connection. This is possible because the forked process holds the connected TCP byte stream from the client and has opened a new, legitimate connection to the server. The new process just moves the TCP payload back and forth between the two byte streams. In this sense, table management is shifted entirely to the kernel.
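
The stream-connector idea can be sketched as a relay loop that a forked child might run once it holds both sockets. This is a minimal illustration built on the standard `select()`, `read()`, `write()` and `shutdown()` calls. The function names `pump` and `relay` are hypothetical, and the actual NATWork source may be organised differently.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads once from `from` and writes everything read to `to`.
 * Returns 1 if data was moved, 0 on end of stream, -1 on error. */
static int pump(int from, int to)
{
    char buf[4096];
    ssize_t n = read(from, buf, sizeof buf);

    if (n <= 0)
        return (int)n;               /* 0 = peer finished, -1 = error */

    for (ssize_t off = 0; off < n; ) {
        ssize_t w = write(to, buf + off, (size_t)(n - off));
        if (w < 0)
            return -1;
        off += w;
    }
    return 1;
}

/* Moves payload in both directions between the client socket and the
 * server socket until both sides have finished sending.
 * Returns 0 on a clean finish and -1 on error. */
int relay(int client_fd, int server_fd)
{
    int c2s_open = 1, s2c_open = 1;
    int maxfd = client_fd > server_fd ? client_fd : server_fd;

    while (c2s_open || s2c_open) {
        fd_set rfds;
        FD_ZERO(&rfds);
        if (c2s_open) FD_SET(client_fd, &rfds);
        if (s2c_open) FD_SET(server_fd, &rfds);

        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        if (c2s_open && FD_ISSET(client_fd, &rfds)) {
            int r = pump(client_fd, server_fd);
            if (r < 0) return -1;
            if (r == 0) { shutdown(server_fd, SHUT_WR); c2s_open = 0; }
        }
        if (s2c_open && FD_ISSET(server_fd, &rfds)) {
            int r = pump(server_fd, client_fd);
            if (r < 0) return -1;
            if (r == 0) { shutdown(client_fd, SHUT_WR); s2c_open = 0; }
        }
    }
    return 0;
}
```

Because both descriptors are ordinary connected sockets, the kernel tracks the ports and addresses of each stream itself, and the process never has to touch a packet header.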

The data flow diagram of the same can be drawn as shown below

Known Bugs

The code was written as a small college project, so it is useful only for demonstrating the concept and for limited use. The authors are not responsible for damage in any form caused by its use.

Since the code is written in user space, it is not recommended for high-traffic production servers. The code has not been checked for thread safety. Some problems were encountered with Konqueror on Red Hat Linux when submitting web forms using POST requests.

Download Source

NATWork.tgz

Authors

Main Idea by: Kiran Ghag

Coding by: Swapnil Kadam