Breaking out of a proxy jail: How proxies work
Introduction
I've written this page to explain a little more about what proxies are and how they work, as I've had a few general questions that could be more easily answered with a better background to refer to.
A normal connection: You -> Remote server
Normally when you request a web page, your computer will send a request directly to the remote server (via the internet). The remote server will then read the request, and respond with the requested page directly back to you (the client).
This looks something like this:
In the diagram, the web request has been coloured in green. Although this may seem obvious, the colours get more complicated later so I figured I would introduce the idea early on.
A proxy in the picture: You -> Proxy -> Remote server
When you make a request via a proxy, your machine sends the request to the proxy instead of the remote server. If the proxy has the page in its cache (an archive of stored pages it has already downloaded), it will give you its copy without making a connection to the remote server. This saves bandwidth and allows for faster page responses. If the proxy doesn't have your page, it will pass your request on to the remote server. The remote server will then respond to the proxy with the page, which the proxy then forwards to you.
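To make the cache-or-forward decision concrete, here is a minimal sketch in Python. It's a toy model and not how any particular proxy is implemented: the class name CachingProxy, the origin_requests counter and the fake_origin function are all made up for illustration, and a real proxy would also have to deal with expiry, headers and so on.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy model of the cache-or-forward decision (illustrative only)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # fetch_from_origin stands in for "connect to the remote server".
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_requests = 0  # how many times we really went upstream

    def get(self, url: str) -> bytes:
        # Cache hit: answer from the stored copy, no remote connection.
        if url in self._cache:
            return self._cache[url]
        # Cache miss: forward the request, keep a copy, pass the page back.
        self.origin_requests += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body


def fake_origin(url: str) -> bytes:
    # Hypothetical remote server, so the sketch runs without a network.
    return ("<html>page for " + url + "</html>").encode("ascii")


proxy = CachingProxy(fake_origin)
first = proxy.get("http://example.com/")   # forwarded to the remote server
second = proxy.get("http://example.com/")  # served from the proxy's cache
```

Asking for the same page twice only reaches the (pretend) remote server once, which is where the bandwidth saving comes from.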
On a network like this, if you try to make a request directly to the remote server, you will normally be blocked by your local area network's (LAN) firewall.
This looks something like this:
The web request has been coloured green, and the failed request red. The firewall is the large burning brick thing between you and the internet.
This wouldn't be a problem, but generally proxies only forward HTTP requests (and sometimes FTP). This means you will be unable to connect to servers using protocols like IRC for chat, or SSH.
Enter, desproxy: You -> Proxy -> Remote server (but in style)
Some proxy servers do allow forwarding of protocols other than HTTP and FTP. You just have to know how to ask them.
A normal web request uses either the GET or POST command. But some proxy servers also offer the CONNECT command. This asks the proxy to open a connection to a remote server and port of your choosing and then simply relay the data in both directions, so you get a normal connection to the remote server, via the proxy.
However, most applications don't know how to make use of the CONNECT command. What desproxy does is act as a translator between a normal application and the proxy server.
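As a rough illustration of "knowing how to ask", this is what a CONNECT request looks like on the wire, sketched in Python. The function names and the host irc.example.net are placeholders of mine; this isn't code from desproxy, and it leaves out things a real client may need, such as proxy authentication.

```python
def build_connect_request(host: str, port: int) -> bytes:
    """Build the text a client sends to the proxy to ask for a tunnel."""
    target = f"{host}:{port}"
    lines = [
        f"CONNECT {target} HTTP/1.1",  # the target is host:port, not a URL
        f"Host: {target}",
        "",                            # blank line ends the request headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def tunnel_established(response_head: bytes) -> bool:
    """Return True if the proxy answered with a 2xx status (tunnel is open)."""
    status_line = response_head.split(b"\r\n", 1)[0]
    parts = status_line.split(None, 2)
    return (
        len(parts) >= 2
        and parts[0].startswith(b"HTTP/")
        and parts[1][:1] == b"2"
    )


request = build_connect_request("irc.example.net", 6667)
```

If the proxy replies with a 2xx status, everything sent down that connection afterwards is passed straight through to the remote server. A proxy that doesn't allow CONNECT will typically answer with an error status such as 403 or 405 instead.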
At this point, I need to explain what a SOCKS server is. Desproxy runs a proxy server of its own, using the SOCKS protocol. SOCKS is an abbreviation of "sockets". It's given this name because it proxies any type of network connection, unlike an HTTP proxy, which (as the name says) only handles HTTP. Most applications that make use of a network have the ability to use a SOCKS proxy written into them. Normally, the SOCKS server wouldn't be running on the same machine as the application: it would run in place of the HTTP proxy server. But in this case, it provides an easy way to interface two applications that don't understand each other, but do understand the SOCKS protocol. It's like trying to get two people to talk: someone who speaks English and French, and someone who speaks Spanish and French. Although they can't communicate in their native languages, they can both speak French. The SOCKS server is the French in that analogy.
This looks something like this:
The "real" connection (as opposed to just HTTP) is marked in orange, and the failed connection in red.
In this diagram, the SOCKS server sits in the middle of the first orange connection - between the client and proxy.
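For the curious, this is roughly what the application says to the SOCKS server on that first hop. The sketch below uses SOCKS version 4 only because it's the simplest to show; I'm not claiming that is the version desproxy speaks, and the function names and the 192.0.2.10 address are just examples.

```python
import socket
import struct


def build_socks4_connect(ip: str, port: int, user_id: bytes = b"") -> bytes:
    """Build a SOCKS4 CONNECT request for an IPv4 address and port."""
    return (
        struct.pack("!BBH", 4, 1, port)  # version 4, command 1 (CONNECT), port
        + socket.inet_aton(ip)           # destination IPv4 address, 4 bytes
        + user_id                        # optional user id
        + b"\x00"                        # null byte terminates the user id
    )


def socks4_granted(reply: bytes) -> bool:
    """A SOCKS4 reply is 8 bytes; status code 90 means request granted."""
    return len(reply) == 8 and reply[0] == 0 and reply[1] == 90


request = build_socks4_connect("192.0.2.10", 22)  # e.g. an SSH server
```

Notice that nothing in the request mentions HTTP at all: the application just names an address and a port, which is why the same mechanism works for IRC, SSH or anything else.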
SOCKS via HTTP: You -> Proxy -> Home server -> Remote server
Because not all proxy servers offer the CONNECT command, usually as a security precaution, sometimes SOCKS via HTTP is required.
From the proxy's point of view, the client requests a page on a server (the home server), which responds, and the response can be forwarded like any other page. What is interesting is what the request actually contains. The request includes the information the home server needs to make a real connection to the remote server. So, although the information passes through the proxy server, the proxy is never really aware of it, as it is the home server that is handling the actual connection. It's as if the proxy is being proxied.
This looks something like this:
In this diagram the web request has been marked in green, the real connection has been marked in orange, and the failed attempt in red.
The internal workings of SOCKS via HTTP are kind of complex. They're best described in a diagram.
Internal connections (not via a network, but between applications) have been marked in purple, web requests in green, and the real connection in orange. Green clouds are applications and blue clouds are network connections.
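To give a feel for the wrapping step, here is a small Python sketch of how a chunk of "real" connection data could be disguised as an ordinary web request and unpacked again on the home server. The format is entirely invented for this example (the /tunnel path, the X-Target header and the base64 body are my assumptions), so don't read it as the wire format of any actual SOCKS via HTTP tool.

```python
import base64
from typing import Dict, Tuple


def wrap_in_http(home_host: str, target_host: str, target_port: int,
                 payload: bytes) -> bytes:
    """Client side: hide connection data inside a normal-looking POST."""
    body = base64.b64encode(payload)  # keep the body plain text
    head = (
        "POST /tunnel HTTP/1.1\r\n"                  # hypothetical path
        f"Host: {home_host}\r\n"
        f"X-Target: {target_host}:{target_port}\r\n"  # hypothetical header
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def unwrap_on_home_server(request: bytes) -> Tuple[str, int, bytes]:
    """Home server side: recover where to connect and what to send."""
    head, _, body = request.partition(b"\r\n\r\n")
    headers: Dict[bytes, bytes] = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    host, _, port = headers[b"x-target"].decode("ascii").rpartition(":")
    return host, int(port), base64.b64decode(body)


wrapped = wrap_in_http("home.example.org", "irc.example.net", 6667,
                       b"NICK guest\r\n")
host, port, data = unwrap_on_home_server(wrapped)
```

The proxy only ever sees a POST to home.example.org. It's the home server that reads the target out of the request, opens the real connection and passes the data along; replies would travel back the same way, inside the web responses.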
Hopefully these descriptions have helped in explaining how it's possible to bypass a web proxy to achieve an (effectively) full internet connection.