Recently one of the major peer-to-peer networks (Edonkey-Emule) received a severe blow when one of its major servers (RazorBack 2) was shut down by the Belgian police. This has caused some concern among people who use these technologies: most people using peer-to-peer software don’t really know how it works, nor do they know what kind of information and data they are REALLY sharing over the network.
In this short article, I will show you some of the “technologies” involved in these projects: where are you connecting? What are you sharing? What kind of info do you broadcast over the network? Are you anonymous? Are you allowing third-party content (illegal, copyrighted, whatever!) to be dropped on your PC? We are going to learn a little more about the application logic of two well-known p2p clients (Emule and Bittorrent) and two lesser-known clients (Freenet and Waste).
Emule/Edonkey
The Emule p2p client adopts a “server-based” architecture. Like its ancestor (Napster), the edonkey network is composed of multiple servers that hold listings of the objects in the network. When you want to use Emule, you select a server from a list and connect to it. Once accepted into the network, you send your shared folder info to the server (so it’s searchable and fetchable by others) and you gain access to the server’s object list: you can query it to find where the objects you want are located.
The actual data (the files) are not stored on Emule’s servers. They are stored by the peers of the network. An Emule server is only a resource index; it acts like the “White Pages”: if you need A, you query the server for A. If the server has an entry for A in its listings, it looks up the host where A is located and tells you to contact that host. From that point on, the data exchange and the fetching of file A only occur between you and the host that is providing A. Files can be partitioned so that it’s possible to download/upload chunks of them from/to other peers. This speeds up the process when there are many incomplete files around the network.
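To make the “White Pages” idea concrete, here is a minimal sketch of such an index. It is not Emule’s real protocol: the class name IndexServer, its methods and the peer addresses are all made up for illustration. The point is only that the server stores names and locations, never the files themselves.

```python
from collections import defaultdict


class IndexServer:
    """Toy index (hypothetical): maps a file name to the peers that announced it."""

    def __init__(self):
        self.listings = defaultdict(set)

    def announce(self, peer, shared_files):
        # A peer publishes the contents of its shared folder when it connects.
        for name in shared_files:
            self.listings[name].add(peer)

    def lookup(self, name):
        # The answer contains peer addresses only; the server holds no file data.
        return sorted(self.listings.get(name, set()))


server = IndexServer()
server.announce("10.0.0.5:4662", ["A.iso", "B.mp3"])
server.announce("10.0.0.9:4662", ["A.iso"])
print(server.lookup("A.iso"))  # ['10.0.0.5:4662', '10.0.0.9:4662']
```

After the lookup, the download itself would happen directly between you and one of the returned peers.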
Multiple requests for an object are put in a queue. Emule has a “credit system” so that people who upload more than they download get priority in the queue. This encourages file sharing (and discourages “network parasites”).
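A queue with credits could look roughly like the sketch below. Be careful: the weighting rule here is invented for illustration and is not Emule’s actual formula; the function names and the sample peers are hypothetical too. It only shows the principle that waiting time is multiplied by a bonus for peers who gave more than they took.

```python
def credit_modifier(uploaded_to_us, downloaded_from_us):
    """Hypothetical bonus in the range [1, 10] for peers that uploaded to us."""
    if downloaded_from_us == 0:
        return 10.0 if uploaded_to_us > 0 else 1.0
    ratio = 2.0 * uploaded_to_us / downloaded_from_us
    return max(1.0, min(10.0, ratio))


def queue_order(requests):
    # requests: (peer, seconds_waiting, uploaded_to_us, downloaded_from_us)
    scored = [(wait * credit_modifier(up, down), peer)
              for peer, wait, up, down in requests]
    # Highest score is served first.
    return [peer for _, peer in sorted(scored, reverse=True)]


requests = [
    ("alice", 600, 50, 10),  # uploaded a lot to us
    ("bob", 900, 0, 40),     # only downloaded from us
    ("carol", 300, 5, 0),    # uploaded a little, took nothing
]
print(queue_order(requests))  # ['alice', 'carol', 'bob']
```

Note how bob has waited the longest but still ends up last, because he never uploaded anything.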
So, your requests always pass through the server. This is a problem because: if the server fails, you cannot connect to the network (unless you choose another server); all requests can potentially be logged by the server (for example, the server can log that host X requested file A at time T); and servers can be subject to DoS attacks and other service misuse or tampering.
This is not really a distributed approach. And if you want to remain anonymous, you can’t. (Well, you can … but the “average user” won’t).
Emule also gives access to another network with a very different approach (the Kad network), based on the Kademlia algorithm (a DHT, or Distributed Hash Table). The Kademlia algorithm is based on the concept of “distance between nodes”. Every node in the KAD network gets an ID. The “distance” is calculated as the XOR (exclusive OR function) of two IDs. When searching for some key, the algorithm explores the network in several steps, each step getting closer to the searched key, until the contacted node returns the value or no closer nodes are found. Notice that we’re not talking about “servers”: there aren’t any servers here!
In the Emule implementation, a hash is calculated from each file. A hash is also calculated for keywords, to enable keyword-based searches. Searches are then performed on a “closest hash” basis. The KAD network is more robust than the Edonkey one. There are no servers, so there are no single points of failure (or single points to be closed by the police).
The distributed/decentralized approach is great because it speeds up the searching of files. However, while the KAD network is slightly more secure than the centralized edonkey network (no server logs), you can never be sure that your actions are not being tracked or logged by someone.
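The XOR distance and the step-by-step search can be sketched in a few lines. This is a simplified, greedy version under my own assumptions: real Kademlia queries several nodes in parallel and keeps routing tables, while here each node just has a fixed list of known neighbours (the neighbours dictionary and the 4-bit IDs are made up for the example).

```python
def xor_distance(a, b):
    # Kademlia-style distance: the XOR of the two IDs, read as an integer.
    return a ^ b


def iterative_lookup(start, target, neighbours):
    """Greedy walk: at each step move to the known node closest to the target."""
    current = start
    path = [current]
    while True:
        candidates = neighbours.get(current, [])
        best = min(candidates, key=lambda n: xor_distance(n, target),
                   default=current)
        if xor_distance(best, target) >= xor_distance(current, target):
            return path  # no closer node is known, so the search stops here
        current = best
        path.append(current)


# Hypothetical 4-bit network: node ID -> IDs of the nodes it knows about.
neighbours = {
    0b0001: [0b0100, 0b1000],
    0b1000: [0b1100, 0b1011],
    0b1011: [0b1010],
    0b1100: [0b1110],
}
print(iterative_lookup(0b0001, 0b1010, neighbours))  # [1, 8, 11, 10]
```

Each hop shrinks the distance to the key (11, then 2, then 1, then 0), and no server is involved at any point.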
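Here is a rough sketch of how keywords can be turned into keys and matched to the “closest” node. SHA-1 is used only as a convenient stand-in from the Python standard library, not as a claim about which hash Emule uses; the helper names and the way the file name is split into words are my own assumptions.

```python
import hashlib


def key_for(text):
    # Stand-in hash (assumption): SHA-1 of the lowercased text, as an integer.
    digest = hashlib.sha1(text.lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def keyword_keys(filename):
    # Naive keyword split on dots and underscores, just for the example.
    words = filename.replace(".", " ").replace("_", " ").split()
    return {w.lower(): key_for(w) for w in words}


def closest_node(node_ids, key):
    # The node whose ID has the smallest XOR distance to the key stores it.
    return min(node_ids, key=lambda n: n ^ key)


node_ids = [key_for("node-%d" % i) for i in range(8)]  # made-up node IDs
for word, key in keyword_keys("Linux_Distro.iso").items():
    print(word, "->", hex(closest_node(node_ids, key))[:12])
```

A search for “linux” would hash the word the same way and ask the nodes closest to that key which files they know about.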
Bittorrent
The Bittorrent client works in a rather different way. Shared files are divided into chunks, and downloaders can share the portions they’ve already downloaded with each other. The main concept Bittorrent relies on is the “flash crowd”: the popularity of files exhibits temporal locality. For example, when an interesting video (not copyrighted, of course) is released, everybody wants to download it within a relatively small time span. It’s like the “dugg effect” from Digg.com when an article reaches the homepage: it becomes popular and receives a lot of requests in a relatively short span of time, which can cause problems when the web server is not very powerful (the main flaw of the centralized approach). The main focus of Bittorrent is efficient fetching.
In Bittorrent, file search is not done by the software itself. To enter the network, you must find a torrent tracker. Usually this is done using search engines. Once you have the torrent tracker, you have the list of all the peers owning chunks of the file you are interested in, so you can contact them. Peers are divided into seeds and leeches (which together make up a swarm). Seeds are peers with the complete file who still offer it; leeches are peers with incomplete downloads of that file. Even if there are no seeds in the system, you can still get the complete file as long as the leeches’ parts complement each other. Bittorrent enforces a “Fair Share”: you are encouraged to upload your content to other leeches, because in doing so you will be allowed to download at the best speeds! Periodically, your Bittorrent client checks whether there are any better peers to download from and selects them if they’re available (which is more likely if you are sharing content with those peers).
A Bittorrent improvement uses a DHT (Distributed Hash Table) algorithm to eliminate the tracker. This is called trackerless Bittorrent. In this flavour, every node acts like a “lightweight tracker”. Using Kademlia-like logic, it finds the data chunks without needing the tracker (which lives on a web server, so it’s a kind of “centralized bottleneck”). More info at this page.
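The chunking idea is easy to sketch. The example below splits some data into fixed-size chunks and keeps a SHA-1 hash per chunk, so that a chunk received from any peer can be checked on its own. The tiny chunk size and the function names are just for illustration; real clients use much larger pieces.

```python
import hashlib


def split_into_chunks(data, chunk_size):
    # The last chunk may be shorter than chunk_size.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hashes(chunks):
    return [hashlib.sha1(c).hexdigest() for c in chunks]


def verify_chunk(chunk, expected_hash):
    # A downloader can check each chunk independently of the others.
    return hashlib.sha1(chunk).hexdigest() == expected_hash


data = b"abcdefghij"
chunks = split_into_chunks(data, 4)
hashes = chunk_hashes(chunks)
print(chunks)                                # [b'abcd', b'efgh', b'ij']
print(verify_chunk(chunks[1], hashes[1]))    # True
print(verify_chunk(b"evil", hashes[1]))      # False
```

Because every chunk is verifiable by itself, it does not matter which peer it came from or in which order the chunks arrive.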
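Two of the ideas above can be sketched quickly: a swarm without seeds can still complete the file if the leeches’ chunks cover it together, and a client favours the peers that upload the most to it. This is a simplification with made-up names and numbers, not the real choking algorithm.

```python
def swarm_can_complete(total_pieces, peers_pieces):
    # The file is recoverable if the union of all peers' pieces covers it.
    available = set().union(*peers_pieces.values())
    return available >= set(range(total_pieces))


def pick_best_peers(upload_rates, slots=2):
    # Hypothetical "fair share" rule: serve the peers that give us the most.
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    return ranked[:slots]


# Three leeches, no seed: together they hold pieces 0..3.
leeches = {"p1": {0, 1}, "p2": {2}, "p3": {1, 3}}
print(swarm_can_complete(4, leeches))   # True
print(swarm_can_complete(5, leeches))   # False, piece 4 is missing

# Upload rate of each peer towards us (made-up units).
rates = {"p1": 30, "p2": 120, "p3": 75}
print(pick_best_peers(rates))           # ['p2', 'p3']
```

Re-running the selection periodically gives the “check for better peers” behaviour described above.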
Some considerations
Well, we have seen the good and bad sides of these approaches. Centralized server solutions (edonkey/Emule-like) seem to be the worst: centralization leads to a performance bottleneck and is susceptible to logging.
But bear this in mind: the distributed approaches based on DHT may lead to efficiency, but they DO NOT lead to anonymity and immunity. You can still be TRACKED and your traffic can be INSPECTED. You are not anonymous, and you are leaving a lot of footprints around! If you want to solve these issues, you have to use a different P2P method for sharing your sensitive data. Let’s see a couple of them.
Freenet
Freenet is an application biased toward “freedom of communication on the internet”. It allows anybody to publish and read information with complete anonymity. It provides a reasonable level of security for producers and consumers of information, in a distributed fashion. The protocol (without going into details) is very scalable and allows data to be distributed efficiently. Unpopular data gets deleted from the network automatically, while popular data migrates to the areas of the network where it’s requested.
Anonymity is enforced because messages are routed hop by hop, and a single node can tell neither the origin nor the destination of a message. Security and tamper resistance are obtained with digital key signatures and link-level encryption. Traffic analysis is still possible, but it becomes a huge task! Document-level encryption and verification prevent a node from knowing what data it is hosting in its space.
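The “a node doesn’t know what it is hosting” idea can be illustrated with a toy example. This is NOT Freenet’s real key format or cipher: the XOR keystream built from SHA-256 is a stand-in I chose because it needs only the standard library, and all the names are hypothetical. The hosting node sees only a locator and an unreadable block, while whoever holds the key can decrypt the block and detect tampering.

```python
import hashlib


def _keystream(key, length):
    # Toy keystream (assumption): SHA-256 of key + counter, concatenated.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def _xor(data, key):
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def publish(document):
    key = hashlib.sha256(document).digest()           # kept by the publisher
    ciphertext = _xor(document, key)                  # what the node stores
    locator = hashlib.sha256(ciphertext).hexdigest()  # what the node indexes
    return locator, key, ciphertext


def fetch(store, locator, key):
    ciphertext = store[locator]
    if hashlib.sha256(ciphertext).hexdigest() != locator:
        raise ValueError("stored block was tampered with")
    return _xor(ciphertext, key)


store = {}  # stands in for a hosting node's disk space
document = b"hello freenet"
locator, key, ciphertext = publish(document)
store[locator] = ciphertext
print(fetch(store, locator, key) == document)  # True
```

The node that keeps the store never sees the key, so it cannot read the document it is hosting.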
WASTE
If you want to create a small-scale network with your friends, you should check this out. WASTE is an anonymous, secure, and encrypted collaboration tool which allows users to both share ideas through the chat interface and share data through the download system. It behaves like a VPN: you connect a small number of nodes (10-50) and use heavy encryption to secure all the data exchanged between the peers. Unauthorized decryption of the data by a third party is very unlikely to happen. The encrypted connection is used for all activities: transmitting and receiving instant messages, chat, and files, maintaining the connection, and browsing and searching. Furthermore, you can use the SATURATE option to add random noise to the traffic, making traffic analysis even more difficult. The nodes automatically perform load balancing to determine the routes with the lowest latencies (and this makes the connection even more secure, because messages always take different routes). Individuals can connect to WASTE networks by sharing their RSA public key. The WASTE keypair is generated by the system using random data collected from mouse movements.
Conclusions
So, we have seen some aspects of P2P systems. Recent news suggests moving toward more secure networks, but this has some drawbacks: P2P systems built on secure networks are not as “user friendly” as traditional P2P systems.
But maybe it’s a worthwhile change: do you prefer acting freely, or do you prefer having a lot of data to choose from? It’s up to you.
It is the people of the net who decide what is “cool” and what is not.
Update: I’ve discovered this interesting article, which explains in detail the privacy issues with filesharing software.





Bittorrent image from Wikipedia:
http://upload.wikimedia.org/wikipedia/en/3/3d/Torrentcomp_small.gif