Wikipedia defines computer science as the systematic study of algorithmic processes that describe and transform information. But for me, it is simply as F-U-N!
CSI: Computer Science & I
Bytes and Pieces of Me
Yet another blog of mine - this time it logs not just errors but also learnings as I master the world of computer science. And yes, this is getting so serious.
Look Up! It’s a Chord!
The introduction of the paper, Looking Up Data in P2P Systems [1], presents several features why P2P systems as decentralized facilities become attractive. These reasons include minimal efforts to start the system, utilizations of the tremendous computation and storage resources on computers across the Internet, and fault-tolerant. Thus, without any centralized servers or hierarchy, how can these be achieved and solved the lookup problem.
The research to address the lookup problem leads to the implementation of distributed hash tables where in each node that contains the IP address of its neighbors where it can be retrieved by single hash function.
Before DHT, there were several implementations to address the critical common problem in P2P systems. Napster uses a central database to map the file location. Scalable Domain Name System uses hierarchy wherein traversal of path down to the node containing the desired data is happening. However, these schemes post resilience problems. Thus, asymmetric is favorable since it allow nodes to self-organize and since no node is more important than any other node as far as the lookup process if concerned. These asymmetric methods were discussed in the previous blog post.
The next section of the paper discusses the DHT in detail. Naming process of the published file includes conversion of the name to a numeric key using hash functions such as SHA-1. This file shall be stored at the responsible node(s) for that key. Thus, in order for the reader to retrieve the published file, he needs to obtain its name, convert it to a key and call lookup(key).
Issues should be addressed in implementing DHTs. These are load-balancing, where IDs should be close to the key in the ID space, forwarding, distance or closeness function and adaptive rebuilding of routing tables. More challenges are presented in designing the lookup algorithms at the latter part of the paper – operation costs, fault tolerance and concurrent changes, proximity, malicious nodes and efficient use of indexing and keyword search.
The following routing algorithm are then suggested to support DHTs. These are skiplist-like routing such as Chord, tree-like data structure traversal such as Tapestry and Kademlia, and routing in multiple d-dimensional cartesian coordinate space such as CAN.
To conclude, designing the lookup algorithm considers lots of factors. However, in my opinion, the algorithm should prioritize scalability or the adaptability of the system to stabilize dynamic network structures. Computational and operational costs will be negligible as computers become more capable of solving complex functions.
[1] Looking Up Data in P2P Systems, Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica.
Possibly Related Posts:
- Chord-ially Inviting
- Relationship Between ASes: It’s Complicated
- Internalizing Interdomain Internet Routing
- Fly, DARPA, Fly!
- Does The End Justify The Means?
Leave a Reply
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
- M. Fowle