1、PDF外文:http:/ 中文 3957 字 出处: SNCNW 2003, Swedish National Computer Networking Workshop, Stockholm. 2003 英文原文: A Distributed Instant Messaging Architecture based on the Pastry Peer-To-Peer Routing Substrate Henrik Lundgren, Richard Gold, Erik Nordstr om, Mattias Wiggberg Department of Infor
2、mation Technology, Uppsala University, Sweden. ABSTRACT Current Instant Messaging systems rely on archi-tectures that include central, or only partly dis-tributed, server(s). Although some systems can ex-change messages peer-to-peer, the user registration and lookup are still based on a centra
3、lized solution. We have implemented a fully distributed instant mes-saging system by utilizing the Pastry peer-to-peer routing substrate. We use the Pastry object inser-tion functionality to insert user information when a user joins the network and a novel search engine to effectively perform distri
4、buted user lookup. Cur-rently, our system runs in the FreePastry simulator and supports joining/leaving the network, searching for users and message exchange. 1. INTRODUCTION The increasing availability of cable and DSL services has made it possible for users to be continuously connected to t
5、he Internet. Users now want to announce their presence to their friends so that they can “meet” online and communi-cate. E-mail is still the common way to exchange messages on the Internet. It is a store-and-forward system where mes-sages are stored at intermediate nodes (mail servers) and only retr
6、ieved at user request. With increased user pres-ence, Instant Messaging (IM) is gaining popularity. IM sys-tems deliver messages, as the name implies, instantly i.e., messages are sent directly to the recipient without any in-termediate storage and thus allow for real-time text-based communication.
7、There exist several IM systems today and depending on their architectures, messages can be forwarded via dedicated servers or, preferably, directly between the communicating parties (peer-to-peer). Table 1 shows a side by side com-parison of the most common IM systems today. Among all IM systems, no
8、ne is both fully distributed and decentral-ized in terms of registration, lookup and message delivery. Although some of them, like Jabber and IRC (with DCC), have some of the desired characteristics, they all suffer from a single point of failure where a malfunction will close the service, either by
9、 making it impossible to find clients or de-liver messages. Peer-to-peer systems provide powerful platforms for a va-riety of decentralized services, such  
10、; 2 as network storage and content distribution. We have designed a distributed instant messaging archi-tecture (DIMA) and implemented a proof-of-concept pro-totype. The DIMA system runs on top of Pastry 4 and includes a search engi
11、ne for the peer-to-peer network. With our system it is possible to perform all necessary operations (join/leave, lookup, message exchange) without any central-ized servers. Table 1: Comparison of registration, lookup and message exchange mechanisms in different IM pro-tocols. 2. PASTRY Pastry
12、4 is a generic peer-to-peer object location and routing substrate. It is a scalable, decentralized and self-organizing overlay network that automatically adapts to ar-rival, departure and failure of nodes. In this section we present a short overview of the Pastry design to provide the necessary back
13、ground for understanding our DIMA system. 2.1 A Pastry Node Each node joining the Pastry overlay network is assigned a random 128-bit node identifier (nodeId) within circular space 0 to 2128-1. The distribution of nodeIds is assumed to be uniform, which can be achieved by basing the nodeIds on
14、 cryptographic hashes of the nodes public keys or IP ad-dresses. Each node maintains a routing table which is organized into log16N rows with 15 columns, where N is the number of nodes in the overlay 1 . The entries in a row n each refer to a node whose nodeId shares the n first digits but not
15、 the n+1th digit with the present node. Associated with each routing entry is the IP address of the closest node (e.g., ac-cording to round trip time) that have the appropriate prefix. If no such node is known, the entry is left empty. Additional to the routing table, each node maintains a lea
16、f set. It contains IP-addresses of the L=2 numerically closest strictly, the number of rows and columns are log 2 b N and 2 b -1 respectively, where normally b=4. larger nodeIds and the L=2 closest smaller nodeIds, relative to the present node (L typically has the value 16). The leaf set is used dur
17、ing message routing as described in the next subsection. 2.2 Routing In Pastry, messages are routed according to a longest pre-fix match principle on the destinations Pastry key. If the key of a message is in range of the present nodes leaf set, then the message is routed to the node whose nodeId is
18、 clos-est to the key. If the key is not covered by the nodes leaf set, it looks up in the routing table a node whose nodeId shares a longer prefix with the key than its own nodeId and routes the message to this node. If there is no such node the message is routed to a node that shares the same lengt
19、h prefix with the present node, but is numerically closer to the destination address.  
20、; 3 2.3 Node Arrival and Node Departure A new node joining the overlay network has to initialize its state tables (i.e., the routing table and the leaf set) and then inform others of its presence. Assume a node with nodeId X. When node X wants to join the Pastry network it first locates a node in th
21、e network using for example “expanding ring IP multicast”. Assume the found node has nodeId A. Now node X sends a special join message with key X to node A. As with normal messages, node A routes this message to node Z which is numerically closest to X. Each node along the path to node Z sends one r
22、ow from its routing table to node X. The i:th node sends its i:th row. Node Z sends its leaf set to node X. Finally, node X informs any node that needs to be aware of its arrival. As nodes may fail or depart without warning, neighboring nodes in the key space (i.e., nodes in the leaf set) peri
23、odically exchange keep-alive messages. If a node is unresponsive for a time T, it is considered to be failed and all the members of the failed nodes leaf set update their leaf sets. Stale entries in the routing table are repaired by first trying to find a new route via its downstream node, and if no
24、t successful, starting its route table management mechanism. 2.4 APastry-based Store-and-Forward system POST 2 is a messaging system based on Pastry which provides message storage, per-user meta-data (e.g., relating messages to user), and notification. In POST, messages are objects that are being in
25、serted into to the peer-to-peer net-work and “message storage”-nodes are responsible for noti-fying the recipient when it has any messages pending. Such a system is well-suited for email, but does not fulfill the re-quirements for being a true instant messaging system as it is a store-and-forward ar
26、chitecture. 3. DISTRIBUTED INSTANT MESSAGING ARCHITECTURE (DIMA) We have designed a Distributed Instant Messaging Archi-tecture (DIMA) using Pastry and SCAN (Searching in Con-tent Addressable Networks) 3. The Pastry overlay network is used for node insertion and message routing. SCAN is a search met
27、hod for structured peer-to-peer networks, cur-rently being developed at Uppsala University. The current version of SCAN is implemented on top of Pastry, but could equally well be implemented on top of other routing sub- strates like for example Chord 5. We start this section with a short overview of
28、 SCAN. 3.1 SCAN SCAN implements a search mechanism for Pastry net-works that is capable of identifying the nodes holding the most relevant information. A classification of relevance of each node that is returned by the system is made on the basis of the number of keywords successfully matched.  
29、;3.1.1 Key Generation The idea behind SCAN is that content is addressable. In SCAN, a Pastry key does not correspond to one physical host, but to a piece of content on a host. Therefore a host may insert more than one nodeId into the Pastry network. Content meta-data e.g., keywords, are encoded usin
30、g the ASCII table, into a set of Pastry keys for NodeIds that are inserted into the network. A salt is added to the end of the Pastry key to make sure that similar/equal keywords do not generate the same Pastry key (figure 1). However, similar Pastry keys (generated from the same keyword) are inserted close to each other in the Pastry network since they share the same prefix. Figure 1: A Pastry key in SCAN.