
A Stable Approach for Routing Queries in Unstructured P2P Networks
Abstract
Introduction
Peer-to-peer (P2P) systems continue to find increasing and diverse uses as a distributed, scalable and robust framework to deliver services, e.g., file sharing, video streaming, expert/advicesharing,sensornetworks,databases,etc.Oneofthe basic functions of such systems is that of efficiently resolving queries or discovering files/resources. This is the problem addressed in this paper. There is a considerable body of work exploring the design of efficient search/routing mechanisms in structured and unstructured P2P networks, see e.g., [1]–[10]. In structured networks, peers/files/resources are organized to form overlays with specific topologies and properties. Search mechanisms that perform name resolution based on distributed hash table (DHT) coordinate systems can be devised to achieve good forwarding-delay properties, see e.g., [2]. In such systems, the query traffic may depend on how keys are assigned. So, load balancing requires proactive/reactive assignments of keys to peers and data/service objects, e.g., [11], and possibly exploiting network hierarchies [10]. Fundamentally, in such networks the difficulty of search/discovery is shifted to that of maintaining the structural invariants required to achieve efficient query resolution particularly in dynamic settings with peer/contentchurnorwhenreactiveloadbalancingisrequired.
Unstructured networks, by contrast, are easier to setup and maintain, but their mostly ad hoc overlay topologies make realizing efficient searches challenging. In a purely unstructured P2P network, a node only knows its overlay neighbors. With such limited information, search techniques for unstructured networks have mostly been based on limited-scope flooding, simulated random walks, and their variants [3]–[5]. Much research in this area has focused on evaluating these search techniques based on the contact time, i.e., number of hops required to find the target, using the spectral theory of Markov chains on (random) graphs, see e.g., [4]–[6]. Unfortunately in heterogenous settings where service capacity or resolution likelihoods vary across peers, such search techniques perform poorly under high query loads. The inefficiencies of purely unstructured networks can be partially addressed by hybrid P2P systems, e.g., FastTrack and Gnutella2. Such systems use a simple two-level hierarchy wheresomepeersserveas‘super-peers.’Thesearehighdegree nodes which are well connected to other super-peers and to a set of subordinate nodes in a hub-and-spoke manner [12].
Though such systems have advantages in terms of scalability, proposed search techniques are still based on variants of flooding and random walks. The work of [7] proposes an approach where peers cache the outcomes of past queries as informed by reverse-path forwarding. The idea is to learn, from past experience, the best way to forward certain classes of queries, i.e., to intelligently “bias” their forwarding decisions by correlating classes of queries with neighbors who can best resolve them. This approach involves considerable overhead, is not load sensitive, and has not yet given guarantees on performance. Although, as will be clear in the sequel, our results are not exclusive to hybrid P2P networks, these will serve as the focus of the paper. We assume that each super-peer contributes a possibly heterogenous amount of processing resource for resolving queries for the network – incentives for doing so are outsideofthescopeofthispaper,seee.g.,[8],[9].Super-peers serve their subordinates by resolving queries, or forwarding them to other super-peers. Super-peers can resolve queries by checking the files/resources they have, as well as those of their subordinate community. In our approach we also introduce a notion of query classes. These might, for example, represent types of content, such as music, films, animations, documents, or some other classification of files/resources relevant to the application at hand.
The idea is that such a grouping of queries into classes can be used as a low overhead approach to make useful inferences on how to relay queries. Given a hybrid P2P topology and query classification, we propose a novel query resolution mechanism which stabilizes the system for all query loads within a ‘capacity region’, i.e., the set of loads for which stability is feasible. Essentially, our policy is a biased random walk where forwarding decision for each query is based on instantaneous query loads at super-peers. To balance the load across heterogeneous superpeers, the policy aims at reducing the differential backlog at neighboring super-peers, while taking into account the class and history information to improve the query’s resolvability. Our policy draws upon standard backpressure routing algorithm, which is used to achieve stability in packet switching networks, e.g., see [13], [14]. In previously studied backpressure based systems, the goal is to deliver packets to the correspondingdestinations.Bycontrast,our aimis to provide a grade of service in resolving querieswithnofixeddestinations.
The random nature of the location of query resolution in the network leads us to deal with expected queue backlog instead of current queue backlog. Further, in P2P systems, the probability of resolution of a query at a given node depends on the query’s history, i.e., the path that led it to the current node. These characteristics of P2P systems are not captured in previous works on backpressure by Tassiulas and Ephremides [13] and the subsequent enhancements, see e.g., [14]–[21]. To summarize, our approach differs from standard work on backpressure in that we incorporate the following different issues that arise in P2P search: (a) we model the uncertainty in the locations where a query may be resolved depending upon where the file/object of interest are placed, (b) we guarantee a grade of service to each query under such uncertainties, (c) we incorporate the information about a query’s resolvability available through the knowledge of its history. We also propose several natural enhancements to our backpressure based query routing policy. By contrast to previous works on backpressure such as [15]–[19] and citations therein, these enhancements are also driven by P2P query routing setting. For example, in order to reduce delays previous works developalgorithmswhichprefershorterpathsoverlongerones by explicitly accounting for the hop length of various paths [15], or by finding good routes towards destinations using artificial ‘shadow’ queues which operate at larger loads to build gradients [17]. In our P2P query routing setting the destination of a query is not known a priori. We reduce delays via a simple ‘work conserving’ policy which efficiently uses available resources in routing queries at each node. We further propose a state aggregation policy aimed at reducing the complexity arising from the need to track the history of currently unresolved search queries.
Our Contributions. The main contributions of this paper are as follows. We propose a query forwarding mechanism for unstructured (hybrid) P2P networks with the following properties.
1. It dynamically accounts for heterogeneity in super-peer’s ‘service rate,’ reflecting their altruism, and query loads across the network. To the best of our knowledge, this is the first work to rigorously account for such heterogeneity in devising a search mechanism for P2P networks.
2. It is based on classifying queries into classes. This classification serves as a type of name aggregation, which enables nodes to inferthelikelihoods of resolving class queries,which, in turn, are used in learning how to forward queries.
3. Our approach is fully distributed in that it involves information sharing only amongst neighbors, and achieves stability subject to a Grade of Service (GoS) constraint on query resolution. The GoS constraint corresponds to guaranteeing that each query class follows a route over which it has a reasonable ‘chance’ of being resolved.
4. We provide and evaluate several interesting variations on our stable mechanism that help significantly improve the delay performance, and further reduce the complexity making it amenable to implementation.
Specifically, we formally show that backpressure with aggregated queues, where aggregation is based on queries’ histories, is stable for fully connected super-peer networks. This provides a basis for substantially reducing complexity by approximations, e.g., in the case where content is randomly placed.
Proposed System
- Given a hybrid P2P topology and query classification, we propose a novel query resolution mechanism which stabilizes the system for all query loads within a ‘capacity region’, i.e., the set of loads for which stability is feasible.
- Essentially, our policy is a biased random walk where forwarding decision for each query is based on instantaneous query loads at super-peers.
- To balance the load across heterogeneous super-peers, the policy aims at reducing the differential backlog at neighboring super-peers, while taking into account the class and history information to improve the query’s resolvability.
- Our policy draws upon standard backpressure routing algorithm, which is used to achieve stability in packet switching networks,
- We propose a query forwarding mechanism for unstructured (hybrid) P2P networks with the following properties.
- It dynamically accounts for heterogeneity in super-peer’s ‘service rate,’ reflecting their altruism, and query loads across the network. To the best of our knowledge, this is the first work to rigorously account for such heterogeneity in devising a search mechanism for P2P networks.
- It is based on classifying queries into classes. This classification serves as a type of name aggregation, which enables nodes to infer the likelihoods of resolving class queries, which, in turn, are used in learning how to forward queries.
- Our approach is fully distributed in that it involves information sharing only amongst neighbors, and achieves stability subject to a Grade of Service (GoS) constraint on query resolution. The GoS constraint corresponds to guaranteeing that each query class follows a route over which it has a reasonable ‘chance’ of being resolved.
- We provide and evaluate several interesting variations on our stable mechanism that help significantly improve the delay performance, and further reduce the complexity making it amenable to implementation.
Advantages
- Reducing complexity
- Estimating parameters, and adaptation to class-based query resolution probabilities and traffic loads are studied.
- Stable Policies
- Estimating Query Resolution Probabilities
- Alternate Grades of Service Strategies
- It is based on classifying queries into classes
- The GoS constraint corresponds to guaranteeing that each query class follows a route over which
- It has a reasonable ‘chance’ of being resolved
- This provides a basis for substantially reducing complexity by approximations

Implementation
Estimating query resolution probabilities So far we have assumed that resolution probabilities for queries of different types are known. In practice they can be easily estimated. In order to ensure unbiased estimates can be obtained at each node, suppose a small fraction of all queries is marked ‘RW’, forwarded via the random walk policy with a large TTL, and given scheduling priority over other queries. With a sufficiently large TTLthisensuresthateachnodewillseearandomsample of all query and types it could see and thus allow for unbiased estimates. All queries which are not marked ‘RW’ are treated according to our backpressure policy based on the estimated query resolution probabilities. A node I receives O(tE) ‘RW’ marked samples in time t. ReducingcomplexityNotunlikestandardbackpressurebased routing our policies suffer from a major drawback: each node needs to share the state of its potentially large number of non-empty queues with its neighbors. For backpressure-based routing the number of queues per node corresponds to the number of flows (commodities) in the network. In our context, the number of queues per node corresponds to number of query types it could see, i.e., worst case . In this section we propose simple modification and approximations that considerably reduce the overheads, albeit with some penalty in the performance. The key idea is to define equivalence classes of query types that share a ‘similar’ history, in the sense that they have similar conditional probabilities of resolution, and have them share a queue. For example, all query types of class c which have visited the same number of nodes k mightbegroupedtogether,reducingthenumberofqueues to (|C||N|) or better. Alternatively we will show one can further reduce overheads by approximately grouping similar query types based on their classes cand the cumulative number of class c files/resources they have seen in nodes in H(T ), reducing the number of queues to (|C|L) where L is a set of quantization levels. Intuitively such queries have seen similar opportunities if files/resources are randomely made available in the network. Alternate grades of service strategies Till now we provided grade of service based on a priori probability( ). We provide below, in brief, some alternate strategies for providing GoS which may be more suited to some applications. Multiple responses for each query: For certain applications, it may be beneficial to provide multiple responses to the source generating the query. For example, for applications requiring downloading a huge file, among the available options, the client may want to choose the server which is least loaded or is physically closest.
Conclusion
A Stable Approach for Routing Queries in Unstructured P2P Networks,To summarize, we provided a novel, distributed, and reliable search policy for unstructured peer-to-peer networks with super-peers. Our backpressure based policy can provide capacity gains of as large as 68 over traditional random walk techniques. We also provided modifications to the algorithm that make it amenable to implementation.