gzz-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...


From: Hermanni Hyytiälä
Subject: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...
Date: Tue, 25 Mar 2003 08:37:05 -0500

CVSROOT:        /cvsroot/gzz
Module name:    gzz
Changes by:     Hermanni Hyytiälä <address@hidden>      03/03/25 08:37:05

Modified files:
        Documentation/misc/hemppah-progradu: masterthesis.tex 

Log message:
        Is it done !?

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/masterthesis.tex.diff?tr1=1.185&tr2=1.186&r1=text&r2=text

Patches:
Index: gzz/Documentation/misc/hemppah-progradu/masterthesis.tex
diff -u gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.185 
gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.186
--- gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.185      Tue Mar 
25 07:10:22 2003
+++ gzz/Documentation/misc/hemppah-progradu/masterthesis.tex    Tue Mar 25 
08:37:05 2003
@@ -1181,34 +1181,34 @@
 Directed BFS \cite{yang02improvingsearch} optimizes the original 
 BFS in a way that a peer selects the neighbors which have provided many 
quality results in the past, 
 thereby maintaining the quality of costs and decreasing the amount
-of messages sent to network. Alpine \cite{alpineurl} and NeuroGrid 
\cite{joseph02neurogrid} 
-are Peer-to-Peer systems which use somewhat similar method when performing 
data lookups.
+of messages sent to network. 
 
-Local indices \cite{yang02improvingsearch} is a variation of active caching. 
-In this scheme, each peer maintains an index over the data of all peers within 
+In the local indices techique \cite{yang02improvingsearch}, each peer 
maintains an index over the data of all peers within 
 $h$ hops of itself, where $h$ is a system-wide variable, called radius of the
 index\footnote{In the normal BFS case, the value of $h$ is 0, as a peer only 
has index
-over its local content.}. Mutual index caching architecture, as proposed in 
-\cite{osokine02distnetworks}, is a variation of local indices technique. 
+over its local content.}. Thus, when a peer receives a data lookup request, it 
can
+process the request on behalf of every node within $r$ hops. Compared to 
original BFS, 
+the local indices technique keeps data lookup cost low while maintaining the 
same
+number of search results.
 
 In the random walk approach \cite{lv02searchreplication}, a peer forwards 
query to a 
 randomly selected neighbor. The basic random walk approach
 has a poor response time but it doesn't generate as much network traffic as 
 the original BFS. As suggested in \cite{lv02searchreplication}, the
 random walk approach can be made more effective by introducing
-multiple ''walkers''. 
-
-Freenet \cite{clarke00freenet} uses random walk searches in data lookups. 
Freenet's data lookup model resembles
+multiple simultaneously working ''walkers''. Freenet \cite{clarke00freenet} 
uses 
+random walk searches in data lookups. Freenet's data lookup model resembles
 Depth-First-Search (DFS) and peers' routing tables are dynamically built
 using caching. This is an outcome of Freenet's main design principles, 
anonymity. 
 Another property of the Freenet's data lookup model is that
-it adapts well with varying usage patterns. Improvements to Freenet's data 
lookup using
-the ''small-world phenomenon'' have been proposed by Zhang et al. 
\cite{zhang02using}.
+it adapts well with varying usage patterns (e.g., searching for popular data 
items in the overlay). 
+Improvements to Freenet's data lookup using
+the ''small-world'' techniques have been proposed by Zhang et al. 
\cite{zhang02using}.
 
 Since tightly structured systems have an efficient data lookup at the 
application level overlay,
 current research efforts are focused on the proximity-based data lookup. In 
the proximity-based data lookup, 
 peers try to choose entries of routing-tables referring to other peers that 
are \emph{nearby} in the 
-underlying network. In this way, tightly structured systems are able to 
decrease actual 
+underlying network. In this way, tightly structured systems are able to 
decrease the actual 
 lookup \emph{latency}. CAN \cite{ratnasamy01can}, Kademlia 
\cite{maymounkov02kademlia}, 
 Pastry \cite{rowston01pastry} and Tapestry \cite{zhao01tapestry} have advanced 
heuristics for
 the proximity-based routing. Additionally, most recent version of Chord uses 
proximity-based 
@@ -1216,109 +1216,111 @@
 uses a combination of proximity and application level overlay routing when 
performing data 
 lookups. Authors call this feature as a \emph{constrained load balancing}.
 
-Additional research related to proximity-based routing include 
\cite{karger02findingnearest, hildrum02distributedobject, 
-brinkmann02compactplacement, rhea02probabilistic, castro02networkproximity, 
ng02predicting, pias03lighthouse}.
-
 \subsection{Fast and usable search}
 
 To make Peer-to-Peer systems even more popular (and usable), these systems 
have to support flexible, efficient
-and easy search methods. For instance, Internet's perhaps the most important 
feature
-is the ability to perform keyword searches (e.g., Google \cite{googleurl}). 
Currently, only loosely
-structured systems are able to carry out this requirement. Unfortunately, as 
discussed in this text,
-the data lookup model of the loosely structured approach doesn't scale. Thus, 
research efforts have
-been focused towards tightly structured systems. The main problem with tightly 
structured systems is the 
-fact that tightly structured algorithms perform data lookups based on a 
globally unique identifier (key). 
+and simple search methods \cite{li03feasibility}. Currently, loosely 
structured systems are able to carry out the flexibility and
+simplicity requirements and tightly structured systems are able to fulfill the 
efficiency requirement. 
+Research efforts have been focused to make security methods in tightly 
structured systems more usable since
+the data lookup model of the loosely structured approach doesn't scale. 
However, studies
+show that combining flexible search methods and tightly structured systems may 
not be trivial \cite{harren02complex,
+ansaryefficientbroadcast03}.  The main problem with tightly structured systems 
is the 
+fact that tightly structured algorithms perform data lookups based on a 
globally unique identifier thereby
+making efficient keyword searches hard to implement. 
+
+Some studies have been concentrated on SQL-like queries \cite{harren02complex}
+in tightly structured overlays. It is unknown, however, if this approach is 
realizable to implement, since
+initial analysis have shown that this approach is rather complex. Other 
approaches include adaption of the data lookup model of the loosely 
+structured approach into tightly structured systems 
\cite{ansaryefficientbroadcast03, chord:om_p-meng}.
+Work in \cite{ansaryefficientbroadcast03} seems quite promising. Authors' work 
is based on insight that
+performing data lookup in the overlay resembles regular tree-like search, 
where trees' data structure
+is distributed throughout the overlay. Some studies suggest additional layer 
upon overlay network \cite{kronfol02fasdsearch, 
+joseph02p2players}, which use metadata to implement search methods. The 
feasibility of implementing additional 
+search layer on top of the network layer is questionable, especially if the 
search layer and the network
+layer have different assumptions about the participating peers (e.g., the 
network layer supports heterogeneity
+of peers, but the search layer doesn't). Andrzejak et al. propose range 
queries \cite{andrzejak02rangequeries}
+to be used with tightly structured overlays. In this technique, it is feasible 
to perform data lookups
+using ranges of keys thereby covering larger amount of possible data items. 
Currently their prototype
+is designed for the CAN system \cite{ratnasamy01can}.
 
 Recent study has been focused on the feasibility of Peer-to-Peer Web-like 
indexing and searching 
 on top of tightly structured overlays \cite{li03feasibility} . Authors argue, 
that it is possible to implement 
 Peer-to-Peer Web-like search with certain compromises. First, Peer-to-Peer 
search engine may need to 
 decrease the result quality in order to make searching more efficient. Second, 
Peer-to-Peer systems must 
-consult the properties of underlying network for better performance. 
-
-Some studies have been concentrated on SQL-like queries \cite{harren02complex}
-in tightly structured overlays. Other approaches include adaption of the data 
lookup model of the loosely 
-structured approach into tightly structured systems 
\cite{ansaryefficientbroadcast03, chord:om_p-meng}.
-Some studies suggest additional layer upon overlay network 
\cite{kronfol02fasdsearch, joseph02p2players} 
-and range queries \cite{andrzejak02rangequeries}.
+consult the properties of underlying network for better performance.
 
 Many techniques have been developed in order to provide more efficient search 
indexing. As 
 several studies show, the popularity of queries in the Internet follow 
Zipf-like 
 distributions\footnote{Zipf-distribution is a variant of power-law function. 
 Zipf-distribution can be used in observation of frequency of occurrence event 
$E$, as a function of the rank 
 $i$ when the rank is determined by the frequency of occurrence, is a power-law 
function $E_i \sim \frac{1}{i^{a}}$, 
-where the exponent $a$ is close to unity.} (e.g., 
\cite{breslau98implications}). Therefore, caching and pre-computation
-can be done for optimizing search indices \cite{li03feasibility}. Regular 
compression algorithms,
-Bloom filters \cite{362692}, vector space models 
\cite{CuencaAcuna2002DSIWorkshop} and view 
-trees \cite{Bhattacharjee03resultcache} can be used for even better 
optimizations. Authors 
-in \cite{li03feasibility} use Gap compression \cite{wittengigabytes}, Adaptive 
Set Intersection \cite{338634}  
-and clustering with their search optimizations.
+where the exponent $a$ is close to unity.} (e.g., 
\cite{breslau98implications}).
+Therefore, according to \cite{li03feasibility}, caching and pre-computation 
can be done for optimizing search indices. 
+Authors in \cite{li03feasibility} use Gap compression \cite{wittengigabytes}, 
Adaptive Set Intersection \cite{338634}  
+and clustering with their search optimizations. Regular compression 
algorithms, Bloom filters \cite{362692}, vector 
+space models \cite{CuencaAcuna2002DSIWorkshop} and view trees 
\cite{Bhattacharjee03resultcache} can be used for even 
+better optimizations. 
 
 While it is expected that web-like searches can be layered on a top of tightly 
structured overlay, much
 more research is required to make indexing and searching more efficient.
 
-
 \subsection{System management}
 
 Adaptive system management and self-organization are essential properties
 of any Peer-to-Peer system, since centralized control over the system is 
missing. Loosely structured
-systems require less system management properties than tightly structured 
systems; in a loosely
-structured system, peers join and leave the system constantly without any 
restrictions. On the
-other hand, however, peers in tightly structured system join and leave the 
system but have less freedom, 
+systems require less system management properties than tightly structured 
systems: in a loosely
+structured system, peers join and leave the system constantly without any 
restrictions \cite{saroiu02measurementstudyp2p}. 
+On the other hand, however, peers in tightly structured system join and leave 
the system but have less freedom, 
 i.e., overlay chooses peer's neighbors on behalf of the peer itself and maps 
data items randomly 
-throughout the overlay network. 
-
-All presented algorithms of the tightly structured approach have been analyzed 
under static simulation 
-environments. Furthermore, proposed tightly structured overlays are configured 
statically to achieve 
+throughout the overlay network \cite{balakrishanarticle03lookupp2p}. Almost 
all presented algorithms 
+for the tightly structured systems have been analyzed under static simulation 
+environments \cite{libennowell01observations}. Furthermore, proposed tightly 
structured overlays are configured statically to achieve 
 the desired reliability even in a uncommon and adverse environment 
\cite{rowston03controlloingreliability}. 
-The most important factor for future research is to get real-life experiences 
from tightly structured 
-systems, when there are frequent joins and leaves in the system.
+Thus, one of the most important factors for future research is to get 
real-life experiences from tightly structured 
+systems, when there are frequent joins and leaves of peers in the system.
 
-The concept of ''half-life'' was introduced by Liben-Nowell 
\cite{libennowell01observations} since Peer-to-Peer 
-system is \emph{never} in the ''ideal'' state as Peer-to-Peer system is 
continiously evolving system. Half-life is defined
-as follows: let there be $N$ live peers at time $t$. The doubling from time 
$t$ is the time that pass before
-$N$ new additional peers arrive into the system. The halving time from time 
$t$ is the time
-required for half of the living peers at time $t$ to leave the system. The 
half-life from 
-time $t$ is smaller of the properties stated above. The half-life of the 
entire system is the 
-minimum half-life over all times $t$. Concept of half-time can be used as a 
basis for developing
-more powerful analytical tools for modelling complex Peer-to-Peer systems. 
+As mentioned before, an implicit assumption of almost every tightly structured 
system is that there is a random, uniform 
+distribution of peer and key identifiers. Even if participating peers are 
extremely heterogeneous, e.g., in
+computing power or network bandwidth, all data items are distributed 
uniformly. Clearly, this is
+a serious problem of tightly structured overlays in face of performance and 
load balancing \cite{rao03loadbalancing}. 
+Measurement study by Saroiu et al. show that there is a extreme heterogeneity 
among participating peers in already deployed Peer-to-Peer 
+systems \cite{saroiu02measurementstudyp2p}. Symphony \cite{gurmeet03symphony} 
seems to be the first tightly structured overlay system 
+which supports heterogeneity. Zhao et al. have proposed a secondary layer on 
top of a structured overlay
+to support heterogeneity better \cite{zhao02brocade}. 
 
 Some research has been done with regard to load balancing properties of 
tightly structured
 overlays. Byers et al. suggest an idea of ''power of two choices'' whereby 
data item is stored at the less loaded 
 of two (or more) random peer alternatives \cite{byers03dhtbalancing}. Rao et 
al. use virtual servers
 to control the load balance in a Peer-to-Peer system 
\cite{rao03loadbalancing}. Their work rests on the
 idea which was originally introduced by Chord \cite{stoica01chord} system.
+Ledlie et al. propose techniques for forming and maintaining groups in a 
highly dynamic environment 
+\cite{ledlie02selfp2p}. Their work relies on the idea that
+participating peers would create multiple hierarchical groups. It is not clear 
whether this approach
+is scalable or fault tolerant and suitable for Peer-to-Peer environment. More 
promising work has been done by Rowston et al.
+in \cite{rowston03controlloingreliability}. Authors propose techniques for 
self-tuning, dealing with
+uncommon conditions (e.g., network partition and high failure rates). 
Moreover, authors argue that
+with these techniques, the concerns over tightly structured overlay 
maintenance costs are no more
+an open issue. 
 
 Also, query and routing hot spots may be an issue in tightly structured 
overlays \cite{ratnasamy02routing}. 
 Hot spots happen, when a specific key is being requested extremely often in 
tightly structured overlays. Recent study 
 by Freedman et al. tries to reduce hot spots in the system by performing 
\emph{sloppy} hashing 
 \cite{sloppy:iptps03}. Authors' technique is especially suitable for the DOLR 
abstraction of tightly structured overlays. 
-With Sloppy hashing, we are able to reduce the generation of query hot spots. 
Sloppy hashing enables to 
+They arque that with Sloppy hashing, the generation of query hot spots can be 
reduces and peers are able 
 locate nearby data without looking up data from distant peers. Moreover, 
authors' 
 proposal for self-organizing clusters using network diameters may be useful, 
-especially within small groups of working people. Thus, with Sloppy hashing
-we can provide locality properties the system.
-
-
-
-As mentioned before, an implicit assumption of almost every tightly structured 
system is that there is a random, uniform 
-distribution of peer and key identifiers. Even if participating peers are 
extremely heterogeneous, e.g., in
-computing power or network bandwidth, all data items are distributed 
uniformly. Clearly, this is
-a serious problem of tightly structured overlays in face of performance and 
load balancing. Measurement study 
-by Saroiu et al. show that there is a extreme heterogeneity among 
participating peers in already deployed Peer-to-Peer 
-systems \cite{saroiu02measurementstudyp2p}. Symphony \cite{gurmeet03symphony} 
seems to be the first tightly structured overlay system 
-which supports heterogeneity. Zhao et al. have proposed a secondary layer on 
top of a structured overlay
-to support heterogeneity better \cite{zhao02brocade}. 
+especially within small groups of working people.
 
-Research has been done on self-organization. Ledlie et al. propose techniques 
for forming and maintaining
-groups in a highly dynamic environment \cite{ledlie02selfp2p}. Unfortunately 
their work relies on the idea that
-participating peers would create multiple hierarchical groups. It is not clear 
whether this approach
-is fault tolerant and suitable for Peer-to-Peer environment. More promising 
work has been done by Rowston et al.
-in \cite{rowston03controlloingreliability}. Authors propose techniques for 
self-tuning, dealing with
-uncommon conditions (e.g., network partition and high failure rates). 
Moreover, authors argue that
-with these techniques, the concerns over tightly structured overlay 
maintenance costs are no more
-an open issue.
+The concept of ''half-life'' was introduced by Liben-Nowell 
\cite{libennowell01observations} since Peer-to-Peer 
+system is \emph{never} in the ''ideal'' state as Peer-to-Peer system is 
continiously evolving system. Half-life is defined
+as follows: let there be $N$ live peers at time $t$. The doubling from time 
$t$ is the time that pass before
+$N$ new additional peers arrive into the system. The halving time from time 
$t$ is the time
+required for half of the living peers at time $t$ to leave the system. The 
half-life from 
+time $t$ is smaller of the properties stated above. The half-life of the 
entire system is the 
+minimum half-life over all times $t$. Concept of half-time can be used as a 
basis for developing
+more powerful analytical tools for modelling complex Peer-to-Peer systems.   
 
-Finally, little research has been done regarding self-monitoring and data 
availability. Zhang et al.
+Finally, little research has been done regarding self-monitoring. Zhang et al.
 describe an arbitrary data structure on top of a tightly structured overlay 
\cite{zhang03somo}. Authors
 call their technique as a \emph{data overlay}, since it supports several 
fundamental data structures.
 Authors have used this data overlay when building a Self-Organized Meta data 
Overlay (SOMO), which can be used




reply via email to

[Prev in Thread] Current Thread [Next in Thread]