gzz-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...


From: Hermanni Hyytiälä
Subject: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...
Date: Tue, 04 Mar 2003 10:02:10 -0500

CVSROOT:        /cvsroot/gzz
Module name:    gzz
Changes by:     Hermanni Hyytiälä <address@hidden>      03/03/04 10:02:10

Modified files:
        Documentation/misc/hemppah-progradu: masterthesis.tex 

Log message:
        Typos and updates

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/masterthesis.tex.diff?tr1=1.110&tr2=1.111&r1=text&r2=text

Patches:
Index: gzz/Documentation/misc/hemppah-progradu/masterthesis.tex
diff -u gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.110 
gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.111
--- gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.110      Tue Mar 
 4 07:25:17 2003
+++ gzz/Documentation/misc/hemppah-progradu/masterthesis.tex    Tue Mar  4 
10:02:09 2003
@@ -34,7 +34,7 @@
 \contactinformation{\\
 Hermanni Hyytiälä\\
 Huhtalammentie 5 as. 17\\
-37637 JYVÄSKYLÄ\\
+40640 JYVÄSKYLÄ\\
 sähköposti: address@hidden
 
 
@@ -43,7 +43,7 @@
 key properties. We summarize open problems in Peer-to-Peer networks and divide
 problems into three sub-categories. We observe that there are many
 problems, which have not solutions at all, or problems have proposed
-solutions, but they are practically unrealizable.
+solutions but they are practically unrealizable.
 
 Then, we give an overview of our Fenfire system.  We evaluate existing
 Peer-to-Peer approaches-- loosely and tightly structured overlays-- with regard
@@ -52,16 +52,16 @@
 }
 \tiivistelma{
 Tässä opinnäytetyössä arvioimme olemassaolevia vertaisverkkoja, protokollia ja
-niiden erikoisominaisuuksia. Teemme yhteenvedon olemassa olevista ongelmista
-vertaisverkoista ja jaamme ongelmat kolmeen alakategoriaan. Havaitsemme, että
+niiden erityisominaisuuksia. Teemme yhteenvedon olemassa olevista ongelmista
+vertaisverkoissa ja jaamme ongelmat kolmeen alakategoriaan. Havaitsemme, että
 on olemassa useita ongelmia, joihin ei ole ratkaisua lainkaan, tai on ehdotelma
-ongelman ratkaisemiseki, mikä kuitenkin käytännössä on mahdotonta toteuttaa.
+ongelman ratkaisemiseksi, mikä käytännössä on kuitenkin mahdotonta toteuttaa.
 
-Tämän jälkeen annamme yleiskuvan Fenfire-järjestelmästämme.
+Tämän jälkeen annamme yleiskuvan Fenfire-järjestelmästä.
 Arvioimme olemassaolevia vertaisverkkoarkkitehtuureja-- löyhästi ja tiukasti
 rakennettuja päällysverkkoja-- Fenfiren vaatimusten valossa. Lopuksi ehdotamme
 yksin-kertaisia algoritmeja, joiden avulla voidaan tehokkaasti löytää
-vertaisverkosta tietoa Fenfire:n liittyen.
+vertaisverkosta Fenfire:n kannalta olennaista tietoa
 }
 
 \begin{document}
@@ -81,8 +81,8 @@
 Peer-to-Peer systems have recently received noteworthy attention in both 
 academia and industry for a number of reasons. First, the lack of 
centralization 
 means that participants can form a distributed system without any investment 
to 
-centralized, high-priced hardware which would to coordinate it. Second, 
Peer-to-Peer
-provides new, direct way to achieve interoperability between network 
participants. 
+centralized, high-priced hardware which would coordinate it. Second, 
Peer-to-Peer
+provides new direct way to achieve interoperability between network 
participants. 
 Finally, the distributed and ad-hoc nature of Peer-to-Peer improves scalability
 and reliability againts certain kinds of faults (e.g., single point of 
failure).
 
@@ -111,7 +111,7 @@
 problems in easy-to-understand tables. 
 
 Next, we give an overview of our Fenfire system, which implements xanalogical 
storage model. We
-also describe briefly Storm software module, which is essential part of 
Fenfire's
+also describe briefly Storm software module, which is an essential part of 
Fenfire's
 Peer-to-Peer functionality. We evaluate existing Peer-to-Peer approaches and
 choose the best alternative to our needs. We discover that Storm, xanalogical 
model and
 tightly structured Peer-to-Peer approach all have similar method to deal with 
data,
@@ -129,27 +129,27 @@
 is to find the most efficient way to locate and fetch Fenfire related data 
from the 
 Peer-to-Peer network, where Scroll block's identifier is given. Second, we want
 to find the most efficient way to locate and fetch most recent Fenfire related 
data from the
-the Peer-to-Peer network, which is associated with a given urn-5 random 
string. Final problem
+the Peer-to-Peer network, which is associated with a given urn-5 random 
string. Third problem
 is otherwise same as the second problem, except we want to locate and fetch 
all Fenfire
 related data from the Peer-to-Peer network, where given date and/or time range 
is given.
 
 When comparing different Peer-to-Peer approaches and algorithms, we will 
examine their
 scalability, efficiency, space requirements for neighbor connections and 
overhead
-associated with system maintenance. When we have solutions to our research
+associated with system maintenance. Finally, when we have solutions to our 
research
 problems, we will use best solutions as examples in our algorithm proposals.
 
 \section{Thesis overview}
 This thesis is structured as follows. In next chapter, we give an overview of
 existing Peer-to-Peer approaches, algorithms and key differences. In chapter 
3, we
 address open problems in Peer-to-Peer domain and divide problems into three
-sub-categories. Chapter 4 gives an overview of our Fenfire system. In chapter
-5 we evaluate existing Peer-to-Peer approaches with regard to Fenfire system 
and
-propose simple algorithms perform data lookups in Fenfire's Peer-to-Peer 
enviroment. 
-We also discuss open issues and future work. Finally, we present conclusions 
in chapter
-6.
+sub-categories. Chapter 4 gives an overview of Fenfire system. In chapter
+5, we evaluate existing Peer-to-Peer approaches with regard to Fenfire system 
and
+propose simple algorithms to perform data lookups in Fenfire's Peer-to-Peer 
enviroment. 
+Then, we discuss open issues and future work. In chapter 6, we present 
conclusions. 
+
 
 \chapter{Peer-to-Peer architectures}
-In this chapter we give brief history and overview of Peer-to-Peer networks, 
+In this chapter we will give brief history and overview of Peer-to-Peer 
networks, 
 review most important Peer-to-Peer algorithms and list key differences between 
 two main approaches.
 
@@ -189,17 +189,17 @@
 form of modern Peer-to-Peer computing is file-sharing. In this scenario, 
participants
 of Peer-to-Peer network share their resources to other participants while 
obtaining
 more resources from others. This can been seen as a variant of distributed 
filesystem
-(see, e.g., \cite{levy90distributedfilesystems}). 
+(e.g., \cite{levy90distributedfilesystems}). 
 
 In a development of modern Peer-to-Peer systems, lot of influences has been 
attained from 
 other research areas than computer science. There has been done research 
regarding 
 to ad-hoc nature of complex networks \cite{albert-02-statistical}, 
\cite{albert-00-tolerance}, \cite{watts00dynamics}. 
 It's interesting to realize that chemical properties of cells, the Internet, 
ad-hoc 
-Peer-to-Peer networks, have in common that they all self-organize based on 
same 
+Peer-to-Peer networks, have all in common that they self-organize based on 
same 
 principles.  Furthermore, the assocation between social connections among 
people 
 and Peer-to-Peer overlay topology has been studied recently  
\cite{watts00dynamics}, 
 \cite{kleinberg99small}, \cite{nips02-Kleinberg}. This insight is motivated
-by Milgram, how noticed that people are very effective to locate other people 
in a wide scale, 
+by Milgram, how noticed that people are very effective to locate other people 
in a wide scale 
 based on local knowledge. This phenomenon is called as ''small-world 
phenomenon'' 
 \cite{milgram67smallworld}. As a consequence, many modern Peer-to-Peer systems
 have applied techniques outside of computer science when constructing and 
maintaining
@@ -207,7 +207,7 @@
 
 In the end, however, there are two main approaches in which all modern 
Peer-to-Peer
 systems fall: loosely structured approach and tightly structured approach. In 
loosely
-structured approach the construction and the maintenance of the overlay is-- 
as the name
+structured approach the construction and the maintenance of the overlay-- as 
the name
 suggests- is controlled loosely. This approach gives freedom for participating 
peers
 to perform certains tasks in Peer-to-Peer network. On the other hand, tightly 
structured
 approach has some rules, which all participating peers have to obey. In the 
following
@@ -230,12 +230,12 @@
 \section{Loosely structured}
 
 Gnutella \cite{gnutellaurl} is well-known example of loosely structured 
overlay network. As in
-other Peer-to-Peer networks, no Peer is more important than any other Peer in 
the network.
-The construction and maintenance of Gnutella network is extremely ad hoc, 
since participating
-peers can form the overlay network based on local knowledge. Figure 
\ref{fig:gnutella_overlay}
+other Peer-to-Peer networks, no peer is more important than any other peer in 
the network.
+The construction and maintenance of Gnutella network is extremely ad-hoc, 
since participating
+peers can form the overlay network based on \emph{local} knowledge. Figure 
\ref{fig:gnutella_overlay}
 illustrates how peers form an overlay network. Initially, peer 1 creates the 
overlay, since
 it's the first participating peer. Then, repeatly new peers join the network 
and connects to
-other nodes in a random manner. Thus, gnutella can be considered as a 
\emph{random graph}.
+other nodes in a random manner. Thus, gnutella can be considered as a 
variation of \emph{random graph}.
 
 \begin{figure}
 \centering
@@ -246,17 +246,17 @@
 
 
 In Gnutella, each participating peer maintains local index of its own shared 
content. Also,
-each peer has a few connections to other peer, i.e., peer's \emph{neighbors}. 
Basic gnutella
+each peer has a few connections to other peer, i.e. peer's \emph{neighbors}. 
Basic gnutella
 data lookup works as follows: peer broadcasts a query request to its neighors, 
which in turn
 forwards the query to their neighbors. This leads in the situation where 
number of messages
 in the network can grow with $O(n^{2})$, where $n$ is the number of 
participating peers in the
 Gnutella network. To limit the amount of network traffic, Gnutella uses 
Time-To-Live-limited
-(TTL) flooding to distributed queries. Gnutella uses a breadt-First traversal 
with depth limit 
-$T$ (e.g., 7), where T is the system-wide maximum TTL of a message in hops. 
Therefore, only peers that 
+(TTL) flooding to distributed queries. Gnutella uses a Breadt-First-Traversal 
(BFS) with depth limit 
+$T$ (e.g., 7), where $T$ is the system-wide maximum TTL of a message in hops. 
Therefore, only peers that 
 are TTL hops away from the query originator will forward the query or respond 
to the query. 
-In Gnutella network, search results are fast, because breadt-First traversal 
sends queries to 
-every possible neighbor. On the other hand, this method wastes resources and 
doesn't scale well.
-Figure \ref{fig:gnutella_query} shows the query lookup process of Gnutella 
network.
+In Gnutella network, search results are fast, because BFS sends queries to 
+every possible neighbor. Clearly, this method wastes resources and doesn't 
scale well.
+Figure \ref{fig:gnutella_query} shows the data lookup process of the Gnutella 
network.
 
 \begin{figure}
 \centering
@@ -271,16 +271,16 @@
 low, the query originator might not find the desired data even it's available 
somewhere
 in the network. Second, there are many duplicate messages generated by 
flooding, especially
 in high connectivity graphs. It is obvious that with these limitations, 
flooding creates 
-significant message processing overhead for each query. Furthermore, flooding 
may increase 
+significant message processing overhead for each data lookup. Even worse, 
flooding may increase 
 the load on participating to the point, where it has to leave the network. 
 
 Lately, there has been done lot of research to improve Gnutella's data lookup 
efficiency 
 and scalability. Adamic et. all \cite{adamic99small}, 
\cite{adamic02localsearch}, 
 \cite{adamic01powerlawsearch} has been studied different random walk methods 
in power-law 
 networks\footnote{In power-law networks only a few peers have high number of 
neighbor 
-links and major of peers have low nuber of neighbor links.} and they have 
found that by 
-instructing peers forwarding queries to select high degree peers the data 
lookup's 
-performance increases signficantly. As a result, some of the most recent 
loosely
+links and major of peers have low number of neighbor links.} and they have 
found that by 
+instructing peers forwarding data lookups to select high degree peers, the 
performance of data lookup 
+increases signficantly. As a result, some of the most recent loosely
 structured Peer-to-Peer systems have adopted this method with some 
modifications
 \cite{gnutella2url}, \cite{shareazaurl}, \cite{fasttrackurl}, 
\cite{morpheusurl}, 
 \cite{kazaaurl}, \cite{jxtaurl}, \cite{jxtaoverview}, 
\cite{botros01jxtasearch},
@@ -308,7 +308,8 @@
 
 Previously presented improvements are only partial solutions. Obviously, more
 research is required to make loosely structured approach's data lookup more
-scalable and effective. 
+scalable and effective. More advanced techniques to improve loosely strcutured
+systems' data lookup is presented in chapter 3.
 
 
 \subsection{Sketch of formal definition}
@@ -328,7 +329,8 @@
 
 \section{Tightly structured}
 
-In recents months, several tightly structured overlays has been proposed.
+Partly due to loosely structured systems' scalability problems, several 
tightly 
+structured overlays has been proposed.
 This list includes CAN \cite{ratnasamy01can}, Chord \cite{stoica01chord}, 
 Kademlia \cite{maymounkov02kademlia}, Kelips \cite{gupta03kelips}, 
 Koorde \cite{kaashoek03koorde}, ODHDHT \cite{naor03simpledht}, 
@@ -347,13 +349,22 @@
 value of $n$ varies among approaches. Again, CAN uses a $d$-dimensional 
cartesian 
 to implement identifier space.
 
+Stoica et al. \cite{balakrishanarticle03lookupp2p} have listed 
+four requirements for tightly structured overlays, which have to be 
+addressed in order to perform data lookups in tightly structured overlays. 
+First, mapping of keys to peers must be done in a load-balanced
+way. Second, the overlay must be able to forward a lookup for a 
+specific key to an approriate peer. Third, overlay must have a
+support for a distance function. Finally,  routing tables for each peer
+must be constructed and maintained adaptively.
+
 To store data into tightly structured overlay, each application-specific
 unique key (e.g., SHA-1 \cite{fips-sha-1}) is \emph{mapped} uniformly (e.g., 
using consistent
 hashing \cite{258660}) by the overlay to a existing peer in the overlay. Thus, 
tightly 
 structured overlay assigns a subset of all possible keys to every 
participating peer. 
-Furtermore, each peer in the structured overlay maintains a \emph{routing 
table}, which 
-consists of identifiers and IP addresses of other peers in the overlay. These 
are peer's 
-neighbors in the overlay network. Figure \ref{fig:structured_hashing} 
illustrates the 
+Specifically, each peer in tightly structured overlay maintains a 
\emph{routing table}, which 
+consists of identifiers and IP addresses of other peers in the overlay. 
Entries of routing
+table are peer's neighbors in the overlay network. Figure 
\ref{fig:structured_hashing} illustrates the 
 process of data to key mapping in tightly strucuted overlays.
 
 \begin{figure}
@@ -366,55 +377,48 @@
 All messages are routed across overlay links towards peers, whose
 peer identifier is gradually ''closer'' to the key's identifier
 in the identifier space. Distance can be measured by numerical
-difference between identifiers (e.g., Chord), the number of
-same prefix bits between identifiers (e.g., Pastry and Tapestry),
-bit-wise exclusive or (XOR) (e.g., Kademlia). However, in all
-previously schemes, each hop in the overlay shortens the path
+difference between identifiers (e.g., Chord \cite{stoica01chord}), the number 
of
+same prefix bits between identifiers (e.g., Pastry \cite{rowston01pastry} and 
Tapestry \cite{zhao01tapestry}),
+bit-wise exclusive or (XOR) (e.g., Kademlia \cite{maymounkov02kademlia}). 
However, in all
+previously schemes each hop in the overlay shortens the path
 between current peer working with query and the key which was
 looked up.
 
 Skip Graphs and Swan employ a key space very similar to a tightly structured
-overlay, but in which queries are routed  to \emph{identifiers}. In these 
systems
+overlay, but in which queries are routed  to \emph{keys}. In these systems
 peer occupies several positions in the identifier space, one for each 
 application-specific key. The indirection of placing close keys in the 
 custody of a storing peer\footnote{Storing peer is the peer in the overlay 
which stores the
 assigned keys.} keys is removed at the cost of each peer maintaining one 
-''resource node'' in the overlay network for each resource item pair it 
publishes.
+''resource peer'' in the overlay network for each resource item pair it 
publishes.
 
 PeerNet differs from other tightly structured overlays in that it operates
 at the \emph{network} level layer. Peernet makes an explicit distinction 
-between peer identity and address, which is supported by standard
-TCP/IP-algorithms. Otherwise, PeerNet has same performance properties
+between peer identity and address, which is not supported by standard
+TCP/IP-protocols. Otherwise, PeerNet has the same performance properties
 as other tightly structured overlays, i.e. $O(\log{n})$ space required
 for maintaining information about other peers in the system and 
-$O(\log{n})$ data lookup efficieny.
-
-Stoica et al. \cite{balakrishanarticle03lookupp2p} have listed 
-four requirements for tightly structured overlays, which have to be 
-addressed in order to perform data lookups in tightly structured overlays. 
-First, mapping of keys to peers must be done in a load-balanced
-way. Second, the overlay must be able to forward a lookup for a 
-specific key to an approriate peer. Third, overlay must have a
-support for a distance function. Finally,  routing tables for each peer
-must be constructed and maintained adaptively.
+$O(\log{n})$ data lookup efficiency.
 
 Currently, all proposed tightly structured overlays provide at least 
 poly--logaritmical data lookup operations. However, there are some key 
-differences in the data structure that they use as a routing table. For 
example, Chord, Skip graphs and 
-Skipnet maintain a local data structure which resembles skip lists 
\cite{78977}.
+differences in the data structure that they use as a routing table. For 
example, Chord 
+\cite{stoica01chord}, Skip graphs \cite{AspnesS2003} and Skipnet 
\cite{harvey03skipnet2} maintain a local 
+data structure which resembles Skip lists \cite{78977}.
 In figure \ref{fig:structured_query}, we present overview of Chord's lookup 
process.
 On the left side of Chord's lookup process, we show the same data lookup 
process
 as binary-tree abstraction.  We can notice, that in each step, the distance 
between
 the query originator and the target in both methods is halved. Thus, the 
 locarithmic efficiency. 
 
-Kademlia, Pastry and Tapestry uses balanced $k$-trees  
-as routing table's data structure. Figure \ref{fig:kademlia_lookup} shows the 
process of Kademlia
-data lookup. Viceroy maintains a butterfly data structure (see e.g., 
\cite{226658}), 
+Kademlia \cite{maymounkov02kademlia}, Pastry \cite{rowston01pastry} and 
Tapestry 
+\cite{zhao01tapestry} uses balanced $k$-trees as routing table's data 
structure. Figure 
+\ref{fig:kademlia_lookup} shows the process of Kademlia
+data lookup. Viceroy \cite{malkhi02viceroy} maintains a butterfly data 
structure (e.g., \cite{226658}), 
 which requires only constant number of neighbor peers while providing 
$O(\log{n})$ data lookup
-efficiency. Koorde, recent modification of Chord, uses de Bruijn graphs to 
maintain
-local routing tables. Koorde requires each peer to have only about two links 
to other
-peers to to provide $O(\log{n})$ performance.
+efficiency. Koorde \cite{kaashoek03koorde}, recent modification of Chord, uses 
de Bruijn graphs 
+\cite{debruijn46graph} to maintain local routing tables. Koorde requires each 
peer to have only 
+about two links to other peers to to provide $O(\log{n})$ performance.
 
 \begin{figure}
 \centering
@@ -449,9 +453,9 @@
 to nearest available peer, hosting a specific data item. This form of locality
 is not supported by DHT. Finally, tightly structured overlay can be used for
 scalable group multicast/anycast operations (CAST) (see e.g., 
\cite{zhuang01bayeux}).
-The basic operation are \texttt{join(groupIdentifier)}, 
\texttt{leave(groupIdentifier)},
+The basic operations are \texttt{join(groupIdentifier)}, 
\texttt{leave(groupIdentifier)},
 \texttt{multicast(message, groupIdentifier)},  \texttt{anycast(message, 
groupIdentifier)}.
-Participating peers may join and leave a group and send multicast messages to
+Participating peers may join and leave the group and send multicast messages to
 the group, or anycast message to a specific member of the group. DOLR's and 
CAST's
 have much in common.For instance, they both use network proximity techniques
 to optimize their operation in the overlay. Figure 
\ref{fig:Strucutred_lookup_using_DOLR_model}
@@ -489,9 +493,9 @@
 $ip = map(identifier(s))$, which maps data items, expressed by a identifier to 
coordinate 
 point $ip$ in $(IS,d)$. Peer's p resources are mapped onto a set $IS$ = \{$ip 
\in IS: 
 \exists s \in S$, $ip = map(identifier(s)) \wedge (provider(s) = p)$\}., 
-which means that resources that a peer provides into the system are not kept 
locally.
+which means that resources which peer provides into the system are not kept 
locally.
 Every $p$ has neighbor(s), named as $neighbor$, which are $P$ = \{$p \in P: 
\exists neighbor$, 
-where $difference(p,p_neighbor)= close$, and  $close$ is minimal difference 
$d$ in $(IS,d)$\}.
+where $difference(p,p_neighbor)= ''close''$, where  $''close''$ is small 
difference $d$ in $(IS,d)$\}.
 
 
 \section{Summary}
@@ -512,8 +516,8 @@
  
 Thus, there are significant differences between loosely structured and tightly 
structured approaches.
 The most important aspect is the performance and scalability. While loosely 
structured approach's performance
-is not always even linear, generally tightly structured approach can perform 
all operations in
-$\Theta(\log{n})$\footnote{However, it is unknown whether all proposed 
algorithms can preserve 
+is not always even linear, generally tightly structured approach can perform 
all internal operations in
+poly-logarithmic time\footnote{However, it is unknown whether all proposed 
algorithms can preserve 
 logarithmic properties in real-life applications or not.}. 
 
 Another key point is the philosophy how overlay network is constructed and 
maintained. While loosely 
@@ -521,10 +525,10 @@
 structured approach has certain features, in which participating peers have no 
control at all 
 (such as mapping of data items). 
 
-To end user, biggest difference between these systems is how data lookups are 
performed. Looselely
+To end user, biggest difference between these systems is how data lookups are 
performed. Loosely
 structured systems provides much more richier and user friendly way of 
searching data as they
-have support for keyword search and fuzzy search. On the other, tightly 
structured systems support
-only exact key lookups as each data item is identified by unique keys.
+have support for keyword search and fuzzy search. On the other hand, tightly 
structured systems support
+only exact key lookups as each data item is identified by globally unique keys.
 
 In the end, both systems have open problems and issues. We will discuss these 
aspects more detail in 
 chapter 3. Table \ref{table_comparison_approach} lists key differences between 
loosely structured 
@@ -638,15 +642,16 @@
 and their key properties with regard to performance and scalability. List 
 includes algorithms from two main approaches. However, majority of the 
algorithms 
 listed above belongs to tightly structured approach since there has been 
active 
-research being pursued lately. List doesn't include \emph{all} proposed 
Peer-to-Peer 
-systems, only the ones which already have been widely deployed in real-life, or
-the ones which may promising in the future's Peer-to-Peer systems.
+research being pursued towards tightly structured approach lately. List 
doesn't 
+include \emph{all} proposed Peer-to-Peer systems. Only the ones which already 
have 
+been widely deployed in real-life, or the ones which may promising in the 
future's 
+Peer-to-Peer systems, are included in this thesis.
  
 We decided to follow the guidelines from \cite{kaashoek03koorde} when
 measuring properties of different Peer-to-Peer systems. However, we dropped
 out fault tolerance and load balancing properties, since they are hard to 
measure
 in face of real life requirements. Additionally, we decided to include
-the number of real network connections for each peer in the overlay. 
+the number of \emph{real} network connections for each peer in the overlay. 
 
 Here, we describe the listed properties of Peer-to-Peer algorihms:
 
@@ -654,7 +659,7 @@
 \item \textbf{Lookup}: Number of messages required when a data lookup is 
performed
 \item \textbf{Space}: Number of neighbors which peers knows about (neighbors) 
 \item \textbf{Insert/delete}: Number of messages required when a peer joins or 
leaves the network
- \item \textbf{Number of network connections}: Number of concurrent 
\emph{network} connections required to maintain correct neighbor information
+ \item \textbf{Number of network connections}: Number of concurrent network 
connections required to maintain correct neighbor information
 \end{itemize}
 
 \scriptsize
@@ -852,13 +857,13 @@
 tables; we list description of the problem, solution and comments on that
 specific open problem. Note that open problems list considered here is not 
meant
 to be an exhaustive survey of \emph{all} open problems in Peer-to-Peer domain; 
-we focus our attention to security, scalability and performance related issues
+we focus our attention to security, scalability, usability and performance 
related issues
 only.
 
 \section{Overview}
 
 Partly due to the non-maturity of modern Peer-to-Peer technology, it has 
several
-open problems to be solved. Main open problems are related to performance, 
scalability
+open problems to be solved. Main open problems are related to performance, 
scalability, usability
 and security. More important, many techniques developed for traditional 
distributed
 systems may no longer apply with Peer-to-Peer systems. Therefore, new 
solutions are 
 needed to make Peer-to-Peer systems more secure and efficient.
@@ -867,8 +872,8 @@
 Since Napster \cite{napsterurl} and Gnutella \cite{gnutellaurl} was first time 
introduced 
 to public, researchers' main concern has been scalability problem of loosely 
structured 
 approach. However, people often misunderstand the scalability problem of 
loosely structured 
-approach; loosely structured approache's \emph{network} is scalable, but the 
\emph{query model} is not 
-scalable. Tightly structured approach's main concern is to make overlay's data 
lookup 
+approach; loosely structured systems' \emph{network} is scalable, but the 
\emph{query model} is not. 
+Tightly structured system's main concern is to make overlay's data lookup 
 routing more flexible againts hostile attacks. Another key problems in tightly 
structured 
 approach are the lack of keyword searches and support for heterogeneous peers.
 
@@ -889,8 +894,8 @@
 general Distrubuted Denial of Service attack. 
 
 In Sybil attack model, hostile entity presents multpile 
-entities. Therefore, one hostile entity can control a large fraction of the 
Peer-to-Peer system. The best
-possible solution to Sybil attack would be that system could \emph{distinct} 
entities reliably. Unfortunately,
+entities. Therefore, one hostile entity can control a large fraction of the 
Peer-to-Peer system. Optimal
+possible solution to Sybil attack would be that system could \emph{distinct} 
system's entities reliably. Unfortunately,
 currently there no realizable techiques for this task. Partial solutions for 
Sybil is attack is to replicate
 and fragment data randomly among several participating peer. However, both 
suggestions assume that two different 
 remote entities are actually different; Sybil attacks are still possible and 
therefore, would need centralized 
@@ -903,10 +908,7 @@
 troijan. Closey related to fail-stop model is the Byzantine attack model 
 \cite{357176}. Byzantine model can been seen more seveve than fail-stop model 
as there are no restrictions over 
 the behaviour of faulty peers. Partial, practical solution for byzantine 
failures has been proposed by Castro et 
-al \cite{296824}. General robustness properties of Peer-to-Peer system is able 
to deal with software failures and hostile
-attack, but redundancy againts external threats is unknown. The reason for 
this is that there are no experiences
-on these kinds of attacks. Possible solution would be distributed anti-virus 
software, but much more intensive
-research is required for solve these problems.
+al \cite{296824}. 
 
 Spam generating attack is another known attack model againts Peer-to-Peer 
system. In Spam
 attack, hostile or faulty peer may produce false information of the data. 
Possible solution againts this attack
@@ -926,7 +928,7 @@
 
 \subsection{Trust, data authenticity and integrity}
 
-Currently, trust in Peer-to-Peer systems is based on \emph{reputation}. 
Current repuation methods focus either
+Trust in Peer-to-Peer systems is based on \emph{reputation}. Proposed 
repuation methods focus either
 on the semantic properties, or data management properties of the trust model. 
Some research has been 
 done on reputation models in Peer-to-Peer systems, such as 
\cite{aberer01trust}, \cite{cornelli02reputableservents}. 
 Implementations include Advogato \cite{advogatourl}. None of the current 
proposals or implementations 
@@ -943,12 +945,12 @@
 
 ConChord \cite{ajmani02conchord} is the first Peer-to-Peer system which has a 
support for PKI based
 security infrastructure. Unfortunately, ConChord is in early in development 
and lacks of important
-features of PKI to be fully usable yet. Furthermore, the hierarchy of 
SDSI/SPKI may a problem for
-Peer-to-Peer systems, in which hierarchy is intentionally missing.
+features of PKI to be fully usable yet. Furthermore, the hierarchy of 
SDSI/SPKI \cite{rivest96sdsi}, 
+\cite{spkiworkinggroup} may a problem for Peer-to-Peer systems, in which 
hierarchy is intentionally missing.
 
-For data integrity, on the other hand, there are few working solutions. 
Cryptographic content hashes,
-such as \cite{fips-sha-1}, variations \cite{merkle87hashtree} and their 
implementation techniques \cite{mohr02thex},
-are efficient and reliable methods for identifying the integrity of data in 
Peer-to-Peer systems. One
+For data integrity, on the other hand, there are few working solutions. 
Cryptographic content hashes
+\cite{fips-sha-1}, variations \cite{merkle87hashtree} and their implementation 
techniques \cite{mohr02thex},
+are efficient and reliable method for identifying the integrity of data in 
Peer-to-Peer systems. One
 possible application of cryptographic content hashes may in peer identifier 
creation process, in which
 IP address of peer can be verified by the other peer. This is one form of 
\emph{self-certifying data}. 
 
@@ -974,9 +976,9 @@
 Peer-to-Peer system. Let's consider anonymity and efficient data lookup. In 
efficient lookup, we must know
 the peers responsible to given data in Peer-to-Peer system. Of course, when we 
know the peers responsible
 for the data, the anonymity of peer is lost. Fortunately, there are partial 
solutions to previously
-mentioned situations, i.e. pseudonym which is a partial form of anonymity. For 
instance, pseudonym can used for 
-addressing peer-anonymity by providing anonymous-like identifiers to peers 
(e.g., tightly structured peer 
-identifiers).
+mentioned situations, i.e. \emph{pseudonym} which is a partial form of 
anonymity. For instance, pseudonym can used for 
+addressing peer-anonymity by providing anonymous-like identifiers to peers 
(e.g., tightly structured system's 
+peer identifiers).
 
 Anonymity is widely used in those Peer-to-Peer system in which data 
publication and non-censorship are important properties
 of the system. These include
@@ -984,7 +986,7 @@
 Tangler \cite{502002} and upcoming Mnet \cite{mneturl}. Forwarding proxies are 
used in Freenet, Crowds and 
 Free Haven in order to provide various types of anonymity. Tangler and Publius 
uses cryptographic
 sharing methods to split a data into data fragments \cite{Shamir1979a}. 
Mixmailer networks, such as
-\cite{mixminionurl}, are commonly used in distributed systems, which are able 
to provide level
+\cite{mixminionurl}, are commonly used in distributed systems, which are able 
to provide some level
 of anonymity
 
 Even if many existing Peer-to-Peer systems are able to provide some of the 
types of anonymity, there is no
@@ -992,18 +994,17 @@
 between anonymity and other Peer-to-Peer system properties requires more 
research work.
 
 
-\subsection{Access Control}
+\subsection{Access control}
 
 Any distributed computing system must support different levels of access 
control. For instance, we may
-want to restrict the accessibility of data to only limited amount of 
participating peers. Currently, 
-Peer-to-Peer systems doesn't support working, trusted and  distributed access 
control scheme. Moreover, 
-there has been a lot of violation of copyright laws by users of Peer-to-Peer 
filesharing systems. As a consequence, some
-lawsuits has been created againts the companies how have build popular 
file-sharing programs.
+want to restrict the accessibility of data to only limited amount of 
participating peers. Peer-to-Peer 
+systems doesn't support working and distributed access control scheme. 
Moreover, 
+there has been a lot of violation of copyright laws by users of Peer-to-Peer 
filesharing systems. As a 
+consequence, some lawsuits has been created againts the companies how have 
build popular file-sharing programs.
 
-To our knowledge, Nejdl et al \cite{nejdl03accesscontrol} have proposed first 
practical solution to access 
+To our knowledge, Nejdl et al \cite{nejdl03accesscontrol} have proposed very 
recently first practical solution to access 
 control problem in Peer-to-Peer systems. They use RDF-based schema policies to 
restrict access to certain
-data. To be distributed system feasible, there must be way of control. 
Unfortunately, their solution
-works only in loosely structured systems.
+data. Unfortunately, their current prototype works only in loosely structured 
systems.
 
 
 \subsection{Hostile entities}
@@ -1012,13 +1013,13 @@
 Possible solutions include self-monitoring systems \cite{zhang03somo}, 
maintaining system invariants as
 proposed in \cite{sit02securitycons}, distributed and secure peer identifier 
assignment 
 \cite{castro02securerouting}, \cite{clarke00freenet} and self-certifying data 
using cryptographic 
-content hashes (e.g., SHA-1). Identification of hostile entities is essential 
in tightly structured 
+content hashes (e.g., SHA-1 \cite{fips-sha-1}). Identification of hostile 
entities is essential in tightly structured 
 approach, in which fundamental (and implicit) assumption is that there is 
random, uniform distribution 
 of peer identifiers that cannot be controlled by hostile entity.
 
 Of course centralized authorities could be used for assignment of peer 
identifiers, but they have 
 property of single point of failure. Furthermore, distributed peer 
identification assignment can
-be problematic as long as Sybil attack remains unsolved. However, there are 
some partial solutions
+be problematic as long as Sybil attack \cite{douceur02sybil} remains unsolved. 
However, there are some partial solutions
 for controlling the rate at which and hostile entity is able to obtain peer 
identifier, such as crypto-based
 puzzles \cite{juels99clientpuzzles}.
 
@@ -1026,7 +1027,7 @@
 efficient way. More research is required to solve this problems.
 
 
-\subsection{Secure Query Routing}
+\subsection{Secure query routing}
 
 Much work has been done on secure routing, especially in tightly structured 
systems. In 
 \cite{castro02securitystructured} and \cite{castro02securerouting}, authors 
suggests the usage
@@ -1036,8 +1037,8 @@
 correct peers, when a fraction $f$ of the other peers are faulty or hostile, 
is only $(1-f)^{h-1}$.
 
 Sit and Morris \cite{sit02securitycons} discuss the possibility of allowing 
query originator 
-to observe lookup progress and cross-check routing tables using random 
queries. However, Sit and
-Morris approach is not very efficient, since proposals create a lot of 
additional network traffic when
+to observe lookup progress and cross-check routing tables using random 
queries. However, their
+ approach is not very efficient, since proposals create a lot of additional 
network traffic when
 in function.
 
 Additionally, Lynch et al. \cite{lynch02atomicdataaccess} propose a solution 
to secure routing table 
@@ -1048,25 +1049,30 @@
 
 Aspnes et al in \cite{aspnes02faultrouting} and Kaashoek et all in 
\cite{kaashoek03koorde} formally 
 prove the lower and upper bounds for space requirements of locating a specific 
date item in 
-Peer-to-Peer system. They show that to provide high degree of fault tolerance 
efficiency, a peer 
-must maintain $O(\log{n})$ neighbors. In addition, most existing
+Peer-to-Peer system. They show that to provide high degree of fault tolerance 
and efficiency, each 
+participating peer must maintain $O(\log{n})$ neighbors. 
 
 Fiat et al in \cite{fiat02censorship}, \cite{saia02dynamicfaultcontentnetwork} 
and Datar in \cite{datar02butterflies}  
 describe tightly structured overlay with analytical results in the presence of 
hostile entities. However,
 none of these proposals doesn't address an efficient, dynamic tightly 
structured overlay and multiple rounds
-of hostile attack. Indeed, above mentioned solutions are not very efficient. 
In Fiat et al, each node 
-must maintain information of $O(\log^3{n})$ other peers, and in Datar 
$O(\log^3{n})$ is required.
+of hostile attack. Also, above mentioned propsals are not very efficient. In 
\cite{fiat02censorship}, each node 
+must maintain information of $O(\log^3{n})$ other peers, and in 
\cite{datar02butterflies}, $O(\log^2{n})$ is required.
 
 Finally, Ratnasamy and Gavoille \cite{ratnasamy02routing}, 
\cite{gavoille01routing} list several open problems
-regarding routing in distributed networks. Obviously, more research is 
required for make secure
-routing possible in Peer-to-Peer networks.
+regarding routing in distributed networks. Obviously, more research is 
required in order to provde secure
+data lookup routing possible in Peer-to-Peer networks.
 
 
-\subsection{Other Security threats}
+\subsection{Other security threats}
 
 Ross Lee graham lists several external threats againts Peer-to-Peer networks 
\cite{grahamp2psecurity}. The list 
 includes viruses, trojans and bugs in Peer-to-Peer software. Currently, there 
are not even partial solutions
-to the problems mentioned above.
+to the problems mentioned above. General robustness properties of Peer-to-Peer 
system is able to 
+deal with software failures and hostile attack, but redundancy againts 
external threats is unknown. 
+The reason for this is that there are no experiences on these kinds of 
attacks. Possible solution 
+would be distributed anti-virus software, but much more intensive research is 
required for solve these problems.
+
+
 
 
 \section{Performance and usability problems in Peer-to-Peer}
@@ -1089,13 +1095,13 @@
 is similar to iterative deepening techique. With these techniques, search 
 may not be fast when desired data item requires many consecutive flooding 
rounds.
 
-Directed breadt-first search \cite{yang02improvingsearch} optimizes the 
original 
-breadt-first search in way that peer selects neighbors with many quality 
results
+Directed BFS \cite{yang02improvingsearch} optimizes the original 
+BFS in way that peer selects neighbors with many quality results
 may be reached, thereby maintaining the the quality of costs and decreasing 
the amount
 of messages sent to network. Alpine \cite{alpineurl} and NeuroGrid 
\cite{joseph02neurogrid} 
-Peer-to-Peer system use somewhat similar method when performing data lookups.
+are Peer-to-Peer system use somewhat similar method when performing data 
lookups.
 
-Local indices \cite{yang02improvingsearch} in one variation of active caching. 
+Local indices \cite{yang02improvingsearch} is one variation of active caching. 
 In this scheme, each peer maintains an index over the data of all nodes within 
 $h$ hops of itself, where $h$ is a system-wide variable, called radius of the
 index\footnote{In normal BFS case, the value of $h$ is 0, as peer only has 
index
@@ -1117,9 +1123,10 @@
 
 Since tightly structured systems have efficient data lookup at the application 
level overlay,
 current research efforts are focused on proximity based data lookup. In 
proximity based data lookup, 
-peers try to choose routing-tables refering to other peers that are 
\emph{nearby} in the 
+peers try to choose routing-tables entries refering to other peers that are 
\emph{nearby} in the 
 underlying network. In this way, tightly structured systems are able to 
decrease actual 
-lookup \emph{latency}. CAN, Kademlia, Pastry and Tapestry have a advanced 
heuristics for 
+lookup \emph{latency}. CAN \cite{ratnasamy01can}, Kademlia 
\cite{maymounkov02kademlia}, 
+Pastry \cite{rowston01pastry} and Tapestry \cite{zhao01tapestry} have a 
advanced heuristics for
 proximity based routing. Additionally, most recent version of Chord uses 
proximity based 
 routing inspired by Karger and Ruhl \cite{karger02findingnearest}. Skipnet 
\cite{harvey03skipnet1} 
 uses combination of proximity and application level overlay routing when 
performing data 
@@ -1133,22 +1140,22 @@
 
 \subsection{Fast and usable search}
 
-To make Peer-to-Peer systems usable in a large, these systems have to support 
flexible, efficient
+To make Peer-to-Peer systems even more popular (and usable), these systems 
have to support flexible, efficient
 and easy to use search methods. For instance, Internet's perhaps the most 
important feature
 is the ability to perform keyword or fuzzy searches (e.g., Google). Currently, 
only loosely
 structured systems are able carry out this requirement. Unfortunately, as 
discussed in this text,
-the data loouk model of loosely structured approach is not scalable. Thus, 
research efforts have
+the data lookup model of loosely structured approach is not scalable. Thus, 
research efforts have
 been focused on tightly structured approach. 
 The main in problem with tightly structured approach is the fact that tightly 
structured algorihms
-performs data lookups based on a unique identifier. However, quite recently 
have been studies
+performs data lookups based on a globally unique identifier. Quite recent 
study has been focused 
 on the feasibility of Peer-to-Peer Web-like indexing and searching 
\cite{li03feasibility}. Authors
 argue, that it is possible to implement Peer-to-Peer Web-like search with 
certain radical compromises. 
 First, Peer-to-Peer search enginge may need to decrease result quality in 
order make searching more
 efficient. Second, Peer-to-Peer systems must observe better the properties of 
underlying network for
 better performance. 
 
-Some studies have been concentraded on SQL-like queries \cite{harren02complex},
-in tightly structured overlays. Another approaches include adapting loosely 
structured approache's
+Some studies have been concentraded on SQL-like queries \cite{harren02complex}
+in tightly structured overlays. Another approaches includes adapting loosely 
structured approache's
 data lookup model into tightly structured systems 
\cite{ansaryefficientbroadcast03}, \cite{chord:om_p-meng}.
 Additional studies include additional layer upon overlay network 
\cite{kronfol02fasdsearch}, 
 \cite{joseph02p2players} and range queries \cite{andrzejak02rangequeries}.
@@ -1157,9 +1164,9 @@
 studies queries follow Zipf-like distributions \cite{breslau98implications} 
caching and precomputation
 can be done for optimizting search indices \cite{li03feasibility}. Regular 
compression algorithms,
 Bloom filters \cite{362692}, vector space models 
\cite{CuencaAcuna2002DSIWorkshop} and view 
-trees \cite{Bhattacharjee03resultcache} can be used for even better 
optimizations. In addition, authors 
+trees \cite{Bhattacharjee03resultcache} can be used for even better 
optimizations. Authors 
 in \cite{li03feasibility} use Gap compression \cite{wittengigabytes}, Adaptive 
Set Intersection \cite{338634}  
-and Clustering with their search optimizations.
+and clustering with their search optimizations.
 
 While it is expected that web-like searches can be layered on top of tightly 
structured overlay, much
 more research is required to make indexing and searching more efficient.
@@ -1176,18 +1183,18 @@
 Peer-to-Peer system is \emph{never} in ''ideal'' state as it is always 
evolving system.
 
 Current research has been focused on tightly structured systems' system 
management, since all presented
-algorithms have been analyzed under static simulation environments. 
Furthermore, propsed tightly structured
+tightly structured approache's algorithms have been analyzed under static 
simulation environments. Furthermore, propsed tightly structured
 overlays are configured statically to achieve the desired reliability even in 
uncommon and adverse environment
 \cite{rowston03controlloingreliability}. The most important factor for
 future research is to get real-life experiences from tightly structured 
system, when there are frequent
 joins and leaves in the system. Some research has been done already in this 
area. 
 
 A concept of ''half-life'' was introduced by Liben-Nowell 
\cite{libennowell01observations}. Half-life is defined
-as follows: let there be N live nodes at time t. The doubling from time t is 
the time that pass before
-N new additional nodes arrive into the system. The halving time from time t is 
the time
-requires for half of the living nodes at time t to leave the system. The 
half-life from 
-time t is smaller of the properties stated above. The half-life of the entire 
system is the 
-minimum half-life overl all times t. Concept of half-time can be used as basic 
for developing
+as follows: let there be $N$ live nodes at time $t$. The doubling from time 
$t$ is the time that pass before
+$N$ new additional nodes arrive into the system. The halving time from time 
$t$ is the time
+requires for half of the living nodes at time $t$ to leave the system. The 
half-life from 
+time $t$ is smaller of the properties stated above. The half-life of the 
entire system is the 
+minimum half-life over all times $t$. Concept of half-time can be used as 
basic for developing
 more efficient analytical tools for modelling complex Peer-to-Peer system. 
 
 Some research has been done with regard to load balancing properties of 
tightly structured
@@ -1224,7 +1231,8 @@
 describe a arbitrary data structure on top of tightly structured overlay 
\cite{zhang03somo}. They
 call their proposal as \emph{data overlay}, since it support several 
fundamental data structures.
 Authors use this data overlay to build Self-Organized Metadata Overlay (SOMO), 
which can be used
-for monitoring health of tightly structured overlay. 
+for monitoring health of tightly structured overlay. Fault tolerance of SOMO 
itself is currently
+unknown. 
 
 
 \section{Miscellaneous problems in Peer-to-Peer}
@@ -1243,26 +1251,26 @@
 
 \subsection{Social behaviour}
 
-Very frequent assumption in Peer-to-Peer systems is that peers are willing to 
cooperate. Another belief 
+Frequent assumption in Peer-to-Peer systems is that peers are willing to 
cooperate. Another belief 
 is that all peers would behave equally, i.e. all peers both consume resources 
and contributes resources.
 However, these assumptions are not true as several studies show peers rather 
consume than contribute and
 and peers are unwilling to cooperate \cite{saroiu02measurementstudyp2p}, 
\cite{oram01harnessingpower}, 
 \cite{hearn02mojonation}. 
 
-Somewhat surprisingly little research has been in this area, especiaclly when 
considering 
+Somewhat surprisingly little research has been in this area, especially when 
considering 
 the possible impact of this \emph{unwanted socical behaviour} to performance 
of Peer-to-Peer 
-system. However, problem is addressed by Golle et al. 
\cite{golle01incentivesp2p}. Some 
+system. Problem is addressed by Golle et al. \cite{golle01incentivesp2p}. Some 
 research has been focused on semantic properties of the overlay in order to 
increase 
 cooperation among participating peers \cite{crespo02semanticoverlay}. 
Ramanathan et al. 
 \cite{ramanathan02goodpeers} and Bernstein et al. \cite{bernstein03selection} 
use 
 empirical metrics and decision trees when teaching peers to make better 
decisions
-when contacting other peers in Peer-to-Peer system. Alpine is an example of
+when contacting other peers in Peer-to-Peer system. Alpine \cite{alpineurl} is 
an example of
 Peer-to-Peer system, which uses empirical metrics for peer selection.
 
 
 \subsection{Simulating the Peer-to-Peer system}
 
-Very little research has been done on simulating the global Peer-to-Peer 
system. Presumably, this
+Very little research has been done on simulating the \emph{global} 
Peer-to-Peer system. Presumably, this
 is due to complex nature of Peer-to-Peer system, which makes comprehensive 
simulations very 
 diffucult. Floyd et al. has been studying the simulation of the Internet in 
\cite{504642}. Authors
 state that simulating the Internet is very challenging task, because of 
Internet's heterogeneity
@@ -1271,7 +1279,8 @@
 
 As long as global simulations of Peer-to-Peer systems are lacking, we cannot 
we make any further
 analysis e.g., on usage patterns in Peer-to-Peer systems. Presumedly, however, 
we can assume that
-queries follow the Zipf-like distributions both in the Internet and in 
Peer-to-Peer systems.
+, e.g., query keywords follow the Zipf-like distributions 
\cite{breslau98implications} both in the 
+Internet and in Peer-to-Peer systems.
 
 \section{Summary}
 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]