[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...

gzz-commits
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...

From:	Hermanni Hyytiälä
Subject:	[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...
Date:	Wed, 26 Mar 2003 10:00:23 -0500
CVSROOT:        /cvsroot/gzz
Module name:    gzz
Changes by:     Hermanni Hyytiälä <address@hidden>      03/03/26 10:00:22

Modified files:
        Documentation/misc/hemppah-progradu: masterthesis.tex 

Log message:
        "This is a never ending story...didiidii...."

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/masterthesis.tex.diff?tr1=1.196&tr2=1.197&r1=text&r2=text

Patches:
Index: gzz/Documentation/misc/hemppah-progradu/masterthesis.tex
diff -u gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.196 
gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.197
--- gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.196      Wed Mar 
26 07:49:20 2003
+++ gzz/Documentation/misc/hemppah-progradu/masterthesis.tex    Wed Mar 26 
10:00:22 2003
@@ -177,7 +177,7 @@
 In the loosely structured approach the construction and the maintenance of the 
overlay is controlled 
 loosely. The placement of services and the topology of overlay is random. The 
data lookup model in loosely structured systems is
 not very efficient, because of unstructured properties of the overlay. Data 
lookup model is a combination of methods which 
-are used for finding data from the overlay.  
+are used for locatin data in the overlay.  
 
 \subsection{Definition}
 
@@ -893,7 +893,7 @@
 
 According to \cite{aberer01trust}, mutual trust ''...allows agents to 
cooperate in a game-theoretic situation that corresponds 
 to the repeated prisoners dilemma and leads in the long term to an increased 
aggregated utility for the participating agents''. 
-They define \emph{trust management} as a mechanism that allows to establish 
mutual trust. Furthermore, \emph{reputation} is a measure
+The authors of \cite{aberer01trust} define \emph{trust management} as a 
mechanism that allows to establish mutual trust. Furthermore, \emph{reputation} 
is a measure
 that is derived from knowledge on interactions in the past 
\cite{aberer01trust}. In this subsection, we discuss mechanisms to maintain
 trust in Peer-to-Peer systems.
 
@@ -922,14 +922,14 @@
 \subsection{Anonymity}
 
 According to \cite{dingledine00free}, there exist several kinds of anonymity: 
author-anonymity, 
-publisher-anonymity, reader-anonymity, peer-anonymity and query-anonymity. 
Author-anonymity is a form
-of anonymity in which no one can link the author (who created the document) to 
a document. 
-In publisher-anonymity system, no one is able to determine the publisher (how 
published the document into
+publisher-anonymity, reader-anonymity, peer-anonymity and query-ano-nymity. 
Author-anonymity is a form
+of anonymity in which no one can link author (who created the document) to a 
document. 
+Publisher-anonymity means that no one is able to determine the publisher (how 
published the document into
 the system) of a document. Reader-anonymity means that a document cannot be 
linked to its readers.
 With peer-anonymity, no one is able to determine the peer, where the document 
was originally published.
-Document-anonymity means that a peer doesn't know which data it is currently 
hosting. Query-anonymity is a form
+Document-anonymity means that a peer doesn't know which data it is currently 
hosting. Finally, query-anonymity is a form
 of document-anonymity; when other peers perform data lookups, a peer doesn't 
know which data it serves
-to the data lookup originators. As the authors cite in 
\cite{dingledine00free}, some forms of anonymity 
+to the data lookup originators. As the authors of \cite{dingledine00free} 
cite, some forms of anonymity 
 may imply each other and possible issues raised by this property is one area 
of future work.
 
 Obviously, existance of several types of anonymity often conflicts with other 
key properties of
@@ -940,14 +940,14 @@
 For instance, pseudonymity can be used for addressing peer-anonymity by 
providing anonymous-like identifiers to 
 peers (e.g., peer identifiers of a tightly structured system).
 
-Anonymity is widely used in a Peer-to-Peer system in which data publication 
and non-censorship are important. These include
-Forwarding proxies are used in Freenet \cite{clarke00freenet}, Crowds 
\cite{reiter98crowds} and Free Haven \cite{dingledine00free}
+Anonymity is widely used in a Peer-to-Peer system in which data publication 
and non-censorship are important. Forwarding 
+proxies are used in Freenet \cite{clarke00freenet}, Crowds 
\cite{reiter98crowds} and Free Haven \cite{dingledine00free}
 in order to provide various types of anonymity. Tangler \cite{502002} and 
Publius \cite{pub00} use cryptographic sharing methods 
 to split data into fragments \cite{Shamir1979a}. Mix mailer networks (e.g., 
\cite{mixminionurl}) are commonly used in 
 distributed systems which are able to provide some level of anonymity (e.g., 
\cite{mneturl}).
 
 Even if many existing Peer-to-Peer systems are able to provide some of the 
types of anonymity, there is no
-such a system which is able to provide complete anonymity. Specifically, the 
conflicts
+such a system which is able to provide complete anonymity in all levels (see 
above). Specifically, the conflicts
 between anonymity and other properties of Peer-to-Peer system require more 
research work.
 
 
@@ -964,21 +964,25 @@
 schema policies to restrict access to certain data. Their current early 
prototype version only works in 
 loosely structured systems.
 
+More research work is required in the development of \emph{working} access 
control scheme for Peer-to-Peer
+systems.
+
 
 \subsection{Hostile entities}
 
 One serious problem in Peer-to-Peer systems is the inability to distinguish 
hostile entities from regular entities
-trustworthy.
+trustworthy. Identification of hostile entities is essential in the tightly 
structured 
+approach, in which the fundamental (and implicit) assumption is that there is 
a random, uniform distribution 
+of peer identifiers that cannot be controlled by a hostile entity.
+
 One possible solution is to use a self-monitoring system, such as SOMO 
\cite{zhang03somo}, in which a self-monitoring overlay
 constantly analyses the Peer-to-Peer overlay. Self-monitoring overlay is built 
on top of Peer-to-Peer overlay. Authors in
 \cite{sit02securitycons} suggest the use of system invariants. They emphasize 
that system invariants should be veriable, and if
 system invariants fail the system must have a recovery mechanism. In 
distributed peer identifier assignment \cite{castro02securerouting, 
clarke00freenet}, 
-multiple participating peers participate in a creation of peer identifier. 
Identification of hostile entities is essential in the tightly structured 
-approach, in which the fundamental (and implicit) assumption is that there is 
a random, uniform distribution 
-of peer identifiers that cannot be controlled by a hostile entity.
+multiple participating peers participate in a creation of peer identifier. 
 
-Naturally, centralized authorities could be used for the assignment of peer 
identifiers, but they may not be suitable
-for ad hoc Peer-to-Peer environrment and have property of single point of 
failure. Moreover, distributed peer 
+Centralized authorities could be used for the assignment of peer identifiers, 
but they may not be suitable
+for ad hoc Peer-to-Peer environrment and have property of single point of 
failure. Distributed peer 
 identification assignment can be problematic as long as the Sybil attack 
\cite{douceur02sybil} remains unsolved. 
 However, there are some partial solutions for controlling the \emph{rate} at 
which hostile entity is able to obtain peer 
 identifier, such as crypto-based puzzles \cite{juels99clientpuzzles}.
@@ -989,18 +993,14 @@
 is able to deliver a network message thoughout the overlay to a correct 
destination efficiently. 
 
 Aspnes et al. in \cite{aspnes02faultrouting} and Kaashoek et al. in 
\cite{kaashoek03koorde} formally 
-prove the lower and upper bounds for the space requirements of locating a 
specific data item reliable in a 
+prove the lower and upper bounds for the space requirements of locating a 
specific data item reliably in a 
 Peer-to-Peer system. They show that to provide high degree of fault tolerance 
and efficiency in the system, each 
-participating peer must maintain average of $O(\log{n})$ neighbors. Fiat et 
al. in \cite{fiat02censorship, saia02dynamicfaultcontentnetwork} 
-and Datar in \cite{datar02butterflies} propose a tightly structured overlay 
with analytical results in the 
-presence of hostile entities. However, none of these proposals address a 
dynamic tightly structured 
-overlay with fault tolerance against multiple rounds
-of hostile attacks. Also, above mentioned proposals are not very efficient. In 
\cite{fiat02censorship}, each peer 
-must maintain information of $O(\log^3{n})$ other peers, and in 
\cite{datar02butterflies}, $O(\log^2{n})$ is required.  
+participating peer must maintain average of $O(\log{n})$ neighbors. 
+  
 
 Authors argue in \cite{castro02securitystructured} that with the combination 
of 
 secure peer identifer assignment, secure routing table maintenance and secure 
message forwarding
-the secure routing in tightly structured systems is possible. Additionally, 
authors cite in \cite{castro02securerouting} 
+secure query routing in tightly structured systems is possible. Additionally, 
authors cite in \cite{castro02securerouting} 
 that the probability of routing successfully between to arbitrary 
 correct peers is $(1-f)^{h-1}$, when a fraction $f$ of the other peers are 
faulty or hostile and where
 $h$ is the number of hops in the overlay. Sit and Morris 
\cite{sit02securitycons} discuss the possibility of 
@@ -1011,11 +1011,15 @@
 Lynch et al. \cite{lynch02atomicdataaccess} propose a solution for secure 
routing table 
 maintenance, but their solution seems to have two major problems according to 
\cite{castro02securitystructured}. 
 First, the solution is very expensive even without faulty or hostile entities. 
Second, each group of replicas
-in their solution must have less than 1/3 of its peers faulty. Thus, this 
feature results in a low
+in their solution must have less than 1/3 of its peers faulty. This feature 
results in a low
 probability of successful routing.
 
-Finally, Gavoille \cite{gavoille01routing} lists open problems in general 
distributed systems 
-(not only in Peer-to-Peer domain).
+Fiat et al. in \cite{fiat02censorship, saia02dynamicfaultcontentnetwork} 
+and Datar in \cite{datar02butterflies} propose a tightly structured overlay 
with secure query routing in the 
+presence of hostile entities. However, none of these proposals address a 
dynamic tightly structured 
+overlay with fault tolerance against multiple rounds
+of hostile attacks. Also, above mentioned proposals are not very efficient. In 
\cite{fiat02censorship}, each peer 
+must maintain information of $O(\log^3{n})$ other peers, and in 
\cite{datar02butterflies}, $O(\log^2{n})$ is required.
 
 \subsection{Other security threats}
 
@@ -1027,7 +1031,8 @@
 
 \subsection{Summary}
 
-In this subsection we list security problems in Peer-to-Peer systems in the 
table.
+In this subsection we list security problems in Peer-to-Peer research. For 
each table entry,
+there is a brief description of the problem, possible solutions and comments.
 
 
 \scriptsize
@@ -1157,12 +1162,11 @@
 \subsection{Efficient data lookup}
 
 The most intensive research in Peer-to-Peer domain has been focused on 
efficient data lookup methods,
-especially with the loosely structured approach. In iterative deepening 
-\cite{yang02improvingsearch}, multiple BFS searches are initiated
-with successively larger TTL depth limits, until either the query is 
satisfied, 
-or the maximum depth $D$ has been reached. 
+especially with the loosely structured approach. 
 
-Expanding ring, proposed by Shenker et al. in \cite{lv02searchreplication}, 
+In iterative deepening \cite{yang02improvingsearch}, multiple BFS searches are 
initiated
+with successively larger TTL depth limits, until either the query is 
satisfied, 
+or the maximum depth $D$ has been reached. Expanding ring, proposed by Shenker 
et al. in \cite{lv02searchreplication}, 
 is similar to the iterative deepening technique. In this method, a peer starts 
a flood with small TTL, and 
 waits to see if the search is successful. If it is, then the peer stops the 
data lookuo. Otherwise, the peer increases 
 the TTL and starts another data lookup. With these techniques, searches 
@@ -1186,8 +1190,9 @@
 has a poor response time but it doesn't generate as much network traffic as 
 the original BFS. As suggested in \cite{lv02searchreplication}, the
 random walk approach can be made more effective by introducing
-multiple simultaneously working ''walkers''. Freenet \cite{clarke00freenet} 
uses 
-random walk searches in data lookups. Freenet's data lookup model resembles
+multiple simultaneously working ''walkers''. 
+
+Freenet \cite{clarke00freenet} uses random walk searches in data lookups. 
Freenet's data lookup model resembles
 Depth-First-Search (DFS) and peers' routing tables are dynamically built
 using caching. This is an outcome of Freenet's main design principles, 
anonymity. 
 Another property of the Freenet's data lookup model is that
@@ -1211,19 +1216,17 @@
 To make Peer-to-Peer systems even more popular (and usable), these systems 
have to support flexible, efficient
 and simple search methods \cite{li03feasibility}. Currently, loosely 
structured systems are able to carry out the flexibility and
 simplicity requirements and tightly structured systems are able to fulfill the 
efficiency requirement. 
-Research efforts have been focused to make security methods in tightly 
structured systems more usable since
-the data lookup model of the loosely structured approach doesn't scale. 
However, studies
-show that combining flexible search methods and tightly structured systems may 
not be trivial \cite{harren02complex,
-ansaryefficientbroadcast03}.  The main problem with tightly structured systems 
is the 
-fact that tightly structured algorithms perform data lookups based on a 
globally unique identifier thereby
-making efficient keyword searches hard to implement. 
+Research efforts have been focused to make security methods in tightly 
structured systems more usable and flexible. However, studies
+show that combining flexible search methods and tightly structured systems may 
not be trivial: tightly 
+structured algorithms perform data lookups based on a globally unique 
identifier thereby making efficient keyword searches hard to implement
+\cite{harren02complex, ansaryefficientbroadcast03}. 
 
 Some studies have been concentrated on SQL-like queries \cite{harren02complex}
 in tightly structured overlays. It is unknown, however, if this approach is 
realizable to implement, since
 initial analysis have shown that this approach is rather complex. Other 
approaches include adaption of the data lookup model of the loosely 
-structured approach into tightly structured systems 
\cite{ansaryefficientbroadcast03, chord:om_p-meng}.
-Work in \cite{ansaryefficientbroadcast03} seems quite promising. Authors' work 
is based on insight that
-performing data lookup in the overlay resembles regular tree-like search, 
where trees' data structure
+structured approach into tightly structured systems 
\cite{ansaryefficientbroadcast03}.
+Authors' work is based on insight that
+performing a data lookup in the overlay resembles regular tree-like search, 
where trees' data structure
 is distributed throughout the overlay. Some studies suggest additional layer 
upon overlay network \cite{kronfol02fasdsearch, 
 joseph02p2players}, which use metadata to implement search methods. The 
feasibility of implementing additional 
 search layer on top of the network layer is questionable, especially if the 
search layer and the network
@@ -1239,14 +1242,14 @@
 decrease the result quality in order to make searching more efficient. Second, 
Peer-to-Peer systems must 
 consult the properties of underlying network for better performance.
 
-Many techniques have been developed in order to provide more efficient search 
indexing. As 
+Additionally, many techniques have been developed in order to provide more 
efficient search indexing. As 
 several studies show, the popularity of queries in the Internet follow 
Zipf-like 
 distributions\footnote{Zipf-distribution is a variant of power-law function. 
 Zipf-distribution can be used in observation of frequency of occurrence event 
$E$, as a function of the rank 
 $i$ when the rank is determined by the frequency of occurrence, is a power-law 
function $E_i \sim \frac{1}{i^{a}}$, 
 where the exponent $a$ is close to unity.} (e.g., 
\cite{breslau98implications}).
 Therefore, according to \cite{li03feasibility}, caching and pre-computation 
can be done for optimizing search indices. 
-Authors in \cite{li03feasibility} use Gap compression \cite{wittengigabytes}, 
Adaptive Set Intersection \cite{338634}  
+Authors in \cite{li03feasibility} use gap compression \cite{wittengigabytes}, 
adaptive set intersections \cite{338634}  
 and clustering with their search optimizations. Regular compression 
algorithms, Bloom filters \cite{362692}, vector 
 space models \cite{CuencaAcuna2002DSIWorkshop} and view trees 
\cite{Bhattacharjee03resultcache} can be used for even 
 better optimizations. 
@@ -1262,7 +1265,9 @@
 structured system, peers join and leave the system constantly without any 
restrictions \cite{saroiu02measurementstudyp2p}. 
 On the other hand, however, peers in tightly structured system join and leave 
the system but have less freedom, 
 i.e., overlay chooses peer's neighbors on behalf of the peer itself and maps 
data items randomly 
-throughout the overlay network \cite{balakrishanarticle03lookupp2p}. Almost 
all presented algorithms 
+throughout the overlay network \cite{balakrishanarticle03lookupp2p}. 
+
+Almost all presented algorithms 
 for the tightly structured systems have been analyzed under static simulation 
 environments \cite{libennowell01observations}. Furthermore, proposed tightly 
structured overlays are configured statically to achieve 
 the desired reliability even in a uncommon and adverse environment 
\cite{rowston03controlloingreliability}. 
@@ -1274,9 +1279,7 @@
 computing power or network bandwidth, all data items are distributed 
uniformly. Clearly, this is
 a serious problem of tightly structured overlays in face of performance and 
load balancing \cite{rao03loadbalancing}. 
 Measurement study by Saroiu et al. show that there is a extreme heterogeneity 
among participating peers in already deployed Peer-to-Peer 
-systems \cite{saroiu02measurementstudyp2p}. Symphony \cite{gurmeet03symphony} 
seems to be the first tightly structured overlay system 
-which supports heterogeneity. Zhao et al. have proposed a secondary layer on 
top of a structured overlay
-to support heterogeneity better \cite{zhao02brocade}. 
+systems \cite{saroiu02measurementstudyp2p}. 
 
 Some research has been done with regard to load balancing properties of 
tightly structured
 overlays. Byers et al. suggest an idea of ''power of two choices'' whereby 
data item is stored at the less loaded 
@@ -1296,10 +1299,9 @@
 Hot spots happen, when a specific key is being requested extremely often in 
tightly structured overlays. Recent study 
 by Freedman et al. tries to reduce hot spots in the system by performing 
\emph{sloppy} hashing 
 \cite{sloppy:iptps03}. Authors' technique is especially suitable for the DOLR 
abstraction of tightly structured overlays. 
-They arque that with Sloppy hashing, the generation of query hot spots can be 
reduces and peers are able 
+They arque that with Sloppy hashing, the generation of query hot spots can be 
reduced and peers are able 
 locate nearby data without looking up data from distant peers. Moreover, 
authors' 
-proposal for self-organizing clusters using network diameters may be useful, 
-especially within small groups of working people.
+proposal for self-organizing clusters using network diameters. 
 
 The concept of ''half-life'' was introduced by Liben-Nowell 
\cite{libennowell01observations} since Peer-to-Peer 
 system is \emph{never} in the ''ideal'' state as Peer-to-Peer system is 
continiously evolving system. Half-life is defined
@@ -1319,7 +1321,8 @@
 
 \subsection{Summary}
 
-In this subsection we list performance and usability problems in Peer-to-Peer 
systems in the table.
+In this subsection we list performance and usability problems in Peer-to-Peer 
research. For each table entry,
+there is a brief description of the problem, possible solutions and comments.
 
 
 \scriptsize
@@ -1466,7 +1469,7 @@
 Recently, there have been few proposals towards common programming guidelines. 
Authors in 
 \cite{zhao03api} propose a higher level abstracions for tightly structured 
overlays. Frise et al. suggest the use of
 additional layer in Peer-to-Peer system to hide the structure of the overlay 
\cite{frise02p2pframework}. 
-Thus, with their method both the tightly structured and tightly structured 
approach can be used in the system.
+With their abstraction, both the tightly structured and tightly structured 
approach can be used in the system.
 Montresor proposes a framework supporting developers and researchers in the 
design of Peer-to-Peer system
 \cite{babaoglu02anthill}.
 
@@ -1477,17 +1480,17 @@
 
 Frequent assumption in Peer-to-Peer systems is that peers are willing to 
cooperate (e.g., \cite{shneidman03rationality}). 
 Another belief is that all peers would behave equally, i.e., all peers both 
consume and contribute services \cite{saroiu02measurementstudyp2p}.
-However, these assumptions are not true as several studies show 
\cite{saroiu02measurementstudyp2p, 
-oram01harnessingpower, hearn02mojonation}. Peers rather consume than 
contribute and peers are 
-unwilling to cooperate. 
+However, these assumptions are not true as several publications show: peers 
rather consume than contribute and are 
+unwilling to cooperate \cite{saroiu02measurementstudyp2p, 
oram01harnessingpower, hearn02mojonation}. 
 
 Somewhat surprisingly little research has been done in this area, especially 
when considering 
 the possible impact of \emph{unwanted social behavior} to performance of a 
Peer-to-Peer 
 system. The problem is addressed by Golle et al. \cite{golle01incentivesp2p}, 
Ngan et al. 
 \cite{ngan03enforcefile} and Shneidman et al. \cite{shneidman03rationality}. 
 Some research has been focused on semantic properties of the overlay in order 
to increase 
-cooperation among participating peers \cite{crespo02semanticoverlay}. 
Ramanathan et al. 
-\cite{ramanathan02goodpeers} and Bernstein et al. \cite{bernstein03selection} 
use 
+co-operation among participating peers \cite{crespo02semanticoverlay}, i.e., 
peers' 
+neighbor connections are influenced by the content of data (e.g. music or 
movies).
+Ramanathan et al. \cite{ramanathan02goodpeers} and Bernstein et al. 
\cite{bernstein03selection} use 
 empirical metrics and decision trees when teaching peers to make better 
decisions
 when contacting other peers in Peer-to-Peer system. Alpine \cite{alpineurl} is 
an example of
 Peer-to-Peer system, which uses empirical metrics for a peer selection.
@@ -1499,17 +1502,16 @@
 is due to complex nature of Peer-to-Peer system, which makes comprehensive 
simulations very 
 difficult \cite{bhagwan03availability}. Floyd et al. have been studying the 
simulation of the Internet in \cite{504642}. Authors
 state that simulating the Internet is very challenging task, because of its 
heterogeneity
-and rapid change. Obviously, these factors exist also in Peer-to-Peer systems 
even with higher
+and rapid change. These factors exist also in Peer-to-Peer systems even with 
higher
 rates.
 
 As long as comprehensive simulations of a Peer-to-Peer systems are lacking, we 
cannot make any detailed
-analysis on general properties of a Peer-to-Peer system, such as usage 
patterns. However, we can assume 
-that, e.g., query keywords follow the Zipf-like distribution 
\cite{breslau98implications} both in the 
-Internet and in Peer-to-Peer systems.
+analysis on general properties of a Peer-to-Peer system, such as usage 
patterns of participating peers.
                
 \subsection{Summary}
 
-In this subsection we list security problems in Peer-to-Peer systems in the 
table.
+In this subsection we list miscellaneous problems in Peer-to-Peer research. 
For each table entry,
+there is a brief description of the problem, possible solutions and comments.
 
 \scriptsize
 \begin{longtable}{|l|l|l|l|}
@@ -1599,14 +1601,15 @@
 \section{Overview}
 
 The Fenfire project \cite{fenfireurl} is an effort to build a location 
transparent, hyperstructured desktop 
-environment. By location transparent, we mean concealing the heterogeneous and 
distributed nature of the system
+environment. By location transparent, we mean hiding the heterogeneous and 
distributed nature of the system
 so that it appears to the end user like one system and by hyperstructured 
system
 a system in which data can be associated with other data arbitrarly. Fenfire 
uses xanalogical storage model 
 \cite{ted-xu-model} as a basis for hyperstructured media. Each data item in 
the Fenfire system has a globally unique 
 identifier. This property should allow making references between \emph{any} 
 data easier and more seamlessly interoperating than in other systems. For 
location transparency in the Fenfire system, 
-we are currently analysing the applicably of Peer-to-Peer infrastructure for 
locating and fetching blocks in a 
-distributed environment. Fenfire was formerly also a partial implementation 
+we are currently analysing the applicability of Peer-to-Peer infrastructure. 
+
+Fenfire was formerly also a partial implementation 
 of the ZigZag\texttrademark -- structure, which has been originally invented 
 by Ted Nelson. Now, however, Fenfire uses Resource Description Framework (RDF) 
\cite{w3rdfurl}
 for representing internal data structures and their relationships. 
@@ -1622,42 +1625,42 @@
 \item \textbf{LibVob}: a graphic library used for creating navigation 
interfaces in complex data views.
 \end{itemize}
 
-In this thesis, we focus on Storm and Alph modules as they are a foundation 
for Peer-to-Peer functionality
+In this thesis, we focus on Storm and Alph modules as they are the foundation 
for Peer-to-Peer functionality
 in the Fenfire system.
 
 \section{Xanalogical storage model}
 
 Xanalogical storage model \cite{nelson99xanalogicalneeded} is a different kind 
of model for 
 presenting data and relationships between data, e.g., while in the World Wide 
Web links are
-between \emph{documents}, in xanalogical storage model links are between 
individual 
-\emph{characters}. \emph{Enfilade},can be considered as a mutable ''virtual 
file'' (or part of one), which is a list 
-of fluid media content. Fluid media is the smallest units of data in 
xanalogical storage
-model. \emph{Transclusion} is an inclusion in 
+between documents, in the xanalogical storage model links are between 
individual 
+characters\footnote{Xanalogical storage model
+is not limited to text. It can support arbitrary data, e.g., pixels of picture 
or 
+frames of video.}. \emph{Enfilade} is a mutable ''virtual file'' (or part of 
one), which is a list 
+of fluid media content. Fluid media is the smallest units of data in the 
xanalogical storage
+model (e.g., a character). \emph{Transclusion} is an inclusion in 
 enfilade of contents already used in another enfilade. With the transclusion, 
a system 
-implementing xanalogical storage model is able to show \emph{all} data content 
that share the same 
+implementing the xanalogical storage model is able to show \emph{all} data 
content that share the same 
 fluid media with current data content (e.g., all documents in a system 
containing document's text). 
 Figure \ref{fig:xanalogical_model} 
-illustrates xanalogical storage model with documents, text and characters.
+illustrates the xanalogical storage model with documents, text and characters.
 
-In xanalogical storage model, links between data are external 
+In the xanalogical storage model, links between data are external 
 and bidirectional. A link is shown between any two data contents 
-containing a specific \emph{fluid media unit} (e.g., a character) that the 
link connects.
-Each fluid media unit in xanalogical storage model has a
-permanent, globally unique identifier\footnote{Xanalogical storage model
-is not limited to text. It can support arbitrary data, e.g., pixels of picture 
or 
-frames of video.}. For instance, let's consider the following
+containing a specific fluid media unit that the link connects.
+Each fluid media unit in the xanalogical storage model has a
+permanent, globally unique identifier. For instance, let's consider the 
following
 example, presented first time in \cite{lukka02freenetguids}: ''the character 
'D' 
 typed by Janne Kujala on 10/8/97 8:37:18''. When character 
-'D' is first typed in, xanalogical storage model
+'D' is first typed in, the xanalogical storage model
 creates a permanent globally identifier for that character 
-and retains it when the character is copied to different document. In 
practice, xanalogical 
+and retains it when the character is copied to different document. In 
practice, the xanalogical 
 storage model uses \emph{spans}, ranges of consecutive
 fluid media units to perform storage operations. 
 
 \begin{figure}
 \centering
 \includegraphics[width=14cm, height=12cm]{xanadu_model.eps}
-\caption{Xanalogical storage model.}
+\caption{Xanalogical storage model with documents, text and characters.}
 \label{fig:xanalogical_model}
 \end{figure}
 
@@ -1665,19 +1668,19 @@
 \section{Storm}
 
 In this section, we will give a brief overview of Storm design. More 
information can be found
-from recent publications: for general discussion about Fenfire in Peer-to-Peer 
environment, 
+from recent publications. For general discussion about Fenfire in Peer-to-Peer 
environment, 
 see \cite{lukka02freenetguids}, and for detailed Storm design, see 
\cite{fallenstein03storm}.
 
-Storm (for \emph{STORage Module}) stores all data as \emph{blocks}, which
+Storm (for \emph{STORage Module}) stores all data as data \emph{blocks}, which
 are immutable byte sequences. Storm \emph{assigns} a globally unique 
identifier to each
 data block\footnote{This resembles the process how tightly structured overlay 
assigns a subset of keys
 to each participating peer: a user has no control over the assignment 
process.}. SHA-1 cryptographic 
 content hash function\footnote{SHA-1 is 
 considered as a collision free hash function. Therefore, it is very unlikely 
that two different Storm data blocks 
 would have same identifier.} \cite{fips-sha-1} is used
-for creating semantic-free, globally unique identifiers for blocks. Because of 
SHA-1
+for creating unstructured and semantic-free, globally unique identifiers for 
blocks. Because of SHA-1
 content hash, all identifiers are directly the data verifiers as well. The 
uniquess of blocks creates 
-a basis for implementing xanalogical storage model in the Fenfire system. 
Storm blocks have much in common with regular files as they
+a basis for implementing the xanalogical storage model in the Fenfire system. 
Storm blocks have in common with regular files as they
 both contain the data. The main difference is that Storm blocks are 
\emph{immutable} since any 
 change to the byte sequence would change block's hash value (i.e., globally 
unique identifier).  
 
@@ -1690,28 +1693,28 @@
 \emph{Pointer} \cite{benja02urn5, fallenstein03storm} is a semantic-free 
updatable reference to 
 Storm data block. Pointer is a unique reference to the data and it is usually 
 represented as a random string. Storm pointers are rather a \emph{concept} of 
data (e.g., ''The front page of the most recent 
-version of New York Times newspaper'') whereas scroll blocks \emph{contain} 
the data 
+version of New York Times newspaper'') whereas blocks \emph{contain} the data 
 (''New York Times newspaper, 10.10.2002, version 1.0'').
 Figure \ref{fig:storm_model} illustrates Storm storage model with pointers. 
 
 Each pointer is \emph{linked} to a collection of \emph{pointer blocks}. 
-Pointers can be created by a user, before the creation of scroll blocks. 
Pointer blocks 
-are created automatically by Storm when a scroll block is created and 
associated with a pointer 
-(e.g., a user creates a scroll block associated with the concept ''The front 
page of the most recent 
-version of New York Times newspaper''). Pointer block has always a single 
target (i.e., a scroll block) 
-for the pointer, saying that pointer $P$ targets block $B$. In addition to 
this, pointer block 
+Pointers can be created by a user, before the creation of a data block. 
Pointer blocks 
+are created automatically by Storm when a data block is created and associated 
with a pointer 
+(e.g., a user creates a data block associated with the concept ''The front 
page of the most recent 
+version of New York Times newspaper''). Pointer block has always a single 
target (i.e., a data block), 
+saying that pointer $P$ targets data block $B$. In addition to this, pointer 
block 
 may contain a list of zero or more obsoleted pointer blocks: when a new 
version of pointer 
 block is created, it supersedes one older version which has been created in 
the past using the Storm indexing
 mechanisms. For details, see \cite{fallenstein03storm}. Next 
-time, when the pointer is used for referring to a specific scroll block, only 
+time, when the pointer is used for referring to a specific data block, only 
 the most recent pointer's block target is loaded. However, the pointer blocks 
pointing
-to the previous versions of scroll blocks remains accessible, if needed. In 
figure \ref{fig:storm_pointercreation}, 
+to the previous versions of data blocks remains accessible, if needed. In 
figure \ref{fig:storm_pointercreation}, 
 we show the overall pointer creation process. 
 
 \begin{figure}
 \centering
 \includegraphics[width=10cm, height=10cm]{storm_uml.eps}
-\caption{Implementation of xanalogical storage model on Storm. Storm storage 
model is based on
+\caption{Implementation of the xanalogical storage model on Storm. Storm 
storage model is based on
 fluid media units, i.e., fluid media units are smallest units of data. 
Currently, Storm provides a support
 for textual fluid media units (characters) only, but a support for arbitrary 
data (e.g., video or music) is 
 planned in future versions of Storm.}
@@ -1741,7 +1744,7 @@
 
 Some research regarding to Peer-to-Peer technologies and hypermedia systems 
have been made by Lukka et al. 
 \cite{lukka02freenetguids}. Authors' work is mainly based on the insight of 
implementing  
-xanalogical storage model in Peer-to-Peer environment with globally unique 
identifiers. Lukka et al.
+the xanalogical storage model in Peer-to-Peer environment with globally unique 
identifiers. Lukka et al.
 use Freenet \cite{clarke00freenet} as an example Peer-to-Peer system 
supporting 
 globally unique identifiers. The work presented in this thesis extends their 
work by 
 evaluating different Peer-to-Peer systems more extensively to Fenfire's needs.
@@ -1758,11 +1761,10 @@
 First, as discussed in chapter 4, xanalogical document is a ''virtual
 file'', in which parts of the document are fetched from a 
 \emph{global} data repository\footnote{Global repository is not a requirement. 
Locally constructed xanalogical
-documents are feasible and they can be assembled without global data 
repository.}. Thus, system implementing xanalogical storage model \emph{must}
-support global data lookups order to assemble the ''virtual file'' from 
fragments of data. 
-Specifically, our task is to locate and fetch (i.e., obtain) \emph{all} Storm 
blocks\footnote{These blocks are called \emph{scroll} blocks.}, 
-associated to a specific ''virtual file'' from the Peer-to-Peer once the 
construction of ''virtual'' file
-is resolved (i.e., we know what blocks are required to assemble the ''virtual 
file''). Also, in addition to the
+documents are feasible and they can be assembled without global data 
repository.}. System implementing the xanalogical storage model \emph{must}
+support global scale data lookups, i.e., if a data item exists in the system 
it can be located and fetched. 
+Specifically, our task is to locate and fetch (i.e., obtain) all Storm 
blocks\footnote{We call these blocks as \emph{scroll} blocks.}, 
+associated to a specific ''virtual file'' from the Peer-to-Peer network. Also, 
in addition to the
 \emph{direct} block obtaining using globally unique identifier of Storm block, 
 we also must support the \emph{indirect} obtaining of Storm block using the 
pointer mechanism. 
 Second, we want that users' operations in Fenfire 
@@ -1779,9 +1781,9 @@
 respond to fetching of Storm blocks as fetching can be performed easily once
 Storm block is located. 
 
-In chapter 2, we discussed main differences between the loosely and the 
tightly structured 
+In chapter 2, we discussed main the differences between the loosely and the 
tightly structured 
 approach. As stated, the most significant difference is that the tightly 
structured 
-approach has logarithmical properties in all internal operations, while the 
loosely 
+approach has at least poly-logarithmical properties in all internal 
operations, while the loosely 
 structured approach doesn't always have even linear properties. Furthermore, 
the
 data lookup model of the tightly structured overlay scales much better than in 
loosely 
 structured overlays; the tightly structured overlay supports global data 
lookups
@@ -1792,27 +1794,24 @@
 
 Since both Storm and tightly structured overlays use globally unique 
identifiers for each data item,
 it is feasible to use tightly structured overlays for \emph{locating} Storm 
blocks efficiently.
-Another key feature of tightly structured overlays is that they are able
-to provide general purpose \emph{interface} for Reference Resolution Services 
(RRS)
- \cite{balakrishnan03semanticfree}. Authors argue that next generation RRS 
must be 
-application-independent and references itself should be \emph{unstructured} 
and 
-\emph{semantically free}.  Thus, we see the tightly structured approach as the 
best alternative to 
-locate data in Peer-to-Peer environment.
+Additionally, the unstructured and semantic-free properties of Storm 
identifiers enables
+the use of general purpose Reference Resolution Services (RRS) 
\cite{balakrishnan03semanticfree} on
+top of the tightly structured overlay.
 
 Again, there are research challenges with tightly structured systems which 
have to be
 addressed, as described in chapter 3. The main concerns include decreased 
performance and fault 
 tolerance in presence of system flux, non-optimal distance functions in 
identifier space, 
 proximity routing, hostile entities and flexible search 
\cite{balakrishanarticle03lookupp2p}.
-Additionally, there is only little real world experiments yet with tightly 
structured systems
+Additionally, there is only little real world experiments with tightly 
structured systems
 (e.g., \cite{overneturl, edonkey2kurl}). Therefore, we can't say for sure, how 
well these 
-systems would perform in real Peer-to-Peer environment. However, we believe 
that these issues are 
-solved in near future, since there is a strong and wide research community 
towards to the tightly structured 
+systems would perform in real Peer-to-Peer environment. However, we believe 
that these issues will be 
+solved in the near future, since there is a strong and wide research community 
towards tightly structured 
 overlays \cite{projectirisurl}.
 
       
 \section{Fenfire system model in Peer-to-Peer environment}
 
-In this section we give a proposal for Fenfire Peer-to-Peer system, which 
consists 
+In this section we give a proposal for the Fenfire Peer-to-Peer system, which 
consists 
 of several technologies reviewed in this thesis.  Then, we introduce methods 
for 
 obtaining Fenfire data from a Peer-to-Peer network. 
  
@@ -1821,7 +1820,9 @@
 We emphasize that we prefer \emph{abstraction}
 level analysis as very recently better and better tightly structured 
algorithms have been proposed. 
 Thus, we don't want to bind our system proposal to a specific algorithm 
definitively as we expect 
-that this development continues. Currently, we see Kademlia 
\cite{maymounkov02kademlia} as the best algorithm for
+that this development continues. 
+
+Currently, we see Kademlia \cite{maymounkov02kademlia} as the best algorithm 
for
 locating data efficiently in the Peer-to-Peer overlay. There are two
 reasons for this. First, Kademlia's XOR-based distance function is superior
 over distance functions of other systems (see section 2.3.2). Secondly, 
Kademlia
@@ -1829,24 +1830,24 @@
 (e.g., \cite{overneturl, edonkey2kurl, kashmirurl,kato02gisp}), which means 
that 
 Kademlia's algorithm is simple and easy to implement.
 
-On top of Kademlia, we propose the usage of Sloppy hashing 
\cite{sloppy:iptps03} which
+On top of Kademlia, we propose the use of Sloppy hashing \cite{sloppy:iptps03} 
which
 is optimized for the DOLR abstraction of tightly structured overlays. With 
Sloppy hashing
-we can provide locality properties for the Fenfire system.
+we can provide locality properties for the Fenfire system which may be useful 
+within a small group of working people.
 
 For better fault tolerance and self-monitoring for Fenfire, we propose 
techniques
 presented by Rowston et al. \cite{rowston03controlloingreliability}.  With 
these 
 techniques, we can ensure the performance of the Fenfire system in a highly 
adverse conditions, such
 as sudden network partition, or highly dynamic and heterogeneous environment.
 
-Finally, for more efficient data transfer, we can use variable techniques for 
this purpose.
+Additionally, for more efficient data transfer, we can use variable techniques 
for this purpose.
 For small amounts of data, HTTP can be used \cite{rfc2068}. For large amounts 
of data, we can use
 multisource downloads for better efficiency and reliability. Specifically, the 
technology based
 on rateless erasure codes \cite{maymounkov03ratelesscodes} seems very 
promising.
 Furthermore, multisource downloads can be used for decreasing load of a 
certain peer, thus avoiding query 
 hot spots in the system \cite{ratnasamy02routing}. Current client-server 
implementation of Fenfire uses 
 standard single source downloads (HTTP) and SHA-1 \cite{fips-sha-1} 
cryptographic content 
-hash for verifying the integrity of data by recomputing the content hash
-for Storm block. In face of multisource downloads, Fenfire must support
+hash for verifying the integrity of data. In face of multisource downloads, 
Fenfire must support
 tree-based hashes\footnote{With multisource downloads, tree-based hash 
functions can be used 
 to verify fixed length segments of data. If hash value of data segment is 
incorrect, 
 we need only to fetch \emph{segment} of data (instead of whole data) from 
@@ -1855,17 +1856,17 @@
 
 \subsection{Methods}
 
-We use the DOLR abstraction of the tightly structured approach since DOLR 
systems locate data without 
+In our data lookup methods, we use the DOLR abstraction of the tightly 
structured approach since DOLR systems locate data without 
 specifying a storage policy explicitly \cite{rhea03benchmarks}, i.e., each 
participating peer hosts 
-the data they are offering and the overlay maintains only the \emph{pointers} 
to the data. 
-DHT-based storage systems, such as CFS \cite{dabek01widearea} and PAST 
\cite{rowstron01storage}, may have
+the data they are offering and the overlay maintains the \emph{pointers} to 
the data. 
+Storage systems based on the DHT abstraction, such as CFS 
\cite{dabek01widearea} and PAST \cite{rowstron01storage}, may have
 severe problems with load balancing in a highly heterogeneous environment 
\cite{rao03loadbalancing}. The problem is caused by peers 
 which may not be able to store relatively large blocks, assigned randomly by 
the mapping function of the overlay. 
 
 For simplicity, we assume that we have resolved the construction of the 
''virtual file'' before locating any Storm blocks, i.e., 
 when assembling the ''virtual file'' we know all the Storm blocks, which are 
required to complete the ''virtual file''. 
 Also, we don't respond to the security issues related to Peer-to-Peer systems, 
since there is no working solution
-available yet. Thus, we either assume that Fenfire has a reliable technique 
for identifying individual entities, or
+available yet. We either assume that Fenfire has a reliable technique for 
identifying individual entities, or
 there are no hostile entities among participating peers, i.e.,  Storm blocks 
can be identified correctly (e.g., when
 performing searches). In the next subsection, we discuss security problems in 
more detail. 
 
@@ -1919,19 +1920,19 @@
 Perhaps the most biggest issue in Peer-to-Peer systems is the non-maturity of
 security technologies. For the Fenfire system, one security related problem 
occurs when a user wants to 
 perform a global data lookup with a given pointer; how the user is able to 
verify 
-the correctness of the search results, i.e.,  how she or he knows which one is 
the 
+the correctness of the search results, e.g., how she or he knows which one is 
the 
 correct Storm block ? Another problem related to the Fenfire's 
 security is that if a user downloads data from the network to local computer 
 and after a network disconnection, user wants to verify \emph{off line} the 
 authenticity of data. Finally, if a data lookup is performed by a user, but 
there is no reply
 from the Fenfire system, how are we able to know if this was the Spam attack 
\cite{naor03simpledht}, 
 or the data really doesn't exist in the system ?  
-However, these problems are not only limited to the Fenfire system as it 
+These problems, however, are not only limited to the Fenfire system as it 
 concerns all Peer-to-Peer computer systems.   
 
-Obviously, optimal solutions to all security issues would be that digital 
+Optimal solutions to all security issues would be that digital 
 signatures are included to every message which are sent to the system or the 
use of working PKI-based 
-certificate distribution. As security technologies come more mature, we wish 
to apply these 
+certificate distribution. As security technologies become more mature, we wish 
to apply these 
 technologies with Fenfire, if applicable.
 
 \chapter{Conclusions and future work}
@@ -1943,21 +1944,20 @@
 problems into the three sub-categories: security related problems,
 performance related problems and miscellaneous problems. 
 
-Then we gave a brief overview of the Fenfire system and xanalogical storage 
model. We also 
+Then we gave a brief overview of the Fenfire system and the xanalogical 
storage model. We also 
 described Storm software module.
 
 In the last chapter, we evaluated existing Peer-to-Peer approaches with regard
-to Fenfire's needs. We see that the tightly structured approach is the
-best alternative to Fenfire's needs for the following reasons. First, our 
Storm design uses \emph{semantic-free references} 
-(block identifiers and pointers) for locating data in distributed 
-networks. As the authors of \cite{balakrishnan03semanticfree},
-we also agree that references should be semantically-free in next-generation 
-reference resolution services. Second, by using 
-the DOLR abstraction of tightly structured overlay and Sloppy hashing , we can 
-minimize the lack of locality in the tightly structured approach, i.e., we are 
able to locate nearby 
-data without looking up data from distant peers. Third, we believe that issues 
-related to tightly structured overlays are solved in the near future, because 
of
-wide and intensive co-operation among research groups.
+to Fenfire's needs. Currently, we see that the tightly structured approach as 
the
+best alternative to Fenfire's needs for the following reasons.
+First, both Storm and tightly structured overlays use globally unique 
identifiers for each data item. Therefore,
+it is feasible to use tightly structured overlays for \emph{locating} Storm 
blocks efficiently.
+Second, the unstructured and semantic-free properties of Storm identifiers 
enables
+the use of general purpose Reference Resolution Services (RRS) 
\cite{balakrishnan03semanticfree} on
+top of the tightly structured overlay. As the authors of 
\cite{balakrishnan03semanticfree},
+we also agree that references should be semantically-free in next-generation 
RRS systems. 
+Third, we believe that issues related to tightly structured overlays will be 
solved in 
+the near future, because of wide and intensive co-operation among research 
groups.
 
 Our future work includes a support for searching transclusions and xanalogical
 links in Peer-to-Peer environment. Preliminary analysis have shown
@@ -1966,8 +1966,6 @@
 database systems may prove to be useful. Some fundamental results
 regarding Peer-to-Peer and database systems have already been
 presented in \cite{gribble01p2pdatabase}.  
-
-We will implement a Fenfire Peer-to-Peer prototype in the near future. 
 
 \bibliographystyle{gradu}
 \bibliography{progradu}
[Prev in Thread]
Current Thread
[Next in Thread]
[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., (continued)
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/25
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/25
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/25
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/25
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/25
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/26
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/26
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/26
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/26
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/26
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä <=
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/27
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/27
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/27
Prev by Date: [Gzz-commits] libvob/src/paper Makefile
Next by Date: [Gzz-commits] gzz/Documentation/Presentations/ThursdaySeminar...
Previous by thread: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...
Next by thread: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...
Index(es):
- Date
- Thread