[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...
From: |
Hermanni Hyytiälä |
Subject: |
[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert... |
Date: |
Wed, 26 Mar 2003 10:00:23 -0500 |
CVSROOT: /cvsroot/gzz
Module name: gzz
Changes by: Hermanni Hyytiälä <address@hidden> 03/03/26 10:00:22
Modified files:
Documentation/misc/hemppah-progradu: masterthesis.tex
Log message:
"This is a never ending story...didiidii...."
CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/masterthesis.tex.diff?tr1=1.196&tr2=1.197&r1=text&r2=text
Patches:
Index: gzz/Documentation/misc/hemppah-progradu/masterthesis.tex
diff -u gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.196
gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.197
--- gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.196 Wed Mar
26 07:49:20 2003
+++ gzz/Documentation/misc/hemppah-progradu/masterthesis.tex Wed Mar 26
10:00:22 2003
@@ -177,7 +177,7 @@
In the loosely structured approach the construction and the maintenance of the
overlay is controlled
loosely. The placement of services and the topology of overlay is random. The
data lookup model in loosely structured systems is
not very efficient, because of unstructured properties of the overlay. Data
lookup model is a combination of methods which
-are used for finding data from the overlay.
+are used for locatin data in the overlay.
\subsection{Definition}
@@ -893,7 +893,7 @@
According to \cite{aberer01trust}, mutual trust ''...allows agents to
cooperate in a game-theoretic situation that corresponds
to the repeated prisoners dilemma and leads in the long term to an increased
aggregated utility for the participating agents''.
-They define \emph{trust management} as a mechanism that allows to establish
mutual trust. Furthermore, \emph{reputation} is a measure
+The authors of \cite{aberer01trust} define \emph{trust management} as a
mechanism that allows to establish mutual trust. Furthermore, \emph{reputation}
is a measure
that is derived from knowledge on interactions in the past
\cite{aberer01trust}. In this subsection, we discuss mechanisms to maintain
trust in Peer-to-Peer systems.
@@ -922,14 +922,14 @@
\subsection{Anonymity}
According to \cite{dingledine00free}, there exist several kinds of anonymity:
author-anonymity,
-publisher-anonymity, reader-anonymity, peer-anonymity and query-anonymity.
Author-anonymity is a form
-of anonymity in which no one can link the author (who created the document) to
a document.
-In publisher-anonymity system, no one is able to determine the publisher (how
published the document into
+publisher-anonymity, reader-anonymity, peer-anonymity and query-ano-nymity.
Author-anonymity is a form
+of anonymity in which no one can link author (who created the document) to a
document.
+Publisher-anonymity means that no one is able to determine the publisher (how
published the document into
the system) of a document. Reader-anonymity means that a document cannot be
linked to its readers.
With peer-anonymity, no one is able to determine the peer, where the document
was originally published.
-Document-anonymity means that a peer doesn't know which data it is currently
hosting. Query-anonymity is a form
+Document-anonymity means that a peer doesn't know which data it is currently
hosting. Finally, query-anonymity is a form
of document-anonymity; when other peers perform data lookups, a peer doesn't
know which data it serves
-to the data lookup originators. As the authors cite in
\cite{dingledine00free}, some forms of anonymity
+to the data lookup originators. As the authors of \cite{dingledine00free}
cite, some forms of anonymity
may imply each other and possible issues raised by this property is one area
of future work.
Obviously, existance of several types of anonymity often conflicts with other
key properties of
@@ -940,14 +940,14 @@
For instance, pseudonymity can be used for addressing peer-anonymity by
providing anonymous-like identifiers to
peers (e.g., peer identifiers of a tightly structured system).
-Anonymity is widely used in a Peer-to-Peer system in which data publication
and non-censorship are important. These include
-Forwarding proxies are used in Freenet \cite{clarke00freenet}, Crowds
\cite{reiter98crowds} and Free Haven \cite{dingledine00free}
+Anonymity is widely used in a Peer-to-Peer system in which data publication
and non-censorship are important. Forwarding
+proxies are used in Freenet \cite{clarke00freenet}, Crowds
\cite{reiter98crowds} and Free Haven \cite{dingledine00free}
in order to provide various types of anonymity. Tangler \cite{502002} and
Publius \cite{pub00} use cryptographic sharing methods
to split data into fragments \cite{Shamir1979a}. Mix mailer networks (e.g.,
\cite{mixminionurl}) are commonly used in
distributed systems which are able to provide some level of anonymity (e.g.,
\cite{mneturl}).
Even if many existing Peer-to-Peer systems are able to provide some of the
types of anonymity, there is no
-such a system which is able to provide complete anonymity. Specifically, the
conflicts
+such a system which is able to provide complete anonymity in all levels (see
above). Specifically, the conflicts
between anonymity and other properties of Peer-to-Peer system require more
research work.
@@ -964,21 +964,25 @@
schema policies to restrict access to certain data. Their current early
prototype version only works in
loosely structured systems.
+More research work is required in the development of \emph{working} access
control scheme for Peer-to-Peer
+systems.
+
\subsection{Hostile entities}
One serious problem in Peer-to-Peer systems is the inability to distinguish
hostile entities from regular entities
-trustworthy.
+trustworthy. Identification of hostile entities is essential in the tightly
structured
+approach, in which the fundamental (and implicit) assumption is that there is
a random, uniform distribution
+of peer identifiers that cannot be controlled by a hostile entity.
+
One possible solution is to use a self-monitoring system, such as SOMO
\cite{zhang03somo}, in which a self-monitoring overlay
constantly analyses the Peer-to-Peer overlay. Self-monitoring overlay is built
on top of Peer-to-Peer overlay. Authors in
\cite{sit02securitycons} suggest the use of system invariants. They emphasize
that system invariants should be veriable, and if
system invariants fail the system must have a recovery mechanism. In
distributed peer identifier assignment \cite{castro02securerouting,
clarke00freenet},
-multiple participating peers participate in a creation of peer identifier.
Identification of hostile entities is essential in the tightly structured
-approach, in which the fundamental (and implicit) assumption is that there is
a random, uniform distribution
-of peer identifiers that cannot be controlled by a hostile entity.
+multiple participating peers participate in a creation of peer identifier.
-Naturally, centralized authorities could be used for the assignment of peer
identifiers, but they may not be suitable
-for ad hoc Peer-to-Peer environrment and have property of single point of
failure. Moreover, distributed peer
+Centralized authorities could be used for the assignment of peer identifiers,
but they may not be suitable
+for ad hoc Peer-to-Peer environrment and have property of single point of
failure. Distributed peer
identification assignment can be problematic as long as the Sybil attack
\cite{douceur02sybil} remains unsolved.
However, there are some partial solutions for controlling the \emph{rate} at
which hostile entity is able to obtain peer
identifier, such as crypto-based puzzles \cite{juels99clientpuzzles}.
@@ -989,18 +993,14 @@
is able to deliver a network message thoughout the overlay to a correct
destination efficiently.
Aspnes et al. in \cite{aspnes02faultrouting} and Kaashoek et al. in
\cite{kaashoek03koorde} formally
-prove the lower and upper bounds for the space requirements of locating a
specific data item reliable in a
+prove the lower and upper bounds for the space requirements of locating a
specific data item reliably in a
Peer-to-Peer system. They show that to provide high degree of fault tolerance
and efficiency in the system, each
-participating peer must maintain average of $O(\log{n})$ neighbors. Fiat et
al. in \cite{fiat02censorship, saia02dynamicfaultcontentnetwork}
-and Datar in \cite{datar02butterflies} propose a tightly structured overlay
with analytical results in the
-presence of hostile entities. However, none of these proposals address a
dynamic tightly structured
-overlay with fault tolerance against multiple rounds
-of hostile attacks. Also, above mentioned proposals are not very efficient. In
\cite{fiat02censorship}, each peer
-must maintain information of $O(\log^3{n})$ other peers, and in
\cite{datar02butterflies}, $O(\log^2{n})$ is required.
+participating peer must maintain average of $O(\log{n})$ neighbors.
+
Authors argue in \cite{castro02securitystructured} that with the combination
of
secure peer identifer assignment, secure routing table maintenance and secure
message forwarding
-the secure routing in tightly structured systems is possible. Additionally,
authors cite in \cite{castro02securerouting}
+secure query routing in tightly structured systems is possible. Additionally,
authors cite in \cite{castro02securerouting}
that the probability of routing successfully between to arbitrary
correct peers is $(1-f)^{h-1}$, when a fraction $f$ of the other peers are
faulty or hostile and where
$h$ is the number of hops in the overlay. Sit and Morris
\cite{sit02securitycons} discuss the possibility of
@@ -1011,11 +1011,15 @@
Lynch et al. \cite{lynch02atomicdataaccess} propose a solution for secure
routing table
maintenance, but their solution seems to have two major problems according to
\cite{castro02securitystructured}.
First, the solution is very expensive even without faulty or hostile entities.
Second, each group of replicas
-in their solution must have less than 1/3 of its peers faulty. Thus, this
feature results in a low
+in their solution must have less than 1/3 of its peers faulty. This feature
results in a low
probability of successful routing.
-Finally, Gavoille \cite{gavoille01routing} lists open problems in general
distributed systems
-(not only in Peer-to-Peer domain).
+Fiat et al. in \cite{fiat02censorship, saia02dynamicfaultcontentnetwork}
+and Datar in \cite{datar02butterflies} propose a tightly structured overlay
with secure query routing in the
+presence of hostile entities. However, none of these proposals address a
dynamic tightly structured
+overlay with fault tolerance against multiple rounds
+of hostile attacks. Also, above mentioned proposals are not very efficient. In
\cite{fiat02censorship}, each peer
+must maintain information of $O(\log^3{n})$ other peers, and in
\cite{datar02butterflies}, $O(\log^2{n})$ is required.
\subsection{Other security threats}
@@ -1027,7 +1031,8 @@
\subsection{Summary}
-In this subsection we list security problems in Peer-to-Peer systems in the
table.
+In this subsection we list security problems in Peer-to-Peer research. For
each table entry,
+there is a brief description of the problem, possible solutions and comments.
\scriptsize
@@ -1157,12 +1162,11 @@
\subsection{Efficient data lookup}
The most intensive research in Peer-to-Peer domain has been focused on
efficient data lookup methods,
-especially with the loosely structured approach. In iterative deepening
-\cite{yang02improvingsearch}, multiple BFS searches are initiated
-with successively larger TTL depth limits, until either the query is
satisfied,
-or the maximum depth $D$ has been reached.
+especially with the loosely structured approach.
-Expanding ring, proposed by Shenker et al. in \cite{lv02searchreplication},
+In iterative deepening \cite{yang02improvingsearch}, multiple BFS searches are
initiated
+with successively larger TTL depth limits, until either the query is
satisfied,
+or the maximum depth $D$ has been reached. Expanding ring, proposed by Shenker
et al. in \cite{lv02searchreplication},
is similar to the iterative deepening technique. In this method, a peer starts
a flood with small TTL, and
waits to see if the search is successful. If it is, then the peer stops the
data lookuo. Otherwise, the peer increases
the TTL and starts another data lookup. With these techniques, searches
@@ -1186,8 +1190,9 @@
has a poor response time but it doesn't generate as much network traffic as
the original BFS. As suggested in \cite{lv02searchreplication}, the
random walk approach can be made more effective by introducing
-multiple simultaneously working ''walkers''. Freenet \cite{clarke00freenet}
uses
-random walk searches in data lookups. Freenet's data lookup model resembles
+multiple simultaneously working ''walkers''.
+
+Freenet \cite{clarke00freenet} uses random walk searches in data lookups.
Freenet's data lookup model resembles
Depth-First-Search (DFS) and peers' routing tables are dynamically built
using caching. This is an outcome of Freenet's main design principles,
anonymity.
Another property of the Freenet's data lookup model is that
@@ -1211,19 +1216,17 @@
To make Peer-to-Peer systems even more popular (and usable), these systems
have to support flexible, efficient
and simple search methods \cite{li03feasibility}. Currently, loosely
structured systems are able to carry out the flexibility and
simplicity requirements and tightly structured systems are able to fulfill the
efficiency requirement.
-Research efforts have been focused to make security methods in tightly
structured systems more usable since
-the data lookup model of the loosely structured approach doesn't scale.
However, studies
-show that combining flexible search methods and tightly structured systems may
not be trivial \cite{harren02complex,
-ansaryefficientbroadcast03}. The main problem with tightly structured systems
is the
-fact that tightly structured algorithms perform data lookups based on a
globally unique identifier thereby
-making efficient keyword searches hard to implement.
+Research efforts have been focused to make security methods in tightly
structured systems more usable and flexible. However, studies
+show that combining flexible search methods and tightly structured systems may
not be trivial: tightly
+structured algorithms perform data lookups based on a globally unique
identifier thereby making efficient keyword searches hard to implement
+\cite{harren02complex, ansaryefficientbroadcast03}.
Some studies have been concentrated on SQL-like queries \cite{harren02complex}
in tightly structured overlays. It is unknown, however, if this approach is
realizable to implement, since
initial analysis have shown that this approach is rather complex. Other
approaches include adaption of the data lookup model of the loosely
-structured approach into tightly structured systems
\cite{ansaryefficientbroadcast03, chord:om_p-meng}.
-Work in \cite{ansaryefficientbroadcast03} seems quite promising. Authors' work
is based on insight that
-performing data lookup in the overlay resembles regular tree-like search,
where trees' data structure
+structured approach into tightly structured systems
\cite{ansaryefficientbroadcast03}.
+Authors' work is based on insight that
+performing a data lookup in the overlay resembles regular tree-like search,
where trees' data structure
is distributed throughout the overlay. Some studies suggest additional layer
upon overlay network \cite{kronfol02fasdsearch,
joseph02p2players}, which use metadata to implement search methods. The
feasibility of implementing additional
search layer on top of the network layer is questionable, especially if the
search layer and the network
@@ -1239,14 +1242,14 @@
decrease the result quality in order to make searching more efficient. Second,
Peer-to-Peer systems must
consult the properties of underlying network for better performance.
-Many techniques have been developed in order to provide more efficient search
indexing. As
+Additionally, many techniques have been developed in order to provide more
efficient search indexing. As
several studies show, the popularity of queries in the Internet follow
Zipf-like
distributions\footnote{Zipf-distribution is a variant of power-law function.
Zipf-distribution can be used in observation of frequency of occurrence event
$E$, as a function of the rank
$i$ when the rank is determined by the frequency of occurrence, is a power-law
function $E_i \sim \frac{1}{i^{a}}$,
where the exponent $a$ is close to unity.} (e.g.,
\cite{breslau98implications}).
Therefore, according to \cite{li03feasibility}, caching and pre-computation
can be done for optimizing search indices.
-Authors in \cite{li03feasibility} use Gap compression \cite{wittengigabytes},
Adaptive Set Intersection \cite{338634}
+Authors in \cite{li03feasibility} use gap compression \cite{wittengigabytes},
adaptive set intersections \cite{338634}
and clustering with their search optimizations. Regular compression
algorithms, Bloom filters \cite{362692}, vector
space models \cite{CuencaAcuna2002DSIWorkshop} and view trees
\cite{Bhattacharjee03resultcache} can be used for even
better optimizations.
@@ -1262,7 +1265,9 @@
structured system, peers join and leave the system constantly without any
restrictions \cite{saroiu02measurementstudyp2p}.
On the other hand, however, peers in tightly structured system join and leave
the system but have less freedom,
i.e., overlay chooses peer's neighbors on behalf of the peer itself and maps
data items randomly
-throughout the overlay network \cite{balakrishanarticle03lookupp2p}. Almost
all presented algorithms
+throughout the overlay network \cite{balakrishanarticle03lookupp2p}.
+
+Almost all presented algorithms
for the tightly structured systems have been analyzed under static simulation
environments \cite{libennowell01observations}. Furthermore, proposed tightly
structured overlays are configured statically to achieve
the desired reliability even in a uncommon and adverse environment
\cite{rowston03controlloingreliability}.
@@ -1274,9 +1279,7 @@
computing power or network bandwidth, all data items are distributed
uniformly. Clearly, this is
a serious problem of tightly structured overlays in face of performance and
load balancing \cite{rao03loadbalancing}.
Measurement study by Saroiu et al. show that there is a extreme heterogeneity
among participating peers in already deployed Peer-to-Peer
-systems \cite{saroiu02measurementstudyp2p}. Symphony \cite{gurmeet03symphony}
seems to be the first tightly structured overlay system
-which supports heterogeneity. Zhao et al. have proposed a secondary layer on
top of a structured overlay
-to support heterogeneity better \cite{zhao02brocade}.
+systems \cite{saroiu02measurementstudyp2p}.
Some research has been done with regard to load balancing properties of
tightly structured
overlays. Byers et al. suggest an idea of ''power of two choices'' whereby
data item is stored at the less loaded
@@ -1296,10 +1299,9 @@
Hot spots happen, when a specific key is being requested extremely often in
tightly structured overlays. Recent study
by Freedman et al. tries to reduce hot spots in the system by performing
\emph{sloppy} hashing
\cite{sloppy:iptps03}. Authors' technique is especially suitable for the DOLR
abstraction of tightly structured overlays.
-They arque that with Sloppy hashing, the generation of query hot spots can be
reduces and peers are able
+They arque that with Sloppy hashing, the generation of query hot spots can be
reduced and peers are able
locate nearby data without looking up data from distant peers. Moreover,
authors'
-proposal for self-organizing clusters using network diameters may be useful,
-especially within small groups of working people.
+proposal for self-organizing clusters using network diameters.
The concept of ''half-life'' was introduced by Liben-Nowell
\cite{libennowell01observations} since Peer-to-Peer
system is \emph{never} in the ''ideal'' state as Peer-to-Peer system is
continiously evolving system. Half-life is defined
@@ -1319,7 +1321,8 @@
\subsection{Summary}
-In this subsection we list performance and usability problems in Peer-to-Peer
systems in the table.
+In this subsection we list performance and usability problems in Peer-to-Peer
research. For each table entry,
+there is a brief description of the problem, possible solutions and comments.
\scriptsize
@@ -1466,7 +1469,7 @@
Recently, there have been few proposals towards common programming guidelines.
Authors in
\cite{zhao03api} propose a higher level abstracions for tightly structured
overlays. Frise et al. suggest the use of
additional layer in Peer-to-Peer system to hide the structure of the overlay
\cite{frise02p2pframework}.
-Thus, with their method both the tightly structured and tightly structured
approach can be used in the system.
+With their abstraction, both the tightly structured and tightly structured
approach can be used in the system.
Montresor proposes a framework supporting developers and researchers in the
design of Peer-to-Peer system
\cite{babaoglu02anthill}.
@@ -1477,17 +1480,17 @@
Frequent assumption in Peer-to-Peer systems is that peers are willing to
cooperate (e.g., \cite{shneidman03rationality}).
Another belief is that all peers would behave equally, i.e., all peers both
consume and contribute services \cite{saroiu02measurementstudyp2p}.
-However, these assumptions are not true as several studies show
\cite{saroiu02measurementstudyp2p,
-oram01harnessingpower, hearn02mojonation}. Peers rather consume than
contribute and peers are
-unwilling to cooperate.
+However, these assumptions are not true as several publications show: peers
rather consume than contribute and are
+unwilling to cooperate \cite{saroiu02measurementstudyp2p,
oram01harnessingpower, hearn02mojonation}.
Somewhat surprisingly little research has been done in this area, especially
when considering
the possible impact of \emph{unwanted social behavior} to performance of a
Peer-to-Peer
system. The problem is addressed by Golle et al. \cite{golle01incentivesp2p},
Ngan et al.
\cite{ngan03enforcefile} and Shneidman et al. \cite{shneidman03rationality}.
Some research has been focused on semantic properties of the overlay in order
to increase
-cooperation among participating peers \cite{crespo02semanticoverlay}.
Ramanathan et al.
-\cite{ramanathan02goodpeers} and Bernstein et al. \cite{bernstein03selection}
use
+co-operation among participating peers \cite{crespo02semanticoverlay}, i.e.,
peers'
+neighbor connections are influenced by the content of data (e.g. music or
movies).
+Ramanathan et al. \cite{ramanathan02goodpeers} and Bernstein et al.
\cite{bernstein03selection} use
empirical metrics and decision trees when teaching peers to make better
decisions
when contacting other peers in Peer-to-Peer system. Alpine \cite{alpineurl} is
an example of
Peer-to-Peer system, which uses empirical metrics for a peer selection.
@@ -1499,17 +1502,16 @@
is due to complex nature of Peer-to-Peer system, which makes comprehensive
simulations very
difficult \cite{bhagwan03availability}. Floyd et al. have been studying the
simulation of the Internet in \cite{504642}. Authors
state that simulating the Internet is very challenging task, because of its
heterogeneity
-and rapid change. Obviously, these factors exist also in Peer-to-Peer systems
even with higher
+and rapid change. These factors exist also in Peer-to-Peer systems even with
higher
rates.
As long as comprehensive simulations of a Peer-to-Peer systems are lacking, we
cannot make any detailed
-analysis on general properties of a Peer-to-Peer system, such as usage
patterns. However, we can assume
-that, e.g., query keywords follow the Zipf-like distribution
\cite{breslau98implications} both in the
-Internet and in Peer-to-Peer systems.
+analysis on general properties of a Peer-to-Peer system, such as usage
patterns of participating peers.
\subsection{Summary}
-In this subsection we list security problems in Peer-to-Peer systems in the
table.
+In this subsection we list miscellaneous problems in Peer-to-Peer research.
For each table entry,
+there is a brief description of the problem, possible solutions and comments.
\scriptsize
\begin{longtable}{|l|l|l|l|}
@@ -1599,14 +1601,15 @@
\section{Overview}
The Fenfire project \cite{fenfireurl} is an effort to build a location
transparent, hyperstructured desktop
-environment. By location transparent, we mean concealing the heterogeneous and
distributed nature of the system
+environment. By location transparent, we mean hiding the heterogeneous and
distributed nature of the system
so that it appears to the end user like one system and by hyperstructured
system
a system in which data can be associated with other data arbitrarly. Fenfire
uses xanalogical storage model
\cite{ted-xu-model} as a basis for hyperstructured media. Each data item in
the Fenfire system has a globally unique
identifier. This property should allow making references between \emph{any}
data easier and more seamlessly interoperating than in other systems. For
location transparency in the Fenfire system,
-we are currently analysing the applicably of Peer-to-Peer infrastructure for
locating and fetching blocks in a
-distributed environment. Fenfire was formerly also a partial implementation
+we are currently analysing the applicability of Peer-to-Peer infrastructure.
+
+Fenfire was formerly also a partial implementation
of the ZigZag\texttrademark -- structure, which has been originally invented
by Ted Nelson. Now, however, Fenfire uses Resource Description Framework (RDF)
\cite{w3rdfurl}
for representing internal data structures and their relationships.
@@ -1622,42 +1625,42 @@
\item \textbf{LibVob}: a graphic library used for creating navigation
interfaces in complex data views.
\end{itemize}
-In this thesis, we focus on Storm and Alph modules as they are a foundation
for Peer-to-Peer functionality
+In this thesis, we focus on Storm and Alph modules as they are the foundation
for Peer-to-Peer functionality
in the Fenfire system.
\section{Xanalogical storage model}
Xanalogical storage model \cite{nelson99xanalogicalneeded} is a different kind
of model for
presenting data and relationships between data, e.g., while in the World Wide
Web links are
-between \emph{documents}, in xanalogical storage model links are between
individual
-\emph{characters}. \emph{Enfilade},can be considered as a mutable ''virtual
file'' (or part of one), which is a list
-of fluid media content. Fluid media is the smallest units of data in
xanalogical storage
-model. \emph{Transclusion} is an inclusion in
+between documents, in the xanalogical storage model links are between
individual
+characters\footnote{Xanalogical storage model
+is not limited to text. It can support arbitrary data, e.g., pixels of picture
or
+frames of video.}. \emph{Enfilade} is a mutable ''virtual file'' (or part of
one), which is a list
+of fluid media content. Fluid media is the smallest units of data in the
xanalogical storage
+model (e.g., a character). \emph{Transclusion} is an inclusion in
enfilade of contents already used in another enfilade. With the transclusion,
a system
-implementing xanalogical storage model is able to show \emph{all} data content
that share the same
+implementing the xanalogical storage model is able to show \emph{all} data
content that share the same
fluid media with current data content (e.g., all documents in a system
containing document's text).
Figure \ref{fig:xanalogical_model}
-illustrates xanalogical storage model with documents, text and characters.
+illustrates the xanalogical storage model with documents, text and characters.
-In xanalogical storage model, links between data are external
+In the xanalogical storage model, links between data are external
and bidirectional. A link is shown between any two data contents
-containing a specific \emph{fluid media unit} (e.g., a character) that the
link connects.
-Each fluid media unit in xanalogical storage model has a
-permanent, globally unique identifier\footnote{Xanalogical storage model
-is not limited to text. It can support arbitrary data, e.g., pixels of picture
or
-frames of video.}. For instance, let's consider the following
+containing a specific fluid media unit that the link connects.
+Each fluid media unit in the xanalogical storage model has a
+permanent, globally unique identifier. For instance, let's consider the
following
example, presented first time in \cite{lukka02freenetguids}: ''the character
'D'
typed by Janne Kujala on 10/8/97 8:37:18''. When character
-'D' is first typed in, xanalogical storage model
+'D' is first typed in, the xanalogical storage model
creates a permanent globally identifier for that character
-and retains it when the character is copied to different document. In
practice, xanalogical
+and retains it when the character is copied to different document. In
practice, the xanalogical
storage model uses \emph{spans}, ranges of consecutive
fluid media units to perform storage operations.
\begin{figure}
\centering
\includegraphics[width=14cm, height=12cm]{xanadu_model.eps}
-\caption{Xanalogical storage model.}
+\caption{Xanalogical storage model with documents, text and characters.}
\label{fig:xanalogical_model}
\end{figure}
@@ -1665,19 +1668,19 @@
\section{Storm}
In this section, we will give a brief overview of Storm design. More
information can be found
-from recent publications: for general discussion about Fenfire in Peer-to-Peer
environment,
+from recent publications. For general discussion about Fenfire in Peer-to-Peer
environment,
see \cite{lukka02freenetguids}, and for detailed Storm design, see
\cite{fallenstein03storm}.
-Storm (for \emph{STORage Module}) stores all data as \emph{blocks}, which
+Storm (for \emph{STORage Module}) stores all data as data \emph{blocks}, which
are immutable byte sequences. Storm \emph{assigns} a globally unique
identifier to each
data block\footnote{This resembles the process how tightly structured overlay
assigns a subset of keys
to each participating peer: a user has no control over the assignment
process.}. SHA-1 cryptographic
content hash function\footnote{SHA-1 is
considered as a collision free hash function. Therefore, it is very unlikely
that two different Storm data blocks
would have same identifier.} \cite{fips-sha-1} is used
-for creating semantic-free, globally unique identifiers for blocks. Because of
SHA-1
+for creating unstructured and semantic-free, globally unique identifiers for
blocks. Because of SHA-1
content hash, all identifiers are directly the data verifiers as well. The
uniquess of blocks creates
-a basis for implementing xanalogical storage model in the Fenfire system.
Storm blocks have much in common with regular files as they
+a basis for implementing the xanalogical storage model in the Fenfire system.
Storm blocks have in common with regular files as they
both contain the data. The main difference is that Storm blocks are
\emph{immutable} since any
change to the byte sequence would change block's hash value (i.e., globally
unique identifier).
@@ -1690,28 +1693,28 @@
\emph{Pointer} \cite{benja02urn5, fallenstein03storm} is a semantic-free
updatable reference to
Storm data block. Pointer is a unique reference to the data and it is usually
represented as a random string. Storm pointers are rather a \emph{concept} of
data (e.g., ''The front page of the most recent
-version of New York Times newspaper'') whereas scroll blocks \emph{contain}
the data
+version of New York Times newspaper'') whereas blocks \emph{contain} the data
(''New York Times newspaper, 10.10.2002, version 1.0'').
Figure \ref{fig:storm_model} illustrates Storm storage model with pointers.
Each pointer is \emph{linked} to a collection of \emph{pointer blocks}.
-Pointers can be created by a user, before the creation of scroll blocks.
Pointer blocks
-are created automatically by Storm when a scroll block is created and
associated with a pointer
-(e.g., a user creates a scroll block associated with the concept ''The front
page of the most recent
-version of New York Times newspaper''). Pointer block has always a single
target (i.e., a scroll block)
-for the pointer, saying that pointer $P$ targets block $B$. In addition to
this, pointer block
+Pointers can be created by a user, before the creation of a data block.
Pointer blocks
+are created automatically by Storm when a data block is created and associated
with a pointer
+(e.g., a user creates a data block associated with the concept ''The front
page of the most recent
+version of New York Times newspaper''). Pointer block has always a single
target (i.e., a data block),
+saying that pointer $P$ targets data block $B$. In addition to this, pointer
block
may contain a list of zero or more obsoleted pointer blocks: when a new
version of pointer
block is created, it supersedes one older version which has been created in
the past using the Storm indexing
mechanisms. For details, see \cite{fallenstein03storm}. Next
-time, when the pointer is used for referring to a specific scroll block, only
+time, when the pointer is used for referring to a specific data block, only
the most recent pointer's block target is loaded. However, the pointer blocks
pointing
-to the previous versions of scroll blocks remains accessible, if needed. In
figure \ref{fig:storm_pointercreation},
+to the previous versions of data blocks remains accessible, if needed. In
figure \ref{fig:storm_pointercreation},
we show the overall pointer creation process.
\begin{figure}
\centering
\includegraphics[width=10cm, height=10cm]{storm_uml.eps}
-\caption{Implementation of xanalogical storage model on Storm. Storm storage
model is based on
+\caption{Implementation of the xanalogical storage model on Storm. Storm
storage model is based on
fluid media units, i.e., fluid media units are smallest units of data.
Currently, Storm provides a support
for textual fluid media units (characters) only, but a support for arbitrary
data (e.g., video or music) is
planned in future versions of Storm.}
@@ -1741,7 +1744,7 @@
Some research regarding to Peer-to-Peer technologies and hypermedia systems
have been made by Lukka et al.
\cite{lukka02freenetguids}. Authors' work is mainly based on the insight of
implementing
-xanalogical storage model in Peer-to-Peer environment with globally unique
identifiers. Lukka et al.
+the xanalogical storage model in Peer-to-Peer environment with globally unique
identifiers. Lukka et al.
use Freenet \cite{clarke00freenet} as an example Peer-to-Peer system
supporting
globally unique identifiers. The work presented in this thesis extends their
work by
evaluating different Peer-to-Peer systems more extensively to Fenfire's needs.
@@ -1758,11 +1761,10 @@
First, as discussed in chapter 4, xanalogical document is a ''virtual
file'', in which parts of the document are fetched from a
\emph{global} data repository\footnote{Global repository is not a requirement.
Locally constructed xanalogical
-documents are feasible and they can be assembled without global data
repository.}. Thus, system implementing xanalogical storage model \emph{must}
-support global data lookups order to assemble the ''virtual file'' from
fragments of data.
-Specifically, our task is to locate and fetch (i.e., obtain) \emph{all} Storm
blocks\footnote{These blocks are called \emph{scroll} blocks.},
-associated to a specific ''virtual file'' from the Peer-to-Peer once the
construction of ''virtual'' file
-is resolved (i.e., we know what blocks are required to assemble the ''virtual
file''). Also, in addition to the
+documents are feasible and they can be assembled without global data
repository.}. System implementing the xanalogical storage model \emph{must}
+support global scale data lookups, i.e., if a data item exists in the system
it can be located and fetched.
+Specifically, our task is to locate and fetch (i.e., obtain) all Storm
blocks\footnote{We call these blocks as \emph{scroll} blocks.},
+associated to a specific ''virtual file'' from the Peer-to-Peer network. Also,
in addition to the
\emph{direct} block obtaining using globally unique identifier of Storm block,
we also must support the \emph{indirect} obtaining of Storm block using the
pointer mechanism.
Second, we want that users' operations in Fenfire
@@ -1779,9 +1781,9 @@
respond to fetching of Storm blocks as fetching can be performed easily once
Storm block is located.
-In chapter 2, we discussed main differences between the loosely and the
tightly structured
+In chapter 2, we discussed main the differences between the loosely and the
tightly structured
approach. As stated, the most significant difference is that the tightly
structured
-approach has logarithmical properties in all internal operations, while the
loosely
+approach has at least poly-logarithmical properties in all internal
operations, while the loosely
structured approach doesn't always have even linear properties. Furthermore,
the
data lookup model of the tightly structured overlay scales much better than in
loosely
structured overlays; the tightly structured overlay supports global data
lookups
@@ -1792,27 +1794,24 @@
Since both Storm and tightly structured overlays use globally unique
identifiers for each data item,
it is feasible to use tightly structured overlays for \emph{locating} Storm
blocks efficiently.
-Another key feature of tightly structured overlays is that they are able
-to provide general purpose \emph{interface} for Reference Resolution Services
(RRS)
- \cite{balakrishnan03semanticfree}. Authors argue that next generation RRS
must be
-application-independent and references itself should be \emph{unstructured}
and
-\emph{semantically free}. Thus, we see the tightly structured approach as the
best alternative to
-locate data in Peer-to-Peer environment.
+Additionally, the unstructured and semantic-free properties of Storm
identifiers enables
+the use of general purpose Reference Resolution Services (RRS)
\cite{balakrishnan03semanticfree} on
+top of the tightly structured overlay.
Again, there are research challenges with tightly structured systems which
have to be
addressed, as described in chapter 3. The main concerns include decreased
performance and fault
tolerance in presence of system flux, non-optimal distance functions in
identifier space,
proximity routing, hostile entities and flexible search
\cite{balakrishanarticle03lookupp2p}.
-Additionally, there is only little real world experiments yet with tightly
structured systems
+Additionally, there is only little real world experiments with tightly
structured systems
(e.g., \cite{overneturl, edonkey2kurl}). Therefore, we can't say for sure, how
well these
-systems would perform in real Peer-to-Peer environment. However, we believe
that these issues are
-solved in near future, since there is a strong and wide research community
towards to the tightly structured
+systems would perform in real Peer-to-Peer environment. However, we believe
that these issues will be
+solved in the near future, since there is a strong and wide research community
towards tightly structured
overlays \cite{projectirisurl}.
\section{Fenfire system model in Peer-to-Peer environment}
-In this section we give a proposal for Fenfire Peer-to-Peer system, which
consists
+In this section we give a proposal for the Fenfire Peer-to-Peer system, which
consists
of several technologies reviewed in this thesis. Then, we introduce methods
for
obtaining Fenfire data from a Peer-to-Peer network.
@@ -1821,7 +1820,9 @@
We emphasize that we prefer \emph{abstraction}
level analysis as very recently better and better tightly structured
algorithms have been proposed.
Thus, we don't want to bind our system proposal to a specific algorithm
definitively as we expect
-that this development continues. Currently, we see Kademlia
\cite{maymounkov02kademlia} as the best algorithm for
+that this development continues.
+
+Currently, we see Kademlia \cite{maymounkov02kademlia} as the best algorithm
for
locating data efficiently in the Peer-to-Peer overlay. There are two
reasons for this. First, Kademlia's XOR-based distance function is superior
over distance functions of other systems (see section 2.3.2). Secondly,
Kademlia
@@ -1829,24 +1830,24 @@
(e.g., \cite{overneturl, edonkey2kurl, kashmirurl,kato02gisp}), which means
that
Kademlia's algorithm is simple and easy to implement.
-On top of Kademlia, we propose the usage of Sloppy hashing
\cite{sloppy:iptps03} which
+On top of Kademlia, we propose the use of Sloppy hashing \cite{sloppy:iptps03}
which
is optimized for the DOLR abstraction of tightly structured overlays. With
Sloppy hashing
-we can provide locality properties for the Fenfire system.
+we can provide locality properties for the Fenfire system which may be useful
+within a small group of working people.
For better fault tolerance and self-monitoring for Fenfire, we propose
techniques
presented by Rowston et al. \cite{rowston03controlloingreliability}. With
these
techniques, we can ensure the performance of the Fenfire system in a highly
adverse conditions, such
as sudden network partition, or highly dynamic and heterogeneous environment.
-Finally, for more efficient data transfer, we can use variable techniques for
this purpose.
+Additionally, for more efficient data transfer, we can use variable techniques
for this purpose.
For small amounts of data, HTTP can be used \cite{rfc2068}. For large amounts
of data, we can use
multisource downloads for better efficiency and reliability. Specifically, the
technology based
on rateless erasure codes \cite{maymounkov03ratelesscodes} seems very
promising.
Furthermore, multisource downloads can be used for decreasing load of a
certain peer, thus avoiding query
hot spots in the system \cite{ratnasamy02routing}. Current client-server
implementation of Fenfire uses
standard single source downloads (HTTP) and SHA-1 \cite{fips-sha-1}
cryptographic content
-hash for verifying the integrity of data by recomputing the content hash
-for Storm block. In face of multisource downloads, Fenfire must support
+hash for verifying the integrity of data. In face of multisource downloads,
Fenfire must support
tree-based hashes\footnote{With multisource downloads, tree-based hash
functions can be used
to verify fixed length segments of data. If hash value of data segment is
incorrect,
we need only to fetch \emph{segment} of data (instead of whole data) from
@@ -1855,17 +1856,17 @@
\subsection{Methods}
-We use the DOLR abstraction of the tightly structured approach since DOLR
systems locate data without
+In our data lookup methods, we use the DOLR abstraction of the tightly
structured approach since DOLR systems locate data without
specifying a storage policy explicitly \cite{rhea03benchmarks}, i.e., each
participating peer hosts
-the data they are offering and the overlay maintains only the \emph{pointers}
to the data.
-DHT-based storage systems, such as CFS \cite{dabek01widearea} and PAST
\cite{rowstron01storage}, may have
+the data they are offering and the overlay maintains the \emph{pointers} to
the data.
+Storage systems based on the DHT abstraction, such as CFS
\cite{dabek01widearea} and PAST \cite{rowstron01storage}, may have
severe problems with load balancing in a highly heterogeneous environment
\cite{rao03loadbalancing}. The problem is caused by peers
which may not be able to store relatively large blocks, assigned randomly by
the mapping function of the overlay.
For simplicity, we assume that we have resolved the construction of the
''virtual file'' before locating any Storm blocks, i.e.,
when assembling the ''virtual file'' we know all the Storm blocks, which are
required to complete the ''virtual file''.
Also, we don't respond to the security issues related to Peer-to-Peer systems,
since there is no working solution
-available yet. Thus, we either assume that Fenfire has a reliable technique
for identifying individual entities, or
+available yet. We either assume that Fenfire has a reliable technique for
identifying individual entities, or
there are no hostile entities among participating peers, i.e., Storm blocks
can be identified correctly (e.g., when
performing searches). In the next subsection, we discuss security problems in
more detail.
@@ -1919,19 +1920,19 @@
Perhaps the most biggest issue in Peer-to-Peer systems is the non-maturity of
security technologies. For the Fenfire system, one security related problem
occurs when a user wants to
perform a global data lookup with a given pointer; how the user is able to
verify
-the correctness of the search results, i.e., how she or he knows which one is
the
+the correctness of the search results, e.g., how she or he knows which one is
the
correct Storm block ? Another problem related to the Fenfire's
security is that if a user downloads data from the network to local computer
and after a network disconnection, user wants to verify \emph{off line} the
authenticity of data. Finally, if a data lookup is performed by a user, but
there is no reply
from the Fenfire system, how are we able to know if this was the Spam attack
\cite{naor03simpledht},
or the data really doesn't exist in the system ?
-However, these problems are not only limited to the Fenfire system as it
+These problems, however, are not only limited to the Fenfire system as it
concerns all Peer-to-Peer computer systems.
-Obviously, optimal solutions to all security issues would be that digital
+Optimal solutions to all security issues would be that digital
signatures are included to every message which are sent to the system or the
use of working PKI-based
-certificate distribution. As security technologies come more mature, we wish
to apply these
+certificate distribution. As security technologies become more mature, we wish
to apply these
technologies with Fenfire, if applicable.
\chapter{Conclusions and future work}
@@ -1943,21 +1944,20 @@
problems into the three sub-categories: security related problems,
performance related problems and miscellaneous problems.
-Then we gave a brief overview of the Fenfire system and xanalogical storage
model. We also
+Then we gave a brief overview of the Fenfire system and the xanalogical
storage model. We also
described Storm software module.
In the last chapter, we evaluated existing Peer-to-Peer approaches with regard
-to Fenfire's needs. We see that the tightly structured approach is the
-best alternative to Fenfire's needs for the following reasons. First, our
Storm design uses \emph{semantic-free references}
-(block identifiers and pointers) for locating data in distributed
-networks. As the authors of \cite{balakrishnan03semanticfree},
-we also agree that references should be semantically-free in next-generation
-reference resolution services. Second, by using
-the DOLR abstraction of tightly structured overlay and Sloppy hashing , we can
-minimize the lack of locality in the tightly structured approach, i.e., we are
able to locate nearby
-data without looking up data from distant peers. Third, we believe that issues
-related to tightly structured overlays are solved in the near future, because
of
-wide and intensive co-operation among research groups.
+to Fenfire's needs. Currently, we see that the tightly structured approach as
the
+best alternative to Fenfire's needs for the following reasons.
+First, both Storm and tightly structured overlays use globally unique
identifiers for each data item. Therefore,
+it is feasible to use tightly structured overlays for \emph{locating} Storm
blocks efficiently.
+Second, the unstructured and semantic-free properties of Storm identifiers
enables
+the use of general purpose Reference Resolution Services (RRS)
\cite{balakrishnan03semanticfree} on
+top of the tightly structured overlay. As the authors of
\cite{balakrishnan03semanticfree},
+we also agree that references should be semantically-free in next-generation
RRS systems.
+Third, we believe that issues related to tightly structured overlays will be
solved in
+the near future, because of wide and intensive co-operation among research
groups.
Our future work includes a support for searching transclusions and xanalogical
links in Peer-to-Peer environment. Preliminary analysis have shown
@@ -1966,8 +1966,6 @@
database systems may prove to be useful. Some fundamental results
regarding Peer-to-Peer and database systems have already been
presented in \cite{gribble01p2pdatabase}.
-
-We will implement a Fenfire Peer-to-Peer prototype in the near future.
\bibliographystyle{gradu}
\bibliography{progradu}
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., (continued)
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/25
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/25
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/25
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/25
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/25
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/26
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/26
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/26
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/26
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/26
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...,
Hermanni Hyytiälä <=
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/27
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/27
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/27