[Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc...
From: Hermanni Hyytiälä
Subject: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc...
Date: Tue, 17 Dec 2002 05:36:21 -0500
CVSROOT: /cvsroot/gzz
Module name: gzz
Changes by: Hermanni Hyytiälä <address@hidden> 02/12/17 05:36:21
Modified files:
Documentation/misc/hemppah-progradu: research_problems
Log message:
Some initial answers to research problems
CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/research_problems.diff?tr1=1.4&tr2=1.5&r1=text&r2=text
Patches:
Index: gzz/Documentation/misc/hemppah-progradu/research_problems
diff -u gzz/Documentation/misc/hemppah-progradu/research_problems:1.4
gzz/Documentation/misc/hemppah-progradu/research_problems:1.5
--- gzz/Documentation/misc/hemppah-progradu/research_problems:1.4 Tue Dec
17 03:27:50 2002
+++ gzz/Documentation/misc/hemppah-progradu/research_problems Tue Dec 17
05:36:21 2002
@@ -47,8 +47,7 @@
+keyword/fuzzy search possible
-not scalable
-huge network traffic
--not fast routing (in fact, there is no upper limit, because there are
multiple simultaneous
-breadth-first searches in the network)
+-not fast routing
-no guarantee that all data will be located
-Example systems: Gnutella, FastTrack family (Kazaa, Morpheus), JXTA Search,
Gnutella2
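The flooding behaviour criticized above (no upper bound on routing, heavy network traffic) can be illustrated with a minimal sketch. All names and the graph layout here are hypothetical, not part of any of the cited systems:

```python
from collections import deque

def flood_search(adjacency, data, start, target_id, ttl=7):
    """TTL-limited breadth-first flooding (Gnutella-style).

    adjacency: node -> list of neighbor nodes
    data:      node -> set of block IDs the node hosts
    Returns (list of nodes holding target_id, number of query messages sent).
    Note how the message count tracks the whole reachable network rather
    than the distance to the data -- the scalability problem noted above.
    """
    hits = []
    seen = {start}
    queue = deque([(start, ttl)])
    messages = 0
    while queue:
        node, t = queue.popleft()
        if target_id in data.get(node, set()):
            hits.append(node)
        if t == 0:
            continue  # TTL exhausted: do not forward the query further
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                messages += 1
                queue.append((neighbor, t - 1))
    return hits, messages
```

Even with a TTL, every reachable peer within the horizon is queried, which is why there is no guarantee that all data will be located (data beyond the TTL horizon is invisible).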
@@ -82,12 +81,69 @@
Kademlia: O(log n)* O(log n) O(log n)
Viceroy: O(log n) O(1) O(log n)
Small Worlds: O(1) O(1) O(log^2 n)
-Flooding: N/A O(1) "high"
+Flooding: N/A** N/A** N/A**
Hybrid: N/A N/A N/A
-Social: N/A O(1) N/A
+Social: N/A*** N/A*** N/A***
* = In Kademlia, there is no action required when a node leaves the system
+** = Please see http://www.darkridge.com/~jpr5/doc/gnutella.html for details
+
+*** = From the p2p-hackers mailing list:
+- - -
+
+> Btw, how well can Alpine scale (e.g. number of users)? Do you have any
+> "real-life" experiences from Alpine (how does the social-connection
+> paradigm really work, etc.)? What about search performance in Alpine
+> (any big O's)? Security (PKIs)?
+
+Scalability is the big question, as it is hard to gauge real world
+scalability without an actual real world network :-)
+
+I have done some preliminary scalability testing on the OSDL systems
+( http://www.osdl.org/ ) using a pair of 4-way SMP systems on a gigabit
+network link. I was able to establish and communicate with over 4 million
+concurrent DTCP/Alpine connections between them, using about a gigabyte of
+RAM.
+
+On lower end hardware and bandwidth (5 - 10M of memory, DSL/cable) peer
+groups would be much smaller, 10,000 to 25,000 concurrent.
+
+The main feature of Alpine that allows it to scale with less effort is the
+fact that all communication is direct, using a lightweight UDP based
+transport. The number of connections is limited only by the memory and
+bandwidth you have available to use.
+
+Search performance is also difficult to gauge because of the social
+discovery mechanism employed. Each peer is continually tuning peer groups
+to increase the use of high quality peers with similar interests and
+removing peers that do not contribute or share interests.
+
+When you first join the network your query effectiveness is going to be
+low. As you use the network, query effectiveness increases given the
+feedback received from previous queries and how they affect the composition
+of the peer groups you use for resource discovery.
+
+This makes it nice for people who share interests in unpopular or obscure
+resources; they can eventually obtain a fast, effective peer group for
+finding these resources. This is in direct contrast to most other search
+mechanisms where obscure or unpopular resources are always more difficult
+to locate.
+
+I am focusing on usability for the next and future devel snapshots of
+Alpine, so hopefully some real world use on a wider scale will allow
+me to answer your questions with much more detail in the future.
+Right now all I can provide you is rough guesstimates and intuitions.
+
+Last, regarding security, I would like to integrate PGP/GPG style
+asymmetric ciphers and digital signatures; however, this is a low
+priority. I am also considering an implementation of peer frogs
+( http://cubicmetercrystal.com/peerfrog/ ) for avoiding man-in-the-middle
+attacks in large decentralized peer networks (although this is geared
+more for large wireless peer networks).
+
+- - -
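To put the O(log n) figures in the table above into perspective, a back-of-the-envelope estimate (base-2 logarithm, constant factors ignored; the function name is illustrative):

```python
import math

def expected_hops(n):
    """Rough hop estimate for O(log n) routing in an n-node overlay
    (base-2 logarithm, constant factors ignored)."""
    return math.ceil(math.log2(n))

# Roughly 10, 20 and 30 hops for thousands, millions and billions of nodes.
for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"n={n:>13,}: ~{expected_hops(n)} hops")
```

This is why the logarithmic DHT/SWN bounds matter: the lookup cost grows only additively as the network grows by orders of magnitude, in contrast to flooding, whose traffic grows with the network itself.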
+
Insert/Delete:
Number of messages when a node joins or leaves the network.
@@ -137,19 +193,30 @@
-obviously a lot of extra traffic arises in the network
-approach-specific pseudocode: "ask each node in turn whether it has the
block whose ID value is X"
-d) HS
-(-text missing)
-
-e) SDS
-(-text missing)
2.1.4. Open questions
(-This case is quite straightforward, since it closely resembles the ordinary
"locate file and get it")
--Really, how much better are the second generation FBN systems (Gnutella2)
compared to the first generation FBN systems (Gnutella)
+-really, how much better are the second generation FBN systems (Gnutella2)
compared to the first generation FBN systems (Gnutella)
-efficiency ?
-network traffic ?
-will the data be located, if it exists in the network ?
-scalability
+-when searching, do we have to know the exact hash of the block ID, or are
there other alternatives ?
+ -metadata ?
+
+2.1.5. Answers to research problem
+-the most efficient algorithm: O(log n)
+ -DHTs require O(log n) hops and log n neighbors, except Viceroy (however,
robustness suffers)
+ -SWNs require O(log^2 n) hops with far fewer neighbors (a constant, not
dependent on n)
+-in DHTs and SWNs, it is possible to locate data in constant time, BUT it
requires knowing all n neighbors!!!
+-in the basic scenario, DHTs and SWNs assume that we have to know the
value's key beforehand
+-in this case, DHTs and SWNs require little bandwidth, because "the network
knows where the block resides"
+-in this case, FBNs, Hybrids and Social Discovery require much bandwidth,
since there is no benefit from the block's ID at all (they have to keep asking)
+-SWNs require less memory than DHTs, since there are fewer connections to
other nodes
+-SWNs rely less on neighbor nodes than DHTs
+-proposal: the most efficient approaches for this research problem are DHTs
and SWNs, since they are based on key-value pairs (Storm has a key as block ID)
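The key-based routing that the proposal above relies on ("the network knows where the block resides") can be sketched as a toy, single-hop DHT using Kademlia-style XOR closeness. All names here are illustrative; a real DHT routes over multiple hops with only partial knowledge of the network:

```python
import hashlib

def node_for_key(key, node_ids):
    """Pick the node 'closest' to the key by XOR distance
    (Kademlia-style closeness metric)."""
    kid = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return min(node_ids, key=lambda n: n ^ kid)

class ToyDHT:
    """Single-hop toy DHT: here every peer knows every node ID, so put
    and get go straight to the responsible node.  A real DHT instead
    routes in O(log n) hops over a partial routing table."""
    def __init__(self, node_ids):
        self.store = {n: {} for n in node_ids}

    def put(self, key, value):
        # Same closeness computation on store and lookup guarantees that
        # both sides agree on which node is responsible for the key.
        self.store[node_for_key(key, self.store)][key] = value

    def get(self, key):
        return self.store[node_for_key(key, self.store)].get(key)
```

Because a Storm block ID is already a key, this is exactly the situation DHTs and SWNs are designed for: no flooding, no repeated asking.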
+
+
2.2. "Searching for most recent Storm block associated with specific urn-5
name, where
the block has been signed with a given key"
@@ -180,12 +247,31 @@
-one fore block IDs
-one for urn-5 names, which are associated with block IDs
-is this approach too difficult to maintain ?
-
-
+
-is there a possibility, for a specific urn-5 name, to maintain information
about the most recent block's ID for better search performance (or moreover,
-tree based structure for all blocks for specific urn-5 name) ?
+tree/list based structure for all blocks for specific urn-5 name) ?
-how should CAs' data be saved ?
+2.2.5. Answers to research problem
+-compared to the previous research problem, the benefits of DHTs and SWNs
are smaller here
+ -we don't know which block is the most recent one associated with a
specific urn-5 name
+-in theory, though, the most efficient algorithm is O(log n), *assuming* that
we already know the most recent block's ID (see the previous research problem)
+-currently, there is no "out-of-the-box" answer to this research problem
+-the question is how we can efficiently associate urn-5 names with the most
recent block / with all the blocks associated with it
+ For DHTs and SWNs:
+ -adaptation of Benja's 1st idea: each node maintains a local hash table
(urn-5 name -> most recent local block ID) for every urn-5 name
+ which the node hosts --> we don't have to check all blocks and their urn-5
associations
+ -adaptation of Benja's 2nd idea: all urn-5 name mappings are stored as
<key, value[ ]>, where the key is the urn-5 name's hash and the value is a
record containing
+ the block ID and the timestamp of that block. So, when we store a block
in our system for the first time, we have to create a new key-value pair:
+ <hash_of_urn_5_name, [block id, block timestamp]> and route this
mapping to the node which is "closest" to the hash value. Now, when we want
to find the most
+ recent block associated with a specific urn-5 name, we do:
+ 1. locally compute a hash for the given urn-5 string
+ 2. route the query to the node which hosts the given hash of the
urn-5 name (the "closest" node)
+ 3. get the most recent block's ID using the idea from the
previous idea
+
+ In this approach, we don't have to perform additional searching and
sorting of mappings. And of course, we know that for a given urn-5 name, only
one node
+ hosts *all* the block information ("block history") for that urn-5,
since mappings are mapped to a single node, the one closest to the urn-5 hash
value.
+ Again, this should work fine under existing DHTs (and SWNs ?).
2.3. "Searching for Storm blocks associated with specific urn-5 name, where
specific date has been defined (or date range), and where Storm block