[GNUnet-developers] DistribNet and GNUNet


From: Kevin Atkinson
Subject: [GNUnet-developers] DistribNet and GNUNet
Date: Tue, 23 Apr 2002 07:40:45 -0400 (EDT)

I am starting a new distributed network, DistribNet, which has aims
similar to those of Freenet and GNUNet, except that my main focus is
speed and stability rather than anonymity.  Compared to Freenet, I like
your network a lot better, if for no other reason than that it is not
written in goddamn Java.  Sorry about that, it is just that I hate Java
in just about every way possible.  Seriously though, I like most things
about your project except for your extreme approach of using UDP packets
for *everything*, tiny 1K block sizes, and using the filesystem to store
these tiny blocks.  I plan to use a mixture of UDP and TCP packets and
32K block sizes for splitting up files.

Below is an outline of DistribNet.  Let me know what you think.

Perhaps we can work together provided our design goals don't conflict
too much.  Maybe in the future we can even merge the two networks.

I am most interested in GNUNet's lookup services and accounting.

Feedback more than welcome.

                              DistribNet

A global peer-to-peer Internet file system that anyone can tap into
or add content to.

Kevin Atkinson (kevin at atkinson dhs org)
Last Modified: 2002-04-22
Project Page: http://distribnet.sourceforge.net/
Mailing list: http://lists.sourceforge.net/lists/listinfo/distribnet-devel

Meta Goals:

*) To allow anyone, possibly anonymously, to publish web sites
    without having to pay a commercial provider for the bandwidth
    or having to put up with the increasingly ad-ridden free web
    sites.  One should not have to worry about bandwidth
    considerations at all.

*) Bring back the sense of community on the Internet that was
    present before the Internet became so commercialized.

*) Serve as an efficient replacement for current file sharing networks
    such as Morpheus and Gnutella.

*) To have the network stable and working before some commercial
    company designs a proprietary network similar to what I envision
    that can only be accessed via software that is freely available
    but not under an FSF-approved free license.

(Possibly Impossible) Goals:

*) *Really* fast lookup to find data.  The worst case should be O(log(n))
    and the average case should be O(1) or very close to it.

*) Actually retrieving the data should also be really fast.  Popular
    data should be sitting on the same subnet.  On average it should
    be as fast as or faster than a typical web site (such as slashdot,
    google, etc.).  It should make effective use of the
    topology of the Internet to minimize network load and maximize
    performance.

*) General searching based on keywords will be built into the protocol
    from the beginning.  The search facility will be designed in
    such a way as to make message boards trivial to implement.

*) Ability to update data while keeping old revisions around so data never
    disappears until it is truly unwanted.  No one person will have
    the power to delete data once it spreads throughout the network.

*) Will try very hard to keep all but the most unpopular content from
    falling off the network.  Basically, before deleting a locally
    unpopular key a node will first check whether other nodes are
    storing the key and how popular they find it.  If not enough nodes
    are storing the key and there is any indication that the data may
    be useful at a later date, the node will not delete it unless it
    absolutely has to.  And if it does delete it, it will first try
    uploading it to other nodes with more disk space available.

*) Ability to store data indefinitely if someone is willing to provide
    the space for it (and being able to find that data in log(n)
    time).

*) Extremely robust so that the only way to kill the network is to
    disable almost all of the nodes.  The network should still
    function even if say 90% of it goes down.

*) Extremely efficient CPU-wise so that a fully functional node can run
    in the background and only take 1-2% of the CPU.

Applications:

I would like the protocol to be able to effectively support (i.e. without
the ugly hacks that many of the applications for Freenet use):

1) Efficient Web like sites (with HTTP gateway to make browsing easy)
2) Efficient sharing of files large and small.
3) Public message forums (with IMAP gateway to make reading easy)
4) Private Email (with the message encrypted so only the intended
    recipient can read it, again with IMAP gateway)
5) Streaming Media
6) Online Chat (with possible IRC or similar gateway)

Anti-Goals:

(Also see philosophy for why I don't find these issues that important)

*) Complete anonymity for the browser.  I want to focus on
    performance first and anonymity second.  In fact I plan to use
    extensive logging in the development versions so that I can track
    network performance and quickly catch performance bugs.  As
    DistribNet stabilizes, anonymity will be improved at the expense
    of logging.

    The initial version will only use cryptography when absolutely
    necessary (for example key signing).  Most communications will be
    done in the clear.  After DistribNet stabilizes, encryption will
    slowly be added.  When I add encryption I will carefully monitor
    the effect it has on CPU load and if it proves to be expensive I
    will make it optional.

    Please note that I still wish to allow for anonymous posting of
    content.  However, without encryption, it probably won't be as
    anonymous as Freenet or your GNUNet.

*) Data in the cache will be stored in a straightforward manner.  No
    attempt will be made to prevent the node operator from knowing
    what is in his own cache.  Also, by default, very little attempt
    will be made to prevent others from knowing what is in a
    particular node's cache.

Philosophy:

*) I have nothing against complete anonymity, it is just that I am
    afraid that both Freenet and GNUNet are designed more around the
    anonymity and privacy issues than they are around the performance
    and scalability issues.

*) For most types of things the level of anonymity that Freenet and
    GNUNet offer is simply not needed.  Even for copyrighted and
    censored material there is, in general, little risk in actually
    viewing the information because it is simply impractical to go
    after every single person who accesses forbidden information.
    Almost all of the time the lawsuits and such go after the original
    distributors of the information and not the viewers.  Therefore
    DistribNet will aim to provide anonymity for distributing
    information, but not for actually viewing it.  However, since
    there *is* some information where even viewing it is extremely
    risky, DistribNet will eventually be able to provide the same
    level of anonymity that Freenet or GNUNet offers, but it will be
    completely optional.

*) I also believe that knowing what is in one's own datastore and being
    able to block certain types of material from one's own node is not
    that big of a deal.  Unless almost everyone blocks a certain type
    of information the availability of blocked information will not be
    harmed.  This is because even if 90% of the nodes block say,
    kiddie porn, the information will still be available on the other
    10% of the nodes which, if the network is designed correctly,
    should be more than enough for anyone to get at blocked
    information.  Furthermore, since the source code for DistribNet
    will be protected under the GPL or a similar license, it will be
    completely impractical for others to force a significant number of
    nodes to block information.  Due to the dynamic nature of the
    cache I find it legally difficult to hold anyone responsible for
    the contents of their cache as it is constantly changing.

DistribNet Key Types:

There will essentially be two types of keys: map keys and data keys.
Map keys will be uniquely identified in a similar manner to Freenet's
SSK keys.  Data keys will be identified in a similar manner to
Freenet's CHK keys.

Map keys will contain the following information:

  * Short Description
  * Public Namespace Key
  * Timestamped Index pointers
  * Timestamped Data pointers

_At any given point in time_ each map key will only be associated with
one index pointer and one data pointer.  Map keys can be updated by
appending a new index or data pointer to the existing list.  By
default, when a map key is queried only the most recent pointer will
be returned.  However, older pointers are still there and may be
retrieved by specifying a specific date.  Thus, map keys may be
updated, but information is never lost or overwritten.
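
As a rough illustration of the append-only behaviour described above, the
sketch below keeps one pointer history (index or data) ordered by
timestamp.  It is not DistribNet code: the class name PointerHistory, the
use of seconds-since-epoch timestamps, and the use of plain strings as
pointers are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

// Hypothetical sketch: the timestamped pointer list of one map key.
// Entries are only ever appended, never overwritten or removed.
class PointerHistory {
public:
    // Append a pointer with its timestamp (assumed: seconds since epoch).
    // Returns false, leaving the history unchanged, if an entry with
    // that timestamp already exists.
    bool append(std::int64_t timestamp, const std::string& pointer) {
        assert(!pointer.empty());
        return entries_.emplace(timestamp, pointer).second;
    }

    // Default query: the most recent pointer, or nullptr if none exist.
    const std::string* latest() const {
        return entries_.empty() ? nullptr : &entries_.rbegin()->second;
    }

    // Dated query: the pointer that was current at time `when`, i.e. the
    // newest entry whose timestamp is <= when, or nullptr if none.
    const std::string* at(std::int64_t when) const {
        auto it = entries_.upper_bound(when);  // first entry newer than when
        if (it == entries_.begin()) return nullptr;
        return &std::prev(it)->second;
    }

private:
    std::map<std::int64_t, std::string> entries_;  // ordered by timestamp
};
```

With entries appended at times 100 and 200, latest() yields the second
pointer while at(150) still yields the first, which is the "nothing is
lost" property the outline asks for.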

Data keys will be very much like Freenet's CHK keys except that they will
not be encrypted.  Since they are not encrypted, delta compression may
be used to save space.

There will not be anything like Freenet's KSK keys as those proved to
be completely insecure.  Instead, map keys may be requested without a
signature.  If there is more than one map key by that name then a list
of keys is presented, sorted by popularity.  To make such a list
meaningful every public key in DistribNet will have a descriptive
string associated with it.

Data Key Details:

Data keys will be stored in maximum size blocks of just under 32K.  If
an object is larger than 32K it will be broken down into smaller size
chunks and an index block, also with a maximum size of about 32K, will
be created so that the final object can be reassembled.  If an object
is too big to be indexed by one index block the index blocks themselves
will be split up.  This can be done as many times as necessary therefore
providing the ability to store files of arbitrary size.  DistribNet
will use 64 bit integers to store the file size therefore supporting
file sizes up to 2^64-1 bytes.
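
To make the splitting scheme concrete, here is a small sketch that counts
how many blocks an object of a given size would need at each level.  It
uses the figures listed in this outline (32640-byte blocks, 1630 keys per
index block); the function name blocks_per_level and the treatment of an
empty object as one block are assumptions for illustration only.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint64_t kBlockSize = 32640;         // bytes per block
constexpr std::uint64_t kKeysPerIndexBlock = 1630;  // keys per index block

// Ceiling division written so that it cannot overflow.
constexpr std::uint64_t ceil_div(std::uint64_t a, std::uint64_t b) {
    return a / b + (a % b != 0 ? 1 : 0);
}

// Returns the number of blocks at each level for an object of
// `object_size` bytes.  Element 0 is the number of data blocks; each
// following element is the number of index blocks one level up, ending
// with the single top-level block.
std::vector<std::uint64_t> blocks_per_level(std::uint64_t object_size) {
    std::vector<std::uint64_t> counts;
    std::uint64_t n = ceil_div(object_size, kBlockSize);
    if (n == 0) n = 1;  // assumption: an empty object still takes one block
    counts.push_back(n);
    while (n > 1) {     // keep indexing until a single block remains
        n = ceil_div(n, kKeysPerIndexBlock);
        counts.push_back(n);
    }
    return counts;
}
```

An object one byte larger than a block needs two data blocks and one
index block; an object one byte larger than 1630 full blocks needs 1631
data blocks, two first-level index blocks, and one second-level index
block.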

Data keys will be retrieved by blocks rather than all at once.  When a
client first requests a data key that is too large to fit in a block
an index block will be returned.  It is then up to the client to figure
out how to retrieve the individual blocks.

Please note that even though blocks are retrieved individually
they are not treated as truly independent keys by the nodes.  For
example a node can be asked which blocks it has based on a given index
block rather than having to be asked for each and every data block.
Also, nodes maintain persistent connections so that blocks can be
retrieved one after another without having to re-establish the
connection each time.

Data and index blocks will be indexed based on the SHA-1 hash of their
contents.  The exact numbers are as follows:

Data block size:                         2^15 - 128 = 32640 bytes
Index block header size:                 40 bytes
Maximum number of keys per index block:  1630
Key size:                                20 bytes

Maximum object sizes:

direct   => 2^14.99 bytes, about 31.9 KB
1 level  => 2^25.66 bytes, about 50.7 MB
2 levels => 2^36.34 bytes, about 80.8 GB
3 levels => 2^47.01 bytes, about 129 TB
4 levels => 2^57.68 bytes
5 levels => 2^68.35 bytes (but limited to 2^64 - 1)
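
The figures above follow from the block size, header size, and key size:
(32640 - 40) / 20 = 1630 keys per index block, and each extra level
multiplies the capacity by 1630.  The sketch below reproduces that
arithmetic; the function names are made up for the example, and
saturating at 2^64 - 1 is simply one way to model the stated limit.

```cpp
#include <cstdint>

constexpr std::uint64_t kBlockSize = 32640;      // 2^15 - 128
constexpr std::uint64_t kIndexHeaderSize = 40;
constexpr std::uint64_t kKeySize = 20;           // one SHA-1 hash
constexpr std::uint64_t kKeysPerIndexBlock =
    (kBlockSize - kIndexHeaderSize) / kKeySize;

static_assert(kKeysPerIndexBlock == 1630, "matches the figure in the outline");

// Largest object, in bytes, that fits with `levels` levels of index
// blocks.  Saturates at 2^64 - 1, the largest size a 64-bit integer holds.
std::uint64_t max_object_size(unsigned levels) {
    std::uint64_t size = kBlockSize;
    for (unsigned i = 0; i < levels; ++i) {
        if (size > UINT64_MAX / kKeysPerIndexBlock) return UINT64_MAX;
        size *= kKeysPerIndexBlock;
    }
    return size;
}

// Fewest index levels needed to hold an object of `bytes` bytes.
// Terminates because five levels already saturate at 2^64 - 1.
unsigned levels_needed(std::uint64_t bytes) {
    unsigned levels = 0;
    while (max_object_size(levels) < bytes) ++levels;
    return levels;
}
```

For instance, one level gives 1630 * 32640 = 53,203,200 bytes (about
50.7 MB), and two levels give 86,721,216,000 bytes (about 80.8 GB).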

Why 32640?

A block size of just under 32K was chosen because I wanted a size
that would allow most text files to fit in one block, most other files
to fit with one level of indexing, and just about anything anybody
would think of transferring on a public network to fit in two levels,
and 32K worked out perfectly.  Also, files of around 32K are rather
rare, which prevents a lot of unnecessary splitting of files that don't
quite make it.  32640 rather than exactly 32K was chosen to allow some
additional information to be transferred with the block without pushing
the total size over 32K.  32640 can also be stored nicely in a 16-bit
integer without having to worry whether it's signed or unsigned.

Storage:

Blocks are currently stored in one of three ways:

1) Blocks smaller than a fixed threshold (currently 1K) are stored using
   Berkeley DB (version 3.3 or better).

2) Blocks larger than the threshold are stored as files.  The primary
   reason for doing this is to avoid limiting the size of the data
   store to the maximum size of a file, which is often 2 or 4 GB on
   most 32-bit systems.

3) Blocks are not stored at all; instead they are linked to an external
   file outside of the data store, much like a symbolic link links to a
   file outside of the current directory.  However, since blocks often
   only represent part of the file, the offset is also stored as part
   of the link.  These links are stored in the same database that
   small blocks are stored in.  Since the external file can easily be
   changed by the user, the SHA-1 hashes will be recomputed when the
   file modification date changes.  If the SHA-1 hash of the block
   differs, all the links to the file will be thrown out and the file
   will be relinked. (This part is not implemented yet.)
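
The three storage methods can be summarized in a short sketch.  Nothing
here is taken from data_key.cpp: the type and function names, the fields
of the link record, and the choice to send a block of exactly the
threshold size to a file are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

enum class StorageKind { Database, File, ExternalLink };

constexpr std::size_t kSmallBlockThreshold = 1024;  // "currently 1K"
constexpr std::size_t kMaxBlockSize = 32640;

// Hypothetical link record for method 3: where a block lives inside a
// file that the user keeps outside the data store.
struct BlockLink {
    std::string path;      // the external file
    std::uint64_t offset;  // byte offset of the block within that file
    std::uint32_t length;  // block length in bytes
    std::int64_t mtime;    // file modification time when the hash was taken
};

// Pick a storage method for a block.  A block of exactly the threshold
// size goes to a file here; the outline does not say which side wins.
StorageKind choose_storage(std::size_t block_size, bool is_external) {
    assert(block_size <= kMaxBlockSize);
    if (is_external) return StorageKind::ExternalLink;
    return block_size < kSmallBlockThreshold ? StorageKind::Database
                                             : StorageKind::File;
}

// A link must have its SHA-1 hash recomputed once the file's
// modification time no longer matches the one recorded in the link.
bool needs_rehash(const BlockLink& link, std::int64_t current_mtime) {
    return link.mtime != current_mtime;
}
```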

Most of the code for the data keys can be found in data_key.cpp.

Lookup Details:

Lookup will probably be done using the Chord protocol.  See
http://www.pdos.lcs.mit.edu/chord/.
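
For readers unfamiliar with Chord: it places node IDs and keys on one
identifier circle and assigns each key to its successor, the first node
whose ID is equal to or follows the key.  The sketch below shows only
that assignment rule over a locally known set of node IDs.  It is a
simplification: it uses 32-bit IDs for brevity and leaves out the finger
tables that let real Chord find the successor in O(log(n)) hops.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Returns the ID of the node responsible for `key`: the first node ID
// that is >= key, wrapping around to the smallest ID on the circle.
std::uint32_t successor(const std::set<std::uint32_t>& ring,
                        std::uint32_t key) {
    assert(!ring.empty());
    auto it = ring.lower_bound(key);  // first node ID >= key
    return it != ring.end() ? *it : *ring.begin();
}
```

With nodes 10, 200, and 4000 on the circle, key 11 belongs to node 200
and key 4001 wraps around to node 10.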

Language:

DistribNet is/will be written in fairly modern C++.  It will use
several external libraries; however, it will not use any C++-specific
libraries.  In particular I have no plan to use any sort of
abstraction library for POSIX functionality.  Instead, thin wrapper
classes will be used which I have complete control over and which will
serve mainly to make the process of using POSIX functions less tedious
rather than abstract away the details of using them.
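
As an example of what such a thin wrapper might look like, the sketch
below owns a POSIX file descriptor and closes it automatically, while
still exposing the raw descriptor so the usual POSIX calls can be used
directly.  The class name Fd and the helper open_read_only are
hypothetical, and the move operations assume a C++11 compiler.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical thin wrapper: owns one file descriptor and closes it on
// destruction.  It hides nothing; get() returns the raw descriptor.
class Fd {
public:
    Fd() : fd_(-1) {}
    explicit Fd(int fd) : fd_(fd) {}
    ~Fd() { reset(); }

    Fd(const Fd&) = delete;             // exactly one owner per descriptor
    Fd& operator=(const Fd&) = delete;

    Fd(Fd&& other) noexcept : fd_(other.release()) {}
    Fd& operator=(Fd&& other) noexcept {
        if (this != &other) reset(other.release());
        return *this;
    }

    bool valid() const { return fd_ >= 0; }
    int get() const { return fd_; }

    // Give up ownership without closing; the caller must close it.
    int release() {
        int fd = fd_;
        fd_ = -1;
        return fd;
    }

    // Close the current descriptor, if any, and take ownership of `fd`.
    void reset(int fd = -1) {
        if (fd_ >= 0) ::close(fd_);
        fd_ = fd;
    }

private:
    int fd_;
};

// Thin helper around open(2).  On failure the returned Fd is invalid
// and errno is left untouched for the caller to inspect.
inline Fd open_read_only(const char* path) {
    return Fd(::open(path, O_RDONLY));
}
```

A caller would still write ::read(fd.get(), buf, n) and check errno as
usual; the wrapper only removes the need to remember close() on every
exit path.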


--- 
http://kevin.atkinson.dhs.org





