
[Bug-gnupedia] From the "Department of Wacky Ideas"


From: Bob Dodd
Subject: [Bug-gnupedia] From the "Department of Wacky Ideas"
Date: Sat, 20 Jan 2001 11:57:04 -0800 (PST)

From the "Department of Wacky Ideas"

I would like a free lunch, please… And I would like that lunch to be:
virtually unlimited disk capacity, unlimited processor time, and
guaranteed up-time. Oh, and I would like to be able to avoid the legal
problems of storing certain types of material within national
boundaries.

One solution that would cover all of those lunchtime delights would
be:

Take a leaf out of SETI-at-Home and the MP3 file-sharing systems (the
new ones designed to get round being closed down by the RIAA). Instead
of having a single reference server (with mirrors), have hundreds if
not thousands of smaller servers, each holding a small (duplicated
multiple times) part of the whole. This leaves the reference server and
mirrors to hold the mappings of who has what, which machines are up,
and how much traffic each machine is currently receiving.

In practice this would mean that opening the GNU/Alexandria web page
would redirect you to an available "baby server" machine which will
handle your requests for you (and perhaps set a cookie on your machine
to say what's happening…). When the user makes a query, the information
may be directly available on the "baby server"; if not, the "baby
server" queries the reference server for a machine which can resolve
all or part of the query. The query could then be resolved.

There are some big advantages in this approach:

1) No single machine needs to hold the whole library/encyclopedia, so
sizing issues (and hopefully the associated costs) become moot.

1) "Baby servers" may be chosen to hold information based on their
physical location, so for example we could avoild howling any World War
2 material or links on German servers (and hence avoiding the problems
AOL gets into over promotion of Nazia material). 

1) No single"baby server"  machine is taking all the hits. Also each
machine can limit its traffic to the levels allowed by its ISP. This
makes at least some aspects of performance (though not all…) mute.

1) With sufficient "baby servers", all holding random, dupilcated
sections fo the whole, we have a remarkably robust application, that so
long as enough local routing knowledge rmains in any disjoint network
segment, the database can remain at least partially usable. 

5) The distributed nature of the model also makes the encyclopedia
less prone to effective political interference. If we can generate
enough local routing knowledge, it would be very difficult for
countries (and I'm thinking of the likes of China and Singapore here)
to block all portals to the encyclopedia. It's not as simple as
blocking "yahoo.com"… Hence local censorship becomes moot.

6) In addition to choosing "baby server" content based on physical
location, it can also be chosen on the basis of network topology, so
that we can try to keep subject areas closely connected, and hence
minimise the network traffic involved in queries (there's a rough
placement sketch just after this list).
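As an illustration of points 2 and 6, placement could be as simple as
filtering candidate servers by jurisdiction and then preferring the
ones that are "close", in network terms, to the servers already
holding related material. The fields and rules below are invented for
the example; a real topology measure would obviously be more subtle.

    # Hypothetical content-placement sketch: pick baby servers for an
    # article, respecting per-country restrictions and preferring
    # "nearby" servers so related subjects stay closely connected.

    def choose_servers(article, servers, copies=3):
        # Drop servers in countries where this article should not be
        # hosted (the AOL/Nazi-material problem mentioned above).
        allowed = [s for s in servers
                   if s["country"] not in article["restricted_countries"]]
        # Prefer servers close (in network terms) to where the
        # article's subject area already lives.
        allowed.sort(key=lambda s: s["distance_to_subject"])
        return allowed[:copies]

    servers = [
        {"address": "baby1.example.org", "country": "DE",
         "distance_to_subject": 1},
        {"address": "baby2.example.org", "country": "FR",
         "distance_to_subject": 2},
        {"address": "baby3.example.org", "country": "US",
         "distance_to_subject": 5},
    ]
    article = {"title": "Some WW2 article",
               "restricted_countries": {"DE"}}
    print([s["address"] for s in choose_servers(article, servers, 2)])
    # -> ['baby2.example.org', 'baby3.example.org']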

Of course, even a free lunch has its down-side:

1) You need enough "baby servers" alive at any one time to keep the
encyclopedia alive. This is a problem for any "SETI" style application
in the build-up phase. Of course SETI didn't have that problem because
half the world chose to start looking fro ET in the first 24 hours…
Also, they could simply do less scanning when less machines were
available. With an encyclopedia, you need all the material available
all the time.

2) Not all machines running as "baby servers" are going to be running
the same OS, or even the same version of it, so the cost of code
development and maintenance is likely to be higher than with a classic
"everything on one server and throw in a few mirrors" approach.

3) Response to a single query by a user will be slower than the same
request on a dedicated server (assuming there are no performance
issues to deal with). The more similar queries the user makes, the
better the response time, but the first hit will definitely be slower.

4) Updates of the database may take time to percolate through the
system, so that (worst case) the same query run sequentially from the
same user machine could find different versions of the same article.
Pretty unlikely, unless there is a journalist watching, of course…

5) In any autonomous, self-healing, distributed database (which is
what we're talking about), you have to be able to deal with joining
lost limbs back onto the body. Straightforward to do (it's just some
polling based on time & date), but it's still extra work, extra
complexity.

6) There is the question of content integrity when the data is not
securely held. That means entries need to contain signatures, or they
need public key encryption of some sort. It also means that the
reference servers need to be randomly polling "baby servers" to check
up on the integrity of the material, and this "checking up" needs
protection against spoofing. Not a trivial task (there's a rough
sketch of the idea just after this list).
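For that last point, the "checking up" could amount to the reference
server periodically asking a baby server for a randomly chosen article
and comparing a digest of what comes back with the value it holds. The
toy below uses bare SHA-256 digests; a real deployment would want
proper public-key signatures (GnuPG or similar) and would have to
protect the challenge itself against spoofing.

    import hashlib
    import random

    # Hypothetical integrity spot-check: the reference server keeps a
    # trusted digest for every article and randomly polls baby servers
    # to compare their copies against it.

    def digest(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    # Trusted digests held by the reference server (these would be
    # signed in a real system).
    trusted = {"GNU": digest("GNU's Not Unix..."),
               "Alexandria": digest("The Library of Alexandria...")}

    # What one particular baby server claims to hold (one entry has
    # been tampered with).
    baby_copy = {"GNU": "GNU's Not Unix...",
                 "Alexandria": "Vandalised text"}

    def spot_check(trusted, baby_copy, samples=2):
        # Pick random articles and verify the baby server's copies.
        for title in random.sample(sorted(trusted), samples):
            ok = digest(baby_copy.get(title, "")) == trusted[title]
            print(title, "OK" if ok else "CORRUPTED - refetch a good copy")

    spot_check(trusted, baby_copy)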

None of the above problems is unsolvable, and these kinds of
high-availability databases do exist (though not quite as suggested
here). If you want an example of a real commercial company that uses
this approach for high availability, look at people like ObjectStore,
who use this divide-and-conquer approach: if you had 3 servers, they
would divide your data into 3 parts, and each machine would hold 2/3
of the whole, so that machine A would hold parts a & b, machine B
would hold parts a & c, and machine C would hold parts b & c (or some
combination of the same…). So if any one server fails, it is still
possible to continue. With more machines, they would make smaller
parts, and each machine would perhaps hold more parts.
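That 3-server example generalises naturally: split the data into N
parts and give each part to several machines, arranged so that no
single failure loses anything. A small round-robin sketch of that kind
of assignment (purely illustrative, not ObjectStore's actual scheme):

    # Hypothetical replica assignment: each part is stored on
    # `replicas` consecutive machines, so any one machine can fail
    # without losing any part of the data.

    def assign_parts(machines, parts, replicas=2):
        placement = {m: [] for m in machines}
        for i, part in enumerate(parts):
            for r in range(replicas):
                placement[machines[(i + r) % len(machines)]].append(part)
        return placement

    # Reproduces the 3-server case from the text (one of the "some
    # combination of the same" arrangements):
    print(assign_parts(["A", "B", "C"], ["a", "b", "c"]))
    # -> {'A': ['a', 'c'], 'B': ['a', 'b'], 'C': ['b', 'c']}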

In terms of using a solution like this on GNU/Alexandria, it also has
the advantage of not having to be rolled out immediately. You can start
with a normal server/mirror approach, and then as the data and traffic
increase (and as the software matures), you can start switching over
(e.g. the server and mirrors start behaving as "baby servers"), and the
roll-out can be gradual.


As I say, it's a wacky idea, but it might just work.

/Bob Dodd


