Re: [Bug-gnupedia] From the "Department of Wacky Ideas"


From: Tom Chance
Subject: Re: [Bug-gnupedia] From the "Department of Wacky Ideas"
Date: Sat, 20 Jan 2001 12:17:11 -0800 (PST)

Um, I think this is a very wacky idea, and a great
model being applied to the wrong situation. You see,
with something like SETI or Gnutella, it doesn't
matter if one server goes offline. In fact, so long
as some people are there, it still runs. But that's
nothing like Alexandria or Nupedia. Imagine if you
found that the server(s) holding the nugget of
information you wanted had gone offline!

Then there are the databases. It would be an incredible
task to first of all have one central article
submission server that would then somehow transfer
the article and media to the correct server, and then
reference that somewhere (where?). You would have to
have the web site on one server, but how would you
search the database? It would take forever to do a
cross-web search, as Gnutella aptly shows. Even
Napster was a bit slow at searching, and it had the
info on a central server! You said yourself you'd have
problems keeping a check on the baby servers too.

No, you'd get a very, very messy setup, with redundant
info on some servers, half the information missing,
and a system that would be confusing to build, expand,
use, and program.

I see no disadvantages in a central server with
several (as many as possible) mirrors. The central
server would be, I assume, in the US, and even their
control-freak government can't shut down something
like this. Then there are plenty of countries in the
world where it could be mirrored. And I don't see how
the baby-server idea gets around the "Chinese
firewall", which prevents proper browsing of the web
anyway!

All you need is some sponsorship, and a really decent
setup with unlimited capabilities could be built
easily with Perl, mod_backhand and MySQL. It would
handle any amount of data thrown at it, provided we
had the disk space, and the mirrors would all be set
up in the same way.
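
Just to give a feel for how simple that could be, here's a rough
sketch of a central lookup script in that stack. The database, table
and column names are only made up for the example:

    #!/usr/bin/perl -w
    # Rough sketch: look up an article by title on the central MySQL server.
    # "alexandria", "articles", "title" and "body" are invented names.
    use strict;
    use DBI;
    use CGI;

    my $q   = CGI->new;
    my $dbh = DBI->connect('DBI:mysql:database=alexandria;host=localhost',
                           'webuser', 'secret', { RaiseError => 1 });

    my $title = $q->param('title') || '';
    my $row   = $dbh->selectrow_hashref(
        'SELECT body FROM articles WHERE title = ?', undef, $title);

    print $q->header('text/plain');
    print defined $row ? $row->{body} : "No article found for '$title'\n";
    $dbh->disconnect;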

This way the resource is fast, reliable, and
problem-free. Why trade it for a model that would be
fraught with possible problems?

Tom Chance


--- Bob Dodd <address@hidden> wrote:

> From the "Department of Wacky Ideas"
> 
> I would like a free lunch please… And I would like
> that lunch to be:
> virtually unlimited disk capacity, unlimited
> processor time, and
> guaranteed up-time.  Oh, and I would like to be able
> to avoid the legal
> problems of storing certain types of material within
> national
> boundaries. 
> 
> One solution to that, which would cover all of those
> lunchtime delights
> would be:
> 
> Take a leaf out of SETI-at-Home, and the MP3 file
> sharing systems (the
> new ones designed to get round being closed down by
> the RIAA). Instead
> of having a single reference server (with mirrors),
> have hundreds if
> not thousands of smaller servers, each holding a
> small (duplicated
> multiple times) part of the whole. This leaves the
> reference server and
> mirrors to hold the mappings of who has what, which
> machines are up,
> and how much traffic each machine is currently
> receiving.
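
To make concrete what that reference server would have to track, it
would be something along these lines (every name here is invented,
just to show the shape of the data):

    # Sketch of the "who has what" catalogue a reference server would keep.
    # Hostnames, subjects and checksums are all invented examples.
    my %catalogue = (
        'physics/relativity' => {
            servers  => [ 'baby1.example.org', 'baby7.example.org' ],
            checksum => 'd41d8cd98f00b204e9800998ecf8427e',
        },
        'history/ww2' => {
            servers  => [ 'baby3.example.org' ],   # e.g. not mirrored in Germany
            checksum => '0cc175b9c0f1b6a831c399e269772661',
        },
    );

    # Plus which machines are up and how much traffic they are taking.
    my %server_status = (
        'baby1.example.org' => { up => 1, hits_this_hour => 120 },
        'baby3.example.org' => { up => 0, hits_this_hour => 0   },
    );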
> 
> In practice this would mean that opening the
> GNU/Alexandria web page
> would do a redirection to an available "baby server"
> machine which will
> handle your server requests for you (and perhaps set
> a cookie on your
> machine to say what's happening…). When the user
> makes a query, the
> information may be directly available on the "baby
> server"; if not,
> the "baby server" queries the reference server for a
> machine which can
> resolve all or part of the query. The query could
> then be resolved.
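
Concretely, that front-page redirection might boil down to something
like this (the host name and the pick_baby_server() helper are
hypothetical stand-ins for whatever consults the reference server):

    #!/usr/bin/perl -w
    # Sketch: redirect a visitor to an available "baby server" and
    # remember the choice in a cookie. pick_baby_server() is a stand-in
    # for the real lookup against the reference server's status table.
    use strict;
    use CGI;
    use CGI::Cookie;

    sub pick_baby_server {
        # placeholder: would ask the reference server for a machine
        # that is up and not over its traffic limit
        return 'baby4.example.org';
    }

    my $q      = CGI->new;
    my $server = pick_baby_server();
    my $cookie = CGI::Cookie->new(-name    => 'baby_server',
                                  -value   => $server,
                                  -expires => '+1h');

    print $q->redirect(-uri    => "http://$server/cgi-bin/query.pl",
                       -cookie => $cookie);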
> 
> There are some big advantages in this approach:
> 
> 1) No single machine needs to hold the whole
> library/encyclopedia, so
> sizing issues (and hopefully associated costs)
> become moot.
> 
> 1) "Baby servers" may be chosen to hold information
> based on their
> physical location, so for example we could avoild
> howling any World War
> 2 material or links on German servers (and hence
> avoiding the problems
> AOL gets into over promotion of Nazia material). 
> 
> 1) No single"baby server"  machine is taking all the
> hits. Also each
> machine can limit its traffic to the levels allowed
> by its ISP. This
> makes at least some aspects of performance (though
> not all…) mute.
> 
> 4) With sufficient "baby servers", all holding
> random, duplicated
> sections of the whole, we have a remarkably robust
> application: so
> long as enough local routing knowledge remains in any
> disjoint network
> segment, the database can remain at least partially
> usable. 
> 
> 5) The distributed nature of the model also makes the
> encyclopedia less
> prone to effective political interference. If we can
> generate enough
> local routing knowledge, it would be very difficult
> for countries (and
> I'm thinking of the likes of China and Singapore
> here) to block all
> portals to the encyclopedia. It's not as simple as
> blocking
> "yahoo.com"… Hence local censorship becomes moot.
> 
> 6) In addition to choosing "baby server" content
> based on physical
> location, it can also be chosen on the basis of
> network topology, so
> that we can try to keep subject areas closely
> connected, and hence
> minimise the network traffic involved in queries.
> 
> Of course, even a free lunch has its down-side:
> 
> 1) You need enough "baby servers" alive at any one
> time to keep the
> encyclopedia alive. This is a problem for any "SETI"
> style application
> in the build-up phase. Of course SETI didn't have
> that problem, because
> half the world chose to start looking for ET in the
> first 24 hours…
> Also, they could simply do less scanning when fewer
> machines were
> available. With an encyclopedia, you need all the
> material available
> all the time.
> 
> 2) Not all machines running as "baby servers" are
> going to be running
> the same OS, or even the same version of it, so the
> code development and maintenance effort
> is likely to be higher than with a classic "everything on
> one server and
> throw in a few mirrors" approach.
> 
> 3) Response to a single query by a user will be
> slower than the same
> request on a dedicated server (assuming there are no
> performance issues to
> deal with). The more similar queries the user makes,
> the better the
> response time, but the first hit will definitely be
> slower.
> 
> 4) Updates of the database may take time to
> percolate through the
> system, so that (worst case) the same query run from
> the same user
> machine sequentially could find different versions
> of the same
> article. Pretty unlikely, unless there is a
> journalist watching, of
> course…
> 
> 5) In any autonomous, self-healing, distributed
> database (which is what
> we're talking about), you have to be able to deal
> with joining lost
> limbs back onto the body. Straightforward to do (it's
> just some polling
> based on time & date), but it's still extra work,
> extra complexity.
> 
> 6) There is the question of content integrity when
> the data is not
> securely held. That means entries need to contain
> signatures, or they
> need public key encryption of some sort. It also
> means that the
> reference servers need to randomly poll "baby
> servers" to check
> up on the integrity of the material, and this
> "checking up" needs
> protection against spoofing. Not a trivial task.
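
A crude version of that "checking up" could be as simple as comparing
checksums against the reference server's catalogue, roughly like the
sketch below (the URL and expected value are made up, and real
protection against spoofing would need proper signatures rather than a
bare hash):

    #!/usr/bin/perl -w
    # Sketch: reference server spot-checking an article on a baby server.
    # URL and expected checksum are invented; MD5 alone does not stop a
    # malicious server from lying, so real integrity needs signatures.
    use strict;
    use LWP::Simple;
    use Digest::MD5 qw(md5_hex);

    my $url      = 'http://baby4.example.org/articles/physics/relativity';
    my $expected = 'd41d8cd98f00b204e9800998ecf8427e';   # from the catalogue

    my $body = get($url);
    if (!defined $body) {
        print "baby server unreachable\n";
    } elsif (md5_hex($body) ne $expected) {
        print "content mismatch - flag for re-replication\n";
    } else {
        print "content verified\n";
    }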
> 
> None of the above problems is unsolvable, and these
> kinds of
> high-availability databases do exist (though not
> quite as suggested
> here). If you want an example of a real commercial
> company that uses
> this approach for high availability, look at people
> like ObjectStore, who
> use this divide-and-conquer approach: if you had 3
> servers, they would
> divide your data into 3 parts, and each machine
> would hold 2/3 of the
> whole, so that machine A would hold parts a & b,
> machine B would hold
> parts a & c, and machine C would hold parts b & c
> (or some combination
> of the same…). So if any one server fails it is still
> possible to
> continue.   With more machines, they would make
> smaller parts, and each
> machine would perhaps hold more parts.
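
To spell out that arithmetic: with N servers and the data cut into N
parts, giving each server every part except one means each part lives
on N-1 machines, so losing any single server loses nothing. A toy
version of the assignment:

    #!/usr/bin/perl -w
    # Toy version of the divide-and-conquer assignment described above:
    # each machine holds every part except one, so any one failure still
    # leaves a full copy of the data spread across the survivors.
    use strict;

    my @servers = ('A', 'B', 'C');
    my @parts   = ('a', 'b', 'c');

    for my $i (0 .. $#servers) {
        my $skip = $parts[$#parts - $i];    # the one part this machine skips
        my @held = grep { $_ ne $skip } @parts;
        print "machine $servers[$i] holds parts: @held\n";
    }
    # prints: A holds a b, B holds a c, C holds b c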
> 
> In terms of using a solution like this on
> GNU/Alexandria, it also has
> the advantage of not having to be rolled out
> immediately. You can start
> with a normal server/mirror approach, and then as
> the data and traffic
> increase (and as the software matures), you can
> start switching over
> (e.g. the server and mirrors start behaving as "baby
> servers"), and the
> roll-out can be gradual.
> 
> 
> As I say, it's a wacky idea, but it might just work.
> 
> /Bob Dodd
> 




