gnunet-developers

Re: [GNUnet-developers] Idea for file storage in GNUnet


From: Christian Grothoff
Subject: Re: [GNUnet-developers] Idea for file storage in GNUnet
Date: Fri, 07 Dec 2012 14:18:45 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.10) Gecko/20121027 Icedove/10.0.10

On 12/07/2012 01:18 PM, hypothesys wrote:

Dear LRN and Christian,

Thank you for your replies :). Regarding latency, peers running close to
quota, and cached blocks: I may, of course, be misunderstanding something, but
I still believe this could work.

First of all, an ignorance-derived question regarding LRN's point that the
probability of finding a block of data in a random node not sharing it on
purpose would be small. Would this probability not also be additive (or
increase exponentially?) when said random node, after checking its
local data block index and not finding anything, relays the data
request onward? Assuming the data distribution in the network is not too
non-uniform/asymmetrical, I do not see why that would not be the case.

Right, this is not an issue. A bigger issue might be that having more data at a node that already has gigabytes might not help, as the node may not have a problem with finding answers for requests on its disk but rather with its ability to transmit the results (due to bandwidth limitations). In any case, IF a node is at the quota limit, having more disk space available somehow will obviously benefit performance to some degree.
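
As a rough check of the "additive" intuition, assume (a simplification for illustration, not a measurement of GNUnet) that each node the request reaches holds the block independently with the same small probability p. After the request has reached k nodes:

```latex
P(\text{found within } k \text{ nodes}) \;=\; 1 - (1 - p)^{k} \;\approx\; k\,p
\qquad \text{for } k\,p \ll 1
```

Under that assumption the chance grows roughly additively with each extra node while k p is small, and then saturates toward 1; it does not grow exponentially.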

Also, privacy issues derived from prioritizing downloading and publishing
blocks to the "normal/minimum threshold/allocated" data storage: Why would
this be necessary?

I'm not sure I understand what you mean by the normal/minimum threshold/allocated data storage.

GNUnet storage could operate as normal and this new
"dynamic/maximum threshold" storage serve not only to cache "hot" and
popular blocks, but also use a percentage of the dynamic storage to cache
copies of other nodes' "normal/minimum threshold/allocated" storage. In this
way both latency AND censorship-resistance would improve. It would probably
need a mechanism to re-organize/swap files in normal storage depending on
the priority/distribution of files throughout the network though.

I agree that GNUnet storage should be able to operate as normal with this more 'dynamic' quota. Only details like Bloom filter (re)sizing and the actual resizing of the database would need to be looked at more closely, not the actual mechanisms for space allocation.

I may be making a gross oversimplification here but it feels as if this
increased dynamic storage would add "capacity/ability/versatility" to
GNUnet, which could in turn, depending on the implementation, be used to
boost one, or several at the same time, feature(s) of GNUnet.

There are many ways to 'boost' features of GNUnet; right now, maximizing available disk space simply seems much less important to me compared to known issues with bandwidth management (keyword: ATS), ease-of-use/installation, routing (keyword: mesh) & various bugs.

But I'm not opposed to adding this one to the list; just be aware that we already have about 100 items on that list ;-).


Happy hacking!

Christian

Once again if I am wrong please say so. Not from the field ;)

Cheers,

hypothesys



Christian Grothoff-6 wrote:

On 12/07/2012 08:34 AM, LRN wrote:
And "spare" is the problem. I can easily spare 20 or 40 gigabytes, but
100 or 200 is somewhat trickier. I might have that kind of space now,
and be willing to give it to GNUnet, but I might want that space back
at some point. Not sure what GNUnet will do right now, if I shut down
my node, reduce the datastore size, then start the node up again.
Probably discard lowest-priority blocks until datastore shrinks to the
new limit?

Yes, that's what it would do.  However, there is a caveat: the
mysql/sqlite/postgres database that is involved might be happy to delete
the records, but might not automatically reduce its file system space
consumption.  So you may have to additionally trigger some
database-specific routine to force the database to defragment/relinquish
its allocation/garbage collect/whatever.

Doing this may (temporarily) double your space requirements, depending
on the database.  So this is an "implementation detail" that would make
an automatic 'shrink if disk is full' implementation somewhat harder
(but likely not impossible, as you can predict the necessary space for
the reorganization).  Alternatively, one _may_ be able to use multiple
database files and just delete one of those entirely once the quota is
reached (this depends on the database backend that is being used).
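
The caveat above can be reproduced with SQLite alone. The following is a minimal sketch using Python's standard sqlite3 module; the `blocks` table is a hypothetical stand-in and not GNUnet's actual datastore schema. It shows that deleting rows leaves the file size unchanged until VACUUM is run. SQLite's documentation notes that VACUUM can need up to twice the size of the original file in free disk space while it runs, which matches the temporary doubling described above.

```python
import os
import sqlite3
import tempfile


def file_size_after_delete_and_vacuum(db_path, rows=200, blob_bytes=4096):
    """Return (full, after_delete, after_vacuum) file sizes in bytes."""
    conn = sqlite3.connect(db_path)
    try:
        # Make sure freed pages are NOT returned automatically, so the
        # effect of an explicit VACUUM is visible.
        conn.execute("PRAGMA auto_vacuum = NONE")
        # Hypothetical table for illustration; not GNUnet's schema.
        conn.execute("CREATE TABLE blocks (id INTEGER PRIMARY KEY, data BLOB)")
        conn.executemany(
            "INSERT INTO blocks (data) VALUES (?)",
            [(os.urandom(blob_bytes),) for _ in range(rows)],
        )
        conn.commit()
        full = os.path.getsize(db_path)

        # Deleting rows puts pages on SQLite's free list; the file stays large.
        conn.execute("DELETE FROM blocks")
        conn.commit()
        after_delete = os.path.getsize(db_path)

        # VACUUM rebuilds the database and releases the unused pages.
        # It must run outside a transaction, hence the commit above.
        conn.execute("VACUUM")
        after_vacuum = os.path.getsize(db_path)
    finally:
        conn.close()
    return full, after_delete, after_vacuum


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(file_size_after_delete_and_vacuum(os.path.join(tmp, "demo.db")))
```

MySQL and Postgres have their own reclamation commands with different costs, so this sketch says nothing about those backends.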

Now, having a minimum space allocated to the datastore, and then just
using N% of the remaining free disk space for the datastore too, while
it's available - that really makes the decision easier. If GNUnet is
then taught to use pre-allocated datastore for important blocks (files
being downloaded or published; what are privacy issues here?), that
would mean that your node will serve _your_ interests first, and will
use the free space available to serve the network as best as it can.

I don't think there is a problem here.  We already have routines to
shrink-to-quota which are triggered if we are above quota (due to
additional insertions or due to quota being lowered).
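
For readers unfamiliar with the idea, shrink-to-quota can be sketched as "discard the lowest-priority blocks until the total fits". The code below is an illustration only: `Block`, `shrink_to_quota`, and the priority field are hypothetical names, and this is not GNUnet's datastore code.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    key: str
    size: int      # bytes
    priority: int  # higher means more worth keeping


def shrink_to_quota(blocks, quota_bytes):
    """Return (kept, discarded) so that the kept blocks fit in quota_bytes."""
    used = sum(b.size for b in blocks)
    # Min-heap on priority; the index breaks ties and preserves input order.
    heap = [(b.priority, i, b) for i, b in enumerate(blocks)]
    heapq.heapify(heap)
    discarded = []
    while used > quota_bytes and heap:
        _, _, victim = heapq.heappop(heap)  # least valuable block first
        used -= victim.size
        discarded.append(victim)
    kept = [b for _, _, b in sorted(heap, key=lambda t: t[1])]
    return kept, discarded
```

The same routine covers both triggers mentioned above: new insertions that push usage over the quota, and a quota that has been lowered.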

It should maintain either F% of space free, or G gigabytes (whichever
is larger). Obviously, F and G are configurable (I'd say - default F
to 20, and G to 20; unless GNUnet daemon that would reclaim free space
would be a slowpoke, 20 gigabytes should give it enough time to react).
It should also be completely disabled for SSDs, IMO. Because they are
small to begin with, _and_ because their performance degrades greatly
as they are filled with data.

I suspect those arguments may not hold for long as SSD technology
progresses...
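
LRN's free-space rule combined with a pre-allocated minimum can be written down in a few lines. This is a sketch under assumptions: the function and parameter names are made up for illustration, "gigabytes" is read as GiB, and the defaults are the F = 20 and G = 20 suggested above.

```python
GIB = 1024 ** 3


def reserved_free_bytes(total_bytes, f_percent=20, g_gib=20):
    """Free space to leave untouched: F% of the disk or G GiB, whichever is larger."""
    return max(total_bytes * f_percent // 100, g_gib * GIB)


def dynamic_quota(total_bytes, free_bytes, datastore_bytes, min_quota_bytes,
                  f_percent=20, g_gib=20):
    """Datastore size that keeps the reserve free, but never below the minimum."""
    reserve = reserved_free_bytes(total_bytes, f_percent, g_gib)
    # The datastore may occupy what it uses now plus what is currently free,
    # minus the reserve that must stay available to other programs.
    allowed = datastore_bytes + free_bytes - reserve
    return max(min_quota_bytes, allowed)
```

For example, on a 1000 GiB disk with 300 GiB free and a 50 GiB datastore, the reserve is 200 GiB, so the datastore could grow to 150 GiB. When other data fills the disk, the result falls back to the pre-allocated minimum.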

Thus the idea is the same as with CPU resources - you set up low and
high thresholds for CPU load that GNUnet can cause. It will go as high
as the high threshold when uncontested, and will go down to the low
threshold when other processes compete for CPU resources with GNUnet.
Same for storage - use large portion of available free space for
datastore (primarily - for migrated and cached blocks), but be ready
to discard all that, and go as low as the size of the pre-allocated
datastore.

Well, LRN, if you think peers actually run close to quota, there is a
nice GNUNET_UTIL-call for starters: GNUNET_DISK_get_blocks_available.
Adjusting the quota option in datastore based on that should not be too
hard for you; the real bitch will be testing the various backends to
make sure that they actually reduce disk space consumption --- and I
guess reliably finding out which partition MySQL/Postgres actually store
their data on might also be not so easy...
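
To illustrate the partition question, here is a small sketch using only the Python standard library. It is a stand-in for the idea and does not use or mirror the GNUnet call named above. It assumes the database files are local and that their directory is already known, which is exactly the part that is hard for MySQL/Postgres (remote servers, multiple tablespaces).

```python
import os
import shutil


def mount_point(path):
    """Walk up from path until a mount point is reached."""
    # Resolve symlinks first: a data directory is often a symlink onto
    # another partition.
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current


def free_bytes_on_partition(path):
    """Free bytes on the partition that holds path."""
    return shutil.disk_usage(path).free
```

Comparing `mount_point()` of the database directory with that of the directory being monitored shows whether shrinking the datastore would free space on the partition that is actually filling up.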


Happy hacking! ;-).

-Christian

_______________________________________________
GNUnet-developers mailing list
address@hidden
https://lists.gnu.org/mailman/listinfo/gnunet-developers





