[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [GNUnet-developers] How do updates work?

From: Christian Grothoff
Subject: Re: [GNUnet-developers] How do updates work?
Date: Sat, 8 Aug 2009 00:17:42 +0200
User-agent: KMail/1.9.9

Am Thursday 06 August 2009 01:01:41 schrieb Kenneth Almquist:
> I've been trying to understand how namespaces are supposed to support
> updates.
> As I understand it, a namespace advertisement contains a string called
> the root, which can be used in a namespace search.  A result of a
> search of a namespace may contain a string which can be used for a
> subsequent search, and it looks like the search code will initiate
> a search using this string automatically.
> Thus one could post a file under the keyword A with next_id set
> to B, and subsequently publish a file under keyword B with next_id
> set to C, and finally publish a file under keyword C.  This would
> allow someone to find all three files using repeated searches starting
> with A.  The question is:  how does this differ from publishing all
> three files under the keyword A?

Good question.  The answer is simple: you could tell people (for example, in 
an updated namespace announcement) that they should search for "B".  Then 
they would never see "A".  Or you could use an informal system:

MyContent-2007  update: MyContent-2008

Someone may then guess that the next update will be "MyContent-2009" and 
jump-ahead manually.  Those who don't know the system would find the 2009 
version as you describe it.  With more years (or finer granularity) this may 
become more useful.

> In terms of speed, a single search 
> for A is likely to be faster than performing a series of searches
> starting with A.  In terms of completeness, the answer is less clear.
> Possibly a chain of searches would be more likely to locate all the
> files, because each search only has to produce one result.  On the
> other hand, a single failed search in the middle of chain means that
> none of the files in the rest of the chain will be found, so I would
> suspect that the odds of the search finding *most* of the files would
> be higher with a single keyword search than with a chain once the
> chain reached a certain length.

Well, who said you need to do chains?  The system supports publishing multiple 
results under "A", so you could say that "B" updates "A" initially, and then 
later you can ALSO publish that "Z" updates "A".  Then, a search would follow 
both chains (presumably A->B->D->...->Z and A->Z). The real question in this 
case is of course if the GUI support for this is adequate (and I'm sure the 
answer is "no").

> If the above mechanism provides a way to perform updates, I don't see
> it.  When the ECRS paper states that, "since SBlocks are signed, it
> is possible to allow updates," the point is that updating an existing
> file involves deleting the old version, so we want the originator of
> the content to be able to do that, without allowing anyone else to do
> so.

Ah, well, maybe the paper is not precise here.  The idea was always to support 
finding of "more recent" versions, but always without the ability to 
remove "older" revisions.  We never intended to support deletions.

> A scheme to do this might work as follows: 
> Allow an SBlock to include a 16 bit version number.  (For backward
> compatibility, the version number is optional, and an omitted version
> number is equivalent to a version number of zero.)  The rules for
> dealing with version numbers are:
> 1)  Any time a host sees an SBlock with a given version number, it
>     gets rid of versions of the SBlock with lower version numbers.

So the version  number would have to be stored in plaintext, which is likely 
fine.  Now, we may want to allow multiple (valid/current) SBlocks per 
namespace, so we would need to refine this to say "lower version numbers" for 
SBlocks with the same identifier.  But identifiers are encrypted, so hosts 
can not tell if two SBlocks have the same identifier => oops.

> 2)  A host does not earn trust for supplying an obsolete version.
>     This means that if a host is doing a search, gets one version
>     of an SBlock, and then receives a later version, it should
>     revoke the trust credited to the host that provided the older
>     version and give it to the host that provided the newer
>     version.  This rule means that there is no incentive for
>     hosts to keep old versions around.

Other intermediaries would most likely not be able to tell that the version is 
obsolete.  Or worse: a host could deliberately keep the older versions, first 
supply the oldest (cash in), then the second-oldest (cash in again), etc..  
Combined with providing low expiration times this would give peers an 
incentive to keep older revisions (assuming the namespace is popular) and 
hand them out first.

> 3)  If a host receives an obsolete version of an SBlock, it can
>     request a refund of the trust expended on the search by
>     sending a more recent version of the SBlock.  This requires
>     a new message type.  This rule means that hosts which have
>     cached obsolete SBlocks are likely to be provided with updated
>     versions of those SBlocks.  (Otherwise, a host might cache
>     an obsolete version of an SBlock indefinitely, since it would
>     receive periodic search requests that matched it.)

To ask for a refund, you would have to supply proof (more recent SBlock), 
which would cost both peers bandwidth (more than the search query!) and IO 
(need to look up the block) and, possibly worse, might allow peers to abuse 
the system: the author of the namespace could just always make up a new 
revision and then ask for a refund from peers storing his SBlocks. So I'm not 
sure this can be done cleanly and efficiently.  Not to mention that asking 
for a refund would require peers to keep a charge-history (maybe the peer 
returned the SBlock without charging in the first place?), which would again 
increase costs. 

Finally, given that the sqstore provides long-term soft-state storage, we have 
some expiration of old, obsolete versions anyway.  So I'm not convinced that 
deletions in general are necessary.

> To allow a user with a given version of a file to see if a more recent
> version of the file exists, we need a variant of the keyword search
> which specifies the minimum version number that is being searched for.
> It would be conceptually simplest to say that a search with a version
> number of N would return any version greater than or equal to N.  To
> allow for reasonably efficent Bloomfield filtering, I would suggest
> that a search specifying version number 0 should query the data base
> for any version of the file, and a search with a nonzero version
> number N should query for versions N through at least N+4.

You can already do this: simply use version numbers for your identifiers.  
Use "0" for your root, then publish 1 as the first update, 2 for the second, 
etc.  Then if some user wants to query for versions > N, he just has to use N 
for the identifier.  So if you want a simple, linear versioning system, this 
would certainly be a good identifier-convention to use.  If you want to 
publish on a regular basis, using some date-format would likely be better.  
The current use of user-defined strings is in this respect a very flexible 
way to choose identifiers that can be adapted (by humans, not software) to 
fit many needs.  The only thing you don't get is deletion of older versions, 
but as a VCS-addict I tend to think of this as a good thing.

> Versioning only applies to search records.  Since DBlocks and IBlocks
> may be shared by different files, I don't see any way to explicitly
> get rid of the IBlocks and DBlocks from an obsolete version of a file.

That's where it is nice that sqstore is soft-state: DBlocks, IBlocks, SBlocks 
all will eventually be "expired" by the sqstore (and typically at the same 

> Once the obsolete version of the file no longer shows up in searches,
> there will be no more requests for the IBlocks and DBocks (unless
> they are, in fact, shared by a different file), and they can be pushed
> out of caches to make space for blocks thare are being requested.

That would also be true (we expire by expiration and popularity). 

Anyway, please take this just as comments and clarifications, this is a good 
discussion to have at this point in time (before 0.9.0's FS APIs are too far 


reply via email to

[Prev in Thread] Current Thread [Next in Thread]