Re: [GNUnet-developers] Propose new feature - Search Indexer WebService


From: Christian Grothoff
Subject: Re: [GNUnet-developers] Propose new feature - Search Indexer WebService for GNUnet
Date: Sat, 10 Nov 2012 12:07:29 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20121027 Lightning/1.0b1 Icedove/3.0.11

On 11/09/2012 06:32 PM, SMoratinos wrote:
> 
> The new service could be built on top of file-sharing,
> or not (I don't know). This is not an fs replacement
> but an extension or an alternative.
> Possible features:
>  - gnunet-search-db : search in the local database.

Searching the local database is already supported:
$ gnunet-search -n KEYWORD

>  - gnunet-notifier-db : notify the user that new content has arrived
>     from the network (only the URI with metadata, like a CHK,
>     not the document itself).

The issue is that you need the keyword/password to decrypt; if you have
that, you can simply run
$ gnunet-search [-n] keyword
all the time and it will notify you if something arrives via migration (or
you can leave out the '-n' to actively request content).  Publishers can
also already use "gnunet-publish -r VALUE" to actively push data out
into the network.
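
For instance, a minimal sketch of that setup (the keyword, file name and
replication level below are just placeholders):
$ gnunet-search YOURKEYWORD                          # keep running; prints matches as they arrive
$ gnunet-publish -r 5 -k YOURKEYWORD somefile.txt    # publish with replication level 5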

>  - gnunet-indexer-db : a process which accepts new entries from the
>     network and puts them in its database, and which sends
>     new entries to the network.

gnunet-service-fs already does both, if you set the right options.
Set CONTENT_CACHING=YES, and your peer will accept new data from the
network.  Set CONTENT_PUSHING=YES and your peer will send content out
into the network (your own as well as any that you got via
CONTENT_CACHING, if that option is on as well).  So except that you
sound like you might want to limit this behavior to KSKs, no change is
required.
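
For reference, a sketch of how that could look in the peer's configuration
(assuming these options live in the [fs] section used by gnunet-service-fs;
check the shipped defaults for the exact section name):

[fs]
CONTENT_CACHING = YES
CONTENT_PUSHING = YES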

> What information is stored in the database?
> 
> It's like a KBlock but with a K known by all (I call it a PBlock).
> For example, K="gnunet_public" by convention.

By your own argument, you don't need a PBlock; a KBlock with a well-known K
already does this.  The issue you get with this is spam (try "test" right now,
lots of useless results...).
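
You can see this for yourself with a plain search on such a well-known
keyword, e.g.:
$ gnunet-search test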

> All uploaders who want a publication to be indexed by this new
> service must publish the content with the keyword "gnunet_public".
> When an uploader publishes content under K="gnunet_public",
> gnunet-indexer-db sends a PBlock to the network.
> The PBlock will be propagated over the network.
> If the key is known by all, then intermediaries can view the metadata
> and URI, and that's what we want.
> Am I right? Is it a problem?

No, except that the changes you propose amount at best to adding an
option to
restrict content caching/migration to KBlocks and/or to KBlocks matching a
particular keyword.  Other than that, the existing code already has all
of this.

> About the size of the database:
> I have no idea of that size.
> From the documentation, I have understood that KBlocks are
> approximately 1% of the content.

Well, that always depends on the content size and on the size of the
metadata stored in the KBlock.  The overhead from IBlocks amounts to
about 1%, while the overhead from KBlocks is independent of the size
of the file (it only depends on the number of keywords and the amount
of metadata).

> If I have in my database 10000 PBlocks which reference
> 10000 files with an average size of 1Go,
> the content of my database will be 100Go; that's too huge.

10000 KBlocks can realistically take as little as ~10 MB, and at most 64
MB, never 100 Go. (I assume by "Go" you mean "GB").  Naturally, your DB
might add up to 2x for indices and other overheads, but IMO that's still
not significant for most applications/users.
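
As a rough back-of-the-envelope check (assuming on the order of 1 KB of
keywords plus metadata per KBlock, a typical rather than guaranteed figure):
10000 KBlocks * ~1 KB each = ~10 MB, independent of whether the referenced
files are 1 MB or 1 GB each.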

> But this is not a problem if we keep PBlocks only for a short period,
> for example only those which are 10 days old or less, because I actually
> want the fresh results, not the old ones.

The code by default expires results approximately 6 months after the
initial publication; the source of the data may specify a longer
lifetime for its own datastore and then repopulate the network.  For
various reasons, the 6 months (or some other value) should be a
network-wide choice.  Again, due to the much smaller size of the KBlocks
(compared to your estimate), 6 months should not be a major issue in
practice (I think it is almost realistic for a peer to cache ALL blocks
EVER received for 6 months with normal bandwidth/disk-size ratios).

> And I want notifications, not to actually store the whole GNUnet network!
> If I want old content, I can search via the normal gnunet-fs-search.
> And those who want to store more history can, if they want.

Just run 'gnunet-search [-n] YOURKEYWORDOFCHOICE' in a screen session
and you already have what you want. ;-).
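
A minimal sketch of that (session name and keyword are placeholders):
$ screen -dmS fs-watch gnunet-search YOURKEYWORDOFCHOICE
$ screen -r fs-watch    # re-attach later to check the results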

Happy hacking!

Christian


