gnunet-developers

Re: [GNUnet-developers] Propose new feature - Search Indexer WebService


From: LRN
Subject: Re: [GNUnet-developers] Propose new feature - Search Indexer WebService for GNUnet
Date: Thu, 08 Nov 2012 23:49:55 +0400
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Thunderbird/19.0a1

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08.11.2012 17:37, SMoratinos wrote:
> 
> 3 / The Uploader publishes things encrypted by namespace "a1" public
> key, it publishes things to "goodfiles" namespace.
We'd have to avoid that, see below.

> 
> 4 / The namespace owner "a1" will continuously search for
> publications for namespace "a1". This is supposed to be the default
> functioning of gnunet, a namespace searches content for its own
> namespace. He finds the new content and indexes only its link in his
> database.
Your wording is quite unfortunate. I take it you mean "publications
intended for inclusion in a1", not "publications made in a1", because
only a1's owner can publish in a1. Searching your own namespace makes
no sense: only you can publish things in it anyway. And if you do
share its private key, then it is just the same as the global namespace.

> But how the downloader discover the "a1" namespace ? The problem is
> the same as the beginning.
Out of band. How do you know about TPB? You learn eventually, somehow.
Maybe google for it. Sadly, unless database contents are published on
a public web-site, they won't be googleable (and you won't be able to
discover the database's existence initially by finding its contents), and
publishing them like that will attract attention and censorship, and may
threaten the publisher's anonymity (OTOH, the publisher may not be the same
entity as the database maintainer; still, who'd want to put himself in
danger like that? On a regular basis?)

> 
> If there is no central server, the link index database must be in
> all the network.
Yes, each user who regularly uses the database will maintain his own
copy (a closed community of users may opt to share one copy, but that
is their business).

> 
> All peers are a Database Owner, all the peer on the network have a
> copy of the Link Index Database.
No, only one node (or group of collaborating nodes) is the owner, able
to publish an updated database in the a1 namespace.

> When a database is updated, this update is propagated over the
> network.
Yes, since everyone will [eventually] get a copy. For the sake of
efficiency the database won't be monolithic, so you'd only need to
download a couple of megabytes to update an existing copy.

> 
> Search becomes local! It is no longer the search result which is
> propagated, it is the Link Index Database.
Yes, although your local database will always be somewhat older than
the one the database maintainer has (depending on how often you're able
to update it).

> Finally, I'm notified by all the network of updates. New
> contents are indexed by all the network continuously.
Yes, but GNUnet kind of does that already. The database differs only
in that it's moderated and allows discovering new stuff without
foreknowledge.

> 
> The drawback is that all peers must persist this database, and have
> enough disk space, but a few Gbits is not a problem?
Yes (also, see above about database sharing).

> And yes Gnunet doesn't work like this now.
Yes. Some things are pending features (DSA, always-namespace
searches), others are yet to be designed (basically what this
discussion is).

> 
> So what about that ?
> 
OK, i took some time to actually read [1] instead of just glancing
over the points.

Normal (global) search is like this (based on GHM 2010 talk):
K-block for keyword K contains payload R (payload is CHK and metadata).
H(K) is the hash of the keyword.
R is encrypted with a symmetric key derived from H(K), i.e. encrypted
payload is E_{H(K)} (R).
K is used to generate an RSA private/public key pair {PRIV_K, PUB_K}.
Publisher then appends PUB_K to E_{H(K)} (R), and signs all that with
PRIV_K to produce B, which is the K-block.
B can be produced by anyone; you only need to know the keyword.
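
To make the E_{H(K)} (R) step concrete, here is a minimal sketch of encrypting a payload with a key derived from the keyword hash. It is not GNUnet's actual code: the choice of SHA-512 for H, the toy SHA-256 counter keystream (a stand-in for a real symmetric cipher) and all function names are assumptions made for illustration only.

```python
import hashlib

def keyword_hash(keyword: str) -> bytes:
    """H(K): hash of the keyword (SHA-512 is an assumption of this sketch)."""
    return hashlib.sha512(keyword.encode("utf-8")).digest()

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key plus counter. NOT a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def crypt(keyword: str, data: bytes) -> bytes:
    """E_{H(K)}: XOR data with a keystream derived from H(K).

    XOR is its own inverse, so the same call encrypts and decrypts.
    """
    stream = keystream(keyword_hash(keyword), len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

# R: placeholder payload standing in for "CHK and metadata".
payload = b"example CHK and metadata"
encrypted = crypt("gpl", payload)
```

Anyone who knows (or guesses) the keyword can call crypt("gpl", encrypted) and get R back, which is exactly why the keyword acts as the shared "secret" in the global case.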

Query initiator goes through the same thing - computes {PRIV_K,
PUB_K}, then produces hash of the public key H(PUB_K) and sends it as
a query.

Also, anyone who somehow gets a K-block can check its signature
(verify that E_{H(K)} (R) + PUB_K was signed with PRIV_K; you only
need PUB_K itself for that).

Anyone who has the K-block can take the PUB_K part of it and hash
that, producing the same H(PUB_K), remember that hash, and then match
incoming queries against it. If it matches, then the query initiator
asked for the corresponding K-block.
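
The matching step can be sketched as a lookup table keyed by H(PUB_K). This is only an illustration: the dict-based K-block layout, the Peer class and the use of SHA-512 are assumptions of the sketch, and PUB_K is treated as opaque bytes.

```python
import hashlib

def query_hash(pub_k: bytes) -> bytes:
    """H(PUB_K): what the query initiator sends (SHA-512 assumed)."""
    return hashlib.sha512(pub_k).digest()

class Peer:
    """A node that remembers K-blocks and answers queries for them."""

    def __init__(self):
        self.store = {}  # H(PUB_K) -> list of K-blocks seen so far

    def remember(self, block: dict) -> None:
        # The peer only needs the PUB_K part of the block to index it;
        # it never learns the keyword from the block itself.
        self.store.setdefault(query_hash(block["pub_k"]), []).append(block)

    def answer(self, query: bytes) -> list:
        # A query matches if it equals the hash of a stored PUB_K.
        return self.store.get(query, [])

# Hypothetical K-block: field names are made up for this sketch.
block = {"pub_k": b"public key bytes", "encrypted_payload": b"...", "signature": b"..."}
peer = Peer()
peer.remember(block)
```

peer.answer(query_hash(b"public key bytes")) returns the stored block, while a query for any other hash returns an empty list.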

Since that stuff only depends on K, attackers can pre-compute PUB_K
and H(PUB_K) for any K's that they want to censor/monitor, but the
worst thing they can do is to refuse to re-transmit a query for K they
know. Or send valid K-blocks with garbage payload (yeah, that's
actually worse than dropping queries). But that is where namespaces
come in.




Now, what [1] proposes:
All K-blocks are published under _some_ namespace. The corner case is
the global namespace, private key for which is not a secret, so anyone
can share in that namespace.

PUB_N is the public key of a namespace, PRIV_N is the private key of a
namespace.
H(K+PUB_N) is the hash of the combination of keyword K and public key
of the namespace (as opposed to H(K), which is the hash of only the
keyword).
R is encrypted with a symmetric key derived from H(K+PUB_N), i.e.
encrypted payload is E_{H(K+PUB_N)} (R).
DSA private/public key pair {PRIV_K, PUB_K} is mathematically derived
from PRIV_N and H(K+PUB_N) (as opposed to generating it from K only).
Publisher then appends PUB_K to E_{H(K+PUB_N)} (R), and signs all that
with PRIV_K to produce B, which is the K-block.
B can NOT be produced by just anyone: you need to know the keyword and
the private key of the namespace (it can be produced by anyone for the
global namespace, as its PRIV_N is common knowledge; you just need to
know/guess the keyword).

Query initiator computes PUB_K (but not PRIV_K, as that requires
knowledge of PRIV_N) from PUB_N and H(K+PUB_N). PUB_N for the global
namespace is known by everyone; how to learn PUB_N for other
namespaces is not relevant here.
Then initiator produces hash of the public key H(PUB_K) and sends it
as a query.
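
One way such a derivation can work, sketched under loud assumptions: in a discrete-log group, the namespace owner can compute PRIV_K = PRIV_N * h mod q while everyone else computes PUB_K = PUB_N^h mod p, where h comes from H(K+PUB_N). Whether [1] uses exactly this construction is not confirmed here; the group parameters below are toy-sized (useless for real security) and all names are hypothetical.

```python
import hashlib

# Toy group for illustration only: P = 2*Q + 1, both prime, G has order Q.
P, Q, G = 2039, 1019, 4

def h_scalar(keyword: str, pub_n: int) -> int:
    """H(K+PUB_N) mapped to a non-zero exponent modulo Q (SHA-512 assumed)."""
    data = keyword.encode("utf-8") + b"|" + str(pub_n).encode("ascii")
    return 1 + int.from_bytes(hashlib.sha512(data).digest(), "big") % (Q - 1)

def derive_priv_k(priv_n: int, keyword: str) -> int:
    """Only the namespace owner can do this: it needs PRIV_N."""
    pub_n = pow(G, priv_n, P)
    return (priv_n * h_scalar(keyword, pub_n)) % Q

def derive_pub_k(pub_n: int, keyword: str) -> int:
    """Anyone can do this: it needs only PUB_N and the keyword."""
    return pow(pub_n, h_scalar(keyword, pub_n), P)

def query_hash(pub_k: int) -> str:
    """H(PUB_K), sent as the query."""
    return hashlib.sha512(str(pub_k).encode("ascii")).hexdigest()

priv_n = 777                # namespace private key (toy value)
pub_n = pow(G, priv_n, P)   # namespace public key
```

The point of the sketch is that pow(G, derive_priv_k(priv_n, "gpl"), P) equals derive_pub_k(pub_n, "gpl"): the searcher arrives at the same PUB_K (and hence the same query hash) as the publisher without ever learning PRIV_K.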

Anyone who somehow gets a K-block can check its signature (verify
that E_{H(K+PUB_N)} (R) + PUB_K was signed with PRIV_K; you only need
PUB_K itself for that).

Anyone who has the K-block can take the PUB_K part of it and hash
that, producing the same H(PUB_K), remember that hash, and then match
incoming queries against it. If it matches, then the query initiator
asked for the corresponding K-block.
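
To illustrate "you only need PUB_K to verify", here is a toy signature sketch. Note the swap: it uses a Schnorr-style signature rather than DSA, purely because it is shorter to write down; the group is toy-sized and every name here is an assumption of the sketch, not GNUnet's API.

```python
import hashlib
import secrets

# Toy group for illustration only: P = 2*Q + 1, both prime, G has order Q.
P, Q, G = 2039, 1019, 4

def challenge(r: int, message: bytes) -> int:
    """Hash the commitment and the message into an exponent modulo Q."""
    data = str(r).encode("ascii") + b"|" + message
    return int.from_bytes(hashlib.sha512(data).digest(), "big") % Q

def sign(priv_k: int, message: bytes) -> tuple:
    """Requires PRIV_K. Returns the pair (e, s)."""
    nonce = 1 + secrets.randbelow(Q - 1)
    r = pow(G, nonce, P)
    e = challenge(r, message)
    s = (nonce + e * priv_k) % Q
    return e, s

def verify(pub_k: int, message: bytes, signature: tuple) -> bool:
    """Requires only PUB_K."""
    e, s = signature
    # G^s * PUB_K^(-e) recovers the signer's commitment r when the
    # signature is valid; PUB_K has order Q, so -e is (Q - e) mod Q.
    r = (pow(G, s, P) * pow(pub_k, (Q - e) % Q, P)) % P
    return challenge(r, message) == e

priv_k = 321
pub_k = pow(G, priv_k, P)
# The signed data is the encrypted payload with PUB_K appended.
signed_part = b"encrypted payload" + str(pub_k).encode("ascii")
signature = sign(priv_k, signed_part)
```

verify(pub_k, signed_part, signature) returns True. With a group this small a forged or altered block would still slip through about once in a thousand tries, which is why real parameters are much larger.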

Since that stuff depends on K and PUB_N, attackers can pre-compute
PUB_K and H(PUB_K) for any pairs of K's and PUB_N's that they want to
censor/monitor, but the worst thing they can do is to refuse to
re-transmit a query for K and PUB_N combinations they know. Obviously
the size of the list of forbidden H(PUB_K) that they must know is
multiplied by N (number of namespaces they want to censor) from the
size it had when only K was used for queries.
But they cannot forge K-blocks, since that requires knowing PRIV_K,
which, unlike in the RSA case, is not known to everyone (except in the
global namespace, where they can still poison keywords with garbage
K-blocks).
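
The growth of the censor's list is simple arithmetic, sketched below. The hash of (K, PUB_N) is used here only as a stand-in for H(PUB_K), since PUB_K is determined by that pair; the keywords and namespace names are made up.

```python
import hashlib
from itertools import product

def blocked_query(keyword: str, pub_n: bytes) -> str:
    """Stand-in for H(PUB_K), which depends on both K and PUB_N."""
    data = keyword.encode("utf-8") + b"|" + pub_n
    return hashlib.sha512(data).hexdigest()

keywords = ["gpl", "lgpl", "agpl"]             # K's the censor knows
namespaces = [b"namespace-a", b"namespace-b"]  # PUB_N's the censor knows

# One forbidden hash per (keyword, namespace) pair.
blocklist = {blocked_query(k, n) for k, n in product(keywords, namespaces)}
```

Here len(blocklist) is len(keywords) * len(namespaces), i.e. 6 entries instead of the 3 a keyword-only scheme would need.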



Now, back to the matter at hand.

We know that searching in a namespace will be secure. The remaining
problem is pushing publications to the anonymous database maintainer.

Your diagram mostly matches what i think, by the way, but after
thinking a bit i would now prefer to use the term "database
maintainer", not "database owner" ("owning" is not an activity we
should be concerned with, "maintaining" is).
Also, it doesn't really matter how uploader's namespace will be named.
While it will be beneficial for people to be able to query it directly
(doing normal namespace searches, once they learn of that namespace),
it's not a requirement that this namespace is advertised in any way -
people will eventually get CHKs for files in that namespace from the
database, without ever knowing uploader's namespace.

Anyway, the problem is that for normal K-block publication we have a
common shared "secret" - the keyword, known to both publisher and
query initiator. The cryptography builds on that.
For reverse publications we have nothing of the sort. Database
maintainer does not have any search criteria other than his own PUB_N
or things derived from it.
So it will go like this:
Uploader will publish a K-block in global namespace. It will be a
normal K-block, findable by a normal global namespace search.
The difference is that it will correspond to a keyword K that is
computed simply as H(PUB_N) (PUB_N is the public key of database
maintainer's namespace), and then it will go through all the normal
transformations K goes through. It won't be something people will
normally find or search for. Also, it will be easily guessable, since
PUB_N will, at some point, be well known to everyone. So censoring
this K-block (not re-transmitting queries for it) for adversaries will
be as easy as censoring a K-block for any other keyword they know in
advance and that doesn't change over time. It's not much worse than
normal K-block censorship, but for normal publications you can (and
will) use multiple different K, and my hope is that at least some of
them will not be well-known in advance (they will be made known
outside of GNUnet, at the moment of publication or shortly after it).
This H(PUB_N)-as-K will be a sitting duck, and won't change for a long
time.
Anyway, database maintainer will make a query using that keyword.
I expect that database maintainer will have to update the namespace key
pair every now and then to avoid running out of Bloom filter space
(statistically it's large, but running the same search again and
again, getting tons of results, and then filtering them might give
too many false positives).
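
The rendezvous keyword itself is trivial to compute, which is both the point and the weakness. A minimal sketch, assuming SHA-512 as H and a hex string as the keyword encoding (both are assumptions, as is the function name):

```python
import hashlib

def submission_keyword(pub_n: bytes) -> str:
    """K = H(PUB_N): the keyword under which uploads for a database are found."""
    return hashlib.sha512(pub_n).hexdigest()

# Placeholder bytes standing in for the maintainer's namespace public key.
maintainer_pub_n = b"public key of the database namespace"

# The uploader and the maintainer compute this independently;
# neither needs anything beyond the (well-known) PUB_N.
uploader_keyword = submission_keyword(maintainer_pub_n)
maintainer_keyword = submission_keyword(maintainer_pub_n)

# Rotating the namespace key pair yields a fresh keyword.
rotated_keyword = submission_keyword(b"public key after rotation")
```

Both sides end up with the same keyword, and so does any adversary who knows PUB_N. A new key pair changes the keyword, which is the only way this K ever changes.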

The other difference is that it will carry a link (not a CHK, that's a
link to the file; what's the acronym for namespace links? NSK? i
forgot...) to uploader's own namespace, in which the file must _also_
be published.
This is also akin to [2] (i'm still not sure how [2] will be
implemented; the idea that _i_ want to see in action is that finding
_any_ KBlock in global namespace for stuff that is _also_ published in
a non-global namespace should allow you to learn that non-global
namespace, as right now you need to find that non-global namespace
first, _then_ search in it).
If upload goes through, that namespace will later be used by the
uploader to update his links in the database: database maintainer will
search for K of that K-block in that uploader's namespace, and update
the database from the search results -> much narrower, easier to do.
And/or database maintainer will just search the root element of that
namespace, and update the database from search results (i.e. uploader
will be able to not only update already published links, but also
publish new ones much faster). Again, the root element of that
namespace will have to have a special, agreed-upon format.

The K-block in question will also have some database-specific
information (category, etc, although THAT kind of info should not be
tied to that database implementation, and be made part of
GNUnet/libextractor specs and be usable by everyone).
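
For illustration, the information such a K-block would carry could look roughly like this. Nothing here is an agreed format: every field name, the example values and the acceptance check are hypothetical, just a way to picture what the maintainer would receive and pre-filter.

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    """Hypothetical contents of a submission K-block payload."""
    chk_uri: str              # link to the file itself
    uploader_namespace: str   # link to the uploader's own namespace
    category: str             # database-specific classification
    extra: dict = field(default_factory=dict)  # other metadata

def accept(sub: Submission, known_categories: set) -> bool:
    """Cheap first-pass filter before any real moderation happens."""
    return (
        bool(sub.chk_uri)
        and bool(sub.uploader_namespace)
        and sub.category in known_categories
    )

categories = {"software", "music", "books"}
good = Submission("chk-placeholder", "namespace-placeholder", "software")
bad = Submission("chk-placeholder", "", "software")
```

accept(good, categories) passes, while accept(bad, categories) is rejected because it has no uploader namespace to follow up on later.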




Encrypting things with PUB_N of the database namespace doesn't really
give us anything (also, it's not a good idea to fill the net with
K-blocks that only 1 node in the whole network can decode, ever).
Database maintainer can't specify "only decodable by my private key"
as a search criterion. So instead these uploads will be public, with
extra metadata attached.

Database maintainer will need to have a good machine to filter out
spam and fakes though.

Also, i think i should note that if you're thinking of mimicking
torrent indexers (moderated collections of well-categorized links) in
GNUnet this way, you should remember that you can't put ads on the
content of the database (or, rather, you can, but someone will just
automatically strip them off and re-publish the database without them,
and people will use that version instead). And you won't have a
web-site to show ads on, to accept donations on or to sell stuff
(unless you want to just throw away anonymity/censorship resistance,
and play whack-a-mole the same way TPB does right now; or the database
viewer and format will be proprietary, in which case you won't see any
cooperation from us; and that still won't work in the long term).
So these databases will have to be numerous, small, easy to moderate,
and moderators will have to work for free.
Or you'd have to go back to variant one, where database only contains
things that database maintainer (who could be a group of persons)
discovered on his own - that way database maintainer won't need to
wade through swamps of spam/fakes targeted at him, just through loads
of crap that will populate the global namespace. Or cherry-pick other
people's namespaces (which the maintainer will learn passively, on his
own).

At GHM 2010 Grothoff also mentioned using web of trust to create
chains of namespaces to reduce spam - that could be used to great
effect (database maintainer will [eventually] learn a number of
passively trusted namespaces, and will be able to restrict [some]
searches to the WoT that spreads from these namespaces).
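
Restricting searches to "the WoT that spreads from these namespaces" amounts to a bounded graph walk. A minimal sketch, assuming trust is represented as a mapping from a namespace to the namespaces it endorses; the representation, the depth limit and the names are all assumptions:

```python
from collections import deque

def trusted_namespaces(seeds, endorsements, max_depth):
    """Breadth-first walk from the seed namespaces, at most max_depth hops."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        namespace, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow endorsements beyond the limit
        for endorsed in endorsements.get(namespace, ()):
            if endorsed not in seen:
                seen.add(endorsed)
                queue.append((endorsed, depth + 1))
    return seen

# Hypothetical endorsement graph: "a" endorses "b", "b" endorses "c", etc.
endorsements = {"a": ["b"], "b": ["c"], "c": ["d"], "x": ["y"]}
reachable = trusted_namespaces(["a"], endorsements, max_depth=2)
```

With "a" as the only passively trusted namespace and a limit of two hops, reachable is {"a", "b", "c"}; "d" is too far away and "x"/"y" are not connected at all, so spam published there is never searched.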

By the way, i'm not discussing the way moderators will communicate
with each other and synchronize the database updates among themselves.
They may or may not be anonymous to one another, and may or may not
use GNUnet for this.

[1] https://gnunet.org/bugs/view.php?id=2564
[2] https://gnunet.org/bugs/view.php?id=2185

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (MingW32)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQnAziAAoJEOs4Jb6SI2Cw0v4H/01qB6N785PxBodCV84S4EDx
CxUALMRTtChoGmByPo4uGfUdAYBKTy2mFQtdsDaPMzsH74ZTGFcbw4mLLhCwV0aA
bsxNeRORYac79/rqm05oh4/6VMsU5feFucaWakvgVkyM6/EmzUwKHv7PDd5kcbjZ
pMNMPqOLMtxaAopMHdXGYDzmCNEw04D/4CyuazQNyofU+rkbRV3dKyW1he960DpG
+PyHMkqDty34KQZIOGHj5M7ngAZZn3OIuqlTsc6Ywf3M/SnstIF70fbe9uqQeYl4
NGpIMOd64E4nZJC2ffxprSEbHh+6n/SHSe1R3qT1JTRHytchVNEqumfa9M3IqwQ=
=EbP7
-----END PGP SIGNATURE-----


