[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[GNUnet-developers] Keyword queries shouldn't reveal the namespace

From: Kenneth Almquist
Subject: [GNUnet-developers] Keyword queries shouldn't reveal the namespace
Date: Sun, 24 Jun 2012 22:35:45 -0400

1.  The Issue

In GNUNET, keywords are passed through a one-way function before being
published or queried.  However, the namespace is published and included
in all queries.  There are several reasons why this is undesirable.

First, namespaces, like keywords, may be targets for censorship.  In
fact, the contents of a namespace are likely to represent a particular
viewpoint.  This should make them a particularly attractive target for
censorship by Western governements, which generally want to censor
particular types of materials in a way that doesn't restrict the
distribuation of other types of materials that are not targets for
censorship.  GNUNET uses a one-way function to hide the keywords in
queries.  It should hide the namespace as well, for the same reasons.

Second, the content of a GNUNET query can be determined by guessing.
The current design makes this type of attack unnecessarily easy by
revealing the namespace, meaning that the attacker only has to guess
the keyword, rather than being forced to guess a pair of values consisting
of a namespace and a keyword.

Third, specifying the namespace in a query (rather than making all queries
look alike to any attaker which fails to guess the query) might make
attacks based on traffic analysis easier.

Fourth, the current design treats queries of the global namespace
differently that queries of other namespaces.  This means that GNUNET
has to support two types of keyword searches, which makes the system
more complex at a conceptual level, and may make the code more complex
as well.  Consider, for example, that GNUNET currently guarantees that
if a response to a query in the global namespace contains a valid
signature, then the response was generated by someone who knew the
keyword.  However, if the query was for a namespace other than the
global namespace, then there is no such guarantee.  Having a single
keyword search mechanism that applied to all namespaces would eliminate
this sort of discrepancy, making the security properties of GNUNET
simpler and easier to understand and reason about.

2.  Our Proposal

In the current design of GNUNET, searches in the global namespace use
K-blocks.  Our proposal, in a nutshell, is to make a K-blocks use a
<namespace, keyword> pair, rather than just a keyword, as the search
key.  KS-blocks, currently use for searches of a keyword in namespaces
other than the global namespace, go away.

The remainder of this section explains the technical details of how this

2.1.  DSA

Our design uses DSA to generate digital signatures.  To use DSA, it is
necessary to select three parameters:  p, q, and g.  A single set of
parameters should be chosen for use by all GNUNET participants.

Private keys in DSA are integers in the range 1 thought q - 1.  To simplify
the exposition, we will assume for now that DSA also allows 0 to be used
as a private key.  We will revisit this assumption in section 2.6.

If x is a private key, the corresponding public key is g^x mod p.

2.2.  Namespaces

As with the current GNUNET design, a public/private key pair define a

One namespace is designated as the global namespace, and its private
key is made public so that anyone can publish to the global namspace.
Any namespace could be used as the global namespace, but for aesthetic
reasons I would chose 0 as the private key for the global namespace.

2.3.  Generating the Signature Key for K-blocks

Each K-block is signed by a key which is derived from the namespace and
the keyword.  Let x be the private key of the namespace.  Let h be a
value, in the range 0 through q-1, generated by hashing a combination of
the public key of the namespace and the keyword.  The private key used
to sign the K-block is x+h mod q.

This prevents a K-block from being forged by anyone who doesn't know
both the namespace private key and the keyword, because without knowing
these it is impossible to generate a valid signature.

We will discuss the details of signing in sections 2.6 thorough 2.8, but
first we consider query generation.

2.4.  Generating Queries

To perform a query, it is necessary to know the public key which
corresponds to the private key used to sign the K-blocks being
searched for.  The private key is x+h mod q, so the public key is:

        g^(x+h mod q) mod p

To compute the this without knowing the value of x (which is the
private key of the namespace), we proceed as follows.  DSA requires
the parameters be chosen such that

        g^q mod p = 1


        g^(x+h mod q) mod p = g^(x+h) mod p

Standard algebra tells us that

        g^(x+h) mod p = (g^x * g^h) mod p
                      = ((g^x mod p) * g^h) mod p

g^x mod p is the namespace public key, and h is a hash of the namespace
public key and the keyword.  Therefore, a query can be generated by
anyone who knows both the namespace public key and the keyword.

2.5.  Contents of a K-block

A K-block contains the following values, which are essentially the
same as in the existing design:

1)  The payload (typically a URI followed by metadata).  This is the
data returned by search that matches the K-block.  The payload is
encrypted using a symmetric encryption key generated from a hash of
the namespace public key and the search keyword.  This means that only
someone who knows the search criteria (namespace and keyword) can decode
the payload.

2)  The public key used to sign the payload.  Since this key is derived
from the namespace and the keyword, it (or a hash of it) can be used
as the search key.

3)  A digital signature of the encrypted payload using the public key
listed above.  A valid signature can only be generated by someone who
knows both the private key of the namespace being searched and the
keyword being searched for, so the signature can be checked to filter
out bogus K-blocks.

2.6.  Zero as a Private Key

As noted in section 2.1, the DSA specification does not permit a private
key of zero.  There are two ways we can deal with this.  One is to modify
the procedures described in sections 2.3 and 2.4 to avoid a private key
of zero.  If the sum x+h mod q turns out to be zero, rather than using zero
as the private key, we hash a combination of the namespace public key and
the keyword to produce a value in the range 1 through q-1, and use that
value as the private key.  In section 2.4, we don't compute the private
key, so rather than comparing the private key to 0, we compare the public
key to the public key which corresponds to a private key of zero.

Although the approach described in the preceding paragraph would work,
it appears unnecessary.  The requirement that the private key be
non-zero doesn't appear to be necessary for correctness, because a
look through the proof of correctness for DSA reveals no steps that
depend upon the assumption that the private key is nonzero.  Nor is
the requirement that the private key be non-zero necessary for security,
because the difference in security between DSA variants with or without
the requirement that the private key is nonzero can be proved to be

We now present the proof referred to in the preceding sentence.  In
what follows, we will use the name DSA1 to refer to the standard DSA
algorithm, where the minimum value for the private key is 1, and DSA0
to refer to a variant where then minimum value of the private key is
zero rather than one.

Suppose that we have an algorithm that will break DSA0 in an expected
time T, assuming that the private key is randomly selected with a uniform
distribution.  This algorithm will also break DSA1.  To determine maximum
run time for this algorithm when used to break DSA1, we assume that the
run time of the algorithm is zero when used to break an instance of DSA0
where the private key happens to be zero.  It then follows that the
expected run time if the private key is non-zero will be T * (q/(q-1)).
Since the actual run time for a zero key must be greater than zero, the
actual time to break DSA1 must be less than T * (q/(q-1)).  For a
realistic value of q, the difference between 1 and (q/(q-1)) is trivial.

Conversely, suppose that we have an algorithm that will break DSA1 in
an expected time T.  We can break DSA0 by first checking whether the
public key is 1 (which corresponds to a private key of 0), and then
running the algorithm to break DSA1 if the check fails.  The expected
run time will be C + T * ((q-1)/q), where C is the time required to
compare the public key with 1.  Assuming that DSA has any meaningful
resistance to attack, the value of C will be trivial compared to the
value of T.

In short, the security of DSA0 and DSA1 are essentially identical, so
there is no reason not to use DSA0.

2.7.  Generating a value of k for signing K-blocks

Constructing a digital signature using DSA requires chosing a value of k
in the range 1 through q-1.  We want to chose k using a deterministic
fashion so that if the same K-block is published more than once, the
signature won't change.  In addtion, we want to be sure that we don't
chose k in a way that will help an attacker.  To generate k, we compute
a cryptographic hash of the private key combined with the data being
signed.  This ensures that:

1)  At attacker who does not know the private key cannot determine the
    value of k.

2)  The probability of using the same value of k for two different
    signatures, if either the signing key or the data being signed is
    different, is the same as if k were chosen using a true random
    number generator.

3)  Repeatedly signing a given value with a given key will always use
    the same value of k, thereby producing the same signature.

The data being signed (the payload) is encrypted.  We could use the
unencrypted payload rather than the encrypted payload as input to
the hash function.  It doesn't matter which we use if the hash function
is secure.  If the hash function is not secure, using the unencrypted
payload won't really help because anyone who received the K-block as
a query result will be able to decrypt the payload.  So we specify that
the encrypted payload is to be used as input to the hash function, but
that choice is arbitrary.

2.8.  Avoiding Collisions

In section 2.3, we specify that h is based on a hash of the namespace
public key and the keyword.  In this section, we explain why it is
necessary to include the namespace public key in the input to the hash.

Suppose that the only input to the hash was the keyword.  Then if we
had two keywords K1 and K2, we would have corresponding hash values
h1 and h2 which would be independent of the namespace involved in
the query.

Suppose further that we have a namespace N1 for which we know the
private key x1.  The private key associated with a search for K1 in
namespace N1 is then (x1 + h1) mod q.  We can then define a new namespace
N2 with a private key x2 = (x1 + h1 - h2) mod q.  The private key associated
with a search for K2 in namespace N2 will then be
    (x2 + h2) mod q = (x1 + h1 - h2 + h2) mod q = (x1 + h1) mod q
In other words, we have produced a collision where the searches <N1, K1>
and <N2, K2> both produce the same search key.  Including the namespace
public key as an input to the hash prevents this attack.

3.  Conclusion

The design in section 2 shows that it is technically feasible to hide
the namespace as well as the keyword in GNUNET queries.  That design
used DSA, because it has a long track record, but it should be possible
to substitute a different ElGamel variant if desired.

One of the stated goals of GNUNET is to provide deniability.  Hiding
the namespace in queries would improve the ability of GNUNET to meet
that goal.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]