[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[lsd0003] branch master updated: fix BF/CBF section
From: |
gnunet |
Subject: |
[lsd0003] branch master updated: fix BF/CBF section |
Date: |
Wed, 13 Jan 2021 23:20:58 +0100 |
This is an automated email from the git hooks/post-receive script.
grothoff pushed a commit to branch master
in repository lsd0003.
The following commit(s) were added to refs/heads/master by this push:
new ff4b147 fix BF/CBF section
ff4b147 is described below
commit ff4b147870e751f9712cd26aad77a73c76acac11
Author: Christian Grothoff <christian@grothoff.org>
AuthorDate: Wed Jan 13 23:20:12 2021 +0100
fix BF/CBF section
---
draft-summermatter-set-union.xml | 96 ++++++++++++++++++++++++----------------
1 file changed, 58 insertions(+), 38 deletions(-)
diff --git a/draft-summermatter-set-union.xml b/draft-summermatter-set-union.xml
index f39a4dc..74fc17d 100644
--- a/draft-summermatter-set-union.xml
+++ b/draft-summermatter-set-union.xml
@@ -177,34 +177,41 @@
<section anchor="background" numbered="true" toc="default">
<name>Background</name>
<section anchor="bf" numbered="true" toc="default">
- <name>Bloom Filter</name>
+ <name>Bloom Filters</name>
<t>
- A Bloom Filter (BF) is a space-efficient datastructure to
test if am element is part of a set of elements.
- Since a Bloom filter is a probabilistic datastructure its
possible to have false positives but false negatives
- are not possible.
+ A Bloom filter (BF) is a space-efficient datastructure to
test if am element is part of a set of elements.
+ Since a BF is a probabilistic datastructure, it is
possible to have false-positives: when asked
+ if an element is in the set, the answer from a BF is
either "no" or "maybe".
</t>
<t>
- A BF consists of multiple buckets, every bucket can be set
to 0 or 1. In the beginning all buckets are set
- to 0. To add an element to the BF the corresponding
buckets are set to 1. To map an element on the array of buckets
- a mapping function M is required. The mapping function is
non-injective an takes an element as input and outputs a
- deterministic bit stream of the length of the BF. The
mapping function is described by the following mathematical equation:
+ A BF consists of L buckets. Every bucket is a binary value
that can be either 0 or 1. All buckets are initialized
+ to 0. A mapping function M is used to map each element of
the set to a subset of k buckets. M is non-injective
+ and can thus map the same element multiple times to the
same bucket.
+ The type of the mapping function can thus be described by
the following mathematical notation:
</t>
<figure anchor="bf_mapping_function_math">
<artwork name="" type="" align="left" alt=""><![CDATA[
- --------------------------------
- # M: E->B^k (B=Z/l)
- --------------------------------
- # l = Number of bits per element
- # B = 0,1,2,3,4,...l
- # k = Number of buckets
- # E = Element from the set
- # Z = Natural Numbers Mod l
- --------------------------------
- Example: l=256, k=3
- M(E) = {4,6,255}
+ ------------------------------------
+ # M: E->B^k
+ ------------------------------------
+ # L = Number of buckets
+ # B = 0,1,2,3,4,...L-1 (the buckets)
+ # k = Number of buckets per element
+ # E = Set of elements
+ ------------------------------------
+ Example: L=256, k=3
+ M('element-data') = {4,6,255}
]]></artwork>
</figure>
+ <t>
+ A typical mapping function is constructed by hashing the
element, for example
+ using the well-known HKDF construction (FIXME: cite HKDF
RFC!).
+ </t>
+ <t>
+ To add an element to the BF, the corresponding buckets
under the map M are set to 1.
+ To check if an element may be in the set, one tests if all
buckets under the map M are set to 1.
+ </t>
<t>
Further in this document a bitstream outputted by the
mapping function is represented by
a set of numeric values for example (0101) = (2,4).
@@ -236,8 +243,13 @@
]]></artwork>
</figure>
<t>
- Its not possible to remove an element from the BF because
buckets can only be set to 1 or 0 so its not possible to
- differentiate between buckets containing one or multiple
elements. To remove elements from the BF a <xref target="cbf" format="title" />
+ The parameters L and k depend on the set size and must be
+ chosen carefully to ensure that the BF does not return too
+ many false-positives.
+ </t>
+ <t>
+ It is not possible to remove an element from the BF
because buckets can only be set to 1 or 0. Hence it is impossible to
+ differentiate between buckets containing one or more
elements. To remove elements from the BF a <xref target="cbf" format="title" />
is required.
</t>
</section>
@@ -245,13 +257,13 @@
<section anchor="cbf" numbered="true" toc="default">
<name>Counting Bloom Filter</name>
<t>
- A Counting Bloom Filter (CBF) is an extension of the <xref
target="bf" format="title" />. In the CBF the binary filed of the bucket is
replaced by
- an unsigned counter. This allows the removal of an
elements from the CBF.
+ A Counting Bloom Filter (CBF) is an extension of the <xref
target="bf" format="title" />. In the CBF, buckets are
+ unsigned numbers instead of binary values. This allows the
removal of an elements from the CBF.
</t>
<t>
- Adding an element to the CBF is similar to the adding
operation of the BF but instead of setting the bucket on hit to 1 the bucket
- is increased by 1. For example if two colliding elements
M(element1) = (1,3) and
- M(element2) = (0,3) are added to the CBF bucket 0 and 1
are set to 1 and bucket 3 (the colliding bucket) is set
+ Adding an element to the CBF is similar to the adding
operation of the BF. However, instead of setting the bucket on hit to 1 the
+ numeric value stored in the bucket is increased by 1. For
example if two colliding elements M(element1) = (1,3) and
+ M(element2) = (0,3) are added to the CBF, bucket 0 and 1
are set to 1 and bucket 3 (the colliding bucket) is set
to 2:
</t>
<figure anchor="figure_cbf_insert_0">
@@ -263,11 +275,10 @@
]]></artwork>
</figure>
<t>
- The order of a bucket is defined by the counter, if a
bucket contains two elements the counter is set to 2 so the order of
- the bucket is two.
+ The counter stored in the bucket is also called the order
of the bucket.
</t>
<t>
- To remove an element form the CBF the counter is decreased
by 1.
+ To remove an element form the CBF the counters of all
buckets the element is mapped to are decreased by 1.
</t>
<t>
Removing M(element2) = (1,3) from the CBF above:
@@ -280,19 +291,28 @@
+-------------+-------------+-------------+-------------+
]]></artwork>
</figure>
+ <t>
+ In practice, the number of bits available for the counters
is usually finite. For example, given a 4-bit
+ counter, a CBF bucket would overflow once 16 elements are
mapped to the same bucket. To efficiently
+ handle this case, the maximum value (15 in our example) is
considered to represent "infinity". Once the
+ order of a bucket reaches "infinity", it is no longer
incremented or decremented.
+ </t>
+ <t>
+ The parameters L and k and the number of bits allocated to
the counters should depend on the set size.
+ An IBF will degenerate when subjected to insert and remove
iterations of different elements, and eventually all
+ buckets will reach "infinity". The speed of the degradation
will depend on the choice of L and k in
+ relation to the number of elements stored in the IBF.
+ </t>
</section>
</section>
<section anchor="ibv" numbered="true" toc="default">
<name>Invertible Bloom Filter</name>
<t>
- A Invertible Bloom Filter (IBF) is a further extension of the
<xref target="cbf" format="title" />. The IBF extends the <xref target="cbf"
format="title" /> with two more operations,
- beside insert and remove the IBF supports a decode and set
difference operation. This two extra operations allows the
- IBF to calculate small differences in big sets very
efficiently.
-
- IBF are useful when there is the need to check a set of
elements for the presence or absence of some
- elements in a big set efficiently.
-
+ An Invertible Bloom Filter (IBF) is a further extension of the
<xref target="cbf" format="title" />.
+ An IBF extends the <xref target="cbf" format="title" /> with
two more operations:
+ decode and set difference. This two extra operations are
useful to efficiently extract
+ small differences between large sets.
</t>
<section anchor="ibf_format" numbered="true" toc="default">
<name>Format</name>
@@ -556,7 +576,7 @@
<section anchor="modeofoperation" numbered="true" toc="default">
<name>Mode of operation</name>
<t>
- The set union protocol uses the above discussed topics
Invertible Bloom Filter and Strata Estimators.
+ The set union protocol uses the above discussed topics IBF and
Strata Estimators.
Depending on the state of the two sets there are different
strategies or operation modes how to efficiently
determinate missing elements in two sets of two peers.
</t>
@@ -848,7 +868,7 @@
<section anchor="messages_ibf_description" numbered="true"
toc="default">
<name>Description</name>
<t>
- The Invertible Bloom Filter message contains a slices
of the IBF.
+ The IBF message contains a slice of the IBF.
</t>
<t>
The <em>IBF</em> message is sent at the start of the
protocol from the initiating peer in the transaction
--
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [lsd0003] branch master updated: fix BF/CBF section,
gnunet <=