Bloom Filter
Bloom Filter
What’s Bloom Filter
A Bloom filter is a data structure that’s used to test whether an element is a member of a set.
The primary advantage of a Bloom filter over other data structures is that it’s incredibly space-efficient
when dealing with large data sets.
However, this comes with a trade-off: while Bloom filters can definitively state if an element is not in the set,
it can only probabilistically determine if an element is in the set.
This means it has a certain rate of false positives.
In a Bloom filter, data is passed through several hash functions, with each function mapping the input to a
position within a bit array.
When an element is added to the Bloom filter, bits at the hashed positions are set to 1.
When testing for the presence of an element, the filter checks whether the bits at the hashed positions are set.
If they are, the filter returns “possibly in set”. If not, it returns “definitely not in set”.
Simple Java Implementation
1 | import java.util.BitSet; |
Unit tests1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
public class BloomFilterTest {
private BloomFilter bloomFilter;
@Before
public void setup() {
bloomFilter = new BloomFilter();
}
@Test
public void testAddAndContains() {
String testValue = "test";
bloomFilter.add(testValue);
Assert.assertTrue(bloomFilter.contains(testValue));
}
@Test
public void testDoesntContain() {
String testValue = "test";
String otherValue = "other";
bloomFilter.add(testValue);
Assert.assertFalse(bloomFilter.contains(otherValue));
}
@Test
public void testEmpty() {
String testValue = "test";
Assert.assertFalse(bloomFilter.contains(testValue));
}
}