Research
Approximate Membership Functions

Approximate Membership Functions

On this page

Approximate Membership Functions (AMFs) are probabilistic data structures that efficiently can determine membership of a set, at the cost of false positives. They are typically much smaller than a full dataset, making them a useful pre-filtering method.

AMFs have been investigated in the context of reducing the number of HTTP requests when querying over a Triple Pattern Fragments interface.

1. Materials

2. Main findings

Learn more in our academic article.

AMFs lead to faster complete results

Due to the reduction of HTTP requests, complete results come in earlier. In some cases, the first result can be delayed.

Caching significantly speeds up query execution

An HTTP cache like NGINX achieves the best results, but additionally caching AMF filters server-side is not worth the effort.

Extreme false-positive probabilities slow down query execution

On average, a false-positive probability of 1/64 leads to the lowest overall query evaluation times for this experiment.

Query times for different false-positive probabilities

3. Recommendations for data publishers

Based on the conclusions of our experimental results, we derived the following guidelines for publishers who aim to use the AMF feature:

Enable HTTP caching with a tool such as NGINX.
Pre-compute AMFs (or at least cache) AMFs of size 10.000 or higher.
If AMFs can be cached, prefer Bloom filters over GCS.
Use a false-positive probability of 1/64.