Approximate Membership Functions
On this page
Approximate Membership Functions (AMFs) are probabilistic data structures that efficiently can determine membership of a set, at the cost of false positives. They are typically much smaller than a full dataset, making them a useful pre-filtering method.
AMFs have been investigated in the context of reducing the number of HTTP requests when querying over a Triple Pattern Fragments interface.
1. Materials
- Academic article (Initial work that was built upon)
- Reproducible experiments
- AMF-enabled Comunica engine
- AMF-enabled LDF Server
2. Main findings
Learn more in our academic article.
AMFs lead to faster complete results
Due to the reduction of HTTP requests, complete results come in earlier. In some cases, the first result can be delayed.
Caching significantly speeds up query execution
An HTTP cache like NGINX achieves the best results, but additionally caching AMF filters server-side is not worth the effort.
Extreme false-positive probabilities slow down query execution
On average, a false-positive probability of 1/64 leads to the lowest overall query evaluation times for this experiment.
3. Recommendations for data publishers
Based on the conclusions of our experimental results, we derived the following guidelines for publishers who aim to use the AMF feature:
- Enable HTTP caching with a tool such as NGINX.
- Pre-compute AMFs (or at least cache) AMFs of size 10.000 or higher.
- If AMFs can be cached, prefer Bloom filters over GCS.
- Use a false-positive probability of 1/64.