Approximate Membership Functions

On this page

    Approximate Membership Functions (AMFs) are probabilistic data structures that efficiently can determine membership of a set, at the cost of false positives. They are typically much smaller than a full dataset, making them a useful pre-filtering method.

    AMFs have been investigated in the context of reducing the number of HTTP requests when querying over a Triple Pattern Fragments interface.

    1. Materials

    2. Main findings

    Learn more in our academic article.

    AMFs lead to faster complete results

    Due to the reduction of HTTP requests, complete results come in earlier. In some cases, the first result can be delayed.

    Query times for F3

    Caching significantly speeds up query execution

    An HTTP cache like NGINX achieves the best results, but additionally caching AMF filters server-side is not worth the effort.

    Query times for caching

    Extreme false-positive probabilities slow down query execution

    On average, a false-positive probability of 1/64 leads to the lowest overall query evaluation times for this experiment.

    Query times for different false-positive probabilities

    3. Recommendations for data publishers

    Based on the conclusions of our experimental results, we derived the following guidelines for publishers who aim to use the AMF feature:

    • Enable HTTP caching with a tool such as NGINX.
    • Pre-compute AMFs (or at least cache) AMFs of size 10.000 or higher.
    • If AMFs can be cached, prefer Bloom filters over GCS.
    • Use a false-positive probability of 1/64.