Link Traversal



    Link-Traversal-based Query Processing (LTQP) is a querying paradigm that enables querying over an interlinked set of Linked Data documents by following links between them.

    If you're mainly interested in Link Traversal from a Solid perspective, you can find details here.

    Research is being done on LTQP through various implementations in Comunica. This page summarizes ongoing work, and provides links to demos.
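
    To make the paradigm concrete, the following is a minimal, self-contained sketch of traversal over a toy in-memory set of documents. The `WEB` dictionary, the `example.org` URLs, and the helper names are assumptions made for illustration. This is not how Comunica implements LTQP: a real engine dereferences documents over HTTP and evaluates a SPARQL query while it traverses.

```python
from collections import deque

# Hypothetical in-memory "web": document URL -> triples found in that document.
WEB = {
    "https://example.org/alice": [
        ("https://example.org/alice#me", "foaf:name", "Alice"),
        ("https://example.org/alice#me", "foaf:knows", "https://example.org/bob#me"),
    ],
    "https://example.org/bob": [
        ("https://example.org/bob#me", "foaf:name", "Bob"),
        ("https://example.org/bob#me", "foaf:knows", "https://example.org/carol#me"),
    ],
    "https://example.org/carol": [
        ("https://example.org/carol#me", "foaf:name", "Carol"),
    ],
}


def document_of(iri):
    """Strip the fragment so that an IRI points at the document describing it."""
    return iri.split("#", 1)[0]


def traverse(seed_urls):
    """Breadth-first traversal: read a document, keep its triples, queue the IRIs it mentions."""
    queue = deque(seed_urls)
    visited = set()
    triples = []
    while queue:
        url = queue.popleft()
        if url in visited or url not in WEB:
            continue
        visited.add(url)
        for s, p, o in WEB[url]:
            triples.append((s, p, o))
            # Follow subject and object IRIs; plain strings stand in for literals.
            for term in (s, o):
                if term.startswith("https://"):
                    queue.append(document_of(term))
    return triples, visited


# Start from a single seed document and answer "all names" over what was reached.
triples, visited = traverse(["https://example.org/alice"])
names = sorted(o for _, p, o in triples if p == "foaf:name")
print(names)
```

    Starting from Alice's document alone, the traversal reaches Bob's and Carol's documents through the `foaf:knows` links, so all three names end up in the result.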

    Experimental Implementations

    A dedicated (mono)repository has been created that contains actors for enabling LTQP inside Comunica.

    Since there are multiple approaches for handling LTQP, the repository provides multiple configurations, each targeting a different use case.

    Main findings

    Below, you can read the high-level findings of our link traversal experiments.

    We have implemented link discovery actors dedicated to the structural properties of Solid data pods, such as their reliance on LDP containers, and the Solid type index. We have evaluated their performance using the SolidBench benchmark.

    Learn more in our academic article.
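
    As a rough illustration of what such link discovery could look like, the sketch below extracts links from an LDP container listing and from a Solid type index. The predicate IRIs (`ldp:contains`, `solid:forClass`, `solid:instance`, `solid:instanceContainer`) come from the LDP and Solid vocabularies. The pod URL, the class IRIs, the function names, and the idea of passing in the classes mentioned by the query are assumptions for illustration, not the actual Comunica actors.

```python
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"
SOLID = "http://www.w3.org/ns/solid/terms#"

# Hypothetical pod and vocabulary, used only for this example.
POD = "https://pod.example/alice/"
POST = "http://example.org/vocab#Post"
COMMENT = "http://example.org/vocab#Comment"


def ldp_links(triples):
    """Links from an LDP container: every resource the container says it contains."""
    return [o for _, p, o in triples if p == LDP_CONTAINS]


def type_index_links(triples, query_classes):
    """Links from a type index: only locations registered for classes the query asks about."""
    class_of = {s: o for s, p, o in triples if p == SOLID + "forClass"}
    return [
        o
        for s, p, o in triples
        if p in (SOLID + "instance", SOLID + "instanceContainer")
        and class_of.get(s) in query_classes
    ]


# Triples as they might appear in the pod's root container document.
container = [
    (POD, LDP_CONTAINS, POD + "posts/"),
    (POD, LDP_CONTAINS, POD + "comments/"),
    (POD, LDP_CONTAINS, POD + "profile/"),
]

# Triples as they might appear in the pod's public type index.
index = POD + "settings/publicTypeIndex"
type_index = [
    (index + "#posts", SOLID + "forClass", POST),
    (index + "#posts", SOLID + "instanceContainer", POD + "posts/"),
    (index + "#comments", SOLID + "forClass", COMMENT),
    (index + "#comments", SOLID + "instanceContainer", POD + "comments/"),
]

from_ldp = ldp_links(container)
from_index = type_index_links(type_index, {POST})
print(len(from_ldp), "links via LDP,", len(from_index), "link via the type index")
```

    In this toy pod, a query about posts would follow three links through the container listing, but only one through the type index.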

    Structural assumptions about Solid pods significantly boost performance

    The table below shows a subset of the aggregated query results when using the dedicated LDP and Solid type index actors.

    We can observe that the traditional reachability semantics for link traversal (cNone, cMatch, cAll) either fail to find all documents in Solid pods that are needed to answer the queries, resulting in low result accuracy (acc) for cNone and cMatch, or follow so many links that queries time out (∑to) for cAll.

    However, when we add the Solid-specific actors (cNone-solid, cMatch-solid, cAll-solid), we gain higher levels of accuracy. The best combination is cMatch with the Solid actors, which achieves an accuracy of more than 99% in this case.

    Configuration  t        t̃        t₁      t̃₁      req     ∑ans   acc     ∑to
    cNone          40       0        N/A     N/A     8       0.00   0.00%   0
    cMatch         1,791    0        22,946  24,439  1,275   0.00   0.00%   1
    cAll           128,320  127,021  28,448  10,554  0       0.63   3.13%   8
    cNone-solid    1,552    1,006    425     331     357     20.50  74.14%  0
    cMatch-solid   12,483   2,372    2,309   925     2,708   39.13  99.14%  0
    cAll-solid     123,979  125,235  48,382  10,368  16,623  3.13   17.40%  7
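
    The three traditional reachability criteria differ only in which IRIs of a discovered triple they select for following. The sketch below expresses them as small functions, based on their usual definitions in the link traversal literature: cNone follows nothing, cMatch follows IRIs in triples that match a triple pattern of the query, and cAll follows every IRI. The function names, the example triples, and the use of `None` as a variable in a pattern are assumptions for illustration.

```python
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical IRIs, used only for this example.
ALICE = "https://pod.example/alice/profile/card#me"
BOB = "https://pod.example/bob/profile/card#me"
POSTS = "https://pod.example/alice/posts/"


def is_iri(term):
    """Treat anything that looks like an HTTP(S) IRI as followable; other strings are literals."""
    return term.startswith(("http://", "https://"))


def matches(triple, pattern):
    """A pattern term of None acts as a variable and matches any term."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def c_none(triple, patterns):
    """cNone: never follow links found in data."""
    return []


def c_all(triple, patterns):
    """cAll: follow every IRI in every triple."""
    return [t for t in triple if is_iri(t)]


def c_match(triple, patterns):
    """cMatch: follow IRIs only in triples that match some triple pattern of the query."""
    if any(matches(triple, p) for p in patterns):
        return [t for t in triple if is_iri(t)]
    return []


# A query with the single triple pattern "?s foaf:knows ?o".
query_patterns = [(None, FOAF_KNOWS, None)]

knows = (ALICE, FOAF_KNOWS, BOB)
see_also = (ALICE, RDFS_SEEALSO, POSTS)
name = (ALICE, FOAF_NAME, "Alice")

for triple in (knows, see_also, name):
    print(len(c_none(triple, query_patterns)),
          len(c_match(triple, query_patterns)),
          len(c_all(triple, query_patterns)))
```

    In this example, cMatch never reaches the posts container because the query does not mention `rdfs:seeAlso`, while cAll queues every IRI it sees, including vocabulary terms.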

    Even if queries are slow, first results can arrive quickly

    Some queries might take multiple seconds to finish. Since all query algorithms have been designed to process results in a streaming manner, results can arrive iteratively. This means that results can arrive after a few milliseconds, even if the final result only arrives after multiple seconds, as can be seen in the figure below.

    Query times for discovery query 2.3
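
    The effect of streaming can be illustrated without any timing at all. In the sketch below, a Python generator stands in for a streaming query pipeline, and an event log records which documents had been fetched when the first result was consumed. The document names and results are made up for illustration.

```python
def traverse_streaming(documents, events):
    """Yield results as soon as each document has been processed, instead of collecting them all first."""
    for url, results in documents:
        events.append(f"fetch {url}")
        for result in results:
            yield result


# Hypothetical documents with the query results each one contributes.
documents = [
    ("doc1", ["result-1"]),
    ("doc2", []),
    ("doc3", ["result-2", "result-3"]),
]

events = []
stream = traverse_streaming(documents, events)

# Consume only the first result and note how much work was needed to get it.
first = next(stream)
events_at_first_result = list(events)

# Drain the rest of the stream; only now are the remaining documents fetched.
remaining = list(stream)

print(first, events_at_first_result)
print(remaining, events)
```

    The first result is available after a single fetch, while the last one requires all three documents to have been processed.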

    Type index discovery is slightly better than LDP discovery

    As shown in the figure below, using the Solid type index for discovering data in pods results in a significantly lower number of HTTP requests compared to LDP-based discovery.

    Relative number of HTTP requests for discover queries

    Even though this difference in the number of HTTP requests is significant, it results in only a minor difference in execution time, as shown below.

    Relative execution time for discover queries

    Pod size and fragmentation impact performance

    When we fragment data inside our pods in different ways (composite, separate, single, location, time), or we increase the amount of data inside pods by a given factor (1, 5), we see a significant impact on performance, as shown in the result arrival times of the query below.

    Query times for discovery query 1.3

    Limitations and future work

    The current main limitation of this approach is that it only works well for non-complex queries. As soon as query complexity increases, query execution times become too high to be practical. The root cause of this problem is the lack of proper query planning, which would need to happen adaptively as soon as pod-specific information is discovered.

    Try it out

    Below, we list links to several example configurations for LTQP that have been built as Web clients.