HDT



    HDT is a highly compressed RDF dataset format that enables efficient triple pattern querying. Comunica enables executing SPARQL queries over HDT files, as it is one of the supported source types.

    Querying over HDT requires the Comunica SPARQL HDT package (@comunica/query-sparql-hdt).

    1. Installation

    Since Comunica runs on Node.js, make sure you have Node.js installed on your machine. HDT requires GCC 4.9 or higher to be available.

    Next, we can install Comunica SPARQL HDT on our machine:

    $ npm install -g @comunica/query-sparql-hdt
    

    2. SPARQL querying over one HDT file

    After installing Comunica SPARQL HDT, you will be given access to several commands including comunica-sparql-hdt, which allows you to execute SPARQL queries from the command line.

    Just like comunica-sparql, this command requires one or more sources to query over (here, HDT file paths prefixed with hdt@). As the last argument, a SPARQL query string can be provided.

    For example, the following query retrieves the first 100 triples from a local HDT file:

    $ comunica-sparql-hdt hdt@path/to/myfile.hdt \
        "SELECT * WHERE { ?s ?p ?o } LIMIT 100"
    

    3. SPARQL querying over multiple HDT files

    Just like comunica-sparql, querying over multiple sources only requires passing them one after another:

    $ comunica-sparql-hdt hdt@path/to/myfile1.hdt \
        hdt@path/to/myfile2.hdt \
        hdt@path/to/myfile3.hdt \
        "SELECT * WHERE { ?s ?p ?o } LIMIT 100"
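
When many HDT files are involved, the hdt@ prefixes can be generated rather than typed by hand. The following is a minimal sketch of that idea for a POSIX shell. The function name query_hdt_files and the COMUNICA_HDT_BIN override are hypothetical helpers introduced here for illustration; they are not part of Comunica.

```shell
# Hypothetical helper (not part of Comunica): run one SPARQL query over
# several HDT files, adding the "hdt@" prefix to each file path.
# Usage: query_hdt_files QUERY FILE...
# The body runs in a subshell, so its variables do not leak to the caller.
query_hdt_files() (
  [ "$#" -ge 2 ] || { echo "usage: query_hdt_files QUERY FILE..." >&2; exit 2; }
  query=$1
  shift
  count=$#
  # Append a prefixed copy of every path, then drop the unprefixed originals.
  for file in "$@"; do
    set -- "$@" "hdt@$file"
  done
  shift "$count"
  # COMUNICA_HDT_BIN is an assumed override hook; it defaults to the real command.
  "${COMUNICA_HDT_BIN:-comunica-sparql-hdt}" "$@" "$query"
)

# Example (assumes comunica-sparql-hdt is installed and data/ contains HDT files):
#   query_hdt_files "SELECT * WHERE { ?s ?p ?o } LIMIT 100" data/*.hdt
```

Because each path is passed as its own argument, file names containing spaces are preserved.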
    

    4. Learn more

    This guide only discussed the basic functionality of comunica-sparql-hdt. You can learn about more options by invoking the help command, or by reading the Comunica SPARQL documentation:

    $ comunica-sparql-hdt --help
    

    The API for querying over HDT files in JavaScript apps is identical to Comunica SPARQL, and just requires importing @comunica/query-sparql-hdt instead of @comunica/query-sparql.

    In order to set up a SPARQL endpoint, comunica-sparql-hdt-http can be used, just like Comunica SPARQL.
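
As a rough sketch of the endpoint workflow: once an endpoint is running, it can be queried over HTTP with curl. The example assumes that comunica-sparql-hdt-http accepts sources the same way as comunica-sparql-hdt and, like comunica-sparql-http, serves queries at http://localhost:3000/sparql by default; verify both against the command's own help output. The function name sparql_get and the CURL_BIN override are hypothetical and not part of Comunica.

```shell
# Hypothetical helper (not part of Comunica): send a SPARQL query to an
# endpoint as an HTTP GET request and ask for JSON results.
# Usage: sparql_get ENDPOINT_URL QUERY
sparql_get() (
  [ "$#" -eq 2 ] || { echo "usage: sparql_get ENDPOINT_URL QUERY" >&2; exit 2; }
  # --get turns the --data-urlencode value into a URL query parameter,
  # so the SPARQL string is percent-encoded by curl itself.
  # CURL_BIN is an assumed override hook; it defaults to curl.
  "${CURL_BIN:-curl}" --silent --get \
    --header "Accept: application/sparql-results+json" \
    --data-urlencode "query=$2" \
    "$1"
)

# Example (assumes the endpoint was started in another terminal, and that
# the default port and path match those of comunica-sparql-http):
#   comunica-sparql-hdt-http hdt@path/to/myfile.hdt
#   sparql_get http://localhost:3000/sparql "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
```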