Hypermedia



    Comunica enables hypermedia-driven query execution. This allows users to provide data sources by URL, after which Comunica automatically detects the querying capabilities of each source to determine an efficient query execution plan.

    With this strategy, when a link to a SPARQL endpoint is provided (e.g. https://dbpedia.org/sparql), communication happens via SPARQL queries, whereas when a link to a plain RDF file is provided (e.g. http://ruben.verborgh.org/profile/), the whole file is downloaded and queried in memory.

    This page only describes the handling of hypermedia for read queries. The handling of hypermedia for update queries happens in a very similar manner, with the main difference that the RDF Resolve Hypermedia bus is replaced by the RDF Update Hypermedia bus.

    Hypermedia actor

    The actor in Comunica that drives hypermedia handling is @comunica/actor-query-source-identify-hypermedia. This actor is registered to the Query Source Identify bus. This actor will be invoked once for each query during context preprocessing, and the identified source will be assigned to suboperations within the query.

    The SPARQL architecture shows how this hypermedia actor relates to all other actors and buses.

    Steps for handling hypermedia

    For each URL-based data source, the hypermedia actor will always go through the following steps:

    1. Dereference RDF (Dereference RDF bus)
    2. Split data and metadata streams (RDF Metadata bus)
    3. Extract metadata as object (RDF Metadata Extract bus)
    4. Determine links to other sources (RDF Resolve Hypermedia Links bus)
    5. Create a queue for managing links (RDF Resolve Hypermedia Links Queue bus)
    6. Handle source based on metadata (Query Source Identify Hypermedia bus)

    Hereafter, we go over these steps using three example sources:

    1. https://dbpedia.org/sparql
    2. http://fragments.dbpedia.org/2016-04/en
    3. https://ruben.verborgh.org/profile/

    1. Dereference RDF

    An HTTP(S) request is made to retrieve the RDF data at the given location via content negotiation. Different ways of doing this may exist in the Dereference RDF bus. Concretely, the input is a URL, and the output is a stream of parsed RDF triples/quads.

    For example:

    1. https://dbpedia.org/sparql
    ns1:sparql	rdf:type	sd:Service ;
    	        sd:endpoint	ns1:sparql ;
    	        sd:feature	sd:UnionDefaultGraph ,
    		    sd:DereferencesURIs .
    @prefix ns3:	<http://www.w3.org/ns/formats/> .
    ns1:sparql	sd:resultFormat	ns3:SPARQL_Results_JSON ,
    		    ns3:SPARQL_Results_XML ,
    		    ns3:Turtle ,
    		    ns3:N-Triples ,
    		    ns3:N3 ,
    		    ns3:RDF_XML ,
    		    ns3:SPARQL_Results_CSV ,
    		    ns3:RDFa ;
    	        sd:supportedLanguage	sd:SPARQL10Query ;
    	        sd:url	ns1:sparql .
    
    2. http://fragments.dbpedia.org/2016-04/en
    <https://fragments.dbpedia.org/#dataset> hydra:member <https://fragments.dbpedia.org/2016-04/en#dataset>.
    <https://fragments.dbpedia.org/2016-04/en#dataset> a void:Dataset, hydra:Collection;
        void:subset <https://fragments.dbpedia.org/2016-04/en>;
        hydra:search _:triplePattern.
    _:triplePattern hydra:template "https://fragments.dbpedia.org/2016-04/en{?subject,predicate,object}";
        hydra:variableRepresentation hydra:ExplicitRepresentation;
        hydra:mapping _:subject, _:predicate, _:object.
    _:subject hydra:variable "subject";
        hydra:property rdf:subject.
    _:predicate hydra:variable "predicate";
        hydra:property rdf:predicate.
    _:object hydra:variable "object";
        hydra:property rdf:object.
    <https://fragments.dbpedia.org/2016-04/en> void:subset <https://fragments.dbpedia.org/2016-04/en>;
        a hydra:PartialCollectionView;
        dcterms:title "Linked Data Fragment of DBpedia 2016-04"@en;
        dcterms:description "Triple Pattern Fragment of the 'DBpedia 2016-04' dataset containing triples matching the pattern { ?s ?p ?o }."@en;
        dcterms:source <https://fragments.dbpedia.org/2016-04/en#dataset>;
        hydra:totalItems "1040358853"^^xsd:integer;
        void:triples "1040358853"^^xsd:integer;
        hydra:itemsPerPage "100"^^xsd:integer;
        hydra:first <https://fragments.dbpedia.org/2016-04/en?page=1>;
        hydra:next <https://fragments.dbpedia.org/2016-04/en?page=2>.
    <http://0-access.newspaperarchive.com.lib.utep.edu/us/mississippi/biloxi/biloxi-daily-herald/1899/05-06/page-6?tag=tierce+wine&rtserp=tags/tierce-wine?page=2> dbpprop:date "1899-05-06"^^xsd:date;
        dbpprop:isCitedBy <http://dbpedia.org/resource/Tierce_(unit)>;
        dbpprop:newspaper "Biloxi Daily Herald";
        dbpprop:page "6";
        dbpprop:title "A New System of Weights and Measures";
        dbpprop:url <http://0-access.newspaperarchive.com.lib.utep.edu/us/mississippi/biloxi/biloxi-daily-herald/1899/05-06/page-6?tag=tierce+wine&rtserp=tags/tierce-wine?page=2>.
    ...
    
    3. https://ruben.verborgh.org/profile/
    <https://ruben.verborgh.org/profile/>
        a foaf:Document, foaf:PersonalProfileDocument;
        rdfs:label "Ruben Verborgh’s FOAF profile"@en;
        foaf:maker :me;
        foaf:primaryTopic :me.
    :me a foaf:Person;
        foaf:name  "Ruben Verborgh"@en, "Ruben Verborgh"@nl;
        rdfs:label "Ruben Verborgh"@en, "Ruben Verborgh"@nl;
        vcard:fn   "Ruben Verborgh"@en, "Ruben Verborgh"@nl;
        con:preferredURI "https://ruben.verborgh.org/profile/#me";
        foaf:givenName "Ruben"@en, "Ruben"@nl;
        foaf:familyName "Verborgh"@en, "Verborgh"@nl;
    ...
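
The content negotiation in this step can be sketched as follows. This is a minimal illustration and not Comunica's implementation: the media-type priorities, the `HttpGet` transport signature, and the function names are all assumptions made for this example. The transport is passed in as a parameter so that the sketch does not depend on a specific HTTP client.

```typescript
// Hypothetical priorities; Comunica derives its own list from its RDF parsers.
const rdfMediaTypes: Record<string, number> = {
  'application/trig': 1.0,
  'text/turtle': 0.9,
  'application/n-triples': 0.8,
  'application/ld+json': 0.7,
  'application/rdf+xml': 0.5,
};

// Build an HTTP Accept header that lists the most preferred media types first.
function buildAcceptHeader(mediaTypes: Record<string, number>): string {
  return Object.entries(mediaTypes)
    .sort(([, qA], [, qB]) => qB - qA)
    .map(([mediaType, q]) => (q === 1 ? mediaType : `${mediaType};q=${q}`))
    .join(',');
}

// Hypothetical transport signature, standing in for any HTTP client.
type HttpGet = (url: string, headers: Record<string, string>) =>
  Promise<{ status: number; contentType: string; body: string }>;

// Request a URL with content negotiation and report which RDF syntax came back.
// The body would then be handed to a parser that matches `mediaType`.
async function dereferenceRdf(
  url: string,
  httpGet: HttpGet,
): Promise<{ url: string; mediaType: string; body: string }> {
  const response = await httpGet(url, { accept: buildAcceptHeader(rdfMediaTypes) });
  if (response.status < 200 || response.status >= 300) {
    throw new Error(`Could not dereference ${url}: HTTP ${response.status}`);
  }
  // Strip parameters such as "; charset=utf-8" from the Content-Type value.
  const mediaType = (response.contentType.split(';')[0] ?? '').trim().toLowerCase();
  if (!(mediaType in rdfMediaTypes)) {
    throw new Error(`Unsupported media type for ${url}: ${mediaType}`);
  }
  return { url, mediaType, body: response.body };
}
```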
    

    2. Split data and metadata streams

    Some RDF sources may include metadata inside the document, such as Triple Pattern Fragments. As such, there needs to be a way to distinguish between data and metadata triples, for which different strategies exist in the RDF Metadata bus.

    Learn more details about metadata objects.

    For example:

    1. https://dbpedia.org/sparql

    Data: empty

    Metadata:

    ns1:sparql	rdf:type	sd:Service ;
    	        sd:endpoint	ns1:sparql ;
    	        sd:feature	sd:UnionDefaultGraph ,
    		    sd:DereferencesURIs .
    @prefix ns3:	<http://www.w3.org/ns/formats/> .
    ns1:sparql	sd:resultFormat	ns3:SPARQL_Results_JSON ,
    		    ns3:SPARQL_Results_XML ,
    		    ns3:Turtle ,
    		    ns3:N-Triples ,
    		    ns3:N3 ,
    		    ns3:RDF_XML ,
    		    ns3:SPARQL_Results_CSV ,
    		    ns3:RDFa ;
    	        sd:supportedLanguage	sd:SPARQL10Query ;
    	        sd:url	ns1:sparql .
    
    2. http://fragments.dbpedia.org/2016-04/en

    Data:

    <http://0-access.newspaperarchive.com.lib.utep.edu/us/mississippi/biloxi/biloxi-daily-herald/1899/05-06/page-6?tag=tierce+wine&rtserp=tags/tierce-wine?page=2> dbpprop:date "1899-05-06"^^xsd:date;
        dbpprop:isCitedBy <http://dbpedia.org/resource/Tierce_(unit)>;
        dbpprop:newspaper "Biloxi Daily Herald";
        dbpprop:page "6";
        dbpprop:title "A New System of Weights and Measures";
        dbpprop:url <http://0-access.newspaperarchive.com.lib.utep.edu/us/mississippi/biloxi/biloxi-daily-herald/1899/05-06/page-6?tag=tierce+wine&rtserp=tags/tierce-wine?page=2>.
    ...
    

    Metadata:

    <https://fragments.dbpedia.org/#dataset> hydra:member <https://fragments.dbpedia.org/2016-04/en#dataset>.
    <https://fragments.dbpedia.org/2016-04/en#dataset> a void:Dataset, hydra:Collection;
        void:subset <https://fragments.dbpedia.org/2016-04/en>;
        hydra:search _:triplePattern.
    _:triplePattern hydra:template "https://fragments.dbpedia.org/2016-04/en{?subject,predicate,object}";
        hydra:variableRepresentation hydra:ExplicitRepresentation;
        hydra:mapping _:subject, _:predicate, _:object.
    _:subject hydra:variable "subject";
        hydra:property rdf:subject.
    _:predicate hydra:variable "predicate";
        hydra:property rdf:predicate.
    _:object hydra:variable "object";
        hydra:property rdf:object.
    <https://fragments.dbpedia.org/2016-04/en> void:subset <https://fragments.dbpedia.org/2016-04/en>;
        a hydra:PartialCollectionView;
        dcterms:title "Linked Data Fragment of DBpedia 2016-04"@en;
        dcterms:description "Triple Pattern Fragment of the 'DBpedia 2016-04' dataset containing triples matching the pattern { ?s ?p ?o }."@en;
        dcterms:source <https://fragments.dbpedia.org/2016-04/en#dataset>;
        hydra:totalItems "1040358853"^^xsd:integer;
        void:triples "1040358853"^^xsd:integer;
        hydra:itemsPerPage "100"^^xsd:integer;
        hydra:first <https://fragments.dbpedia.org/2016-04/en?page=1>;
        hydra:next <https://fragments.dbpedia.org/2016-04/en?page=2>.
    
    3. https://ruben.verborgh.org/profile/

    Data:

    <https://ruben.verborgh.org/profile/>
        a foaf:Document, foaf:PersonalProfileDocument;
        rdfs:label "Ruben Verborgh’s FOAF profile"@en;
        foaf:maker :me;
        foaf:primaryTopic :me.
    :me a foaf:Person;
        foaf:name  "Ruben Verborgh"@en, "Ruben Verborgh"@nl;
        rdfs:label "Ruben Verborgh"@en, "Ruben Verborgh"@nl;
        vcard:fn   "Ruben Verborgh"@en, "Ruben Verborgh"@nl;
        con:preferredURI "https://ruben.verborgh.org/profile/#me";
        foaf:givenName "Ruben"@en, "Ruben"@nl;
        foaf:familyName "Verborgh"@en, "Verborgh"@nl;
    ...
    

    Metadata: empty
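
To make the idea of splitting concrete, the sketch below partitions a list of triples with one possible heuristic: every subject that is described with a Hydra or VoID predicate is treated as a metadata subject, and all triples about such subjects go to the metadata side. This heuristic, the `SimpleTriple` shape, and the function name are assumptions for illustration only; the actors on the RDF Metadata bus apply their own strategies.

```typescript
// Simplified triple shape with terms as plain strings (assumption for this sketch).
interface SimpleTriple {
  subject: string;
  predicate: string;
  object: string;
}

// Namespaces whose predicates mark a subject as describing hypermedia controls.
const controlNamespaces = [
  'http://www.w3.org/ns/hydra/core#',
  'http://rdfs.org/ns/void#',
];

function splitDataAndMetadata(
  triples: SimpleTriple[],
): { data: SimpleTriple[]; metadata: SimpleTriple[] } {
  // First pass: collect subjects that carry at least one Hydra or VoID predicate.
  const controlSubjects = new Set<string>();
  for (const triple of triples) {
    if (controlNamespaces.some((ns) => triple.predicate.startsWith(ns))) {
      controlSubjects.add(triple.subject);
    }
  }
  // Second pass: all triples about those subjects are metadata, the rest is data.
  const data: SimpleTriple[] = [];
  const metadata: SimpleTriple[] = [];
  for (const triple of triples) {
    (controlSubjects.has(triple.subject) ? metadata : data).push(triple);
  }
  return { data, metadata };
}
```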

    3. Extract metadata as object

    Using actors on the RDF Metadata Extract bus, relevant parts of the metadata stream are identified, and a convenient metadata object is constructed for later use.

    For example:

    1. https://dbpedia.org/sparql
    {
      "sparqlService": "https://dbpedia.org/sparql"
    }
    
    2. http://fragments.dbpedia.org/2016-04/en
    {
      "first": "https://fragments.dbpedia.org/2016-04/en?page=1",
      "next": "https://fragments.dbpedia.org/2016-04/en?page=2",
      "searchForms": {
        "values": [
          {
            "mappings": {
              "subject": "http://www.w3.org/1999/02/22-rdf-syntax-ns#subject",
              "predicate": "http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate",
              "object": "http://www.w3.org/1999/02/22-rdf-syntax-ns#object"
            },
            "template": "https://fragments.dbpedia.org/2016-04/en{?subject,predicate,object}"
          }
        ]
      },
      "cardinality": { "type": "estimate", "value": 1040358853, "dataset": "https://fragments.dbpedia.org/2016-04/en" }
    }
    
    3. https://ruben.verborgh.org/profile/
    {}
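
As a rough sketch of what such an extraction could look like, the function below reads the `hydra:first`, `hydra:next`, and `hydra:totalItems` statements about the current page and turns them into an object shaped like the second example. The `MetaTriple` and `ExtractedMetadata` types and the function name are hypothetical, literals are represented by their lexical value only, and search forms are left out for brevity.

```typescript
// Simplified triple shape with terms as plain strings (assumption for this sketch).
interface MetaTriple {
  subject: string;
  predicate: string;
  object: string;
}

// Subset of the metadata fields shown in the examples above.
interface ExtractedMetadata {
  first?: string;
  next?: string;
  cardinality?: { type: 'estimate'; value: number; dataset: string };
}

const HYDRA_NS = 'http://www.w3.org/ns/hydra/core#';

function extractMetadata(pageUrl: string, metadataTriples: MetaTriple[]): ExtractedMetadata {
  const extracted: ExtractedMetadata = {};
  for (const { subject, predicate, object } of metadataTriples) {
    // Only statements about the current page are relevant here.
    if (subject !== pageUrl) {
      continue;
    }
    if (predicate === `${HYDRA_NS}first`) {
      extracted.first = object;
    } else if (predicate === `${HYDRA_NS}next`) {
      extracted.next = object;
    } else if (predicate === `${HYDRA_NS}totalItems`) {
      extracted.cardinality = {
        type: 'estimate',
        value: Number.parseInt(object, 10),
        dataset: pageUrl,
      };
    }
  }
  return extracted;
}
```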
    

    4. Determine links to other sources

    Based on the detected metadata, links are extracted that can optionally be followed. These links are determined using actors on the RDF Resolve Hypermedia Links bus.

    For example:

    1. https://dbpedia.org/sparql: None
    2. http://fragments.dbpedia.org/2016-04/en: https://fragments.dbpedia.org/2016-04/en?page=2
    3. https://ruben.verborgh.org/profile/: None
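
The three outcomes above are consistent with a rule that simply follows the `next` field of the metadata object when it is present. A minimal sketch of that rule, using hypothetical `PageMetadata` and `FollowableLink` types that are not taken from Comunica:

```typescript
// Only the field that matters for link resolution in this sketch.
interface PageMetadata {
  next?: string;
}

// Hypothetical link shape: just the URL to follow.
interface FollowableLink {
  url: string;
}

// Return the links that may be followed for a source with the given metadata.
function resolveLinks(metadata: PageMetadata): FollowableLink[] {
  return metadata.next ? [{ url: metadata.next }] : [];
}
```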

    5. Create a queue for managing links

    Using the RDF Resolve Hypermedia Links Queue bus, an ILinkQueue instance is created that determines the order in which links are processed.

    By default, this will be a queue that processes links in FIFO order.
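
A FIFO link queue can be sketched in a few lines. The class below is a simplified stand-in written for this example; its method names are assumptions and should not be read as the exact ILinkQueue interface.

```typescript
// Hypothetical link shape: just the URL to follow.
interface QueuedLink {
  url: string;
}

// Links are handed out in the same order in which they were discovered.
class FifoLinkQueue {
  private readonly links: QueuedLink[] = [];

  // Add a link to the back of the queue; returns true when it was accepted.
  public push(link: QueuedLink): boolean {
    this.links.push(link);
    return true;
  }

  // Remove and return the oldest link, or undefined if the queue is empty.
  public pop(): QueuedLink | undefined {
    return this.links.shift();
  }

  // Look at the oldest link without removing it.
  public peek(): QueuedLink | undefined {
    return this.links[0];
  }

  public getSize(): number {
    return this.links.length;
  }

  public isEmpty(): boolean {
    return this.links.length === 0;
  }
}
```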

    6. Handle source based on metadata

    Finally, the Query Source Identify Hypermedia bus contains actors that can handle sources based on the extracted metadata.

    Concretely, the detected metadata will be given to each actor on the bus, and the actor that can handle it with the best filtering capabilities will be allowed to handle it.

    For example:

    1. https://dbpedia.org/sparql: SPARQL query to https://dbpedia.org/sparql
    2. http://fragments.dbpedia.org/2016-04/en: Fill in https://fragments.dbpedia.org/2016-04/en{?subject,predicate,object}, and follow all subsequent next-page links.
    3. https://ruben.verborgh.org/profile/: No hypermedia, so fallback to querying over all triples in the returned data stream.
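
The selection among handlers can be sketched as each candidate scoring the metadata, with the highest score winning. The handler labels, the numeric scores, and the `SourceMetadata` shape below are assumptions chosen to reproduce the three outcomes above; they are not the values Comunica uses.

```typescript
// Subset of the metadata fields shown in the earlier examples.
interface SourceMetadata {
  sparqlService?: string;
  searchForms?: { values: { template: string }[] };
}

// A handler reports how well it can filter for this metadata,
// or undefined if it cannot handle the source at all.
interface HypermediaHandler {
  label: string;
  test(metadata: SourceMetadata): number | undefined;
}

// Illustrative candidates with made-up scores.
const candidateHandlers: HypermediaHandler[] = [
  { label: 'sparql', test: (m) => (m.sparqlService ? 1 : undefined) },
  { label: 'qpf', test: (m) => (m.searchForms && m.searchForms.values.length > 0 ? 0.9 : undefined) },
  // Fallback: can always handle the source, but offers no server-side filtering.
  { label: 'none', test: () => 0 },
];

// Pick the handler that reports the best filtering capabilities.
function selectHandler(metadata: SourceMetadata, handlers: HypermediaHandler[]): HypermediaHandler {
  let best: { handler: HypermediaHandler; score: number } | undefined;
  for (const handler of handlers) {
    const score = handler.test(metadata);
    if (score !== undefined && (best === undefined || score > best.score)) {
      best = { handler, score };
    }
  }
  if (best === undefined) {
    throw new Error('No handler can handle the given metadata');
  }
  return best.handler;
}
```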

    If multiple links are followed, the metadata object corresponding to the current quad pattern is updated incrementally after each followed link. This is done using the RDF Metadata Accumulate bus, which has dedicated actors that determine how specific metadata fields are merged.
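
As an illustration of merging a single metadata field, the sketch below accumulates cardinalities. The merge policy is an assumption made for this example: counts that describe the same dataset are not added twice, counts from different sources are summed, and the result is only exact if both inputs are exact. The `CardinalityInfo` type and function name are hypothetical.

```typescript
// Cardinality field as shown in the metadata examples, with an optional dataset scope.
interface CardinalityInfo {
  type: 'estimate' | 'exact';
  value: number;
  dataset?: string;
}

// Merge the cardinality seen so far with the cardinality of a newly followed link.
function accumulateCardinality(
  accumulated: CardinalityInfo,
  appending: CardinalityInfo,
): CardinalityInfo {
  // Both values describe the same dataset: keep the most recent one instead of adding.
  if (accumulated.dataset !== undefined && accumulated.dataset === appending.dataset) {
    return { ...appending };
  }
  // Otherwise, sum the counts; any estimate makes the total an estimate.
  return {
    type: accumulated.type === 'exact' && appending.type === 'exact' ? 'exact' : 'estimate',
    value: accumulated.value + appending.value,
  };
}
```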