RDF Parsing and Serializing


    Parsing from and serializing to RDF is of great importance within Comunica, as Comunica needs to be able to query over RDF files in different formats, and produce RDF query results in different formats.

    For this, Comunica provides the RDF Parse (@comunica/bus-rdf-parse) and RDF Serialize (@comunica/bus-rdf-serialize) buses. These buses respectively contain spec-compliant streaming parsers and serializers for the most important RDF formats.

    Calling a parser

    RDF parsing actors implement the ActorRdfParse abstract class, which can handle two types of actions:

    • Retrieval of supported media types (mediaTypes), such as 'text/turtle', 'application/ld+json', ...
    • Parsing for a given media type (handle).

    While the first action can be used to determine all available media types that can be parsed across all actors in a bus, the second action is typically used afterwards to parse RDF for a specific media type.

    Since there are two types of actions, calling an RDF parser involves two respective mediators. An example of these two mediators can be found in rdf-dereference.json. In TypeScript, these mediators will correspond to the following fields:

    public readonly mediatorRdfParseMediatypes: Mediator<
      Actor<IActionMediaTypesRdfParse, IActorTestMediaTypesRdfParse, IActorOutputMediaTypesRdfParse>,
      IActionMediaTypesRdfParse, IActorTestMediaTypesRdfParse, IActorOutputMediaTypesRdfParse>;
    public readonly mediatorRdfParseHandle: Mediator<
      Actor<IActionHandleRdfParse, IActorTestHandleRdfParse, IActorOutputHandleRdfParse>,
      IActionHandleRdfParse, IActorTestHandleRdfParse, IActorOutputHandleRdfParse>;

    All available media types can be retrieved as follows:

    const { mediaTypes } = await this.mediatorRdfParseMediatypes.mediate(
      { context, mediaTypes: true },
    );
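
    Once the available media types are known, a caller can check that a desired media type is supported before invoking the handle action. The following is a minimal sketch and not part of Comunica itself: the helper name selectMediaType is hypothetical, and it assumes that mediaTypes is a plain object that maps each media type to a numeric priority.

```typescript
// Assumed shape of the `mediaTypes` result: media type -> numeric priority.
type MediaTypePriorities = Record<string, number>;

/**
 * Hypothetical helper: from a list of candidate media types, return the one
 * with the highest priority among those that are available, or undefined
 * if none of the candidates is available.
 */
function selectMediaType(
  available: MediaTypePriorities,
  candidates: string[],
): string | undefined {
  let best: string | undefined;
  for (const candidate of candidates) {
    // Skip media types that no actor on the bus reported.
    if (!Object.prototype.hasOwnProperty.call(available, candidate)) {
      continue;
    }
    if (best === undefined || available[candidate] > available[best]) {
      best = candidate;
    }
  }
  return best;
}

// Example with made-up priorities.
const exampleMediaTypes: MediaTypePriorities = {
  'text/turtle': 0.6,
  'application/ld+json': 0.9,
};
const selected = selectMediaType(
  exampleMediaTypes,
  [ 'application/rdf+xml', 'text/turtle' ],
);
console.log(selected);
```

    The selected value could then be passed as handleMediaType in the handle action.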

    Parsing for a specific media type can be done as follows:

    const { quads } = (await this.mediatorRdfParseHandle.mediate(
      {
        context,
        handle: {
          baseIRI: 'http://example.org/',
          headers: undefined, // Optional HTTP fetch headers
          input: textStream,
        },
        handleMediaType: 'text/turtle',
      },
    )).handle;

    The input textStream must always be a text stream; the output quads is an RDF/JS stream.
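
    To make the two stream types concrete, the sketch below builds a text stream from an in-memory Turtle string and collects the items of an RDF/JS-style stream into an array. It assumes a Node.js environment; the helper names stringToTextStream and collectStream are hypothetical, and the only thing assumed about the output stream is that it emits 'data', 'end', and 'error' events.

```typescript
import { Readable } from 'node:stream';

/** Hypothetical helper: wrap a string in a Node.js readable text stream. */
function stringToTextStream(text: string): Readable {
  // Wrapping the string in an array makes it a single chunk.
  return Readable.from([ text ]);
}

/** Minimal event interface shared by RDF/JS streams and Node.js streams. */
interface EventStream {
  on(event: string, listener: (...args: any[]) => void): unknown;
}

/** Hypothetical helper: collect all items emitted by a stream into an array. */
function collectStream<T>(stream: EventStream): Promise<T[]> {
  return new Promise((resolve, reject) => {
    const items: T[] = [];
    stream.on('data', (item: T) => items.push(item));
    stream.on('end', () => resolve(items));
    stream.on('error', reject);
  });
}

// A text stream that could be passed as `input` in the handle action.
const textStream = stringToTextStream(
  '<http://example.org/s> <http://example.org/p> "o" .',
);
```

    Collecting a stream into an array gives up the benefits of streaming, so this is mainly useful for small inputs and for tests.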

    More examples of how these parsers are used can be found in actors on the RDF Dereference bus or in the rdf-parse.js package.

    Calling a serializer

    RDF serialization actors implement the ActorRdfSerialize abstract class, which can handle three types of actions:

    • Retrieval of supported media types (mediaTypes), such as 'text/turtle', 'application/ld+json', ...
    • Retrieval of supported media types as URLs (mediaTypeFormats), such as http://www.w3.org/ns/formats/N3, http://www.w3.org/ns/formats/JSON-LD, ...
    • Serializing for a given media type (handle).

    The first action can be used to determine all media types that can be serialized to across all actors in a bus; the second action is used to identify media types by URL, for example in SPARQL service descriptions; and the third action is typically used afterwards to serialize RDF to a specific media type.

    Since there are three types of actions, calling an RDF serializer involves three respective mediators. An example of these three mediators can be found in sparql-serializers.json. In TypeScript, these mediators will correspond to the following fields:

    public readonly mediatorRdfSerialize: Mediator<
      Actor<IActionSparqlSerializeHandle, IActorTestSparqlSerializeHandle, IActorOutputSparqlSerializeHandle>,
      IActionSparqlSerializeHandle, IActorTestSparqlSerializeHandle, IActorOutputSparqlSerializeHandle>;
    
    public readonly mediatorMediaTypeCombiner: Mediator<
      Actor<IActionSparqlSerializeMediaTypes, IActorTestSparqlSerializeMediaTypes, IActorOutputSparqlSerializeMediaTypes>,
      IActionSparqlSerializeMediaTypes, IActorTestSparqlSerializeMediaTypes, IActorOutputSparqlSerializeMediaTypes>;
    
    public readonly mediatorMediaTypeFormatCombiner: Mediator<
      Actor<IActionSparqlSerializeMediaTypeFormats, IActorTestSparqlSerializeMediaTypeFormats,
      IActorOutputSparqlSerializeMediaTypeFormats>,
      IActionSparqlSerializeMediaTypeFormats, IActorTestSparqlSerializeMediaTypeFormats,
      IActorOutputSparqlSerializeMediaTypeFormats>;

    All available media types can be retrieved as follows:

    const { mediaTypes } = await this.mediatorMediaTypeCombiner.mediate(
      { context, mediaTypes: true },
    );

    All available media type URLs can be retrieved as follows:

    const { mediaTypeFormats } = await this.mediatorMediaTypeFormatCombiner.mediate(
      { context, mediaTypeFormats: true },
    );
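
    The media types and their format URLs can be combined, for example when listing the supported serializations. The sketch below is illustrative only: the helper name listFormats is hypothetical, and it assumes that mediaTypes maps media types to numeric priorities and that mediaTypeFormats maps media types to format URLs.

```typescript
// Assumed result shapes; neither is confirmed by the text above.
type SerializeMediaTypes = Record<string, number>; // media type -> priority
type SerializeMediaTypeFormats = Record<string, string>; // media type -> format URL

interface FormatDescription {
  mediaType: string;
  priority: number;
  // Left undefined when no format URL is known for the media type.
  formatUrl?: string;
}

/** Hypothetical helper: join both results, highest priority first. */
function listFormats(
  mediaTypes: SerializeMediaTypes,
  mediaTypeFormats: SerializeMediaTypeFormats,
): FormatDescription[] {
  return Object.entries(mediaTypes)
    .map(([ mediaType, priority ]) => ({
      mediaType,
      priority,
      formatUrl: mediaTypeFormats[mediaType],
    }))
    .sort((a, b) => b.priority - a.priority);
}

// Example with made-up priorities.
const formats = listFormats(
  { 'text/turtle': 0.6, 'application/ld+json': 0.9 },
  { 'application/ld+json': 'http://www.w3.org/ns/formats/JSON-LD' },
);
console.log(formats.map(format => format.mediaType));
```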

    Serializing for a specific media type can be done as follows:

    const { data } = (await this.mediatorRdfSerialize.mediate({
      context,
      handle: {
        type: 'quads',
        quadStream, // An RDF/JS Stream of RDF/JS quads.
      },
      handleMediaType: 'text/turtle',
    })).handle;

    The input quadStream must always be an RDF/JS stream; the output data is a text stream.
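
    Since data is a text stream, it can be piped to a destination such as process.stdout, or buffered into a string. A minimal sketch of the latter follows, assuming a Node.js environment in which the text stream is async-iterable; the helper name textStreamToString is hypothetical.

```typescript
import { Readable } from 'node:stream';

/** Hypothetical helper: buffer a text stream into a single UTF-8 string. */
async function textStreamToString(stream: AsyncIterable<unknown>): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Chunks may arrive as strings or as Buffers, depending on the stream.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(String(chunk), 'utf8'));
  }
  // Decode once at the end so multi-byte characters split across
  // chunk boundaries are not corrupted.
  return Buffer.concat(chunks).toString('utf8');
}

// Usage with a stand-in stream; in practice, `data` would be passed instead.
textStreamToString(Readable.from([ '<a> <b> ', '<c> .\n' ]))
  .then(text => console.log(text));
```

    Buffering the whole output is convenient for small results, but piping keeps memory usage constant for large ones.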

    More examples of how these serializers are used can be found in the SPARQL RDF Serialize actor.