Release 2.0.0: A new major release with radical simplifications and performance improvements

Thursday, March 3, 2022


On this page

    Since its initial release a couple of years ago, Comunica has grown a lot, but it has always remained fully backwards-compatible with every update. However, as with every software project, there is sometimes a need to make breaking changes so that old mechanisms can be replaced with better, newer ones. With this update, we have aggregated several breaking changes into one large update, all of which should improve the lives of users one way or another. Below, the primary changes are listed.

    New query API

    For most people, the biggest change will be in the way you use Comunica for query execution, as the package names of the default query engines have been renamed, and the JavaScript API has been improved.

    New package names

    Up until now, you may have been using @comunica/actor-init-sparql (or a variant) as your main entry point for query execution. This main entrypoint has been moved to @comunica/query-sparql (or @comunica/query-sparql-file and @comunica/query-sparql-rdfjs). This means that your imports and the dependencies in your package.json file will require updates.

    The first reason for this renaming is the fact that the new names are shorter and easier to remember. The second reason is mainly for people that want to configure their own Comunica engines, where the Query Init actor has been decoupled from the query engine entrypoints to simplify the creation of new engines.

    Improved JavaScript query API

    Another major change is related to the way you create and use a Comunica query engine in JavaScript. Mainly, the following changes have been made:

    • newEngine has been replaced with the class QueryEngine that can be instantiated with the new keyword.
    • New result-dependent query methods have been added for simpler result consumption:
      • queryBindings for SELECT queries
      • queryQuads for CONSTRUCT and DESCRIBE queries.
      • queryBoolean for ASK queries
      • queryVoid for update queries
    • The general query method still exists, but has been changed.
    • The methods on a Bindings object have been changed and improved, and obtaining values for variables does not require the ? prefix anymore.
    • If you are using TypeScript, make sure to bump @rdfjs/types to at least 1.1.0.

    Learn more about the new API in the guide on querying in a JavaScript app.

    Below, you can see an example of a simple SPARQL SELECT query execution in the old and new versions of Comunica.

    Before (Comunica 1.x):

    const newEngine = require('@comunica/actor-init-sparql').newEngine;
    const myEngine = newEngine();
    
    const result = await myEngine.query(`
      SELECT ?s ?p ?o WHERE {
        ?s ?p <http://dbpedia.org/resource/Belgium>.
        ?s ?p ?o
      } LIMIT 100`, {
      sources: ['https://fragments.dbpedia.org/2015/en'],
    });
    
    result.bindingsStream.on('data', (binding) => {
      console.log(binding.get('?s').value);
    });
    bindingsStream.on('end', () => {});
    bindingsStream.on('error', (error) => console.error(error));
    

    After (Comunica 2.x):

    const QueryEngine = require('@comunica/query-sparql').QueryEngine;
    const myEngine = new QueryEngine();
    
    const bindingsStream = await myEngine.queryBindings(`
      SELECT ?s ?p ?o WHERE {
        ?s ?p <http://dbpedia.org/resource/Belgium>.
        ?s ?p ?o
      } LIMIT 100`, {
      sources: ['https://fragments.dbpedia.org/2015/en'],
    });
    
    bindingsStream.on('data', (binding) => {
      console.log(binding.toString()); // New: quick way to print bindings
      console.log(binding.get('s').value);
    });
    bindingsStream.on('end', () => {});
    bindingsStream.on('error', (error) => console.error(error));
    

    This new query API is largely aligned with the recently created RDF/JS query specification, which makes Comunica better interactable and interchangeable within the RDF JavaScript ecosystem.

    Easier engine modifications

    Based on the feedback we received from developers that configure their own Comunica engines or implement their own Comunica packages, we have refactored the internals of Comunica in several places to simplify these processes.

    Automatic generation of components files

    Comunica makes use of the dependency injection framework Components.js to load its configuration files. A requirement for this framework is that each package should expose a semantic description of its classes, i.e., the components files. These components files are located within the components/ directory of each package. While these files had to be manually created before, these files can now be automatically generated from the TypeScript sources using Components-Generator.js. This significantly reduces the effort when creating new Comunica packages. Learn more about this in the getting started with modification guides.

    Config restructuring

    Up until now, all configuration files were split up in smaller fragments, but using an arbitrary fragmentation strategy. With this update, all configuration files now use a consistent fragmentation strategy, where a separate sub-directory exists for each Comunica bus, in which one or more files can exist per actor. Furthermore, all configuration files have been moved to a new dedicated (zero-dependency) package @comunica/config-query-sparql, which simplifies reuse and extension of these config fragments. Learn more about this new config structure in the README of @comunica/config-query-sparql.

    Internal changes for better performance

    One primary aspect of our roadmap is to improve overall performance. In this update, we refactored the way in which join operations are handled, because these were not flexible enough before, which hindered optimizations.

    Concretely, Comunica used to handle most join operations within the Basic Graph Pattern actor, which made it impossible to use these join operators for joins with other types of operations, such as property paths, which thereby made these operations very slow. With this refactoring, the join operator implementations have been fully decoupled from the Basic Graph Pattern actor, which for example makes joins between triple patterns and property paths much more efficient.

    While performance will be much better in many cases, there are still a lot of opportunities open for further optimization. We welcome contributions for making these optimizations a reality.

    Learn more about joins in Comunica.

    Explaining query plans

    Most large-scale query engines offer some way of inspecting how exactly a query engine will execute a given query, which is something Comunica has been lacking so far.

    With this update, you can inspect in detail the exact query plan and actors that were used for executing a given query. This functionality exists both on the command-line (via --explain), as in the JavaScript API. For example, the command below shows an example of a physical plan that is printed for a given query:

    $ comunica-sparql https://fragments.dbpedia.org/2016-04/en \
      -q 'SELECT * { ?s ?p ?o. ?s a ?o } LIMIT 100' --explain physical
    
    {
      "logical": "slice",
      "children": [
        {
          "logical": "project",
          "variables": [
            "s",
            "p",
            "o"
          ],
          "children": [
            {
              "logical": "join",
              "children": [
                {
                  "logical": "pattern",
                  "pattern": "?s ?p ?o"
                },
                {
                  "logical": "pattern",
                  "pattern": "?s http://www.w3.org/1999/02/22-rdf-syntax-ns#type ?o"
                },
                {
                  "logical": "join-inner",
                  "physical": "bind",
                  "bindIndex": 1,
                  "bindOrder": "depth-first",
                  "cardinalities": [
                    {
                      "type": "estimate",
                      "value": 1040358853
                    },
                    {
                      "type": "estimate",
                      "value": 100022186
                    }
                  ],
                  "joinCoefficients": {
                    "iterations": 6404592831613.728,
                    "persistedItems": 0,
                    "blockingItems": 0,
                    "requestTime": 556926378.1422498
                  },
    ...
    

    Learn more about explaining query plans in Comunica.

    Webinar

    Due to all of these changes and simplifications, we are planning a public webinar in which the basic usage of Comunica will be explained. This will be useful for new developers that want to get started with Comunica, and developers that have used Comunica before, but want to learn about the new ways of using it. This is also a perfect time for new contributors to become part of the community, or possibly even the Comunica Association. More news on this webinar will follow later.

    Full changelog

    While this blog post explained the primary changes in Comunica 2.x, there are actually many more smaller changes internally that will make your lives easier. If you want to learn more about these changes, check out the full changelog.