Release 4.0: 🚄 Faster actor testing and modularized expressions

Tuesday, October 15, 2024



    Earlier this year, Comunica version 3.0 was released, which introduced better query planning across data sources and several convenience features. Since then, we have had several minor releases with multiple additions and performance improvements. In release 3.2, we introduced better techniques for discovering performance bottlenecks. This allowed us to identify some low-level bottlenecks in the internals of Comunica that limited performance. Fixing these issues required breaking changes to the internal API of Comunica, which is the main reason for this major update. As a result of these changes, we see performance improvements of 20% to 40%. Breaking changes are limited to internal Comunica APIs. This means that developers who make use of Comunica can have their cake and eat it 🎂: no breaking changes, and a performance boost.

    🪢 New Actor.test contract for better performance

    To determine which actors can answer a certain action, all Comunica actors expose a test method, which used to look something like this:

    class MyActor extends Actor {
      public async test(action: IAction): Promise<IActorTest> {
        if (conditionNotMet(action)) {
          throw new Error('This actor can not handle the action');
        }
        return true;
      }
    }
    

    The problem with the above is that JavaScript engines such as V8 eagerly build internal stack traces when creating Error objects. Since Comunica has a large number of actors (240 at the time of writing), an average query execution can lead to a huge number of internal Error objects being created. According to our measurements, this produced a non-negligible performance overhead.
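To get a rough feel for this overhead, the sketch below times the creation of Error objects against the creation of plain result objects. It is a hypothetical micro-benchmark and not part of Comunica: the names `rejectWithError`, `rejectWithResult`, and `timeMs` are made up for illustration, and absolute numbers will vary by engine, version, and call-stack depth.

```typescript
// Hypothetical micro-benchmark; none of these names are Comunica APIs.
interface RejectionResult {
  ok: false;
  reason: string;
}

// Old style: signal rejection by constructing an Error.
// In V8, this captures a stack trace at construction time.
function rejectWithError(reason: string): Error {
  return new Error(reason);
}

// New style: signal rejection with a plain object, without any stack trace.
function rejectWithResult(reason: string): RejectionResult {
  return { ok: false, reason };
}

// Runs fn the given number of times and returns the elapsed wall-clock time in ms.
function timeMs(iterations: number, fn: () => unknown): number {
  const start = Date.now();
  let sink: unknown;
  for (let i = 0; i < iterations; i++) {
    sink = fn();
  }
  void sink; // Keep the last result referenced so the loop body is not trivially dead.
  return Date.now() - start;
}

const iterations = 100000;
const errorMs = timeMs(iterations, () => rejectWithError('cannot handle action'));
const resultMs = timeMs(iterations, () => rejectWithResult('cannot handle action'));
console.log(`Error objects: ${errorMs} ms`);
console.log(`Plain results: ${resultMs} ms`);
```

Treat the output as indicative only: a loop like this does not reproduce the call-stack depth or the allocation patterns of a real query execution.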

    As such, we refactored the contract of the test method to no longer rely on these Error objects. Instead, test methods now make use of TestResult objects, which in practice look like this:

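    import { Actor, failTest, passTestVoid } from '@comunica/core';
    import type { IAction, IActorTest, TestResult } from '@comunica/core';

    class MyActor extends Actor {
      public async test(action: IAction): Promise<TestResult<IActorTest>> {
        if (conditionNotMet(action)) {
          // Report the failure as a value instead of throwing an Error.
          return failTest('This actor can not handle the action');
        }
        return passTestVoid();
      }
    }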
    

    For various benchmarks on in-memory triple stores, this change makes queries up to 20% faster.
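To illustrate the idea behind this contract, here is a minimal, self-contained sketch of a result type that carries either a value or a failure reason, together with a loop that picks the first actor whose test passes. This is a simplified stand-in and not Comunica's actual implementation: `SketchTestResult`, `pass`, `fail`, `makeActor`, and `firstPassing` are hypothetical names chosen for this example.

```typescript
// Simplified, hypothetical stand-in for a TestResult-style contract.
type SketchTestResult<T> =
  | { passed: true; value: T }
  | { passed: false; reason: string };

function pass<T>(value: T): SketchTestResult<T> {
  return { passed: true, value };
}

// A failure is a plain object: no Error and no stack trace is created.
function fail(reason: string): SketchTestResult<never> {
  return { passed: false, reason };
}

interface SketchAction {
  mediaType: string;
}

interface SketchActor {
  name: string;
  test(action: SketchAction): Promise<SketchTestResult<true>>;
}

// Builds a toy actor that only accepts a single media type.
function makeActor(name: string, supported: string): SketchActor {
  return {
    name,
    async test(action: SketchAction): Promise<SketchTestResult<true>> {
      if (action.mediaType !== supported) {
        return fail(`${name} can not handle ${action.mediaType}`);
      }
      return pass<true>(true);
    },
  };
}

// Returns the first actor whose test passes, plus the reasons of all earlier failures.
async function firstPassing(
  actors: SketchActor[],
  action: SketchAction,
): Promise<{ actor?: SketchActor; failures: string[] }> {
  const failures: string[] = [];
  for (const actor of actors) {
    const result = await actor.test(action);
    if (result.passed === false) {
      failures.push(result.reason);
      continue;
    }
    return { actor, failures };
  }
  return { failures };
}

const actors = [
  makeActor('turtle-parser', 'text/turtle'),
  makeActor('jsonld-parser', 'application/ld+json'),
];

firstPassing(actors, { mediaType: 'application/ld+json' }).then(({ actor, failures }) => {
  console.log(actor?.name, failures);
});
```

Because failures are ordinary values, the caller can still collect every rejection reason for debugging, but only pays for string handling rather than stack trace capture.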

    🧩 Modularization of expressions logic

    Thanks to Jitse De Smet's monumental effort, all expression-related logic in Comunica is now fully modularized. Previously, the handling of filters and aggregates was delegated entirely to the single sparqlee package. While this package did a great job of handling filters and aggregates, it lacked the modularity that existed for all other parts of query execution. For example, it was not possible to easily plug in your own actor to evaluate the SUM aggregator in a different way.

    With this release, sparqlee has been split up into multiple buses and actors, which are responsible for term comparators, functions, and aggregators. We did so without introducing any performance degradation.
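As a rough illustration of what this kind of modularity enables, the sketch below models a bus on which aggregator implementations are registered by name, so that the SUM aggregator can be swapped for a different one. It is a conceptual toy and does not reflect Comunica's real buses, actors, or configuration: `AggregatorBus`, `subscribe`, `evaluate`, and `kahanSum` are hypothetical names, and values are plain numbers rather than RDF terms.

```typescript
// Hypothetical sketch: none of these names are Comunica APIs.
type Aggregator = (values: number[]) => number;

class AggregatorBus {
  private readonly aggregators = new Map<string, Aggregator>();

  // A later registration replaces an earlier one for the same aggregate name.
  public subscribe(name: string, aggregator: Aggregator): void {
    this.aggregators.set(name.toUpperCase(), aggregator);
  }

  public evaluate(name: string, values: number[]): number {
    const aggregator = this.aggregators.get(name.toUpperCase());
    if (!aggregator) {
      throw new Error(`No aggregator registered for ${name}`);
    }
    return aggregator(values);
  }
}

// Alternative SUM: Kahan (compensated) summation, which tracks the rounding
// error of each addition and feeds it back into the next one.
function kahanSum(values: number[]): number {
  let sum = 0;
  let compensation = 0;
  for (const value of values) {
    const corrected = value - compensation;
    const next = sum + corrected;
    compensation = (next - sum) - corrected;
    sum = next;
  }
  return sum;
}

const bus = new AggregatorBus();
bus.subscribe('SUM', values => values.reduce((acc, value) => acc + value, 0));
bus.subscribe('COUNT', values => values.length);
const defaultSum = bus.evaluate('SUM', [1, 2, 3]);

// Plug in a different way of evaluating SUM without touching COUNT.
bus.subscribe('SUM', kahanSum);
const customSum = bus.evaluate('SUM', [1, 2, 3]);

console.log(defaultSum, customSum, bus.evaluate('COUNT', [1, 2, 3]));
```

In this toy model, replacing one aggregator leaves every other registration untouched, which is the property a single monolithic package made hard to achieve.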

    Learn more about expressions evaluation in our documentation.

    🚄 Performance improvements

    Besides the changes mentioned above, there are a number of smaller changes that have a positive impact on performance that are worth mentioning:

    🤝 Contributors

    This release has been made possible thanks to the help of the following contributors (in no particular order):

    Full changelog

    While this blog post explained the primary changes in Comunica 4.x, there are many more small internal changes that will make your lives easier. If you want to learn more about these changes, check out the full changelog.