Release 4.0: 🚄 Faster actor testing and modularized expressions
Tuesday, October 15, 2024
Earlier this year, Comunica version 3.0 was released, which introduced better query planning across data sources and several convenience features. Since then, we have had several minor releases introducing multiple additions and performance improvements. In release 3.2, we introduced better techniques for discovering performance bottlenecks. This allowed us to identify some low-level bottlenecks in the internals of Comunica that limited performance. Fixing these issues required breaking changes to the internal API of Comunica, which is the main reason for this major update. As a result of these changes, we see performance improvements of 20% to 40%. Breaking changes are limited to internal Comunica APIs. This means that developers who make use of Comunica can have their cake and eat it 🎂: no breaking changes, and a performance boost.
🪢 New Actor.test contract for better performance
To determine which actors can answer a certain action,
all Comunica actors expose a test method, which used to look something like this:
    class MyActor extends Actor {
      public async test(action: IAction): Promise<IActorTest> {
        if (conditionNotMet(action)) {
          throw new Error('This actor can not handle the action');
        }
        return true;
      }
    }
The problem with the above is that JavaScript engines such as V8 will eagerly build internal stacktraces when creating Error objects.
Since Comunica has a large number of actors (240 at the time of writing),
an average query execution can lead to a huge number of internal Error objects being created.
According to our measurements, this produced a non-negligible performance overhead.
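
To get a rough feel for this kind of overhead, the following micro-benchmark sketch compares signalling failure by throwing an Error with signalling it through a plain return value. It is not part of Comunica: every name in it (SketchOutcome, testWithThrow, testWithResult, and so on) is hypothetical, and the timings it prints will vary by machine and engine version.

```typescript
// Hypothetical micro-benchmark, not taken from the Comunica code base.
interface SketchOutcome {
  ok: boolean;
  reason?: string;
}

// Old style: a failing test throws, which allocates an Error object.
function testWithThrow(value: number): boolean {
  if (value % 2 !== 0) {
    throw new Error('This actor can not handle the action');
  }
  return true;
}

// New style: a failing test returns a plain value, no Error is created.
function testWithResult(value: number): SketchOutcome {
  if (value % 2 !== 0) {
    return { ok: false, reason: 'This actor can not handle the action' };
  }
  return { ok: true };
}

// Counts how many of the first `iterations` integers fail the throwing test.
function countFailuresWithThrow(iterations: number): number {
  let failures = 0;
  for (let i = 0; i < iterations; i++) {
    try {
      testWithThrow(i);
    } catch {
      failures++;
    }
  }
  return failures;
}

// Same count, but using the value-returning test.
function countFailuresWithResult(iterations: number): number {
  let failures = 0;
  for (let i = 0; i < iterations; i++) {
    if (!testWithResult(i).ok) {
      failures++;
    }
  }
  return failures;
}

// Runs a function once and reports its result with the elapsed wall-clock time.
function timeMs(fn: () => number): { failures: number; ms: number } {
  const start = Date.now();
  const failures = fn();
  return { failures, ms: Date.now() - start };
}

const benchIterations = 200000;
const benchThrown = timeMs(() => countFailuresWithThrow(benchIterations));
const benchReturned = timeMs(() => countFailuresWithResult(benchIterations));
console.log(`throwing: ${benchThrown.ms} ms, returning a value: ${benchReturned.ms} ms`);
```

Both variants detect exactly the same failures, so any difference in the printed timings comes from how the failure is reported rather than from the check itself.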
As such, we refactored the contract of the test method to not rely on these Error objects anymore.
Instead, test methods now make use of TestResult objects,
which in practice look like this:
    class MyActor extends Actor {
      public async test(action: IAction): Promise<TestResult<IActorTest>> {
        if (conditionNotMet(action)) {
          return failTest('This actor can not handle the action');
        }
        return passTestVoid();
      }
    }
For various benchmarks on in-memory triple stores, this change makes queries up to 20% faster.
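
The idea behind this contract can be illustrated with a minimal, self-contained sketch. SketchTestResult, makeSketchActor, and pickSketchActor are hypothetical names used for illustration only; they are not Comunica's actual classes or mediators, and test is synchronous here for brevity, whereas Comunica's test methods are async.

```typescript
// Hypothetical stand-in for a test result: either a value or a failure message.
class SketchTestResult<T> {
  private constructor(
    private readonly value: T | undefined,
    private readonly failMessage: string | undefined,
  ) {}

  public static pass<T>(value: T): SketchTestResult<T> {
    return new SketchTestResult<T>(value, undefined);
  }

  public static fail<T>(message: string): SketchTestResult<T> {
    return new SketchTestResult<T>(undefined, message);
  }

  public isPassed(): boolean {
    return this.failMessage === undefined;
  }

  public get(): T | undefined {
    return this.value;
  }

  public getFailMessage(): string | undefined {
    return this.failMessage;
  }
}

interface SketchAction {
  operation: string;
}

interface SketchActor {
  readonly actorName: string;
  test(action: SketchAction): SketchTestResult<true>;
}

// Builds an actor that only accepts a single operation type.
function makeSketchActor(actorName: string, supported: string): SketchActor {
  return {
    actorName,
    test(action: SketchAction): SketchTestResult<true> {
      if (action.operation !== supported) {
        // Failure is reported as a plain string; no Error object is created.
        return SketchTestResult.fail<true>(`${actorName} can not handle ${action.operation}`);
      }
      return SketchTestResult.pass<true>(true);
    },
  };
}

// Returns the first actor whose test passes, plus the messages of all actors that failed before it.
function pickSketchActor(
  actors: SketchActor[],
  action: SketchAction,
): { actor: SketchActor | undefined; failures: string[] } {
  const failures: string[] = [];
  for (const actor of actors) {
    const result = actor.test(action);
    if (result.isPassed()) {
      return { actor, failures };
    }
    failures.push(result.getFailMessage() ?? 'unknown failure');
  }
  return { actor: undefined, failures };
}

const sketchActors = [ makeSketchActor('join', 'join'), makeSketchActor('filter', 'filter') ];
const sketchPick = pickSketchActor(sketchActors, { operation: 'filter' });
console.log(sketchPick.actor?.actorName, sketchPick.failures);
```

Because failures are ordinary values, the caller can still collect every rejection message for debugging, while rejected actors never pay for an Error allocation.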
🧩 Modularization of expressions logic
Thanks to Jitse De Smet's monumental effort,
all expressions-related logic in Comunica is now fully modularized.
Previously, the handling of filters and aggregates was delegated entirely to the single sparqlee package.
While this package did a great job of handling filters and aggregates,
it lacked the modularity that existed for all other parts of query execution.
For example, it was not possible to easily plug in your own actor to evaluate the SUM aggregator in a different way.
With this release, the sparqlee package has been split up into multiple buses and actors,
which are responsible for term comparators, functions, and aggregators.
We took care to do this without introducing any kind of performance degradation.
Learn more about expressions evaluation in our documentation.
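
As a rough illustration of what this modularity enables, the sketch below shows a bus on which aggregator actors can be registered and replaced. All names (SketchAggregatorActor, SketchAggregatorBus, the two SUM actors) are hypothetical, and the "last subscribed actor wins" rule is a simplification chosen for this example; it does not reflect Comunica's actual bus, actor, or mediator interfaces.

```typescript
// Hypothetical actor interface: each actor evaluates one named aggregator.
interface SketchAggregatorActor {
  readonly actorName: string;
  readonly aggregator: string;
  evaluate(values: number[]): number;
}

// Hypothetical bus: actors subscribe, and the most recently subscribed match is used.
class SketchAggregatorBus {
  private readonly actors: SketchAggregatorActor[] = [];

  public subscribe(actor: SketchAggregatorActor): void {
    this.actors.push(actor);
  }

  public evaluate(
    aggregator: string,
    values: number[],
  ): { actorName: string; value: number } | undefined {
    // Walk the actors from newest to oldest so later subscriptions override earlier ones.
    for (const actor of this.actors.slice().reverse()) {
      if (actor.aggregator === aggregator) {
        return { actorName: actor.actorName, value: actor.evaluate(values) };
      }
    }
    return undefined;
  }
}

// Default SUM: plain left-to-right addition.
const sketchDefaultSum: SketchAggregatorActor = {
  actorName: 'sum-default',
  aggregator: 'SUM',
  evaluate: (values: number[]): number => values.reduce((total, value) => total + value, 0),
};

// Alternative SUM: Kahan compensated summation, which reduces floating-point rounding error.
const sketchCompensatedSum: SketchAggregatorActor = {
  actorName: 'sum-compensated',
  aggregator: 'SUM',
  evaluate(values: number[]): number {
    let sum = 0;
    let compensation = 0;
    for (const value of values) {
      const adjusted = value - compensation;
      const next = sum + adjusted;
      compensation = (next - sum) - adjusted;
      sum = next;
    }
    return sum;
  },
};

const sketchBus = new SketchAggregatorBus();
sketchBus.subscribe(sketchDefaultSum);
console.log(sketchBus.evaluate('SUM', [ 1, 2, 3 ]));
sketchBus.subscribe(sketchCompensatedSum);
console.log(sketchBus.evaluate('SUM', [ 1, 2, 3 ]));
```

The point of the sketch is that swapping the SUM implementation only requires registering another actor; nothing else in the evaluation pipeline has to change.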
🚄 Performance improvements
Besides the changes mentioned above, a number of smaller changes with a positive impact on performance are worth mentioning:
- Optimize Bindings merge logic
- Fix internal cardinalities being wrong for SPARQL endpoints with VoID
- Hash joins now always use 32-bit numbers, which speeds up operations in the V8 engine
- Hash joins use the faster Murmur3 hash method
- Only consider overlapping variables when testing for undefs in join actors
- Refactor HTTP fetch and retry logic, which leads to more stable query execution over servers that make use of rate limits
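
To make the hash join items above more concrete, here is a minimal sketch of a hash join that keys its index on 32-bit integer hashes of the overlapping variables. It is not Comunica's implementation: the names (SketchRow, sketchHash32, sketchHashJoin) are hypothetical, the hash is a simple FNV-1a-style function applied to UTF-16 code units used as a stand-in for Murmur3, and the sketch assumes that every row binds all overlapping variables (no undefs).

```typescript
// Hypothetical representation of a solution mapping: variable name to value.
type SketchRow = Record<string, string>;

// FNV-1a-style 32-bit hash over UTF-16 code units (a stand-in, not Murmur3).
function sketchHash32(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Math.imul keeps the multiplication within 32-bit integer arithmetic.
    hash = Math.imul(hash, 0x01000193);
  }
  // Convert to an unsigned 32-bit integer.
  return hash >>> 0;
}

// Collects the variables that occur on both sides of the join.
function sketchOverlappingVars(left: SketchRow[], right: SketchRow[]): string[] {
  const leftVars = new Set<string>();
  for (const row of left) {
    for (const variable of Object.keys(row)) {
      leftVars.add(variable);
    }
  }
  const shared = new Set<string>();
  for (const row of right) {
    for (const variable of Object.keys(row)) {
      if (leftVars.has(variable)) {
        shared.add(variable);
      }
    }
  }
  return Array.from(shared).sort();
}

// Serializes only the overlapping variables, so unrelated bindings do not affect the key.
function sketchJoinKey(row: SketchRow, vars: string[]): number {
  return sketchHash32(vars.map(variable => `${variable}=${row[variable]}`).join('|'));
}

function sketchHashJoin(left: SketchRow[], right: SketchRow[]): SketchRow[] {
  const vars = sketchOverlappingVars(left, right);

  // Build phase: index the left side by 32-bit hash.
  const index = new Map<number, SketchRow[]>();
  for (const row of left) {
    const key = sketchJoinKey(row, vars);
    const bucket = index.get(key);
    if (bucket) {
      bucket.push(row);
    } else {
      index.set(key, [ row ]);
    }
  }

  // Probe phase: look up each right row and re-check equality to guard against hash collisions.
  const results: SketchRow[] = [];
  for (const row of right) {
    const bucket = index.get(sketchJoinKey(row, vars)) ?? [];
    for (const candidate of bucket) {
      if (vars.every(variable => candidate[variable] === row[variable])) {
        results.push({ ...candidate, ...row });
      }
    }
  }
  return results;
}

const sketchJoined = sketchHashJoin(
  [ { s: 'a', p: '1' }, { s: 'b', p: '2' } ],
  [ { s: 'a', o: 'x' }, { s: 'c', o: 'y' }, { s: 'a', o: 'z' } ],
);
console.log(sketchJoined);
```

Using a number as the map key avoids storing and comparing long key strings in the index; the explicit equality check in the probe phase is what keeps the result correct when two different keys happen to hash to the same number.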
🤝 Contributors
This release has been made possible thanks to the help of the following contributors (in no particular order):
- Jonni Hanski
- Jitse De Smet
- Ruben Eschauzier
- Karel Klíma
- Ieben Smessaert
- Maarten Vandenbrande
- Bryan-Elliott Tam
- Jesse Wright
- Ruben Taelman
Full changelog
While this blog post explained the primary changes in Comunica 4.x, there are actually many more smaller changes internally that will make your lives easier. If you want to learn more about these changes, check out the full changelog.