Benchmarking
This page lists guidelines for running experiments with Comunica. This can be useful for researchers who want to evaluate their modifications, or for Comunica core developers who want to check performance.
Considerations when benchmarking
Running Node in production mode
If you want to do benchmarking with Comunica in Node.js, make sure to run Node.js in production mode as follows:
NODE_ENV=production node packages/some-package/bin/some-bin.js
The reason for this is that Comunica extensively generates internal Error objects. In non-production mode, these also produce long stack traces, which may in some cases impact performance.
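A benchmark script can guard against forgetting this setting. The sketch below is a hypothetical helper (`checkProductionMode` is not part of Comunica) that throws unless `NODE_ENV` is `production`. Call it at the very start of your own benchmark script.

```javascript
// Hypothetical helper, not part of Comunica: refuse to benchmark unless
// NODE_ENV is "production", so long stack traces don't skew measurements.
function checkProductionMode(env = process.env) {
  if (env.NODE_ENV !== 'production') {
    throw new Error(
      `NODE_ENV is "${env.NODE_ENV ?? ''}", expected "production"; ` +
        'run e.g. NODE_ENV=production node your-benchmark.js'
    );
  }
  return true;
}
```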
Taking into account startup time of the engine
If you want to run experiments, it is important to take into account the time it takes for the query engine to start. When measuring execution time, one should only measure the actual time it takes for the engine to execute the query, excluding the query engine's startup time.
As such, simply measuring the execution time via the command line is not advised.
Instead, one should either make use of the stats writer on the command line, or measure query execution via JavaScript.
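The principle of keeping startup out of the measurement can be sketched as follows. This is a minimal illustration, not Comunica code: `timeExecution`, `createEngine` and `runQuery` are hypothetical names, and the "engine" is any object that is expensive to create. Only the call to `runQuery` falls inside the timed region.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper: engine startup happens before the timer starts,
// so only the query execution inside runQuery is measured.
async function timeExecution(createEngine, runQuery) {
  const engine = await createEngine(); // startup cost: deliberately not timed
  const start = performance.now();
  const result = await runQuery(engine); // query execution: timed
  const elapsedMs = performance.now() - start;
  return { result, elapsedMs };
}
```

With Comunica, `createEngine` would construct your query engine once, and `runQuery` would issue the query and consume all of its results before returning.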
Simple benchmarking using the stats writer
The easiest way to do simple benchmarking is to make use of the -t stats result format.
$ NODE_ENV=production \
    comunica-sparql https://fragments.dbpedia.org/2016-04/en \
    "SELECT * WHERE { ?s ?p ?o } LIMIT 100" \
    -t stats
This will output CSV in the form of:
Result,Delay (ms),HTTP requests
1,136.638436,2
2,137.211264,2
3,137.385467,2
...
98,151.781901,2
99,151.838555,2
100,151.898222,2
TOTAL,152.175256,2
This tells us:
- The number of query results (one row per result, followed by a TOTAL row)
- The cumulative time for each result to be emitted
- The cumulative number of HTTP requests required up until each result
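If you save this output to a file, a few lines of JavaScript suffice to extract headline numbers such as the time to the first result and the total execution time. The parser below is a sketch that assumes exactly the three-column layout and the TOTAL row shown above; `parseStats` is a hypothetical name, not a Comunica function.

```javascript
// Hypothetical parser for `-t stats` output, assuming the columns
// Result, Delay (ms), HTTP requests and a final TOTAL row.
function parseStats(csv) {
  const rows = csv
    .trim()
    .split(/\r?\n/)
    .slice(1) // skip the header row
    .filter((line) => line.length > 0)
    .map((line) => line.split(','));

  const results = [];
  let total = null;
  for (const [id, delay, requests] of rows) {
    const entry = { delayMs: Number(delay), httpRequests: Number(requests) };
    if (id === 'TOTAL') total = entry;
    else results.push({ result: Number(id), ...entry });
  }

  return {
    resultCount: results.length,
    firstResultMs: results.length > 0 ? results[0].delayMs : null,
    totalMs: total ? total.delayMs : null,
    httpRequests: total ? total.httpRequests : null,
  };
}
```

For example, redirect the command's output with `> stats.csv` (assuming nothing else is written to standard output) and pass `fs.readFileSync('stats.csv', 'utf8')` to `parseStats`.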
Simple benchmarking in JavaScript
When creating a Comunica query engine from a JavaScript application, measuring a query's execution time can be done as follows:
// Assumes myEngine was created beforehand, outside the timed region
// Start a timer
console.time("myTimer");
const bindingsStream = await myEngine.queryBindings(`
  SELECT ?s ?p ?o WHERE {
    ?s ?p <http://dbpedia.org/resource/Belgium>.
    ?s ?p ?o
  } LIMIT 100`, {
  sources: ['http://fragments.dbpedia.org/2015/en'],
});
bindingsStream.on('data', (binding) => {
  // Optionally do some logging
});
bindingsStream.on('end', () => {
  // End the timer
  console.timeEnd("myTimer");
});
bindingsStream.on('error', (error) => {
  // Report failures instead of silently never stopping the timer
  console.error(error);
});
Measuring execution time from JavaScript gives you more flexibility than the command line.
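One way to use this flexibility is to record both the time until the first result and the total time from the same stream. The sketch below assumes only that the stream emits `data`, `end` and `error` events, as `bindingsStream` does in the example above; `measureStream` is a hypothetical helper, not a Comunica API.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper: resolves with the number of results, the time until
// the first result, and the total time, all relative to `start`.
// By default the timer starts when this function is called.
function measureStream(stream, start = performance.now()) {
  return new Promise((resolve, reject) => {
    let count = 0;
    let firstResultMs = null;
    stream.on('data', () => {
      if (count === 0) firstResultMs = performance.now() - start;
      count++;
    });
    stream.on('end', () => {
      resolve({ count, firstResultMs, totalMs: performance.now() - start });
    });
    stream.on('error', reject);
  });
}
```

To include query planning in the measurement, take `const start = performance.now();` before `await myEngine.queryBindings(...)` and then call `await measureStream(bindingsStream, start)`.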
Examples for more advanced benchmarking in JavaScript can be found in the examples repo.
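Single measurements are noisy, so benchmarks usually repeat a query several times and report summary statistics. The following is a generic sketch of that idea, not taken from the examples repo: `benchmark` and its `warmup` and `iterations` options are hypothetical, and `task` would be an async function that runs one query to completion.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper: run `task` a few times without measuring (warm-up),
// then time `iterations` runs and summarize them. Requires iterations >= 1.
async function benchmark(task, { warmup = 1, iterations = 5 } = {}) {
  for (let i = 0; i < warmup; i++) await task();

  const times = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await task();
    times.push(performance.now() - start);
  }

  const sorted = [...times].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = times.reduce((sum, t) => sum + t, 0) / times.length;
  return { times, mean, median, min: sorted[0], max: sorted[sorted.length - 1] };
}
```

Whether to discard warm-up runs depends on what you want to measure: set `warmup: 0` if cold-start behaviour is the subject of the experiment.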
Reproducible benchmarking via Comunica Bencher
Comunica Bencher is a Docker-based benchmarking framework for easily creating and running benchmarks with Comunica and LDF Server. It is useful if you want to compare different configurations of Comunica with each other.
Together with the (semantic) configuration files of Comunica and LDF Server, this tool completes the whole provenance chain of experimental results:
- Setup of software based on configuration
- Generating experiment input data
- Execution of experiments based on parameters
- Description of environment dependencies during experiments
- Reporting of results
- Archiving results into a single file for easy exchange