Benchmarking
This page lists guidelines on how to run experiments with Comunica. This can be useful for researchers who want to evaluate their modifications, or for Comunica core developers who want to check performance.
Considerations when benchmarking
Running Node in production mode
If you want to do benchmarking with Comunica in Node.js, make sure to run Node.js in production mode as follows:
NODE_ENV=production node packages/some-package/bin/some-bin.js
The reason for this is that Comunica generates a large number of internal Error objects. In non-production mode, these also produce long stacktraces, which may in some cases impact performance.
Taking into account startup time of the engine
If you want to run experiments, it is important to take into account the time it takes for the query engine to start. When measuring execution time, one should only measure the actual time it takes for the engine to execute the query, excluding the query engine's startup time.
As such, simply measuring the execution time via the command line is not advised.
Instead, one should either make use of a SPARQL endpoint, the stats writer on the command line, or measure query execution via JavaScript.
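As a minimal sketch, the following shows how the engine's startup time can be timed separately from the query execution time, assuming the QueryEngine class from the @comunica/query-sparql package (the query and source are only examples):

const { QueryEngine } = require('@comunica/query-sparql');

async function run() {
  // Engine startup: this cost should be excluded from the measurement.
  const startupStart = process.hrtime.bigint();
  const engine = new QueryEngine();
  const startupMs = Number(process.hrtime.bigint() - startupStart) / 1e6;

  // Query execution: only this part should be measured.
  const queryStart = process.hrtime.bigint();
  const bindingsStream = await engine.queryBindings(
    'SELECT * WHERE { ?s ?p ?o } LIMIT 100',
    { sources: ['https://fragments.dbpedia.org/2016-04/en'] },
  );
  bindingsStream.on('data', () => { /* consume all results */ });
  bindingsStream.on('end', () => {
    const executionMs = Number(process.hrtime.bigint() - queryStart) / 1e6;
    console.log(`Startup: ${startupMs} ms, execution: ${executionMs} ms`);
  });
}

run();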
Warming up the JavaScript engine
Since most modern JavaScript engines (such as the V8 engine used by Node.js) are based on Just In Time (JIT) compilation, they take some time to compile and to learn about the application's structure to apply optimizations. As such, it is important to warm up your query engine before doing measurements over it, unless you specifically want to measure the cold-start performance. The recommended way to do this is to set up a Comunica SPARQL endpoint, do some warmup queries over it, and only then execute the actual benchmark.
Engines such as V8 tend to reach an optimal state rather quickly, so not too many warmup rounds are required before execution times stabilize. The exact number of warmup rounds can depend on your engine's version, machine, and query set.
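As a rough sketch, such a warmup phase could look as follows in JavaScript, assuming a local Comunica SPARQL endpoint is running at http://localhost:3000/sparql (the endpoint URL, query, and number of rounds are only illustrative; the built-in fetch requires Node.js 18 or later):

// Warm up a (hypothetical) local Comunica SPARQL endpoint before benchmarking
const endpoint = 'http://localhost:3000/sparql';
const query = 'SELECT * WHERE { ?s ?p ?o } LIMIT 100';

async function warmup(rounds = 5) {
  for (let i = 0; i < rounds; i++) {
    const response = await fetch(`${endpoint}?query=${encodeURIComponent(query)}`, {
      headers: { accept: 'application/sparql-results+json' },
    });
    // Fully consume the response before starting the next round
    await response.json();
  }
}

warmup().then(() => console.log('Warmup done, start the actual benchmark'));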
Simple benchmarking using the stats writer
The easiest way to do simple benchmarking is to make use of the -t stats result format.
$ NODE_ENV=production \
    comunica-sparql https://fragments.dbpedia.org/2016-04/en \
      "SELECT * WHERE { ?s ?p ?o } LIMIT 100" \
      -t stats
This will output CSV in the form of:
Result,Delay (ms),HTTP requests
1,136.638436,2
2,137.211264,2
3,137.385467,2
...
98,151.781901,2
99,151.838555,2
100,151.898222,2
TOTAL,152.175256,2
This tells us:
- The number of query results
- The cumulative time for each result to be emitted
- The cumulative number of HTTP requests required up until each result
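If you want to automate this from a script, one possible sketch is to invoke comunica-sparql as a child process and parse the final TOTAL row, assuming the CSV layout shown above:

// Sketch: run comunica-sparql with the stats writer and extract the total
// execution time from the final TOTAL row of the CSV output.
const { execFile } = require('child_process');

execFile('comunica-sparql', [
  'https://fragments.dbpedia.org/2016-04/en',
  'SELECT * WHERE { ?s ?p ?o } LIMIT 100',
  '-t', 'stats',
], { env: { ...process.env, NODE_ENV: 'production' } }, (error, stdout) => {
  if (error) {
    throw error;
  }
  // The last line looks like: TOTAL,<delay in ms>,<HTTP requests>
  const totalLine = stdout.trim().split('\n').pop();
  const [, delayMs, httpRequests] = totalLine.split(',');
  console.log(`Total time: ${delayMs} ms over ${httpRequests} HTTP requests`);
});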
Simple benchmarking in JavaScript
When creating a Comunica query engine from a JavaScript application, measuring a query's execution time can be done as follows:
// Start a timer
console.time("myTimer");

const bindingsStream = await myEngine.queryBindings(`
  SELECT ?s ?p ?o WHERE {
    ?s ?p <http://dbpedia.org/resource/Belgium>.
    ?s ?p ?o
  } LIMIT 100`, {
  sources: ['http://fragments.dbpedia.org/2015/en'],
});

bindingsStream.on('data', (binding) => {
  // Optionally do some logging
});

bindingsStream.on('end', () => {
  // End the timer
  console.timeEnd("myTimer");
});
Measuring execution time from JavaScript gives you more flexibility compared to the command line.
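For example, a sketch that averages the execution time over multiple runs could look as follows, assuming myEngine is an existing Comunica query engine instance (the query, source, and number of runs are only illustrative):

// Time a single query execution, resolving with the duration in milliseconds
async function timeQuery(engine, query, context) {
  const start = process.hrtime.bigint();
  const bindingsStream = await engine.queryBindings(query, context);
  return new Promise((resolve, reject) => {
    bindingsStream.on('data', () => { /* consume all results */ });
    bindingsStream.on('error', reject);
    bindingsStream.on('end', () => {
      resolve(Number(process.hrtime.bigint() - start) / 1e6);
    });
  });
}

// Run the query several times and report the average execution time
async function benchmark() {
  const query = 'SELECT * WHERE { ?s ?p ?o } LIMIT 100';
  const context = { sources: ['https://fragments.dbpedia.org/2016-04/en'] };
  const durations = [];
  for (let i = 0; i < 10; i++) {
    durations.push(await timeQuery(myEngine, query, context));
  }
  const average = durations.reduce((sum, d) => sum + d, 0) / durations.length;
  console.log(`Average execution time: ${average} ms`);
}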
Examples for more advanced benchmarking in JavaScript can be found in the examples repo.
Reproducible benchmarking via JBR
JBR is a JavaScript-based benchmarking framework for easily creating and running various benchmarks with engines such as Comunica and LDF Server. It is useful if you want to compare different configurations of Comunica or other engines with each other.
Together with the (semantic) configuration files of Comunica and LDF Server, this tool completes the whole provenance chain of experimental results:
- Setup of software based on configuration
- Generating experiment input data
- Execution of experiments based on parameters
- Description of environment dependencies during experiments
- Reporting of results
- Archiving results into a single file for easy exchange