Querying with a custom configuration from the command line
While packages such as Comunica SPARQL ship with a default configuration that offers specific querying functionality, it is possible to override this configuration, so that you can modify the internal capabilities of your query engine.
In this guide, we will keep it simple: we will remove some parts of the config file to create a more lightweight query engine, and query it from the command line. In the next guide, we will look into querying with a custom config from a JavaScript app.
1. Requirements of a config file
Comunica is composed of a set of actors
that execute specific tasks.
For example, all SPARQL query operators (DISTINCT, FILTER, ASK, ...)
have a corresponding actor that implements them in a certain way.
By modifying the Comunica config file, it is possible to plug in different implementations for certain SPARQL query operators, for example when you have a more efficient implementation of your own.
Main config file
A Comunica config is written in JSON, and typically looks something like this:
{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/config-query-sparql/^3.0.0/components/context.jsonld"
  ],
  "@id": "urn:comunica:my",
  "@type": "Runner",
  "import": [
    "ccqs:config/context-preprocess/actors.json",
    "ccqs:config/context-preprocess/mediators.json",
    "ccqs:config/http/actors.json",
    "ccqs:config/http/mediators.json",
    "ccqs:config/init/actors.json",
    "ccqs:config/optimize-query-operation/actors.json",
    "ccqs:config/optimize-query-operation/mediators.json",
    "ccqs:config/query-operation/actors.json",
    "ccqs:config/query-operation/mediators.json"
  ]
}
Essentially, this config file contains a list of imports to smaller config files, which are loaded in when Comunica reads this config file.
These imported config files each represent a component on a particular bus.
For example, ccqs:config/query-operation/actors.json refers to all actors that are registered on the query operation bus,
and ccqs:config/query-operation/mediators.json refers to the mediators that are defined over the query operation bus.
The ccqs: prefix refers to the scope of the @comunica/config-query-sparql package,
which means that all paths following it refer to files within this package.
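The naming convention described above can be made concrete with a small sketch. The helper below (parseImport is a hypothetical name, not part of Comunica) splits an import entry into its prefix, bus name, and kind, assuming entries follow the prefix:config/<bus>/<actors|mediators>.json shape used in the example. It only illustrates the convention; Comunica resolves these paths itself when it reads the config.

```javascript
// Hypothetical helper: splits an import entry such as
// "ccqs:config/query-operation/actors.json" into its parts.
function parseImport(entry) {
  const match = /^([^:]+):config\/([^/]+)\/(actors|mediators)\.json$/.exec(entry);
  if (!match) {
    return null; // Entry does not follow the assumed top-level shape.
  }
  const [, prefix, bus, kind] = match;
  return { prefix, bus, kind };
}

console.log(parseImport('ccqs:config/query-operation/actors.json'));
console.log(parseImport('ccqs:config/query-operation/mediators.json'));
```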
Imported config file
For example, the imported config file ccqs:config/query-operation/actors.json could look something like this:
{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/config-query-sparql/^3.0.0/components/context.jsonld"
  ],
  "import": [
    "ccqs:config/query-operation/actors/query/ask.json",
    "ccqs:config/query-operation/actors/query/bgp.json",
    "ccqs:config/query-operation/actors/query/construct.json",
    "ccqs:config/query-operation/actors/query/describe.json",
    "ccqs:config/query-operation/actors/query/distinct.json",
    "ccqs:config/query-operation/actors/query/extend.json",
    "ccqs:config/query-operation/actors/query/filter.json",
    "ccqs:config/query-operation/actors/query/from.json",
    "ccqs:config/query-operation/actors/query/group.json",
    "ccqs:config/query-operation/actors/query/join.json",
    "ccqs:config/query-operation/actors/query/leftjoin.json",
    "ccqs:config/query-operation/actors/query/minus.json",
    "ccqs:config/query-operation/actors/query/nop.json",
    "ccqs:config/query-operation/actors/query/orderby.json",
    "ccqs:config/query-operation/actors/query/project.json",
    "ccqs:config/query-operation/actors/query/quadpattern.json",
    "ccqs:config/query-operation/actors/query/reduced.json",
    "ccqs:config/query-operation/actors/query/service.json",
    "ccqs:config/query-operation/actors/query/slice.json",
    "ccqs:config/query-operation/actors/query/sparql-endpoint.json",
    "ccqs:config/query-operation/actors/query/union.json",
    "ccqs:config/query-operation/actors/query/values.json"
  ]
}
This example config file imports several smaller config files, where each config file contains a single actor that will be loaded into Comunica.
For example, the ccqs:config/query-operation/actors/query/ask.json file could look as follows:
{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/runner/^3.0.0/components/context.jsonld",
    "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/actor-query-operation-ask/^3.0.0/components/context.jsonld"
  ],
  "@id": "urn:comunica:default:Runner",
  "@type": "Runner",
  "actors": [
    {
      "@id": "urn:comunica:default:query-operation/actors#ask",
      "@type": "ActorQueryOperationAsk",
      "mediatorQueryOperation": {
        "@id": "urn:comunica:default:query-operation/mediators#main"
      }
    }
  ]
}
Each configured actor fulfills a specific task, e.g.:
ActorQueryOperationAsk: Executes SPARQL ASK queries.
ActorQueryOperationDistinctHash: Executes the SPARQL DISTINCT operator.
ActorQueryOperationFilterSparqlee: Executes SPARQL FILTER expressions.
2. Install Comunica SPARQL
Since we want to override the default config of Comunica SPARQL, we have to make sure its package is installed first:
$ npm install -g @comunica/query-sparql
3. Start from an existing config file
The easiest way to create a custom config is to start from an existing one, and add or remove things to fit your needs.
Let's start by creating a new empty directory, and create a file called config.json in it.
In this guide, we will start from the Comunica SPARQL default config file.
Let's copy its contents entirely into our config.json:
{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/config-query-sparql/^3.0.0/components/context.jsonld"
  ],
  "import": [
    "ccqs:config/context-preprocess/actors.json",
    "ccqs:config/context-preprocess/mediators.json",
    "ccqs:config/hash-bindings/actors.json",
    "ccqs:config/hash-bindings/mediators.json",
    "ccqs:config/http/actors.json",
    "ccqs:config/http/mediators.json",
    "ccqs:config/http-invalidate/actors.json",
    "ccqs:config/http-invalidate/mediators.json",
    "ccqs:config/init/actors.json",
    "ccqs:config/merge-bindings-context/actors.json",
    "ccqs:config/merge-bindings-context/mediators.json",
    "ccqs:config/optimize-query-operation/actors.json",
    "ccqs:config/optimize-query-operation/mediators.json",
    "ccqs:config/query-operation/actors.json",
    "ccqs:config/query-operation/mediators.json",
    "ccqs:config/query-parse/actors.json",
    "ccqs:config/query-parse/mediators.json",
    "ccqs:config/query-process/actors.json",
    "ccqs:config/query-process/mediators.json",
    "ccqs:config/query-result-serialize/actors.json",
    "ccqs:config/query-result-serialize/mediators.json",
    "ccqs:config/query-source-identify/actors.json",
    "ccqs:config/query-source-identify/mediators.json",
    "ccqs:config/query-source-identify-hypermedia/actors.json",
    "ccqs:config/query-source-identify-hypermedia/mediators.json",
    "ccqs:config/dereference/actors.json",
    "ccqs:config/dereference/mediators.json",
    "ccqs:config/dereference-rdf/actors.json",
    "ccqs:config/dereference-rdf/mediators.json",
    "ccqs:config/rdf-join/actors.json",
    "ccqs:config/rdf-join/mediators.json",
    "ccqs:config/rdf-join-entries-sort/actors.json",
    "ccqs:config/rdf-join-entries-sort/mediators.json",
    "ccqs:config/rdf-join-selectivity/actors.json",
    "ccqs:config/rdf-join-selectivity/mediators.json",
    "ccqs:config/rdf-metadata/actors.json",
    "ccqs:config/rdf-metadata/mediators.json",
    "ccqs:config/rdf-metadata-accumulate/actors.json",
    "ccqs:config/rdf-metadata-accumulate/mediators.json",
    "ccqs:config/rdf-metadata-extract/actors.json",
    "ccqs:config/rdf-metadata-extract/mediators.json",
    "ccqs:config/rdf-parse/actors.json",
    "ccqs:config/rdf-parse/mediators.json",
    "ccqs:config/rdf-parse-html/actors.json",
    "ccqs:config/rdf-resolve-hypermedia-links/actors.json",
    "ccqs:config/rdf-resolve-hypermedia-links/mediators.json",
    "ccqs:config/rdf-resolve-hypermedia-links-queue/actors.json",
    "ccqs:config/rdf-resolve-hypermedia-links-queue/mediators.json",
    "ccqs:config/rdf-serialize/actors.json",
    "ccqs:config/rdf-serialize/mediators.json",
    "ccqs:config/rdf-update-hypermedia/actors.json",
    "ccqs:config/rdf-update-hypermedia/mediators.json",
    "ccqs:config/rdf-update-quads/actors.json",
    "ccqs:config/rdf-update-quads/mediators.json"
  ]
}
4. Execute with Comunica SPARQL
While we usually use comunica-sparql
to invoke Comunica SPARQL on the command line,
we can instead call comunica-dynamic-sparql
with exactly the same arguments
to allow loading in a custom config file.
In order to specify a custom config file,
we have to set the path to our config file via the COMUNICA_CONFIG
environment variable:
$ export COMUNICA_CONFIG="config.json"
If you now execute comunica-dynamic-sparql, it will load in your config.json file.
Let's try a simple query to see if this works:
$ comunica-dynamic-sparql http://fragments.dbpedia.org/2016-04/en \
    "CONSTRUCT WHERE { ?s ?p ?o } LIMIT 100"
If the COMUNICA_CONFIG environment variable is not set,
comunica-dynamic-sparql will fall back to the default Comunica SPARQL config file.
comunica-dynamic-sparql has a significant startup delay compared to comunica-sparql,
since it has to load in, parse, and interpret a config file at startup.
comunica-dynamic-sparql should therefore only be used for simple testing
before you use your query engine in a separate package.
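The invocation from this section can also be scripted, which avoids exporting the variable in your shell session. The sketch below assumes Node.js and that comunica-dynamic-sparql is available on your PATH; buildInvocation and runQuery are hypothetical helper names. It sets COMUNICA_CONFIG only for the child process.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: describes how to call comunica-dynamic-sparql
// with COMUNICA_CONFIG pointing to a custom config file.
function buildInvocation(configPath, source, query) {
  return {
    command: 'comunica-dynamic-sparql',
    args: [source, query],
    env: { ...process.env, COMUNICA_CONFIG: configPath },
  };
}

// Hypothetical helper: runs the command and returns what it printed to stdout.
function runQuery(configPath, source, query) {
  const { command, args, env } = buildInvocation(configPath, source, query);
  const result = spawnSync(command, args, { env, encoding: 'utf8' });
  if (result.error) {
    throw result.error; // For example, the binary could not be found.
  }
  return result.stdout;
}

// Example (requires Comunica SPARQL to be installed, and network access):
// console.log(runQuery(
//   'config.json',
//   'http://fragments.dbpedia.org/2016-04/en',
//   'CONSTRUCT WHERE { ?s ?p ?o } LIMIT 100'
// ));
```

Because the environment variable is passed per call, different config files can be tried side by side without changing the shell environment.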
5. Removing RDF serialization actors
As an example, we will remove all actors that can output results in any RDF format.
All of these actors are defined in the ccqs:config/rdf-serialize/actors.json config file.
Before we make any changes to our config file, let us inspect the result formats that are currently available:
$ comunica-dynamic-sparql --listformats
application/ld+json
application/trig
application/n-quads
text/turtle
application/n-triples
text/n3
stats
tree
table
application/sparql-results+xml
text/tab-separated-values
application/sparql-results+json
text/csv
simple
application/json
The first 6 of those formats are RDF serialization formats,
which are mainly used for outputting CONSTRUCT
query results.
If we want to remove those actors from the config file,
we can remove the following line from our config.json:
- "ccqs:config/rdf-serialize/actors.json",
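The same edit can be scripted instead of done by hand. The following is a minimal sketch in plain Node.js; removeImports is a hypothetical helper, and the config object is a shortened stand-in for the config.json from step 3.

```javascript
// Hypothetical helper: returns a copy of the config without the given import entries.
function removeImports(config, entriesToRemove) {
  const unwanted = new Set(entriesToRemove);
  return {
    ...config,
    import: config.import.filter((entry) => !unwanted.has(entry)),
  };
}

// Shortened stand-in for config.json (the real file has many more imports).
const config = {
  '@context': [
    'https://linkedsoftwaredependencies.org/bundles/npm/@comunica/config-query-sparql/^3.0.0/components/context.jsonld',
  ],
  import: [
    'ccqs:config/rdf-parse/actors.json',
    'ccqs:config/rdf-serialize/actors.json',
    'ccqs:config/rdf-serialize/mediators.json',
  ],
};

const trimmed = removeImports(config, ['ccqs:config/rdf-serialize/actors.json']);
console.log(JSON.stringify(trimmed, null, 2));
```

To apply this to a real file, the config could be read with fs.readFileSync and JSON.parse, and written back with JSON.stringify.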
If we now inspect the available result formats, we get the following:
$ comunica-dynamic-sparql --listformats
stats
tree
table
application/sparql-results+xml
text/tab-separated-values
application/sparql-results+json
text/csv
simple
application/json
As you can see, the 6 RDF serialization formats are not present anymore. Since we removed their actors from our config file, Comunica no longer loads them.
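To double-check which formats disappeared, the two listings can be compared. The sketch below hard-codes the two --listformats outputs shown in this section; removedFormats is a hypothetical helper name.

```javascript
// Output of --listformats before the change, copied from this section.
const before = [
  'application/ld+json', 'application/trig', 'application/n-quads',
  'text/turtle', 'application/n-triples', 'text/n3',
  'stats', 'tree', 'table',
  'application/sparql-results+xml', 'text/tab-separated-values',
  'application/sparql-results+json', 'text/csv', 'simple', 'application/json',
];

// Output of --listformats after the change, copied from this section.
const after = [
  'stats', 'tree', 'table',
  'application/sparql-results+xml', 'text/tab-separated-values',
  'application/sparql-results+json', 'text/csv', 'simple', 'application/json',
];

// Hypothetical helper: formats present in the first list but missing from the second.
function removedFormats(beforeList, afterList) {
  const remaining = new Set(afterList);
  return beforeList.filter((format) => !remaining.has(format));
}

console.log(removedFormats(before, after));
```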
6. Only allowing SELECT queries
Let's take our config modifications a step further.
Say our goal is to build a query engine that can only execute SELECT queries,
and that can no longer execute CONSTRUCT and DESCRIBE queries.
This will require us to remove some more actors.
While the actors for CONSTRUCT and DESCRIBE are defined in ccqs:config/query-operation/actors.json,
we cannot simply remove that file from our imports,
because it also contains actors for other SPARQL query operators that we want to keep, such as SELECT.
Instead of just removing ccqs:config/query-operation/actors.json,
we will remove it and copy its contents directly into our config file.
6.1. Inline an imported config
To do this, first remove the following line from our config.json:
- "ccqs:config/query-operation/actors.json",
Next, copy the "import" entries from ccqs:config/query-operation/actors.json (available on GitHub),
and paste them after the current "import" entries in our config.json.
Your config.json file should have the following structure now:
{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/config-query-sparql/^3.0.0/components/context.jsonld"
  ],
  "import": [
    "ccqs:config/context-preprocess/actors.json",
    "ccqs:config/context-preprocess/mediators.json",
    ...
    "ccqs:config/rdf-update-quads/actors.json",
    "ccqs:config/rdf-update-quads/mediators.json",
    "ccqs:config/query-operation/actors/query/ask.json",
    "ccqs:config/query-operation/actors/query/bgp.json",
    "ccqs:config/query-operation/actors/query/construct.json",
    ...
    "ccqs:config/query-operation/actors/update/load.json",
    "ccqs:config/query-operation/actors/update/move.json"
  ]
}
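Since we only inlined an import, the engine should still behave exactly as before at this point. You can confirm this by executing comunica-dynamic-sparql.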
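Inlining can also be expressed as a small transformation. This is a sketch under assumptions: inlineImport is a hypothetical helper, and both objects are shortened stand-ins for the real files. Like the manual steps, it removes the entry and appends the nested entries to the end of the import list.

```javascript
// Hypothetical helper: replaces one import entry by the imports of the file it refers to.
function inlineImport(config, entry, importedConfig) {
  if (!config.import.includes(entry)) {
    throw new Error(`Import not found: ${entry}`);
  }
  return {
    ...config,
    import: [
      ...config.import.filter((existing) => existing !== entry),
      ...importedConfig.import,
    ],
  };
}

// Shortened stand-in for our config.json.
const mainConfig = {
  import: [
    'ccqs:config/query-operation/actors.json',
    'ccqs:config/query-operation/mediators.json',
  ],
};

// Shortened stand-in for ccqs:config/query-operation/actors.json.
const queryOperationActors = {
  import: [
    'ccqs:config/query-operation/actors/query/ask.json',
    'ccqs:config/query-operation/actors/query/construct.json',
  ],
};

const inlined = inlineImport(
  mainConfig,
  'ccqs:config/query-operation/actors.json',
  queryOperationActors
);
console.log(inlined.import);
```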
6.2. Remove actors
Next, we will remove the query operation actors we don't need. Concretely, we will remove the following imports to actors:
ccqs:config/query-operation/actors/query/construct.json: Handles CONSTRUCT queries.
ccqs:config/query-operation/actors/query/describe.json: Handles DESCRIBE queries.
For this, remove the following lines:
- "ccqs:config/query-operation/actors/query/construct.json",
- "ccqs:config/query-operation/actors/query/describe.json",
6.3. Test changes
After this change, you should now be unable to execute CONSTRUCT or DESCRIBE queries.
Try this out by executing the following:
$ comunica-dynamic-sparql http://fragments.dbpedia.org/2016-04/en \
    "CONSTRUCT WHERE { ?s ?p ?o } LIMIT 100"
Executing a SELECT
query will still work:
$ comunica-dynamic-sparql http://fragments.dbpedia.org/2016-04/en \
    "SELECT * WHERE { ?s ?p ?o } LIMIT 100"
You have now successfully built your own custom Comunica engine that is a bit more lightweight than the default one.
Just like the CONSTRUCT and DESCRIBE actors,
you can remove any other actors you don't need, to make your engine even more lightweight.