Querying with a custom configuration from the command line
On this page
While packages such as Comunica SPARQL ship with a default configuration that offer specific querying functionality, it is possible to override these configurations, so that you can modify the internal capabilities of your query engine.
In this guide, we will keep it simple, and we will just remove some parts of the config file to create a more lightweight query engine, and query it from the command line. In a next guide, we will look into querying with a custom config from a JavaScript app.
1. Requirements of a config file
Comunica is composed of a set of actors
that execute specific tasks.
For example, all SPARQL query operators (DISTINCT, FILTER, ASK, ...)
have a corresponding actor that implements them in a certain way.
By modifying the Comunica config file, it is possible to plug in different implementations for certain SPARQL query operators, in case you for example have a more efficient implementation yourself.
Main config file
A Comunica config is written in JSON, and typically looks something like this:
{ "@context": [ "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/config-query-sparql/^3.0.0/components/context.jsonld" ], "@id": "urn:comunica:my", "@type": "Runner", "import": [ "ccqs:config/context-preprocess/actors.json", "ccqs:config/context-preprocess/mediators.json", "ccqs:config/http/actors.json", "ccqs:config/http/mediators.json", "ccqs:config/init/actors.json", "ccqs:config/optimize-query-operation/actors.json", "ccqs:config/optimize-query-operation/mediators.json", "ccqs:config/query-operation/actors.json", "ccqs:config/query-operation/mediators.json" ] }
Essentially, this config file contains a list of imports to smaller config files, which are loaded in when Comunica reads this config file.
These imported config files each represent a component on a particular bus.
For example ccqs:config/query-operation/actors.json refers to all actors that are registered on the query operation bus,
and ccqs:config/query-operation/mediators.json refers to the mediators that are defined over the query operation bus.
The ccqs: prefix refers to the scope of the @comunica/config-query-sparql package,
which means that all paths following it refer to files within this package.
Imported config file
For example, the imported config file ccqs:config/query-operation/actors.json could look something like this:
{ "@context": [ "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/config-query-sparql/^3.0.0/components/context.jsonld" ], "import": [ "ccqs:config/query-operation/actors/query/ask.json", "ccqs:config/query-operation/actors/query/bgp.json", "ccqs:config/query-operation/actors/query/construct.json", "ccqs:config/query-operation/actors/query/describe.json", "ccqs:config/query-operation/actors/query/distinct.json", "ccqs:config/query-operation/actors/query/extend.json", "ccqs:config/query-operation/actors/query/filter.json", "ccqs:config/query-operation/actors/query/from.json", "ccqs:config/query-operation/actors/query/group.json", "ccqs:config/query-operation/actors/query/join.json", "ccqs:config/query-operation/actors/query/leftjoin.json", "ccqs:config/query-operation/actors/query/minus.json", "ccqs:config/query-operation/actors/query/nop.json", "ccqs:config/query-operation/actors/query/orderby.json", "ccqs:config/query-operation/actors/query/project.json", "ccqs:config/query-operation/actors/query/quadpattern.json", "ccqs:config/query-operation/actors/query/reduced.json", "ccqs:config/query-operation/actors/query/service.json", "ccqs:config/query-operation/actors/query/slice.json", "ccqs:config/query-operation/actors/query/sparql-endpoint.json", "ccqs:config/query-operation/actors/query/union.json", "ccqs:config/query-operation/actors/query/values.json" ] }
ccqs: allow you to refer to config files from npm packages.
If you want to import a local file for quick testing, you may omit this prefix
to refer to files on the local file system, such as "relative/path/to/actor.json"
This example config file imports several smaller config files, where each config file contains a single actor that will be loaded into Comunica.
For example, the ccqs:config/query-operation/actors/query/ask.json file could look as follows:
{ "@context": [ "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/runner/^3.0.0/components/context.jsonld", "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/actor-query-operation-ask/^3.0.0/components/context.jsonld" ], "@id": "urn:comunica:default:Runner", "@type": "Runner", "actors": [ { "@id": "urn:comunica:default:query-operation/actors#ask", "@type": "ActorQueryOperationAsk", "mediatorQueryOperation": { "@id": "urn:comunica:default:query-operation/mediators#main" } } ] }
Each configured actor fulfills a specific task, e.g.:
ActorQueryOperationAsk: Executes SPARQLASKqueries.ActorQueryOperationDistinctHash: Executes the SPARQLDISTINCToperator.ActorQueryOperationFilterSparqlee: Executes SPARQLFILTERexpressions.
2. Install Comunica SPARQL
Since we want to override the default config of Comunica SPARQL, we have to make sure its package is installed first:
$ npm install -g @comunica/query-sparql
3. Start from an existing config file
The easiest way to create a custom config, is to start from an existing one, and add/remove things to fit your needs.
Let's start by creating a new empty directory,
and create a file called config.json.
In this guide, we will start from
the Comunica SPARQL default config file.
Let's copy its contents entirely into our config.json:
{ "@context": [ "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/config-query-sparql/^3.0.0/components/context.jsonld" ], "import": [ "ccqs:config/context-preprocess/actors.json", "ccqs:config/context-preprocess/mediators.json", "ccqs:config/hash-bindings/actors.json", "ccqs:config/hash-bindings/mediators.json", "ccqs:config/http/actors.json", "ccqs:config/http/mediators.json", "ccqs:config/http-invalidate/actors.json", "ccqs:config/http-invalidate/mediators.json", "ccqs:config/init/actors.json", "ccqs:config/merge-bindings-context/actors.json", "ccqs:config/merge-bindings-context/mediators.json", "ccqs:config/optimize-query-operation/actors.json", "ccqs:config/optimize-query-operation/mediators.json", "ccqs:config/query-operation/actors.json", "ccqs:config/query-operation/mediators.json", "ccqs:config/query-parse/actors.json", "ccqs:config/query-parse/mediators.json", "ccqs:config/query-process/actors.json", "ccqs:config/query-process/mediators.json", "ccqs:config/query-result-serialize/actors.json", "ccqs:config/query-result-serialize/mediators.json", "ccqs:config/query-source-identify/actors.json", "ccqs:config/query-source-identify/mediators.json", "ccqs:config/query-source-identify-hypermedia/actors.json", "ccqs:config/query-source-identify-hypermedia/mediators.json", "ccqs:config/dereference/actors.json", "ccqs:config/dereference/mediators.json", "ccqs:config/dereference-rdf/actors.json", "ccqs:config/dereference-rdf/mediators.json", "ccqs:config/rdf-join/actors.json", "ccqs:config/rdf-join/mediators.json", "ccqs:config/rdf-join-entries-sort/actors.json", "ccqs:config/rdf-join-entries-sort/mediators.json", "ccqs:config/rdf-join-selectivity/actors.json", "ccqs:config/rdf-join-selectivity/mediators.json", "ccqs:config/rdf-metadata/actors.json", "ccqs:config/rdf-metadata/mediators.json", "ccqs:config/rdf-metadata-accumulate/actors.json", "ccqs:config/rdf-metadata-accumulate/mediators.json", "ccqs:config/rdf-metadata-extract/actors.json", "ccqs:config/rdf-metadata-extract/mediators.json", "ccqs:config/rdf-parse/actors.json", "ccqs:config/rdf-parse/mediators.json", "ccqs:config/rdf-parse-html/actors.json", "ccqs:config/rdf-resolve-hypermedia-links/actors.json", "ccqs:config/rdf-resolve-hypermedia-links/mediators.json", "ccqs:config/rdf-resolve-hypermedia-links-queue/actors.json", "ccqs:config/rdf-resolve-hypermedia-links-queue/mediators.json", "ccqs:config/rdf-serialize/actors.json", "ccqs:config/rdf-serialize/mediators.json", "ccqs:config/rdf-update-hypermedia/actors.json", "ccqs:config/rdf-update-hypermedia/mediators.json", "ccqs:config/rdf-update-quads/actors.json", "ccqs:config/rdf-update-quads/mediators.json" ] }
4. Execute with Comunica SPARQL
While we usually use comunica-sparql to invoke Comunica SPARQL on the command line,
we can instead call comunica-dynamic-sparql with exactly the same arguments
to allow loading in a custom config file.
In order to specify a custom config file,
we have to set the path to our config file via the COMUNICA_CONFIG environment variable:
$ export COMUNICA_CONFIG="config.json"
If you now execute comunica-dynamic-sparql,
it will load in your config.json file.
Let's try a simple query to see if this works:
$ comunica-dynamic-sparql http://fragments.dbpedia.org/2016-04/en \ "CONSTRUCT WHERE { ?s ?p ?o } LIMIT 100"
COMUNICA_CONFIG environment variable,
comunica-dynamic-sparql will fallback to the default Comunica SPARQL config file.
comunica-dynamic-sparql has a significant startup delay compared to comunica-sparql,
since it now have to load in, parse, and interpret a config file.
comunica-dynamic-sparql should therefore only be used for simple testing
before you use your query engine in a separate package.
5. Removing RDF serialization actors
As an example, we will remove all actors that can output results in any RDF format.
All of these actors are defined in the ccqs:config/rdf-serialize/actors.json config file.
Before we make any changes to our config file, let us inspect the result formats that are currently available:
$ comunica-dynamic-sparql --listformats application/ld+json application/trig application/n-quads text/turtle application/n-triples text/n3 stats tree table application/sparql-results+xml text/tab-separated-values application/sparql-results+json text/csv simple application/json
The first 6 of those formats are RDF serialization formats,
which are mainly used for outputting CONSTRUCT query results.
If we want to remove those actors from the config file,
we can remove the following line from our config.json:
- "ccqs:config/rdf-serialize/actors.json",
If we now inspect the available result formats, we get the following:
$ comunica-dynamic-sparql --listformats stats tree table application/sparql-results+xml text/tab-separated-values application/sparql-results+json text/csv simple application/json
As you can see, the 6 RDF serialization formats are not present anymore. This is because Comunica has not loaded them in because we have removed them from our config file.
6. Only allowing SELECT queries
Let's take our config modifications a step further,
and let's say our goal is to build a query engine that can only execute SELECT queries,
and we don't want to be able to execute CONSTRUCT and DESCRIBE queries.
This will require us to remove some more actors.
While the actors for CONSTRUCT and DESCRIBE are defined in ccqs:config/query-operation/actors.json,
we can not just simply remove that file from our imports,
because it also contains actors for other SPARQL query operators which we don't want to remove, such as SELECT.
Instead of just removing ccqs:config/query-operation/actors.json,
we will remove it and copy its contents directly into our config file.
6.1. Inline an imported config
To do this, first remove the following line from our config.json:
- "ccqs:config/query-operation/actors.json",
Next, copy the "import" entries from ccqs:config/query-operation/actors.json (GitHub),
and paste it after the current "import" entries in our config.json.
Your config.json file should have the following structure now:
{ "@context": [ "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/config-query-sparql/^3.0.0/components/context.jsonld" ], "import": [ "ccqs:config/context-preprocess/actors.json", "ccqs:config/context-preprocess/mediators.json", ... "ccqs:config/rdf-update-quads/actors.json", "ccqs:config/rdf-update-quads/mediators.json", "ccqs:config/query-operation/actors/query/ask.json", "ccqs:config/query-operation/actors/query/bgp.json", "ccqs:config/query-operation/actors/query/construct.json", ... "ccqs:config/query-operation/actors/update/load.json", "ccqs:config/query-operation/actors/update/move.json" ] }
comunica-dynamic-sparql.
6.2. Remove actors
Next, we will remove the query operation actors we don't need. Concretely, we will remove the following imports to actors:
ccqs:config/query-operation/actors/query/construct.json: HandlesCONSTRUCTqueries.ccqs:config/query-operation/actors/query/describe.json: HandlesDESCRIBEqueries.
For this, remove the following lines:
- "ccqs:config/query-operation/actors/query/construct.json", - "ccqs:config/query-operation/actors/query/describe.json",
6.3. Test changes
After this change, you should now be unable to execute CONSTRUCT or DESCRIBE queries.
Try this out by executing the following:
$ comunica-dynamic-sparql http://fragments.dbpedia.org/2016-04/en \ "CONSTRUCT WHERE { ?s ?p ?o } LIMIT 100"
Executing a SELECT query will still work:
$ comunica-dynamic-sparql http://fragments.dbpedia.org/2016-04/en \ "SELECT * WHERE { ?s ?p ?o } LIMIT 100"
You have now successfully built your own custom Comunica engine that is a bit more lightweight than the default one.
Just like the CONSTRUCT and DESCRIBE actors,
you can remove any other actors you don't want to make it even more lightweight.