A query language transpiler framework for JavaScript
A modular transformer that converts a SPARQL 1.1 AST generated by Traqula into SPARQL 1.1 algebra. 'Non-algebra' entities such as ASK and FROM are also supported, so that the output contains all relevant information from the query.
npm install @traqula/algebra-sparql-1-1
or
yarn add @traqula/algebra-sparql-1-1
Either through ESM import:
import { toAst, toAlgebra } from '@traqula/algebra-sparql-1-1';
or CJS require:
const { toAst, toAlgebra } = require('@traqula/algebra-sparql-1-1');
Input for the translation function should be a Traqula AST, obtained by calling the Traqula parser.
Algebra operations are modeled as objects with a type field naming the operation and, for most operations, an input field holding the nested operation or operations.
The transformation is described in section 18.2 of the SPARQL 1.1 specification.
The example below demonstrates how to use this package: we parse a query string, transform it to algebra, and then go from the algebra back to a query string. Note that, unlike on the AST level, we do not provide round tripping on the algebra level.
import { Parser } from '@traqula/parser-sparql-1-1';
import { Generator } from '@traqula/generator-sparql-1-1';
import { toAlgebra, toAst } from '@traqula/algebra-sparql-1-1';
// Initialize required variables.
const parser = new Parser();
const generator = new Generator();
const query = `SELECT * { ?s ?p ?o }`;
const ast = parser.parse(query);
const algebra = toAlgebra(ast);
const generatedAst = toAst(algebra);
const generatedQuery = generator.generate(generatedAst);
The algebra object contains a types object,
which contains all possible values for the type field in the output results.
Besides that it also contains all the TypeScript interfaces of the possible output results.
The output of the toAlgebra function will always be an Algebra.Operation instance.
The best way to see what output would be generated is to look in the test folder,
where we have many SPARQL queries and their corresponding algebra output.
Traqula's core library provides a way to easily modify trees.
This functionality is exported specifically for the algebra tree under the mapOperation function in @traqula/algebra-transformations-1-1.
Query engines such as Comunica use the algebra operators as a way to guide the execution of a query.
Query execution may be optimized by manipulating the query operations in the algebra tree,
for example through filter pushdown, and for this task the mapOperation function is instrumental.
Furthermore, using toAst, one can convert an algebra tree to an AST, which can in turn be used to generate a SPARQL query using the @traqula/generator-sparql-1-1.
Exactly this functionality is used in Comunica in order to support the SERVICE operator and support federated queries.
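To make the idea of rewriting an algebra tree concrete, here is a minimal sketch of a bottom-up rewrite over plain objects. It does not use mapOperation and makes no claim about its signature: the Op shape and the rewrite helper are hypothetical, and the rule applied (collapsing a distinct placed directly over another distinct) is only an illustration.

```typescript
// Loose local shape for algebra nodes; illustrative only, not the package's types.
interface Op {
  type: string;
  input?: Op | Op[];
  [key: string]: unknown;
}

// Hypothetical bottom-up rewriter: children are rewritten first, then the node itself.
function rewrite(op: Op, visit: (node: Op) => Op): Op {
  const input = op.input;
  if (input === undefined) {
    return visit(op);
  }
  const mapped = Array.isArray(input) ?
    input.map(child => rewrite(child, visit)) :
    rewrite(input, visit);
  return visit({ ...op, input: mapped });
}

// A distinct directly over another distinct is redundant.
const tree: Op = {
  type: 'distinct',
  input: { type: 'distinct', input: { type: 'bgp', patterns: []}},
};

const simplified = rewrite(tree, (node) => {
  const child = node.input;
  if (node.type === 'distinct' && child !== undefined && !Array.isArray(child) && child.type === 'distinct') {
    return child;
  }
  return node;
});

console.log(JSON.stringify(simplified));
```

Real optimizations such as filter pushdown follow the same pattern: match on the type of a node, inspect its input, and return a replacement node.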
Whether you manipulate queries on AST level or algebra level is entirely use case dependent. AST level provides full control over the query that will be generated, but provides no abstraction. Algebra level provides abstraction over the grammatical component of the language, allowing you to focus on the functional component, but leaves no control over the generated query string.
This implementation tries to stay as close as possible to the SPARQL 1.1 specification, but some changes were made for ease of use. These are mostly based on the Jena ARQ implementation. What follows is a non-exhaustive list of deviations:
The biggest visual change is the use of named parameters: the functions no longer take an ordered list of parameters but a named list instead. This avoids having to memorize the order of parameters, and it sidesteps some differences in parameter ordering between the spec and the Jena ARQ SSE output.
The functions toMultiset and toList have been removed for brevity.
Conversions between the two are implied by the operations used.
The translate function has an optional second parameter
indicating whether patterns should be translated to triple or quad patterns.
In the case of quads the graph operation will be removed
and embedded into the patterns it contained.
The default value for this parameter is false.
PREFIX : <http://www.example.org/>
SELECT ?x WHERE {
GRAPH ?g {?x ?y ?z}
}
Default result:
{
"type": "project",
"input": {
"type": "graph",
"input": {
"type": "bgp",
"patterns": [{
"type": "pattern",
"termType": "Quad",
"subject": { "termType": "Variable", "value": "x" },
"predicate": { "termType": "Variable", "value": "y" },
"object": { "termType": "Variable", "value": "z" },
"graph": { "termType": "DefaultGraph", "value": "" }
}]
},
"name": { "termType": "Variable", "value": "g" }
},
"variables": [{ "termType": "Variable", "value": "x" }]
}
With quads:
{
"type": "project",
"input": {
"type": "bgp",
"patterns": [{
"type": "pattern",
"termType": "Quad",
"subject": { "termType": "Variable", "value": "x" },
"predicate": { "termType": "Variable", "value": "y" },
"object": { "termType": "Variable", "value": "z" },
"graph": { "termType": "Variable", "value": "g" }
}]
},
"variables": [{ "termType": "Variable", "value": "x" }]
}
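The relationship between the two results above can be sketched as a small transformation on plain objects. This is not the package's implementation: the interfaces below only mirror the JSON shown, and embedGraph is a hypothetical helper that handles just the case of a graph operation wrapping a single bgp.

```typescript
// Local types mirroring the JSON above; illustrative, not the package's exported interfaces.
interface Term { termType: string; value: string }
interface Pattern {
  type: 'pattern';
  termType: 'Quad';
  subject: Term;
  predicate: Term;
  object: Term;
  graph: Term;
}
interface Bgp { type: 'bgp'; patterns: Pattern[] }
interface Graph { type: 'graph'; input: Bgp; name: Term }

// Hypothetical helper: drop the graph operation and embed its name into each pattern.
function embedGraph(op: Graph | Bgp): Bgp {
  if (op.type === 'bgp') {
    return op;
  }
  const name = op.name;
  return {
    type: 'bgp',
    patterns: op.input.patterns.map(pattern => ({ ...pattern, graph: name })),
  };
}

const variable = (value: string): Term => ({ termType: 'Variable', value });

// The input of the project operation in the default result.
const graphOp: Graph = {
  type: 'graph',
  input: {
    type: 'bgp',
    patterns: [{
      type: 'pattern',
      termType: 'Quad',
      subject: variable('x'),
      predicate: variable('y'),
      object: variable('z'),
      graph: { termType: 'DefaultGraph', value: '' },
    }],
  },
  name: variable('g'),
};

const quadBgp = embedGraph(graphOp);
console.log(JSON.stringify(quadBgp, null, 2));
```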
Several binary operators that can be nested, such as the path operators, can take an array of input entries to simplify this notation. For example, the following SPARQL:
SELECT * WHERE { ?x <http://a.a>|<http://b.b>|<http://c.c> ?z }
outputs the following algebra:
{
"type": "project",
"input": {
"type": "path",
"subject": { "termType": "Variable", "value": "x" },
"predicate": {
"type": "alt",
"input": [
{ "type": "link", "iri": { "termType": "NamedNode", "value": "http://a.a" } },
{ "type": "link", "iri": { "termType": "NamedNode", "value": "http://b.b" } },
{ "type": "link", "iri": { "termType": "NamedNode", "value": "http://c.c" } }
]
},
"object": { "termType": "Variable", "value": "z" },
"graph": { "termType": "DefaultGraph", "value": "" }
},
"variables": [
{ "termType": "Variable", "value": "x" },
{ "termType": "Variable", "value": "z" }
]
}
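To illustrate why an array of inputs is simpler than strictly binary nesting, the sketch below flattens a nested alt path into a single alt with three entries. The Path type and the flattenAlt helper are hypothetical and written only for this example; they are not taken from the package.

```typescript
// Local types covering only the two path kinds used in the example above.
interface Term { termType: string; value: string }
type Path =
  | { type: 'link'; iri: Term }
  | { type: 'alt'; input: Path[] };

// Hypothetical helper: merge directly nested alt operations into one n-ary alt.
function flattenAlt(path: Path): Path {
  if (path.type !== 'alt') {
    return path;
  }
  const input: Path[] = [];
  for (const child of path.input) {
    const flatChild = flattenAlt(child);
    if (flatChild.type === 'alt') {
      input.push(...flatChild.input);
    } else {
      input.push(flatChild);
    }
  }
  return { type: 'alt', input };
}

const link = (value: string): Path => ({ type: 'link', iri: { termType: 'NamedNode', value }});

// (a | b) | c written with strictly binary nesting.
const nested: Path = {
  type: 'alt',
  input: [
    { type: 'alt', input: [ link('http://a.a'), link('http://b.b') ]},
    link('http://c.c'),
  ],
};

const flatPath = flattenAlt(nested);
console.log(JSON.stringify(flatPath, null, 2));
```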
For the VALUES block we return the following output:
PREFIX : <http://example.org/book/>
SELECT ?book ?title {
VALUES (?book ?title) { ( :book1 UNDEF ) ( :book3 'Fantastic Mr. Fox' ) }
}
{
"type": "project",
"input": {
"type": "values",
"variables": [{ "termType": "Variable", "value": "book" }, { "termType": "Variable", "value": "title" }],
"bindings": [
{
"book": { "termType": "NamedNode", "value": "http://example.org/book/book1" }
},
{
"book": { "termType": "NamedNode", "value": "http://example.org/book/book3" },
"title": {
"termType": "Literal",
"value": "Fantastic Mr. Fox",
"datatype": {
"termType": "NamedNode",
"value": "http://www.w3.org/2001/XMLSchema#string"
}
}
}
]
},
"variables": [
{ "termType": "Variable", "value": "book" },
{ "termType": "Variable", "value": "title" }
]
}
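Note how UNDEF shows up in the output above: the first binding simply has no title key. A minimal sketch of that mapping, assuming rows are given as arrays in which undefined stands for UNDEF, could look as follows. The toBindings helper and the Row type are hypothetical and not part of the package.

```typescript
interface Term { termType: string; value: string }

// One VALUES row; undefined stands for UNDEF (an assumption made for this sketch).
type Row = (Term | undefined)[];

// Hypothetical helper: build one binding object per row, omitting unbound variables.
function toBindings(variables: Term[], rows: Row[]): Record<string, Term>[] {
  return rows.map((row) => {
    const binding: Record<string, Term> = {};
    variables.forEach((variable, index) => {
      const term = row[index];
      if (term !== undefined) {
        binding[variable.value] = term;
      }
    });
    return binding;
  });
}

const iri = (value: string): Term => ({ termType: 'NamedNode', value });
const literal = (value: string): Term => ({ termType: 'Literal', value });

const variables: Term[] = [
  { termType: 'Variable', value: 'book' },
  { termType: 'Variable', value: 'title' },
];

const bindings = toBindings(variables, [
  [ iri('http://example.org/book/book1'), undefined ],
  [ iri('http://example.org/book/book3'), literal('Fantastic Mr. Fox') ],
]);

console.log(JSON.stringify(bindings, null, 2));
```

A consumer of the values operation can therefore treat a missing key as an unbound variable for that row.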
Some differences from Jena (again, non-exhaustive):
no prefixes are used (all URIs are expanded)
and the project operation is always used (even in the case of SELECT *).
Every test consists of a sparql file and a corresponding json file containing the algebra result.
Tests ending with (quads) in their name are tested/generated with quads: true in the options.