WARNING: V2 is coming shortly and will contain many breaking changes.
Traqula core contains the core components of Traqula, most importantly its lexer builder, parser builder, and generator builder. This library relies heavily on the amazing Chevrotain package; knowing the basics of that package will allow you to quickly create your own grammars.
```bash
npm install @traqula/core
```

or

```bash
yarn add @traqula/core
```
Each parser consists of two steps:

1. Lexing: splitting the input string into annotated tokens.
2. Parsing: consuming the token stream according to a grammar and building the abstract syntax tree.

Sometimes grammar definitions and abstract syntax tree generation are split into separate steps. In this library, we choose to keep the two together when building a parser.
To tackle the first step, a lexer should be created. A lexer is a system that splits a string of characters into annotated groups called tokens. In human language, for example, the sentence 'I eat apples' is lexed into words and spaces: `I`, ` `, `eat`, ` `, `apples`.
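As a brief, hypothetical sketch of that idea using Chevrotain directly (the token names and patterns below are made up for illustration):

```ts
import { createToken, Lexer } from 'chevrotain';

// Two token definitions: words and runs of spaces.
const Word = createToken({ name: 'Word', pattern: /[A-Za-z]+/ });
const Space = createToken({ name: 'Space', pattern: / +/ });

const lexer = new Lexer([ Word, Space ]);
const { tokens } = lexer.tokenize('I eat apples');
// tokens: Word('I'), Space(' '), Word('eat'), Space(' '), Word('apples')
console.log(tokens.map(t => `${t.tokenType.name}('${t.image}')`));
```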
To create a token definition, you use the provided `createToken` function:

```ts
const select = createToken({ name: 'Select', pattern: /select/i, label: 'SELECT' });
```
Token definitions are then put in a list, and when a lexer is built, it will match the input against the first token definition in the list that matches. The order of definitions in the list is thus essential.
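For example, a small sketch (again plain Chevrotain, with made-up tokens) of how ordering changes the result:

```ts
import { createToken, Lexer } from 'chevrotain';

const Select = createToken({ name: 'Select', pattern: /select/i });
const Identifier = createToken({ name: 'Identifier', pattern: /[a-zA-Z]+/ });

// Keyword first: 'select' is matched as the Select token.
const keywordFirst = new Lexer([ Select, Identifier ]);
// Identifier first: 'select' is swallowed by the broader Identifier pattern.
const identifierFirst = new Lexer([ Identifier, Select ]);

console.log(keywordFirst.tokenize('select').tokens[0].tokenType.name); // Select
console.log(identifierFirst.tokenize('select').tokens[0].tokenType.name); // Identifier
```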
We therefore use a lexer builder that makes it easy to construct and modify this ordered list of token definitions.
Creating a builder is as easy as:

```ts
const sparql11Tokens = LexerBuilder.create(<const> [select, describe]);
```
A new lexer can be created from an existing one by calling:

```ts
const sparql11AdjustTokens = sparql11Tokens.addBefore(select, BuiltInAdjust);
```
The grammar builder is used to link together grammar rules such that they can be converted into a parser.
Grammar rule definitions come in the form of `ParserRule` objects. Each `ParserRule` object contains its name and its return type. Optionally, it can also contain arguments that should be provided to the `SUBRULE` calls. A simple example of a grammar rule is the rule below that allows you to parse boolean literals.
```ts
/**
 * Parses a boolean literal.
 * [[134]](https://www.w3.org/TR/sparql11-query/#rBooleanLiteral)
 */
export const booleanLiteral: ParserRule<'booleanLiteral', LiteralTerm> = <const> {
  name: 'booleanLiteral',
  impl: ({ CONSUME, OR, context }) => () => OR([
    { ALT: () => context.dataFactory.literal(
      CONSUME(l.true_).image.toLowerCase(),
      context.dataFactory.namedNode(CommonIRIs.BOOLEAN),
    ) },
    { ALT: () => context.dataFactory.literal(
      CONSUME(l.false_).image.toLowerCase(),
      context.dataFactory.namedNode(CommonIRIs.BOOLEAN),
    ) },
  ]),
};
```
The `impl` member of `ParserRule` is a function that receives a single object containing, among others, the capitalized parser helper functions (such as `CONSUME`, `OR`, `SUBRULE`, and `ACTION`) and a `context` entry. You cannot unpack the context entry in the function definition itself because the parser uses a recording phase to optimize itself. During this phase, the context entry will be undefined; as such, it can only be accessed within the `ACTION` function.
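A minimal, hypothetical sketch of this restriction (the rule name and return value are made up; only the pattern of deferring context access matters):

```ts
// WRONG: impl: ({ CONSUME, context: { dataFactory } }) => () => ...
// Destructuring `context` here throws during the recording phase, because
// `context` is still undefined at that point.
//
// RIGHT: keep the `context` reference intact and only dereference it inside
// ACTION, which is skipped during the recording phase.
export const exampleRule: ParserRule<'exampleRule', LiteralTerm> = <const> {
  name: 'exampleRule',
  impl: ({ ACTION, CONSUME, context }) => () => {
    const token = CONSUME(l.true_);
    return ACTION(() => context.dataFactory.literal(
      token.image.toLowerCase(),
      context.dataFactory.namedNode(CommonIRIs.BOOLEAN),
    ));
  },
};
```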
The result of an `impl` call is a function called a rule. Rules can be parameterized, although I have not found a scenario where that is useful. Personally, I create a function that can be used to create multiple `ParserRule` objects. The result of a rule should match the type provided in the `ParserRule` definition, and is the result of a call to `SUBRULE` with that rule.
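For instance, a hedged sketch of a rule whose result is simply the result of a `SUBRULE` call to the `booleanLiteral` rule shown earlier (the wrapper rule and the exact `SUBRULE` signature are assumptions):

```ts
export const literalExpression: ParserRule<'literalExpression', LiteralTerm> = <const> {
  name: 'literalExpression',
  // The rule's return value is whatever the booleanLiteral subrule returns,
  // which matches the LiteralTerm type declared above.
  impl: ({ SUBRULE }) => () => SUBRULE(booleanLiteral),
};
```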
When a rule definition calls a subrule using `SUBRULE(mySub)`, that implementation itself is not necessarily called. That is because the `SUBRULE` function will call the rule with the same name as `mySub` that is present in the current grammar builder.
A builder is thus free to override definitions as it pleases. Doing so does, however, break the types and should thus only be done with care. An example patch is:
```ts
const myBuilder = Builder
  .createBuilder(<const> [selectOrDescribe, selectRule, describeRule])
  .patchRule(selectRuleAlternative);
```
When `selectOrDescribe` calls what it thinks to be `selectRule`, it will instead call `selectRuleAlternative`, since that rule overwrote the original `selectRule` by using the same name.
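A hedged sketch of what such a patch rule could look like (its return type and body are assumptions; the essential point is that it reuses the name `selectRule`):

```ts
// Because the name is 'selectRule', patching it into the builder makes every
// SUBRULE(selectRule) call resolve to this implementation instead.
const selectRuleAlternative: ParserRule<'selectRule', string> = <const> {
  name: 'selectRule',
  impl: ({ CONSUME }) => () => CONSUME(select).image,
};
```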
The generator builder functions in much the same way as the parser builder. Your builder expects objects of type `GeneratorRule`, which contain the implementation of the generator in the `gImpl` member. The `gImpl` function receives the essential helper functions for creating a generator rule (the capitalized members) and returns a function that takes the AST and context and returns a string. For generator rules, you can unpack the context, since no recording phase is present in this case. The idea is that `GeneratorRule`s and `ParserRule`s can be tied together in the same object, so that similar behaviour is grouped together.
```ts
/**
 * Parses a named node, either as an IRI or as a prefixed name.
 * [[136]](https://www.w3.org/TR/sparql11-query/#riri)
 */
export const iri: GeneratorRule<'iri', IriTerm> = <const> {
  name: 'iri',
  gImpl: () => ast => ast.value,
};
```
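For example, a hedged sketch of tying both sides together for the `booleanLiteral` rule from earlier (typing the combined object as an intersection is an assumption):

```ts
export const booleanLiteral:
  ParserRule<'booleanLiteral', LiteralTerm> & GeneratorRule<'booleanLiteral', LiteralTerm> = <const> {
  name: 'booleanLiteral',
  // Parser side: consume `true` or `false` and build a typed literal.
  impl: ({ CONSUME, OR, context }) => () => OR([
    { ALT: () => context.dataFactory.literal(
      CONSUME(l.true_).image.toLowerCase(),
      context.dataFactory.namedNode(CommonIRIs.BOOLEAN),
    ) },
    { ALT: () => context.dataFactory.literal(
      CONSUME(l.false_).image.toLowerCase(),
      context.dataFactory.namedNode(CommonIRIs.BOOLEAN),
    ) },
  ]),
  // Generator side: the AST node's value is already the lexical form.
  gImpl: () => ast => ast.value,
};
```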