Expression Evaluator

On this page

To evaluate expressions, Comunica uses a collection of packages that are part of the Comunica monorepo. Two buses specifically are of importance:

@comunica/bus-expression-evaluator-factory: Creates an expression evaluator, more info listed bellow.
@comunica/bus-function factory: creates function, more specifically it creates objects that are able to evaluate the desired function given the arguments.

Two different kind of functions are used TermFunctions and ExpressionFunctions, and TermFunction extends ExpressionFunction. An ExpressionFunction is a function that takes control over the evaluation of its arguments, meaning that the argument of an ExpressionFunction are Expressions and not Terms. The evaluation of the function is async. A TermFunction on the other hand does not take control over the evolution of its arguments, and is synchronous. In scenarios where you already have the term and can only are in a synchronous context, you can use a TermFunction. Besides easier usage of TermFunctions, they are also easier to implement since the declare function of the expression evaluator utils package can be used. This declare function allows for easy definition of functions that have function overloading. Functions created using declare use the OverloadTree, thereby also allowing for type promotion and subtype substitution. TLDR: Use TermFunction when you can using declare, and ExpressionFunction when you need to.

Using The Expression Evaluator

import type { MediatorExpressionEvaluatorFactory } from '@comunica/bus-expression-evaluator-factory';
import { translate } from "sparqlalgebrajs";
import { stringToTerm } from "rdf-string";

// An example SPARQL query with an expression in a FILTER statement.
// We translate it to SPARQL Algebra format ...
const query = translate(`
  SELECT * WHERE {
     ?s ?p ?o
     FILTER langMatches(lang(?o), "FR")
    }
`);

// ... and get the part corresponding to "langMatches(...)".
const expression = query.input.expression;

// We create an evaluator for this expression.
// A sync version exists as well.
const evaluator = await mediatorExpressionEvaluatorFactory
    .mediate({ algExpr: expression, context });

// We can now evaluate some bindings as a term, ...
const result: RDF.Term = await evaluator.evaluate(
  Bindings({
    ...
    '?o': stringToTerm("Ceci n'est pas une pipe"@fr),
    ...
  })
);

// ... or as an Effective Boolean Value (e.g. for use in FILTER)
const result: boolean = await evaluator.evaluateAsEBV(bindings);
// ... or as an inetrnal Expression
evaluateAsEvaluatorExpression.evaluateAsEvaluatorExpression(bindings);

Config

Just like many other actors, the ExpressionEvaluatorFactoryDefault expects a context object. The following keys are of importance:

KeysInitQuery.extensionFunctionCreator: A function that creates an extension function.
KeysInitQuery.extensionFunctions: A map of function names to function implementations.
KeysInitQuery.queryTimestamp: The timestamp to use for functions requiring a notion of "now".
KeysInitQuery.functionArgumentsCache: see later in this document.
KeysExpressionEvaluator.defaultTimeZone: The default timezone to use for date functions, if none given, extracts the timezone from the queryTimestamp value. It can be desired to set it explicitly so implicitTimezone does not change over time (i.e., it is not dependent on daylight saving time).
KeysExpressionEvaluator.superTypeProvider: A way of interacting with the type system, it's a callback that given a type unknown to the system, returns the super type of that type.
KeysExpressionEvaluator.baseIRI: The base IRI to use for functions that require it.

Errors

The utils-expression-evaluator exports an Error class called ExpressionError from which all SPARQL related errors inherit. These might include unbound variables, wrong types, invalid lexical forms, and much more. These errors can be caught, and may impact program execution in an expected way. All other errors are unexpected, and are thus programmer mistakes or mistakes in the context of the expression evaluator.

There is also the utility function isExpressionError for detecting these cases.

// Make sure to catch errors if you don't control binding input
try {
  const result = await evaluator.evaluate(bindings);
  consumeResult(result);
} catch (error) {
  if (isExpressionError(error)) {
    console.log(error); // SPARQL related errors
    ...                 // Move on, ignore result, ...
  } else {
    throw error;        // Programming errors or missing features.
  }
}

Aggregates

The aggregation of bindings is handled by the bus-bindings-aggregator-factory. Given a request for a certain aggregator, the factory will return an aggregator that can be used to aggregate bindings. After all bindings have been put onto the aggregator, the result can be retrieved. The aggregators tend to make use of other expression evaluation related busses like the bus-term-comparator-factory, bus-function-factory, and most will use the bus-expression-evaluator-factory. Because of the dependency on these buses, the type system can also be used.

Additionally, you should also note the order of calling and awaiting put while using the GroupConcat aggregator.

functionArgumentsCache

An functionArgumentsCache allows the expression evaluator to cache the implementation of a function provided the| argument types. This decreases the overhead caused by function overloading. When not providing a cache in the context, the evaluator will create one.

This cache can be reused across multiple evaluators. Manual modification is not recommended.

Context dependant functions

Some functions (BNODE, NOW, IRI) need a (stateful) context from the caller to function correctly according to the spec. This context can be passed as an argument to the evaluator (see the config section for exact types). If they are not passed, the evaluator will use a naive implementation that might do the trick for simple use cases.

BNODE

spec actor

Blank nodes are very dependent on the rest of the SPARQL query, therefore, we provide the option of delegating the entire responsibility back to you by accepting a blank node constructor callback. If this is not found, we create a blank node with the given label, or we use uuid (v4) for argument-less calls to generate definitely unique blank nodes of the shape blank_uuid.

bnode(input?: string) => RDF.BlankNode

Now

spec actor

All calls to now in a query must return the same value, since we aren't aware of the rest of the query, you can provide a timestamp (now: Date). If it's not present, the evaluator will use the timestamp of evaluator creation, this at least allows evaluation with multiple bindings to have the same now value.

IRI

spec actor

To be fully spec compliant, the IRI/URI functions should take into account base IRI of the query, which you can provide as baseIRI: string to the config.

SPARQL 1.2

The expression evaluator package looks to the future and already implements some SPARQL 1.2 specification functions.

Currently, this is restricted to the extended date functionality. Please note that the new sparql built-in ADJUST function has not been implemented due to package dependencies.

Type System

The type system of the expression evaluator is tailored for doing (supposedly) quick evaluation of overloaded functions.

A function definition object consists of a tree-like structure with a type (e.g. xsd:float) at each internal node. Each level of the tree represents an argument of the function (e.g. function with arity two also has a tree of depth two). The leaves contain a function implementation matching the concrete types defined by the path of the tree.

When a function is called with some arguments, a depth first search, to find an implementation among all overloads matching the types of the arguments, is performed in the tree.

Subtype substitution is handled for literal terms. What this means is that for every argument of the function, and it's associated accepted type, When a function accepts a type, it also accepts all subtypes for that argument. These sub/super-type relations define the following type tree:

Type scheme known by expression evaluator

So, when expecting an argument of type xsd:integer we could provide xsd:long instead and the function call would still succeed. The type of the term does not change in this operation.

The expression evaluator also handles type promotion. Type promotion defines some rules where a types can be promoted to another, even if there is no super-type relation. Examples include xsd:float and xsd:decimal to xsd:doubleand xsd:anyURI to xsd:string. In this case, the datatype of the term will change to the type it is promoted to.

Deviations from the SPARQL specification

Two functions have known deviations from the SPARQL specification in a few minor edge-cases. These are the regex and replace functions. These two functions require the implementation of a Regular Expression Engine. Instead of implementing and bundling our own implementation of such an engine, we use the implementation provided by the JavaScript language (in unicode-mode, without Annex B). This choice saves bundle size and probably execution time in comparison to implementing our own engine. Furthermore, it reduces implementation and maintenance cost on our side. As a result of using the JS Regex Engine, our implementation of those functions has some known non-spec compliant edge cases, examples of which can be found in the skipped test blocks in op.regex-test.ts and op.replace-test.ts.