Querying Linked Data with Comunica

Tutorial at ESWC 2019, June 3rd 2019 (morning session: 9:30 - 12:30)


Abstract

Querying Linked Data on the Web is a non-trivial endeavour because of the heterogeneity of Linked Data publication interfaces and the large variety of querying algorithms. We recently introduced a meta query engine, called Comunica, as a research platform that offers a way to cope with this complexity. To enable researchers to easily get started with Comunica, we offer this introductory tutorial.


The tutorial consists of an overview of the capabilities of this platform, and demonstrates its usage for research purposes. As a result, participants from different backgrounds will be able to query Linked Data with Comunica, and modify the querying process with custom algorithms. Ultimately, this will reduce the effort needed to develop and evaluate new Linked Data querying techniques.

Contents

If you intend to participate in the hands-on part of this tutorial, make sure to install the required software.

Topic and Relevance

Querying is a key aspect of the Semantic Web Stack [1], and a large variety of query algorithms for different kinds of Linked Data interfaces already exist. There are still plenty of open problems that elicit research on new querying techniques and combinations of existing ones.

Comunica [2] is a meta query engine that was introduced to facilitate the development, testing, and evaluation of such querying capabilities. It lets you create query engines from a set of modules and a flexible configuration system that wires them together. One example of such a query engine instantiation is Comunica SPARQL, which implements the SPARQL 1.1 [3] specification using an efficient set of modules. This engine is able to federate queries over heterogeneous interfaces. For example, one can execute a SPARQL query over the combination of a SPARQL endpoint and a collection of Linked Data documents on the Web. This functionality is demonstrated via our Web application. Thanks to the modularity of Comunica, support for new kinds of datasources can be added by writing a custom module and plugging it in.
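To make the idea of federating over heterogeneous interfaces concrete, the following is a minimal, self-contained sketch. It does not use Comunica or its API: the two in-memory sources, the prefixed names, and the functions matches and federatedMatch are all hypothetical and exist only for illustration. One source answers triple patterns directly, the other only exposes whole documents, and both are wrapped behind the same match() method so that a single pattern can be evaluated over their combination.

```javascript
// Toy illustration only: two hypothetical in-memory "sources" with different
// access interfaces, wrapped behind one common match() method.

// Source 1: answers triple-pattern requests directly (endpoint-like).
const patternSource = {
  triples: [
    { s: 'ex:alice', p: 'foaf:knows', o: 'ex:bob' },
  ],
  match(pattern) {
    return this.triples.filter((t) => matches(t, pattern));
  },
};

// Source 2: only exposes whole documents, so filtering happens client-side.
const documentSource = {
  documents: {
    'ex:doc1': [{ s: 'ex:bob', p: 'foaf:knows', o: 'ex:carol' }],
    'ex:doc2': [{ s: 'ex:carol', p: 'foaf:name', o: '"Carol"' }],
  },
  match(pattern) {
    return Object.values(this.documents).flat().filter((t) => matches(t, pattern));
  },
};

// A pattern term that is left undefined acts as a variable (matches anything).
function matches(triple, pattern) {
  return ['s', 'p', 'o'].every((k) => pattern[k] === undefined || pattern[k] === triple[k]);
}

// Federation: evaluate the same pattern over every source and merge the results.
function federatedMatch(sources, pattern) {
  return sources.flatMap((source) => source.match(pattern));
}

const results = federatedMatch([patternSource, documentSource], { p: 'foaf:knows' });
console.log(results);
```

The caller never needs to know which kind of interface produced a given triple. In this sketch, supporting a new kind of datasource would only require another object with a match() method.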

The modularity and flexibility of Comunica are useful for research purposes, as they allow you to easily plug in different algorithms for certain query operators and compare their performance. Not only does this lower the barrier towards such evaluations, it also makes these evaluations fair, because the algorithms are implemented in the same engine, instead of comparing completely different engines with a potentially different stack.
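As a rough illustration of such a comparison, the sketch below runs two interchangeable implementations of one operator (a join of solution bindings on a shared variable) on the same input in the same process. It is an assumption-laden toy for Node.js, not Comunica code: the function names nestedLoopJoin, hashJoin, and compare, and the generated test data, are hypothetical.

```javascript
// Two interchangeable implementations of one operator: joining two lists of
// solution bindings on a shared variable. All names here are hypothetical.

function nestedLoopJoin(left, right, variable) {
  const out = [];
  for (const l of left) {
    for (const r of right) {
      if (l[variable] === r[variable]) out.push({ ...l, ...r });
    }
  }
  return out;
}

function hashJoin(left, right, variable) {
  // Build a hash index over the right input, then probe it with the left input.
  const index = new Map();
  for (const r of right) {
    if (!index.has(r[variable])) index.set(r[variable], []);
    index.get(r[variable]).push(r);
  }
  const out = [];
  for (const l of left) {
    for (const r of index.get(l[variable]) ?? []) out.push({ ...l, ...r });
  }
  return out;
}

// Run every algorithm on the same input, in the same process, and time it.
function compare(algorithms, left, right, variable) {
  return Object.entries(algorithms).map(([name, join]) => {
    const start = process.hrtime.bigint();
    const results = join(left, right, variable);
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    return { name, resultCount: results.length, elapsedMs };
  });
}

// Synthetic bindings: 500 persons with a name on one side and a city on the other.
const left = Array.from({ length: 500 }, (_, i) => ({ person: `ex:p${i}`, name: `"Name ${i}"` }));
const right = Array.from({ length: 500 }, (_, i) => ({ person: `ex:p${i}`, city: `ex:c${i % 10}` }));

const report = compare({ nestedLoopJoin, hashJoin }, left, right, 'person');
console.log(report);
```

Because both algorithms share the same data structures and runtime, differences in the reported timings can be attributed to the algorithms themselves. A single timed run like this is only indicative, and a real evaluation would repeat the measurements.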

The Comunica platform is fully open-source, which makes it easy to learn from the code when new modules need to be implemented. Furthermore, it is written in JavaScript, which makes it possible to run engines anywhere, both locally on your machine and on the Web via a browser.

This is the first tutorial that is dedicated to Comunica. Previously, Comunica was used as a tool to demonstrate various querying capabilities in a tutorial on Knowledge Representation as Linked Data [4] (http://rml.io/cikm2018tutorial/) at the CIKM 2018 conference.

Audience

This tutorial targets researchers who are active in the domain of querying Linked Data, and developers who are interested in a flexible querying platform. We assume elementary knowledge of RDF and SPARQL, and more advanced concepts will be tackled depending on the competences of the audience. We expect participants to bring a laptop that has Node.js installed, which is required for setting up the Comunica development environment during the tutorial. Furthermore, an editor or IDE with JavaScript and/or TypeScript support will come in handy.

The end goal is to provide participants with sufficient knowledge of and experience with the Comunica platform to use it for querying, tweak its configuration, and implement custom modules.

In order to attract participants, we will announce the tutorial on the relevant querying-related Semantic Web mailing lists, via Twitter, and directly via Gitter chat channels focused on Linked Data development.

Outline

We plan a half day for this tutorial, as shown in Table 1. In the first session, we will cover the basics of Comunica. In the second session, the more advanced concepts will be handled. Both sessions will start with short presentations and end with a practical part.

Session                   Topic                              Duration
Part 1: basic (1:30)      Introduction to Comunica           0:30
                          Browser usage examples             0:15
                          Usage inside an application        0:45
Break
Part 2: advanced (1:30)   Configuration with Components.js   0:30
                          Adding a custom algorithm          1:00

Table 1: Planning of the Comunica tutorial (9:00 - 12:30), split into basic and advanced sessions.

The first session will consist of an introductory presentation in which the motivation and purpose of Comunica will be explained, followed by an overview of its architecture. Next, we will demonstrate the usage of Comunica with several SPARQL queries in the browser-based client. After that, we will guide the audience through the installation and usage of Comunica within a JavaScript application.

In the second session, we will focus on the configuration and extensibility of Comunica. We will first give an overview of Components.js [5], which is a dependency injection framework that Comunica relies upon to handle the configuration of engines. After that, we will guide the audience through plugging in a simple new querying algorithm inside Comunica using the configuration system.
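The general idea of wiring an engine from configuration can be sketched in a few lines of plain JavaScript. This is not the Components.js API or Comunica's configuration format. The registry keys, the shape of the config object, and the instantiate function are hypothetical and only illustrate the principle that the choice of algorithm lives in data rather than in engine code.

```javascript
// Available implementations, keyed by a hypothetical identifier.
// Each entry is a factory that receives parameters and returns an actor function.
const registry = {
  'distinct:hash': () => (items) => [...new Set(items)],
  'distinct:scan': () => (items) => items.filter((x, i) => items.indexOf(x) === i),
  'limit:slice': ({ count }) => (items) => items.slice(0, count),
};

// The configuration is plain data: it names which implementation handles each
// operation and with which parameters. Swapping an algorithm means editing
// this object, not the engine code.
const config = {
  actors: [
    { operation: 'distinct', type: 'distinct:hash' },
    { operation: 'limit', type: 'limit:slice', params: { count: 2 } },
  ],
};

// Instantiate the configured actors and wire them into an engine.
function instantiate(config, registry) {
  const actors = new Map();
  for (const { operation, type, params = {} } of config.actors) {
    const factory = registry[type];
    if (!factory) throw new Error(`Unknown actor type: ${type}`);
    actors.set(operation, factory(params));
  }
  return {
    run(operation, items) {
      const actor = actors.get(operation);
      if (!actor) throw new Error(`No actor configured for: ${operation}`);
      return actor(items);
    },
  };
}

const engine = instantiate(config, registry);
const distinct = engine.run('distinct', ['ex:a', 'ex:b', 'ex:a', 'ex:c']);
const limited = engine.run('limit', distinct);
console.log(limited);
```

Changing the type of the first actor from 'distinct:hash' to 'distinct:scan' swaps the algorithm without touching instantiate or the code that calls the engine, which is the property that makes plugging in a new algorithm a configuration change.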

The tutorial will be guided by slides, which will be shared online after the session. For the hands-on coding sessions, we will provide a git repository with separate branches for all sequentially completed tasks. This will allow participants who are unable to complete a certain task to still begin the next task by checking out a different branch.

Resources

Requirements

Basic part

Advanced part

Organizers

This tutorial will be presented by Ruben Taelman and Joachim Van Herwegen from Ghent University – imec. Ruben and Joachim are PhD students in the final stage of their studies who are active in the research domain of querying Linked Data. Furthermore, they are the main designers and developers of Comunica, which makes them uniquely qualified to present this tutorial. Ruben and Joachim have both presented tutorials at conferences in the past [4, 6], and actively coach students within the practical sessions of the Web Development course by Ruben Verborgh at Ghent University.