Demonstration of Comunica, a Web framework for querying heterogeneous Linked Data interfaces

Abstract

Linked Data sources can appear in a variety of forms, ranging from SPARQL endpoints to Triple Pattern Fragments and data dumps. This heterogeneity adds a layer of complexity when querying or combining results from those sources. To ease this problem, we created a modular engine, Comunica, which has modules for evaluating SPARQL queries and supports heterogeneous interfaces. Modules for other query or source types can easily be added. In this paper we showcase a Web client that uses Comunica to evaluate federated SPARQL queries through automatic source type identification and interaction.

Introduction

There are many ways to access Linked Data today. Some of the more commonly used ones are SPARQL endpoints [1], Triple Pattern Fragments (TPF) [2] and its variations [3, 4], Linked Data documents [5] and data dumps. Each has its own access method and contributes differently to solving SPARQL queries [6]. While a SPARQL endpoint can execute queries on its own and can require a significant amount of server effort, data dumps require client-side processing to produce more granular results and are less intensive for servers. This trade-off is measured as client cost on the Linked Data Fragments axis [2].

These heterogeneous interfaces greatly complicate federated queries. While resolving such a query, different actions have to be taken depending on the source that is being accessed. Different solutions might also be required depending on the combination of sources. In the case of a single SPARQL endpoint, a single query will suffice. On the other hand, if all sources are data dumps, they all have to be downloaded and parsed client-side. But what if some sources are SPARQL endpoints and some are data dumps?

To this end, we created a modular Linked Data client called Comunica [7]. In our ISWC 2018 article we describe how this client can easily be extended to support a variety of sources and algorithms. This allows everyone to quickly set up a federated SPARQL client without having to worry about the sources, and to easily extend it should more source types be required.

In this article we describe how we will showcase the heterogeneous features of Comunica. We created a Web client capable of executing federated SPARQL queries over heterogeneous interfaces, as an extension of the Triple Pattern Fragments Web client [8]. Additionally, this client will automatically identify the type of all sources. This way the end-user only has to provide the URLs and query; Comunica will take care of the rest.

Comunica

Comunica [7] is a modular meta engine: specific engines are instantiated from modules that describe their functionality, and these modules are wired together through semantic configuration files. We released 80+ modules that can be combined to fully replicate all features of the original TPF client. Comunica is not limited to solving SPARQL queries: new modules can be added to solve new problems and add features. Similarly, existing functionality can easily be swapped for alternatives to quickly compare different implementations and ideas.

Every module consists of two parts: the source code and its semantic description. The second part is a collection of Linked Data documents describing the functionality of the corresponding module. These are used by the Components.js [9] dependency injection framework to instantiate and link all modules together.

The current collection of Comunica modules offers more functionality than the original TPF client. Besides TPF interfaces, the default configuration supports SPARQL endpoints, Linked Data documents and HDT [10] files. These can all be combined in a single federated query: Comunica modules expose each of these source types through triple pattern queries, so the federated TPF algorithm can be applied across all of them.
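The idea that a common triple pattern interface lets one federation step span different source types can be sketched as follows. This is a minimal TypeScript sketch under stated assumptions: the names TriplePatternSource, InMemorySource and federatedMatch are hypothetical and are not part of Comunica's API, terms are plain strings, and federation is reduced to a deduplicated union of per-source matches rather than the full federated TPF algorithm.

```typescript
// Hypothetical types for illustration only; these are not Comunica's interfaces.
type Term = string;
interface Triple { s: Term; p: Term; o: Term; }
// An undefined position acts as a wildcard (a variable in the triple pattern).
interface TriplePattern { s?: Term; p?: Term; o?: Term; }

// Assumption: every source type only has to answer triple pattern queries.
interface TriplePatternSource {
  match(pattern: TriplePattern): Triple[];
}

function matches(t: Triple, p: TriplePattern): boolean {
  return (p.s === undefined || p.s === t.s)
    && (p.p === undefined || p.p === t.p)
    && (p.o === undefined || p.o === t.o);
}

// Stand-in for any concrete source (TPF interface, SPARQL endpoint,
// Linked Data document, HDT file) that has been wrapped behind the interface.
class InMemorySource implements TriplePatternSource {
  private readonly triples: Triple[];
  constructor(triples: Triple[]) {
    this.triples = triples;
  }
  match(pattern: TriplePattern): Triple[] {
    return this.triples.filter(t => matches(t, pattern));
  }
}

// Simplified federation: union of the matches from all sources,
// dropping triples that were already returned by an earlier source.
function federatedMatch(sources: TriplePatternSource[], pattern: TriplePattern): Triple[] {
  const seen = new Set<string>();
  const out: Triple[] = [];
  for (const source of sources) {
    for (const t of source.match(pattern)) {
      const key = JSON.stringify([t.s, t.p, t.o]);
      if (!seen.has(key)) {
        seen.add(key);
        out.push(t);
      }
    }
  }
  return out;
}
```

Because federatedMatch only depends on the interface, adding a new source type in this sketch means adding one more class that implements match, without touching the federation step.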

Demonstration overview

In this demonstration, we let users execute SPARQL queries over a federation of heterogeneous interfaces. The demonstration runs directly in the browser, and is available on the Comunica website.

This demonstration is an adaptation of the Triple Pattern Fragments Web client [8], with the main difference that instead of using the Triple Pattern Fragments engine for querying, it uses the Comunica engine. The implementation of this Web client is available on GitHub, under the open MIT license so that it can be reused for different use cases.

We provide a collection of example queries with a predefined set of sources, where some queries federate over different heterogeneous sources. Fig. 1 shows an example query that federates over a Triple Pattern Fragments interface and a Linked Data document. Users can also write custom queries, and add more data sources by their URL.

[Comunica Web Client Screenshot]

Fig. 1: Example SPARQL query in the Comunica Web client that federates over the DBpedia Triple Pattern Fragments interface and a FOAF profile.

At the time of writing, SPARQL endpoints, Triple Pattern Fragments interfaces and raw RDF files can be queried. Internally, Comunica will identify the source type through a set of heuristics. SPARQL endpoints are tested using a simple ASK query through the SPARQL protocol [1]. Triple Pattern Fragments interfaces are tested by checking if the required set of hypermedia controls is available. Finally, RDF files are tested with the lowest priority by checking their content type.
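The prioritized heuristics described above can be sketched as a small decision function. This is a minimal TypeScript sketch, not Comunica's implementation: ProbeResult, identifySourceType and the list of RDF media types are hypothetical names and choices made for illustration, the network probes themselves are assumed to have run already, and the relative order of the SPARQL and TPF checks is an assumption (the text only fixes RDF files as the lowest priority).

```typescript
type SourceType = "sparql" | "tpf" | "file";

// Hypothetical summary of what probing a source URL revealed.
interface ProbeResult {
  // A simple ASK query sent through the SPARQL protocol got a valid answer.
  askSucceeded: boolean;
  // The response exposed the hypermedia controls a TPF interface requires.
  hasTpfControls: boolean;
  // Value of the Content-Type response header, if any.
  contentType: string | null;
}

// Assumed, non-exhaustive list of RDF serialization media types.
const RDF_CONTENT_TYPES: string[] = [
  "text/turtle",
  "application/n-triples",
  "application/n-quads",
  "application/trig",
  "application/rdf+xml",
  "application/ld+json",
];

// Checks run in priority order; the raw RDF file check comes last.
function identifySourceType(probe: ProbeResult): SourceType | null {
  if (probe.askSucceeded) {
    return "sparql";
  }
  if (probe.hasTpfControls) {
    return "tpf";
  }
  // Strip parameters such as "; charset=utf-8" before comparing.
  const mediaType = (probe.contentType ?? "").split(";")[0].trim().toLowerCase();
  if (RDF_CONTENT_TYPES.includes(mediaType)) {
    return "file";
  }
  return null; // Source type could not be identified.
}
```

Keeping the probe results separate from the decision makes the priority order easy to inspect and to test without network access; a source that passes several checks is classified by the first one that succeeds.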