Oct 15th 2024

Potential and Challenges of combining LionWeb and Langium

Johannes Meier

When developing complex software systems, you often need to apply several steps iteratively. Consider the following scenario: First, you use a Langium-based editor to develop textual specifications for your system. Then you want an overview of the specification as a graphical diagram that lets you rearrange some parts. Next, you simulate the updated specification with your company’s simulation tool. Finally, you want to see the simulation results annotated on your textual specification as guidance for improving it. And nowadays, everything has to be accessible in your browser.

Sounds like a vision? These are real demands when developing big systems in software and systems engineering practice today! What is needed to make this vision happen? Since there is no single ready-to-use tool for that, it requires multiple web-based tools which exchange data with each other. Tools for modeling and language engineering in the web, including Langium, have been developed successfully for a few years. But an initiative for their integration has emerged only recently with LionWeb.

In this blog post, we discuss what such data exchange looks like, with Langium as an example for developing textual notations. Never heard of LionWeb or Langium? No problem, we will explain both projects before we discuss their combination, benefits and challenges.

LionWeb

LionWeb is a community-driven initiative to establish a modeling and language engineering ecosystem in the web. To this end, it aims for data interoperability between existing tools, so that they can be reused and combined for the needs of large modeling and language engineering projects. The community includes engaged people from academia and industry.

Reference architecture of LionWeb, taken from [2]

This figure shows the main concepts of LionWeb in the form of a reference architecture [2] designed to achieve the proposed objectives. Since the roots of LionWeb are in modeling and language engineering, it uses terminology from modeling: All collections of information are called models (on meta-level M1) in the context of LionWeb, including the ASTs which Langium produces at runtime, according to the principle “everything is a model” [1].

The concepts of the static structure of models are described by models of these models, which are called metamodels. In other words, models on meta-level M1 conform to metamodels on M2. The concepts for metamodels which are supported by LionWeb are summarized as a meta-metamodel on M3 and are similar to Ecore. Transformations that take models as input and create new models as output are called model transformations. Here, input and output models conform to different metamodels.

LionWeb distinguishes between original models and derived models: Original models contain information which cannot be calculated from other models and needs to be actively provided and maintained by users, e.g. ASTs parsed from text that users write in Langium-based domain-specific languages (DSLs). Derived models contain information which is automatically calculated from other models, e.g. validation or simulation results. Original models are stored in the central LionWeb repository (A in the figure), which also manages the storage of and access to derived models.

Located around the central repository are editors, like Langium-based DSL editors or graphical diagrams (G, F), and processors, like generators, validators or type checkers (C, D, E), which operate on original models without user interaction and produce derived models. The edges in the figure represent the flow of models between the repository and the processors and editors according to the LionWeb protocols. The models to exchange are often created by model transformations.

The LionWeb community is working on specifications for the meta-metamodel, a serialization format, and protocols for model exchange. Tools already integrated with LionWeb include MPS, Starlasu, Freon and Ecore.

Langium

Langium is the language workbench for developing textual domain-specific languages (DSLs) in the web. Langium is MIT-licenced and therefore open-source and free to use even in commercial and closed-source projects. We at TypeFox initiated Langium in 2021 and have driven its development since then, while the Langium project has been hosted by the Eclipse Foundation since 2023. Langium is parser-based with an LL(*) algorithm and uses an EBNF-inspired grammar language. Written in TypeScript with inherent LSP support, Langium can be seen as the successor of Xtext for the web, but without dependencies on EMF and the Eclipse UI.

To get started with Langium, use the playground to write your first grammar in the browser and let Langium parse conforming text on the fly, create a ready-to-execute VSCode extension with Yeoman, or let the showcases inspire you.

Langium artifacts and processes

The Langium Parser is interpreter-based at runtime: it takes Text conforming to a Langium Grammar as input and creates a corresponding AST. At development time, the Langium Generator generates TypeScript interfaces according to the grammar to describe the structure of the resulting AST (TypeScript AST Types). These interfaces provide a statically typed programming experience for language engineers when, among other things, writing custom validations in TypeScript. By default, the Langium Generator automatically infers these TypeScript types for the AST from the rules in the grammar; alternatively, it transforms explicit type declarations in the grammar language into TypeScript types. Language engineers can write such explicit type declarations in order to keep the AST types stable. Both the generator and the parser are already provided by Langium.
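To make the relation between grammar and AST types more tangible, here is a minimal sketch. The grammar in the comment and the interfaces below are hand-written, simplified stand-ins chosen for illustration; they are not the actual output of the Langium Generator, and the function name findDuplicatePersonNames is hypothetical.

```typescript
// Hypothetical grammar in the Langium grammar language, shown only as a comment:
//
//   entry Model: (persons+=Person | greetings+=Greeting)*;
//   Person:   'person' name=ID;
//   Greeting: 'Hello' person=[Person:ID] '!';

// Simplified stand-ins for the kind of AST types described above.
// Real generated types contain more members; this is an assumption-laden sketch.
interface AstNodeLike {
    readonly $type: string;
}

interface Person extends AstNodeLike {
    readonly $type: 'Person';
    name: string;
}

interface Greeting extends AstNodeLike {
    readonly $type: 'Greeting';
    // Cross-reference to a Person: the resolved target (if any) plus the written text.
    person: { ref?: Person; $refText: string };
}

interface Model extends AstNodeLike {
    readonly $type: 'Model';
    persons: Person[];
    greetings: Greeting[];
}

// A custom check written against the typed AST: the compiler knows that
// 'persons' is a Person[] and that every Person has a 'name'.
function findDuplicatePersonNames(model: Model): string[] {
    const seen = new Set<string>();
    const duplicates = new Set<string>();
    for (const person of model.persons) {
        if (seen.has(person.name)) {
            duplicates.add(person.name);
        }
        seen.add(person.name);
    }
    return Array.from(duplicates);
}
```

With types like these, a typo such as person.nmae is reported at compile time instead of failing silently at runtime.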

Combining LionWeb and Langium

The following picture concretizes the LionWeb reference architecture for the motivating scenario:

Example scenario according to the LionWeb architecture

Langium is used as the editor for a textual DSL. Thanks to the LionWeb protocol, Langium ASTs can be transformed to LionWeb models, which can be transformed to models of other integrated processors and editors, including graphical diagrams. After editing with a graphical diagram, changes can be transformed back to the LionWeb model in the central repository. This updated model can be given to all other processors and editors, e.g. to update the Langium AST and the input for simulation tools. In this way, the underlying LionWeb model can be edited iteratively with different editors and processors, which supports the needs of different development phases and stakeholders.

Potentials and Benefits

This combination of LionWeb and Langium has several potential benefits. In particular, it extends the ecosystems of both LionWeb and Langium.

For LionWeb, Langium is another component which is compatible with the LionWeb ecosystem. While LionWeb currently focuses on projectional editing, Langium contributes parser-based editors for textual DSLs. Since Langium runs in the browser without the need for additional servers, Langium-based editors are quite lightweight and easy to integrate.

For Langium, LionWeb is another component which is compatible with the Langium ecosystem. While Langium-based editors usually store just the written text, now the parsed AST can be transformed into a LionWeb model and stored in the LionWeb repository. Additionally, the initial text could be loaded from the LionWeb repository and presented to the users of the Langium editor as initial editor content. But the LionWeb repository is more than an additional data source and data sink for Langium ASTs, since LionWeb models can be used by all the other editors and processors which are integrated with LionWeb:

Transformations in the example scenario

Thanks to the LionWeb protocol, Langium ASTs can be transformed to LionWeb models, which can be transformed to models of other integrated editors and processors. After editing the model, it can be transformed back to a LionWeb model, which can be transformed back to an (updated) Langium AST.

Langium is used as the editor for a textual DSL. Since users manually work with this editor, Langium provides the corresponding AST as an original model, which is given to the central LionWeb repository (step 1 in this example scenario). The graphical diagram receives its data from the repository (2) as well. Updates made by graphical editing are propagated back to the repository (3). The updated model is given to the simulation tool (and the Langium editor) as step 4. The simulation tool returns a derived model with the simulation results (5), which is given to the Langium editor afterwards in order to be shown as validation hints (6).

This data interoperability between different editors and processors enables bigger projects with different technologies for different use cases, for which different tools and frameworks need to share and exchange their data. Depending on the current use case, the text written in Langium editors doesn’t always need to be the single source of truth, e.g. the initial text could be loaded as a model from the LionWeb repository and saved back into the LionWeb repository after editing. In that case, there is no need to save the written text anymore. Therefore it is possible to edit only parts of the models which are stored in the LionWeb repository, which is beneficial e.g. for huge models, for partial models for which there is no grammar, or for output models of other processors.

While the required initial integration is challenging (see the following section), the integration effort is spent only once and not for each project which uses Langium and other LionWeb components. Additionally, Langium needs to be made compatible only once with the LionWeb protocol (O(1)), not with each other tool (O(n)), thanks to the two-step transformations with the LionWeb repository as intermediate representation. After the initial integration, its maintenance doesn’t need to be done for each project either, since the community can maintain the integration of Langium and LionWeb in general.
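The effect of the intermediate representation can be illustrated with a small calculation. The sketch below assumes that every integration consists of two directed transformations (one per direction); the function names are made up for this illustration.

```typescript
// Number of directed transformations needed when every tool only talks to
// a central hub (here: the LionWeb repository): one to the hub, one back.
function hubTransformations(tools: number): number {
    return 2 * tools;
}

// Number of directed transformations needed when every tool is integrated
// directly with every other tool: one per ordered pair of distinct tools.
function pairwiseTransformations(tools: number): number {
    return tools * (tools - 1);
}

// Compare both approaches for a growing number of tools.
for (const tools of [3, 5, 10]) {
    console.log(`${tools} tools: hub=${hubTransformations(tools)}, pairwise=${pairwiseTransformations(tools)}`);
}
```

Under these assumptions, ten tools need 20 transformations via the hub but 90 with direct pairwise integration, and adding an eleventh tool costs 2 new transformations instead of 20.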

Challenges

Before we can combine LionWeb and Langium and enjoy these benefits, we need to overcome some challenges. Mainly, data transformations between Langium and LionWeb need to be established.

Transformations of Models

In order to exchange data between Langium and LionWeb, some transformations are required, which are depicted in the following graphic. Since actual content is exchanged at runtime, these transformations are on the model (or instance) level (M1).

Model transformations between Langium and LionWeb

After parsing text with Langium according to the grammar, the first transformation from Langium to LionWeb (ast2model) transforms the resulting AST into a LionWeb model. This transformation can be written in TypeScript, since Langium is written in TypeScript and LionWeb provides an API binding for TypeScript.
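As a rough idea of what ast2model could look like, the following sketch flattens a tree of AST-like objects into a list of nodes with ids, properties and containments. It deliberately does not use the Langium or LionWeb APIs: AstNodeLike and ModelNode are simplified, hypothetical shapes, ids are simple counters, and cross-references are left out.

```typescript
// Minimal stand-in for an AST node: a '$type' plus arbitrary fields which are
// either primitive values, a single child node, or a list of child nodes.
type AstValue = string | number | boolean | AstNodeLike | AstNodeLike[];
interface AstNodeLike {
    $type: string;
    [field: string]: AstValue;
}

// Hypothetical, simplified node shape for the target model (not the LionWeb format).
interface ModelNode {
    id: string;
    concept: string;
    parent: string | null;
    properties: Record<string, string | number | boolean>;
    children: Record<string, string[]>; // containment name -> ids of child nodes
}

function ast2model(root: AstNodeLike): ModelNode[] {
    const nodes: ModelNode[] = [];
    let counter = 0;

    // Creates the model node for 'ast', visits its children, and returns its id.
    function visit(ast: AstNodeLike, parent: string | null): string {
        const node: ModelNode = {
            id: `n${counter++}`,
            concept: ast.$type,
            parent,
            properties: {},
            children: {}
        };
        nodes.push(node);
        for (const field of Object.keys(ast)) {
            if (field.charAt(0) === '$') {
                continue; // skip meta fields such as $type
            }
            const value = ast[field];
            if (Array.isArray(value)) {
                node.children[field] = value.map(child => visit(child, node.id));
            } else if (typeof value === 'object') {
                node.children[field] = [visit(value, node.id)];
            } else {
                node.properties[field] = value;
            }
        }
        return node.id;
    }

    visit(root, null);
    return nodes;
}
```

A real integration would additionally need stable node ids, a mapping from AST types to the concepts of the LionWeb metamodel, and a strategy for cross-references.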

The second transformation from LionWeb to Langium (model2ast) takes a LionWeb model, transforms it into a Langium AST, and serializes the resulting AST into text conforming to the Langium grammar.

Serializer for Langium

After transforming a LionWeb model to a Langium AST, it needs to be serialized as text according to the current Langium grammar. At the moment, Langium does not have such a serializer component (Serializer) to serialize new or updated ASTs back to text according to the textual notation defined by the grammar.

A generic serializer for Langium needs to handle challenges including the following:

  • Unassigned calls of data type rules and terminal rules match some characters, but these characters are not stored in the AST. When serializing this AST, those characters are missing and cannot be put into the text.
  • Parser rules with overlapping alternatives like MyRule: 'student' Person | 'teacher' Person; are critical when the AST contains no information about which alternative is the correct one.
  • Hidden terminals like comments and whitespace are not part of the AST and therefore cannot be reproduced.

As a work-around for a given language, it is possible to write a code generator which takes an AST and writes text according to the textual notation of this language. But compared to a generic serializer, such code generators are specific for a single language and need to be developed from scratch for each language.
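Such a language-specific work-around can be quite small. The sketch below serializes the AST of a hypothetical greeting language with the notation person NAME and Hello NAME!; the AST shape and the notation are assumptions made for this example only.

```typescript
// Simplified AST of a hypothetical greeting language.
interface Person {
    $type: 'Person';
    name: string;
}
interface Greeting {
    $type: 'Greeting';
    personName: string; // name of the greeted person (cross-reference kept as plain text)
}
interface Model {
    $type: 'Model';
    persons: Person[];
    greetings: Greeting[];
}

// Hand-written serializer: the keywords 'person' and 'Hello' and the trailing '!'
// are hard-coded here, because they are not stored in the AST.
function serialize(model: Model): string {
    const lines: string[] = [];
    for (const person of model.persons) {
        lines.push(`person ${person.name}`);
    }
    for (const greeting of model.greetings) {
        lines.push(`Hello ${greeting.personName}!`);
    }
    return lines.join('\n');
}
```

The sketch also shows the limitations listed above: comments and the original layout are lost, and since the AST stores persons and greetings in separate lists, the original interleaving of both kinds of statements cannot be reproduced.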

Therefore, TypeFox is looking for (partial) funding to implement a generic serializer for Langium! It is not only useful for transformations between Langium and LionWeb, but enables in-place transformations to work on the AST instead of on the original text in Langium projects in general.

Incrementality

The LionWeb protocol supports not only full batch transformations (“bulk transformation”), but aims to support incremental transformations (“delta transformation”) as well. The main idea of incrementality is to process only changed data again and to skip unchanged data. This improves performance, since the effort no longer depends on the model size, but on the size of the changes in the model.

Since Langium doesn’t support incremental parsing, the Langium integration for LionWeb might provide only full batch transformations and not support incrementality. As a work-around, the differences between two AST versions (the “delta”) could be calculated and given as the input for the incremental transformation. Whether the savings from the incremental mode of the other transformations and processors outweigh the quadratic effort for the difference calculation needs to be investigated in detail for the actual use case.
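The following sketch illustrates what calculating such a delta could mean. It makes a strong simplifying assumption: both versions are available as flat node lists and nodes keep stable ids across versions, so that they can be matched by id. Freshly parsed ASTs do not provide such ids out of the box, and matching nodes without them is the expensive part. All type and function names are hypothetical.

```typescript
// Hypothetical flat node representation with stable ids.
interface FlatNode {
    id: string;
    concept: string;
    properties: Record<string, string>;
}

// The delta lists the ids of added, removed and changed nodes.
interface Delta {
    added: string[];
    removed: string[];
    changed: string[];
}

// Two nodes have the same content if concept and all property values are equal.
function sameContent(a: FlatNode, b: FlatNode): boolean {
    const keysA = Object.keys(a.properties);
    const keysB = Object.keys(b.properties);
    return a.concept === b.concept
        && keysA.length === keysB.length
        && keysA.every(key => a.properties[key] === b.properties[key]);
}

function computeDelta(before: FlatNode[], after: FlatNode[]): Delta {
    // Index the old version by id; matched nodes are removed from this index.
    const unmatched = new Map<string, FlatNode>();
    before.forEach(node => unmatched.set(node.id, node));

    const delta: Delta = { added: [], removed: [], changed: [] };
    for (const node of after) {
        const previous = unmatched.get(node.id);
        if (previous === undefined) {
            delta.added.push(node.id);
        } else {
            if (!sameContent(previous, node)) {
                delta.changed.push(node.id);
            }
            unmatched.delete(node.id);
        }
    }
    // Whatever is left in the index no longer exists in the new version.
    unmatched.forEach((_node, id) => delta.removed.push(id));
    return delta;
}
```

Only the nodes listed in the delta would then need to be handed to an incremental transformation.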

Transformations of Metamodels

In order to show a LionWeb model as a textual DSL with Langium, it is not enough to transform it to text to be shown in a Langium editor. Additionally, the grammar is required to guide, among other things, the parsing with error marker creation. Since the LionWeb metamodel describes only which data shall be stored, but does not describe how these data should be visualized with a textual notation, it is not possible to automatically generate a great grammar from the metamodel. Instead, it is manual work to design a grammar with the desired visual notation for the concepts described by the metamodel. In order to simplify this challenging work, a “default” grammar with a JSON-like “default” notation, which is applicable to any metamodel, could be created automatically (metamodel2grammar). Language engineers could use this default grammar as a starting point and improve the grammar step by step. In order to ensure that the manually finalized grammar still provides an AST whose types correspond to the LionWeb metamodel, metamodel2grammar generates the explicit declarations for these types in the grammar language as well.
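A minimal sketch of the idea behind metamodel2grammar follows. The metamodel representation (Concept, Feature) is a hypothetical simplification, and the emitted text is only meant to resemble the Langium grammar language: it is a fragment without grammar header, entry rule and terminal definitions (STRING and INT are assumed to exist), and it has not been validated with Langium.

```typescript
// Hypothetical, heavily simplified metamodel representation.
interface Feature {
    name: string;
    type: string;       // 'string', 'number', or the name of another concept
    many: boolean;      // true for list-valued features
    primitive: boolean; // true for 'string' and 'number'
}
interface Concept {
    name: string;
    features: Feature[];
}

// JSON-like default notation for one feature, e.g.  'name' ':' name=STRING
function featureSyntax(f: Feature): string {
    const target = f.primitive ? (f.type === 'number' ? 'INT' : 'STRING') : f.type;
    if (f.many) {
        return `'${f.name}' ':' '[' (${f.name}+=${target} (',' ${f.name}+=${target})*)? ']'`;
    }
    return `'${f.name}' ':' ${f.name}=${target}`;
}

// Explicit type declaration which keeps the AST type aligned with the metamodel.
function typeDeclaration(c: Concept): string {
    const props = c.features.map(f => `    ${f.name}: ${f.type}${f.many ? '[]' : ''}`);
    return `interface ${c.name} {\n${props.join('\n')}\n}`;
}

// Parser rule with the default notation, returning the declared type.
function parserRule(c: Concept): string {
    const body = c.features.map(featureSyntax).join(' ');
    return `${c.name} returns ${c.name}:\n    '${c.name}' '{' ${body} '}';`;
}

function metamodel2grammar(concepts: Concept[]): string {
    return concepts.map(c => typeDeclaration(c) + '\n\n' + parserRule(c)).join('\n\n');
}

// Usage with a tiny example metamodel.
console.log(metamodel2grammar([
    { name: 'Person', features: [{ name: 'name', type: 'string', many: false, primitive: true }] }
]));
```

Language engineers would then replace the generic notation in the parser rules with the syntax they actually want, while leaving the declared types untouched.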

Metamodel transformations between Langium and LionWeb

For the other direction, the LionWeb repository requires the metamodel for storing DSL-specific ASTs as models. Since Langium grammars describe not only the visual notation, but also the resulting AST node types, it is possible to automatically generate the corresponding LionWeb metamodel.

Mapping heterogeneous Metamodels

The transformations discussed so far target the same models and the same metamodels, which are merely represented in different technical spaces, i.e. they are transformations between Langium (meta)models and LionWeb (meta)models. But if the same concepts are represented with different metamodels, e.g. because two different processors use different fixed metamodels, some more transformations are required to transform models conforming to the metamodel of the first processor to models conforming to the metamodel of the second processor and vice versa.

As an example, to describe flow charts like …

Flow chart example

… there are different ways to describe their concepts as a metamodel, depicted as UML class diagrams: Data flow edges can be represented as from-to links between DataNodes (left part) or as instances of DataFlow, each linking one from node to one to node (right part). Additionally, different names might be used for attributes (label vs name) and meta-classes (DataNode vs Node).

Two different metamodels to describe the same flow charts

If the LionWeb repository uses the metamodel on the left for flow charts and a processor to integrate uses the metamodel on the right, additional transformations are required to transform Nodes and DataFlows into DataNodes and vice versa. These model transformations can be done within the same technical space, e.g. in the LionWeb repository. This scenario is an instance of the general challenge to ensure consistency between multiple, overlapping (meta)models [3]. Since this challenge might occur for various processors and editors integrated in LionWeb, it might be sensible to provide means for it within the LionWeb repository, e.g. dedicated model transformation or mapping languages.
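For the flow chart example, such a pair of transformations could be sketched as follows. The interfaces are assumptions derived from the description of the two metamodels: the left one is assumed to use DataNode with label and direct to-links, the right one Node with name plus explicit DataFlow instances. The meta-class Node is called FlowNode in the code to avoid a clash with the global DOM type of the same name.

```typescript
// Left metamodel (assumed): edges are direct links between DataNodes.
interface DataNode {
    label: string;
    to: DataNode[];
}

// Right metamodel (assumed): edges are explicit DataFlow instances.
interface FlowNode { // corresponds to the meta-class 'Node'
    name: string;
}
interface DataFlow {
    from: FlowNode;
    to: FlowNode;
}
interface FlowChart {
    nodes: FlowNode[];
    flows: DataFlow[];
}

// Left -> right: every DataNode becomes a FlowNode, every link becomes a DataFlow.
function toFlowChart(dataNodes: DataNode[]): FlowChart {
    const mapping = new Map<DataNode, FlowNode>();
    dataNodes.forEach(d => mapping.set(d, { name: d.label }));
    const flows: DataFlow[] = [];
    dataNodes.forEach(d => d.to.forEach(target => {
        const from = mapping.get(d);
        const to = mapping.get(target);
        if (from && to) { // links to nodes outside the given list are skipped
            flows.push({ from, to });
        }
    }));
    return { nodes: dataNodes.map(d => mapping.get(d)!), flows };
}

// Right -> left: every FlowNode becomes a DataNode, every DataFlow becomes a link.
function toDataNodes(chart: FlowChart): DataNode[] {
    const mapping = new Map<FlowNode, DataNode>();
    chart.nodes.forEach(n => mapping.set(n, { label: n.name, to: [] }));
    chart.flows.forEach(flow => {
        const from = mapping.get(flow.from);
        const to = mapping.get(flow.to);
        if (from && to) {
            from.to.push(to);
        }
    });
    return chart.nodes.map(n => mapping.get(n)!);
}
```

Even this small example needs renaming (label vs name) and restructuring (links vs edge objects), which hints at why dedicated transformation or mapping languages could pay off.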

Conclusion

In summary, the combination of LionWeb and Langium has important benefits in the mid-term and long-term. Langium-based editors benefit from an easier integration with external modeling and language engineering frameworks, while LionWeb gets a parser-based, lightweight textual language workbench running in the web. The integration extends the Langium and LionWeb ecosystems with additional components.

The challenges are worth mentioning. In particular, the Langium serializer requires a lot of effort, but it is a one-time effort with little maintenance effort afterwards, since both LionWeb’s meta-metamodel and the Langium grammar language are quite stable.

Therefore, TypeFox would like to contribute to LionWeb as another open-source engagement. In order to speed up the integration of Langium and LionWeb, we are looking for (partial) funding to implement a generic serializer for Langium.

As an outlook, TypeFox might contribute another component for LionWeb, i.e. an integration of Typir for type checking in the web, which is currently under development and whose early state will be presented at LangDev’24 and OCX/EclipseCon’24.

Contact us at TypeFox, the Langium community, or the LionWeb community to start collaborating on integrating Langium and LionWeb, on developing a generic Langium serializer, or on doing language engineering with Langium and LionWeb in the large! We are looking forward to your ideas!

References

[1] Bézivin, Jean (2005): On the unification power of models, Software & Systems Modeling, vol. 4(2), pp. 171–188, Springer. https://doi.org/10.1007/s10270-005-0079-0

[2] LionWeb initiative (2023): Reference Architecture with the graphic for the LionWeb reference architecture

[3] Meier, Johannes; Werner, Christopher; Klare, Heiko; Tunjic, Christian; Aßmann, Uwe; Atkinson, Colin; Burger, Erik; Reussner, Ralf; Winter, Andreas (2020): Classifying Approaches for Constructing Single Underlying Models, In: Hammoudi, Slimane; Pires, Luis Ferreira; Selic, Bran (eds): Model-Driven Engineering and Software Development, MODELSWARD 2019, vol. 1161, Communications in Computer and Information Science, pp. 350–375, Springer. https://doi.org/10.1007/978-3-030-37873-8_15

About the Author

Johannes Meier

Solving recurring problems in software engineering only once is Johannes’ main motivation: first at university, developing, publishing and teaching various approaches for modeling and programming, and now at TypeFox, pushing open source frameworks that facilitate developers’ work.