A discussion on misconceptions with Linux and functional safety
Author: Dr. Michael Armbruster, Safety Expert at emlix GmbH
Elektrobit corbos Linux for Safety Applications is an embedded Linux system that allows safety applications to run directly on Linux. It is unique on the market because it has achieved a fully positive independent safety assessment that covers not only the software itself but also, and more importantly, the systems that can be developed using it. The independent safety assessor, TÜV Nord, has certified that a system using this Linux system is suitable to perform safety functions up to SIL 2 according to EN 61508 or, likewise, to fulfil safety requirements up to ASIL B according to ISO 26262.
This embedded Linux based approach ensures that safety-related applications execute correctly and provides a notification if this is not the case. The features offered by the solution make it possible to implement the fault reaction most appropriate to the specific needs of a customer’s project.
The effort required to positively assess a system based on this solution with embedded Linux at its core is reduced to the minimum in terms of analysis, documentation and testing.
Even for higher SILs or ASILs, a broad range of possibilities is offered that can be exploited on a project-specific basis. Different independent virtual domains are provided, both in the user space and at a lower level of abstraction from the hardware (in the form of L4Re applications). Further virtual domains can be added with reasonable effort because their independence is largely covered by the positive safety assessment already achieved. These independent domains make it possible to apply techniques (e.g. diversity, redundancy or cross-checks) that achieve the required level of safety integrity for the most demanding projects.
Misconceptions using Linux in a Safety Context
Embedded Linux has been in use in various safety-related applications for years. However, it is ordinarily reserved for non-safety-critical functions, most commonly communication, networking, update functionality, and visualisation. Using Linux for safety-related applications requires engineers to tackle the barriers that come with the nature of open source projects and with the inherent architecture of Linux, which has been designed as a monolithic system. Virtualization is a powerful means of separating different concerns.
In light of our experience, the following misconceptions keep system and software architects from considering Linux and hypervisors as core building blocks of safety-related systems:
- Open source processes and software cannot be used in the context of safety-related systems.
- Linux needs to be assessed as a monolithic software blob.
- Virtualization is a source of additional problems, with extra effort in qualification and maintenance, and demands expertise rarely available on the market.
- Every single element of an execution environment, including its libraries, needs to be “safe”.
- There is only one architectural approach to meet all functional and non-functional requirements using Linux.
As part of an innovative development project together with Elektrobit Automotive GmbH in Erlangen, emlix has developed a Linux Safety Supervisor which reliably supervises the Linux kernel. As mentioned earlier, the solution is suitable to perform safety functions up to SIL 2 according to EN 61508 or, likewise, to fulfil safety requirements up to ASIL B according to ISO 26262. It also bridges the gap between safety and open source solutions that has existed until now. The implementation is based on the following strategy:
- Change the paradigm: use the strength of Linux and detect when things go wrong rather than trying to prevent faults.
- Let the kernel run as usual. Do not attempt to change it. Use it. And instead, “put it into a box”.
- Focus on an application’s data space. Ensure correctness. Make it a dependable data space (dependability is understood here as the “ability to perform as and when required”, as defined in IEC 60050).
- Only signal when integrity can no longer be ensured, and allow fault reactions to be added as needed by the project.
- Separate lifecycles of all building blocks (hypervisor, safety-monitor, kernel, apps).
- Allow long-term maintenance (e.g. up to 15 years).
- Support a multiplicity of architectures.
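The "detect and indicate" part of this strategy can be illustrated with a minimal sketch. It is not the mechanism used by the Linux safety supervisor, which works at the level of the hypervisor and hardware; it only shows the pattern in ordinary userland code. The class name GuardedData, the CRC-based check and the on_fault callback are assumptions made for illustration.

```python
import zlib


class GuardedData:
    """Data buffer protected by a checksum: corruption is detected, not prevented."""

    def __init__(self, payload, on_fault):
        self._data = bytearray(payload)
        self._crc = zlib.crc32(self._data)
        self._on_fault = on_fault  # project-specific fault reaction (hypothetical hook)

    def write(self, payload):
        # A legitimate update refreshes both the data and its checksum.
        self._data = bytearray(payload)
        self._crc = zlib.crc32(self._data)

    def read(self):
        # Verify integrity on every read; only indicate a fault, do not repair it.
        if zlib.crc32(self._data) != self._crc:
            self._on_fault("integrity check failed")
            return None
        return bytes(self._data)


faults = []
guarded = GuardedData(b"speed=42", on_fault=faults.append)
intact = guarded.read()

guarded._data[0] ^= 0xFF  # simulate corruption that bypasses write()
corrupted = guarded.read()
```

The guarded object itself takes no corrective action. What happens after the indication (restart, safe state, degraded mode) is left to the callback, which mirrors the idea of adding fault reactions per project.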
The chosen solution is based on a "supervisor" software layer that detects and prevents undesirable behaviours of the Linux kernel. This supervision over the Linux kernel is enabled by a hypervisor that leverages functionalities provided by the hardware platform. This approach decouples the lifecycle of the open-source Linux kernel, which remains unaffected, from that of other software elements, including the "supervisor" software layer, the hypervisor, userland libraries, and the application software itself, all of which must comply with applicable safety standards.
The concept makes use of virtual address spaces, together with the settings and tables that govern address translation; these are often called “translation regimes”. While the OS controls the set of translations from virtual memory to what the OS sees as physical memory, a hypervisor can control a second set of translations. These map the addresses, with their settings, as seen by the OS to the real physical address space. With this approach, a so-called intermediate physical address space is placed between the virtual one and the physical one. This layer is under the full control of the hypervisor. A much more complete description is given in Arm’s AArch64 memory management guide.
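The two sets of translations can be pictured with a toy model. The sketch below assumes 4 KiB pages and flat, single-level lookup tables; real AArch64 translation tables are multi-level and carry attributes, and all page numbers here are invented for illustration.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def translate(table, address):
    """Translate an address through one page-granular table (page -> page)."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"no mapping for page {page:#x}")
    return table[page] * PAGE_SIZE + offset


# Stage 1, controlled by the OS: virtual page -> intermediate physical page.
stage1_os = {0x10: 0x200, 0x11: 0x201}

# Stage 2, controlled by the hypervisor: intermediate physical page -> physical page.
stage2_hv = {0x200: 0x8000, 0x201: 0x9F00}


def virtual_to_physical(virtual_address):
    intermediate = translate(stage1_os, virtual_address)  # what the OS takes for physical
    return translate(stage2_hv, intermediate)             # the real physical address


physical = virtual_to_physical(0x10 * PAGE_SIZE + 0x123)
print(hex(physical))  # 0x8000123
```

The OS can change stage1_os freely, but it never sees or edits stage2_hv. Whatever the OS maps, the final physical location is decided by the second table.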
The Linux safety supervisor, together with the hypervisor, maintains the intermediate physical representation and controls access (read/write/execute). Two translation regimes are necessary: one for safety-related and one for non-safety-related software. Whenever the OS accesses memory, it sees the non-safety-related intermediate physical representation, and whenever a safety-related application is running, it sees the safety-related intermediate physical representation. Access is granted whenever needed and appropriate according to the rules defined. Fig. 1 depicts the main idea.
As virtualization covers not only memory but hardware in general, the same approach is also applied to CPU registers and other hardware in use.
Customers can realize various architectures when using the embedded Linux system with the Linux safety supervisor, while sticking to established system call interfaces and development paradigms. A safety manual and test cases provide guidance for developers, integrators and safety managers. The main architectural alternatives described in the following paragraphs are:
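As a rough illustration of two regimes with different access rules, the following sketch keeps one permission table per regime and checks every access against the table that is currently active. The page numbers, the concrete policy and the names RegimeSwitch and AccessFault are hypothetical; the actual rules are defined by the product and its safety manual.

```python
from enum import Flag, auto


class Access(Flag):
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


class AccessFault(Exception):
    """Raised when an access is not permitted in the active regime."""


RW = Access.READ | Access.WRITE
RWX = RW | Access.EXECUTE

# Hypothetical policy: intermediate physical page -> permitted accesses.
REGIMES = {
    "non_safety": {0x200: RWX, 0x300: Access.READ},
    "safety": {0x200: Access.READ, 0x300: RW},
}


class RegimeSwitch:
    """Tracks the active regime and checks accesses against its table."""

    def __init__(self):
        self.active = "non_safety"

    def switch_to(self, regime):
        if regime not in REGIMES:
            raise ValueError(f"unknown regime: {regime}")
        self.active = regime

    def check(self, page, access):
        # Pages without an entry permit nothing.
        allowed = REGIMES[self.active].get(page, Access(0))
        if access not in allowed:
            raise AccessFault(f"{access} on page {page:#x} denied in {self.active}")
        return True


switch = RegimeSwitch()
switch.check(0x200, Access.WRITE)   # permitted for non-safety software
switch.switch_to("safety")
switch.check(0x300, Access.WRITE)   # permitted only while the safety regime is active
```

In this toy policy, page 0x300 is writable only while the safety regime is active, so a write attempted under the non-safety regime raises AccessFault instead of silently modifying safety-related data.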
- single virtual domain/ virtual machine setup
- dual virtual domain/ virtual machine setup
- dual virtual domain/ virtual machine setup with support for mixed-criticality
- extending virtual domain running on low level of abstraction
Single virtual domain/ virtual machine setup
A very basic architecture, as shown in Fig. 2, runs one virtual machine which is supervised by the Linux safety supervisor, which is also hosted in a hypervisor environment. A customer can integrate the safety-related application(s) in the Linux userland. The Linux safety supervisor takes care of aspects such as:
- correct startup with initialization
- correct memory mappings and modifications
- correct execution state
- correct switchover between the different translation regimes for the non-safety-related and safety-related intermediate physical representation
From the perspective of an application developer, there is no specific difference from any other embedded Linux, as long as the implementation complies with the prescriptions given in the safety manual.
Dual virtual domain/ virtual machine setup
The pattern shown in Fig. 3 makes use of a dual virtual domain setup. One domain hosts an embedded Linux for non-safety-related applications, and the other hosts a Linux that is supervised and used to run safety-related applications. The supervision is the same as the one described for the single virtual domain setup. This pattern helps to limit the scope of dependability engineering to where it is needed. If all applications ran in the same userland without any further separation of concerns, all of them would need to ensure the same level of dependability.
This level of dependability is the highest one required for the functionalities performed by the safety-related applications. It extends to all dependencies, especially all the libraries in the userland.
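The reasoning about scope can be made concrete with a small calculation. The sketch below assumes a simplified ordering of integrity levels and three made-up applications; it only shows that a shared userland inherits the highest level of anything running in it.

```python
LEVELS = ["QM", "ASIL A", "ASIL B"]  # ordered from lowest to highest


def required_level(app_levels):
    """Highest level among the applications sharing one userland."""
    return max(app_levels, key=LEVELS.index, default="QM")


# Hypothetical applications and the level each one needs on its own.
apps = {"hmi": "QM", "networking": "QM", "output_monitor": "ASIL B"}

# Single userland: everything, including shared libraries, inherits the maximum.
single_domain = required_level(apps.values())

# Dual setup: each domain only inherits the maximum of its own applications.
non_safety_domain = required_level(lvl for lvl in apps.values() if lvl == "QM")
safety_domain = required_level(lvl for lvl in apps.values() if lvl != "QM")

print(single_domain, non_safety_domain, safety_domain)  # ASIL B QM ASIL B
```

With a single shared userland, the libraries used by the HMI and networking applications would fall under the ASIL B scope as well. With two domains, they stay at QM.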
This approach comes with advantages such as:
- a clear separation of concerns
- a rich set of embedded Linux features within the virtual machine that runs the non-safety-related applications
- the flexibility to use, for all non-safety-related applications, libraries that are only required to be suitable for non-safety-related use.
Customers can use the virtual machine with the safety-related applications to run a supervisor on the output of the other virtual machine or to run a less complex algorithm that has been identified to be safety-related.
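A monitor of this kind is often a simple plausibility check. The following sketch assumes the non-safety-related virtual machine produces a stream of numeric values and that a range limit and a maximum step between consecutive samples are meaningful for the signal; the class names and limits are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    minimum: float
    maximum: float
    max_step: float  # largest plausible change between two accepted samples


class OutputMonitor:
    """Checks values produced elsewhere for range and rate-of-change plausibility."""

    def __init__(self, limits):
        self.limits = limits
        self.previous = None  # last accepted value

    def check(self, value):
        """Return True if the value is plausible, False otherwise."""
        in_range = self.limits.minimum <= value <= self.limits.maximum
        step_ok = (
            self.previous is None
            or abs(value - self.previous) <= self.limits.max_step
        )
        plausible = in_range and step_ok
        if plausible:
            self.previous = value  # only accepted values become the new reference
        return plausible


monitor = OutputMonitor(Limits(minimum=0.0, maximum=100.0, max_step=10.0))
results = [monitor.check(v) for v in (50.0, 55.0, 80.0, 120.0, 60.0)]
print(results)  # [True, True, False, False, True]
```

The monitor is far less complex than whatever computes the values, which is what makes it a good candidate for the supervised virtual machine: only this small piece of logic needs to meet the safety-related level of dependability.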
Dual virtual domain/ virtual machine setup with support for mixed-criticality
While “separation of concerns” is a good and established design principle with regard to, e.g., architectural clarity or level of coupling, performance needs may make it desirable to host non-safety-related as well as safety-related applications on the same kernel within the same virtual machine. This is supported by the mixed-criticality capability used in this pattern. The Linux safety supervisor operates on the level of safety-related applications as it does in all other architectural patterns. But given the ability to distinguish between safety-related and non-safety-related applications, the latter can run freely in the sense that only the specifications given in the safety manual apply to them.
The hypervisor and the Linux safety supervisor take care of the separation of concerns in this setup, while offering a much closer coupling of the two domains than the architectural setup that uses two separate virtual machines without support for mixed criticality.
Extending virtual domain running on low level of abstraction
Any of the architectural setups mentioned before can be extended with an additional virtual domain running at a low level of abstraction, in the form of L4Re applications running within the EB corbos Hypervisor. This hypervisor and its execution environment are based on the L4Re runtime environment. For non-safety-related applications, this runtime environment is open source.
The environment comes with a C and C++ environment that customers can use to implement functionality that is closely related to the hardware. Such functionality is supported as well, and separation of concerns is provided by the hypervisor.
All this runs in parallel to the safety-related execution environment that uses embedded Linux.
How does all this address the misconceptions mentioned before?
The solution discussed herein balances a multiplicity of non-functional requirements. It is an efficient operating system solution based on embedded Linux that is compliant with ASIL B / SIL 2 safety requirements. The supervision is based on core features provided by the hardware and available to a hypervisor. With this solution, lifecycles can be decoupled between the Linux kernel, the hypervisor and the supervisor. Reflecting on the misconceptions mentioned in the beginning, the following can be stated:
- Open source processes and open source software can be used in the context of safety-related systems once state-of-the-art hardware features are effectively orchestrated.
- Linux as a monolithic software blob does not need to be assessed as having a lifecycle commensurate with the prescriptions of the specific standard on functional safety. With the help of virtualization and the Linux safety supervisor, separation of concerns can be applied.
- Virtualization adds another layer of abstraction, which is one of the core concepts described before. An intermediate physical representation of the hardware is key for the Linux safety supervisor. While this adds complexity, the complexity can be encapsulated within the hypervisor and the Linux safety supervisor. Supplemental modularization helps to separate the lifecycles of the different building blocks, such as the Linux kernel, applications, hypervisor and Linux safety supervisor. This is key to implementing an effective and efficient maintenance process and CVE security monitoring.
- Architectural patterns support the separation of concerns in the userland as well. This helps to minimize the scope of safety-related applications for which evidence needs to be provided that their lifecycle is commensurate with the prescriptions of the specific standard on functional safety (e.g. EN 61508) and that the features are correctly implemented and integrated.
- Additional virtual domains running at a low level of abstraction enable customers to benefit from the richness of Linux while implementing low-level software and features that require extra-fast startup times directly on the “not so rich” interface that comes with the hypervisor’s runtime environment.
Further information
Please contact us if you would like more details or want to know whether this approach will work for your specific development project:
Your contact partner
Our emlix safety experts
Phone +49 551 304460
solutions@emlix.com