Transformers and explainable sensor-fusion are the key technologies behind the Mapless Autonomy Platform

When designing the algorithms behind the Mapless Autonomy Platform, we combined recent breakthroughs in artificial intelligence - namely transformer neural network architectures - with a geometrically interpretable and explainable sensor-fusion approach. This combination is especially promising for applications in safety-critical environments, such as operational design domains in which humans and autonomous vehicles work alongside each other. Certification for these environments requires thorough quality, safety and security processes throughout the entire product lifecycle, in line with state-of-the-art norms and regulations.

Under the hood:

The driveblocks Mapless Autonomy Platform approaches the detection and classification problem with a set of transformer neural networks running in parallel. Each sensor data stream is evaluated with a feature-generating backbone network and several detection heads responsible for different tasks, such as line or object detection. This multi-task approach allows for efficient implementations on embedded systems, as the parameter- and operation-heavy parts of the network are located mainly in the backbone and are shared among the detection tasks.
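To make the multi-task idea concrete, the following is a minimal PyTorch-style sketch of a shared backbone feeding several lightweight task heads. It is an illustration only, not the platform's actual network; the layer types, sizes and head names are assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskPerceptionNet(nn.Module):
    """Illustrative multi-task network: one heavy, shared backbone feeds
    several lightweight task-specific detection heads."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Parameter- and operation-heavy part, shared by all detection tasks
        # (a real system would use a transformer or CNN backbone here).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Lightweight heads, one per task (names and output sizes are illustrative).
        self.object_head = nn.Conv2d(feat_dim, 8, kernel_size=1)  # object logits
        self.lane_head = nn.Conv2d(feat_dim, 2, kernel_size=1)    # lane-line mask

    def forward(self, image: torch.Tensor) -> dict:
        features = self.backbone(image)  # computed once per sensor frame
        return {
            "objects": self.object_head(features),
            "lanes": self.lane_head(features),
        }

# One such network would run per sensor data stream, in parallel.
net = MultiTaskPerceptionNet()
outputs = net(torch.randn(1, 3, 128, 128))
```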

One of the key architectural choices in the neural networks is to leverage attention mechanisms within transformer architectures. These allow the network to learn global relations for each output object and achieve performance improvements by leveraging more context than previously used architectures. The same technology has been an enabler for the step change in the capabilities of large language models such as ChatGPT or LLaMA. Our focus is to make these approaches automotive-grade. By configuring them to run on embedded ECUs and cooperating closely with certification bodies such as TÜV SÜD, we ensure that our technology can meet the requirements for safe AI applications.
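For readers unfamiliar with attention, the snippet below sketches the scaled dot-product attention operation at the core of transformer architectures, in which every output can draw context from every input element. It is a generic textbook formulation, not the platform's implementation; the token count and feature dimension are arbitrary.

```python
import torch

def scaled_dot_product_attention(q, k, v):
    """Every query attends to every key, so each output aggregates
    context from the entire input sequence (global relations)."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # query-key similarity
    weights = torch.softmax(scores, dim=-1)                  # global attention weights
    return weights @ v                                       # context-aware outputs

# Example: 100 input tokens (e.g. image patches or sensor features), dimension 64.
q = k = v = torch.randn(1, 100, 64)
out = scaled_dot_product_attention(q, k, v)  # shape: (1, 100, 64)
```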

The detections from each individual sensor data stream are made available to the sensor-fusion algorithm. It leverages results from probability theory to continuously update a full three-dimensional representation of the environment and the objects around the autonomous vehicle. One of the key advantages of this approach is that it completely removes the need to apply error-prone inverse perspective mapping techniques to transform detections from a camera image into a bird's-eye view or 3D representation. In addition, it is consistent by construction and therefore less prone to creating ghost objects caused by miscalibrated sensors or ambiguities caused by occlusions.
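The sketch below illustrates the general idea of such a recursive probabilistic update, here as a simple log-odds Bayesian fusion of detection confidences from several sensor streams for one spatial hypothesis. The sensor model and numbers are assumptions for illustration, not the platform's actual fusion equations.

```python
import numpy as np

def bayes_update(prior, p_detect):
    """Recursive Bayesian update of an existence/occupancy probability.

    prior    -- current belief that a 3D hypothesis (object or cell) is occupied
    p_detect -- probability assigned to it by the latest sensor detection
    Working in log-odds keeps repeated updates numerically stable.
    """
    log_odds = np.log(prior / (1.0 - prior)) + np.log(p_detect / (1.0 - p_detect))
    return 1.0 / (1.0 + np.exp(-log_odds))

# Detections from a camera, a lidar and a radar stream refer to the same hypothesis:
belief = 0.5                                   # uninformed prior
for detection_confidence in (0.7, 0.8, 0.6):   # per-stream detection probabilities
    belief = bayes_update(belief, detection_confidence)
print(round(float(belief), 3))                 # belief grows as independent streams agree
```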

In addition, the sensor-fusion technology behind the Mapless Autonomy Platform is modular and extendable, and it can be combined with information from classical perception algorithms. While the first two properties ensure that it can be adapted to various customer vehicle platforms with minimal effort, the integration with classical perception algorithms makes it possible to implement additional types of detection streams, so that the perception stack can handle objects and data patterns it hasn't seen during training of the deep-learning-based detection pipelines. Instead of relying on learned pattern recognition, these classical streams apply explainable algorithms to ensure the overall system can deal appropriately with so-called "unknown unknowns" (very rare objects which haven't been encountered by test vehicles during testing and development). This combination of approaches makes it possible to meet the requirements for safety-critical applications.
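One way to picture this modularity is a common detection-stream interface into which both learned and classical algorithms can be plugged, as sketched below. The class and function names are hypothetical and the bodies are placeholders; the sketch only illustrates the architectural pattern, not driveblocks' code.

```python
from dataclasses import dataclass
from typing import List, Protocol

@dataclass
class Detection:
    position_xyz: tuple        # 3D position in the vehicle frame
    label: str                 # e.g. "vehicle", "lane_line", "free_space"
    confidence: float

class DetectionStream(Protocol):
    """Common interface: any detection source can be plugged into the fusion."""
    def detect(self, sensor_frame) -> List[Detection]: ...

class LearnedObjectStream:
    """Deep-learning-based stream, e.g. the transformer detection heads above."""
    def detect(self, sensor_frame) -> List[Detection]:
        return []  # placeholder: run neural-network inference here

class ClassicalFreespaceStream:
    """Explainable, rule-based stream, e.g. geometric free-space estimation
    from lidar returns, covering patterns the networks never saw in training."""
    def detect(self, sensor_frame) -> List[Detection]:
        return []  # placeholder: run a classical geometric algorithm here

def collect_detections(streams, sensor_frame) -> List[Detection]:
    """Gather detections from all streams; a real fusion would then associate
    and weight them probabilistically in the 3D environment model."""
    detections: List[Detection] = []
    for stream in streams:
        detections.extend(stream.detect(sensor_frame))
    return detections

dets = collect_detections([LearnedObjectStream(), ClassicalFreespaceStream()], sensor_frame=None)
```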

Conclusion:

In summary, the advantages of our approach are:

·       The decomposition of the deep-learning tasks into sensor-individual processing pipelines significantly decreases the input dimensions of the networks compared to approaches that build on a single neural network for the full perception task. This leads to significantly lower data requirements for training and validation, which is a key enabler for cost-efficient and certifiable autonomous driving applications.
·       The superior performance of transformer neural networks is leveraged in conjunction with deployment on automotive-grade ECUs and consideration of safety-critical software and AI regulations.
·       A geometrically interpretable sensor-fusion approach constructs a consistent environment model, including object detection and classification as well as lane structures and drivable space. The high degree of explainability of this approach is beneficial for certification and makes it possible to combine transformer neural networks with classical algorithms for detecting "unknown unknowns".
·       The sensor fusion handles various sensor positions, sensor modalities and postprocessing techniques in a modular and flexible way. Moving or replacing a sensor during the development phase of a new automated vehicle requires only partial re-training of the neural networks.
·       The multi-pipeline approach allows scaling according to performance requirements and leads to graceful degradation in case of an individual sensor failure. Since the other sensor pipelines still work as expected, the system can switch into a fail-safe operating mode with decreased performance, as illustrated in the sketch below. This is one of the major advantages over pure end-to-end deep-learning approaches, where the whole pipeline fails as soon as a single sensor malfunctions.
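As a closing illustration of the last point, the sketch below shows how per-pipeline health information could map to degraded operating modes. The mode names and thresholds are assumptions for illustration, not the platform's actual fail-safe logic.

```python
from enum import Enum

class OperatingMode(Enum):
    NOMINAL = "nominal"              # full performance
    DEGRADED = "degraded"            # e.g. reduced speed and feature set
    MINIMAL_RISK = "minimal_risk"    # e.g. controlled stop

def select_mode(pipeline_healthy: dict) -> OperatingMode:
    """Each sensor pipeline reports its health independently; losing one
    pipeline degrades the system instead of failing it outright."""
    healthy = sum(pipeline_healthy.values())
    total = len(pipeline_healthy)
    if healthy == total:
        return OperatingMode.NOMINAL
    if healthy >= total - 1 and healthy > 0:  # illustrative threshold, not a real safety rule
        return OperatingMode.DEGRADED
    return OperatingMode.MINIMAL_RISK

# Example: the front-camera pipeline fails while lidar and radar keep running.
mode = select_mode({"front_camera": False, "lidar": True, "radar": True})
print(mode)  # OperatingMode.DEGRADED
```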