Can deep learning and sensor fusion provide a more economically viable approach to vehicle autonomy? This was the question we asked our deep learning and autonomy teams while contemplating the hurdles to mass-market adoption of autonomous vehicles. It spurred a development project that resulted in a breakthrough AI system that is an economic game-changer.

As the project technical lead, I want to take you behind the scenes of the development of the EnfuseNet system, which fuses data from extremely low-cost sensors and cameras to generate high-resolution depth data – the ideal reference point for autonomous systems. Perception of the external world underpins everything in the long processing chain a vehicle must complete before it can act on a decision. Making that perception low-cost and reliable opens the way to a broader market.

Currently, reliable perception solutions come at extremely high price points. This creates a challenge for an automotive industry striving to find an attractive business model to advance its levels of autonomy beyond the luxury vehicle segment and into mid-range market segments. The lack of a financially viable way to create an accurate depth picture is one of the biggest factors holding back vehicle autonomy development in these automotive mass markets.

Meanwhile, another major challenge is the barrier to innovation caused by the reliance on capturing the massive amounts of real-world data needed for training autonomous systems. This makes it hard for new participants to enter the race to autonomy, instead favouring a few well-funded early leaders.

Rewriting the economics of vehicle autonomy

When it comes to creating an accurate depth picture, the resolution offered by radar is too low to rely on by itself, while traditional spinning LiDAR is expensive, and its moving parts raise concerns about wear and physical durability. Time-of-Flight cameras don’t work outdoors in strong sunlight, and they show a range of artefacts in their output – from depth aliasing, to phantom depth from reflections, to no depth at all from particularly absorbent surfaces. Meanwhile, efforts in academia to develop neural networks capable of ‘monocular depth’ estimation remain immature and unreliable. This led us to investigate the use of deep learning to intelligently fuse data from low-cost sensors and cameras to generate high-resolution depth data.

Deep learning has enabled advances in many areas significant to autonomous vehicles, but it also comes with well-established challenges. Some of these, such as the need to collect, label, and curate large datasets for training, can be reduced with techniques like generative adversarial networks (GANs) and active learning. The challenge that interests us in this context, though, is traceability.

The industry must develop novel approaches to tackle the challenge of traceability in autonomous vehicle decision-making, because these decisions are safety-critical. The ‘black box’ nature of some neural networks has historically made it difficult to understand or explain how decisions are made. Without innovative approaches to address this, manufacturers and technology suppliers are unable to diagnose or fix an issue without completely replacing the network. This concern is sometimes considered an obstacle to wide-scale acceptance of neural networks in disciplined automotive product lines.

Another part of traceability is understanding and recognising when errors arise, and acting appropriately. We therefore must ensure networks don’t make overconfident predictions. Historically, this has happened in situations they weren’t trained for and where they lacked appropriate experience. As you can imagine, a missed detection of an obstacle in the road, or incorrect depth information, could lead to a vehicle taking inappropriate action.

Our team of deep learning experts developed EnfuseNet to overcome these challenges. It’s a novel neural network that conducts low-level sensor fusion on the outputs from low-cost depth sensors and high-resolution RGB cameras, with the aim of outperforming high-end LiDAR at a drastically lower price point. This is part of a strategy of workflow modularisation that breaks networks down into smaller components, making individual elements easier to build and validate – and subsequently easier to update and replace if necessary.

Our team was committed to ensuring the network architecture was agnostic to the input depth source, so that it could fuse depth data from different sensors. Weighing the cost-benefit trade-offs for this market sector, we concluded the model would be best paired with a typical, cost-effective synthetic-aperture array radar. The network is intended to improve the resolution of that input and correct for obvious sensor error modes such as phantom depth from reflections.
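
To make this concrete, here is a minimal sketch, in PyTorch, of the kind of low-level fusion architecture described above. The layer sizes, names and the two-channel output head are illustrative assumptions for this article, not the production EnfuseNet design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DepthFusionNet(nn.Module):
        """Illustrative RGB + coarse-depth fusion network (assumed design).

        A coarse depth map (e.g. from a low-cost radar) is upsampled to
        the image resolution and concatenated as a fourth input channel,
        which keeps the architecture agnostic to the depth source.
        Image height and width are assumed divisible by four.
        """
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            )
            # Two output channels: a depth estimate and a log-variance
            # expressing the network's confidence (used in the loss below).
            self.head = nn.Conv2d(32, 2, 3, padding=1)

        def forward(self, rgb, coarse_depth):
            # Upsample the low-resolution depth to match the RGB image,
            # then fuse the two at the input ("low-level" fusion).
            coarse = F.interpolate(coarse_depth, size=rgb.shape[-2:],
                                   mode="bilinear", align_corners=False)
            out = self.head(self.decoder(self.encoder(
                torch.cat([rgb, coarse], dim=1))))
            return out[:, :1], out[:, 1:]  # (depth, log-variance)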

Our approach to the overconfidence issue was to train the network to understand its limitations with a special loss function. In addition to predicting a depth, the network predicts a measure of confidence in that depth prediction. We effectively allow the network to declare when it doesn’t have enough information to predict the depth of an object accurately, letting downstream processes make suitably informed decisions. For example, when the network predicts the depth of a pedestrian without any direct input depth, it flags this with higher uncertainty so appropriate caution can be taken by a secondary system running in parallel.
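
To illustrate how a loss function can teach a network to declare its own uncertainty, here is one well-known formulation, the heteroscedastic regression loss of Kendall and Gal (2017), shown as an assumed stand-in rather than the exact loss we used:

    import torch

    def confidence_aware_loss(pred_depth, pred_log_var, true_depth):
        """Per-pixel Gaussian negative log-likelihood (up to a constant).

        Where the network predicts a high variance, its squared error is
        down-weighted; the log-variance term penalises claiming ignorance
        everywhere, which keeps the confidence estimates honest.
        """
        residual_sq = (pred_depth - true_depth) ** 2
        return torch.mean(0.5 * torch.exp(-pred_log_var) * residual_sq
                          + 0.5 * pred_log_var)

Downstream systems can then threshold the predicted variance to flag low-confidence regions, such as the pedestrian example above, where a secondary system should take over.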

A virtual learning environment

It’s becoming clear that virtual learning environments are essential to reduce cost and speed up the development of robust, safety-critical mobility technologies. This understanding was key to the development of EnfuseNet, which was created within our own modular and flexible AV simulation environment. This was put in place to help our clients evaluate and validate advanced sensors and algorithms, including next-generation radar and LiDAR.

Training in simulated environments enables faster and cheaper development of the model. It also means that the complexity of sensor fusion calibration, real-world interference, noise and other effects can be dealt with at the early stages of model development. Generating training data using our AV simulator allowed us to create the type of data we needed, in the quantities we wanted, at relatively low cost. We estimate that generating the training data in our AV simulator saved over 80% of the cost of collecting equivalent data in a real-world environment.
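
As a hypothetical illustration of dealing with such effects at the data-generation stage, ideal simulator depth can be corrupted before it reaches the network; the noise model and magnitudes below are invented placeholders, not our calibrated sensor models:

    import torch

    def corrupt_sim_depth(depth, noise_std=0.05, dropout_p=0.02):
        """Add range-dependent noise and random missing returns to ideal
        simulated depth (placeholder parameters, for illustration only)."""
        noisy = depth + torch.randn_like(depth) * noise_std * depth
        mask = (torch.rand_like(depth) > dropout_p).float()
        return noisy * mask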

This also allowed us to leverage multiple-objective training, whereby we simultaneously trained the network to segment the image by object type, since object labels come more cheaply and accurately from synthetic data than from human image labelling. Multiple-objective training is known to improve learning on the individual sub-tasks, as it helps the network learn higher-order concepts and the inter-relations between them and the data.
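
A minimal sketch of such a multiple-objective loss, reusing the confidence-aware depth loss above, might look as follows; the auxiliary segmentation head and the loss weighting are assumptions for illustration:

    import torch.nn.functional as F

    def multi_task_loss(pred_depth, pred_log_var, pred_seg_logits,
                        true_depth, true_seg_labels, seg_weight=0.5):
        """Combine depth regression with semantic segmentation.

        pred_seg_logits: (N, num_classes, H, W) from an assumed auxiliary head.
        true_seg_labels: (N, H, W) integer class labels from the simulator.
        """
        depth_term = confidence_aware_loss(pred_depth, pred_log_var, true_depth)
        seg_term = F.cross_entropy(pred_seg_logits, true_seg_labels)
        return depth_term + seg_weight * seg_term  # seg_weight is a placeholder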

We have progressed to introducing the network to real-world data, which has delivered impressive initial results. To rapidly accelerate network development, we trained on synthetic data as a starting point so that training on the real-world data could be viewed as a finishing school to refine the network. Starting with real-world data instead, with its sensor set-up overheads, lack of labelling, noise and incomplete ground truth, might well have hindered training the network from scratch.
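
In outline, this ‘finishing school’ stage might look like the following sketch, which loads synthetically pre-trained weights and fine-tunes at a reduced learning rate; the file name, data loader and hyperparameters are placeholders:

    import torch

    model = DepthFusionNet()  # the illustrative network sketched earlier
    model.load_state_dict(torch.load("synthetic_pretrained.pt"))  # placeholder path

    # Fine-tune at a learning rate well below the pre-training rate, so the
    # real-world data refines rather than overwrites what was already learned.
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-5)
    for rgb, coarse_depth, true_depth in real_world_loader:  # assumed DataLoader
        pred_depth, pred_log_var = model(rgb, coarse_depth)
        loss = confidence_aware_loss(pred_depth, pred_log_var, true_depth)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()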

Now your turn…

Please visit www.enfusenet.com to explore the system and experience the superior depth outputs. They are set to help vehicle OEMs and mobility technology providers save thousands of dollars per unit, creating high-resolution depth systems that bring the potential of vehicle autonomy to the mass market.

We have decades of experience in helping our clients achieve the most complex breakthrough innovations at rapid speed – channelling deep expertise in sensing, wireless connectivity, AI and edge computing. We would be delighted to discuss how our expertise can help you unlock the huge potential of autonomy in your business. Meanwhile, you can discover more about our work in mobility technologies here.

Author
Douglas O’Rourke
Principal Scientist

Douglas is a Principal Scientist in the Algorithms and Analytics group. He has a PhD in Astrophysics (computational modelling of galaxy evolution and the spectra of dusty galaxies) and an MSci in Physics, both from the University of Cambridge. He has worked on projects across a wide range of sectors including agritech, industrial, automotive, sports and fitness, medical devices, and security.
