The pathway from automation to autonomy
It’s a common fallacy that the task of perfecting navigation for robots and autonomous vehicles is relatively straightforward. Following the great strides made in industrial automation over recent years, many assume these machines are quickly becoming adept at collision avoidance and other vital manoeuvres. But here’s the thing: there is an ocean of difference between operating on a production line and on a busy street. Navigation is a tough problem to crack – which is why we’re so enthused by our development of an algorithm that has the potential to unlock new commercial applications for a wide range of industry sectors, from agriculture and construction to transport and defence.
A multidisciplinary team here at CC has applied one of the newer approaches in collision avoidance – Reinforcement Learning (RL) – to create a navigation algorithm that imitates human behaviour, anticipates the movements of multiple obstacles and weighs up the pros and cons of possible actions. Not only that, but we’ve taken the technology out of the computer, tested it on physical robotic hardware, and proven its effectiveness out in the real world using real sensors and off-the-shelf components.
There are two primary benefits of our technology that we should make clear right away. The first is speed of operation: our system navigates a crowd of people without having to halt constantly. The second is acceptance: because it mimics human movement, a robot will be more readily accepted by those around it. The use of simulated training environments to accelerate development and reduce cost is also a key feature of the project.
It’s not a localisation, tracking or perception algorithm – simultaneous localisation and mapping (SLAM) is a different matter entirely. But when our navigation model is given a map with the locations and velocities of people (as well as its own location and velocity), it will decide on the best action to get to its goal. Such a capability will be a crucial piece in the automation-to-autonomy jigsaw for many businesses. As our recent whitepaper highlights, sectors such as smart infrastructure, agritech, and retail and logistics have much to gain from safe robotic navigation in busy environments.
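To make that interface concrete, here is a minimal Python sketch of the kind of input and output we’re describing. All the names, and the placeholder head-for-the-goal policy, are illustrative stand-ins rather than the actual production API:

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec2 = Tuple[float, float]  # (x, y) position in metres, or velocity in m/s

@dataclass
class Observation:
    """Illustrative policy input: the robot's own state, its goal,
    and the observed state of nearby people."""
    robot_pos: Vec2
    robot_vel: Vec2
    goal_pos: Vec2
    people: List[Tuple[Vec2, Vec2]]  # (position, velocity) for each person

def choose_action(obs: Observation) -> Vec2:
    """Placeholder policy: head straight for the goal at unit speed.
    A trained RL policy would instead weigh up the people's positions
    and velocities before committing to a velocity command."""
    dx = obs.goal_pos[0] - obs.robot_pos[0]
    dy = obs.goal_pos[1] - obs.robot_pos[1]
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0  # avoid division by zero at goal
    return (dx / norm, dy / norm)
```

The essential point is the shape of the data: the model consumes a snapshot of where everyone is and how they are moving, and emits a motion decision.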
Avoiding collisions with humans
In this article, we want to explore both the development of the algorithm and our key challenge – how to avoid collisions with human beings, who are not noted for being entirely predictable. The problem has been around a long time in robotics. In its simplest case, collision avoidance is about avoiding static structures like walls and doors. Robots have been doing that successfully for many years – and in the main the systems work just fine.
The real difficulties arise when a robot needs to enter an interactive, dynamic situation where it is trying to avoid moving people who, in turn, are trying to avoid it. A further challenge is the need to deal with other common behaviours, such as merging into a crowd or cutting across a group of people. And you know that feeling when you are walking towards someone and both of you try to anticipate each other’s change of direction? This is the sort of complexity and potential confusion we needed to address.
At the heart of the matter is the simple fact that when circumstances are changing dynamically you need some idea of what the future might look like. To confront this issue, we had to look beyond traditional approaches such as cost maps. What are cost maps? Essentially, you build a map and assign a value to the cells within it: how expensive is it to enter a particular part of the grid? That value acts as a penalty cost. What this approach lacks, though, is that crucial predictive element. Yes, the robot can plan a route and follow it through the cost map. But because the system can’t look into the future or take into account people’s velocities, it is severely limited.
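A cost map really is as simple as it sounds; it can be sketched in a few lines of Python (the grid and penalty values here are illustrative):

```python
# Each cell holds the penalty for entering it: 0 for free space,
# higher for undesirable areas, infinite for hard obstacles like walls.
WALL = float("inf")

cost_map = [
    [0.0, 0.0, 2.0, 0.0],
    [0.0, WALL, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

def entry_cost(row: int, col: int) -> float:
    """Penalty for stepping into cell (row, col). The snapshot is purely
    static: nothing here says where a person will be on the next tick."""
    return cost_map[row][col]
```

Notice that the grid is a frozen snapshot – there is nowhere in this structure to record that the occupant of a cell is walking towards the next one.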
There are many classical algorithms which work extremely well at planning a path through a static map in this way, including very fast algorithms that will find and optimise the shortest path through. But they fall down in dynamic scenes. The key problem is that the robot doesn’t know it needs to avoid an obstacle until the last moment, when something has moved into its path. There is a further problem: if the scene becomes too dense, with lots of people, the robot can’t plan a path at all and it just sits still – the classic ‘freezing robot’ problem. In our dynamic human world, we make space by gently moving forward to earn our right of way, or at least to encourage others to move out of our way.
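As an illustration of the static case, here is a classical breadth-first search planner in Python – a simplified stand-in for the fast shortest-path algorithms mentioned above. It works perfectly on a frozen map, and returns nothing at all when no free route exists:

```python
from collections import deque

def shortest_path(grid, start, goal):
    """Breadth-first search over a static occupancy grid (0 = free,
    1 = blocked). Fast and optimal in step count while the scene is
    frozen - but blind to where obstacles will be on the next tick."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:                      # reconstruct the route
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None  # no free route: the robot just sits still
```

That final `return None` is precisely the freezing behaviour described above – in a dense crowd the planner finds no path and the robot waits.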
Neural network training
So how do you enable robot navigation that embodies such behaviour and is effectively able to plan a route when one doesn’t yet exist? Our task was to mimic natural human behaviour – and this is where AI and RL came into our thinking. RL is a way of training a neural network to make decisions using lots of simulated episodes of the kind of environment you want it to work in. It’s impossible to know the future, of course, but over time the model learns a sense of what is likely to happen next and, therefore, what the best action to take might be. It is then able to keep moving towards its target while avoiding people.
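To show the shape of the RL training loop, here is a deliberately tiny, self-contained Python example: tabular Q-learning on a one-dimensional corridor. A table of values stands in for the neural network, and the corridor stands in for the navigation environment – the real system is far richer, but the learn-from-episodes mechanic is the same:

```python
import random

# Toy 1-D corridor: states 0..4, goal at state 4.
# Actions: 0 = step left, 1 = step right.
# A tabular Q-table stands in for the neural network a real system trains.
N_STATES, GOAL = 5, 4

def step(state, action):
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else -0.1  # small penalty for every move
    return nxt, reward, nxt == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            if rng.random() < eps:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] >= q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Standard Q-learning update towards the observed outcome.
            q[state][action] += alpha * (
                reward + gamma * max(q[nxt]) - q[state][action])
            state = nxt
    return q
```

After training, the table prefers ‘step right’ in every state – the model has absorbed, from episodes alone, a sense of which action leads towards the goal.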
The training itself is very much along carrot-and-stick lines. The model gets rewarded for good behaviour, which is rare because it means reaching its ultimate goal. More often than not it gets punished for making too many motions, because effectively this means it is taking too long over the task in hand. A big punishment is inflicted if it hits something, of course.
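In code, that carrot-and-stick scheme boils down to a reward function along these lines (the specific values are illustrative, not the ones we actually use):

```python
def reward(reached_goal: bool, collided: bool) -> float:
    """Illustrative reward shaping: a rare big carrot for reaching the
    goal, a small per-step penalty to discourage taking too long, and
    a large punishment for hitting anything."""
    if collided:
        return -1.0     # big punishment: the robot hit something
    if reached_goal:
        return 1.0      # rare reward: ultimate goal achieved
    return -0.01        # mild, constant nudge to get on with it
```

The balance between these numbers is what tunes behaviour: too harsh a step penalty and the robot barges through; too harsh a collision penalty and it freezes.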
Crucially in terms of training speed and efficiency, the system generates its own episodes – each one being an unfolding scenario of particular movements. One person goes here, one person goes there, one person stops entirely, and so on. The scenario runs until the robot crashes or meets its goal. Then the system resets with another random scenario.
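Generating a fresh random scenario for each episode might look something like this sketch (the arena size, speeds and the chance of someone standing still are all illustrative parameters):

```python
import random

def random_episode(n_people, arena=10.0, max_speed=1.5, seed=None):
    """Spawn a fresh randomised scene for one training episode: each
    person gets a random position in the arena and a random velocity,
    and some stand entirely still."""
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        pos = (rng.uniform(0.0, arena), rng.uniform(0.0, arena))
        if rng.random() < 0.2:  # this person stops entirely
            vel = (0.0, 0.0)
        else:
            vel = (rng.uniform(-max_speed, max_speed),
                   rng.uniform(-max_speed, max_speed))
        people.append((pos, vel))
    return people
```

Each call yields a new scene; run it until the robot crashes or reaches its goal, reset, and repeat – no hand-labelled data required.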
This doesn’t require a pre-collected and labelled data set; we can continuously run simulations which the training model generates for itself. Consequently, human input into the training is low – resource is needed to set up the situation with people walking around in a realistic manner, but once that’s done scenes can be randomised. From then on all that’s needed is plenty of server power, grinding away to process the results.
There are plenty of tools available to help create a huge number of variations on a scenario to challenge the agent with. The choice depends on the level of fidelity needed – and for this project we were able to use a simple 2D version because we were chiefly concerned with a dynamic simulation of people moving around. If it were necessary to bring more complex information, such as camera data, into the equation, then a more advanced simulation, such as a gaming engine, would be required. That said, we have tested our algorithm in a Unity simulation, which we are also using to demo the technology. But for training, this level of sophistication is not necessary – it would demand a lot more data and computing resource for no particular advantage. The main point to highlight here is that the system was trained in a simple 2D simulation – but works in the real world!
Naturally adaptive approach
Our approach successfully delivered a level of adaptability far in excess of what cost map planning is capable of. Traditionally you plan your path, follow it, and if something gets in your way you replan – which is not at all efficient. With our approach, the algorithm constantly re-evaluates its situation, making a fresh decision based on the ever-changing circumstances. This is a much more naturally adaptive approach.
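The contrast can be sketched as a simple control loop: instead of committing to a precomputed path, the policy is consulted afresh at every step. The toy one-dimensional environment below is purely illustrative:

```python
class ToyEnv:
    """Illustrative 1-D environment: start at 0, the goal is position 5."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += action
        return self.pos, self.pos >= 5  # (new observation, done?)

def navigate(policy, env, max_steps=50):
    """Re-evaluate at every step: ask the policy for a fresh decision
    from the latest observation rather than following a fixed plan."""
    obs = env.reset()
    for _ in range(max_steps):
        obs, done = env.step(policy(obs))
        if done:
            return True
    return False
```

Because the decision is remade each tick from the current observation, a change in the scene is absorbed on the very next step – no explicit replanning stage is needed.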
Another key advantage is the scalability of the learning process. As training develops, new and different behaviours can be introduced incrementally. Cost mapping works the way it works; it is always going to do the same thing. RL allows for much richer learning: the system is taught to work with people, then elderly people moving at different speeds, then bicycles, and so on. Our complex and powerful algorithm has taken us well beyond traditional pathfinding and collision avoidance territory.
As we said at the beginning, this is exciting stuff with the potential to help enable the commercialisation of future products. The simulated training process means that products can be brought to market quicker – saving time and costs for businesses. And let’s face it, machines are the future… not just robots per se but any type of automated machine operating in a dynamic, interactive environment. So please do drop us a line if you have any questions or would like to discuss the topic in more detail. It’ll be great to continue the conversation.