There are two primary benefits of our technology that we should make clear right away. The first is speed of operation: our system navigates a crowd of people without having to halt constantly. The second is acceptance: because it mimics human movement, a robot will be more readily accepted by those around it. The use of simulated training environments to accelerate development and reduce cost is also a key feature of the project.
Collision avoidance for autonomous vehicles
In this article, we want to explore both the development of the algorithm and our key challenge – how to avoid collisions with human beings, who are not noted for being entirely predictable. The problem has been around for a long time in robotics. In its simplest form, collision avoidance is about avoiding static structures like walls and doors. Robots have been doing that successfully for many years – and in the main the systems work just fine.
The real difficulties arise when robots enter an interactive, dynamic situation, trying to avoid moving people who, in turn, are trying to avoid them. A further challenge is the need to deal with other common behaviours, such as merging into a crowd or cutting across a group of people. And you know that feeling when you are walking towards someone and both of you try to anticipate each other’s change of direction? This is the sort of complexity and potential confusion we needed to address.
At the heart of the matter is the simple fact that when circumstances are changing dynamically you need some idea of what the future might look like. To confront this issue, we had to look beyond traditional approaches such as cost maps. What are cost maps? Essentially, you build a map and assign a value to the cells within it. The question then becomes how expensive it is to enter a particular part of the grid, with the value acting as a penalty cost. What this approach lacks, though, is that crucial predictive element. Yes, the robot can plan a route and follow it through the cost map. But because the system can’t look into the future or take people’s velocities into account, it is severely limited.
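To make the idea concrete, here is a minimal cost-map sketch in Python. The grid size, penalty values and NumPy representation are our own illustrative assumptions, not the project’s implementation:

```python
import numpy as np

# A minimal cost-map sketch: each cell holds a penalty for entering it.
# Free space is cheap, cells near an obstacle are expensive, and the
# obstacle itself is effectively impassable.
FREE, NEAR_OBSTACLE, OBSTACLE = 1, 50, 10_000

grid = np.full((5, 5), FREE)
grid[2, 2] = OBSTACLE                      # a static obstacle (e.g. a pillar)
for dr in (-1, 0, 1):                      # inflate a penalty zone around it
    for dc in (-1, 0, 1):
        r, c = 2 + dr, 2 + dc
        if (r, c) != (2, 2):
            grid[r, c] = NEAR_OBSTACLE

# Entering a cell next to the obstacle costs far more than open space, so a
# planner will route around it -- but nothing here models anyone's velocity.
print(grid[0, 0], grid[2, 1], grid[2, 2])  # 1 50 10000
```

Note what is missing: the map encodes only where things *are*, never where they are *going* – which is exactly the predictive gap described above.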
There are many classical algorithms that work extremely well at planning a path through a static map in this way, including very fast algorithms that will find and optimise the shortest path through it. But they fall down in dynamic scenes. The key problem is that the robot doesn’t know it needs to avoid an obstacle until the last moment, when something has moved into its path. There is a further problem: if the scene becomes too dense with people, the robot can’t plan a path at all and simply sits still – the classic robot halting problem. In our dynamic human world, we make space by gently moving forward to earn our right of way, or at least to encourage others to move out of our way.
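As an example of such a classical planner, the sketch below runs a Dijkstra-style shortest-path search over a static grid. This is a hedged illustration of the family of algorithms the text describes; the article does not name a specific one:

```python
import heapq

# Dijkstra-style search over a static grid: 0 = free cell, 1 = wall.
# Returns the number of steps on the shortest path, or None if no path exists
# -- the "robot just sits still" case when the scene is too dense.
def shortest_path_cost(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return d
        if d > dist.get((r, c), float("inf")):
            continue  # stale heap entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None

grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(shortest_path_cost(grid, (0, 0), (2, 0)))  # 6
```

Fast and optimal on a frozen snapshot of the world – but the moment a “wall” is a person who moves, the snapshot is stale and the plan with it.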
Enabling robot navigation
So how do you enable robot navigation that embodies such behaviour and is effectively able to plan a route when one doesn’t yet exist? Our task was to mimic natural human behaviour – and this is where AI and reinforcement learning (RL) came into our thinking. RL is a way of training a neural network to make decisions by running lots of simulated episodes of the kind of environment you want it to work in. It’s impossible to know the future, of course, but over time the model learns a sense of what is likely to happen next and, therefore, what the best action might be. It is then able to keep moving towards its target while avoiding people.
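Conceptually, the trained model is just a function applied at every timestep: observation in, action out. The sketch below is a hypothetical stand-in – a random linear layer plays the role of the trained network, and the observation and action sizes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: the observation might encode the robot's state plus the
# positions and velocities of nearby people; the action is a 2-D velocity
# command. Neither dimension comes from the article.
OBS_DIM, ACT_DIM = 8, 2
weights = rng.normal(size=(OBS_DIM, ACT_DIM))  # stand-in for a trained network

def policy(observation):
    """Map an observation vector to a velocity command bounded in [-1, 1]."""
    return np.tanh(observation @ weights)

obs = rng.normal(size=OBS_DIM)  # stand-in for one timestep's sensed state
action = policy(obs)
print(action.shape)  # (2,)
```

The important point is the shape of the loop, not the network: a fresh action is produced from a fresh observation every tick.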
The training itself is very much along carrot-and-stick lines. The model gets rewarded for good behaviour, which is rare because good behaviour here means reaching its ultimate goal. More often than not it gets punished for making too many motions, because effectively this means it is taking too long at the task in hand. A big punishment is inflicted if it hits something, of course.
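The carrot-and-stick scheme can be sketched as a reward function like the one below; the magnitudes are illustrative assumptions, not the values used in the project:

```python
# A sketch of the reward scheme described above. The numbers are invented:
# a big penalty for a collision, a big (and rare) reward for reaching the
# goal, and a small per-step penalty so dawdling is discouraged.
def reward(reached_goal, collided):
    if collided:
        return -100.0   # big punishment for hitting something
    if reached_goal:
        return 100.0    # the rare reward: the ultimate goal was reached
    return -0.1         # small penalty for every motion taken

print(reward(False, False))  # -0.1
print(reward(False, True))   # -100.0
print(reward(True, False))   # 100.0
```

Summed over an episode, the per-step penalty is what pushes the model towards short, decisive routes rather than timid shuffling.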
Crucially in terms of training speed and efficiency, the system generates its own episodes – each one being an unfolding scenario of particular movements. One person goes here, one person goes there, one person stops entirely, and so on. The scenario runs until the robot crashes or meets its goal. Then the system resets with another random scenario.
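A self-generated episode loop along these lines might look as follows. This is a sketch with invented numbers, and the termination check is a placeholder for the real crash-or-goal test:

```python
import random

random.seed(0)

def random_scenario(n_people=5):
    """Each person gets a random start position; some stand entirely still."""
    return [
        {
            "pos": [random.uniform(0, 10), random.uniform(0, 10)],
            "vel": [0.0, 0.0] if random.random() < 0.2  # one person stops
                   else [random.uniform(-1, 1), random.uniform(-1, 1)],
        }
        for _ in range(n_people)
    ]

def run_episode(max_steps=200):
    """Run one scenario until it terminates, then report its length."""
    people = random_scenario()
    for step in range(max_steps):
        for p in people:                 # advance everyone one tick
            p["pos"][0] += p["vel"][0] * 0.1
            p["pos"][1] += p["vel"][1] * 0.1
        # ...the robot would act here; placeholder for crash-or-goal check:
        if random.random() < 0.02:
            return step + 1
    return max_steps

# Run to termination, reset with a fresh random scenario, and go again.
lengths = [run_episode() for _ in range(3)]
print(all(1 <= n <= 200 for n in lengths))  # True
```

Because every episode is generated on the fly, the loop can run unattended for as long as there is server power to feed it.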
This doesn’t require a pre-collected and labelled data set; we can continuously run simulations that the training system generates for itself. Consequently, human input into the training is low – some resource is needed to set up the situation with people walking around in a realistic manner, but once that’s done the scenes can be randomised. From then on all that’s needed is plenty of server power, grinding away to process the results.
There are plenty of tools available to help create a huge number of variations of a scenario to challenge the agent with. The choice depends on the level of fidelity needed – and for this project we were able to use a simple 2D version because we were chiefly concerned with a dynamic simulation of people moving around. If it were necessary to feed more complex information, such as camera data, into the equation, then a more advanced simulation such as a gaming engine would be required. That said, we have tested our algorithm in a Unity simulation, which we are also using to demo the technology. But for training, this level of sophistication is not necessary – it would demand a lot more data and computing resource for no particular advantage. The main point to highlight here is that the system was trained in a simple 2D simulation – but works in the real world!
Naturally adaptive approach
Our approach successfully delivered a level of adaptability far in excess of what cost-map planning is capable of. Traditionally you plan your path, follow it, and if something gets in your way you replan – which is not at all efficient. With our approach, the algorithm is constantly re-evaluating its situation and its next decision against the ever-changing circumstances. This is a much more naturally adaptive approach.
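To illustrate the difference, the sketch below shows a controller that makes a fresh decision at every timestep. The one-dimensional toy world is our own stand-in for the real simulator, not part of the project:

```python
# A toy world: the robot sits on a line and must reach position `goal`.
class ToyWorld:
    def __init__(self, goal=5):
        self.pos, self.goal = 0, goal

    def at_goal(self):
        return self.pos >= self.goal

    def observe(self):
        return self.goal - self.pos    # distance still to cover

    def step(self, move):
        self.pos += move

def reactive_controller(world, policy):
    """Re-evaluate the situation and pick a fresh action every single tick,
    rather than committing to a pre-computed path and replanning on failure."""
    steps = 0
    while not world.at_goal():
        world.step(policy(world.observe()))
        steps += 1
    return steps

# A trivial policy that always advances one unit reaches the goal in 5 steps.
print(reactive_controller(ToyWorld(), lambda dist: 1))  # 5
```

In the real system the policy is the trained network and the world is a crowd of people, but the control loop has this same shape: no stored plan, just a decision per tick.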