Publication: Playing Minecraft with Behavioural Cloning
Authors: A. Kanervisto, J. Karttunen, V. Hautamäki
Last year, I got an incredible opportunity to join a team at the University of Eastern Finland and compete in the MineRL competition. The competition, sponsored by Microsoft, aimed to push the boundaries of state-of-the-art reinforcement learning and sample-efficient learning.
The competition setup was the following: create a deep learning agent that learns to mine a diamond in the very popular video game Minecraft. Contestants were provided a dataset of human gameplay, which could be used for training. Training in the game itself was limited to 8 million steps, which is very little compared to typical reinforcement learning algorithms, which may require hundreds of millions of training samples.
While the “mine a diamond” task may sound simple, it is anything but. It requires multiple sub-steps to complete:
- starting from punching a tree to get a wooden log
- crafting planks from the log to make tools
- finding more materials to create better tools
and so on, until the agent is finally able to mine the diamond. Finding one is a challenge in itself. For an experienced human player, this whole process takes roughly 5–15 minutes.
To learn, the deep learning agent gets only images from the game, just like a regular human player. It can take different actions in the game, such as moving forward, turning the camera left, or crafting a wooden pickaxe (only if the required materials are in its inventory). The agent receives rewards based on the progress it makes towards completing the task. It has no prior knowledge of the game, so it first needs to learn how to interpret the image pixel values, recognize different objects like trees, understand what each action does in the environment, and so on. It is like a baby asked to complete a long, sequential task in an open world before it has even learned to walk and see.
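To make this observation-action-reward loop concrete, here is a minimal sketch of the interaction with the MineRL Gym environment used in the 2019 competition. The environment name comes from that competition, but the details of the observation and action dictionaries are illustrative assumptions, not an exact specification, and the random agent is just a stand-in for a trained policy.

```python
import gym
import minerl  # registers the MineRL environments with gym

# Environment id from the 2019 MineRL competition; the reward is given
# for milestones on the way to the diamond (log, planks, tools, ...).
env = gym.make("MineRLObtainDiamond-v0")

obs = env.reset()
done = False
total_reward = 0.0

while not done:
    # obs["pov"] is the first-person image the agent sees (H x W x 3).
    image = obs["pov"]

    # A real agent would map the image to an action with a neural network.
    # Here we simply sample a random action from the dict-based action space.
    action = env.action_space.sample()

    obs, reward, done, info = env.step(action)
    total_reward += reward

print("Episode reward:", total_reward)
```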
Our team eventually finished in 5th place in the second round of the competition, and we published a research paper on our results, which was kindly accepted to the NeurIPS 2019 Competition & Demonstration Track post-proceedings. You can read more about our findings here.
I want to say a huge thank you to my team for all the effort they put into the competition. It was a great learning experience!
At Karelics, our main focus is on robotics, so what does all of this have to do with robots? Essentially, the setup in a video game is not that different from real-world robotics. The agent takes actions in an environment, which returns images as states, along with rewards. We could similarly define a task such as “find a hammer” for a robot, which it could learn to complete using deep reinforcement learning.
But here, sample efficiency is the key. We cannot train RL algorithms by letting a robot spend hundreds of hours in the real world trying out what happens when different actions are taken. In many situations, a single wrong action can break the robot, and there is no way we could let a robot explore freely in an environment where people are present. That is why learning from expert human demonstrations, combined with sample-efficient self-exploration, is so important, and that is exactly what the whole competition was about.
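The simplest way to learn from such human demonstrations is behavioural cloning, the approach named in our paper's title: treat the recorded image-action pairs as a supervised dataset and train a network to predict the human's action from the image. The sketch below shows the idea with PyTorch; the network architecture, image size, number of discrete actions, and the dummy data are illustrative assumptions, not the exact setup from the paper.

```python
import torch
import torch.nn as nn

# Minimal behavioural-cloning sketch: imitate recorded human actions from images.
class PolicyNet(nn.Module):
    def __init__(self, n_actions: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.head(self.conv(x))

def bc_update(policy, optimizer, images, human_actions):
    """One supervised step: predict the human's action from the game image."""
    logits = policy(images)                      # (batch, n_actions)
    loss = nn.functional.cross_entropy(logits, human_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with dummy tensors standing in for the human demonstration dataset.
policy = PolicyNet(n_actions=10)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
images = torch.rand(32, 3, 64, 64)               # batch of 64x64 RGB frames
human_actions = torch.randint(0, 10, (32,))      # recorded human action ids
print(bc_update(policy, optimizer, images, human_actions))
```

After this supervised pre-training, the policy can be refined with sample-efficient reinforcement learning in the environment itself, rather than starting the exploration from scratch.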
Video of the overall best AI agent in the competition, from another team