MuJoCo can be described as a great physics simulator with decent rendering. Game development tools such as Unity and Unreal Engine can be described as great rendering engines with decent physics. Beyond physics simulation and rendering, one has to implement control functionality that generates behaviors. MuJoCo users currently do this on their own, often using a Python wrapper for MuJoCo Pro, while game developers usually do it within the framework of their game engine.
The workflow we envision here is closer to the MuJoCo use case: the executable generated by Unity receives MuJoCo model poses over a socket and renders them, while the actual physics simulation and behavior control take place in the user's environment running MuJoCo Pro.
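To make the socket-based workflow concrete, here is a minimal sketch of streaming poses from a simulation process to a renderer. The wire layout (a 4-byte count followed by seven little-endian floats per pose: position x, y, z and quaternion w, x, y, z) and the helper names `send_poses`, `recv_exact` and `recv_poses` are assumptions made for illustration; they are not the plugin's actual protocol.

```python
import socket
import struct

# Hypothetical wire layout: x, y, z, qw, qx, qy, qz as little-endian float32.
POSE_FORMAT = "<7f"
POSE_SIZE = struct.calcsize(POSE_FORMAT)


def send_poses(sock, poses):
    """Send a pose count followed by the packed poses."""
    payload = struct.pack("<I", len(poses))
    for pose in poses:
        payload += struct.pack(POSE_FORMAT, *pose)
    sock.sendall(payload)


def recv_exact(sock, n):
    """Read exactly n bytes, since recv may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed before message was complete")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_poses(sock):
    """Receive one pose message and return a list of 7-tuples."""
    (count,) = struct.unpack("<I", recv_exact(sock, 4))
    return [struct.unpack(POSE_FORMAT, recv_exact(sock, POSE_SIZE))
            for _ in range(count)]


# A connected socket pair stands in for the simulator and the renderer.
sim_side, render_side = socket.socketpair()
poses = [
    (0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0),
    (0.5, -0.25, 0.75, 1.0, 0.0, 0.0, 0.0),
]
send_poses(sim_side, poses)
received = recv_poses(render_side)
sim_side.close()
render_side.close()
```

In a real setup the two ends would live in separate processes, with the renderer applying each received pose to the corresponding object transform.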
This is why we use the terms "game engine" and "rendering engine" interchangeably: we are aware that modern game engines do more than rendering, but here we only use their rendering capabilities as well as their sophisticated visual editing tools.
Thus the present version of the software is not suitable for more traditional game development; however, we are considering future extensions which will make it suitable (see Gaming). Below are some of the images produced by the interactive demo included in the release. To create this demo we started with a static scene from the Unity Asset store, imported a MuJoCo model with the Import script, adjusted the appearance of the MuJoCo model elements in the Unity Editor, added Unity-specific elements that only affect visualization e.
The software has the following directory structure on Windows. The MacOS and Linux releases are similar except for file extensions.
Note that the Plugins directory contains shared libraries for all operating systems. This is because Unity can generate executables for multiple targets; for example, we generated the Windows, MacOS and Linux demos using the Windows version of Unity.
All files available as source are listed below, with a brief description of their role. More extensive documentation is provided later. The goal of this software is to combine MuJoCo physics and Unity rendering. This is challenging because MuJoCo and Unity were not designed to work together.
In particular, MuJoCo has its own renderer, which is aimed at scientific visualization with fixed-function OpenGL and lacks many of the features available in the Unity renderer. Similarly, Unity relies on a previous-generation physics simulator and lacks many of the simulation-related capabilities of MuJoCo (not so much in terms of features but in terms of simulation speed, accuracy and stability).
Instead of trying to reconcile this partly-overlapping functionality, we made a design choice which is somewhat drastic, but which we nevertheless believe is the right one.
Here MuJoCo is used exclusively for physics and its rendering is altogether ignored. Unity is used exclusively for rendering and its physics are altogether ignored. The MuJoCo and Unity representations of the same model are connected at the level of geometry. These are imported as meshes in Unity or as geometric primitives when possible. Unity sees them as static meshes which will be animated by a user script, and which do not collide, send events or otherwise interact with other Unity elements.
Thus the corresponding GameObjects do not have Collider or Rigidbody components attached to them. At runtime the MuJoCo Plugin provides the positions, orientations and scales of all renderable objects, which determine the corresponding GameObject transforms in Unity.
The scripting capabilities of Unity are not used to modify these properties; doing so would introduce a discrepancy between the MuJoCo simulation and the Unity rendering. With these preliminaries in mind, the workflow enabled by the software is as follows. The release includes two demos: remote and simulate.
The remote demo renders the standard MuJoCo humanoid. When you run it the label "Waiting" appears in the top-left corner, meaning that it is waiting for socket connection. Now if you run the script testremote. In this case the image resolution is x This is the offscreen rendering resolution as specified in the MuJoCo model, and is independent of the Unity window size.
On our Windows machine with a Quadro P video card (roughly equivalent to GTX) we get frames per second over the socket at this resolution. Unity itself is able to render at around FPS, and the difference is due to sending data around (we are using localhost, so the communication should be optimized by the OS, but still, this is a lot of data).
Smaller images (say x) can be sent at speeds close to the GPU rendering limit.
Gym is a toolkit for developing and comparing reinforcement learning algorithms.
It makes no assumptions about the structure of your agent, and is compatible with any numerical computation library, such as TensorFlow or Theano. The gym library is a collection of test problems — environments — that you can use to work out your reinforcement learning algorithms. These environments have a shared interface, allowing you to write general algorithms.
Simply install gym using pip: pip install gym. If you prefer, you can also clone the gym Git repository directly and install from the checkout. You can later perform a fuller installation, which requires installing several more involved dependencies, including cmake and a recent pip version. The minimal example runs an instance of the CartPole-v0 environment for a fixed number of timesteps, rendering the environment at each step. You should see a window pop up rendering the classic cart-pole problem.
More on that later. Environments all descend from the Env base class. Let us know if a dependency gives you trouble without a clear instruction to fix it. Installing a missing dependency is generally pretty simple. In fact, step returns four values: observation, reward, done and info. Each timestep, the agent chooses an action, and the environment returns an observation and a reward. The process gets started by calling reset, which returns an initial observation. So a more proper way of writing the previous code would be to respect the done flag.
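The agent-environment loop described above can be sketched without installing anything. `ToyCartEnv` below is a hypothetical stand-in written for this illustration, not a real Gym environment; it only mimics the interface described in the text, where reset returns an initial observation and step returns four values.

```python
import random


class ToyCartEnv:
    """Hypothetical stand-in that mimics the reset/step interface."""

    def __init__(self, max_steps=20, seed=0):
        self._rng = random.Random(seed)
        self._max_steps = max_steps
        self._t = 0
        self._x = 0.0

    def reset(self):
        # Start a new episode and return the initial observation.
        self._t = 0
        self._x = 0.0
        return self._x

    def step(self, action):
        # action is 0 (push left) or 1 (push right), as in a Discrete(2) space.
        push = 0.1 if action == 1 else -0.1
        self._x += push + self._rng.uniform(-0.05, 0.05)
        self._t += 1
        done = abs(self._x) > 0.5 or self._t >= self._max_steps
        reward = 1.0
        info = {"t": self._t}
        return self._x, reward, done, info


def run_episode(env, policy_rng):
    """Run one episode with random actions, stopping when done is set."""
    observation = env.reset()
    total_reward = 0.0
    steps = 0
    done = False
    while not done:
        action = policy_rng.choice([0, 1])
        observation, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
    return steps, total_reward


steps, total_reward = run_episode(ToyCartEnv(), random.Random(1))
print("episode finished after", steps, "timesteps")
```

With a real environment, the same loop would typically be wrapped in an outer loop over episodes, calling reset again each time done is returned.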
This should give a video and output like the following. You should be able to see where the resets happen.
But what actually are those actions? These attributes are of type Space, and they describe the format of valid actions and observations. The Discrete space allows a fixed range of non-negative numbers, so in this case valid actions are either 0 or 1.
Mujoco is an awesome simulation tool. Mujoco provides super fast dynamics simulation with a focus on contact dynamics. It turns out this is relatively easy in Mujoco. There are two parts to defining a model: (1) the STL files, which are 3D models of the robot components, and (2) the XML file, which specifies the kinematic and dynamic relationships in the model.
For STL manipulation, we use the program SketchUp because it is freely available with both offline and online versions. Shout out to my colleague Pawel Jaworski who worked through this process and wrote up the first of this document! Before opening your STL, in the open file window select your file and click on the Options button next to Import. Select the units that the model was defined in. If you import your object and cannot see it, it is likely that the units selected during the import were incorrect and the object is simply too small to see.
Mujoco uses the units specified in your STL models. ABR Control uses metres, and things generally are easiest when everyone is using the same units.
If you are like me then auto-snapping is often a huge pet peeve. It can be helpful to set the default view to xray mode so you can see inside the model when manipulating the components. A common starting point for modeling is that you have a 3D model of the full robot. When this is the case, the first step in building a Mujoco model is to generate separate STL files for each of the components of the robot that you want to be able to move independently. For each of these component STL files, we want the point where it connects to the joint to be at the origin (0, 0, 0).
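The requirement that each component's joint connection point sits at the origin amounts to a translation of every vertex, optionally combined with a unit conversion. The sketch below shows that arithmetic on a plain list of vertices; the function name `recenter_and_scale` and the example coordinates are made up for illustration, and in practice the same operation is done interactively in the 3D editor.

```python
def recenter_and_scale(vertices, joint_point, unit_scale=1.0):
    """Translate vertices so joint_point lands on (0, 0, 0), then rescale.

    unit_scale converts the source units to the target units,
    e.g. 0.001 for millimetres to metres.
    """
    jx, jy, jz = joint_point
    return [
        ((x - jx) * unit_scale, (y - jy) * unit_scale, (z - jz) * unit_scale)
        for x, y, z in vertices
    ]


# Hypothetical link geometry in millimetres; the joint attaches at the first vertex.
link_vertices_mm = [
    (100.0, 50.0, 0.0),
    (100.0, 50.0, 200.0),
    (150.0, 50.0, 200.0),
]
joint_point_mm = (100.0, 50.0, 0.0)

link_vertices_m = recenter_and_scale(
    link_vertices_mm, joint_point_mm, unit_scale=0.001
)
print(link_vertices_m[0])  # the joint connection point is now the origin
```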
If your 3D model is already broken up into each dynamic component i. Make sure you have the model unlocked. If the outline is red, it is locked. To unlock it, right click and choose Unlock. Repeat this until you have exported every part as its own STL. For each component, identify where it will connect to the previous component, i.
Emo Todorov Roboti Publishing, Seattle. This is an online book about the MuJoCo physics simulator.
It contains all the information needed to use MuJoCo effectively. It includes introductory material, technical explanation of the underlying physics model and associated algorithms, specification of MJCF which is MuJoCo's XML modeling format, user guides and reference manuals. Additional information, answers to user questions as well as a collection of models can be found on the MuJoCo Forum. It is a physics engine aiming to facilitate research and development in robotics, biomechanics, graphics and animation, machine learning, and other areas where fast and accurate simulation of complex dynamical systems is needed.
MuJoCo is a general-purpose simulator, yet knowing about its specific origin can help the reader understand it better. Its development was motivated by the realization that existing tools were inadequate for our research on optimal control, state estimation and system identification at the Movement Control Laboratory, University of Washington.
MuJoCo quickly became a cornerstone in our efforts to build more intelligent controllers for both simulated and physical systems, and has now fueled a long list of research projects in the user community. These projects typically use physics simulation in the inner loop of numerical optimization - which imposes stringent accuracy and stability requirements, because optimizers automatically search for loopholes in the physics.
At the same time such applications need access to derivatives or samples of the dynamics, which in turn calls for faster than real-time simulation. Thus our design requirements exceeded the needs of traditional simulation, prompting us to develop new algorithms and fine-tune the implementation aggressively.
Our efforts paid off. For typical robotic systems in multiple contacts with their environment, MuJoCo outperforms other physics engines in terms of both speed and accuracy, as shown on the Benchmark page.
While MuJoCo provides the infrastructure for the optimization-related applications mentioned above, these applications are based on research code, and MuJoCo itself does not yet provide a commercial-grade optimizer (except for convex optimization over constraint forces, which is done internally at each time step). We are currently developing a new product called Optico which will add such an optimizer. MuJoCo has a layered design combining user convenience with computational efficiency.
The runtime simulation module is written in C and is tuned to maximize performance. It operates on low-level data structures, which are generated offline by the built-in XML parser and model compiler. The user specifies models in the native MJCF format - which is an XML file format designed to be as human readable and editable as possible. URDF model files can also be loaded. Note that the add-ons are not yet updated to the 2.
There are several entities called "model" in MuJoCo. The software can then create multiple instances of the same model in different media (file or memory) and on different levels of description (high or low). This is why we have two levels of modeling. The high level exists for user convenience: its sole purpose is to be compiled into a low level model on which computations can be performed. The resulting mjModel can be loaded and saved into a binary file (MJB); however, it cannot be decompiled, thus models should always be maintained as XML files.
There is a plan to develop a C wrapper around it, but for the time being the parser and compiler are always invoked together, and models can only be created in XML. The following diagram shows the different paths to obtaining an mjModel (again, the second bullet point is not yet available). The example model defines a plane fixed to the world, a light to better illuminate objects and cast shadows (even though there is a built-in headlight which is often sufficient), and a floating box with 6 DOFs (this is what the "free" joint does).
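A model of the kind just described (a plane, a light, and a box attached through a free joint) might be written roughly as the MJCF string below. The numeric sizes, positions and colours are assumptions chosen for illustration. The snippet only parses the XML with the Python standard library to confirm its structure; it does not invoke MuJoCo.

```python
import xml.etree.ElementTree as ET

# Illustrative MJCF; all numeric attribute values are assumptions.
MJCF_SKETCH = """
<mujoco>
  <worldbody>
    <light diffuse=".5 .5 .5" pos="0 0 3" dir="0 0 -1"/>
    <geom type="plane" size="1 1 0.1" rgba=".9 0 0 1"/>
    <body pos="0 0 1">
      <joint type="free"/>
      <geom type="box" size=".1 .2 .3" rgba="0 .9 0 1"/>
    </body>
  </worldbody>
</mujoco>
"""

root = ET.fromstring(MJCF_SKETCH)
worldbody = root.find("worldbody")

# Geoms directly under worldbody are fixed to the world.
world_geoms = [g.get("type") for g in worldbody.findall("geom")]
has_light = worldbody.find("light") is not None

# The single child body carries the free joint and the box geom.
body = worldbody.find("body")
joint_types = [j.get("type") for j in body.findall("joint")]
body_geoms = [g.get("type") for g in body.findall("geom")]

print(root.tag, world_geoms, joint_types, body_geoms)
```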
The built-in OpenGL visualizer renders this model as: If this model is simulated, the box will fall on the ground. Basic simulation code for the passive dynamics, without rendering, is given below. This example of course is just a passive dynamical system. Things get more interesting when the user specifies controls or applies forces and starts interacting with the system.
Next we provide a more elaborate example illustrating several features of MJCF. This section provides brief descriptions of all elements that can be included in a MuJoCo model.
Previously, I was a Research Scientist leading the learning team at Latent Logic where our team focused on Deep Reinforcement Learning and Learning from Demonstration techniques to generate human-like behaviour that can be applied to data-driven simulators, game engines and robotics.
My main research focused on investigating the underlying algorithms employed by the human brain for object representation and inference. I stayed for a Postdoc in the lab to continue my research on investigating the dynamics of uncertainty in sensorimotor perception. I have also worked on several projects building machine learning solutions for a variety of problems as part of a technology consultancy start-up I co-founded.
Humanoid Imitation Learning from Diverse Sources
My blog post for our recent paper, which presents a novel method for learning from demonstration in the wild that can leverage the abundance of freely available videos of natural behaviour. We propose ViBe, a new approach to learning models of behaviour that requires as input only unlabelled raw video data.
Our method calibrates the camera, detects relevant objects, tracks them reliably through time, and uses the resulting trajectories to learn policies via imitation. In this project, I set out to train an automatic curriculum generator using a teacher network (Multi-Armed Bandit) which keeps track of the progress of the student network (IMPALA) and proposes new tasks as a function of how well the student is learning. CraftEnv is a 2D crafting environment that supports a fully flexible setup of hierarchical tasks, with sparse rewards, in a fully procedural setting.
Applying end-to-end learning to solve pixel-driven control, where learning is accomplished using the Asynchronous Advantage Actor-Critic (A3C) method with sparse rewards.
Learning from Demonstration in the Wild
Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical.
It has succeeded in a wide range of problems but typically relies on artificially generated demonstrations or specially deployed sensors and has not generally been able to leverage the copious demonstrations available in the wild: those that capture behaviour that was occurring anyway using sensors that were already deployed for another purpose, e. We propose Video to Behaviour (ViBe), a new approach to learning models of road user behaviour that requires as input only unlabelled raw video data of a traffic scene collected from a single, monocular, uncalibrated camera with ordinary resolution.
Our approach calibrates the camera, detects relevant objects, tracks them through time, and uses the resulting trajectories to perform LfD, yielding models of naturalistic behaviour. We apply ViBe to raw videos of a traffic intersection and show that it can learn purely from videos, without additional expert knowledge.
Reverse-Engineering Human Visual and Haptic Perceptual Algorithms
Intelligent behaviour is fundamentally tied to the ability of the brain to make decisions in uncertain and dynamic environments.
In neuroscience, the generative framework of Bayesian Decision Theory has emerged as a principled way to predict how the brain acts in the face of uncertainty. In the first part of my thesis, I study the question of how humans learn to perform a visual object categorisation task. I present a novel experimental paradigm to assess whether people use generative Bayesian principles as a general strategy.
We found that humans indeed perform in a generative manner, but resort to approximate inference when faced with complex computations. In the second part, I consider how one would build a Bayesian ideal observer model of human haptic perception and object recognition, using MuJoCo as an environment. Our model can, using only noisy contact point information on the surface of the hand and noisy hand proprioception, simultaneously infer the shape of simple objects together with an estimation of the true hand pose in space.
This chapter is the MJCF modeling guide.
The reference manual is available in the XML Reference chapter. MJCF models can represent complex dynamical systems with a wide range of features and model elements. Accessing all these features requires a rich modeling format, which can become cumbersome if it is not designed with usability in mind.
Therefore we have made an effort to design MJCF as a scalable format, allowing users to start small and build more detailed models later. It enables users to rapidly create new models and experiment with them. Experimentation is further aided by numerous options which can be used to reconfigure the simulation pipeline, and by quick re-loading that makes model editing an interactive process.
One can think of MJCF as a hybrid between a modeling format and a programming language. There is a built-in compiler, which is a concept normally associated with programming languages. While MJCF does not have the power of a general-purpose programming language, a number of sophisticated compile-time computations are invoked automatically depending on how the model is designed.
Alternatively a previously saved mjModel can be loaded directly from a binary MJB file - whose format is not documented but is essentially a copy of the mjModel memory buffer. The conversion depends on the model format - which is inferred from the top-level element in the XML file, and not from the file extension. Recall that a valid XML file has a unique top-level element. Even though loading and compilation are presently combined in one step, compilation is independent of loading, meaning that the compiler works in the same way regardless of how mjCModel was created.
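The rule that the model format is inferred from the top-level XML element rather than the file extension can be sketched as follows. The element names used here, `mujoco` for MJCF and `robot` for URDF, are not stated in the text above and should be read as assumptions about the two formats; `infer_model_format` is a hypothetical helper, not part of MuJoCo.

```python
import xml.etree.ElementTree as ET


def infer_model_format(xml_text):
    """Guess the model format from the unique top-level XML element."""
    root = ET.fromstring(xml_text)
    if root.tag == "mujoco":   # assumed MJCF top-level element
        return "MJCF"
    if root.tag == "robot":    # assumed URDF top-level element
        return "URDF"
    raise ValueError("unrecognized top-level element: " + root.tag)


# The file name plays no role: only the content is inspected.
mjcf_text = "<mujoco><worldbody/></mujoco>"
urdf_text = "<robot name='arm'><link name='base'/></robot>"

print(infer_model_format(mjcf_text))
print(infer_model_format(urdf_text))
```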
Both the parser and the compiler perform extensive error checking, and abort when the first error is encountered. The resulting error messages contain the row and column number in the XML file, and are self-explanatory so we do not document them here.
The parser uses a custom schema to make sure that the file structure, elements and attributes are valid. The compiler then applies many additional semantic checks. Finally, one simulation step of the compiled model is performed and any runtime errors are intercepted. The entire process of parsing and compilation is very fast - less than a second if the model does not contain large meshes or actuator lengthranges that need to be computed via simulation.
This makes it possible to design models interactively, by re-loading often and visualizing the changes. The MJB is a stand-alone file and does not refer to any other files.
This chapter describes the mathematical and algorithmic foundations of MuJoCo. The overall framework is fairly standard for readers familiar with modeling and simulation in generalized or joint coordinates. Therefore we summarize that material briefly. Most of the chapter is devoted to how we handle contacts and other constraints.
This approach is based on our recent research and is unique to MuJoCo, so we take the time to motivate it and explain it in detail. Additional information can be found in the paper below, although some of the technical ideas in this chapter are new and have not been described elsewhere. Todorov Robots as well as humans interact with their environment primarily through physical contact.
Given the increasing importance of physics modeling in robotics, machine learning, animation, virtual reality, biomechanics and other fields, there is need for simulation models of contact dynamics that are both physically accurate and computationally efficient.
One application of simulation models is to assess candidate estimation and control strategies before deploying them on physical systems. Another application is to automate the design of those strategies - usually through numerical optimization that uses simulation in an inner loop. The latter application imposes an additional constraint: the objective function defined with respect to the contact dynamics should be amenable to numerical optimization.
The contact model underlying MuJoCo has benefits along these and other relevant dimensions. In the following sections we discuss its benefits, while clarifying the differences from the linear complementarity (LCP) family of contact models, which are the de facto standard. Many of the advantages of our contact model can be traced to the fact that we drop the strict complementarity constraint at the heart of the LCP formulation.
We will call this family of models convex; see References for related work. For frictionless contacts, dropping the explicit complementarity constraint makes no difference, because the Karush-Kuhn-Tucker (KKT) optimality conditions for the resulting convex quadratic program are equivalent to an LCP.
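The frictionless equivalence can be checked numerically on a toy problem. The sketch below minimizes the convex quadratic 0.5 f'Af + b'f subject to f >= 0 with projected Gauss-Seidel, then verifies the LCP conditions f >= 0, v = Af + b >= 0 and f_i v_i = 0. The matrix A, the vector b and the choice of solver are assumptions made for illustration; this is not MuJoCo's internal solver.

```python
def solve_frictionless(A, b, iterations=50):
    """Projected Gauss-Seidel for: minimize 0.5 f'Af + b'f subject to f >= 0."""
    n = len(b)
    f = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Contribution to v_i from all other forces plus the bias term.
            residual = b[i] + sum(A[i][j] * f[j] for j in range(n) if j != i)
            # Unconstrained minimizer along coordinate i, clamped at zero.
            f[i] = max(0.0, -residual / A[i][i])
    return f


def normal_velocity(A, b, f):
    """v = A f + b, the gradient of the quadratic objective."""
    n = len(b)
    return [b[i] + sum(A[i][j] * f[j] for j in range(n)) for i in range(n)]


# Hypothetical two-contact problem with a symmetric positive definite A.
A = [[2.0, 1.0], [1.0, 2.0]]
b = [-1.0, 1.0]

f = solve_frictionless(A, b)
v = normal_velocity(A, b, f)
complementarity = [fi * vi for fi, vi in zip(f, v)]
print("f =", f, "v =", v, "f*v =", complementarity)
```

Here the first contact is active (positive force, zero velocity) and the second is separating (zero force, positive velocity), so every product f_i v_i vanishes, as the KKT conditions require.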
But for frictional contacts there are differences. If one sees convex models as approximations to LCP, the logical question to ask is how good that approximation is. However we do not see it that way. Instead, we see both LCP models and convex models as different approximations to physical reality, each with its strengths and weaknesses.
The immediate consequence of dropping strict complementarity and replacing it with a cost is that complementarity can be violated - meaning that force and velocity in the contact normal direction can be simultaneously positive, and frictional forces may not be maximally dissipative.
A related phenomenon is that the only way to initiate slip is to generate some motion in the normal direction. These effects are numerically small yet undesirable. This shortcoming however has little practical relevance, because it is premised on the assumption of hard contact. Yet all physical materials allow some deformation. This is particularly important in robotics, where the parts of the robot that come in contact with the environment are usually designed to be soft.
For soft contacts complementarity has to be violated: when there is penetration and the material is pushing the contacting bodies apart, both the normal force and velocity are positive.