
Model Predictive Control and Forward Dynamics Learning
I developed a Model Predictive Controller (MPC) and integrated it with a learned dynamics model.
Problem Statement
In a previous project, I developed a learned forward dynamics model that predicts the next state given the current state-action pair; in other words, it predicts how the robotic arm will move given the applied torques. The new problem is the inverse: given a desired end-effector position in the workspace, how do we find the actions (torques) needed to reach it?
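Concretely, the learned model implements a one-step prediction: next_state = f(state, action). Below is a minimal sketch of that interface, assuming a 2-link planar arm; the toy dynamics and the forward-kinematics helper are placeholders for the trained network and the real arm geometry, not the project's actual code.

```python
import numpy as np

def forward_dynamics(state, action, dt=0.05):
    """Stand-in for the learned model: maps (state, action) to the
    predicted next state. state = [q1, q2, dq1, dq2] (joint angles and
    velocities), action = joint torques. The toy version treats torque
    as acceleration so the sketch runs end to end."""
    q, dq = state[:2], state[2:]
    dq_next = dq + action * dt
    q_next = q + dq_next * dt
    return np.concatenate([q_next, dq_next])

def end_effector(state, l1=0.5, l2=0.5):
    """Forward kinematics of a 2-link planar arm (link lengths assumed)."""
    q1, q2 = state[0], state[0] + state[1]
    return np.array([l1 * np.cos(q1) + l2 * np.cos(q2),
                     l1 * np.sin(q1) + l2 * np.sin(q2)])
```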
Solving this problem also addresses a data-collection issue. Previously, the training data for forward dynamics learning was collected by applying a range of random torques to the ground-truth model. This method has a major drawback: we have NO control over the end-effector position, so we ended up with data where the end-effector covers only a small portion of the workspace (shown below). We could keep sampling randomly, but how can we make sure we cover the full range of goal positions? We need a controller that can efficiently produce state-action pairs spanning the whole workspace.

My Approach
MPC implementation
I developed an MPC controller that can effectively drive the robot arm to a desired goal while minimizing a cost that combines distance to the goal and energy consumption. The controller predicts the robot arm's future states, computes the cost of candidate action sequences, and iteratively adjusts the actions to minimize this cost. This ensures that the robot arm moves efficiently towards its target.
MPC Algorithm (Action computation)

Key Components of MPC (sketched in code after this list):
- Dynamics model: predicts the arm's future states from the current state and a candidate sequence of torques.
- Cost function: combines the end-effector's distance to the goal with the energy consumed by the actions.
- Action optimization: candidate action sequences are scored through the model and iteratively adjusted to minimize the predicted cost.
- Receding-horizon execution: only the first action of the best sequence is applied before re-planning from the new state.
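Here is a minimal sketch of the action-computation step. It assumes a cross-entropy-style optimizer (sample torque sequences, roll them out through the dynamics model, refit the sampling distribution around the best ones, repeat); the post doesn't specify which optimizer was used, and all parameter values are illustrative.

```python
import numpy as np

def mpc_action(state, goal, dynamics_fn, fk_fn, horizon=10,
               n_candidates=200, n_elite=20, n_iters=5,
               energy_weight=0.01, torque_limit=1.0, rng=None):
    """One MPC step: sample candidate torque sequences, score each by
    rolling it out through dynamics_fn and costing the predicted states,
    then refit the sampling distribution to the elites and repeat.
    dynamics_fn(state, action) -> next state; fk_fn(state) -> (x, y)."""
    if rng is None:
        rng = np.random.default_rng()
    action_dim = 2  # torques for a 2-joint arm (assumed)
    mean = np.zeros((horizon, action_dim))
    std = np.full((horizon, action_dim), torque_limit)

    def rollout_cost(seq):
        s, cost = state, 0.0
        for a in seq:
            s = dynamics_fn(s, a)
            cost += np.linalg.norm(fk_fn(s) - goal)  # distance-to-goal term
            cost += energy_weight * np.sum(a ** 2)   # energy (torque) term
        return cost

    for _ in range(n_iters):  # iteratively adjust the action distribution
        seqs = rng.normal(mean, std, size=(n_candidates, horizon, action_dim))
        seqs = np.clip(seqs, -torque_limit, torque_limit)
        costs = np.array([rollout_cost(seq) for seq in seqs])
        elites = seqs[np.argsort(costs)[:n_elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean[0]  # receding horizon: apply only the first planned action
```

In a receding-horizon loop, this function is called at every control step, so the plan is continually corrected as the arm moves.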
Collect data using MPC and develop the forward dynamics learning model
I basically repeated the forward dynamics learning project, but this time, instead of sampling random actions, I sampled goals (covering the bottom semicircle of the workspace) and queried the MPC for the best actions to reach them.
Note that here the MPC uses the true dynamics for action and trajectory computation. These actions (torques) yield more valuable training data because the end-effector positions cover the whole area. The data quality is bounded by the MPC quality: a good model is built on good data!
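In sketch form, the collection loop might look like the following; the goal-sampling region, rollout length, and helper names are illustrative, with mpc_action as sketched above and the true simulator dynamics substituted for the learned model.

```python
import numpy as np

def collect_mpc_data(mpc_fn, true_dynamics, fk_fn, n_goals=500,
                     steps_per_goal=50, reach=1.0, rng=None):
    """Sample goals over the bottom semicircle of the workspace, drive
    the arm toward each one with MPC (planning on the true dynamics),
    and record every visited (state, action, next_state) transition as
    training data for the forward dynamics model."""
    if rng is None:
        rng = np.random.default_rng()
    data = []
    for _ in range(n_goals):
        r = reach * np.sqrt(rng.uniform())  # sqrt for uniform coverage by area
        theta = rng.uniform(-np.pi, 0.0)    # bottom half of the workspace
        goal = r * np.array([np.cos(theta), np.sin(theta)])
        state = np.zeros(4)                 # arm starts at rest (assumed)
        for _ in range(steps_per_goal):
            action = mpc_fn(state, goal, true_dynamics, fk_fn)
            next_state = true_dynamics(state, action)
            data.append((state, action, next_state))
            state = next_state
    return data
```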
Results
I evaluated the performance of the MPC working together with the learned dynamics model.
I tested the arm on 16 random test goals; the MPC controller applied actions for 10 timesteps, and each test ran the arm for 2.5 seconds. In 15 out of the 16 tests, the final distance to the goal was < 0.2 and the end-effector velocity was < 0.5.
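In code, the per-test success check reduces to something like this (thresholds as reported above; units follow the simulator's):

```python
import numpy as np

def test_passed(ee_pos, ee_vel, goal, dist_tol=0.2, vel_tol=0.5):
    """Success criterion used in the evaluation: the end-effector must
    finish within dist_tol of the goal and be nearly at rest."""
    return (np.linalg.norm(np.asarray(ee_pos) - goal) < dist_tol
            and np.linalg.norm(ee_vel) < vel_tol)
```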