A study on position control of a continuum arm using MAML (Model-Agnostic Meta-Learning) for adapting to changing conditions

Abstract: The nonlinear deformation of spring-based continuum manipulators makes it difficult to predict the position of the tip. Mounting different tools on the tip for various purposes increases the difficulty further. Model-less control has shown great success in tip positioning for these types of manipulators. One of its major drawbacks is the requirement of a large data set and a long training time. Hence, this paper studies the effect of implementing MAML (Model-Agnostic Meta-Learning) for fast adaptation to different offset conditions. The effects have been studied in a simulation environment and on a real prototype. The continuum arm used for the experiments is a tendon-driven, spring-based manipulator with non-constant curvature. An average error of 0.03 m has been achieved on the prototype. MAML brought the relative tip positioning error of the manipulator down from 7.02% to 1.55% in the simulation environment, and from 11.06% to 4.09% on the real prototype. We also studied its effectiveness in trajectory following.


I. INTRODUCTION
Conventional manipulators with rigid links and joints are highly successful on industrial floors and in choreographed environments. High precision and relatively easy manipulation are their strengths. However, manipulation becomes difficult in cluttered environments, because these manipulators lack the degrees of freedom required to navigate through them. Conventional manipulators are also not safe to work alongside human beings due to their rigid links [1].
If we look into nature, we find that snakes [2]-[4], tentacles [5]-[7], and elephant trunks [8]-[11] can pass through cluttered environments to reach their targets. Their bodies are so flexible that they can bend in any direction at any point along their length. This flexibility helps them navigate through constrained environments with minimal effect on the surroundings. If we can mimic them properly, we can have efficient manipulators for search and rescue operations.
Most continuum manipulators had a uniform backbone [12], [13], because the mathematical modelling of uniform backbones is relatively easy. However, they failed to mimic elephant trunks and tentacles properly.
Alok R. Sahoo is with the Robotics and Machine Intelligence Lab, Department of IT, IIIT Allahabad, India (e-mail: aloksh90@ieee.org). Pavan Chakraborty is with the Robotics and Machine Intelligence Lab, Department of IT, IIIT Allahabad, India (e-mail: pavan@iiita.ac.in).
Whenever one tries to bend the tip section, the whole body up to the proximal sections bends in that direction. Hence, the robot fails to serve the purpose of a soft robot. To address this problem, some tapered designs [10], [14], [15] were developed. The mathematical formulation of their kinematics is quite difficult. Most of them used a constant curvature model [16], [17]. This reduced the complexity, but the accuracy deteriorated when the robot had a variable backbone and the external force was significant [18]. Piecewise constant curvature (PCC) was introduced to counter this problem [10], [19]. Mode shape functions [20], [21], Euler spiral methods [22], and Pythagorean hodograph methods [23] were also used. Considering the mechanics of the model, mechanics modelling [24], [25], FEM-based approaches [26], Cosserat rod approaches [27], [28], Euler-beam mechanics-based approaches [29], lumped system modelling [30], [31], and absolute nodal coordinate formulation (ANCF) approaches [32] have been used. However, most of them are computationally expensive and relatively slow.
Hence, model-less controllers were introduced. It was shown that model-less control can achieve higher accuracy in positioning the manipulator [33]. Researchers have further used goal babbling [34], RNN-based control [35], [36], reinforcement learning [37], and deep reinforcement learning [38] for this purpose. However, these models suffer from the following drawbacks. They require a large amount of data and a long time to train. A small change in the characteristics of the manipulator can cause the accuracy to degrade, and the model then needs to be trained from scratch for the new conditions. Some model-less algorithms [38] have shown good performance in noisy or slightly changed conditions. However, the performance decreases as the magnitude of the variation increases, and they have no mechanism to restore the original accuracy. Satheeshbabu et al. [38] have shown the effect of an increase in payload on tip positioning. This limits the usability of the model in real life. Some operations need the tool placed on the tip to be changed frequently. We need our model to adapt to the new condition quickly without any significant loss of accuracy. Meta learning has the potential to provide a solution.
1) What is meta learning?: Humans can generalize after a few similar experiences. Players can adapt to changes in their instruments after a few practice sessions. Similarly, our manipulator should be intelligent enough to adapt to changes in the tip conditions. We know that deep-learning-based controllers are highly successful, but we would not like to train our model from scratch to get similar accuracy in changed conditions.
Hence, meta learning provides a paradigm where we don't need to train a model from scratch. The model could be trained with different sets of similar tasks. It uses this experience to improve the performance in any future task. This strategy is called "learning to learn". This is similar to how humans learn from their experiences. It also improves performance for a single task with multiple episodes.
The base learner, or inner layer, performs the basic task, i.e., classification or regression. The meta learner, or outer layer, updates the base learner to improve the outer objective. The outer objective could be task generalization performance or learning time [39].
2) Why MAML?: The main strength of MAML is that it can solve a new task with a very small number of training samples [40]. It is applicable to any model trained with gradient descent and can be applied to classification, regression, reinforcement learning, etc. It also requires only a small number of gradient steps for good generalization on the new task. Therefore, it has the potential to adapt to changes in the manipulator within a very short time and with very little training data collected after the changes.
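To make the inner and outer loops concrete, the following is a minimal sketch of MAML on a toy problem. It is not the controller used in this paper. It assumes a one-parameter model y = theta * x, tasks that differ only in their slope, and one inner gradient step; the function names, task distribution, and learning rates are all illustrative assumptions.

```python
import random

def grad_mse(theta, xs, ys):
    """d/dtheta of mean((theta * x - y)^2)."""
    return sum(2.0 * (theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def adapt(theta, xs, ys, alpha, steps=1):
    """Inner loop: a few plain gradient steps on one task's support data."""
    for _ in range(steps):
        theta = theta - alpha * grad_mse(theta, xs, ys)
    return theta

def maml_step(theta, tasks, alpha, beta):
    """Outer loop: one meta-update of theta over a batch of tasks."""
    meta_grad = 0.0
    for (xs_s, ys_s), (xs_q, ys_q) in tasks:
        theta_task = adapt(theta, xs_s, ys_s, alpha)
        # Chain rule through the single inner step: d(theta_task)/d(theta).
        d_inner = 1.0 - alpha * 2.0 * sum(x * x for x in xs_s) / len(xs_s)
        meta_grad += grad_mse(theta_task, xs_q, ys_q) * d_inner
    return theta - beta * meta_grad / len(tasks)

def sample_task(rng, slope, n_points=5):
    """Support and query sets for the toy task y = slope * x."""
    def draw():
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
        return xs, [slope * x for x in xs]
    return draw(), draw()

rng = random.Random(0)
theta = 0.0
for _ in range(500):
    batch = [sample_task(rng, rng.uniform(1.0, 3.0)) for _ in range(4)]
    theta = maml_step(theta, batch, alpha=0.1, beta=0.05)

# Few-shot adaptation to an unseen task (slope 2.8) from 5 samples.
(new_xs, new_ys), _ = sample_task(rng, 2.8)
theta_adapted = adapt(theta, new_xs, new_ys, alpha=0.1, steps=2)
```

After meta-training, `theta` settles near the middle of the task distribution, and two gradient steps on five samples move it toward the slope of the new task. This is the behaviour the section relies on: a new tip condition plays the role of the unseen task.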

A. Problem definition
To use continuum manipulators in search and rescue operations or grasping, we need a kinematic model that can position the tip of the manipulator at the desired position despite changes in conditions caused by grasping a body or changing the tool on the tip. Model-based controllers fail miserably to tackle the changing conditions. MAML shows promising results in adapting to such changes with a very small updated training data set. Hence, in this paper we study the efficiency of MAML in adapting to changing conditions. We first tested it on the kinematic model of the manipulator from the author's previous work [41]. Then, we tested it on the prototype.

II. FORMULATION OF LEARNING ARCHITECTURE
To model the kinematics of a robot, one has to go through two steps. First, there should be a mapping from the actuator space (cable length/pneumatic force) to the configuration space (κ, φ, l). This mapping is unique to each individual robot; hence, it is called robot specific. Then, one can use the geometry or DH parameters to find the mapping to the task space (tip position). This approach is generally followed to find the forward kinematics. Similarly, for inverse kinematics, one has to go backwards from the task space to the actuator space. In our approach, we find the mapping directly from the task space to the actuator space (figure 1). We followed the model by Thuruthel et al. [42] for our work.
Let us consider a function ψ, which maps the actuator space to task space.
Here, $a \in \mathbb{R}^m$ is the actuator space and $p \in \mathbb{R}^n$ is the task space. For a redundant manipulator, $m > n$; hence, direct inversion of ψ is not possible. To simplify it, we can linearize it around a point $a_0$. We can use the Jacobian matrix to see the transformation of actuator velocity to end effector velocity.
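Under the notation of this section, and assuming the standard differential-kinematics formulation followed in [42], the mapping, its velocity-level linearization, and the resulting discrete relation can be sketched as follows. The symbol $J$ for the Jacobian is an assumption of this sketch.

```latex
\begin{align}
p &= \psi(a), \qquad a \in \mathbb{R}^{m},\; p \in \mathbb{R}^{n} \\
\dot{p} &= J(a)\,\dot{a}, \qquad J(a) = \frac{\partial \psi}{\partial a} \in \mathbb{R}^{n \times m} \\
p_{i+1} - p_{i} &\approx J(a_{i})\,\bigl(a_{i+1} - a_{i}\bigr)
\end{align}
```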
Here, $a_i$ and $a_{i+1}$ correspond to the current and next positions of the actuator, respectively. Similarly, $p_i$ and $p_{i+1}$ correspond to the current and next positions of the end effector, respectively. From this, we can obtain a position-level controller. Now, we can train our MAML model to give a mapping $(p_{i+1}, a_i) \rightarrow a_{i+1}$. However, for search and rescue operations we will have to deal with constrained environments. Hence, feedback of the current position of the tip is essential. Thuruthel et al. [42] have also shown that the error percentage of a closed-loop controller is significantly lower. Therefore, we use a closed-loop model for the mapping $(p_{i+1}, a_i, p_i) \rightarrow a_{i+1}$. However, our model will explore the whole task space without using the concept of generalization. We will use data for whole
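The closed-loop mapping can be turned into supervised training pairs directly from a logged actuation sequence. The sketch below assumes a 3-D tip position and 4 actuators. The function name and the input ordering (target tip, current tip, current actuator) are illustrative assumptions, and the log values are made up.

```python
def build_samples(actuator_log, tip_log):
    """Build (input, target) pairs for the mapping (p_{i+1}, a_i, p_i) -> a_{i+1}.

    actuator_log[i] is the actuator position a_i and tip_log[i] is the tip
    position p_i measured at the same step.
    """
    if len(actuator_log) != len(tip_log):
        raise ValueError("actuator and tip logs must have the same length")
    samples = []
    for i in range(len(actuator_log) - 1):
        # Input: target tip position, current tip position, current actuator position.
        x = list(tip_log[i + 1]) + list(tip_log[i]) + list(actuator_log[i])
        # Target: the actuator position that produced the next tip position.
        y = list(actuator_log[i + 1])
        samples.append((x, y))
    return samples

# Hypothetical three-step log: 4 actuators, 3-D tip position.
actuators = [(0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.0, 0.0), (0.1, 0.1, 0.0, 0.0)]
tips = [(0.00, 0.00, 0.50), (0.02, 0.00, 0.49), (0.02, 0.02, 0.48)]
samples = build_samples(actuators, tips)
```

Each logged transition yields one sample, so a sequence of N steps gives N - 1 training pairs.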

A. Neural network architecture
The neural net consists of 4 hidden layers, each of size 128. The input layer takes the target tip position, the current tip position, and the current actuator position; hence, it has 10 nodes (3 + 3 + 4). Similarly, the output layer has 4 nodes for the target actuator positions. The neural network uses a sigmoid transfer function and ReLU as the activation function. The Adam optimizer is used for gradient descent. The learning rates α and β are kept at 0.01. The mean-square error (MSE) is used as the loss function. The implementation has been done in PyTorch. It was found that a small k with a high number of epochs is more efficient than a large k with a small number of epochs for the continuum manipulator. Hence, we have taken k=2 in our case.
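As a rough illustration of the layer sizes described above, the following dependency-free sketch builds a 10-128-128-128-128-4 network, runs one forward pass, and evaluates the MSE loss. It is not the PyTorch implementation used in this work. It assumes ReLU on the hidden layers and a linear output, leaves out the sigmoid because the text does not say where it is applied, and uses an arbitrary initialization.

```python
import math
import random

LAYER_SIZES = [10, 128, 128, 128, 128, 4]  # input, 4 hidden layers, output

def init_params(rng):
    """Random weights and zero biases for each layer (illustrative init)."""
    params = []
    for fan_in, fan_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        bound = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        params.append((weights, [0.0] * fan_out))
    return params

def forward(params, x):
    """Forward pass: ReLU on hidden layers, linear output (an assumption)."""
    h = list(x)
    for idx, (weights, biases) in enumerate(params):
        h = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if idx < len(params) - 1:
            h = [max(0.0, v) for v in h]
    return h

def mse(pred, target):
    """Mean-square error between predicted and target actuator positions."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)

params = init_params(random.Random(0))
prediction = forward(params, [0.0] * 10)
n_params = count_params(params)
loss_at_target = mse(prediction, prediction)
```

With these sizes the network has 51,460 trainable parameters. In PyTorch the same structure would typically be expressed with `torch.nn.Linear` and `torch.nn.ReLU` layers, `torch.nn.MSELoss`, and `torch.optim.Adam`.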

A. Data collection in simulation
This simulation model is based on the mechanics modelling of the manipulator. The detailed modelling and characterization of the manipulator are given in the author's previous work [41]. The mechanics model takes the compression of each section into consideration. The bending stiffness and axial stiffness of each section, found from experiments, have been used for the simulation. As we already know the mapping from actuator space to task space (figure 2), we collected a total of 4000 sample points by random actuation of the actuators (motor babbling) using the rand() function. We have actuated only the tip of the model with four actuators. To ensure uniform data collection, data were first collected for a single actuation of the manipulator, restricting $a_{i+1} - a_i$ to 10 percent. This gave us 400 sample points. Next, 3600 sample points were collected from random actuation. Figure 3 shows the kinematic controller.
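The two-stage collection described above can be sketched as follows. The sketch assumes actuator commands normalized to [0, 1], so that a 10 percent limit corresponds to a step of at most 0.10. The function names, the seeds, and the zero starting pose are illustrative assumptions.

```python
import random

def restricted_babbling(n_samples, n_actuators=4, max_step=0.10, seed=0):
    """Random walk in actuator space with each step limited to max_step."""
    rng = random.Random(seed)
    current = [0.0] * n_actuators
    commands = []
    for _ in range(n_samples):
        # Clamping to [0, 1] can only shorten a step, so the limit still holds.
        current = [min(1.0, max(0.0, v + rng.uniform(-max_step, max_step)))
                   for v in current]
        commands.append(list(current))
    return commands

def free_babbling(n_samples, n_actuators=4, seed=1):
    """Unrestricted random actuator targets over the whole range."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_actuators)] for _ in range(n_samples)]

restricted = restricted_babbling(400)   # step-limited samples
free = free_babbling(3600)              # fully random samples
commands = restricted + free            # 4000 actuator commands in total
```

Each command would then be sent to the simulation model and the resulting tip position recorded alongside it.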

B. Experiments and results
The main purpose of these experiments is to evaluate how efficiently MAML adapts to changes in the characteristics of the manipulator. Hence, we did 3 tests with offsets on the tip section: 0.02 m, 0.04 m, and 0.06 m. For each offset, we collected data in increments of 10 shots until the original accuracy was recovered.
From figure 4a, we see that even a 0.02 m offset can produce a maximum error of 0.03 m. After 50 shots, it gave slightly better accuracy than the original (figure 4d). The same trend is followed for the 0.02 m, 0.04 m, and 0.06 m offsets. As the offset increases, the number of shots required also increases. From Table III, it can be noted that the accuracy increases with the number of shots for the lower offsets (0.02 m, 0.04 m). For the higher offset (0.06 m), increasing the number of shots does not improve the accuracy once a certain accuracy is reached. From the histograms, it is evident that the maximum error decreases as the number of shots increases. The simulation shows that MAML is highly successful in adapting to small changes and partially effective for larger changes. We could achieve the original accuracy with only 3 percent of the original data. Next, we need to see its efficiency in the real world.

IV. EXPERIMENTAL SETUP
The manipulator is a four-section, double-spring-based tapered continuum manipulator. Total length of the manipulator is We have used a softer PVC skin on the tip section to make it more flexible. The corresponding change in characteristics has been incorporated in the simulation model.

A. Data collection from prototype
For data collection, we first discretized the actuator space with respect to the encoder positions. A minimum difference of 50000 counts between the current and next encoder values was maintained. First, each actuator was given a next actuator value of +200000 or -200000 counts. This was continued up to the maximum range of the actuator, which was set to 2000000 counts from the zero position. The same procedure was followed for all 4 actuators. Then, actuator commands were given in pairs following the same rules. This gave us 400 samples. Next, random actuator targets were given using the rand() function. A total of 4000 samples were collected from the prototype, as in the simulation. Here, we have used different offsets to test the efficiency of MAML. It should be noted that placing different offsets also changes the dynamics of the manipulator. Hence, adapting to the changes is harder for the algorithm here than in the simulation environment.

Fig. 5: Histogram of the errors for 0.04 m offset.

B. Use of Kinect v2 for position feedback
We used a Kinect v2 to track the position of the tip. A yellow strip was placed on the tip of the manipulator. The workspace and the background contained no yellow. Color thresholding was used to detect the yellow pixels, and the centroid of the detected area was found. The corresponding XYZ coordinates were found using the kinect toolbox from pylibfreenect2.
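The thresholding and centroid steps can be illustrated on a tiny synthetic frame. This sketch is independent of the Kinect pipeline. The RGB threshold values and function names are hypothetical, and the lookup from pixel centroid to XYZ coordinates is not shown.

```python
def is_yellow(pixel, min_red_green=180, max_blue=100):
    """Hypothetical RGB threshold: strong red and green, weak blue."""
    r, g, b = pixel
    return r >= min_red_green and g >= min_red_green and b <= max_blue

def yellow_centroid(image):
    """Return the (row, col) centroid of yellow pixels, or None if none are found.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    row_sum = col_sum = count = 0
    for r_idx, row in enumerate(image):
        for c_idx, pixel in enumerate(row):
            if is_yellow(pixel):
                row_sum += r_idx
                col_sum += c_idx
                count += 1
    if count == 0:
        return None
    return row_sum / count, col_sum / count

# Synthetic 4x4 frame: black background with a 2x2 yellow marker.
BLACK, YELLOW = (0, 0, 0), (255, 255, 0)
frame = [[BLACK] * 4 for _ in range(4)]
for r_idx, c_idx in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    frame[r_idx][c_idx] = YELLOW
centroid = yellow_centroid(frame)
```

Because the workspace contains no other yellow, a simple threshold of this kind is enough to isolate the marker; in practice the thresholds would be tuned to the camera and lighting.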

A. Random point target experimentation
We have used offsets of 0.02 m, 0.04 m, and 0.06 m for testing. We gave 40 random points from the updated task space as the target. Then, we found the average tip positioning error and the histogram for each case. It should be noted that the neural network is not trained during testing. These experiments were conducted to find the minimum number of shots required to reach the original accuracy. From figure 9, it is evident that the average error of the manipulator without any change of conditions is 0.0302 m. The lower accuracy may be due to compression of the sections. From Table IV, we see that the offsets of 0.02 m, 0.04 m, and 0.06 m required 30, 40, and 60 shots, respectively. These are comparatively few shots to get back to the default accuracy. This tells us that very high accuracy requires a higher number of shots than lower accuracy. We see that increasing the number of shots also increases the accuracy of the manipulator. For the offset of 0.06 m, the accuracy remains nearly the same for 50 and 60 shots. From the histograms, it is evident that errors of smaller magnitude occur most often as the number of shots increases. We could achieve the original accuracy with only 3 percent of the original data on the real prototype as well. These results are similar to those found in the simulation.

Fig. 6: Histogram of the errors for 0.06 m offset.

B. Trajectory following
In this experiment, we evaluated the capability of the manipulator in a trajectory-following task. Here, we took two random points. The distance between the two points was 0.2872 m. The differences in the x, y, and z directions were 0.20 m, 0.20 m, and 0.05 m, respectively. We placed 19 control points between them. Each control point was given as input one after another. We waited for 3 s after each actuation of the manipulator. From figure 15a, we see that the average error goes up to 0.065 m and the maximum error reaches 0.13 m. After 40 shots, the result improved significantly: the average error and maximum error came down to 0.035 m and 0.06 m, respectively. The average error is higher than the result we got in the random point experiment.
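The geometry of this test can be checked with a short sketch. It assumes the 19 control points are evenly spaced on the straight line between the two end points, which the text does not state explicitly. The start point is hypothetical; only the coordinate differences come from the experiment.

```python
import math

def control_points(start, end, n_between):
    """Evenly spaced points strictly between start and end (linear interpolation)."""
    points = []
    for k in range(1, n_between + 1):
        t = k / (n_between + 1)
        points.append(tuple(s + t * (e - s) for s, e in zip(start, end)))
    return points

start = (0.0, 0.0, 0.0)        # hypothetical start point
end = (0.20, 0.20, 0.05)       # offsets of 0.20 m, 0.20 m and 0.05 m in x, y, z
targets = control_points(start, end, 19)

distance = math.dist(start, end)       # straight-line distance between the end points
segment = distance / (len(targets) + 1)  # spacing between consecutive targets
```

The straight-line distance evaluates to about 0.2872 m, which matches the value reported above, and 19 intermediate points give a spacing of roughly 0.0144 m between consecutive targets.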

VI. CONCLUSION
From the above experiments, it can be concluded that MAML can be successfully used to adapt to changing conditions. MAML brought the relative tip positioning error of the manipulator down from 7.02% to 1.55% in the simulation environment and from 11.06% to 4.09% on the real prototype. Only 3 percent of the original training data was needed to get back the original accuracy. When the magnitude of the disturbance is small, MAML can restore the accuracy efficiently, and a further increase in the number of shots yields better accuracy than the original. When the magnitude of the disturbance is relatively large (above 10 percent of the manipulator length), the algorithm fails to restore the original accuracy. In that case, the accuracy increases up to a certain number of shots, after which additional shots have no positive effect.