
Convolutional Neural Network based Visual Servoing for Eye-to-Hand Manipulator
  • Fuyuki Tokuda
  • Shogo Arai
  • Kazuhiro Kosuge
Fuyuki Tokuda
Tohoku University

Corresponding Author: [email protected]



We propose a visual servoing scheme based on a convolutional neural network (CNN) for precise positioning of an eye-to-hand manipulator, in which the control input of the robot is computed directly from images by the network. The proposed CNN, the Difference of Encoded Features driven Interaction matrix Network (DEFINet), estimates the relative pose between the desired and current end-effector poses from the desired and current images captured by an eye-to-hand camera. Inspired by the Siamese network architecture, DEFINet consists of two branches of the same CNN that share weights and encode the target and current images, respectively. Regressing the relative pose from the difference between the encoded target and current image features yields high positioning accuracy in visual servoing with DEFINet. The training dataset is generated from sample data collected by moving the manipulator randomly in task space. The performance of the proposed visual servoing scheme is evaluated through numerical simulations and through experiments with a six-DOF industrial manipulator in a real environment. Both the simulation and experimental results demonstrate the effectiveness of the proposed method.
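The central idea described above is to encode the target and current images with one weight-shared network and then regress the relative pose from the difference of the two feature vectors. It can be illustrated with a minimal NumPy sketch. This is not DEFINet itself, and several assumptions are built in. A single 3x3 convolution layer with ReLU and global average pooling stands in for the CNN. The weights are random rather than trained. The 6-vector pose parameterization and the proportional control law in `servo_step` (with a hypothetical `gain`) are assumptions. All names (`encode`, `estimate_relative_pose`, `servo_step`, `KERNELS`, `W_HEAD`) are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

# Assumed sizes for illustration only (not taken from the paper).
IMG_SHAPE = (32, 32)   # grayscale image height x width
N_CHANNELS = 16        # number of 3x3 convolution kernels in the stand-in encoder
POSE_DIM = 6           # assumed relative-pose vector (tx, ty, tz, rx, ry, rz)

# ONE set of encoder weights, used by both branches (Siamese weight sharing).
KERNELS = rng.normal(scale=0.1, size=(N_CHANNELS, 3, 3))
# Bias-free linear regression head: feature difference -> relative pose.
W_HEAD = rng.normal(scale=0.1, size=(POSE_DIM, N_CHANNELS))


def encode(image):
    """Stand-in CNN branch: 3x3 convolution, ReLU, global average pooling."""
    patches = sliding_window_view(image, (3, 3))           # (H-2, W-2, 3, 3)
    fmap = np.einsum("hwij,kij->khw", patches, KERNELS)    # (N_CHANNELS, H-2, W-2)
    fmap = np.maximum(fmap, 0.0)                           # ReLU
    return fmap.mean(axis=(1, 2))                          # (N_CHANNELS,)


def estimate_relative_pose(target_img, current_img):
    """Regress a relative pose from the difference of the two encodings."""
    feature_diff = encode(target_img) - encode(current_img)
    return W_HEAD @ feature_diff                           # (POSE_DIM,)


def servo_step(target_img, current_img, gain=0.5):
    """Hypothetical proportional law: command a fraction of the estimated error."""
    return gain * estimate_relative_pose(target_img, current_img)


# Usage with random arrays standing in for camera images.
target = rng.random(IMG_SHAPE)
current = rng.random(IMG_SHAPE)
print(servo_step(target, current).shape)                   # (6,)
print(np.allclose(servo_step(target, target), 0.0))        # True
```

Both branches use the same `KERNELS`, and in this sketch the head has no bias. As a result, identical target and current images produce exactly zero commanded motion, and swapping the two images flips the sign of the estimate. In this sketch, these properties follow from the structure alone, before any training.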
*The complete version of this preprint has been published in IEEE Access. Please refer to https://ieeexplore.ieee.org/abstract/document/9464907.
Published in IEEE Access, vol. 9, pp. 91820-91835, 2021. DOI: 10.1109/ACCESS.2021.3091737