A Multi-modal Force/Vision Sensor Fusion in 6-DOF Pose Tracking

Abstract

Sensor-based robot control allows manipulation in dynamic and uncertain environments. Vision can be used to estimate the 6-DOF pose of an object by model-based pose estimation methods, but the estimate is not accurate in all degrees of freedom. Force offers a complementary sensor modality, allowing accurate measurements of local object shape when the tooltip is in contact with the object. As force and vision are fundamentally different sensor modalities, they cannot be fused directly. We present a method which fuses force and visual measurements using positional information of the end-effector. By transforming the positions of the tooltip and the camera into the same coordinate frame and modeling the uncertainties of the visual measurement, the sensors can be fused together in an Extended Kalman filter. Experimental results show greatly improved pose estimates when the sensor fusion is used.

I. INTRODUCTION

Robot control in unstructured environments is a challenging problem. Simple position-based control is not adequate if the position of the workpiece is unknown during manipulation, as uncertainties present in the robot task prevent the robot from following a preprogrammed trajectory. Sensor-based manipulation allows a robot to adapt to a dynamic and uncertain environment. With sensors, the uncertainties of the environment can be modeled and the robot can take actions based on the sensory input. In visual servoing the robot is controlled based on the input from a visual sensor. A 3-D model of the workpiece can be created and the 6-DOF pose of the object can be determined by pose estimation algorithms. Visual servoing enables such tasks as tracking a moving object with an end-effector mounted camera. However, a single-camera visual measurement is often not accurate in all degrees of freedom. Only the object translations perpendicular to the camera axis can be determined accurately. Object translation along the camera axis is difficult to measure, as even a large change in object distance induces only a small change in the image. The same applies to rotations: only the rotation around the camera axis can be determined accurately, whereas rotations around the off-axes yield only a diminishing change in the image.
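This depth insensitivity follows from the standard pinhole projection model; the short derivation below is our illustration and is not part of the original paper. A point at camera coordinates $(X, Y, Z)$ projects to image coordinate $u = fX/Z$, so

\[
\frac{\partial u}{\partial X} = \frac{f}{Z},
\qquad
\frac{\partial u}{\partial Z} = -\frac{fX}{Z^{2}} = -\frac{X}{Z}\cdot\frac{f}{Z}.
\]

Image motion per unit of depth change is attenuated by the factor $X/Z$, which is small for points near the optical axis. For example, with $f = 600$ px, $Z = 1$ m and $X = 0.1$ m, a meter of lateral motion moves the feature by 600 px, while a meter of depth motion moves it by only about 60 px. An analogous argument applies to rotations about axes perpendicular to the optical axis.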
Vision can be complemented by other sensor modalities in order to alleviate these problems. With a tactile or force sensor the local shape of the object can be probed. When the tooltip is in contact with an object and the position of the tooltip is known, information about the object can be extracted. However, a single tooltip measurement can only give one point on the object surface. Without other information this measurement would be useless, as we do not know at which location of the object the measurement is taken. Also, if the object is moving, the point of contact can move even when the position of the tooltip is stationary. Combining a force sensor with vision is therefore appealing, as these two sensors can complement each other.

Since force and vision measure fundamentally different sensor modalities, the information from these sensors cannot be fused directly. Vision can give the full pose of an object with respect to the camera, but a force sensor can measure forces only locally. When the force sensor is used only to detect whether the tooltip is in contact with the object, no other information can be gained. Combining this binary information with the visual measurement requires that both the position of the tooltip and the camera are known in the same coordinate frame. This can be achieved, as the incremental encoders or joint angle sensors of the robot can determine the position of the robot end-effector in world coordinates. If the hand-eye calibration of the camera and the tool geometry are also known, both of the measurements can be transformed into the world coordinate frame.
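Concretely, this amounts to chaining homogeneous transformations. The sketch below is our illustration, not the paper's code; the frame names, the 4x4 matrix representation and the function name are assumptions.

    import numpy as np

    def measurements_in_world(T_world_ee, T_ee_cam, T_cam_obj, T_ee_tool):
        """Express both measurements in the world frame.

        All arguments are 4x4 homogeneous transforms: T_world_ee from
        forward kinematics, T_ee_cam from hand-eye calibration,
        T_cam_obj from visual pose estimation, and T_ee_tool from the
        known tool geometry."""
        # Object pose from vision: world <- end-effector <- camera <- object
        T_world_obj = T_world_ee @ T_ee_cam @ T_cam_obj
        # Tooltip position: translation part of world <- end-effector <- tool
        p_tool_world = (T_world_ee @ T_ee_tool)[:3, 3]
        return T_world_obj, p_tool_world

Both outputs now live in the same world frame, so a contact event at p_tool_world can be related directly to the visually estimated object pose T_world_obj.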
A single tooltip measurement can only give constraints on the pose of the object, not the full pose. Therefore a single measurement is meaningless unless it can be fused with other sensor modalities or over time. Combining several sensor modalities or multiple measurements over time can reduce the uncertainty of the measurements, but in order to fuse the measurements the uncertainty of each individual measurement must be estimated. The sensor delay of the visual measurements must also be taken into account when fusing the measurements. In particular, an eye-in-hand configuration requires accurate synchronization of the positional information and the visual measurement; otherwise vision will give erroneous information while the end-effector is in motion.

In this paper, we present how vision and force can be fused together, taking into account the uncertainty of each individual measurement. A model-based pose estimation algorithm is used to extract the unknown pose of a moving target. The uncertainty of the pose depends on the uncertainty of the measured feature points in the image plane, and this uncertainty is projected into Cartesian space. A tooltip measurement is used to probe the local shape of the object by moving on the object surface while keeping a constant contact force. An Extended Kalman filter (EKF) is then used to fuse the measurements over time, taking into account the uncertainty of each individual measurement. To our knowledge, this is the first work using contact information to compensate for the uncertainty of visual tracking while the tooltip is sliding on the object surface.
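The paper's filter tracks the full 6-DOF pose; the sketch below is our deliberately reduced illustration of the same fusion idea. As assumptions of ours (not from the paper), the state is only the object position, vision observes that position directly, and the probed face is a plane through the object origin with known world-frame normal n.

    import numpy as np

    class FusionEKF:
        """Minimal fusion sketch: 3-D object position state only."""

        def __init__(self, x0, P0):
            self.x = np.asarray(x0, dtype=float)  # object position estimate
            self.P = np.asarray(P0, dtype=float)  # estimate covariance

        def predict(self, Q):
            # Constant-position motion model: only the covariance grows.
            self.P = self.P + Q

        def update_vision(self, z, R):
            # Vision observes the position directly: z = x + v, so H = I.
            # R encodes the image-plane uncertainty projected into
            # Cartesian space (large along the optical axis).
            self._update(y=z - self.x, H=np.eye(3), R=R)

        def update_contact(self, p_tool, n, r):
            # In contact, the tooltip lies on the assumed object face:
            #   h(x) = n . (p_tool - x), observed value 0, so H = -n^T.
            y = np.array([0.0 - n @ (p_tool - self.x)])
            self._update(y=y, H=-n.reshape(1, 3), R=np.array([[r]]))

        def _update(self, y, H, R):
            # Standard Kalman correction step.
            S = H @ self.P @ H.T + R
            K = self.P @ H.T @ np.linalg.inv(S)
            self.x = self.x + K @ y
            self.P = (np.eye(3) - K @ H) @ self.P

    # Hypothetical usage: the vision covariance R is large along the
    # camera (here z) axis; a contact on the top face (normal +z)
    # constrains exactly that weak direction.
    ekf = FusionEKF(x0=np.zeros(3), P0=np.eye(3) * 1e-2)
    ekf.predict(Q=np.eye(3) * 1e-4)
    ekf.update_vision(z=np.array([0.10, 0.00, 0.50]),
                      R=np.diag([1e-4, 1e-4, 1e-2]))
    ekf.update_contact(p_tool=np.array([0.10, 0.00, 0.55]),
                       n=np.array([0.0, 0.0, 1.0]), r=1e-6)

Each vision update corrects all state components at once but is weak along the optical axis, while each contact update corrects the estimate only along the face normal, which is exactly the local information force provides; the filter weights both by their covariances over time.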
II. RELATED WORK

Reduction of measurement errors and fusion of several sensory modalities using a Kalman filter (KF) framework is widely used in robotics, for example, in 6-DOF pose tracking [1]. However, in the visual servoing context Kalman filters are typically used only for filtering uncertain visual measurements and do not take into account the positional information of the end-effector. Wilson et al. [2] proposed to solve the pose estimation problem for position-based visual servoing using the KF framework, as this will balance the effect of measurement uncertainties. Lippiello et al. propose a method for combining visual information from several cameras and the pose of the end-effector together in a KF [3]. However, in their approaches the KF can be understood as a single iteration of an iterative Gauss-Newton procedure for pose estimation, and as such it is not likely to give optimal results for the non-linear pose estimation problem.

Control and observation are dual problems. Combining force and vision is often done at the level of control [4], [5], [6]. As there is no common representation for the two sensor modalities, combining the information in one observation model is not straightforward. Previous work on combining haptic information with vision at the observation level primarily uses the two sensors separately. Vision is used to generate a 3-D model of an object and a force sensor to extract physical properties such as the stiffness of the object [7]. Pomares et al. combined a force sensor and an