Revista Tecnología y Ciencia - Universidad Tecnológica Nacional
DOI:https://doi.org/10.33414/rtyc.37.101-111.2020 - ISSN 1666-6933
Reconocimiento-NoComercial 4.0 Internacional

Actas de las IX Jornadas Argentinas de Robótica, 15-17 de noviembre, Córdoba, Argentina

Validation of an IMU-camera fusion algorithm using an industrial robot

Validación de un algoritmo de fusión de IMU y cámara usando un robot industrial

Presentación: 25/09/2017

Aprobación: 02/12/2017

Gonzalo Perez-Paina

Centro de Investigación en Informática para la Ingeniería, Facultad Regional Córdoba, Universidad Tecnológica Nacional - Córdoba, Argentina

gperez@frc.utn.edu.ar

Claudio Paz

Centro de Investigación en Informática para la Ingeniería, Facultad Regional Córdoba, Universidad Tecnológica Nacional - Córdoba, Argentina

cpaz@frc.utn.edu.ar

Martín Pucheta

Centro de Investigación en Informática para la Ingeniería, Facultad Regional Córdoba, Universidad Tecnológica Nacional y Consejo Nacional de Investigaciones Científicas y Técnicas - Córdoba, Argentina

mpucheta@frc.utn.edu.ar

Bruno Bianchini

Centro de Investigación en Informática para la Ingeniería, Facultad Regional Córdoba, Universidad Tecnológica Nacional - Córdoba, Argentina

bruno.bian.26@gmail.com

Fernando Martínez

Centro de Investigación en Informática para la Ingeniería, Facultad Regional Córdoba, Universidad Tecnológica Nacional - Córdoba, Argentina

fermartinez51@gmail.com

Martín Nievas

Centro de Investigación en Informática para la Ingeniería, Facultad Regional Córdoba, Universidad Tecnológica Nacional - Córdoba, Argentina

mnievas@frc.utn.edu.ar


Resumen

La integración de una cámara orientada hacia abajo con un sensor de unidad de medición inercial (IMU, por las siglas en inglés de Inertial Measurement Unit) hace posible proporcionar un sistema de estimación de pose liviano y de bajo costo para vehículos aéreos no tripulados (UAVs, por Unmanned Aerial Vehicle) y micro-UAVs (MAVs, por Micro Unmanned Aerial Vehicle). Recientemente, los autores desarrollaron un algoritmo de un filtro de fusión de señales de una IMU y un sensor exteroceptivo para la estimación de la posición y la orientación. El objetivo de la estimación es utilizarlo en el bucle de control exterior de un UAV para el control de posición. Este trabajo presenta una configuración experimental para probar ese algoritmo utilizando un robot industrial para producir trayectorias planas precisas como una alternativa segura a probar el algoritmo en UAV reales. Los resultados de la estimación de fusión IMU-cámara para posiciones lineales y velocidades lineales muestran un error admisible para ser integrado en UAVs reales.

Palabras claves: fusión sensorial, unidad de medición inercial, cámara monocular, robot industrial, filtro de error de estado de Kalman.

Abstract

The integration of a down-looking camera with an inertial measurement unit (IMU) makes it possible to provide a lightweight and low-cost pose estimation system for unmanned aerial vehicles (UAVs) and micro-UAVs (MAVs). Recently, the authors developed an IMU and exteroceptive sensor fusion filter algorithm for position and orientation estimation. The aim of the estimation is to be used in the outer control loop of a UAV for position control. This work presents an experimental setup to test that algorithm using an industrial robot to produce accurate planar trajectories, as a safe alternative to testing the algorithm on real UAVs. The results of the IMU-camera fusion estimation for linear positions and linear velocities show errors small enough for the algorithm to be integrated on real UAVs.

Keywords: sensor fusion, inertial measurement unit, monocular camera, industrial robot, error-state Kalman filter.

 

I Introduction

The autonomous navigation of unmanned aerial vehicles (UAVs) in GPS-denied environments is currently possible by using controllers that employ an on-board computation of the feedback signals of the vehicle pose (position and orientation) from multiple sensors –e.g., IMUs (Inertial Measurement Units), cameras (monocular, stereo, and RGB-D), and/or laser range finders– combined with efficient data fusion algorithms that together minimize the uncertainty of the individual sensor measurements.

Concerning the state estimation for UAVs and MAVs, Kumar and Michael (2012) highlighted that the integration of camera sensors with IMUs enables a lightweight alternative to designs based on laser range finders or RGB-D cameras (which provide dense and rich three-dimensional information, but at the cost of payload and computational resources). With the aim to (i) use low-weight and low-cost sensors, (ii) use efficient real-time algorithms to reduce computational resources, (iii) use on-board computation to avoid the latency and possible interruptions of the communication link required for off-board computation, and (iv) reduce the energy consumption to increase the autonomy, the authors developed a sensor fusion algorithm (Perez Paina et al., 2017) for IMU and camera sensors that is further experimentally evaluated in this work.

In order to intensively test a sensor fusion algorithm, the first option is to use a real UAV, but this involves a risk of damage if the algorithm fails during flight. The main idea of the present work is to test the algorithm on-line but detached from real UAV flights. It is therefore necessary to generate an accurate reference motion using another facility; in this case, an industrial manipulator is chosen, see Fig. 1.

Fig. 1: Experimental setup to test the IMU-camera fusion estimation algorithm.

 

In this work, the proposed experimental setup generates motions for an IMU and a down-looking camera mounted on the end-effector of a robot manipulator, in order to emulate the signals produced by a UAV in motion. The measurements of the IMU and the down-looking camera are fused using the algorithm proposed by Perez Paina et al. (2017). The aim of this work is to validate the sensor fusion for a motion parallel to the ground plane, including linear position and linear velocity, to be used in a UAV on-board controller such as the one proposed by Pucheta et al. (2016). The full pose estimation will be tackled in future work. The experiments show that the estimation errors are small enough for the algorithm to be integrated in the real system.

The article is organized as follows. Section II introduces the description of the robot manipulator used to generate the reference motion. Then, it describes the selection of the fiducial visual marker used to estimate the pose of the down-looking camera. Next, the rigid transformations involved in the IMU-camera setup are detailed. Section III briefly reviews the fusion estimation algorithm. Section IV describes the implementation and the results obtained for two different paths, and Section V presents the conclusions and future work.

II Materials and methods

A. Welding manipulator

An industrial robot manipulator, a CLOOS Romat 310, is used in this work to generate accurate trajectories. It is available at the Mechanical Engineering Department of the FRC-UTN. This robot is a six-degrees-of-freedom serial anthropomorphic arm and is equipped with the ROTROL control system and a proprietary programming language named CAROLA. With respect to the average size of the welding robots on the market, its workspace is comparatively large (it is often combined with an additional translational axis to weld large mechanical parts). Its workspace is large enough to include the reference trajectories needed for UAV calibration purposes.

The robot can be programmed by teaching some way-points to approximate the end-effector to the working volume; from that point, several trajectories can be defined using geometric templates referenced to a predefined coordinate system. For instance, a program can be referenced to the home position, to a user-defined frame of reference, or to a frame located at the base of the robot; in this work the latter frame is chosen. The end-effector pose is defined by a 6-tuple of coordinates, 3 for the Cartesian position with respect to a frame located at the base of the robot and 3 for the orientation relative to the home frame orientation. Because of its welding capabilities, the programming language allows the user to define many characteristics of the motion interpolation, including several ways to approach the sharp corners of the trajectories in a controlled form, avoiding overshooting or oscillation around the target trajectory. This makes the robot a valuable research platform to generate accurate free-form paths in SE(3).

B. Visual marker for pose estimation

A fiducial visual marker on the ground is used to obtain an estimation of the absolute pose of the down-looking camera. Detecting and tracking this marker is easier than employing natural features. Square markers are preferred over other shapes because they provide four remarkable points (the corners) that can be used to determine the pose with respect to the camera, while the inner region can be used for identification. Different methods are used for identification, binary patterns being the most usual. Among the predefined marker patterns available in the literature, the ArUco dictionary of markers (Garrido-Jurado et al., 2014), which maximizes the inter-marker distance, is chosen. This allows reducing the dictionary to the minimum size needed, increasing the detection speed when more than one marker is used.

The ArUco algorithm is integrated with the OpenCV library (Bradski & Kaehler, 2008) to compute the rigid transformations between the camera and the marker frame systems.
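As an illustrative sketch only (not the code of this work), marker detection and camera-marker pose estimation with the OpenCV ArUco module can be organized as follows; the dictionary choice, marker side length, and calibration values are placeholders.

// Illustrative sketch of ArUco detection and pose estimation (OpenCV 3.3 with
// the contrib module). Calibration values and marker size are placeholders.
#include <opencv2/opencv.hpp>
#include <opencv2/aruco.hpp>
#include <vector>

int main() {
    cv::VideoCapture cap(0);                          // down-looking camera
    cv::Mat K = (cv::Mat_<double>(3, 3) << 400, 0, 320, 0, 400, 240, 0, 0, 1);
    cv::Mat dist = cv::Mat::zeros(5, 1, CV_64F);      // from prior calibration
    const float markerSide = 0.15f;                   // marker side length [m]
    cv::Ptr<cv::aruco::Dictionary> dict =
        cv::aruco::getPredefinedDictionary(cv::aruco::DICT_4X4_50);

    cv::Mat frame;
    while (cap.read(frame)) {
        std::vector<int> ids;
        std::vector<std::vector<cv::Point2f> > corners;
        cv::aruco::detectMarkers(frame, dict, corners, ids);
        if (ids.empty()) continue;

        // rvec/tvec give the pose of the marker frame {i} w.r.t. the camera {c}
        std::vector<cv::Vec3d> rvecs, tvecs;
        cv::aruco::estimatePoseSingleMarkers(corners, markerSide, K, dist,
                                             rvecs, tvecs);
        cv::Mat R_ci;
        cv::Rodrigues(rvecs[0], R_ci);                // rotation matrix R_ci
        // tvecs[0] plays the role of the translation t^c_ci used in Sec. II.D
    }
    return 0;
}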

C. Reference frame systems for the used setup

The reference systems of the used setup can be seen in Fig. 2, where {i} is the inertial or navigation frame attached to the ground visual marker, {b} the body frame (coincident with the IMU frame), and {c} the camera frame.

The translation vector $\mathbf{t}^{c}_{ci}$ and the orientation matrix $\mathbf{R}_{ci}$, relating the navigation and camera frames, are estimated by the visual algorithm. The translation vector $\mathbf{t}^{i}_{ib}$ and the orientation matrix $\mathbf{R}_{ib}$, relating the navigation and body frames, are estimated by the fusion filter. Last, the relative pose given by the vector $\mathbf{t}^{c}_{cb}$ and the matrix $\mathbf{R}_{cb}$, relating the camera and body frames, is known data measured by hand. As can be seen in Fig. 2, this relative pose is given only by a translation in the z coordinate and therefore $\mathbf{R}_{cb} = \mathbf{I}$.

 

Figure 2: Three frame systems and their rigid transformations represented by curved arrows: {c} is the frame system located at the optical center of the camera, {i} represents the inertial or navigation frame system, and {b} is the frame with unknown pose to be referred with respect to {i}. For each frame, local arrows in red, green, and blue color represent the x, y, and z axes, respectively.

D. Reference frame transformation

Both the relative position and orientation between the camera reference system {c} and the reference system of the ground marker attached to the inertial frame {i} can be written as $\mathbf{t}^{c}_{ci}$ and $\mathbf{R}_{ci}$, respectively (Ma et al., 2010). For the vector $\mathbf{t}^{c}_{ci}$, the sub-index ci indicates the starting, {c}, and target, {i}, reference systems, while the supra-index indicates the frame in which the coordinates of the vector are expressed. Concerning the convention for the orientation, the matrix $\mathbf{R}_{ci}$ uniquely represents the orientation of the reference system {i} with respect to the camera reference system {c}. This matrix can also be used to change the coordinates of a vector from the {i} reference system to the camera reference system {c} (Pucheta et al., 2014).
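As a compact summary of this convention (restated here with the symbols adopted above), for any vector expressed in {i},

        \mathbf{v}^{c} = \mathbf{R}_{ci}\,\mathbf{v}^{i}, \qquad \mathbf{R}_{ci}^{-1} = \mathbf{R}_{ci}^{T} = \mathbf{R}_{ic}, \qquad \mathbf{t}^{c}_{ci} \in \mathbb{R}^{3}, \ \ \mathbf{R}_{ci} \in SO(3).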

As was stated, the visual algorithm provides the pose given by $\mathbf{t}^{c}_{ci}$ and $\mathbf{R}_{ci}$, from which the inverted transformation needed by the fusion filter, $\mathbf{t}^{i}_{ib}$ and $\mathbf{R}_{ib}$, is computed.

In order to compute the elements of the unknown pose, $\mathbf{t}^{i}_{ib}$ and $\mathbf{R}_{ib}$, a geometric analysis is performed. Vector addition of relative positions can be formulated as

        \mathbf{t}^{i}_{ib} = \mathbf{t}^{i}_{ic} + \mathbf{t}^{i}_{cb}                                                        (1)

where $\mathbf{t}^{i}_{ic}$ and $\mathbf{t}^{i}_{cb}$ respectively represent the same vectors as $\mathbf{t}^{c}_{ci}$ (although in the opposite sense) and $\mathbf{t}^{c}_{cb}$, but referred to the coordinate system {i} to make the sum consistent. This pair of positions can be obtained by using the change-of-coordinates matrix $\mathbf{R}_{ic}$, which is the inverse (and also the transpose, for being orthogonal) of $\mathbf{R}_{ci}$ computed by the detection algorithm. Then, (1) can be written as

        \mathbf{t}^{i}_{ib} = -\mathbf{R}_{ic}\,\mathbf{t}^{c}_{ci} + \mathbf{R}_{ic}\,\mathbf{t}^{c}_{cb}.                                                        (2)

Provided that

        \mathbf{R}_{ic} = \mathbf{R}_{ci}^{-1} = \mathbf{R}_{ci}^{T},                                                        (3)

(1) can also be written as

        \mathbf{t}^{i}_{ib} = \mathbf{R}_{ci}^{T}\left(\mathbf{t}^{c}_{cb} - \mathbf{t}^{c}_{ci}\right).                                                        (4)

On the other hand, Fig. 2 also depicts that a vector represented in the frame system {b} can be changed of basis to be referred to the system {i}, and that this is equivalent to an indirect change from {b} to {c} followed by a change from {c} to {i}.

An arbitrary vector $\mathbf{v}^{b}$ can be changed of reference system from {b} to {i} by means of

        \mathbf{v}^{i} = \mathbf{R}_{ib}\,\mathbf{v}^{b}                                                        (5)

or, using an intermediate frame, by computing

        \mathbf{v}^{i} = \mathbf{R}_{ic}\,\mathbf{R}_{cb}\,\mathbf{v}^{b}                                                        (6)

where, clearly, the matrix that represents the orientation of the frame system {b} with respect to the frame system {i} is the same one that produces the change of basis for a vector from {b} to {i}. Therefore, the orientation of frame {b} with respect to frame {i} can be written as

        \mathbf{R}_{ib} = \mathbf{R}_{ic}\,\mathbf{R}_{cb}                                                        (7)

or, using the data supplied by the algorithm described by Garrido-Jurado et al. (2014), as

        \mathbf{R}_{ib} = \mathbf{R}_{ci}^{T}\,\mathbf{R}_{cb} = \mathbf{R}_{ci}^{T}.                                                        (8)
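The following Eigen snippet (our own sketch and variable names, assuming the forms of (4) and (8) above) illustrates how the body pose with respect to the inertial frame can be obtained from the ArUco output and the hand-measured camera-body offset.

// Sketch: body pose w.r.t. the inertial frame from the ArUco output
// (R_ci, t_ci) and the hand-measured camera-body translation t_cb, with
// R_cb = I as in Fig. 2. Variable names are illustrative.
#include <Eigen/Dense>

struct BodyPose {
    Eigen::Vector3d t_ib;
    Eigen::Matrix3d R_ib;
};

BodyPose bodyPoseFromMarker(const Eigen::Matrix3d& R_ci,
                            const Eigen::Vector3d& t_ci,
                            const Eigen::Vector3d& t_cb) {
    BodyPose out;
    out.t_ib = R_ci.transpose() * (t_cb - t_ci);  // Eq. (4)
    out.R_ib = R_ci.transpose();                  // Eq. (8), since R_cb = I
    return out;
}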

Figure 3 shows the real setup, similar to the schematic view in Fig. 2, indicating the three reference systems involved, {i}, {c}, and {b}. The 3D-printed sensor holder, which rigidly attaches the camera and the IMU (fixed to the manipulator end-effector), can be seen in the zoomed area.

 

Figure 3: Yellow and orange arrows represent the relative positions and orientations between the different systems. Red, green, and blue arrows represent the x, y, and z axes, respectively.

III. IMU-Camera fusion

The implemented sensor fusion algorithm is based on the Error-State Kalman Filter (ESKF), which performs a loosely coupled sensor fusion (Madyastha et al., 2011; Solà, 2015). The inertial unit is composed of a three-axis accelerometer and a three-axis gyroscope. The implemented filter is based on the one described by Perez Paina et al. (2017).

A. Error-state Kalman Filter

In the ESKF, the state is expressed as a nominal value plus an error or perturbation, $\mathbf{x} = \mathbf{x}_{n} \oplus \delta\mathbf{x}$, where the symbol ⊕ (composition) is an ordinary addition for position and velocity, and a quaternion product for the orientation. The ESKF performs the nominal state propagation while estimating the error state used to correct this integration.
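Written out per component (a restatement of the composition just described, using the notation adopted in this section), the composition reads

        \mathbf{x} = \mathbf{x}_{n} \oplus \delta\mathbf{x}: \qquad \mathbf{p} = \mathbf{p}_{n} + \delta\mathbf{p}, \quad \mathbf{v} = \mathbf{v}_{n} + \delta\mathbf{v}, \quad \mathbf{q} = \mathbf{q}_{n} \otimes \delta\mathbf{q}.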

The formulation of the ESKF is based on the system stochastic model, given by

        \dot{\mathbf{x}} = f\left(\mathbf{x}, \mathbf{u}, \mathbf{n}\right)                                                        (9)

        \mathbf{z} = h\left(\mathbf{x}\right) + \boldsymbol{\eta}                                                        (10)

where the state has to be expressed in terms of the nominal and the error parts.

The nominal state is composed of the position $\mathbf{p}$, the linear velocity $\mathbf{v}$, and the orientation expressed as a unit quaternion $\mathbf{q}$, all with respect to an inertial frame. The inertial measurements are used as input in the process model (9), with $\mathbf{u} = [\mathbf{a}_{m}^{T}\ \ \boldsymbol{\omega}_{m}^{T}]^{T}$ composed of the accelerometer and gyroscope readings. On the other hand, the exteroceptive sensor measurement of (10), given here by a visual monocular pose estimation, is composed of the absolute position and orientation, i.e. $\mathbf{z} = [\mathbf{p}_{m}^{T}\ \ \mathbf{q}_{m}^{T}]^{T}$.

The prediction stage of the ESKF filter is given by

        \mathbf{x}_{k+1} = f_{d}\left(\mathbf{x}_{k}, \mathbf{u}_{k}\right)                                                        (11)

        \mathbf{P}_{k+1} = \mathbf{F}_{k}\,\mathbf{P}_{k}\,\mathbf{F}_{k}^{T} + \mathbf{Q}_{k}                                                        (12)

where (11) is obtained from the discretization of (9) and is used to integrate the nominal state from the inertial measurements. Equation (12) propagates the covariance matrix representing the estimation uncertainty and incorporates the sensor noise of (9) through the covariance $\mathbf{Q}_{k}$, which also has to be obtained from the continuous-time model.

On the other hand, the update stage of the ESKF is given by

        \mathbf{K} = \mathbf{P}\,\mathbf{H}^{T}\left(\mathbf{H}\,\mathbf{P}\,\mathbf{H}^{T} + \mathbf{R}_{z}\right)^{-1}                                                        (13)

        \delta\hat{\mathbf{x}} = \mathbf{K}\left(\mathbf{z} - h(\hat{\mathbf{x}})\right)                                                        (14)

        \mathbf{P} \leftarrow \left(\mathbf{I} - \mathbf{K}\,\mathbf{H}\right)\mathbf{P}                                                        (15)

which uses the camera pose measurement $\mathbf{z}$ in (14) and the sensor model given by (10). Equation (13) incorporates the measurement noise covariance of (10), $\mathbf{R}_{z}$.

As usual, the filter orientation error is denoted in its minimal representation, avoiding redundant parameters that can produce singularities when applying the needed additional constraints. In the context of the Kalman filter it is also known as a multiplicative error model (Markley, 2004; Markley, 2003). Thus, the filter estimation error is

        \delta\mathbf{x} = \begin{bmatrix} \delta\mathbf{p} \\ \delta\mathbf{v} \\ \delta\mathbf{q} \end{bmatrix} = \begin{bmatrix} \mathbf{p} - \mathbf{p}_{n} \\ \mathbf{v} - \mathbf{v}_{n} \\ \mathbf{q}_{n}^{*} \otimes \mathbf{q} \end{bmatrix},                                                        (16)

where for a small orientation error, $\delta\mathbf{q}$ can be expressed as

        \delta\mathbf{q} \approx \begin{bmatrix} 1 \\ \tfrac{1}{2}\,\delta\boldsymbol{\theta} \end{bmatrix},                                                        (17)

with $\delta\boldsymbol{\theta}$ being the minimal orientation error representation. Then, the error $\delta\mathbf{x}$ is used to compute the covariance matrix $\mathbf{P}$, which describes the uncertainty in the filter estimation required in (12), (13), and (15).
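To make the filter structure concrete, the following C++/Eigen sketch (ours, not the authors' implementation) shows the covariance propagation of (12) and the update stage of (13)-(15), including the injection of the estimated error into the nominal state; the dimensions assume the 9-dimensional error state and the 6-dimensional minimal innovation used in this work.

// Schematic ESKF covariance propagation and update (illustrative sketch).
#include <Eigen/Dense>

struct NominalState {
    Eigen::Vector3d p, v;      // position and linear velocity
    Eigen::Quaterniond q;      // orientation (unit quaternion)
};

// Eq. (12): covariance propagation; the nominal-state integration of Eq. (11)
// is sketched separately in Section III.B.2.
void covariancePredict(Eigen::Matrix<double, 9, 9>& P,
                       const Eigen::Matrix<double, 9, 9>& F,
                       const Eigen::Matrix<double, 9, 9>& Q) {
    P = F * P * F.transpose() + Q;
}

void eskfUpdate(NominalState& x, Eigen::Matrix<double, 9, 9>& P,
                const Eigen::Matrix<double, 6, 1>& dz,   // innovation, Eq. (39)
                const Eigen::Matrix<double, 6, 9>& H,
                const Eigen::Matrix<double, 6, 6>& Rz) {
    // Eq. (13): Kalman gain
    Eigen::Matrix<double, 9, 6> K =
        P * H.transpose() * (H * P * H.transpose() + Rz).inverse();
    // Eq. (14): error-state estimate
    Eigen::Matrix<double, 9, 1> dx = K * dz;
    // Eq. (15): covariance update
    P = (Eigen::Matrix<double, 9, 9>::Identity() - K * H) * P;
    // Error injection into the nominal state (the composition of Sec. III.A)
    x.p += dx.segment<3>(0);
    x.v += dx.segment<3>(3);
    Eigen::Vector3d dth = dx.segment<3>(6);
    Eigen::Quaterniond dq(1.0, 0.5 * dth.x(), 0.5 * dth.y(), 0.5 * dth.z());
    x.q = (x.q * dq).normalized();
}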

B. Process model based on inertial sensor measurements

The kinematic model used for integrating the inertial measurements is applied in the prediction stage of the estimation filter. For the Kalman filter implementation, the discrete-time kinematic model is derived from the continuous-time model, which for the case of the error-state Kalman filter has to be separated into: (1) a model for the nominal state integration, and (2) a model for the uncertainty propagation.

1) Continuous-time model: The inertial-sensor-based kinematic continuous-time model is given by

        \dot{\mathbf{p}} = \mathbf{v},                                                        (18)

        \dot{\mathbf{v}} = \mathbf{R}(\mathbf{q})\,\mathbf{a} + \mathbf{g},                                                        (19)

        \dot{\mathbf{q}} = \tfrac{1}{2}\,\mathbf{q} \otimes \boldsymbol{\omega}                                                        (20)

where $\mathbf{R}(\mathbf{q})$ is the matrix representation of the orientation given by the unit quaternion $\mathbf{q}$, $\mathbf{a}$ and $\boldsymbol{\omega}$ are the linear acceleration and angular velocity in the body frame, $\mathbf{g}$ is the gravity acceleration expressed in the inertial frame, and the symbol ⊗ represents the Hamilton product (Trawny & Roumeliotis, 2005), where the first quaternion component is the scalar value.

The accelerometer and gyroscope sensor models are

        \mathbf{a}_{m} = \mathbf{a} + \mathbf{n}_{a}                                                        (21)

        \boldsymbol{\omega}_{m} = \boldsymbol{\omega} + \mathbf{n}_{\omega}                                                        (22)

where the subscript m stands for the measured value, which is composed of the true value plus the additive Gaussian noises $\mathbf{n}_{a}$ and $\mathbf{n}_{\omega}$.

By replacing (21) in (19) and (22) in (20), the continuous-time stochastic model of (9) is obtained, which gives

        \dot{\mathbf{p}} = \mathbf{v},                                                        (23)

        \dot{\mathbf{v}} = \mathbf{R}(\mathbf{q})\left(\mathbf{a}_{m} - \mathbf{n}_{a}\right) + \mathbf{g},                                                        (24)

        \dot{\mathbf{q}} = \tfrac{1}{2}\,\mathbf{q} \otimes \left(\boldsymbol{\omega}_{m} - \mathbf{n}_{\omega}\right).                                                        (25)

Then, the nominal state evolution is given by

        \dot{\mathbf{p}}_{n} = \mathbf{v}_{n},                                                        (26)

        \dot{\mathbf{v}}_{n} = \mathbf{R}(\mathbf{q}_{n})\,\mathbf{a}_{m} + \mathbf{g},                                                        (27)

        \dot{\mathbf{q}}_{n} = \tfrac{1}{2}\,\mathbf{q}_{n} \otimes \boldsymbol{\omega}_{m}                                                        (28)

and the error state by

        \dot{\delta\mathbf{p}} = \delta\mathbf{v},                                                        (29)

        \dot{\delta\mathbf{v}} = -\mathbf{R}(\mathbf{q}_{n})\left[\mathbf{a}_{m}\right]_{\times}\delta\boldsymbol{\theta} - \mathbf{R}(\mathbf{q}_{n})\,\mathbf{n}_{a},                                                        (30)

        \dot{\delta\boldsymbol{\theta}} = -\left[\boldsymbol{\omega}_{m}\right]_{\times}\delta\boldsymbol{\theta} - \mathbf{n}_{\omega},                                                        (31)

where $[\cdot]_{\times}$ denotes the skew-symmetric cross-product matrix.

The derivation of (26) to (31) follows Solà (2015) and Trawny & Roumeliotis (2005).

As can be seen, the error state model of (29) to (31) is linear, which can be expressed as

        \dot{\delta\mathbf{x}} = \mathbf{A}\,\delta\mathbf{x} + \mathbf{B}\,\mathbf{n}                                                        (32)

with $\delta\mathbf{x} = [\delta\mathbf{p}^{T}\ \ \delta\mathbf{v}^{T}\ \ \delta\boldsymbol{\theta}^{T}]^{T}$, and

        \mathbf{A} = \begin{bmatrix} \mathbf{0} & \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & -\mathbf{R}(\mathbf{q}_{n})\left[\mathbf{a}_{m}\right]_{\times} \\ \mathbf{0} & \mathbf{0} & -\left[\boldsymbol{\omega}_{m}\right]_{\times} \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ -\mathbf{R}(\mathbf{q}_{n}) & \mathbf{0} \\ \mathbf{0} & -\mathbf{I} \end{bmatrix},                                                        (33)

with $\mathbf{n} = [\mathbf{n}_{a}^{T}\ \ \mathbf{n}_{\omega}^{T}]^{T}$.

2) Discrete-time model: In order to obtain the discrete-time model, here it is proposed to use the Euler integral solution of (26)-(31) for the nominal state and uncertainty propagation, respectively. The discrete-time evolution of the nominal state (omitting the subscript n), is then given by

        \mathbf{p}_{k+1} = \mathbf{p}_{k} + \mathbf{v}_{k}\,\Delta t,                                                        (34)

        \mathbf{v}_{k+1} = \mathbf{v}_{k} + \left(\mathbf{R}(\mathbf{q}_{k})\,\mathbf{a}_{m,k} + \mathbf{g}\right)\Delta t,                                                        (35)

        \mathbf{q}_{k+1} = \left(\mathbf{I}_{4} + \tfrac{\Delta t}{2}\,\boldsymbol{\Omega}(\boldsymbol{\omega}_{m,k})\right)\mathbf{q}_{k}                                                        (36)

where $\Delta t$ is the sampling period and

        \boldsymbol{\Omega}(\boldsymbol{\omega}) = \begin{bmatrix} 0 & -\boldsymbol{\omega}^{T} \\ \boldsymbol{\omega} & -\left[\boldsymbol{\omega}\right]_{\times} \end{bmatrix}

is a 4×4 skew-symmetric matrix. Therefore, the state transition matrix of (12) results in

        \mathbf{F}_{k} = \mathbf{I} + \mathbf{A}\,\Delta t                                                        (37)

where $\mathbf{A}$ is given by (33).

The discrete-time covariance matrix of the process noise is given by (Simon, 2006)

        \mathbf{Q}_{k} = \int_{0}^{\Delta t} e^{\mathbf{A}\tau}\,\mathbf{B}\,\mathbf{Q}_{c}\,\mathbf{B}^{T}\,e^{\mathbf{A}^{T}\tau}\,d\tau                                                        (38)

where $\mathbf{Q}_{c}$ is the continuous-time covariance of the noise $\mathbf{n}$.

A detailed representation of the sub-matrices of (38) can be found in Perez Paina et al. (2016).
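As an illustration of (34)-(36), the Euler integration of the nominal state can be sketched as follows (our notation; dt is the IMU sampling period and the quaternion step is the first-order integration equivalent to (36)).

// Nominal state propagation by Euler integration, following Eqs. (34)-(36).
#include <Eigen/Dense>

void integrateNominal(Eigen::Vector3d& p, Eigen::Vector3d& v,
                      Eigen::Quaterniond& q,
                      const Eigen::Vector3d& a_m,   // accelerometer reading
                      const Eigen::Vector3d& w_m,   // gyroscope reading
                      double dt) {
    const Eigen::Vector3d g(0.0, 0.0, -9.81);       // gravity in {i}
    p += v * dt;                                    // Eq. (34)
    v += (q.toRotationMatrix() * a_m + g) * dt;     // Eq. (35)
    // Eq. (36): first-order quaternion integration, q <- q (x) [1, 0.5*w_m*dt]
    Eigen::Quaterniond dq(1.0, 0.5 * w_m.x() * dt,
                               0.5 * w_m.y() * dt,
                               0.5 * w_m.z() * dt);
    q = (q * dq).normalized();
}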

C. Observation model based on camera measurements

The measurement vector of the visual pose estimation algorithm is composed of the position and orientation with respect to the inertial frame, i.e. $\mathbf{z} = [\mathbf{p}_{m}^{T}\ \ \mathbf{q}_{m}^{T}]^{T}$, where $\mathbf{p}_{m} = \mathbf{t}^{i}_{ib}$ and $\mathbf{q}_{m}$ is the quaternion of $\mathbf{R}_{ib}$ obtained from the camera measurement. Hence, the measurement function is $\mathbf{z} = h(\mathbf{x}) + \boldsymbol{\eta}$, where $\boldsymbol{\eta}$ is the additive Gaussian noise. In order to compute the filter innovation $\delta\mathbf{z}$, it is necessary to express the orientation using a unit quaternion, such that $\delta\mathbf{z}$ can be computed as an ordinary subtraction for the position and using the quaternion product for the orientation. Given the nominal predicted state, the measurement prediction is $\hat{\mathbf{z}} = h(\hat{\mathbf{x}}_{n}) = [\hat{\mathbf{p}}_{n}^{T}\ \ \hat{\mathbf{q}}_{n}^{T}]^{T}$, and the innovation can be computed as

        \delta\mathbf{z} = \begin{bmatrix} \mathbf{p}_{m} - \hat{\mathbf{p}}_{n} \\ \hat{\mathbf{q}}_{n}^{*} \otimes \mathbf{q}_{m} \end{bmatrix},                                                        (39)

and considering a small orientation error

        \hat{\mathbf{q}}_{n}^{*} \otimes \mathbf{q}_{m} \approx \begin{bmatrix} 1 \\ \tfrac{1}{2}\,\delta\boldsymbol{\theta} \end{bmatrix},                                                        (40)

then, the minimal representation of the innovation is $\delta\mathbf{z} = [(\mathbf{p}_{m} - \hat{\mathbf{p}}_{n})^{T}\ \ \delta\boldsymbol{\theta}^{T}]^{T}$.

On the other hand, the Jacobian matrix $\mathbf{H}$ of the measurement model, used in (13) and (15), has to be defined. Given that the visual algorithm provides absolute position and orientation measurements, this matrix results in

        \mathbf{H} = \begin{bmatrix} \mathbf{I}_{3} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{I}_{3} \end{bmatrix}.                                                        (41)

The covariance of the camera measurements, used in (13), is $\mathbf{R}_{z} = \mathrm{diag}(\mathbf{R}_{t}, \mathbf{R}_{\theta})$, which is composed of the covariance matrices describing the uncertainty $\mathbf{R}_{t}$ in the position and $\mathbf{R}_{\theta}$ in the orientation expressed with Euler angles. For a more complex sensor, the measurement Jacobian matrix can be obtained as explained in Solà (2015).
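For completeness, the innovation of (39)-(40) can be computed from the camera pose measurement and the predicted nominal state as in the following sketch (our own variable names).

// Innovation from a camera pose measurement, following Eqs. (39)-(40):
// ordinary difference for the position and small-angle quaternion error
// for the orientation.
#include <Eigen/Dense>

Eigen::Matrix<double, 6, 1> innovation(const Eigen::Vector3d& p_meas,
                                       const Eigen::Quaterniond& q_meas,
                                       const Eigen::Vector3d& p_pred,
                                       const Eigen::Quaterniond& q_pred) {
    Eigen::Matrix<double, 6, 1> dz;
    dz.segment<3>(0) = p_meas - p_pred;                  // position part
    Eigen::Quaterniond dq = q_pred.conjugate() * q_meas; // dq = q_pred^* (x) q_meas
    if (dq.w() < 0.0) dq.coeffs() *= -1.0;               // keep the short rotation
    dz.segment<3>(3) = 2.0 * dq.vec();                   // small-angle error, Eq. (40)
    return dz;
}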

IV. Implementation and Results

A. Implementation

The experimental results were obtained using the following hardware: an inertial measurement unit from Microstrain, model 3DM-GX1, from which raw data was used, and a camera from The Imaging Source, model DMM22BUC03 (USB 2.0, 640x480 px), with an optical lens also from The Imaging Source, model TBN3.5C-3MP (low-distortion board lens, M12x0.5 mm, focal length 3.5 mm). The fusion filter was implemented in the C++ programming language using the computer vision library OpenCV version 3.3.0, with the contribution module (also v3.3.0) for the ArUco algorithm implementation, and the Eigen library v3.3 for the filter matrix operations.

B. Results

The experimental evaluation of the fusion filter was carried out using two different paths at constant linear velocity: a circle and a square. Figure 4 shows the estimation result for the circular path. In Fig. 4a the x-y path can be observed: the red line represents the real path generated by the robot manipulator and the blue one is the estimation given by the fusion filter, whereas Fig. 4b shows the position in meters and the linear velocity in meters per second against time in seconds. Figure 5 shows the same as Fig. 4 but for the square path.

As can be appreciated in the top view of the x-y path given in Figs. 4a and 5a, the estimation presents more noise near the x-z and y-z planes. When approaching these particular regions, the visual algorithm gives an orientation estimation that oscillates between both sides of the x or y axes. Then, when the camera measurement is inverted to obtain the pose with respect to the navigation frame, the orientation oscillation increases the error in the translation estimation. The elimination of these effects will be the focus of future research.

Fig. 4: Circular path: (a) shows the reference path in red and the estimated one in blue, and (b) shows the position and linear velocity vs. time.

Fig. 5: Square path: (a) shows the reference path in red and the estimated one in blue, and (b) shows the position and linear velocity vs. time.

V. Conclusions

This work presented preliminary results of experiments carried out to test a fusion algorithm that integrates the measurements of a down-looking camera with an inertial measurement unit, which will take part in the full-pose feedback signal of a UAV control system. An industrial robot was used to generate planar trajectories at constant linear velocity, a circle and a square, to test the algorithm for position estimation. The results for each path show position and velocity estimation errors small enough for the algorithm to be integrated on real UAVs. Next experiments will include the orientation estimation in order to test the full pose estimation. It is also expected that the level of noise in a real UAV will be higher than that produced by the robot.

Acknowledgments

This work was supported by the Universidad Tecnológica Nacional (Projects PID-UTN 3935, UTI3832TC, AMUTICO0007819TC and CCUTNCO0007805). The authors thank Nelson Vallejos, from the Mechanical Engineering Department, UTN-FRC, for assisting with the robot programming.

Referencias

Bradski, G. & Kaehler, A. (2008). Learning OpenCV: Computer vision with the OpenCV library. Boston, MA, USA: O’Reilly Media, Incorporated.

Garrido-Jurado, S., Muñoz-Salinas, R., Madrid-Cuevas, F., & Marín-Jiménez, M. (2014). Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recognition, 47(6):2280–2292. doi: 10.1016/j.patcog.2014.01.005

Kumar, V. & Michael, N. (2012). Opportunities and challenges with autonomous micro aerial vehicles. The International Journal of Robotics Research, 31(11):1279–1291. doi: 10.1177/0278364912455954

Ma, Y., Soatto, S., Kosecká, J., & Sastry, S. S. (2010). An Invitation to 3-D Vision: From Images to Geometric Models. Interdisciplinary Applied Mathematics, Berlin: Springer.

Madyastha, V., Ravindra, V., Mallikarjunan, S., & Goyal, A. (2011). Extended Kalman Filter vs. Error State Kalman Filter for Aircraft Attitude Estimation. Guidance, Navigation, and Control and Co-located Conferences. 8-11 August, Portland, Oregon. doi: 10.2514/6.2011-6615

Markley, F. L. (2003). Attitude Error Representations for Kalman Filtering. Journal of Guidance, Control, and Dynamics, 26(2):311–317. doi: 10.2514/2.5048

Markley, F. L. (2004). Multiplicative vs. additive filtering for spacecraft attitude determination. In: 6th Conference on Dynamics and Control of Systems and Structures in Space (DCSSS). Riomaggiore, Italy: NACA.

Perez Paina, G., Paz, C., Kulich, M., Saska, M., & Araguás, G. (2016). Fusion of Monocular Visual-Inertial Measurements for Three Dimensional Pose Estimation, volume 9991 of Lecture Notes in Computer Science, chapter 20, pages 242–260. Cham, Switzerland: Springer International Publishing.

Perez Paina, G., Pucheta, M., & Paz, C. (2017). IMU- and exteroceptive sensor-based fusion for UAV control. In: Proceedings of the XVI Workshop on Information Processing and Control RPIC 2017, pages 1-6. Mar del Plata, Argentina: AADECA. doi: 10.23919/RPIC.2017.8214335

Pucheta, M. A., Alberto, N., Paz, C., & Perez Paina, G. (2016). Trajectory planning for an unmanned quadrotor. In: Giusti, S., Pucheta, M., and Storti, M. (eds.), Mecánica Computacional Vol. XXXIV, Proc. of the XXII Congreso sobre Métodos Numéricos y sus Aplicaciones ENIEF 2016, pages 2809–2824, Córdoba, Argentina: AMCA.

Pucheta, M. A., Paz, C. J., & Pereyra, M. E. (2014). Representaciones cinemáticas de orientación y ecuaciones de estimación. In: Bertolino, G., Cantero, M., Storti, M. and Teruel, F. (eds.) Mecánica Computacional Vol. XXXIII, Proc. of the XXI Congreso sobre Métodos Numéricos y sus Aplicaciones ENIEF 2014, pages 2303–2324, Bariloche, Argentina: AMCA.

Simon, D. (2006). Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. New York, USA: Wiley-Interscience.

Solà, J. (2015). Quaternion kinematics for the error-state Kalman filter. Preprint arXiv: 1711.02508.

Trawny, N. & Roumeliotis, S.I. (2005). Indirect Kalman filter for 3D attitude estimation. Technical Report 2005-002 Rev. 57, University of Minnesota, Department of Computer Science & Engineering.