ArUco Marker Tracking with OpenCV

Ali Yasin Eser
6 min read · Jun 27, 2020


Hello everyone! While working on my graduation project, I realized there is not enough documentation for ArUco marker tracking. My goal is to give a brief introduction. I won’t focus on math, but you can check out the functions I’ve been using.

Let’s talk about markers first. Markers are reference shapes that give us enough information to go from the 2D image to 3D space. ArUco markers are square binary (black and white) markers. Each marker is stored in an ArUco dictionary as a binary pattern. After candidate markers are detected in the image, they are compared with the patterns in the dictionary to work out which marker has which id. You can check the documentation here.

ArUco Marker.

According to the OpenCV documentation, the steps to find ArUco markers in an image are:

  1. Finding marker candidates: The first step is finding square shapes so that we have all the candidates.
  • We apply an adaptive threshold to the image first. Adaptive thresholding slides a window (3x3, 5x5, 11x11, …) over the image and computes a threshold for each window from the gray values inside it. Pixels below that local threshold become black and pixels above it become white.
  • From the binary image, we find contours. Contours that are not convex or not close to a square shape are dropped. Additional filters remove candidates whose edges are too small or too big, that are too close to each other, and so on.

  2. Marker candidate search: Convert each candidate to a binary matrix and search for it in the dictionary.

  • A perspective transform is applied to each square-shaped candidate to obtain the marker in its canonical (front-facing) form. After that, Otsu thresholding is applied. Otsu’s method picks the threshold from the image histogram that best separates black and white pixels, that is, the one that minimizes the variance within the two groups.
  • Let’s say we are searching for 5x5 markers (this is defined in the code, we’ll come to that). The canonical image is divided into a 7x7 grid of cells: 5x5 data cells plus a one-cell black border. Since the image is already thresholded, each cell is either black or white, so it is easy to convert the grid to a binary matrix! The matrix is searched in the dictionary and, if it matches a marker, we have its id.
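
The adaptive thresholding from step 1 can be sketched in plain Python. This is only an illustration, not OpenCV’s implementation: the helper name adaptive_threshold, the mean-of-window rule and the offset c are assumptions chosen for readability (in OpenCV the corresponding call is cv2.adaptiveThreshold).

```python
def adaptive_threshold(image, window=3, c=0):
    """Binarize a 2D list of gray values using the mean of a local window.

    A pixel becomes 255 (white) if it is brighter than the mean of the
    window x window neighborhood around it minus c, otherwise 0 (black).
    """
    height, width = len(image), len(image[0])
    half = window // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighborhood, clipped at the image borders
            values = [
                image[ny][nx]
                for ny in range(max(0, y - half), min(height, y + half + 1))
                for nx in range(max(0, x - half), min(width, x + half + 1))
            ]
            local_mean = sum(values) / len(values)
            result[y][x] = 255 if image[y][x] > local_mean - c else 0
    return result


# A dark square on a background whose brightness rises from left to right
image = [
    [100, 120, 140, 160, 180],
    [100,  20,  30,  40, 180],
    [100,  20,  30,  40, 180],
    [100,  20,  30,  40, 180],
    [100, 120, 140, 160, 180],
]
binary = adaptive_threshold(image, window=3, c=0)
```

Because each pixel is compared with its own neighborhood, the dark square stays black even though the background brightness is uneven, which is why a local threshold is preferred over a single global one at this stage.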
(Left to right) Original image, detected markers and their ids, rejected candidates (marked in red).
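
Step 2 can be sketched in the same way. The code below is a simplified illustration under several assumptions: the canonical image is a small 2D list, the dictionary is a made-up toy dictionary with 2x2 inner bits (TOY_DICTIONARY is not a real ArUco dictionary), and the error correction that the real detector applies is left out. All function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the threshold that maximizes the between-class variance,
    which is equivalent to minimizing the within-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]           # pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg  # pixels with value > t
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (total_sum - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def cells_to_bits(image, cells):
    """Split a square gray image into cells x cells blocks. A block is 1
    when most of its pixels are above the Otsu threshold, else 0."""
    step = len(image) // cells
    t = otsu_threshold([p for row in image for p in row])
    bits = []
    for cy in range(cells):
        row = []
        for cx in range(cells):
            block = [image[y][x]
                     for y in range(cy * step, (cy + 1) * step)
                     for x in range(cx * step, (cx + 1) * step)]
            white = sum(1 for p in block if p > t)
            row.append(1 if 2 * white > len(block) else 0)
        bits.append(row)
    return bits


def rotate(bits):
    """Rotate a square bit matrix 90 degrees clockwise."""
    return [list(row) for row in zip(*bits[::-1])]


def identify(bits, dictionary):
    """Check the black border, then look the inner bits up in all four
    rotations. Returns the marker id, or None if nothing matches."""
    if any(bits[0]) or any(bits[-1]) or any(r[0] or r[-1] for r in bits):
        return None
    inner = [row[1:-1] for row in bits[1:-1]]
    for _ in range(4):
        for marker_id, pattern in dictionary.items():
            if inner == pattern:
                return marker_id
        inner = rotate(inner)
    return None


# Toy dictionary (an assumption for this sketch): id -> inner bit pattern
TOY_DICTIONARY = {0: [[1, 0], [0, 0]], 1: [[1, 1], [0, 1]]}

# Marker 1 seen rotated by 90 degrees, with its one-cell black border
observed_bits = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
# Render it as an 8x8 gray image: dark cells are 30, bright cells are 220
canonical = [[220 if observed_bits[y // 2][x // 2] else 30 for x in range(8)]
             for y in range(8)]

marker_id = identify(cells_to_bits(canonical, 4), TOY_DICTIONARY)
```

Trying all four rotations is what lets a marker be recognized regardless of how it is turned in front of the camera.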

And now, we can estimate the poses of the markers. The ArUco pose functions return two vectors per marker: the translation (position) and the rotation. Together they describe the transform from the marker’s coordinate system to the camera’s coordinate system, so they give the pose of the marker as seen from the camera. That is enough if you need distances or other 3D info for one marker, and when you have more than one marker, having everything in the same camera-centered frame is the better option most of the time. I’ll get into that in future articles. The rotation vector is a Rodrigues (axis-angle) vector: its direction is the rotation axis and its length is the rotation angle in radians. The translation vector is the 3D position of the marker center in the camera frame, in x, y, z order.
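
To make the two vectors more concrete, here is a small sketch in plain Python that turns a Rodrigues vector into a rotation matrix and inverts the pose to get the camera position in the marker’s frame. The function names and the sample values are assumptions for illustration; in OpenCV the conversion itself is done by cv2.Rodrigues.

```python
import math


def rodrigues_to_matrix(rvec):
    """Convert an axis-angle (Rodrigues) vector to a 3x3 rotation matrix."""
    theta = math.sqrt(sum(v * v for v in rvec))  # rotation angle in radians
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (v / theta for v in rvec)       # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v,      kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v,      ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def camera_in_marker_frame(rvec, tvec):
    """Invert the marker-to-camera transform (R, t) to get (R^T, -R^T t)."""
    R = rodrigues_to_matrix(rvec)
    R_inv = [[R[j][i] for j in range(3)] for i in range(3)]  # transpose
    position = [-sum(R_inv[i][j] * tvec[j] for j in range(3))
                for i in range(3)]
    return R_inv, position


# Sample pose: marker 0.5 units in front of the camera, facing it
# (rotated 180 degrees about the x axis)
rvec = [math.pi, 0.0, 0.0]
tvec = [0.0, 0.0, 0.5]
R_inv, camera_position = camera_in_marker_frame(rvec, tvec)
# camera_position is approximately [0, 0, 0.5]: the camera sits on the
# marker's z axis, 0.5 units away from the marker center
```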

How do we get depth from a 2D image? We need one more parameter to calculate depth. We print these markers on paper, which means they have a known side length. From that length and 4 points (the corners of the marker), we have enough info to get the 3D position! The translation vector comes out in the same unit as the length you provide, so use the same unit everywhere (including the one you used for camera calibration) and measure the length carefully. Even a tiny mistake can be a problem if the marker is small and far away from the camera.
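
A rough way to see why the marker length matters is the pinhole camera relation for a marker facing the camera head-on: depth = focal length in pixels × marker length / apparent side length in pixels. The sketch below uses this simplified relation only; the actual pose estimation solves for the full pose from all four corners. The focal length and pixel sizes are made-up sample values.

```python
def depth_from_marker(focal_px, marker_length, side_px):
    """Distance along the optical axis for a marker facing the camera.

    focal_px: focal length in pixels (from the calibration matrix)
    marker_length: printed side length, in the unit you want the depth in
    side_px: apparent side length of the marker in the image, in pixels
    """
    return focal_px * marker_length / side_px


focal_px = 800.0  # hypothetical focal length

# A 2 cm marker that appears 40 px wide is about 0.4 m away
near = depth_from_marker(focal_px, 0.02, 40.0)

# Measuring the marker as 2.1 cm instead of 2 cm shifts the depth by 5%
near_wrong_length = depth_from_marker(focal_px, 0.021, 40.0)

# Far away the same marker covers only 8 px, so a single pixel of
# corner error changes the estimate by more than 20 cm
far = depth_from_marker(focal_px, 0.02, 8.0)
far_one_px_off = depth_from_marker(focal_px, 0.02, 9.0)
```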

Before the coding, make sure you have calibrated your camera. If you haven’t, you can follow my calibration guide. Once your camera is calibrated, you have your calibration matrix and distortion coefficients, and we can start:

import numpy as np
import cv2
import cv2.aruco as aruco

cap = cv2.VideoCapture(1)  # Get the camera source

def track(matrix_coefficients, distortion_coefficients):
    while True:
        ret, frame = cap.read()
        if not ret:  # Stop if no frame could be read
            break
        # operations on the frame come here
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Convert to grayscale

VideoCapture(1) uses the second camera; change the value to 0 if you want to use your first (and maybe only) camera. Our track function requires the calibration matrix and distortion coefficients. Since we want a video stream, we create an infinite loop, read each frame, and convert it to grayscale. ArUco works on a grayscale image because the threshold operations require one.

        aruco_dict = aruco.Dictionary_get(aruco.DICT_5X5_250)  # Use the 5x5 dictionary to find markers
        parameters = aruco.DetectorParameters_create()  # Marker detection parameters
        # Lists of ids and the corners belonging to each id
        corners, ids, rejected_img_points = aruco.detectMarkers(gray, aruco_dict, parameters=parameters, cameraMatrix=matrix_coefficients, distCoeff=distortion_coefficients)

The first thing is choosing the dictionary. ArUco markers come in sizes from 4x4 to 7x7, each with 50, 100, 250 or 1000 available ids. If you need fewer markers, use a smaller dictionary because the search will be faster. Get the dictionary with Dictionary_get and create the detector parameters with DetectorParameters_create (in OpenCV 4.7 and later, these are aruco.getPredefinedDictionary and aruco.DetectorParameters()). Now we can use the detectMarkers function to find our markers.
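
The predefined dictionary constants follow a regular naming pattern (DICT_4X4_50 up to DICT_7X7_1000), so picking the smallest dictionary that fits your project can be sketched as below. The helper dictionary_name is hypothetical and only builds the constant’s name; it does not touch OpenCV itself.

```python
SIZES = (4, 5, 6, 7)            # marker sizes in bits per side
COUNTS = (50, 100, 250, 1000)   # number of ids per dictionary


def dictionary_name(bits, needed_ids):
    """Return the name of the smallest predefined dictionary with the
    given marker size that holds at least needed_ids markers."""
    if bits not in SIZES:
        raise ValueError("marker size must be 4, 5, 6 or 7")
    for count in COUNTS:
        if needed_ids <= count:
            return f"DICT_{bits}X{bits}_{count}"
    raise ValueError("no predefined dictionary has that many ids")


name_for_article = dictionary_name(5, 250)  # the dictionary used above
name_for_small_setup = dictionary_name(4, 30)
```

With the imports used in this article, the name can then be resolved with getattr(aruco, name) and passed to Dictionary_get.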

The parameters of the function:

  • gray: Grayscale image.
  • aruco_dict: The dictionary we created, we’ll search in it.
  • parameters: Detector params.
  • cameraMatrix: Calibration matrix from the camera calibration process.
  • distCoeff: Distortion coefficients from the camera calibration process.

Function output:

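  • corners: For every marker the function has found, we get its 4 corners as (x, y) pixel coordinates, in clockwise order starting from the top-left corner. For N markers, the result is a list of N arrays, each of shape (1, 4, 2).
  • ids: ids of the markers, in the same order as corners. It is None when no marker is found.
  • rejected_img_points: Corner points of the marker candidates that were rejected by the function.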
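
To illustrate how corners and ids line up, here is a small sketch that pairs them and computes the average of the four corners as an approximate marker center in pixels. The sample values are made up and written as nested lists that mimic the shapes detectMarkers returns; the helper name marker_centers is hypothetical. The same indexing works on the NumPy arrays you get from OpenCV.

```python
def marker_centers(corners, ids):
    """Map each marker id to the mean of its four corner points."""
    centers = {}
    for marker_corners, marker_id in zip(corners, ids):
        points = marker_corners[0]  # the four (x, y) corners of one marker
        cx = sum(p[0] for p in points) / 4.0
        cy = sum(p[1] for p in points) / 4.0
        centers[int(marker_id[0])] = (cx, cy)
    return centers


# Two hypothetical detections: corners has shape (N, 1, 4, 2) overall,
# ids has shape (N, 1)
sample_corners = [
    [[[100.0, 100.0], [140.0, 100.0], [140.0, 140.0], [100.0, 140.0]]],
    [[[300.0, 200.0], [320.0, 200.0], [320.0, 220.0], [300.0, 220.0]]],
]
sample_ids = [[7], [42]]

centers = marker_centers(sample_corners, sample_ids)
```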

        if ids is not None:  # If the detector found any markers
            for i in range(len(ids)):  # Iterate over the markers
                # Estimate the pose of each marker; rvec and tvec are the rotation and translation vectors
                rvec, tvec, marker_points = aruco.estimatePoseSingleMarkers(corners[i], 0.02, matrix_coefficients, distortion_coefficients)
                aruco.drawDetectedMarkers(frame, corners)  # Draw a square around the markers
                aruco.drawAxis(frame, matrix_coefficients, distortion_coefficients, rvec, tvec, 0.01)  # Draw the axes

If ids is not None, we have markers. We can iterate over the corners or ids and call estimatePoseSingleMarkers for each one. The parameters are almost the same as before, except for the 0.02 value. This is my marker side length, in the same metric unit that I used for my calibration. We get the rvec and tvec vectors, which are the rotation and translation vectors I mentioned above. We can also use drawAxis to draw the marker axes and make sure the markers are found correctly. I pass frame, the original color image from the camera (BGR in OpenCV), so the drawing is in color. The value 0.01 is the length of the axes; I used half the marker length since the axes are drawn from the center of the marker.
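
Once you have tvec, the straight-line distance between the camera and the marker center is simply the length of that vector. The sketch below assumes the (1, 1, 3) shape that estimatePoseSingleMarkers returns when it is given the corners of a single marker; the sample values and the helper name marker_distance are made up.

```python
import math


def marker_distance(tvec):
    """Euclidean distance from the camera to the marker center, in the
    same unit as the marker length passed to the pose estimation."""
    x, y, z = tvec[0][0]  # unpack the single translation vector
    return math.sqrt(x * x + y * y + z * z)


# Hypothetical result for one marker: 3 cm right, 4 cm up, 12 cm ahead
sample_tvec = [[[0.03, -0.04, 0.12]]]
distance = marker_distance(sample_tvec)  # about 0.13
```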

You can track the ArUco markers now. The whole function is below:

Keep in mind that a lot of factors affect the error rate: the quality of the camera calibration, lighting, environment, etc. Let’s look at our pose estimation:

Marker estimation and tracking.

Not bad. It would be better if I used a solid plate and smiled a bit. I tried to explain how ArUco tracking works. I hope it will be helpful for some of you out there. In my next article, I will show how you can get the relative positions of the markers. Have a great day!





Ali Yasin Eser

iOS Developer with Computer Vision and Embedded Systems background. Solo musician with 3 albums.