How to Track an Object in a Video With the Help of Math

October 15, 2020

Working with video content is fun, but it also demands care and attention to detail. That is especially true for tasks that require high accuracy, such as catching and following elements in a video. But nothing is impossible.

It’s easy to be convinced that complex tasks require complex solutions. Well, we are happy to tell you that’s not always the case. In our work, we had one particular task that seemed pretty complicated at first glance: We needed an app to select and follow some dynamic objects and allow a user to interact with them. But how can this be done? You may be surprised, but in this case, math saved the day!

A Few Words About the Project

Our team works on a variety of projects and faces different challenges every day. We would like to tell you about one project where a scientific approach was our key to success and made the application work the way it does today.

We are talking about the Drive Focus app. This is a video-based tablet app for Android and iOS that helps drivers rehabilitate after difficult medical conditions or life experiences that have affected their driving. The app is of particular use for driving schools, universities, and medical research centers.

We needed to make an application for helping both professional and novice drivers enhance their skills. In the app, users would watch short interactive videos of real driving experiences and tap on various critical objects on the road. The app also had to be suitable for people recovering after complex medical conditions or life experiences that impaired their driving.

As a result, we have developed a highly innovative application for learning and improving driving skills. It’s suitable for beginner drivers and war veterans, as well as those who are suffering from a medical condition. In the future, the app might be expanded to train emergency and firefighting crews. It’s currently promoted by driving schools, universities, meetups, and conferences, and is available on Google Play and the App Store for $12.99.

Why We Need to Capture Areas in a Video

We had to achieve the right balance between the gaming element and the education component. Even though we had to attract users’ attention, we also could not forget about the app’s primary task, which is to train users in the visual driving environment.

As a user plays through a drive, several critical items, such as traffic signs, other cars, or pedestrians, may be on screen at the same time. The user has to tap them in order of priority. All of the tappable areas therefore have to stay synchronized with the video, changing their sizes and positions as the objects move. To resolve this challenge, we used linear interpolation. The same approach is frequently applied in animation, where keyframes mark up the video and the values between them are calculated.


Linear Interpolation as the Best Solution

To let users interact with objects in the video, we decided to combine two simple techniques: cropping and linear interpolation.

We selected the necessary areas (for example, a car) on a single video frame and cropped them. However, objects move as the video plays, so the selected areas have to move with them, and cropping a new frame for every millisecond of the video would be far too inefficient.

Linear interpolation helped us solve this problem. The essence of interpolation is to use the data you already have to estimate values at points where no data was recorded.

With this method, we select the object we need at point A and at point B and use interpolation to calculate the values between them. These values set the required trajectory of the object.


At point A and point B, we know the object's coordinates and size. Suppose the object has to move from point A to point B in 10 seconds. As the video plays, linear interpolation calculates the intermediate coordinates and size for each moment, and these values set the object's trajectory.
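The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in Drive Focus: the `Box` type, the `lerp` and `box_at` functions, and the sample coordinates are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangular area: top-left corner plus size (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: returns a at t=0 and b at t=1."""
    return a + (b - a) * t


def box_at(start: Box, end: Box, t_start: float, t_end: float, now: float) -> Box:
    """Area at playback time `now`, clamped to the [t_start, t_end] segment."""
    if t_end <= t_start:
        raise ValueError("t_end must be greater than t_start")
    # Fraction of the segment that has elapsed, limited to the range 0..1.
    t = (now - t_start) / (t_end - t_start)
    t = max(0.0, min(1.0, t))
    return Box(
        lerp(start.x, end.x, t),
        lerp(start.y, end.y, t),
        lerp(start.width, end.width, t),
        lerp(start.height, end.height, t),
    )


# Point A at 0 s, point B at 10 s, queried halfway through the segment.
point_a = Box(x=10.0, y=40.0, width=20.0, height=10.0)
point_b = Box(x=50.0, y=30.0, width=30.0, height=20.0)
halfway = box_at(point_a, point_b, t_start=0.0, t_end=10.0, now=5.0)
# halfway == Box(x=30.0, y=35.0, width=25.0, height=15.0)
```

Position and size are interpolated with the same fraction, so the area moves and scales together as the object approaches or recedes.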


Why Linear Interpolation

Crop and interpolation proved to be an efficient and simple method to solve this problem.

We were a little surprised to find almost no ready-made solutions for this kind of problem. Some video editors can do it, and we studied them while working out our own way of handling videos. These editors also transform an object as the video plays, but they use very complex interpolation methods.

So we built something similar, but based on linear interpolation and with the ability to make edits manually.

As a result, our method can be summed up in a few steps:

  • Select the necessary area on the video, setting the first control point;
  • Fast-forward the video and select the second control point;
  • Play the video. The trajectory will be drawn on the canvas;
  • If the object goes beyond the selected area at some point, add more control points there. This makes the object's trajectory more accurate.
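With more than two control points, the steps above amount to piecewise linear interpolation: find the two control points that surround the current playback time and interpolate between them. The sketch below is one possible way to write it, assuming control points are stored as time-sorted tuples. The `Keyframe` alias, the `area_at` function, and the sample track are hypothetical and are not taken from the app.

```python
from bisect import bisect_right

# Each control point: (time in seconds, (x, y, width, height)).
Keyframe = tuple[float, tuple[float, float, float, float]]


def area_at(keyframes: list[Keyframe], now: float) -> tuple[float, ...]:
    """Interpolate the area at time `now` from time-sorted control points."""
    if not keyframes:
        raise ValueError("at least one control point is required")
    times = [t for t, _ in keyframes]
    # Before the first or after the last control point, hold the nearest area.
    if now <= times[0]:
        return keyframes[0][1]
    if now >= times[-1]:
        return keyframes[-1][1]
    # Index of the first control point strictly after `now`.
    i = bisect_right(times, now)
    (t0, a), (t1, b) = keyframes[i - 1], keyframes[i]
    t = (now - t0) / (t1 - t0)
    return tuple(p + (q - p) * t for p, q in zip(a, b))


# Three control points; the middle one was added to correct the trajectory.
track = [
    (0.0, (10.0, 40.0, 20.0, 10.0)),
    (4.0, (30.0, 40.0, 20.0, 10.0)),
    (10.0, (60.0, 20.0, 40.0, 20.0)),
]
print(area_at(track, 2.0))  # (20.0, 40.0, 20.0, 10.0)
print(area_at(track, 7.0))  # (45.0, 30.0, 30.0, 15.0)
```

Adding a control point only changes the two segments next to it, which is what makes manual corrections cheap.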

Another important note here is that videos can be in different formats. For this reason, all crops are saved as percentages of the video dimensions rather than as pixel values. This way, no matter what format the video is displayed in, the required area will always be captured correctly.
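As a rough illustration of why percentages help, the sketch below converts one stored area to pixels for two different display sizes. It assumes that x and width are percentages of the video width and that y and height are percentages of the video height. The `to_pixels` function and the sample numbers are hypothetical.

```python
def to_pixels(area_pct, video_width, video_height):
    """Convert an (x, y, width, height) area given in percent to pixels."""
    x, y, w, h = area_pct
    return (
        x * video_width / 100,
        y * video_height / 100,
        w * video_width / 100,
        h * video_height / 100,
    )


# The same stored crop, shown on two different display sizes.
crop = (25.0, 50.0, 10.0, 20.0)
print(to_pixels(crop, 1280, 720))   # (320.0, 360.0, 128.0, 144.0)
print(to_pixels(crop, 1920, 1080))  # (480.0, 540.0, 192.0, 216.0)
```

The stored values never change. Only the conversion at display time depends on the actual video size.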

Conclusion: Motion Tracking in Video

To sum up, we have created a unique training app for drivers that combines gaming elements with an educational component. Along the way, we resolved quite a few challenges related to the app’s videos.

For the project to succeed and take exactly the form the client needed, we had to study ways to cut out areas of the video so that users could interact with the objects in them. We managed to solve this challenge with the help of good old mathematics, or more precisely, linear interpolation.

Thanks to this method, we no longer need to select the necessary object in the video over and over again. You just set the main points, and the interpolation does the rest. The only thing left is to check the result.
