SUMMARY

We are going to optimize the performance of an open-source object tracker on a smartphone. The aim of the project is to make efficient use of all the heterogeneous execution resources that a smartphone provides in order to boost the performance of the circulant object tracking algorithm.

15618 Project

Optimizing BoofCV object tracker on a smartphone

Ashutosh Tadkase (atadkase)

Shreedutt Hegde (shreeduh)

BACKGROUND

Object tracking in a video is a compute-intensive task. Successive frames of the video have to be compared in order to determine the motion of the object and track it across frames. The work is inherently parallel across the pixels of a frame, and this axis can be exploited to speed up object tracking. A second axis of parallelism may exist across groups of frames.

The tracker that we are proposing to optimize is not scale invariant. Its performance on a single core degrades with larger frames. We aim to optimize it and measure the performance improvement gained from the various heterogeneous execution resources available on a modern smartphone. Today's smartphones come equipped with multi-core CPUs, GPUs, multi-core DSPs and other ASICs. These execution resources should help us exploit the parallelism in this application.
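To make the per-pixel axis of parallelism concrete, here is a minimal Java sketch that splits a frame by rows and processes the rows as independent tasks. The element-wise product is only a stand-in for per-pixel work, and the class and method names are our own illustration, not part of BoofCV.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Illustrative only: a per-pixel operation split across the rows of a frame. */
final class PixelParallelSketch {

    /** Sequential reference: element-wise product of two row-major float images. */
    static float[] multiplySequential(float[] a, float[] b, int width, int height) {
        float[] out = new float[width * height];
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] * b[i];
        }
        return out;
    }

    /** Same operation, with each row handled as an independent task. */
    static float[] multiplyParallel(float[] a, float[] b, int width, int height) {
        float[] out = new float[width * height];
        // Rows write to disjoint slices of `out`, so no synchronization is needed.
        IntStream.range(0, height).parallel().forEach(y -> {
            int rowStart = y * width;
            for (int x = 0; x < width; x++) {
                out[rowStart + x] = a[rowStart + x] * b[rowStart + x];
            }
        });
        return out;
    }

    public static void main(String[] args) {
        int width = 640, height = 480; // assumed frame size, for illustration
        float[] a = new float[width * height];
        float[] b = new float[width * height];
        for (int i = 0; i < a.length; i++) {
            a[i] = i % 255;
            b[i] = 0.5f;
        }
        float[] seq = multiplySequential(a, b, width, height);
        float[] par = multiplyParallel(a, b, width, height);
        System.out.println("results match: " + Arrays.equals(seq, par));
    }
}
```

Because no row depends on another, the same decomposition could in principle be mapped onto CPU threads, GPU work-items, or DSP threads.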

THE CHALLENGE

  1. Compared to a multicore PC equipped with a GPU, a smartphone has lower-performance execution resources, so applications that perform well on a computer may fare much worse on a smartphone. Smartphones also typically have less memory and run many background processes, which reduces the resources effectively available to the application.

  2. Since we are using heterogeneous computing resources, the speed of execution differs between the pipelines. One challenge is to make scheduling decisions that keep all the resources busy and deliver maximum throughput at low latency. In addition, the interconnects on a smartphone might make the application bandwidth-bound.
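One simple way to think about the scheduling problem is a static split of each frame in proportion to how fast each resource is. The sketch below is an assumption-laden illustration, not our implementation: the throughput numbers are made up, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: static split of frame rows across execution resources. */
final class RowPartitionSketch {

    /**
     * Splits `rows` among resources in proportion to their relative throughput.
     * Returns the number of rows assigned to each resource; the counts sum to `rows`.
     */
    static int[] partition(int rows, double[] throughput) {
        double total = 0;
        for (double t : throughput) {
            total += t;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("total throughput must be positive");
        }
        int[] counts = new int[throughput.length];
        int assigned = 0;
        for (int i = 0; i < throughput.length; i++) {
            counts[i] = (int) Math.floor(rows * throughput[i] / total);
            assigned += counts[i];
        }
        // Hand the rows lost to rounding down to the fastest resource.
        int fastest = 0;
        for (int i = 1; i < throughput.length; i++) {
            if (throughput[i] > throughput[fastest]) {
                fastest = i;
            }
        }
        counts[fastest] += rows - assigned;
        return counts;
    }

    public static void main(String[] args) {
        // Made-up relative throughputs for CPU, GPU and DSP (not measured values).
        double[] throughput = {1.0, 4.0, 2.0};
        System.out.println(Arrays.toString(partition(480, throughput)));
    }
}
```

A static split like this ignores transfer costs and changing background load. A shared work queue from which each resource pulls tiles as it becomes free is the usual alternative when throughput varies at run time.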

RESOURCES

Initially we plan to use a Moto G4 Plus. This smartphone has an octa-core ARM CPU (Cortex-A53), an Adreno 405 GPU, and a Hexagon 546 DSP, along with 4 GB of RAM. We will use BoofCV as the starting code base. Its author, Peter Abeles, provides a single-threaded Android implementation, which can be parallelized. The Adreno and Hexagon SDKs will be used for writing native code kernels in the application.

The tracking algorithm is described in the paper "Exploiting the Circulant Structure of Tracking-by-detection with Kernels".

GOALS & DELIVERABLES

Plan to achieve: 

   Significant speedup over the single-threaded implementation, given the number of execution resources.

Hope to achieve: 

   As we will be speeding up some core modules of the starter code, we hope that this might help us to speed up another algorithm in the application suite.

At the final project presentation, we intend to show the performance of the original and the optimized object trackers, along with speedup graphs.

PLATFORM CHOICES

We have chosen a Qualcomm Snapdragon SoC based smartphone offering multiple ARM cores (each with NEON SIMD units), an Adreno GPU, and a Hexagon DSP.

This platform enables us to experiment with DSPs as well as GPUs, which can help us determine the optimal mix of heterogeneous resources for speeding up this application.

SCHEDULE

Week 1

Research about the project. Establish a baseline. [DONE]

Week 2

Incorporate JNI in the application. Deploy OpenCL kernels for GPU computation.

Week 3

Work with the DSP to offload some GPU calculations. Interleave operations between the DSP, CPU and GPU.

Week 4

Experiment with different execution resource allocations and profile the application. If possible, extend to another tracking algorithm.
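As a rough picture of the JNI step, the Java side could declare a native kernel and fall back to plain Java when the native library is not present. This is only a sketch: the library name `tracker_kernels` and the method `multiplyNative` are hypothetical, and the C/C++ side is not shown.

```java
/** Illustrative only: Java side of a JNI binding with a pure-Java fallback. */
final class NativeKernelSketch {

    // Hypothetical library name; on Android it would resolve to libtracker_kernels.so.
    private static final boolean NATIVE_AVAILABLE = tryLoad("tracker_kernels");

    private static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            // Library missing or not loadable: use the Java path instead.
            return false;
        }
    }

    // Hypothetical native entry point, to be implemented in C/C++ behind JNI.
    private static native void multiplyNative(float[] a, float[] b, float[] out);

    /** Pure-Java version of the same kernel. */
    private static void multiplyJava(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] * b[i];
        }
    }

    /** Dispatches to the native kernel when it was loaded, otherwise to Java. */
    static void multiply(float[] a, float[] b, float[] out) {
        if (NATIVE_AVAILABLE) {
            multiplyNative(a, b, out);
        } else {
            multiplyJava(a, b, out);
        }
    }

    static boolean usingNative() {
        return NATIVE_AVAILABLE;
    }

    public static void main(String[] args) {
        float[] out = new float[3];
        multiply(new float[] {1f, 2f, 3f}, new float[] {2f, 2f, 2f}, out);
        System.out.println("native kernel in use: " + usingNative());
    }
}
```

Keeping a Java fallback behind the same method means the benchmark can still run on a device or desktop where the native library has not been built.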

CHECKPOINT UPDATE

Week 3 Part 1

Incorporate JNI in the application - Ashutosh; Profile the Circulant Tracker functions to identify bottlenecks - Shreedutt

Week 3 Part 2

OpenCL deployment - Ashutosh; Hexagon deployment - Shreedutt

Week 4 Part 1

OpenCL deployment - Shreedutt; Hexagon deployment - Ashutosh

Week 4 Part 2

Packaging and incorporating the "nice to haves"

WORK DONE SO FAR

We extracted the Circulant Tracker from the BoofCV suite and created a desktop version to serve as a benchmark. As this is Java code, we have ported it to Android to replicate the benchmark there. This benchmark is currently single-threaded. In this benchmark, a video is loaded and an object is tracked across all its frames. The time taken for tracking is measured and reported.
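The structure of such a benchmark can be sketched as below. The `FrameTracker` interface is a hypothetical stand-in and not BoofCV's actual tracker API. The frames are synthetic, and the speedup helper simply divides the baseline time by the optimized time.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: timing a tracker over all frames of a video. */
final class TrackerBenchmarkSketch {

    /** Hypothetical stand-in for a tracker; not BoofCV's actual interface. */
    interface FrameTracker {
        void process(float[] frame);
    }

    /** Runs the tracker over every frame and returns the elapsed wall-clock time in nanoseconds. */
    static long timeNanos(FrameTracker tracker, List<float[]> frames) {
        long start = System.nanoTime();
        for (float[] frame : frames) {
            tracker.process(frame);
        }
        return System.nanoTime() - start;
    }

    /** Speedup of the optimized run relative to the baseline run. */
    static double speedup(long baselineNanos, long optimizedNanos) {
        return (double) baselineNanos / optimizedNanos;
    }

    public static void main(String[] args) {
        // Synthetic frames standing in for decoded video frames.
        List<float[]> frames = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            frames.add(new float[320 * 240]);
        }
        // Dummy tracker that just touches every pixel.
        FrameTracker dummy = frame -> {
            for (int i = 0; i < frame.length; i++) {
                frame[i] += 1f;
            }
        };
        long elapsed = timeNanos(dummy, frames);
        System.out.println("tracking time (ms): " + elapsed / 1e6);
    }
}
```

Timing the whole loop rather than individual frames matches the description above, where the total tracking time is what gets reported.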

UPDATED GOALS AND DELIVERABLES

We should be able to deliver at least the "plan to achieve" part of our proposal. We are running a bit behind schedule due to some intricacies involved in writing JNI-compatible Android code.

At the final project presentation, we intend to show the performance of the original and the optimized object trackers, along with speedup graphs.

ISSUES

JNI is painful, and the code around it tends to break. Much of the documentation and example code for native development targets older Android SDKs, and features incorporated in the newer SDKs mean these examples no longer work as written. This increases the time we have to spend debugging platform issues.

Final project report