Shreedutt C. Hegde
Hardware and FPGA system design
15618 Project
Optimizing BoofCV object tracker on a smartphone
Final project report
SUMMARY
We are going to optimize the performance of an open-source object tracker on a smartphone. The aim of the project is to make efficient use of all the heterogeneous execution resources that a smartphone provides in order to speed up the circulant object tracking algorithm.
BACKGROUND
Object tracking in a video is a compute-intensive task. Successive frames of the video have to be compared in order to determine the motion of the object and track it across frames. There is an inherent axis of parallelism across the pixels in a frame, which can be exploited to speed up object tracking. A second axis of parallelism may exist across groups of frames.
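As a minimal illustration of the per-pixel axis of parallelism, the sketch below computes the absolute difference of two grayscale frames, processing rows in parallel. It is plain desktop Java and is not the tracker's actual kernel. The class name `FrameDiff`, the method `absDiff`, and the row-major `float[]` frame layout are assumptions made for this example and are not part of BoofCV.

```java
import java.util.stream.IntStream;

// Hypothetical helper (not a BoofCV class): per-pixel absolute difference
// of two grayscale frames stored row-major in float arrays.
final class FrameDiff {
    static float[] absDiff(float[] prev, float[] curr, int width, int height) {
        if (prev.length != width * height || curr.length != width * height) {
            throw new IllegalArgumentException("frame size mismatch");
        }
        float[] out = new float[width * height];
        // Each row writes a disjoint slice of 'out', so rows can be
        // processed on different cores without synchronization.
        IntStream.range(0, height).parallel().forEach(y -> {
            int base = y * width;
            for (int x = 0; x < width; x++) {
                out[base + x] = Math.abs(curr[base + x] - prev[base + x]);
            }
        });
        return out;
    }
}
```

Splitting by rows rather than by individual pixels keeps each task large enough that scheduling overhead should stay small relative to the arithmetic.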
The tracker that we propose to optimize is not scale invariant, and its single-core performance degrades with larger frames. We aim to optimize it and measure the performance improvement gained from the heterogeneous execution resources available on a modern smartphone. Smartphones today come equipped with multi-core CPUs, GPUs, multi-core DSPs and other ASICs. These execution resources should help us exploit the parallelism in this application.
THE CHALLENGE
- Compared to a multicore PC equipped with a GPU, a smartphone has lower-performance execution resources. This implies that applications that perform well on a computer fare much worse on a smartphone. Smartphones also typically have smaller memories and run many background processes, which reduce the effective availability of resources.
- Since we are using heterogeneous computing resources, the speed of execution differs between the pipelines. One challenge is to make scheduling decisions that keep all the resources busy and provide maximum throughput at low latency. In addition, the interconnects on a smartphone might cause the application to be bandwidth-bound.
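One common way to keep resources of unequal speed busy is dynamic scheduling: each worker claims the next unit of work from a shared counter, so faster workers naturally process more units. The sketch below shows the idea with plain Java threads standing in for the CPU, GPU and DSP pipelines. `TileScheduler` and the notion of a "tile" are hypothetical names introduced for this example; a real implementation would dispatch to OpenCL or Hexagon code inside each worker.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: workers of different speeds pull tiles from a
// shared counter until none remain.
final class TileScheduler {
    // perTileWork[i] is the work worker i performs for one tile.
    // Returns the number of tiles each worker ended up processing.
    static int[] run(int tiles, Runnable[] perTileWork) {
        AtomicInteger next = new AtomicInteger(0);
        int[] done = new int[perTileWork.length];
        Thread[] threads = new Thread[perTileWork.length];
        for (int w = 0; w < perTileWork.length; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                // Atomically claim the next unprocessed tile index.
                while (next.getAndIncrement() < tiles) {
                    perTileWork[id].run();
                    done[id]++; // only thread 'id' writes this slot
                }
            });
            threads[w].start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // join makes each worker's writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return done;
    }
}
```

The per-worker counts returned by `run` give a rough picture of how the load was actually shared, which is the kind of measurement needed when experimenting with different resource allocations.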
RESOURCES
Initially we plan to use a Moto G4 Plus. This smartphone has an octa-core ARM CPU (A53), an Adreno 405 GPU, and a Hexagon 546 DSP along with 4 GB RAM. We will use BoofCV as the starting code base. Its author, Peter Abeles, provides a single-threaded Android implementation, which can be parallelized. The Adreno and Hexagon SDKs will be used for writing native code kernels in the application.
The tracking algorithm is described in the paper "Exploiting the Circulant Structure of Tracking-by-detection with Kernels".
GOALS & DELIVERABLES
Plan to achieve:
Significant speedup over the single-threaded implementation, in proportion to the number of execution resources used.
Hope to achieve:
Since we will be speeding up some core modules of the starter code, we hope this will also speed up another algorithm in the application suite.
At the final project presentation, we intend to show the performance of the original and the optimized object trackers, along with speedup graphs.
PLATFORM CHOICES
We have chosen a Qualcomm Snapdragon SoC based smartphone offering multiple ARM cores (each with NEON SIMD units), an Adreno GPU, and a Hexagon DSP.
This platform enables us to experiment with DSPs as well as GPUs, which can help us determine the optimal mix of heterogeneous resources for speeding up this application.
SCHEDULE
Week 1
Research about the project. Establish a baseline. [DONE]
Week 2
Incorporate JNI in the application. Deploy OpenCL kernels for GPU computation.
Week 3
Work with the DSP to offload some GPU calculations. Interleave operations between the DSP, CPU and GPU.
Week 4
Experiment with different execution resource allocations and profile the application. If possible, extend to another tracking algorithm.
CHECKPOINT UPDATE
SCHEDULE
Week 3 Part 1
Incorporate JNI in the application - Ashutosh; Profile Circulant Tracker functions to identify bottlenecks - Shreedutt
Week 3 Part 2
OpenCL deployment - Ashutosh; Hexagon deployment - Shreedutt
Week 4 Part 1
OpenCL deployment - Shreedutt; Hexagon deployment - Ashutosh
Week 4 Part 2
Packaging and incorporating "nice to haves"
WORK DONE SO FAR
We extracted the Circulant Tracker from the BoofCV suite and created a desktop version to serve as a benchmark. As this is Java code, we have ported it to Android to replicate the benchmark there. The benchmark is currently single-threaded. It loads a video and tracks an object across all its frames. The time taken for tracking is measured and reported.
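The timing loop of such a benchmark can be sketched as follows. This is not the actual benchmark code: `TrackerBenchmark`, `meanMillisPerFrame` and `speedup` are hypothetical names, and the per-frame callback stands in for whatever call the tracker exposes for processing one frame.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical timing harness: runs a per-frame callback over all frames
// and reports the mean wall-clock time per frame.
final class TrackerBenchmark {
    static <F> double meanMillisPerFrame(List<F> frames, Consumer<F> processFrame) {
        if (frames.isEmpty()) {
            throw new IllegalArgumentException("no frames");
        }
        long start = System.nanoTime();
        for (F frame : frames) {
            processFrame.accept(frame); // e.g. one tracker update
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1e6 / frames.size();
    }

    // Speedup of the optimized version relative to the baseline.
    static double speedup(double baselineMs, double optimizedMs) {
        return baselineMs / optimizedMs;
    }
}
```

Measuring the baseline and the optimized tracker with the same harness and the same video keeps the speedup figure comparable. On a JIT-compiled runtime it is usually worth discarding a few warm-up runs before recording times.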
UPDATED GOALS AND DELIVERABLES
We should be able to deliver at least the "plan to achieve" part of our proposal. We are running a bit behind schedule due to some intricacies involved in writing JNI-compatible Android code.
At the final project presentation, we intend to show the performance of the original and the optimized object trackers, along with speedup graphs.
ISSUES
JNI is painful, and the code around it breaks easily. Much of the documentation and example code for native development targets older Android SDKs. Changes in the newer Android SDKs mean these examples no longer work as written, which increases the time we have to spend debugging platform issues.
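One way to keep the benchmark runnable while the native build is being debugged is to wrap each native kernel behind a Java method that falls back to a pure-Java loop when the library fails to load. The sketch below assumes a hypothetical native library named `trackerkernels` exporting a hypothetical function `multiplyNative`; neither name comes from BoofCV or the Android NDK. Element-wise multiplication is used only as a simple stand-in for a real kernel.

```java
// Hypothetical JNI wrapper with a pure-Java fallback.
final class NativeKernels {
    // "trackerkernels" is an assumed library name for this sketch.
    private static final boolean NATIVE_AVAILABLE = tryLoad("trackerkernels");

    private static boolean tryLoad(String lib) {
        try {
            System.loadLibrary(lib);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false; // library missing or not loadable on this device
        }
    }

    // Would be implemented in C/C++ and bound through JNI (assumed signature).
    private static native void multiplyNative(float[] a, float[] b, float[] out);

    // Element-wise product out[i] = a[i] * b[i].
    static void multiply(float[] a, float[] b, float[] out) {
        if (a.length != out.length || b.length != out.length) {
            throw new IllegalArgumentException("array length mismatch");
        }
        if (NATIVE_AVAILABLE) {
            multiplyNative(a, b, out);
        } else {
            for (int i = 0; i < out.length; i++) {
                out[i] = a[i] * b[i];
            }
        }
    }

    static boolean usingNative() {
        return NATIVE_AVAILABLE;
    }
}
```

With this shape, a broken native build degrades to the slower Java path instead of crashing the application, and `usingNative()` lets the benchmark report which path was actually measured.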