Algorithm Optimization on Embedded & Server GPU

[Available]

Introduction

NVIDIA TensorRT™ 5.1 is an inference optimizer and runtime that delivers low latency and high throughput for deep-learning applications. TensorRT can be used to optimize, validate, and deploy neural networks both in hyperscale data centers and on embedded or automotive platforms.

For this study, Addfor S.p.A. will provide advanced technical support and the computing platforms currently available:

Training Platforms: NVIDIA TITAN RTX, RTX 2080 Ti, and IBM PowerAI NVIDIA V100 servers.

Inference Platforms: NVIDIA Jetson Nano, Intel Movidius, and Google Coral Dev Board.

Planned Activities

When optimizing a deep-learning model for inference, there are several “tricks” that make the model smaller and faster: lowering the floating-point precision from FP32 to FP16 or INT8, pruning the model by removing its least important weights, and fusing certain layers (for example, convolution, bias, and ReLU operations) into a single operation.
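
To make the precision-lowering idea concrete, the following is a minimal sketch of symmetric per-tensor INT8 quantization in plain NumPy (the random weight tensor is only illustrative; TensorRT's own INT8 calibration is more elaborate, but the basic mapping is the same):

    import numpy as np

    def quantize_int8(weights):
        # Symmetric per-tensor quantization: map float32 values to int8 with a single scale.
        scale = np.abs(weights).max() / 127.0
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(64, 64).astype(np.float32)   # illustrative "weights"
    q, s = quantize_int8(w)
    print("max abs quantization error:", np.abs(w - dequantize(q, s)).max())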

This work can be done manually, but today tools such as NVIDIA TensorRT™ 5.1 can automate most of the job. Not every framework is supported, however, which means that not all deep-learning models can be optimized directly by TensorRT.
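
As a rough sketch of what this automated path can look like, the snippet below builds an FP16 engine from an ONNX model using the TensorRT 5.x Python API (the file names are placeholders, and exact calls may differ between TensorRT versions):

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    # Parse a (placeholder) ONNX model and build an optimized engine.
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_batch_size = 1
        builder.max_workspace_size = 1 << 30   # scratch memory for kernel/tactic selection
        builder.fp16_mode = True               # allow FP16 kernels where the hardware supports them
        with open("model.onnx", "rb") as f:
            if not parser.parse(f.read()):
                raise RuntimeError("ONNX parse failed: " + str(parser.get_error(0)))
        engine = builder.build_cuda_engine(network)

    with open("model.trt", "wb") as f:
        f.write(engine.serialize())            # serialized engine, ready for deployment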

The goal of this thesis will be, first and foremost, to develop a workflow for optimizing object-detection and segmentation models for specific embedded processors such as Pascal, Volta, and Xavier GPUs. In the second phase of the work, a subset of models will be optimized, and their inference speed and accuracy loss will be benchmarked.
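
For the benchmarking phase, the latency side of the comparison can be structured along these lines (a hypothetical sketch: infer_fn stands for whatever callable runs one forward pass on the target platform, and the warm-up/iteration counts are arbitrary defaults). Accuracy loss would then be measured separately by evaluating the optimized and original models on the same validation set.

    import time

    def mean_latency_ms(infer_fn, batch, warmup=10, iters=100):
        # Average wall-clock latency of infer_fn(batch) over `iters` runs, after a warm-up.
        for _ in range(warmup):
            infer_fn(batch)
        start = time.perf_counter()
        for _ in range(iters):
            infer_fn(batch)
        return (time.perf_counter() - start) / iters * 1000.0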

Who we’re looking for

Students who are about to complete their master's degree in Computer Science, Mathematics, or Physics.

Skills: Python, C/C++, CUDA C, numerical quantization methods, finite-precision math; knowledge of TensorFlow / TensorRT 5.1 / NVIDIA DIGITS preferred.

Duration of this project: 6-8 months


Contact Us

Directly by email to: [email protected]

By LinkedIn: linkedin.com/in/cannavò-sonia-66a95467