High-Level Synthesis of Neural Networks on FPGAs

Christophe Dubach, Christof Schlaak, Tzung-Han Juang, Hamza Javed, Ayan Chakraborty, Jiaxuan Cai, Andrej Ivanis, Martin Kristien

Sep 4, 2016

Many modern applications that perform classification, prediction or clustering rely on Neural Networks (NNs). These networks often run in datacenters or on mobile devices, where high performance and energy efficiency are crucial. Thanks to their parallelism, GPUs deliver a large performance improvement over CPUs for these applications. Nevertheless, GPUs are limited by their fixed architecture, which is a particular drawback in a field that changes quickly and where rapid innovation must remain possible.

Field-Programmable Gate Arrays (FPGAs) offer more flexibility: the architecture can be specialised to the specific neural network application in order to achieve the best performance. Furthermore, they are highly energy-efficient and are therefore well suited for NNs in embedded systems or large-scale server clusters.

However, deploying neural networks on FPGAs with a focus on high performance is a complex task, as is programming parallel accelerators in general. Their flexible architecture gives FPGAs many options for tuning performance, but configuring them properly requires a lot of hardware-specific expertise and costly development time. Automating this step, rather than developing by hand, would accelerate the development process and drive down costs. Furthermore, developers do not want to manually adapt their implementations for each accelerator (e.g. CPU, GPU, FPGA). Instead, performance portability is desirable.

Lift addresses these challenges by offering a high-level, functional, data-parallel language, which allows the user to develop an application efficiently and independently of the target hardware platform. Rewrite rules in the Lift compiler then open up a vast design space of possible implementations for this abstract system specification. This design space is explored to find a suitable solution that satisfies the performance and energy requirements. In order to exploit the parallel structure of NNs, the compiler employs pipelining mechanisms and allocates distributed on-chip memory on the FPGA. Timing behaviour and scheduling are then introduced, and finally hardware description language (HDL) code is emitted, which can be used to generate the bitstream for the FPGA.
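To make the idea of rewrite rules concrete, the following is a minimal Python sketch. It is not Lift's actual syntax or API. It expresses a fully connected layer with map, reduce and zip, then applies a split-join style rewrite that groups the neurons into chunks, the kind of restructuring that could later be mapped to parallel hardware units. All function names (`split`, `join`, `dot`, `relu`, `dense`, `dense_split`) and the choice of a ReLU activation are assumptions made for illustration.

```python
from functools import reduce
from typing import List, Sequence, TypeVar

A = TypeVar("A")


def split(n: int, xs: Sequence[A]) -> List[List[A]]:
    """Cut xs into consecutive chunks of length n (len(xs) must be a multiple of n)."""
    assert len(xs) % n == 0, "chunk size must divide the input length"
    return [list(xs[i:i + n]) for i in range(0, len(xs), n)]


def join(xss: Sequence[Sequence[A]]) -> List[A]:
    """Flatten one level of nesting; the inverse of split."""
    return [x for xs in xss for x in xs]


def dot(ws: Sequence[float], xs: Sequence[float]) -> float:
    """Dot product written as a reduction over zipped pairs."""
    return reduce(lambda acc, p: acc + p[0] * p[1], zip(ws, xs), 0.0)


def relu(v: float) -> float:
    return max(0.0, v)


def dense(weights, bias, xs) -> List[float]:
    """High-level specification: one map over the neurons of the layer."""
    return list(map(lambda wb: relu(dot(wb[0], xs) + wb[1]), zip(weights, bias)))


def dense_split(n: int, weights, bias, xs) -> List[float]:
    """Rewritten form: map(f) becomes join(map(map(f), split(n, ...))).

    The outer map ranges over chunks of n neurons and the inner map over
    the neurons inside one chunk. The result is unchanged.
    """
    def neuron(wb):
        return relu(dot(wb[0], xs) + wb[1])

    rows = list(zip(weights, bias))
    return join([list(map(neuron, chunk)) for chunk in split(n, rows)])
```

Both variants compute the same values; only the structure differs. The chunk size `n` is one example of a parameter that a design space exploration could tune.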

Supported by:

[CIFAR AI Chair](https://cifar.ca/ai/canada-cifar-ai-chairs/)

[NSERC](https://www.nserc-crsng.gc.ca/)

Christophe Dubach

Associate Professor
Canada CIFAR AI Chair, Mila

My research interests include data-parallel language design and implementation, high-level code generation and optimisation for parallel hardware (e.g. GPUs, FPGAs), architecture design space exploration, and the use of machine-learning techniques applied to all these topics.

Christof Schlaak

PhD student (Edinburgh University)

Tzung-Han Juang

PhD student (McGill University)

Hamza Javed

PhD student (McGill University)

Ayan Chakraborty

Summer 2021 McGill intern (visiting from IIT Kharagpur)

Jiaxuan Cai

Summer 2021 McGill intern (visiting from Chongqing University)

Andrej Ivanis

MSc student 2018-2019 (Edinburgh University)

Martin Kristien

BSc student 2016-2017 (Edinburgh University)
