_____ _______ _
(____ \ (_______) | |
_ \ \ ____ ____ ____ _____ ____ ____ ____| | _
| | | / _ ) _ ) _ \ | ___) ___) _ |/ ___) | / )
| |__/ ( (/ ( (/ /| | | | | | | | ( ( | ( (___| |< (
|_____/ \____)____) ||_/ |_| |_| \_||_|\____)_| \_)
|_|
DeepFrack is a novel framework for enhancing energy efficiency and reducing latency in deep learning workloads executed on hardware accelerators. By optimally fusing layers and implementing an asymmetric tiling strategy, DeepFrack addresses the limitations of traditional layer-by-layer scheduling. The project aims to build a succinct and simple framework that contributes to the ongoing effort to design more efficient hardware accelerators for machine learning workloads.
DeepFrack is a wrapper around Timeloop, written in Python.
This document serves as a guide to installing and using DeepFrack. Together with the paper, it also provides the additional background needed to go through the complete source code of DeepFrack and suggest changes. For any queries, please contact Mithil Pechimuthu (pechimuthumithil@iitgn.ac.in).
The DeepFrack project consists of three modules that together make the final working tool; these are described in the sections below.

DeepFrack was developed by:
- Tom Glint Issac
- Joycee Mekie
- Mithil Pechimuthu
The files in this repository are sufficient for deploying the tool.
DeepFrack relies on Timeloop and Accelergy for its cost metrics, so it is absolutely necessary to have Timeloop installed before running DeepFrack. The installation procedure and other useful information regarding the dependencies can be found in the Timeloop documentation.
The source code for DeepFrack is in Python, so Python >= 3.0 along with libraries such as NumPy must be installed.
These modules are to be executed in the given sequence.
The Benchmarker is the module of DeepFrack that uses Timeloop to generate costs (total energy by default) for each step taken by a point in the design space. The Benchmarker takes the inputs provided to it and returns costs in a folder of .json files. The examples will make this clearer.
The Benchmarker must be provided with the following YAML files:
- The Timeloop mapper YAML file.
- A YAML file describing the architecture of the accelerator.
- A YAML file describing the mapping constraints for Timeloop.
- A folder with seven YAML files corresponding to the various dataflows that are possible. For example, layer-by-layer scheduling is one type of dataflow, and only-I/O-cached scheduling is another. The following table summarizes the various types of dataflow that are possible.
The output from the hardware benchmarker is:
- Seven dictionaries stored in seven separate .json files. Each dictionary has the following structure:
{Layer number: {Tile dimension: Energy value, ...}, ...}
- We also store the log files from Timeloop in separate folders; these can later be used for a deeper analysis of DeepFrack's mapping.
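The cost dictionaries above can be consumed with nothing more than the standard-library `json` module. A minimal sketch, assuming the structure described above; the file name `FullyCached.json`, the string keys, and the energy values are illustrative assumptions, not names fixed by DeepFrack:

```python
import json
import os
import tempfile

# Hypothetical cost dictionary in the format described above:
# {layer number: {tile dimension: energy value, ...}, ...}
costs = {
    "1": {"16": 120.5, "32": 98.2},   # layer 1: benchmarked energy per tile size
    "2": {"16": 310.0, "32": 256.7},  # layer 2
}

# Write it out the way the Benchmarker stores its results (one .json per dataflow).
path = os.path.join(tempfile.mkdtemp(), "FullyCached.json")
with open(path, "w") as f:
    json.dump(costs, f)

# Reload and query: the cheapest tile dimension for layer 1.
with open(path) as f:
    loaded = json.load(f)
best_tile = min(loaded["1"], key=loaded["1"].get)
```

Note that JSON object keys are always strings, so layer numbers and tile dimensions come back as `"1"`, `"16"`, etc., and may need converting to `int` downstream.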
The directory structure should look something like this.

The hardware benchmarking may take some time (~hours), as we quantify a cost value for every possible step that DeepFrack may take.
Inputs to DeepFrack are given as paths to the following files/folders:
- The folder that contains the YAML files of every layer in the workload.
- The folder containing all the .json files that store the costs generated by the Hardware Benchmarker.
- The output folder, where the overview statistics file and the comparison image will be stored.
DeepFrack will output a final log file showing the optimal tiling and fusion, along with the total energy consumed by fused scheduling compared with layer-by-layer scheduling. This file contains sufficient data to describe the mapping conditions chosen. Moreover, the exact mapping onto the architecture performed by Timeloop is present in the log-file folders populated by the Hardware Benchmarker. The statistics file also displays how the tiles will be placed and the order in which they will be computed.
The log file will look like the following.

The DeepFrack core may also take time (~hours) proportional to the size of the search space.
This module takes as input the outputs of the previous two modules. These include:
- The optimal partition obtained (from the log file from the DeepFrack Core)
- The optimal weight caching pattern (from the log file from the DeepFrack Core)
- The log files generated from the Hardware Benchmarker module.
The output is a CSV file containing the per-component details of the optimal mapping found by the DeepFrack modules.
The time consumed by this step is almost negligible.
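The per-component CSV can be inspected with the standard-library `csv` module. A hedged sketch: the column names (`Component`, `Energy (pJ)`) and the sample rows below are assumptions for illustration, not the exact headers DeepFrack emits.

```python
import csv
import io

# Illustrative per-component breakdown in the spirit of the CSV described
# above; headers and values are assumed, not DeepFrack's actual output.
sample = """Component,Energy (pJ)
DRAM,5400.0
GlobalBuffer,1250.5
PEArray,830.2
"""

# csv.DictReader uses the first row as field names, so each row becomes a dict.
reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)
total = sum(float(r["Energy (pJ)"]) for r in rows)
```

Reading a real output file would simply swap the `io.StringIO(sample)` for `open(path)`.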
- Incorporate residual networks.
- Speed up the search through better pruning of the search space.
- Results on Inception networks.
- Reference material for hardware designers to develop hardware accelerators favourable to fused-layer scheduling.
Note for NanoDC: The source code in this repository supports CNNs. (Containers: GeminiBenchMarker2, MithilDeepFrackTesting)
TODO:
- The colouring algorithm, which basically tiles a square with tiles from a tile list, is very slow for medium-sized squares for some reason. Need to look into optimizing or parallelizing it.
- Update with DOSA and CoSA cost models instead of Timeloop.
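To make the colouring TODO above concrete, here is a minimal, hedged sketch of the underlying problem: covering an S x S square with square tiles drawn from a list of allowed sizes. This greedy version (largest tile that fits at each uncovered cell, scanning row-major) only illustrates the problem; it is not DeepFrack's actual algorithm, and it assumes size 1 is in the tile list so a placement always exists.

```python
def tile_square(S, tile_sizes):
    """Cover an S x S grid with square tiles; each cell gets its tile's id.

    Greedy sketch: at each uncovered cell (row-major order), place the
    largest allowed tile that both fits in the square and overlaps no
    previously placed tile. Assumes 1 is in tile_sizes as a fallback.
    """
    grid = [[-1] * S for _ in range(S)]
    sizes = sorted(set(tile_sizes), reverse=True)
    tile_id = 0
    for r in range(S):
        for c in range(S):
            if grid[r][c] != -1:
                continue  # already covered by an earlier tile
            # Largest size whose t x t block fits and is fully uncovered.
            t = next(
                t for t in sizes
                if r + t <= S and c + t <= S
                and all(grid[r + dr][c + dc] == -1
                        for dr in range(t) for dc in range(t))
            )
            for dr in range(t):
                for dc in range(t):
                    grid[r + dr][c + dc] = tile_id
            tile_id += 1
    return grid

grid = tile_square(6, [1, 2, 4])
```

Even this naive version is quadratic in the number of cells per placement attempt, which hints at why the real algorithm slows down on medium-sized squares and is a candidate for pruning or parallelization.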

