Hello,
I am looking for published performance data (latency in milliseconds) for Goya inference processing with a VGG16 CNN network.
Specifically, layer-by-layer latency when executing inference with the VGG16 model, using the ImageNet dataset (or a similar dataset).
I am looking for latency data (from the start of inference processing by a layer to the end of processing by that same layer) listed for each layer, for example:
CONV1 layer - x1 msec
CONV2 layer - x2 msec
…
Fully_connected FC6 layer - y_fc6 msec
Fully_connected FC7 layer - y_fc7 msec
Fully_connected FC8 layer - y_fc8 msec
These are the layers I’m interested in. I have a VLSI hardware background and am familiar with (multi-cycle) hardware pipeline stages that carry start/done processing flags per stage; these flags allow easy and accurate latency measurements for each stage. Intuitively, similar start/done flags for each DNN layer could be used to profile inference latency per layer. Does the Goya accelerator expose such start/done flags, and have they been used by software applications to extract layer-by-layer inference latency?
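To illustrate the kind of per-layer breakdown I mean, here is a minimal host-side sketch using PyTorch forward hooks on torchvision's VGG16 (my own illustration and assumption, not a Goya/SynapseAI API; the whitepaper benchmark uses MXNet, and on the accelerator the timestamps would presumably have to come from a device-side profiler rather than host-side timers):

```python
# Sketch only: per-layer "start"/"done" timing via PyTorch forward hooks,
# the software analogue of per-stage start/done flags in a hardware pipeline.
import time
import torch
import torchvision

model = torchvision.models.vgg16(weights=None).eval()
latencies = {}

def make_hooks(name):
    def pre_hook(module, inputs):
        latencies[name] = time.perf_counter()            # "start" flag
    def post_hook(module, inputs, output):
        latencies[name] = (time.perf_counter() - latencies[name]) * 1e3  # "done" flag, msec
    return pre_hook, post_hook

# Register start/done hooks on every leaf layer (Conv2d, ReLU, Linear, ...).
for name, module in model.named_modules():
    if len(list(module.children())) == 0:
        pre, post = make_hooks(name)
        module.register_forward_pre_hook(pre)
        module.register_forward_hook(post)

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))                   # batch size 1, ImageNet-sized input

for name, msec in latencies.items():
    print(f"{name:25s} {msec:8.3f} msec")
```

I am hoping a comparable per-layer table already exists (or can be generated) for Goya itself.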
I’m aware of these published benchmarks:
habana_labs_goya_whitepaper.pdf
for an SSD300-VGG16 model (topology) in Table 1. Only one value, 1.1 msec, is listed for the entire model with MXNet (framework) and a batch size of 1. Is there a more detailed publication with a layer-by-layer breakdown of this 1.1 msec value?
Thank you,
Nick Iliev, Ph.D.
Research Associate
ECE AEON lab
UIC