Hello,
I am looking for published performance data (latency in milliseconds) for Goya inference processing with a VGG16 CNN network.
Specifically, layer-by-layer latency when executing inference with the VGG16 model, using the ImageNet dataset (or a similar dataset).
I am looking for latency data (from the start of inference processing by a layer to the end of processing by that same layer) listed for each layer, for example:
CONV1 layer - x1 msec
CONV2 layer - x2 msec
…
Fully_connected FC6 layer - y_fc6 msec
Fully_connected FC7 layer - y_fc7 msec
Fully_connected FC8 layer - y_fc8 msec
These are the layers I’m interested in. I have a VLSI hardware background and am familiar with (multi-cycle) hardware pipeline stages that carry start/done processing flags per stage; these flags allow easy and accurate latency measurements for each stage. Intuitively, similar start/done flags for each DNN layer could be used to profile inference latency per layer. Does the Goya accelerator expose such start/done flags, and have they been used by software applications to extract layer-by-layer inference latency?
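To illustrate the kind of per-layer breakdown I mean, here is a minimal host-side sketch using PyTorch forward hooks on torchvision's VGG16 (my own illustration and assumption, not a Goya/SynapseAI API; the whitepaper benchmark uses MXNet, and on the accelerator the timestamps would presumably have to come from a device-side profiler rather than host-side timers):

```python
# Sketch only: per-layer "start"/"done" timing via PyTorch forward hooks,
# the software analogue of per-stage start/done flags in a hardware pipeline.
import time
import torch
import torchvision

model = torchvision.models.vgg16(weights=None).eval()
latencies = {}

def make_hooks(name):
    def pre_hook(module, inputs):
        latencies[name] = time.perf_counter()            # "start" flag
    def post_hook(module, inputs, output):
        latencies[name] = (time.perf_counter() - latencies[name]) * 1e3  # "done" flag, msec
    return pre_hook, post_hook

# Register start/done hooks on every leaf layer (Conv2d, ReLU, Linear, ...).
for name, module in model.named_modules():
    if len(list(module.children())) == 0:
        pre, post = make_hooks(name)
        module.register_forward_pre_hook(pre)
        module.register_forward_hook(post)

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))                   # batch size 1, ImageNet-sized input

for name, msec in latencies.items():
    print(f"{name:25s} {msec:8.3f} msec")
```

I am hoping a comparable per-layer table already exists (or can be generated) for Goya itself.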
I’m aware of these published benchmarks:
habana_labs_goya_whitepaper.pdf
for an SSD300-VGG16 model (topology) in Table 1. Only one value, 1.1 msec, is listed for the entire model with MXNet (framework) and a batch size of 1. Is there a more detailed publication with a layer-by-layer breakdown of this 1.1 msec value?
Thank you,
Nick Iliev, Ph.D.
Research Associate
ECE AEON lab
UIC