Issue:
We are running inference on a diffusion model. In order to produce an output with a diffusion model, we need to start from a normally distributed input (using torch.randn). We are getting noise with hpu (and not with cpu and gpu). We also saw that if we compute the input iniitialization (randn) on cpu, we get the expected results when moving it then to hpu and persuing the calculations on hpu.
Our assumption is that randn might not produce a proper normal distribution.

Analysis:
import torch
x_hpu = torch.randn([2, 3, 256, 256], device=torch.device(“hpu”)).to(“cpu”)
x_cpu = torch.randn([2, 3, 256, 256], device=torch.device(“cpu”))

to get the histogram
import pandas as pd
data = pd.Series(x.flatten().tolist())
data.hist()
data.std()

to get the qq plot
import statsmodels.api as sm
import pylab
sm.qqplot(data, line=‘45’)
pylab.show()

The calculations of the expectations below might also show that we dont get what we expect for a normal distribution (see here: integration - Expectation of a Standard Normal Random Variable - Mathematics Stack Exchange )

Greg_S
June 2, 2022, 4:16pm
#2
Hi @estellea , to be clear, are you running Inference on Gaudi, correct? can you share the model to allow us to help reproduce this?

We made those changes to cfg_sample to enable hpu support

As well as :
commenting out “convert_weights” in CLIP/clip/model.py:429 to go full fp32

Hi @Greg_S ,

Yes exactly: inference on Gaudi

The model can be found here : GitHub - crowsonkb/v-diffusion-pytorch: v objective diffusion inference code for PyTorch.
Using this command line:
./cfg_sample.py “red apple”:5 -n 1 -bs 4 --seed 0 --hpu

You will see very different outputs if you run it on gpu/cpu versus hpu.
If you only change the function run_all in cfg_sample with :
x_cpu = torch.randn([n, 3, side_y, side_x], device=torch.device(“cpu”))
x = x_cpu.to(“hpu”)

you will see comparable results than on gpu/hpu

Greg_S
June 7, 2022, 2:09am
#5
thank you @estellea , we’re reviewing this now

Hi @estellea

Thanks for pointing out the issue. We will update here once its fixed.

In the meanwhile, maybe you can use the box muller transform to generate/simulate gaussian distribution from uniform distribution.

Sample code:

```
import torch
import matplotlib.pyplot as plt
import math
device = 'hpu'
if device == 'hpu':
from habana_frameworks.torch.utils.library_loader import load_habana_module
load_habana_module()
if False:
x = torch.randn([2, 3, 256, 256], device=torch.device("cpu"))
else:
u1 = torch.rand([2, 3, 256, 256]).to(device)
u2 = torch.rand([2, 3, 256, 256]).to(device)
z1 = torch.sqrt(-2 * torch.log(u1)) * torch.cos(2 * math.pi * u2)
x = z1
import pdb; pdb.set_trace()
x = x.to('cpu')
import pandas as pd
data = pd.Series(x.flatten().tolist())
ax = data.hist(bins=100)
data.std()
fig = ax.get_figure()
fig.savefig('test_hist.pdf')
import statsmodels.api as sm
import pylab
sm.qqplot(data, line='45')
plt.savefig('testplot.png')
```

I see this qq plot and histogram:

Let us know if this unblocks you.

Thanks

Thank you @Sayantan_S this is what we did

2 Likes

Hi!
Any update on this thread?

Thank you!
Estelle

@estellea

Could you please try the newly released 1.7 and see if it works for you

Attached here are the cpu vs hpu QQ plots for 1.7 release