The Horovod doc you refer to is based on the original Horovod doc from here.
Scaling the learning rate is a heuristic, and some training runs may converge with the original learning rate. In this particular case we use LAMB, which is designed for large-batch training. The original Horovod doc's advice to scale the learning rate is more applicable to simpler optimizers like SGD.
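For illustration, here is a minimal sketch of the linear scaling heuristic from the Horovod docs, written with plain SGD in tf.keras. The `base_lr` value and the tf.keras setup are illustrative only, not our actual LAMB configuration:

```python
# Sketch of the Horovod-doc learning-rate scaling heuristic (illustrative values).
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

base_lr = 0.01  # illustrative single-worker learning rate
# Heuristic from the Horovod docs: scale the learning rate by the number of
# workers, since the effective batch size grows linearly with hvd.size().
opt = tf.keras.optimizers.SGD(learning_rate=base_lr * hvd.size())

# Wrap the optimizer so gradients are averaged across workers each step.
opt = hvd.DistributedOptimizer(opt)
```

With LAMB the layer-wise adaptive scaling already targets large effective batch sizes, which is why this linear rule is less critical in our case.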
Regarding the broadcasting of variables: I think we initialize from a checkpoint, so we might be skipping the broadcast.
For reference, the relevant comment from the Horovod examples: "Horovod: broadcast initial variable states from rank 0 to all other processes. This is necessary to ensure consistent initialization of all workers when training is started with random weights or restored from a checkpoint."
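As a rough sketch of how that broadcast is usually wired into a Keras training loop (the model, data, and checkpoint path below are placeholders, not our actual setup):

```python
# Sketch of the rank-0 broadcast pattern from the Horovod Keras examples.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])  # placeholder model
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(optimizer=opt, loss="mse")

callbacks = [
    # Broadcast initial variable states from rank 0 to all other workers, so
    # every worker starts from identical weights whether they are random or
    # restored from a checkpoint on rank 0.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

# Only rank 0 writes checkpoints, to avoid concurrent writes to the same file.
if hvd.rank() == 0:
    callbacks.append(tf.keras.callbacks.ModelCheckpoint("./ckpt-{epoch}.h5"))

x = tf.random.normal((32, 4))   # placeholder data
y = tf.random.normal((32, 10))
model.fit(x, y, batch_size=8, epochs=1, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```

Note the comment quoted above: even when restoring from a checkpoint, the broadcast ensures every worker actually starts from the same restored state, so skipping it is only safe if all ranks are guaranteed to load the identical checkpoint.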