This work proposes a novel distributed deep learning model for Remote Sensing (RS) images super-resolution. High Performance Computing (HPC) systems with GPUs are used to accelerate the learning of the unknown low to high resolution mapping from large volumes of Sentinel-2 data. The proposed deep learning model is based on self-attention mechanism and residual learning. The results demonstrate that stateof- the-art performance can be achieved by keeping the size of the model relatively small. Synchronous data parallelism is applied to scale up the training process without severe performance loss. Distributed training is thus shown to speed up learning substantially while keeping performance intact.