The rapid increase in digital data, coupled with advances in deep learning algorithms, is opening unprecedented opportunities to incorporate multiple data sources into models of the spatial dynamics of human infectious diseases. We used Convolutional Neural Networks (CNNs) in conjunction with satellite imagery-based urban housing and socio-economic data to predict disease density in a developing country setting. We explored both single-input (unimodal) and multiple-input (multimodal) network architectures for this purpose. We achieved a maximum test set accuracy of 81.6 per cent using a single-input CNN model built with one convolutional layer and trained on housing image data. However, this fairly good performance was biased in favor of specific disease density classes because the data set was unbalanced, despite our use of methods to address the problem. These results suggest that CNNs are promising for modeling the spatial dynamics of human infectious diseases, especially in a developing country setting. Urban housing signals extracted from satellite imagery appear suitable for this purpose in the same setting.