Soybean yield prediction is a challenging problem in plant breeding because yield is often affected by many different factors simultaneously. Hyperspectral reflectance data from plants and soil data give breeders useful information about soybean plant health, and using these data types to predict yield is an active area of research. Breeding programs also face challenges such as data imbalance and external factors like genotype variability across different environments, which present significant hurdles to developing yield prediction models for large-scale breeding programs. In this work, we perform a comprehensive study of yield prediction using both hyperspectral reflectance and soil data to determine which scenarios offer the best chances of predicting yield with high accuracy. We demonstrate a cluster-based ensemble approach for yield prediction from hyperspectral reflectance data that can perform well for large-scale breeding programs by efficiently extracting useful information from the data through an unsupervised approach.