Traditional chemical research often relies on trial-and-error synthesis and characterization methods. Now, modern machine learning (ML) offers data-driven approaches for predicting properties, like water solubility, directly from chemical structure. But with various data representation schemes for molecular structure and model approaches to select from, it can be difficult for non-experts to determine best practices for utilizing ML. To clarify this landscape of choices, this study uses the ESOL molecular solubility dataset to compare the performance of a selection of different models on different data representations. First, we compare three classical regression methods (linear, ridge, LASSO) on three common data representations (RDKit fingerprint, Morgan fingerprint, intuition-selected molecular features). Then, we demonstrate how two distinct deep learning approaches (multilayer perceptron, graph convolution) can achieve accurate predictions even when prior intuition about feature-property correlations are absent. Finally, we outline a modern attention-based approach, inspired by successes in language modeling and fine-tuned by Bayesian optimization, to achieve a prediction methodology that is more general and performant than the previous approaches. This attention-based approach operates directly on the common SMILES string molecular representation, without requiring as many model parameters as other deep learning approaches or preprocessing into a representative fingerprint or vector of intuitively selected features. In short, when selected molecular features are known to likely correlate with a feature of interest, it may be possible to achieve good predictive modeling without turning to massive deep learning approaches. When particular features are not known a priori, graph approaches are a common solution, and we further demonstrate how a modern hyperparameter optimized attention approach can perform even better.