1. Introduction
In the contemporary data-driven world, almost any organization has access to data, but how well it uses that data to generate insights dictates its market position. To gain a competitive edge over rivals, efficient transactional systems alone are not enough; it has become essential to analyze historical data promptly and propose the actions the business should take. Previous research has highlighted the significance of data-driven decisions and their potential to realize value over decisions based solely on opinions [
1]. Other research pointed out that understanding the data is key and that data science is the discipline that expands knowledge about available data and generates insights from it; furthermore, it stated that data science helps organizations attain data-driven decisions [
2].
Data science is an umbrella term for various in-depth studies that can be classified into three major areas: data analytics, data mining, and machine learning (ML) [
3]. The data science paradigm encompasses discrete roles and responsibilities, such as data engineers, data analysts, and data scientists, as well as external dependencies like product owners, project sponsors, business analysts, IT managers, and C-level executives. A person with a specific persona does not need to be an expert in other trades or have a cross-functional skillset. However, it would be advantageous if a person with an understanding of business requirements and objectives had access to a platform that allowed them to perform the basic operations of data science-related roles without prior coding, analytics, or data engineering experience.
On the other hand, Alsharef et al. emphasized that developing an ML model necessitates domain expertise and advanced ML programming skills [
4]. They highlighted the difficulties in finding trained ML experts in the market; hence, automatic ML is seen as an asset that bridges the gap between data science use-cases and a lack of appropriate ML resources [
4]. This is where no-code/low-code ML platforms come into play. Before getting there, we first need to understand the evolution of ML and how we arrived at advanced ML platforms with low or no code. Both low-code and no-code approaches aid in the rapid development of ML models, the automation of data pipelines, and the visualization of findings. However, they differ greatly in the audience they target. With the low-code approach, developers can leverage existing building blocks and libraries while still having the flexibility to customize tasks as required. Conversely, no-code is primarily intended for domain experts with minimal to no prior software development knowledge [
5]. With the no-code approach, users rely on drag-and-drop functionality to execute the desired task, with minimal to no flexibility to customize. We can categorize cloud-native and cloud-agnostic ML platforms as low-code platforms since they allow us to build custom ML models by writing code in ML platform-native notebooks.
Developing ML models using an AutoML service, specifically, should be categorized as no-code, since we expect the ML platform to conduct all the tasks in the ML lifecycle automatically with only a few initial inputs from its users. Low-code ML platforms can be used by different personas, including data scientists and ML developers. In addition, no-code AutoML services can also be used by personas with strong business or data domain knowledge, such as data engineers, data analysts, business analysts, or product owners. We can even form a cross-functional team comprising all the aforementioned personas to create ML models using AutoML services; in the long run, this would yield multiple benefits in terms of saving time and money. Research has supported the importance of emerging low-code cloud data platforms and their vital role in the speed of digitalization [
6].
In this research, we examine similarities, differences, advantages, and limitations in leveraging some of the cloud-based low/no-code ML platforms. The Gartner Magic Quadrant published in 2020 for cloud-native AI developer service providers listed Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) as the top three leaders [
7]. Hence, we have chosen these cloud-based platforms and their cloud-native ML services for further research. When it comes to cloud-agnostic ML platforms, we have chosen Databricks for further investigation, as it is an enterprise-scale open-source unified data engineering, ML, and AI platform that is already integrated with all three cloud-native ML platforms listed [
8]. These ML platforms can handle the entire ML lifecycle. In the following sections, we highlight our findings from developing ML models without writing a single line of code, using the ML services available on the above-mentioned cloud platforms.
2. Problematization
Previous research has demonstrated the dire need to have a methodology for automatically selecting the optimum ML model and tuning the hyperparameters to improve ML model performance [
9]. Building an ML model is a laborious task. ML developers must try out multiple algorithms and constantly tweak hyperparameters to derive the best ML model for a given business problem. This requires not only a thorough understanding of ML model development but also time-consuming, compute-intensive data processing. Luo also emphasized the skillsets required for building state-of-the-art ML models manually, i.e., by a human operator; even with high competence, the time spent refining the model and its hyperparameters to obtain the best results cannot easily be reduced [
9]. Because running different experiments to find the right model requires substantial computation and time, we may end up having to scale up computational resources as the search progresses. The table below lists some ML algorithms and their corresponding hyperparameters.
From
Table 1, we can see that multiple hyperparameters are associated with each ML algorithm. For decision trees, the max_depth parameter defines how deep the tree can grow; once this maximum depth is reached, nodes are not split any further; the min_impurity_split parameter sets the impurity threshold below which a node is not split further; the min_samples_leaf parameter defines the minimum number of samples required to form a leaf node; and the max_leaf_nodes parameter defines the maximum number of leaf nodes the tree can have.
For random forest, n_estimators defines the number of decision trees to be generated, and max_features defines the maximum number of features considered at each split.
For support vector machines, the kernel defines how the input data is represented, the penalty value is a regularization constant, and the tol parameter defines the stopping criterion: training stops when no significant improvement is observed between two consecutive iterations [
3].
For k-nearest neighbors, n_neighbors defines how many neighboring samples are considered, and the metric parameter defines the distance metric, for example, Euclidean distance.
For naïve Bayes, the kernel density estimator defines the kind of data distribution to be assumed, and the window width controls the smoothing of the kernel.
For stochastic gradient boosting, learning_rate defines how fast the ML model should learn the pattern of the given data distribution; n_estimators defines the number of trees or boosting steps; subsample defines the fraction of data to be used for each tree; and max_depth defines the maximum depth of each tree.
For a neural network, we must find the ideal number of hidden layers, the number of nodes in each hidden layer, the activation function, the number of epochs (the maximum number of training iterations), and finally the learning rate [
9].
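To make the hyperparameters discussed above more concrete, the sketch below (ours, not part of the cited works) expresses some of them as scikit-learn search spaces; the estimator choices and candidate values are illustrative assumptions rather than recommended settings.

```python
# Illustrative mapping of the hyperparameters discussed above onto
# scikit-learn estimators and candidate search spaces (values are examples).
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor

search_spaces = [
    (DecisionTreeRegressor(), {
        "max_depth": [3, 5, 10, None],            # how deep the tree may grow
        "min_samples_leaf": [1, 5, 20],           # minimum samples required per leaf
        "max_leaf_nodes": [None, 50, 200],        # upper bound on the number of leaves
    }),
    (RandomForestRegressor(), {
        "n_estimators": [100, 300, 500],          # number of decision trees
        "max_features": ["sqrt", "log2", None],   # features considered at each split
    }),
    (SVR(), {
        "kernel": ["linear", "rbf"],              # representation of the input data
        "C": [0.1, 1.0, 10.0],                    # regularization (penalty) constant
        "tol": [1e-3, 1e-4],                      # stopping tolerance
    }),
    (KNeighborsRegressor(), {
        "n_neighbors": [3, 5, 11],                # number of neighbors considered
        "metric": ["euclidean", "manhattan"],     # distance metric
    }),
    (GradientBoostingRegressor(), {
        "learning_rate": [0.01, 0.1],             # how fast the model learns
        "n_estimators": [100, 300],               # number of boosting stages
        "subsample": [0.5, 1.0],                  # fraction of samples used per tree
        "max_depth": [3, 5],                      # maximum depth of each tree
    }),
]
```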
Previous research on tuning hyperparameters for deep learning models by applying different optimization techniques illustrates the complexity and expertise needed to transfer previous learning to every new iteration of testing ML model performance [
10,
11]. The depiction in
Table 1 should be considered illustrative rather than exhaustive, as we did not cover the entire array of algorithms or their related hyperparameters. Other research highlighted that tuning hyperparameters is time-consuming [
12]. They also support the notion that finding optimal hyperparameter values for an ML model requires multiple iterations of testing.
Table 1 shows how complex it is to select the best algorithm for a business case by trying out different ML and ensemble models. Doing so requires in-depth knowledge to address questions such as: Which type of ML algorithm should be used? How should the hyperparameters be configured? How should the model be evaluated? How should the best model be selected? How should the model be deployed to an endpoint? This list is not comprehensive; it can go on and on depending on the business case we are trying to address. Another important aspect is how quickly these questions can be answered, because time is an important factor for market competitiveness. Additionally, training and testing multiple models requires scaling the computational resources. This is where cloud-based ML platforms come into the picture, as they can address all these questions easily and in less time. Another advantage of a cloud-based ML platform is that we do not need to be masters of all trades; with only basic knowledge of the data and the business use case, we can still develop a standard ML model using the automatic ML features offered by different cloud vendors. Lastly, while a model is being trained, resources are scaled automatically in real time based on demand. As highlighted by Bahri et al., an automatic ML service helps choose the best ML model and tune its hyperparameters through multiple iterations of testing with different combinations of values [
12].
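As a minimal sketch of the manual workflow that AutoML services automate, the following example (our own illustration, using scikit-learn and a synthetic dataset) iterates over candidate algorithms, tunes each with cross-validated grid search, and keeps the best-scoring model.

```python
# Manual model selection and hyperparameter tuning: the loop that AutoML
# services run automatically, shown here on a synthetic regression dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

candidates = [
    (RandomForestRegressor(random_state=0),
     {"n_estimators": [100, 300], "max_features": ["sqrt", None]}),
    (GradientBoostingRegressor(random_state=0),
     {"learning_rate": [0.01, 0.1], "n_estimators": [100, 300], "max_depth": [3, 5]}),
]

best_model, best_score = None, float("-inf")
for estimator, grid in candidates:
    # Every grid point is trained and scored with 5-fold cross-validation,
    # which is why manual tuning quickly becomes compute- and time-intensive.
    search = GridSearchCV(estimator, grid, scoring="r2", cv=5, n_jobs=-1)
    search.fit(X, y)
    if search.best_score_ > best_score:
        best_model, best_score = search.best_estimator_, search.best_score_

print(best_model, round(best_score, 3))
```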
5. Cloud-Based ML Platforms
Regarding cloud-based ML platforms, the three cloud vendors (AWS, GCP, and MS Azure) provide different services for building an end-to-end ML model lifecycle without writing a single line of code and with a minimal number of inputs. This helps business stakeholders without prior programming knowledge develop ML models easily.
In
Figure 1, we have depicted the ML architecture based on the Azure ecosystem. Azure Data Lake Storage Gen2 can act as a data warehouse for storing structured, semi-structured, and unstructured data. We can also store all types of data in Azure Blob Storage; however, if we intend to use Azure Synapse, using Azure Data Lake is a prerequisite. For data transformation requirements, Azure offers the Synapse service, which acts as a lakehouse: it stores data in conjunction with Azure Data Lake and can still be queried with Transact-SQL (T-SQL) against the metadata of the stored data. Azure Synapse also supports atomic, consistent, isolated, and durable (ACID) transactions. Synapse offers further features: data from different source connectors can be ingested into Azure using linked services, and the data can be transformed using operation connectors in a pipeline. Synapse can invoke the Azure ML service for building manual or automatic ML models; Azure ML allows building a model from a pre-built model, through notebooks, or via the AutoML option. Once the best model is built and evaluated, it is ready to be deployed at an endpoint. This is where the Azure Container Registry service comes into play; it containerizes the ML model and saves the container image, which can then be deployed using Azure's orchestration service, Azure Kubernetes Service. Once the model is built and deployed at the endpoint, it is continuously observed by Azure Monitor. Any significant change in the underlying training dataset triggers automatic retraining of the ML model; if the retrained model receives a low performance score, it is time to build a new or ensemble model. Users registered with Azure Active Directory who log in to the Azure portal once are not prompted again when accessing any other service they are entitled to, until they log out of the portal or are timed out after being idle for a long time. Furthermore, sensitive assets, such as passwords, authentication tokens, or access keys, can be stored in Azure Key Vault, and only users with access to the key vault can access the secrets stored inside it [
13,
14,
15].
In
Figure 2, we have depicted the ML platform architecture in the AWS ecosystem. AWS supports different types of data from heterogeneous sources. Data should first be uploaded to an Amazon S3 bucket or an Amazon EC2 instance. Once the data is within AWS, it can be transformed using Amazon SageMaker Studio. AWS has developed SageMaker as a unified ML platform that can handle the end-to-end ML lifecycle. If the requirement is automated ML, we can make use of SageMaker's AutoPilot service. AutoPilot takes care of training the models, evaluating them, and choosing the best one based on the evaluation metric score. When the best model has been identified and tested, we can register it in the Amazon Elastic Container Registry; we then have a container image of our model that can be deployed to any endpoint. The Amazon CloudWatch service is used to monitor AWS services. The AWS Single Sign-On service is used to authenticate users to the AWS portal; once a user signs in to the portal, they will not be prompted again when accessing any of the AWS services to which they have access. The AWS Identity and Access Management (IAM) service takes care of granting the required privileges on resources to a role or a user [
14,
16,
17]. Researchers have explained the two phases of an AutoPilot job as candidate generation and candidate exploration. The candidate generation phase is responsible for splitting the dataset into training, test, and validation sets, exploring the data distribution, and performing the necessary pre-processing. The candidate exploration phase is responsible for tuning different hyperparameters and finding the right values based on model performance metrics.
In
Figure 3, we have depicted the ML platform architecture based on the GCP ecosystem. Like Azure and AWS, GCP supports all types of data. The prerequisite for generating an ML model is to upload the data to Google Cloud Storage and create a dataset. Google has developed Vertex AI as its unified ML platform. We can perform all data transformation tasks from Vertex AI, which provides pipelines and operators for this purpose. For generating ML models automatically, we can make use of the Google AutoML service, which trains, evaluates, and chooses the best model automatically without a single line of code being written. When the model is ready, we can register it in the Google Container Registry; the container image can then be deployed to an endpoint using the orchestration service Google Kubernetes Engine. The Google Cloud Monitoring service monitors all Google resources and triggers auto-healing when required. For identity and access management, we can use the Cloud IAM service. For storing secrets, we can use HashiCorp Vault integrated with GCP [
14,
16,
17,
19]. With AutoML in GCP, image data can belong to any of the following objectives: single-label classification, multi-label classification, object detection, and segmentation. With tabular data, we can choose among regression, classification, and forecasting. For natural language processing (NLP) business cases and text data, we can choose among single-label classification, multi-label classification, entity extraction, and sentiment analysis. For video data, we can choose among action recognition, classification, and object tracking.
All the cloud-based ML platforms provide their respective cloud-native security, monitoring, and deployment solutions. In principle, they all support identity and access management for granting role-based access and have vault services for storing secrets, access keys, and certificates.
In
Table 3, we present a summary comparing the three platforms (AWS, GCP, and MS Azure).
7. Experimental Results
We have studied three cloud ML platforms in this research, and all our further findings are connected to these cloud platforms. In the following section, we present our findings from each cloud platform and compare the results descriptively. One thing to keep in mind is that our research is focused mainly on generating automatic ML (AutoML) models without writing a single line of code, or with very little code. Even though other data formats are possible, we have chosen tabular data for further analysis for simplicity and practical reasons.
7.1. Dataset Description
We have downloaded the Melbourne housing dataset from Kaggle
1. The dataset consists of 21 features, of which 20 are descriptive and one is the target feature. The target feature is the property price. This dataset is not balanced and contains missing values. We have not treated this dataset by performing manual pre-processing tasks or balancing the data distribution. We have ingested the raw dataset as-is into all three ML platforms and derived results. All the pre-processing tasks are automatically handled by the respective ML platforms.
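For illustration, a quick look at the raw file before it is handed to the AutoML services could be performed as below; the CSV file name and the capitalization of the Price column are assumptions about the downloaded Kaggle export.

```python
# Inspect the raw Melbourne housing data; no manual pre-processing is applied.
import pandas as pd

df = pd.read_csv("melbourne_housing.csv")   # hypothetical local file name
print(df.shape)                             # 21 columns: 20 descriptive + 1 target
print(df["Price"].describe())               # target feature: property price
print(df.isna().sum())                      # missing values are left untreated
```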
7.2. Low-Code ML Platform Based on AWS
Amazon has developed a range of AutoML services to support different business use cases. Amazon has published an open-source AutoML library called AutoGluon, through which developers can create ML models for tabular, text, and image data with just a few lines of code. Besides this, Amazon has developed an in-house, fully managed ML service called Amazon SageMaker. SageMaker is an ML platform that covers the entire ML lifecycle, from ingesting the data to creating models to deploying them to the desired endpoint.
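As an aside, the "few lines of code" claim for AutoGluon can be sketched as follows; the CSV path, the column name Price, and the one-hour time limit are our own illustrative assumptions.

```python
# Minimal AutoGluon sketch for tabular AutoML on the housing data.
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("melbourne_housing.csv")        # assumed local CSV
predictor = TabularPredictor(label="Price", eval_metric="r2").fit(
    train_data,
    time_limit=3600,            # cap the model search at one hour
)
print(predictor.leaderboard())  # generated models ranked by validation score
```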
In our case, we have used the SageMaker AutoPilot service to create the automatic ML model. The prerequisite for using AutoPilot is to upload the data to an Amazon S3 bucket so that the service can consume the data for further analysis. Creating a new ML project is called an experiment in SageMaker's terminology. We first uploaded our dataset to the Amazon S3 bucket. Then we created a new experiment, where we provided the experiment name, chose the S3 bucket name, and picked our dataset from the list. Next, we must provide the target feature name; in our case, we chose the feature price. It is then possible to deploy the best model automatically to the desired endpoint by enabling the auto-deployment feature and specifying the endpoint name or keeping the default. It is also possible to provide the output directory name, which must be present in the S3 bucket, where all the AutoPilot output logs will be stored. The above are the basic settings, and they are enough to create an AutoML model. However, we can constrain the model generation behavior by tweaking the advanced settings. The following vital options can be defined as part of the advanced settings:
ML problem type: we can choose among auto, binary classification, multi-class classification, and regression.
Experiment run type: we can choose between executing the whole experiment or copying the generated code into a notebook and executing the commands cell-wise.
Runtime: we can define how long the experiment can execute, the maximum number of models it can generate, and the maximum time it can spend generating each model.
Access: we can restrict access to any IAM role.
Encryption: we can enable encryption for data present at the S3 bucket level.
Security: we can use a virtual private cloud connection if we desire to have a highly secure private connection.
When we initiate a new experiment through AutoPilot, it automatically takes care of the following tasks: pre-processing, candidate definition generation, feature engineering, model training, explainability report generation, insights report generation, and optionally deploying the model to the desired endpoint.
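The same experiment can also be launched programmatically; the sketch below uses the SageMaker Python SDK and is our own illustration, with the IAM role, S3 paths, and column name as placeholders rather than the exact values used in our experiment.

```python
# Launching a SageMaker AutoPilot (AutoML) job from the SageMaker Python SDK.
import sagemaker
from sagemaker.automl.automl import AutoML

session = sagemaker.Session()
automl_job = AutoML(
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role
    target_attribute_name="Price",                  # target feature chosen in the UI
    problem_type="Regression",                      # force a regression experiment
    job_objective={"MetricName": "MSE"},            # metric used to rank candidates
    max_candidates=50,                              # upper bound on generated models
    output_path="s3://my-bucket/autopilot-output/", # placeholder S3 output directory
    sagemaker_session=session,
)
automl_job.fit(inputs="s3://my-bucket/melbourne_housing.csv", wait=False)
```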
The AutoPilot job generated different ML models and chose the best model with the least mean squared error (MSE). This model is built on the XGBoost algorithm. It took about two hours to generate all the models and choose the best one from the pool. The best model was automatically deployed to the specified endpoint. When we navigated to our model, rich information was available to help understand the output results, including details on the explainability of the model, performance metrics, artifacts, and the endpoint. The AutoPilot job also generates the feature importance based on the best model.
We noticed that distance, type of property, and number of rooms are considered the most important features for this model. As part of the automatic model build, AutoPilot has automatically tested and tweaked hyperparameters to generate the best model.
The AutoPilot job generates a list of artifacts, including the input dataset, the training and validation splits, the preprocessed training and validation sets, Python code for the feature engineering task, zipped folders containing all the feature engineering models, the ML algorithm models, and other explainability artifacts. All the output data is stored in the directory we specified earlier during experiment creation.
7.3. Low-Code ML Platform Based on GCP
Google has developed 'Vertex AI' as GCP's unified ML platform and a single stop for addressing all ML use cases. It allows users to leverage existing pre-built models, use AutoML to create the best ML model automatically, or build a custom ML model from scratch. Each of these options has advantages and disadvantages. The AutoML service supports data in different formats, such as tabular, text, speech, image, and video.
It is a prerequisite that the dataset is available within Google Cloud for models to consume; hence, the first step is to upload the dataset from the local machine to Google Cloud. Creating a dataset in Vertex AI is mandatory if we want to create a new model for it. While creating the dataset, Vertex AI can fetch the data from the local machine, Google Cloud Storage, or BigQuery; in any case, it creates a dedicated directory within Google Cloud Storage to store the dataset.
After creating the dataset, we can start training the model. When creating a training job, we must first choose the dataset that has been uploaded to Google Cloud Storage. Based on the type of dataset, we are given the option to choose the objective of the business problem. In our case, as we have tabular data, we are presented with regression or classification options, and we have chosen regression. We then also have the option to choose whether the model should be created automatically without human intervention or built from a pre-built model based on the TensorFlow, Scikit-Learn, or XGBoost frameworks.
In the next step, we must provide a name for the new model, and we also have the option to either create a new model or retrain an existing one. We should also choose the target field; in our case, we have chosen the price feature. When it comes to splitting the data, Vertex AI provides three different ways to split it. The first option is to choose the data for training, testing, and validation at random; the second option is to choose it manually; and the third option is to choose the data in chronological order: the first 80% is assigned to training, the next 10% to validation, and the last 10% to the test set.
In the next step, we have the possibility to define different training options, like changing the data type of a feature that is auto-detected or excluding a feature from further analysis.
We also have the option to weight the rows of the dataset based on the value of a particular feature; if not, AutoML assigns equal weight to every row by default. We can then choose the objective the training should optimize: RMSE, MAE, or RMSLE. RMSE can be chosen if we intend to give high importance to extreme values; MAE can be chosen if we intend to treat extreme values as outliers with less impact on the model; and RMSLE can be chosen if we intend to penalize error on a relative rather than an absolute scale.
As the last step, we can choose the maximum number of node hours for training the model. The minimum budget is one node hour, and we can choose a higher value based on our requirements. Within this budget, the model is allowed to train while the required computing resources are autoscaled. With this, we can train a new model or retrain an existing one. With the four steps mentioned above, we can create and train a new model without writing a single line of code. Model training runs until the specified node-hour budget is exhausted and then stops automatically; no intervention is required.
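The same four steps can also be expressed through the Vertex AI Python SDK; the sketch below is our own illustration, with the project, bucket, and column names as placeholders.

```python
# Creating a dataset and an AutoML tabular regression job with the Vertex AI SDK.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")

dataset = aiplatform.TabularDataset.create(
    display_name="melbourne-housing",
    gcs_source="gs://my-bucket/melbourne_housing.csv",  # data already in Cloud Storage
)

job = aiplatform.AutoMLTabularTrainingJob(
    display_name="melbourne-price-automl",
    optimization_prediction_type="regression",
    optimization_objective="minimize-rmse",          # RMSE/MAE/RMSLE choice from the UI
)

model = job.run(
    dataset=dataset,
    target_column="Price",
    budget_milli_node_hours=1000,                    # one node hour, the minimum budget
    model_display_name="melbourne-price-model",
)
```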
When the AutoML job has generated new ML models, we can see additional details, such as when model training started and how long it is allowed to execute, the region in which compute resources were allocated for training, the type of encryption key, dataset details, data split details, whether the model was trained with custom training or AutoML, and finally the type of problem being addressed, in our case a regression problem.
The trained model has also generated the feature importance matrix. As per feature importance, we noticed that region name, land size, distance, and type of property are considered the most important features in deciding the price of the property.
We have the option to export our model as a TensorFlow SavedModel packaged in a Docker container; by containerizing the model, we can deploy it elsewhere promptly. We can also deploy the model directly to any desired endpoint. When the model is deployed to an endpoint, we are given the option to test it against that endpoint without needing to create manual tests, test strategies, or test cases. We also have the option to perform predictions in batches and store the results in a specified Cloud Storage directory.
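A corresponding deployment and online prediction call might look as follows; the model resource ID, machine type, and feature values are placeholders for this illustration, and AutoML tabular endpoints expect feature values as strings.

```python
# Deploying a trained AutoML tabular model to an endpoint and requesting a prediction.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")

model = aiplatform.Model(model_name="1234567890123456789")   # placeholder model ID
endpoint = model.deploy(machine_type="n1-standard-4")

# In practice the instance must contain every descriptive feature used during
# training; only a subset is shown here for brevity.
prediction = endpoint.predict(instances=[{
    "Rooms": "3", "Type": "h", "Distance": "10.5", "Landsize": "450",
}])
print(prediction.predictions)
```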
7.4. Low-Code ML Platform Based on Microsoft Azure
Microsoft has developed many ML, AI, integration, and data services. It also offers a unified ML platform experience through two of its major services: Azure Synapse and Azure Databricks. In general, Databricks is considered a cloud-agnostic data engineering and ML platform; however, Azure has incorporated an optimized version of Databricks and calls it Azure Databricks. Azure Synapse is an end-to-end ML platform through which we can perform the entire ML lifecycle: ingesting the data, analyzing it, building ML models by integrating Azure ML as a linked service, deploying the model to the desired endpoint, monitoring the endpoint and the source datasets, and, when there is a significant change in the source dataset, retraining and re-evaluating the model.
Azure Synapse offers pipelines through which we can orchestrate the ML workflow as per the requirements. Azure Synapse depends on the Azure ML service when it comes to creating ML models, training them, and evaluating their performance. The Azure ML service offers AutoML features, through which we can create a new model or retrain an existing one without writing a single line of code. Azure Synapse requires Azure Data Lake to store the data, which does not just serve as a data warehouse but can also be queried directly; hence, Azure Synapse can be considered a lakehouse architecture. However, the Azure ML service requires either a compute cluster or a compute instance to process the data and train the model, and later it also requires a compute cluster or compute instance for explaining the model, as this task also requires computing power. If we have a heavy-lifting task or a large dataset to analyze, we can consider a compute cluster, which is a collection of interconnected nodes or instances. If we have a smaller dataset or a less resource-intensive task, we can consider a compute instance, which is a single-node instance.
Before creating the automated ML model, we must create a persistent dataset in the Azure workspace blob storage by uploading the dataset from the local machine. Once the dataset is available in Azure Blob Storage, we can create an AutoML job. The first step is to choose the source dataset. Then we provide a name for the experiment and choose the target feature name; in our case, we chose the feature price. We must also choose the compute type for creating and training the new model: we can choose between a compute cluster and a compute instance as per the business requirement, and if none exists, we must create one by choosing among the available sizes of pre-built compute instances and clusters. The next step is to choose the type of task: regression, classification, or time-series forecasting. In our experimental case, it is a regression model. The last step is to select the validation type, where we can choose between auto and manual options. Then we have the option to choose how to split off the test set; we have three options: provide our own test dataset, skip the test dataset, or specify the percentage of data to allocate to the test dataset. In our case, we allocated 20% of the data to the test dataset. The AutoML job generated multiple models and chose the best model based on the RMSE score.
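For reference, the same AutoML job could be submitted through the Azure ML Python SDK v2 rather than the Studio UI; this is a hedged sketch based on our understanding of that SDK, with the workspace details, compute name, registered data asset reference, and column name as placeholders.

```python
# Submitting an AutoML regression job with the Azure ML Python SDK (v2).
from azure.ai.ml import Input, MLClient, automl
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

regression_job = automl.regression(
    compute="cpu-cluster",                                   # existing compute cluster
    experiment_name="melbourne-price-automl",
    training_data=Input(type="mltable", path="azureml:melbourne-housing:1"),  # registered data asset
    target_column_name="Price",
    primary_metric="normalized_root_mean_squared_error",     # RMSE-based ranking
    n_cross_validations=5,
)
regression_job.set_limits(timeout_minutes=60, max_trials=50)

returned_job = ml_client.jobs.create_or_update(regression_job)
print(returned_job.studio_url)   # link to follow the run in Azure ML Studio
```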
The AutoML job also generated a feature importance chart for each model created. We noticed that the region name, distance to the property, type of property, and number of bedrooms were chosen as the top four important features. Unlike GCP and AWS, Azure has consolidated the ML-related features into the Azure ML service and the data engineering-related features into Synapse. For basic data integration and transformation requirements, we can also use Azure Data Factory, which copies data from source to target; Azure supports more than 80 source and target connectors.
7.5. Low-Code ML Platform Based on Databricks
Databricks supports data engineering, ML, and SQL querying tasks; hence, it can be considered a complete lakehouse platform. Databricks is powered by a Spark engine for data processing. It is cloud-agnostic, as we have the freedom to choose the data residency: data can be stored and hosted on any of the cloud service providers' ecosystems (AWS, GCP, or Azure). It is a prerequisite to mount the storage on one of the cloud platforms and create a Databricks cluster for computing before trying to ingest the data. When the prerequisites are met, we can easily ingest the data by either providing the dataset's path in the filestore or dragging and dropping the file from the local system. Once the dataset is uploaded to Databricks, we can perform different actions with it. For example, we can create an AutoML job with the given dataset to create an ML model, or we can create a table from the dataset and explore the data by executing a Spark or SQL query against the respective table. In our case, we have chosen an AWS S3 bucket as the data storage area for our Databricks AutoML experiments.
Configuring the AutoML experiment is smooth with Databricks: we must provision a Databricks cluster for computing, choose the type of problem, choose the dataset, provide the target class (in our case, price), and name the experiment to keep track of it. Databricks takes care of imputing the missing data if we leave the default Auto option. Apart from the basic details, we can choose the evaluation metric, whether it should be accuracy, F1 score, log loss, precision, or the receiver operating characteristic (ROC) area under the curve (AUC). In our case, we left the default value, which is the F1 score. Then we can choose among three different training frameworks recommended by Databricks: LightGBM, Scikit-learn, and XGBoost. We have chosen all three frameworks for better comparison. We can set the timeout period for how long the experiment should run. It is also a good idea to provide a time feature from the dataset, which should be of the date-time datatype; in our case, we have chosen the date feature. Databricks uses the time feature to split the data into training, testing, and validation sets. We can also provide the data storage location for storing the experiment results in the persistent storage area.
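Equivalently, the experiment can be configured through the Databricks AutoML Python API from a notebook; the sketch below is our own illustration of the regression path (consistent with price being the target and the models being ranked by test_r2), with the table name and column names as placeholders.

```python
# Running a Databricks AutoML regression experiment from a notebook.
from databricks import automl

# `spark` is the SparkSession predefined in Databricks notebooks.
df = spark.table("default.melbourne_housing")   # dataset previously registered as a table

summary = automl.regress(
    dataset=df,
    target_col="Price",
    time_col="Date",            # used for the chronological train/validation/test split
    primary_metric="r2",        # metric used to rank the generated models
    timeout_minutes=60,         # stop the experiment after one hour
)
print(summary.best_trial.model_path)   # MLflow URI of the best model's artifacts
```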
When the AutoML job ends, we can view the Python code generated by the Databricks AutoML job for each model in depth, either by opening the notebook for the respective model or by viewing the data exploration notebook to understand the exploratory actions performed by the AutoML job. Models are sorted by the test_r2 score in descending order, from the best-performing to the worst-performing model.
If we want more details about a particular auto-generated ML model, we can simply click on its hyperlink; it takes us to a separate page with end-to-end details about the model description, the parameters provided, the evaluation metrics considered, and all the artifacts generated during model creation. It is also possible to register the model from this page, so it can be exposed to the outside world as a REST API endpoint. We noticed that 124 hyperparameters were set by the AutoML job; these parameters are tuned by testing different values across the generated models and validating the results against the evaluation metrics. In our case, the AutoML job generated more than 100 models within 60 minutes. In addition, for each model, the evaluation metrics and the artifacts required for deployment and inference were accessible both from the Databricks user interface and in the AWS S3 bucket.
We noticed that all the Databricks-related artifacts are available in the persistent Amazon S3 bucket storage. To deploy the model to an endpoint or containerize it, we can use the artifacts generated by the AutoML job: the model file in conjunction with its dependent pickle file to deploy the model to any endpoint, the conda file to install the necessary libraries, and the Python requirements files to install the necessary Python libraries. Detailed model inference can be found by opening the model notebook, which walks through importing the required libraries, ingesting the data, pre-processing, splitting the data into training, test, and validation sets, training the model, generating feature importance, and evaluating the model against different performance metrics.
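Because the best model and its artifacts are logged to MLflow, they can be reloaded for batch inference or packaged for deployment; the sketch below is illustrative, with the run ID and file path as placeholders.

```python
# Reloading the best AutoML model from MLflow for batch inference.
import mlflow
import pandas as pd

model = mlflow.pyfunc.load_model("runs:/<best-run-id>/model")   # placeholder run ID

# New rows must follow the same schema as the training data (target excluded).
new_listings = pd.read_csv("melbourne_housing.csv").drop(columns=["Price"])
predictions = model.predict(new_listings)
print(predictions[:5])
```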
7.6. Comparing Models Based on Performance Metrics
All the auto-generated ML models across GCP, AWS, Azure, and Databricks have been evaluated automatically using different evaluation metrics such as MAE, RMSE, and R2. However, for this comparison, we have considered only the R2 metric, as it indicates how much of the variance in the dependent variable is explained by the independent variables. It is also commonly referred to as the coefficient of determination and quantifies how closely the predicted values follow the observed values around the fitted regression line. As we intend to predict the price of a property, we are interested in knowing how much of the variation in price is explained by the explanatory variables. Hence, we have chosen R2 as the performance metric for comparison.
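For reference, R2 follows the standard definition

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},

where y_i is the observed price of property i, \hat{y}_i is the predicted price, and \bar{y} is the mean observed price; values closer to 1 indicate that the model explains a larger share of the variance in price.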
For the Melbourne housing dataset, the performance scores of the best ML models generated from GCP, AWS, Azure, and Databricks are depicted in
Table 4. As per the results, we can see that Azure has the highest R2 score. However, we cannot conclude that one ML platform is better than the others based on this metric alone, since several other criteria need to be considered, such as the visibility of AutoML job operations, the traceability of AutoML job logs, the complexity of providing inputs for the AutoML job, the AutoML job's elapsed time, customizability, support for co-authoring, how well the AutoML output can be explained, and how easy it is to deploy a generated ML model. These criteria are briefly explained in the following section.
7.7. Comparison between ML Platforms
We have created and trained automatic ML models on all three cloud platforms using the respective AutoML services. We noticed similarities and differences in how AutoML services work on each cloud; see
Table 5 below.
In
Table 5, we highlight the similarities in generating an AutoML model across the three cloud ML platforms. All three ML platforms require uploading the data to cloud storage and creating a dataset as a prerequisite. The best models generated through the AutoML jobs of the different cloud ML platforms produced very similar feature-importance results. In addition, the model evaluation metrics are similar across the three ML platforms; there may be some additional metrics, but all three consider commonly used evaluation metrics such as MAE, MSE, RMSE, and R
2. On the other hand, the differences are reported in the table below.
In
Table 6, we highlight some of the differences we found while generating the AutoML model on the three cloud-based ML platforms. Regarding traceability, AWS creates all the output logs quite neatly under the directory name we provided inside the Amazon S3 bucket; it was quite easy for us to consume the performance logs, the Python code for the different models, and the tuned hyperparameters from that directory. In contrast, with GCP and Azure we found that the output logs were available only in their respective portals and had to be downloaded manually to our local machine if required. Regarding the complexity of creating the AutoML model, we found that both AWS and Azure have very few touchpoints and are less time-consuming, whereas GCP requires additional inputs and is more time-consuming.
Regarding customizing the AutoML job, we found that AWS offered the highest level of customization compared with both GCP and Azure. When it comes to the co-authoring feature, where more than one developer can be involved in ML model development, all three cloud vendors offer this capability; however, only AWS and GCP offer co-authoring with shared compute resources, whereas with Azure each developer requires dedicated compute resources. When it comes to the explainability of the model, AWS explained the model automatically without any intervention from our end; with GCP, we can explain the model with a button click; with Azure, it is more than a button click, as we had to allocate a separate compute instance or cluster for explaining the model. When it comes to deployment, models can be deployed to both internal and external endpoints with GCP and Azure, whereas with AWS, models can only be deployed to an internal endpoint.