On Wednesday, Databricks and RStudio announced a new release of MLflow, an open source multi-cloud framework for the machine learning lifecycle, now with R integration. RStudio partnered with Databricks to develop an R API for MLflow v0.7.0, which was showcased at Spark + AI Summit Europe.
The R API builds on features that have already been released. Together they make MLflow a comprehensive open source machine learning platform, with support for multiple programming languages, integrations with machine learning libraries, and support for multiple clouds.
Since MLflow’s launch, community engagement and contributions have produced an impressive array of new features and integrations. These include support for multiple programming languages to give developers a choice. In addition to R, MLflow supports Python, Java, and Scala, as well as a REST server interface that can be used from any language.
MLflow also has built-in integrations with various machine learning libraries, including scikit-learn, TensorFlow, Keras, PyTorch, H2O, and Apache Spark MLlib, to help teams build, test, and deploy machine learning applications. Depending on their needs, organizations can use MLflow to deploy machine learning models to multiple cloud services, including Databricks, Azure Machine Learning, and Amazon SageMaker. MLflow can also store artifacts in AWS S3, Google Cloud Storage, and Azure Data Lake Storage, allowing teams to track and share the artifacts their code produces.
Databricks provides MLflow as a managed service, and early adopters are experiencing increased efficiency across the machine learning lifecycle. By leveraging MLflow within Databricks’ Unified Analytics Platform, users can easily initiate runs from their on-premises environment or from Databricks notebooks.
MLflow’s integration with Databricks Delta lets data science teams track the large-scale data that fed their models, along with all other model parameters, and then reliably reproduce training runs. By integrating MLflow into its Unified Analytics Platform, Databricks brings the benefits of a single, common security model to the entire machine learning lifecycle.
Before MLflow, the industry had no standard process or end-to-end infrastructure for developing and productionizing machine learning applications in a simple, consistent way. With MLflow, organizations can package their code as reproducible runs, execute and compare hundreds of parallel experiments, and use any hardware or software platform for training, tuning, hyperparameter search, and more.
Organizations can also deploy and manage models in production on a variety of clouds and serving platforms. As a testament to MLflow’s design as an open platform, RStudio’s contribution extends MLflow to the large community of data scientists who use RStudio and the R programming language.
“In many organizations machine learning workflows are far too ad-hoc, with no systematic tracking of experiments, inadequate protocols around reproducibility, and no consistent way to package and deploy models. MLflow helps address these issues in a uniform fashion across languages and frameworks,” said JJ Allaire, chief executive officer at RStudio. “Integration of R with MLflow will significantly broaden the reach of the project by allowing a broader community to use and contribute to MLflow.”
“With MLflow, data science teams can systematically package and reuse models across frameworks, track and share experiments locally or in the cloud, and deploy models virtually anywhere,” said Matei Zaharia, chief technologist at Databricks. “The flurry of interest and contributions we’ve seen from the data science community validates the need for an open source framework to streamline the machine learning lifecycle.”