
From Ledgers to Intelligence, Part 15: AI and ML Enter the Stack: Feature Stores, MLflow, and Analytics That Learn

Digital Transformation | June 2026

For the first fifteen years of the data warehouse era, analytics and machine learning were separate disciplines practised by separate teams on separate infrastructure. Data analysts built reports. Data scientists built models. The two groups shared some data sources but otherwise operated independently, and the practical gap between them was significant. A data scientist’s model that worked in a Jupyter notebook might take months to reach production; the features it needed might not exist in the data warehouse; the predictions it produced might never appear in the BI dashboard where decisions were made.

From 2020 onward, the boundary between analytics and ML engineering dissolved. The same cloud data warehouse infrastructure that served BI queries began serving ML training jobs. The same data engineering practices (version control, automated testing, CI/CD) that had transformed the analytics engineer’s workflow spread to ML pipelines. The result was a unified data and AI stack in which predictive models were first-class components of the analytics architecture: not separate experiments, but production-grade analytical artifacts.

Machine learning infrastructure: the compute and data architecture required to train, evaluate, and serve ML models at production scale, integrated with the same data foundations that serve analytical dashboards. Credit: Unsplash

The Feature Store: Reusable ML Inputs

The most important new infrastructure component introduced by production ML was the feature store, a system for computing, storing, and serving the input features that ML models consume. Features are derived variables computed from raw data: a customer’s average order value over the last 30 days, a product’s view-to-purchase ratio in the last week, a user’s session length percentile. Computing these features correctly and consistently across training and serving turns out to be one of the hardest engineering problems in ML.

The core challenge is point-in-time correctness. When training a fraud detection model, you need to know what a customer’s transaction history looked like at the moment of each historical transaction, not what it looks like today. A feature store with point-in-time join capability can retrieve the feature values as they existed at any historical timestamp, preventing the data leakage that makes offline model performance look far better than online performance.
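The idea of a point-in-time join can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular feature store implements it: the feature name `txn_count_30d`, the helper `feature_as_of`, and all timestamps and values are hypothetical. For each labelled event, the lookup returns the most recent feature value recorded at or before the event time, never a later one.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history of one feature for one customer, sorted by time:
# (time the value became known, transaction count over the last 30 days).
feature_history = [
    (datetime(2024, 1, 5), 2),
    (datetime(2024, 1, 20), 5),
    (datetime(2024, 2, 10), 9),
]


def feature_as_of(history, event_time):
    """Return the latest value recorded at or before event_time, else None."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_time)
    if idx == 0:
        return None  # no value existed yet at that moment
    return history[idx - 1][1]


# Hypothetical labelled training events: (transaction time, is_fraud).
events = [
    (datetime(2024, 1, 1), 0),
    (datetime(2024, 1, 21), 0),
    (datetime(2024, 2, 1), 1),
]

# Each training row sees only what was known when the transaction happened.
training_rows = [
    {
        "event_time": event_time,
        "txn_count_30d": feature_as_of(feature_history, event_time),
        "is_fraud": label,
    }
    for event_time, label in events
]

for row in training_rows:
    print(row)
```

A naive join against the latest value would give every row the count of 9, including rows for transactions that happened before that value existed. That is exactly the leakage described above.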

The feature store also enables feature reuse: a feature computed for one model (say, “customer days since last purchase”) is also useful for a dozen other models. Without a feature store, each team recomputes the same features independently, which is wasteful and inconsistent. With a feature store, features are computed once, stored, and served to any model that needs them through a standard API.
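As a rough sketch of compute-once, serve-many, the toy class below caches each feature value the first time it is requested and hands the same value to every later caller. The class `InMemoryFeatureStore`, its methods, and the sample data are all hypothetical; real feature stores such as Feast expose their own APIs and persist values in offline and online backends rather than in a dictionary.

```python
class InMemoryFeatureStore:
    """Toy store: each feature is computed once per entity, then reused."""

    def __init__(self):
        self._definitions = {}  # feature name -> function(entity_id) -> value
        self._values = {}       # (feature name, entity_id) -> stored value
        self.compute_calls = 0  # counts how often a feature was actually computed

    def register(self, name, compute_fn):
        self._definitions[name] = compute_fn

    def get_features(self, entity_id, names):
        row = {}
        for name in names:
            key = (name, entity_id)
            if key not in self._values:
                self._values[key] = self._definitions[name](entity_id)
                self.compute_calls += 1
            row[name] = self._values[key]
        return row


# Hypothetical precomputed raw data standing in for warehouse queries.
days_since_last_purchase = {"c1": 3, "c2": 41}
avg_order_value = {"c1": 58.0, "c2": 12.5}

store = InMemoryFeatureStore()
store.register("days_since_last_purchase", days_since_last_purchase.__getitem__)
store.register("avg_order_value_30d", avg_order_value.__getitem__)

# Two different models request overlapping features for the same customer.
churn_inputs = store.get_features(
    "c2", ["days_since_last_purchase", "avg_order_value_30d"]
)
upsell_inputs = store.get_features("c2", ["days_since_last_purchase"])

print(churn_inputs, upsell_inputs, store.compute_calls)
```

The second request triggers no new computation, and both models are guaranteed to see the same value for the shared feature.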

Major feature store implementations include Feast (open-source, supporting offline and online stores with pluggable backends), Tecton (commercial, with a strong focus on streaming feature computation), Hopsworks (open-source with a strong feature catalogue and monitoring layer), and Vertex AI Feature Store (Google Cloud’s managed implementation, tightly integrated with BigQuery and Vertex AI).

MLflow: Experiment Tracking and Model Registry

MLflow, developed at Databricks and open-sourced in 2018, addressed a specific pain point in the ML development lifecycle: reproducibility and organisation. Without experiment tracking, a data scientist working iteratively (trying different model architectures, different hyperparameter settings, different feature subsets) might run hundreds of experiments and be unable to reconstruct which parameters produced the best result, or to reproduce a result two weeks later.

MLflow’s four components addressed the full ML lifecycle. The Tracking component logged parameters, metrics, and artifacts (model files, confusion matrices) for every experiment run. The Projects component defined a reproducible format for ML code with explicit dependency specifications. The Models component defined a standard model format (MLmodel) that could be served by multiple deployment targets (REST API, Spark UDF, etc.). The Model Registry managed the lifecycle of models from experimentation through staging to production, with version control, annotations, and approval workflows.
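To make the tracking and registry ideas concrete, here is a minimal stdlib-only sketch of what such a system records. It is deliberately not MLflow's API: `Run`, `ExperimentLog`, and the stage names are hypothetical stand-ins, and the metric values are made-up placeholders rather than results. In MLflow itself, the logging step corresponds to calls such as `mlflow.log_param` and `mlflow.log_metric` inside an `mlflow.start_run()` block.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    run_id: int
    params: dict
    metrics: dict = field(default_factory=dict)
    stage: str = "None"  # hypothetical lifecycle: None -> Staging -> Production


class ExperimentLog:
    """Toy tracker: keeps every run so the best one can be found again later."""

    ALLOWED_STAGES = ("Staging", "Production")

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        run = Run(run_id=len(self.runs) + 1, params=dict(params), metrics=dict(metrics))
        self.runs.append(run)
        return run

    def best_run(self, metric, higher_is_better=True):
        chooser = max if higher_is_better else min
        return chooser(self.runs, key=lambda run: run.metrics[metric])

    def promote(self, run, stage):
        if stage not in self.ALLOWED_STAGES:
            raise ValueError(f"unknown stage: {stage}")
        run.stage = stage


log = ExperimentLog()

# Placeholder hyperparameters and validation scores, for illustration only.
for depth, auc in [(3, 0.81), (5, 0.86), (8, 0.84)]:
    log.log_run({"max_depth": depth}, {"val_auc": auc})

best = log.best_run("val_auc")
log.promote(best, "Staging")
print(best)
```

Because every run keeps its parameters alongside its metrics, the question "which settings produced the best result?" becomes a lookup rather than an exercise in memory.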

From Descriptive to Predictive and Prescriptive

The traditional analytics stack was descriptive: it told you what had happened. The integration of ML models into the analytics stack enabled two higher levels of analytical capability.

Predictive analytics tells you what is likely to happen: a demand forecast for the next four weeks, a customer churn probability score, a recommended next best action. These predictions are derived from historical patterns in the data (the same data that feeds descriptive dashboards) but require ML model serving infrastructure that the traditional BI stack did not include.

Prescriptive analytics tells you what you should do: an optimal inventory replenishment quantity, a personalised discount offer calibrated to conversion probability, an automated pricing decision. Prescriptive systems close the loop between data and action, producing outputs that directly drive operational decisions rather than requiring a human to interpret a chart and make a judgment.
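The step from prediction to prescription can be illustrated with the discount example. The sketch below assumes a model has already produced a conversion probability for each candidate discount, then picks the discount with the highest expected profit. The function names, the price and cost, and the probabilities are all hypothetical, and a real system would add constraints such as budgets or fairness rules.

```python
def expected_profit(price, unit_cost, discount, conversion_probability):
    """Expected profit per offer: probability of a sale times the margin."""
    margin = price * (1 - discount) - unit_cost
    return conversion_probability * margin


def best_discount(price, unit_cost, predicted_conversion):
    """Pick the candidate discount that maximises expected profit."""
    return max(
        predicted_conversion,
        key=lambda d: expected_profit(price, unit_cost, d, predicted_conversion[d]),
    )


# Hypothetical model output: predicted conversion probability per discount level.
predicted_conversion = {0.00: 0.10, 0.10: 0.16, 0.20: 0.20, 0.30: 0.22}

choice = best_discount(price=100.0, unit_cost=60.0, predicted_conversion=predicted_conversion)
print(f"offer a {choice:.0%} discount")
```

With these placeholder numbers the 10% discount wins: deeper discounts convert more customers, but the shrinking margin outweighs the gain. The predictive model supplies the probabilities, and the prescriptive layer turns them into a decision.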

The convergence of analytics and ML in a unified data and AI stack made this progression tractable. Predictions from ML models stored in the data warehouse could be visualised in the same BI dashboards that showed historical metrics. Analysts could compare model predictions against actuals. Data engineers could build prediction pipelines using the same tools and practices as analytical pipelines. The boundary between “analytics team” and “data science team” became a question of domain expertise rather than a technical partition.


