Machine Learning Models in Cassandra
You can now use ML models developed with ML.NET in Juno Cassandra.
Over the past six weeks we have been hard at work testing a Proof-of-Concept (POC) for the implementation of Machine Learning models in Juno Cassandra. This partly explains the lack of news on this site over the past month!
Our POC project is now nearing completion, and we look forward to rolling out our first ML-based model for forecasting Road Condition for a client over the next two months. The POC incorporates machine learning models that predict the following parameters:
Rutting (segment mean and 90th percentile)
Roughness (segment mean and 90th percentile)
Texture Depth (segment mean and 10th percentile)
Maintenance Probability for Pavement and Surface maintenance
Maintenance Cost for Pavement and Surface maintenance
Counting each statistic and each maintenance type separately, the above comprises 10 ML models in total. However, separate versions of these models are required for the following model situations:
Incrementing when no treatment is applied
Resetting when Re-Surfacing is applied (Chipseal or Thin Asphalt)
Resetting when Rehabilitation is applied
Thus this Domain Model involves 30 machine learning models in total (10 models for each of the 3 situations). In addition to these ML models, the Cassandra model also needs sub-models to keep track of aspects such as Surface Age, Function, Type and Thickness, as well as Pavement Age and Type. These parameters, amongst others, are of course needed by the ML models for forward prediction.
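To make the count concrete, here is a minimal sketch that enumerates the 10 targets against the 3 situations. It is written in Python purely for illustration (the models themselves are built with ML.NET), and every label in it is a hypothetical name derived from the lists above, not an identifier used in Cassandra.

```python
from itertools import product

# Hypothetical labels based on the lists in this post; these are NOT
# identifiers from Cassandra or ML.NET.
TARGETS = [
    "rutting_mean", "rutting_p90",
    "roughness_mean", "roughness_p90",
    "texture_depth_mean", "texture_depth_p10",
    "maint_probability_pavement", "maint_probability_surface",
    "maint_cost_pavement", "maint_cost_surface",
]
SITUATIONS = ["no_treatment", "resurfacing", "rehabilitation"]


def model_keys():
    """Return one key per (target, situation) combination."""
    return [f"{target}__{situation}"
            for target, situation in product(TARGETS, SITUATIONS)]


keys = model_keys()
print(f"{len(TARGETS)} targets x {len(SITUATIONS)} situations = {len(keys)} models")
```

A flat key per combination is only one possible way to organise the models; it simply shows where the figure of 30 comes from.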
We have stress-tested our model on a data set comprising 15,824 elements, with a model run over 20 years. A forecast-only run takes 23 minutes on my fairly powerful desktop computer. Of course, this time will lengthen once we include treatment selection, but we are quite happy with the performance thus far when ML models are included.
As part of this work, we have had to finalise the first version of our Machine Learning implementation in Cassandra. Today I completed the documentation for this feature, which you can find on this page of the Cassandra documentation site.
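For a rough sense of scale, the stress-test figures can be turned into a throughput number. The short Python sketch below uses only the three figures quoted above. It assumes that every element is processed once per modelled year, which is a simplification for illustration and not a description of how Cassandra schedules its work.

```python
# Figures quoted in this post.
ELEMENTS = 15_824
YEARS = 20
RUNTIME_MINUTES = 23

# Assumption: each element is evaluated once per modelled year.
element_years = ELEMENTS * YEARS
runtime_seconds = RUNTIME_MINUTES * 60

rate = element_years / runtime_seconds           # element-years per second
ms_per_element_year = 1000 * runtime_seconds / element_years

print(f"{element_years:,} element-years in {runtime_seconds:,} s")
print(f"about {rate:.0f} element-years per second")
print(f"about {ms_per_element_year:.2f} ms per element-year")
```

Under that assumption the run covers 316,480 element-years, or roughly 229 per second, with each one involving several ML model predictions.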
It has been a great experience for me to work on this project, and I have learned a lot about creating an automated pipeline that prepares the data used to train the Machine Learning models. Many interesting aspects came out of this work, which I hope to share with you in future posts. Hopefully I will have a bit more time on my hands to keep you interested and up to date.