
Prediabetes Prediction and Prevention with Simulation and Machine Learning

My current main project builds on my earlier computational studies at RWTH Aachen University in Germany, where we studied how to predict physiological behavior from established theory. In this short article, I outline the synergy between theory-based and data-driven computation for predicting and preventing conditions such as prediabetes.

More specifically, integrating mechanistic simulations with machine learning (ML) for prediabetes prediction and prevention is a compelling example of how the two approaches can be combined in a healthcare setting. Here’s a concrete example of how this could work:

Prediabetes Prediction and Prevention


The goal is to develop a predictive tool that identifies individuals at high risk of developing prediabetes and provides personalized lifestyle and medical interventions to prevent progression to diabetes.

Approach Outline

1. Data Collection and Preparation

– Collect a comprehensive dataset, including:

  • EHR (electronic health records): patient demographics, family history of disease, and other relevant health metrics
  • Lab: blood glucose levels
  • Wearables: lifestyle factors (diet, exercise)

– Ensure data quality, completeness, and privacy compliance.
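As a minimal sketch of the data preparation step, the snippet below joins per-patient records from the three sources and flags incomplete records. The field names (`age`, `family_history`, `fasting_glucose`, `daily_steps`), the patient IDs, and all values are hypothetical placeholders; a real pipeline would also need de-identification and access controls, which are not shown here.

```python
# Hypothetical required fields; a real project would define these with clinicians.
REQUIRED_FIELDS = ("age", "family_history", "fasting_glucose", "daily_steps")


def merge_sources(ehr, lab, wearables):
    """Outer-join three {patient_id: fields} dictionaries on patient ID."""
    merged = {}
    for patient_id in sorted(set(ehr) | set(lab) | set(wearables)):
        record = {}
        for source in (ehr, lab, wearables):
            record.update(source.get(patient_id, {}))
        merged[patient_id] = record
    return merged


def missing_fields(record):
    """List the required fields that are absent or None in a merged record."""
    return [field for field in REQUIRED_FIELDS if record.get(field) is None]


# Made-up example records, for illustration only.
ehr = {"p1": {"age": 54, "family_history": True},
       "p2": {"age": 41, "family_history": False}}
lab = {"p1": {"fasting_glucose": 108.0}}
wearables = {"p1": {"daily_steps": 4200}, "p2": {"daily_steps": 9800}}

merged = merge_sources(ehr, lab, wearables)
incomplete = {pid: missing_fields(rec) for pid, rec in merged.items() if missing_fields(rec)}
print(incomplete)
```

Here patient `p2` has no lab result, so the record is flagged rather than silently dropped; whether to impute, re-request, or exclude such records is a separate design decision.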

2. Mechanistic Simulations (Theory-based)

– Develop a mechanistic model that simulates two key factors in the development of prediabetes:

  • glucose metabolism
  • insulin resistance

– Build established knowledge into the model:

  • physiological pathways
  • biochemical reactions
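To make the idea of a mechanistic model concrete, here is a small sketch loosely in the spirit of the Bergman minimal model of glucose and insulin kinetics. It is not the model used in my project: the equations are simplified, the parameter values are illustrative rather than fitted to data, and names such as `MinimalModelParams` and `simulate_glucose` are my own. Insulin resistance is represented by lowering `p3`, the gain with which insulin accelerates glucose uptake.

```python
from dataclasses import dataclass


@dataclass
class MinimalModelParams:
    """Illustrative (not fitted) parameters for a simplified minimal model."""
    basal_glucose: float = 90.0  # mg/dL
    basal_insulin: float = 10.0  # µU/mL
    p1: float = 0.03             # glucose effectiveness, 1/min
    p2: float = 0.02             # decay rate of insulin action, 1/min
    p3: float = 1.0e-5           # gain of insulin action; lower means more insulin resistant
    n: float = 0.2               # insulin clearance, 1/min
    gamma: float = 0.05          # insulin secretion per mg/dL of glucose above basal


def simulate_glucose(params, initial_glucose=250.0, minutes=180, steps_per_minute=10):
    """Return glucose (mg/dL) at each whole minute, using explicit Euler integration."""
    dt = 1.0 / steps_per_minute
    g, x, i = initial_glucose, 0.0, params.basal_insulin
    trace = [g]
    for _ in range(minutes):
        for _ in range(steps_per_minute):
            excess_insulin = i - params.basal_insulin
            dg = -(params.p1 + x) * g + params.p1 * params.basal_glucose
            dx = -params.p2 * x + params.p3 * excess_insulin
            di = -params.n * excess_insulin + params.gamma * max(g - params.basal_glucose, 0.0)
            g, x, i = g + dt * dg, x + dt * dx, i + dt * di
        trace.append(g)
    return trace


healthy = simulate_glucose(MinimalModelParams())
resistant = simulate_glucose(MinimalModelParams(p3=2.0e-6))
print(f"Glucose after 60 min: healthy {healthy[60]:.0f} mg/dL, "
      f"insulin-resistant {resistant[60]:.0f} mg/dL")
```

With a lower `p3`, glucose returns to baseline more slowly after the same initial load, which is the qualitative behavior one would expect from insulin resistance.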

3. Machine Learning Integration (Data-driven)

– Use the mechanistic model “Sim” to generate a synthetic dataset for conditions where data is scarce, such as early-stage prediabetes.

– Apply ML algorithms (like neural networks or decision trees) to the combined real and synthetic datasets to identify patterns and predictors of prediabetes.
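The two bullets above can be sketched as follows. Everything here is an assumption made for illustration: `toy_sim` is a placeholder formula standing in for “Sim”, the labeling rule and the "real" records are made up, and the learner is a depth-1 decision tree (a decision stump) written in plain Python so that the example stays self-contained. A real project would more likely use an established ML library and a richer model.

```python
import random


def toy_sim(insulin_sensitivity, rng):
    """Hypothetical stand-in for "Sim": noisy fasting glucose (mg/dL).

    The linear formula and noise level are placeholders, not physiology.
    """
    return 85.0 + 40.0 * (1.0 - insulin_sensitivity) + rng.gauss(0.0, 3.0)


def make_synthetic_rows(n, rng):
    """Generate (features, label) rows; features are [fasting_glucose, age]."""
    rows = []
    for _ in range(n):
        sensitivity = rng.uniform(0.2, 1.0)
        age = rng.uniform(25.0, 75.0)  # deliberately uninformative in this toy setup
        label = 1 if sensitivity < 0.6 else 0  # assumed rule: low sensitivity means at risk
        rows.append(([toy_sim(sensitivity, rng), age], label))
    return rows


def fit_stump(rows):
    """Fit a depth-1 decision tree that predicts 1 when feature > threshold."""
    best = None
    for feature in range(len(rows[0][0])):
        for threshold in sorted({x[feature] for x, _ in rows}):
            errors = sum((x[feature] > threshold) != (y == 1) for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]


def predict(stump, features):
    feature, threshold = stump
    return int(features[feature] > threshold)


# Made-up "real" records for illustration: ([fasting_glucose, age], label)
real_rows = [([92.0, 34.0], 0), ([97.0, 61.0], 0), ([88.0, 29.0], 0),
             ([104.0, 45.0], 1), ([110.0, 58.0], 1), ([118.0, 66.0], 1)]

rng = random.Random(42)
combined = real_rows + make_synthetic_rows(200, rng)
stump = fit_stump(combined)
train_accuracy = sum(predict(stump, x) == y for x, y in combined) / len(combined)
print(f"split on feature {stump[0]} at {stump[1]:.1f}, training accuracy {train_accuracy:.2f}")
```

The point of the sketch is the data flow: the simulator fills in a region where real records are scarce, and the learner is trained on the union of both sources. The accuracy printed here is measured on the training data and says nothing about real-world performance.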

4. Model Training and Validation

– Train the ML model on one subset of the data and validate its predictions against known clinical outcomes on a held-out subset, adjusting the model as needed.

– Use the mechanistic model to interpret and validate the ML predictions, ensuring they align with medical understanding.
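A minimal sketch of both validation ideas is shown below, under stated assumptions. `example_model` is a hypothetical logistic risk function with made-up weights, and the records are toy data chosen to be consistent with it. The plausibility check encodes one expectation that a mechanistic model could supply: with everything else held fixed, predicted risk should not fall as fasting glucose rises.

```python
import math
import random


def holdout_split(rows, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split into (training, validation) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(round(len(shuffled) * validation_fraction)))
    return shuffled[n_validation:], shuffled[:n_validation]


def accuracy(model, rows, cutoff=0.5):
    """Fraction of rows where the thresholded risk matches the known outcome."""
    return sum(int(model(x) >= cutoff) == y for x, y in rows) / len(rows)


def risk_is_monotone(model, base_features, index, values):
    """Check that predicted risk never decreases as one feature increases."""
    risks = []
    for value in values:
        features = list(base_features)
        features[index] = value
        risks.append(model(features))
    return all(a <= b for a, b in zip(risks, risks[1:]))


def example_model(features):
    """Hypothetical risk model; the weights are illustrative, not clinically derived."""
    fasting_glucose, weekly_exercise_hours = features
    z = 0.15 * (fasting_glucose - 100.0) - 0.2 * weekly_exercise_hours
    return 1.0 / (1.0 + math.exp(-z))


# Toy records: ([fasting_glucose, weekly_exercise_hours], outcome)
rows = [([92.0, 3.0], 0), ([118.0, 1.0], 1), ([105.0, 0.0], 1), ([96.0, 5.0], 0),
        ([110.0, 2.0], 1), ([88.0, 4.0], 0), ([101.0, 4.0], 0), ([125.0, 0.0], 1)]

train_rows, validation_rows = holdout_split(rows)
validation_accuracy = accuracy(example_model, validation_rows)
plausible = risk_is_monotone(example_model, [100.0, 2.0], 0, range(80, 131, 5))
print(f"validation accuracy {validation_accuracy:.2f}, monotone in glucose: {plausible}")
```

A model that fails such a check is not necessarily wrong, but the disagreement with physiological expectation is a signal worth investigating before the model reaches clinicians.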

5. Risk Prediction and Personalization

– Deploy the model to compute individual risk scores for developing prediabetes.

– Personalize lifestyle recommendations, such as tailored diet and exercise plans, based on each individual's risk factors.
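One simple way to connect a risk score to personalized advice is to look at which modifiable factors contribute most to an individual's score. The sketch below assumes a logistic score; the weights, reference values, feature names, and advice strings are all hypothetical and not clinically derived.

```python
import math

# Hypothetical weights and reference values, for illustration only.
WEIGHTS = {"fasting_glucose": 0.12, "bmi": 0.10,
           "weekly_exercise_hours": -0.25, "family_history": 0.8}
REFERENCE = {"fasting_glucose": 100.0, "bmi": 25.0,
             "weekly_exercise_hours": 2.5, "family_history": 0.0}
ADVICE = {"bmi": "discuss a tailored diet plan",
          "weekly_exercise_hours": "increase weekly physical activity"}


def contribution(person, factor):
    """Contribution of one factor to the score, relative to the reference value."""
    return WEIGHTS[factor] * (person[factor] - REFERENCE[factor])


def risk_score(person):
    """Logistic risk score in (0, 1) from the weighted factor contributions."""
    z = sum(contribution(person, factor) for factor in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def recommendations(person):
    """Advice for modifiable factors that raise the score, largest contribution first."""
    raising = [(contribution(person, f), f) for f in ADVICE if contribution(person, f) > 0]
    return [ADVICE[f] for _, f in sorted(raising, reverse=True)]


high_risk = {"fasting_glucose": 108.0, "bmi": 31.0,
             "weekly_exercise_hours": 0.5, "family_history": 1.0}
low_risk = {"fasting_glucose": 90.0, "bmi": 22.0,
            "weekly_exercise_hours": 5.0, "family_history": 0.0}

for person in (high_risk, low_risk):
    print(f"risk {risk_score(person):.2f}: {recommendations(person)}")
```

Family history raises the score but produces no recommendation, because it cannot be changed; only the modifiable factors are turned into advice.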

6. Clinical Decision Support

– Integrate the tool into clinical workflows to assist healthcare providers in early identification and intervention.

– Provide clinicians with actionable insights and evidence-based recommendations.

7. Patient Engagement

– Develop a patient-facing application that offers personalized advice, tracks progress, and encourages adherence to prevention strategies.

– Include educational resources to increase awareness and understanding of prediabetes risks.

8. Continuous Improvement and Adaptation

– Regularly update the model with new data and research findings.

– Monitor and evaluate the effectiveness of interventions and adjust strategies accordingly.

Impact and Benefits

– Early Detection: Identifies individuals at risk of prediabetes earlier, enabling timely interventions.

– Personalization: Offers customized prevention strategies, increasing their effectiveness.

– Education and Awareness: Raises awareness about prediabetes and its risk factors.

– Healthcare Efficiency: Helps healthcare providers prioritize resources and interventions for high-risk individuals.

– Long-term Health Outcomes: Potentially reduces the incidence of diabetes and associated health complications.


This example illustrates how mechanistic simulations and ML can complement each other to enhance healthcare delivery. Combining detailed physiological understanding with the pattern recognition capabilities of ML can lead to more accurate, personalized, and effective healthcare solutions, particularly in complex areas like prediabetes prediction and prevention. I firmly believe that we need to use the right tool for each problem along the healthcare value stream: discovery (sim), detection (ML), diagnosis (ML+sim), treatment (ML+sim), and monitoring (ML).

What is your take on this? Oh, and merry Christmas 2023 😉


Collin, C.B. et al. (2022) ‘Computational Models for Clinical Applications in Personalized Medicine—Guidelines and Recommendations for Data Integration and Model Validation’, Journal of Personalized Medicine, 12(2), p. 166. Available at: