Docs Menu
Docs Home
/ /
Atlas Architecture Center
/ / /

AI-Powered Healthcare with MongoDB & Microsoft Solution

MongoDB and Microsoft deliver AI-powered solutions for breast cancer care, unifying data and enabling predictive modeling, intelligent chatbots, and analytics.

Use cases: Analytics, Gen AI, Interoperability

Industries: Healthcare

Products and tools: MongoDB Atlas, Atlas Search, Atlas Data Federation, Atlas Charts

Partners: Microsoft

MongoDB Atlas and Microsoft AI technologies converge in an innovative healthcare solution called "Leafy Hospital," showcasing how cutting-edge technology can transform breast cancer diagnosis and patient care. This integrated system leverages MongoDB's flexible data platform for unifying operational, metadata, and AI data, while incorporating Microsoft's advanced capabilities including Azure OpenAI, Microsoft Fabric, and Power BI to create a comprehensive healthcare analytics and diagnostic solution. The solution demonstrates three key technological approaches:

  • Predictive AI for early detection using deep learning models to analyze mammograms and predict BI-RADS scores.

  • Generative AI for workflow automation, featuring Vector Search capabilities and RAG-based chatbots for intelligent information retrieval.

  • Advanced analytics that combines real-time operational insights with long-term trend analysis through Power BI integration.

This approach enables healthcare providers to streamline diagnostic processes, automate clinical documentation, and make data-driven decisions while ensuring secure handling of sensitive patient information.

The reference architecture illustrates how the Leafy Hospital solution integrates various components across three main technological areas:

  1. Predictive AI layer (bottom yellow box):

    • Fabric Data Science processes mammogram images and clinical data.

    • Handles BI-RADS (breast imaging-reporting and data system) scoring and biopsy type analysis.

    • Determines malignant or benign classification.

    • Receives images from Azure Blob Storage.

    • Outputs operational data to MongoDB Atlas.

  2. Generative AI layer (middle purple box):

    • Azure AI Studio integrates with MongoDB Atlas on Azure.

    • Enables automated report generation for clinical documentation.

    • Features a chatbot for question-answering capabilities.

    • Processes operational and vector data from MongoDB Atlas.

    • Facilitates natural language interactions with the system.

  3. Advanced analytics layer (middle green box):

    • Combines Fabric Power BI and Fabric OneLak.

    • Generates reports and dashboards from processed data.

    • Integrates with MongoDB Atlas for data visualization.

    • Provides comprehensive analytics capabilities.

The data flow begins with medical images stored in Azure Blob Storage, which are then processed through the various layers:

  • Images and operational data flow through Fabric Data Science for AI processing.

  • Results are stored in MongoDB Atlas, which serves as the central operational database.

  • Azure AI Studio handles generative AI tasks using the stored data.

  • Finally, Fabric Power BI and OneLake enable advanced analytics and visualization.

Leafy Hospital solution architecture

Figure 1. Leafy Hospital solution architecture

This architecture ensures a seamless flow of information from raw medical data to actionable insights while maintaining security and performance throughout the system.

The Leafy Hospital demo showcases the integration of MongoDB Atlas with Microsoft's AI and analytics services through several key components:

The solution's data architecture supports both operational and analytical workloads efficiently. MongoDB Atlas serves as the operational datastore for real-time AI applications, while Microsoft OneLake handles analytics for long-term trend analysis. This dual architecture enables:

  • Real-time processing of patient data and medical imaging.

  • Seamless integration between operational and analytical systems.

  • Efficient data flow from transactional to analytical processing.

  • Support for both millisecond-response operational queries and complex analytical workloads.

Real-Time to Analytics Data Pipeline

Figure 2: Real-time to analytics data pipeline

Predictive AI is critical in healthcare as it aids in accurate diagnosis, relying on predictions from large datasets compared with manual analysis, which is likely to bring in manual errors. Microsoft Fabric Data Science presents a robust platform to train and experiment with ML Models and manage MLOps cycles. In this solution, we trained two models.

  1. BI-RADS prediction:

    BI-RADS is an industry standard mechanism to describe mammogram findings and is classified in seven categories with a score of possibility of a malignant cancer increasing with the score value from 0 to 6. VGG16 is a deep convolutional neural network (CNN) model. It is trained on mammogram images from the dataset on Kaggle, which were grouped in folders as per their BI-RADS. Image analysis needs deep neural network models and the best model needs to be selected based on training on actual datasets running into multiple epochs.

    Fabric Data Science is used to train the models, run experiments, and manage the multiple versions. Multiple experiments were run with the two algorithms VGG16 and EfficientNetV2L, and the easy comparison of the multiple ML parameters and metrics for each version helps in the selection process of the final model. The images for training are directly uploaded to the Lakehouse in OneLake from the user's local machine using the UI itself. Additionally, the images stored in Azure Blob Storage can be easily referenced in the notebook downloading them from the blob URL using wget/curl, referencing using shortcuts, or even using a data pipeline. The image metadata and final prediction are stored in MongoDB Atlas.

  2. Biopsy classification:

    For the use case of binary classification of the cancer as malignant or benign, classification or regression models can be used. Random forest classifier model is trained on a dataset from Kaggle, with nine input parameters such as clump thickness, uniformity of cell size and shape, bare nuclei, mitoses, etc. Based on the values of these parameters the model is able to predict if the cancer is malignant or benign. In production use cases, more parameters can be added and the model can be trained from their values to be able to predict with more accuracy. Random forest model gave an accuracy of more than 97% and thus was ideal for this use case. The training dataset is fetched from MongoDB Atlas and prediction output is updated back to MongoDB, thanks to the MongoDB Spark Connector.

Fabric Data Science makes the training and managing the end-to-end ML lifecycle easy and intuitive. Fabric Data Science manages the lifecycle by auto logging related parameters for each experiment and model using the de-facto data science standards of MLflow.

Vector Search capabilities form the foundation of the solution's intelligent querying system, implemented in three key stages:

  1. Data preparation:

    • Clinical notes are processed using Azure OpenAI's text-embedding-ada-002 model.

    • Data is converted into vector embeddings for high-dimensional space representation.

    • Vector embeddings are stored in MongoDB Atlas with optimized search indexes.

  2. Query processing:

    • Natural language queries are converted to vector representations.

    • Semantic understanding enables complex medical queries.

    • Query vectors are matched against stored embeddings.

  3. Document retrieval:

    • Atlas Search executes similarity-based searches.

    • Returns relevant medical records based on semantic matching.

    • Enables intuitive access to patient information.

Vector Search Implementation Process Flow

Figure 3. Vector Search implementation process flow

The chatbot implementation leverages retrieval augmented generation (RAG) architecture with three distinct data contexts:

  1. Patient information retrieval:

    • Executes queries to fetch current patient details.

    • Retrieves structured patient data from MongoDB collections.

    • Provides immediate access to critical patient information.

  2. Historical data processing:

    • Accesses 10-year patient history from MongoDB Atlas.

    • Decodes and summarizes historical data through Azure OpenAI LLM.

    • Implements thought chaining for context-aware responses.

  3. Medical knowledge integration:

    • Uses vectorized medical documentation.

    • Performs real-time vectorial searches based on the query's context.

    • Integrates relevant medical literature and case studies.

Blueprint for the Chatbot architecture

Figure 4. Blueprint for the chatbot architecture

The solution leverages two complementary visualization platforms for comprehensive analytics.

First, MongoDB Atlas Charts provides native, real-time operational dashboards directly connected to MongoDB data. It enables immediate insights into critical healthcare metrics through intuitive visualizations without requiring data transformations or additional tools. The operational dashboard (Figure 5) demonstrates key metrics including patient numbers, appointment status, and clinic distribution.

Atlas Charts Dashboard

Figure 5. Atlas Charts

Then, Power BI integration extends the analytics capabilities by enabling enterprise-wide data analysis and advanced visualizations. Through the MongoDB Atlas Connector, healthcare data can be combined with other enterprise sources in Microsoft OneLake. The geographical visualization dashboard (Figure 6) showcases this integration, displaying patient distribution and enabling sophisticated analytical capabilities.

PowerBI integration with MongoDB Atlas

Figure 6. PowerBI integration with MongoDB Atlas

Together, these platforms provide a complete analytics solution that handles both immediate operational needs and long-term analytical requirements.

The solution demonstrates how MongoDB Atlas serves as a unified platform that handles operational data, Vector Search capabilities, and analytics requirements while seamlessly integrating with Microsoft's AI and visualization tools. This architecture enables healthcare providers to leverage both real-time operational insights and long-term analytical capabilities within a single, coherent system.

For a detailed, step-by-step guide on implementing this solution, including code samples and specific configuration instructions, visit our GitHub repository .

  • Unified data platform: MongoDB Atlas serves as a central repository that effectively unifies operational data, metadata, and AI data, enabling seamless integration between different components of the healthcare system.

  • AI integration capabilities: The architecture demonstrates how different types of AI (Predictive, Generative, and Analytics) can be effectively integrated into a single healthcare solution using Microsoft's AI services and MongoDB Atlas.

  • Workflow automation: The solution shows how AI can automate critical healthcare workflows, from diagnostic predictions to report generation and intelligent querying through chatbots, reducing manual effort and potential errors.

  • Scalable analytics: The combination of MongoDB Atlas with Microsoft Fabric and Power BI enables both real-time operational analytics and long-term trend analysis, providing comprehensive insights for healthcare decision-making.

  • Secure healthcare architecture: The solution exemplifies how to build a modern healthcare system that maintains data security and privacy while enabling advanced AI capabilities and data analytics.

Partner technologies:

  • Francesc Mateu, MongoDB

  • Diana Annie Jenosh, MongoDB

  • Sebastian Rojas Arbulu, MongoDB

Back

Healthcare Interoperability

On this page