The COMET model uses deep learning to improve disease prediction.

A new machine learning framework called COMET uses transfer learning to combine EHR data with omics analysis, greatly improving predictive modeling and revealing biological insights from small cohorts.

In a recent study published in the journal Nature Machine Intelligence, researchers introduced clinical and omics multimodal analysis enhanced with transfer learning (COMET), a protocol that combines deep learning and transfer learning.

Technological advances in omics have transformed our understanding of biology. Proteomic, metabolomic, transcriptomic, and other assays now make it affordable to quantify many analytes in the same sample. These assays produce high-dimensional data, but the size of omics cohorts is constrained by clinical and financial factors. As a result, new methods are needed to improve the analysis of high-dimensional data from small cohorts.

Statistical techniques for such data focus on controlling false positives, while machine learning (ML) techniques are less common. Some strategies use transfer learning, in which a model is first trained on a large pre-training dataset and then adapted to a smaller one. More recent deep learning techniques have been combined with statistical frameworks, but they mostly rely on learning from the omics data themselves or on useful metadata.

The COMET architecture overcomes these limitations by combining early and late fusion techniques with pre-training on large electronic health record (EHR) datasets, which enables better predictive performance and biological discovery.
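
To make the two fusion terms concrete, here is a minimal sketch in Python. It is not COMET's architecture: the feature values, the weights, and the function names `early_fusion_predict` and `late_fusion_predict` are hypothetical, and a weighted sum stands in for a trained network. Early fusion joins the features of both modalities before a single model sees them, while late fusion runs one model per modality and merges their outputs.

```python
def linear(features, weights, bias=0.0):
    """Weighted sum of features: a stand-in for any trained model."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def early_fusion_predict(ehr, omics, joint_weights):
    # Early fusion: concatenate the raw features, then apply one model.
    return linear(ehr + omics, joint_weights)


def late_fusion_predict(ehr, omics, ehr_weights, omics_weights):
    # Late fusion: one model per modality, then average the two predictions.
    return (linear(ehr, ehr_weights) + linear(omics, omics_weights)) / 2


# Hypothetical toy inputs: 2 EHR features and 3 protein levels.
ehr = [1.0, 2.0]
omics = [0.5, 0.0, 1.0]

early = early_fusion_predict(ehr, omics, [1.0, 1.0, 2.0, 0.0, 1.0])
late = late_fusion_predict(ehr, omics, [1.0, 1.0], [2.0, 0.0, 1.0])
```

In an early-fusion model the weights can capture interactions between modalities, whereas a late-fusion model keeps each modality's contribution separate until the final step. The article says COMET combines both ideas but does not describe how.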

The study and findings
In this work, the researchers presented COMET, a deep learning and transfer learning technique that enhances omics analyses. COMET can be used when paired omics and EHR data are available for a small cohort and EHR data are available for a much larger one. It comprises pre-training, multimodal modeling, and a method for embedding longitudinal EHR data.

In COMET, an ML model is first trained exclusively on EHR data, and its weights are then transferred to a multimodal architecture that is trained and evaluated on a smaller sample with both omics and EHR data. COMET was first used to predict days to labor onset in a Stanford Healthcare pregnancy cohort of 30,904 people. A proteomics dataset of 1,317 proteins was generated from multiple plasma samples taken from 61 pregnant people (the omics cohort) during the final days of pregnancy.
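
The weight transfer described above can be illustrated with a toy example. This is a sketch under strong assumptions, not the published method: a linear model trained by gradient descent stands in for the deep network, the data are synthetic, and every name (`train`, `TRUE_W`, `N_EHR`, `N_OMICS`) is hypothetical. It shows the three steps: pre-train on EHR features from a large cohort, copy those weights into a multimodal model, then fine-tune on a small cohort that also has omics features.

```python
import random


def predict(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def train(weights, rows, targets, lr=0.05, epochs=20):
    """Stochastic gradient descent on squared error; returns updated weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in zip(rows, targets):
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def mse(weights, rows, targets):
    return sum((predict(weights, x) - y) ** 2
               for x, y in zip(rows, targets)) / len(rows)


rng = random.Random(0)
N_EHR, N_OMICS = 3, 2
TRUE_W = [2.0, -1.0, 0.5, 1.5, -0.5]  # hypothetical ground-truth weights


def make_row():
    return [rng.uniform(-1, 1) for _ in range(N_EHR + N_OMICS)]


# Large cohort: the outcome depends on all features, but only EHR is observed.
large = [make_row() for _ in range(500)]
large_ehr = [x[:N_EHR] for x in large]
large_y = [predict(TRUE_W, x) for x in large]

# Small cohort: both EHR and omics features are observed.
small = [make_row() for _ in range(20)]
small_y = [predict(TRUE_W, x) for x in small]

# Step 1: pre-train an EHR-only model on the large cohort.
ehr_w = train([0.0] * N_EHR, large_ehr, large_y)

# Step 2: transfer. The EHR weights initialize the multimodal model,
# and the new omics weights start at zero.
init_w = ehr_w + [0.0] * N_OMICS

# Step 3: fine-tune all weights on the small multimodal cohort.
tuned_w = train(init_w, small, small_y, epochs=50)
```

Comparing `mse(init_w, small, small_y)` with `mse(tuned_w, small, small_y)` shows how much the omics features add once the model starts from the pre-trained EHR weights.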

Days to labor onset were predicted using EHR data from blood sampling at the beginning of pregnancy. After pre-training on EHR-only data (from 30,843 people), the weights were transferred to a multimodal network trained to generate predictions on the omics cohort. The model achieved a Pearson correlation coefficient of 0.868 (95% CI [0.825, 0.900]) between the predicted and actual number of days to labor onset. This strong correlation suggests that COMET can be accurate in small cohorts with high-dimensional data.
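
For readers unfamiliar with the metric, the Pearson correlation coefficient measures how closely predicted values track actual values on a scale from -1 to 1. The sketch below computes it from its definition. The function name `pearson_r` and the sample values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical actual and predicted days to labor onset for five samples.
actual_days = [90, 60, 45, 30, 10]
predicted_days = [80, 65, 50, 20, 15]

r = pearson_r(actual_days, predicted_days)
```

A value near 1 means the predictions rise and fall with the actual values. It does not by itself show that the predictions are unbiased, since a model that is consistently off by a constant number of days would still score highly.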

Next, COMET was compared with baseline models that used proteomics data, EHR data, or both. These baselines had no pre-training and were trained only on omics cohort data. The EHR-only baseline scored worst, with a correlation of 0.768, and the proteomics-only model did somewhat better at 0.796. The combined baseline outperformed both, with a correlation of 0.815, but it still fell short of COMET’s 0.868.

To gain deeper insight, the researchers visualized the multimodal data with t-distributed stochastic neighbor embedding (t-SNE), projecting the correlation matrix into two dimensions to uncover feature clusters based on correlation patterns. Features that lie close together in this projection have similar correlations with every other variable. The clusters were annotated with the medical concepts that their EHR features or proteins represent. Significant relationships between various proteins and EHR features were found.

The team calculated the feature importance of each protein. Consistent with established biological knowledge, the proteins with high importance in COMET models were linked to gestational age, pregnancy complications, and fetal development.

COMET was then used to predict three-year cancer mortality in a cancer cohort from the UK Biobank. All participants had received a cancer diagnosis within five years of enrollment.

Blood samples from a subset of participants were available and underwent proteomic analysis. Participants whose samples were taken within a year of the cancer diagnosis formed the omics cohort, in which the three-year mortality rate was 5.5%. COMET predicted three-year cancer mortality with an area under the receiver operating characteristic curve (AUROC) of 0.842, outperforming both the single-modality and joint baseline models (AUROC 0.786).
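
AUROC can be read as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen patient who survived. The sketch below computes it directly from that definition. The function name `auroc`, the labels, and the scores are hypothetical illustrations, not data from the study.

```python
def auroc(labels, scores):
    """Share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died within three years) and model risk scores.
died_within_3y = [0, 0, 1, 0, 1, 0]
risk_score = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7]

value = auroc(died_within_3y, risk_score)
```

Because it depends only on ranking, AUROC suits outcomes as rare as the 5.5% mortality rate here, where a model that always predicts survival would reach 94.5% accuracy while being useless. The pairwise loop is quadratic, which is fine for illustration; production code would typically use a rank-based formula or a library routine.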

Compared with the labor onset data, the t-SNE projection of the correlation matrix showed less overlap between the EHR and proteomics modalities. However, when the correlation network was displayed with each modality projected into two dimensions separately, notable correlations between the two modalities emerged. Mortality factor 4-like protein 2 showed the strongest associations with EHR features, especially medication prescriptions, which highlights its potential as a predictive biomarker.

Sixty-six percent of the proteins measured in cancer patients did not correlate with any EHR feature. The researchers also computed the correlation between each EHR feature and all proteins, along with the highest correlation across proteins for each EHR feature. This revealed numerous EHR variables with only weak associations to proteins in cancer patients, underscoring the importance of including several data modalities.

Proteins with higher feature importance in COMET models correspond to established biomarkers of cancer prognosis. Crucially, nine proteins that were more important in COMET models were statistically associated with mortality status, which further supports the biological relevance of the model.

Conclusions
Overall, the study demonstrated how COMET can enhance predictive modeling across a variety of tasks through pre-training and transfer learning. COMET produced better-regularized models that more closely mirrored known biology, and it identified biologically significant proteins for particular health outcomes.

In the labor onset models, COMET identified proteins essential for immune regulation, placental development, and pregnancy complications, and its predictive power was supported by the Pearson correlation values. In the cancer models, proteins implicated in tumor growth and modification of the tumor microenvironment were associated with mortality. Altogether, COMET offers a framework for characterizing the complex connections between biological pathways and clinical manifestations.