August 20, 2025

Synoptic AI Engine Excels in Medical Record Interpretation

Synoptic Intelligence Outperforms 20x Larger Language Model in Multi-Label EHR Interpretation

Pairwise meta-learning shows promise in preclinical, bioassay-centered molecular activity prediction. On top of this framework, we introduce a new element, synoptic learning, and build an advanced synoptic intelligence engine for application in clinical studies and real-world healthcare.

A preliminary study is carried out on ICD disease labeling of de-identified electronic health records (EHRs), and the performance of our synoptic intelligence engine is compared with that of Google’s T5 language model.

We will briefly introduce the rationale of the study design before looking at the performance metrics:

1. Datasets

Foundation models, including T5, are known for their broad spectrum of applications, owing to their ability to learn quickly from the limited data of each application. This advantage is attributed to pre-training on a “large” dataset, in which a high degree of content diversity is assumed.

T5 models were pre-trained on the C4 (Colossal Clean Crawled Corpus) dataset. It contains 365 million documents, summing to 156 billion tokens.

A subset of the Medical Information Mart for Intensive Care III (MIMIC3) database was used in a previous report to test different modules of Google’s T5 language models. We adopt the same data split in the current study.

The MIMIC3 data split yields 30k medical records for training, 10k for validation, and 10k for testing. Counted document by document, the training set is about 12,167 times smaller than C4 (365 million documents divided by 30 thousand records ≈ 12,167).

2. Multi-label disease coding

Each MIMIC record is annotated, in the form of ICD codes, with the medical conditions it involves. The models are tested on their ability to predict, for each record, the presence or absence of each of the 19 codes that comprise the ICD9 disease & injury index.

Notably, the presence of one index does not exclude the presence of another.

This reflects the reality of everyday healthcare, where multiple medical conditions, such as co-morbidities and patients’ adverse reactions to treatment, often occur in the same case.
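As a minimal sketch of this multi-label setup (the helper name, record schema, and category indices below are illustrative assumptions, not the study’s actual pipeline), each record’s annotation can be represented as a 19-dimensional multi-hot vector in which several categories may be present at once:

```python
# Minimal sketch: encode a record's ICD-9 disease/injury categories as a
# 19-dimensional multi-hot label vector for multi-label classification.
# The index assignments are hypothetical; the study's exact label set may differ.
import numpy as np

NUM_LABELS = 19  # top-level ICD-9 disease & injury categories

def encode_labels(present_categories, num_labels=NUM_LABELS):
    """Turn a set of present category indices (0..18) into a multi-hot vector."""
    y = np.zeros(num_labels, dtype=np.float32)
    y[list(present_categories)] = 1.0  # multiple categories may co-occur in one record
    return y

# Example: a record annotated with categories 3 and 7 (hypothetical indices)
print(encode_labels({3, 7}))
```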

3. Model training on MIMIC3

As the baseline, the pre-trained T5-base encoder is fine-tuned. It begins overfitting after 10 epochs of fine-tuning on the relatively small 30k training set.

The pairwise meta-learning model is trained from scratch on the same 30k dataset.

For each model, the set of parameters that yields the best performance on the validation set during training is used in testing.
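A sketch of this checkpoint-selection scheme is shown below; the model, data loaders, and macro-F1 helper are placeholders we introduce for illustration, not the study’s actual training code:

```python
# Sketch: keep the parameters with the best validation macro F1 seen during training.
# `model`, `train_loader`, `val_loader`, and `macro_f1` are hypothetical placeholders.
import copy
import torch

def train_with_best_checkpoint(model, train_loader, val_loader, macro_f1,
                               epochs=10, lr=1e-4, device="cpu"):
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = torch.nn.BCEWithLogitsLoss()   # one binary decision per ICD label
    best_score, best_state = -1.0, None

    for epoch in range(epochs):
        model.train()
        for texts, labels in train_loader:
            optimizer.zero_grad()
            logits = model(texts)               # (batch, 19) label logits
            loss = criterion(logits, labels.to(device))
            loss.backward()
            optimizer.step()

        score = macro_f1(model, val_loader)     # validate after every epoch
        if score > best_score:                  # remember only the best weights
            best_score = score
            best_state = copy.deepcopy(model.state_dict())

    model.load_state_dict(best_state)           # use the best checkpoint for testing
    return model, best_score
```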

4. Multi-label predictions from individual EHR sections

Each MIMIC record contains pre-defined sections. 

Eleven relevant sections (chief complaint, brief hospital course, history of present illness, past medical history, family history, social history, surgical history, medications on admission, physical exams, discharge medications, and discharge diagnosis) are used as inputs for the models to predict ICD labels.

Just as medical encounters in hospitals are not evenly distributed across departments, data availability is not balanced over the 19 ICD codes. To give equal importance to all disease/injury categories, we report model performance as the macro average of F1 scores.
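For reference, macro averaging computes an F1 score per label and then takes their unweighted mean, so rare codes count as much as common ones. The snippet below is a generic illustration with made-up arrays (shrunk to three labels for brevity), not the study’s evaluation code:

```python
# Generic illustration: macro F1 for multi-label predictions.
# y_true and y_pred are (n_records, n_labels) multi-hot matrices; values are made up.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])  # 3 records, 3 labels for brevity
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

# Macro averaging: F1 is computed per label, then averaged without weighting,
# so a rarely occurring ICD code influences the score as much as a frequent one.
print(f1_score(y_true, y_pred, average="macro"))
```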

The T5 encoder has better macro-F1 scores on 7 EHR sections and a section-averaged score of 0.49.

The pairwise meta-learning model performs better on the medical history, hospital course, and physical exam sections, with a section-averaged score of 0.47.

Despite its marginally lower macro F1, however, the pairwise meta-learning model achieves better accuracy (0.70) than the T5 encoder (0.58).

This raises two questions:

Should we weigh the evidence from certain EHR sections more heavily, or trust the average? And which scoring metric should we rely on?
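To see why the two metrics can point in different directions, consider a toy illustration with made-up numbers (not study data): for a rare label, a model that never predicts it can match the per-label accuracy of a model that actually detects it, while their F1 scores differ sharply.

```python
# Toy illustration (made-up data): accuracy and macro F1 can rank models differently
# when labels are imbalanced, e.g. an ICD code that is almost always absent.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])           # label present in 1 of 10 records

always_absent = np.zeros(10, dtype=int)                       # never predicts the label
sometimes_right = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])    # finds it, with one false alarm

for name, y_pred in [("always absent", always_absent), ("sometimes right", sometimes_right)]:
    acc = (y_true == y_pred).mean()                           # per-label accuracy
    f1 = f1_score(y_true, y_pred, zero_division=0)            # F1 for this single label
    print(f"{name}: accuracy={acc:.2f}, F1={f1:.2f}")
# always absent:   accuracy=0.90, F1=0.00
# sometimes right: accuracy=0.90, F1=0.67
```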

Synoptic Intelligence Resolves Discrepancy in Clinical Statistics

When we enable predictions through the synoptic learning module, it captures hidden synergies among the accessible EHR sections and improves both accuracy (0.79) and macro F1 (0.63) by a clear margin.

Contradictory statistics from different clinical studies and hospital encounters are a common issue in healthcare research and practice, not only as a consequence of sampling bias, type I/II errors, and missing data, but also because of the variability of real-world patient populations.

Unlike when drafting a scientific study report, we cannot settle the issue with a conclusion such as “this is a complex issue with multiple causes; addressing this challenge requires a multi-faceted approach that includes A, B & C” and leave the problem for the future.

In medical practice, we have to make decisions on the spot. The synoptic intelligence engine is developed to support real-time decision making.

Join Us to Solve The Problem in Real Life

It is noteworthy that synoptic intelligence improves model performance by 36% compared with the T5 encoder while using a 20x smaller parameter set (66 MB vs. 1.3 GB). A striking escalation of predictive power is anticipated when data availability matches the model size at scale.

See our offers to leverage this leading-edge technology in therapeutic development and clinical decision-making, or let us know what you need.
