Deep Learning Tool Matches Human Experts at Sleep Test Scoring

A deep learning tool produced diagnostic scores for several sleep conditions with the same level of accuracy as human experts.

A deep learning tool was able to replicate diagnostic scores for sleep staging, sleep apnea, and limb movements at a level of accuracy comparable to that of human clinicians, a study published in JAMIA revealed.

Common sleep disorders including sleep apnea, insomnia, and restless legs syndrome affect tens of millions of adults, the researchers noted, and these conditions are often linked to decreased work productivity, poorer quality of life, and increased mortality.

Timely identification of these disorders is critical so that patients can pursue appropriate treatment, yet many people with these conditions remain undiagnosed. While technological advancements have increased access to sleep diagnostics, both at-home and in-lab polysomnography (PSG) still require manual scoring, which is time-consuming and demands the attention of a highly skilled clinician.


To automate the sleep scoring process, the research team developed a deep learning model and tested it on data from the Massachusetts General Hospital (MGH) sleep laboratory.

The deep learning model was trained to assign each 30-second PSG epoch to one of five sleep stages: wake, rapid eye movement (REM) sleep, and three stages of non-REM sleep (N1, N2, and N3). The model was also trained to detect apnea events using five respiratory channels, and to detect limb movement events.
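Sleep staging works on fixed-length windows of the overnight recording rather than on the raw signal as a whole. The sketch below, which is illustrative and not the study's code, shows only the preprocessing idea: cutting one signal channel into non-overlapping 30-second epochs, each of which would then receive one of the five stage labels of the standard scoring scheme (wake, N1, N2, N3, REM). The function name, the `SleepStage` enum, and the sampling rate used in the usage note are hypothetical; the network itself is not shown.

```python
from enum import Enum

EPOCH_SECONDS = 30  # conventional PSG scoring epoch length


class SleepStage(Enum):
    """Five-stage labels (hypothetical encoding of the standard scheme)."""
    WAKE = "W"
    N1 = "N1"
    N2 = "N2"
    N3 = "N3"
    REM = "R"


def split_into_epochs(signal, sampling_rate_hz):
    """Split one signal channel into non-overlapping 30-second epochs.

    Trailing samples that do not fill a complete epoch are dropped.
    """
    samples_per_epoch = EPOCH_SECONDS * sampling_rate_hz
    n_epochs = len(signal) // samples_per_epoch
    return [
        signal[i * samples_per_epoch:(i + 1) * samples_per_epoch]
        for i in range(n_epochs)
    ]
```

For example, 95 seconds of a channel sampled at an assumed 100 Hz (9,500 samples) yields three full epochs of 3,000 samples each; a classifier would then output one `SleepStage` per epoch.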

The researchers found that the model achieved an overall accuracy of 87.5 percent for sleep staging, which compares favorably to human performance. Additionally, the team noted that the deep learning model outperformed an existing machine learning method for sleep staging, which demonstrated an accuracy of 69.34 percent.
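The article does not define "overall accuracy." A common reading, assumed here rather than taken from the study, is epoch-by-epoch agreement: the fraction of 30-second epochs where the model's stage matches the expert's stage. A minimal sketch with a hypothetical function name:

```python
def staging_accuracy(predicted, expert):
    """Fraction of epochs where the predicted stage equals the expert's stage."""
    if len(predicted) != len(expert):
        raise ValueError("predicted and expert sequences must be the same length")
    if not expert:
        raise ValueError("at least one epoch is required")
    matches = sum(p == e for p, e in zip(predicted, expert))
    return matches / len(expert)
```

With this definition, four epochs scored `["W", "N2", "N2", "R"]` by the model and `["W", "N2", "N3", "R"]` by the expert give an accuracy of 0.75 (75 percent).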

When classifying patients' sleep apnea severity into the standard clinical categories of mild, moderate, and severe, the deep learning model achieved an overall diagnostic accuracy of 88.2 percent, comparable to human performance.
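The article does not say how detected events become severity categories. A widely used approach, assumed here and not confirmed as the study's exact procedure, is to compute the apnea-hypopnea index (AHI), the number of apneas plus hypopneas per hour of sleep, and apply the conventional adult cutoffs of 5, 15, and 30 events per hour. The function names below are hypothetical:

```python
def apnea_hypopnea_index(n_apneas, n_hypopneas, total_sleep_hours):
    """Respiratory events per hour of sleep."""
    if total_sleep_hours <= 0:
        raise ValueError("total sleep time must be positive")
    return (n_apneas + n_hypopneas) / total_sleep_hours


def severity_category(ahi):
    """Map an AHI to conventional adult severity bands."""
    if ahi < 5:
        return "none"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"
```

For instance, 30 apneas and 42 hypopneas over 6 hours of sleep give an AHI of 12 events per hour, which falls in the "mild" band. Under this scheme, a model can miscount individual events and still assign the correct category, which is one reason category-level accuracy is reported separately from event detection.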

In identifying limb movement events, the deep learning model's scoring also correlated strongly with that of human experts, with a reported regression value of 0.79.
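The article reports a "regression" value without stating whether it is a correlation coefficient (r) or a coefficient of determination (r²). The sketch below, which is illustrative only, computes Pearson's r between model-scored and expert-scored values, for example hypothetical per-patient limb movement counts. Squaring r gives r², the coefficient of determination of a simple linear fit; which of the two the study reports is not specified here.

```python
import math


def pearson_r(model_scores, expert_scores):
    """Pearson correlation between paired model and expert scores."""
    n = len(model_scores)
    if n != len(expert_scores) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_m = sum(model_scores) / n
    mean_e = sum(expert_scores) / n
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(model_scores, expert_scores))
    var_m = sum((m - mean_m) ** 2 for m in model_scores)
    var_e = sum((e - mean_e) ** 2 for e in expert_scores)
    if var_m == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_m * var_e)
```

A value of 1.0 means the model's scores rise and fall in perfect lockstep with the expert's; values near 0.8 indicate strong but imperfect agreement across patients.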

To test the generalizability of the model, the researchers applied the tool to another independent dataset. The team evaluated the model’s sleep staging performance, as well as its ability to detect sleep apnea events.

When tested on the independent dataset, the deep learning model achieved an accuracy of 77.7 percent for sleep staging. For apnea detection, the model's scores also correlated strongly with those of human clinicians, with a reported regression value of 0.77.

These findings demonstrate the potential for deep learning technology to automate the sleep scoring process, improve sleep disorder diagnosis, and even expand patients' access to care.

The researchers noted that portable sleep scoring systems reach a far larger audience than in-lab scoring systems, and these systems would directly benefit from reliable, accurate deep learning algorithms. Improved sleep disorder classification would have important implications for devices such as home sleep apnea kits and consumer-facing devices.

“Our results demonstrate human-level performance of deep learning algorithms trained on large PSG datasets to replicate the primary categories of scoring: stages, sleep apnea, and limb movements,” the researchers said.

“The ability to automate overnight PSG scoring with the accuracy of a sleep specialist has the potential to expand access to essential medical care.”

Moreover, the researchers asserted that the results show the generalizability of the tool, as the model performed with considerable accuracy when tested on the independent dataset.

“Our results suggest that, given sufficiently large datasets, training on real-world data can yield human level performance and generalize to standardized data sets,” the group said. “The capacity to [generalize] is a prerequisite for algorithm deployment in real-world settings.”

Future studies should focus on incorporating deep learning technology into specific monitoring devices, the team stated, as well as optimizing the technology’s performance in real-world clinical settings.

“Our deep network is accurate and scalable, and can be deployed on multi-channel (e.g. in-lab PSG) or limited channel (e.g. portable) acquisition systems,” the team said.

“The potential for substantial clinical impact includes broadening the reach of clinical sleep medicine, augmenting clinical decision-making for sleep specialists, and improving the accuracy and reliability of at-home portable systems.”

Date: January 9, 2019

Source: Health IT Analytics