The doctor still interprets a chest X-ray better than a computer

For once, a news report that does not claim artificial intelligence is once again trumping humans at something. In daily practice, radiologists still appear to be better than AI systems at recognizing common lung diseases. That is the outcome of a comparison between four AI tools and a pool of 72 doctors, in which more than 2,000 chest X-rays were assessed for common acute conditions.

The commercially available AI systems were quite good at their job, but they were more likely than the radiologists to diagnose a condition that was not there (a false positive). They also performed worse than the doctors when multiple diseases were present at the same time, or when abnormalities were only subtle.

So write researchers affiliated with two Danish hospitals and the University of Copenhagen on Tuesday in the scientific journal Radiology. AI should not yet be regarded as more than a supporting tool, and care must be taken, especially in complex cases, they conclude.

Deep learning

Recognizing deviations from a pattern is a task at which artificial intelligence (AI) excels. Thanks to deep learning, such algorithms can process information in multiple, increasingly abstract layers. In addition, unlike doctors, the algorithms can easily be trained on millions of example images; they work very quickly and never get tired.
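To make "increasingly abstract layers" concrete: a deep-learning image classifier stacks layers in which early ones pick up simple patterns and later ones combine them into higher-level features. The Python sketch below (using PyTorch) is a minimal, hypothetical illustration, not one of the commercial systems from the study; the class name, layer sizes, and the three output findings are all invented for the example.

```python
import torch
import torch.nn as nn

# Minimal sketch of a layered image classifier. This is NOT one of the
# commercial tools from the study; all sizes are arbitrary example values.
class ChestXrayNet(nn.Module):
    def __init__(self, num_findings: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            # Early layers detect low-level patterns (edges, textures)...
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # ...later layers combine them into increasingly abstract features.
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Final layer maps the abstract features to one score per finding
        # (e.g. respiratory disease, pneumothorax, pleural effusion).
        self.classifier = nn.Linear(64, num_findings)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        return self.classifier(h)  # raw scores; apply sigmoid for probabilities

model = ChestXrayNet()
scores = model(torch.randn(1, 1, 256, 256))  # one fake grayscale X-ray
print(torch.sigmoid(scores))                 # per-finding probabilities
```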

This study looked at how well AI can recognize a number of common lung diseases: respiratory diseases (such as pneumonia or pulmonary edema), pneumothorax (collapsed lung) and pleural effusion (build-up of fluid around the lungs).

In total, 2,040 chest X-rays were reviewed, from patients between 58 and 81 years old. In 669 cases an acute lung disease was present; in 1,371 cases it was not. The images were not viewed in isolation: previous images were available for 1,641 patients, and more than half of the patients had multiple lung problems, including non-acute ones. The radiologists could include this information in their assessment, just as in their daily practice, but the AI systems could not.

Missed diagnoses

The most striking differences were seen in identifying respiratory diseases. The AI tools incorrectly reported a positive result in 13.7 to 36.9 percent of cases, compared with 11.6 percent for the radiologists. The share of missed diagnoses (false negatives) averaged around 20 percent for both the AI tools and the radiologists, and these usually involved subtle abnormalities.

When identifying fluid around the lungs, the false positive rates were much lower (1.1 to 2.4 percent for the AI tools, 0.2 percent for the radiologists). A collapsed lung was misjudged as a false positive and as a false negative with roughly equal frequency.
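For readers keeping score: the false positive rate is the share of cases without the disease that were nonetheless flagged, and the false negative rate is the share of actual cases that were missed. A minimal Python sketch with invented counts (deliberately not the study's data) shows the arithmetic:

```python
# Worked example of the two error rates in the article, using made-up
# counts -- these are NOT the study's numbers.
true_positive = 80    # disease present, correctly flagged
false_negative = 20   # disease present, but missed
true_negative = 870   # no disease, correctly cleared
false_positive = 120  # no disease, but flagged anyway

# Share of healthy cases incorrectly flagged as diseased.
fpr = false_positive / (false_positive + true_negative)
# Share of diseased cases that were missed.
fnr = false_negative / (false_negative + true_positive)

print(f"false positive rate: {fpr:.1%}")  # 12.1%
print(f"false negative rate: {fnr:.1%}")  # 20.0%
```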

“Like previous research, this study shows that doctors should not overestimate the results of AI,” write two radiologists from Osaka University in Japan in an accompanying commentary in Radiology. Still, AI tools are expected to keep improving. “An important reason that doctors perform better is access to more data than that single image. AI can also develop in that direction. Moreover, other types of algorithms are emerging that learn somewhat differently, and the first results are promising.”

The Japanese commentators also regret that the study does not break down the radiologists’ results by years of experience. “To find out what impact AI can have on physician performance in the actual clinical environment, this is essential information.”
