1. Introduction
Aortic dissection (AD) is a severe thoracic aortic disorder and a cardiovascular emergency with a high mortality risk [1]. Its incidence is increasing and is now estimated at 15 cases per 100,000 patients per year [2,3,4]. If left untreated, acute AD has a 33% mortality rate within the first 24 hours, rising to 50% by 48 hours and up to 75% in undiagnosed cases of ascending AD [2,5,6,7]. Nearly 22% of patients die before reaching medical care [3,4]. Timely diagnosis through imaging and prompt patient management are crucial, as mortality in AD patients increases by 1-2% per hour in the first 24 hours [8].
Computed tomography angiography (CTA) is the gold standard for diagnosing suspected AD due to its non-invasiveness and ability to produce rapid, high-quality images [4]. However, the Royal College of Radiologists reports that radiologists face significant pressure from increasing workloads and demands for efficiency, leading to fatigue, errors, and delays in diagnosis [9,10]. Several studies have reported dire consequences of delays in treatment, often resulting from misdiagnosis, late diagnosis, or a low clinical index of suspicion in the Emergency Department [11,12].
Deep learning–based artificial intelligence (AI) systems are emerging in radiology, showing promise in neurology, cardiology, thoracic imaging, and cancer screening and detection [13,14,15,16,17]. AI tools can help identify AD features on CTA, reducing the risk of missed lesions [18]. New AI tools have been developed and validated to assist radiologists by prioritising AD-positive cases, ensuring they receive urgent attention [19,20]. These AI systems effectively identify most dissections and all available aortic ruptures, placing critical cases at the top of radiologists' worklists. Prioritisation helps streamline the workflow, enabling timely and accurate diagnosis and treatment for patients needing immediate care [21]. This approach is a more effective alternative to the First-In, First-Out (FIFO) methodology, whose significant drawbacks include its failure to consider case urgency or severity, potentially increasing risks to patient care [22].
There is limited literature on the impact of computerised AD-detection tools on radiologist workflow and patient outcomes. A recent study highlighted that an automated tool not only exhibited good technical performance but also significantly decreased the time from study intake to radiologist read [21]. To elucidate the clinical benefits of integrating an automated tool for AD detection and prioritisation, we conducted a retrospective Multi-Reader Multi-Case (MRMC) study. This study aimed to assess the impact of the validated AI algorithm CINA-CHEST for AD (Avicenna.AI, La Ciotat, France) on radiologists' efficiency and the time required to identify AD-positive cases. We simulated two clinical workflows for AD detection: a conventional FIFO approach without AI assistance and an AI-enhanced approach based on prioritisation. We hypothesised that AI plays a crucial role in prioritising critical AD cases, improving the timeliness of diagnosis.
2. Materials and Methods
2.1. Data Collection
The study was conducted in accordance with the 1975 Helsinki Declaration (as revised in 2013). Prior to investigator assessments, all data were anonymised in compliance with HIPAA and the General Data Protection Regulation (GDPR) (EU) 2016/679. Informed consent was waived where permitted by national legislation and institutional protocols. Anonymised CTA cases were retrospectively provided by a large U.S. teleradiology network and were acquired between July 2017 and December 2018 at multiple clinical sites across 90 U.S. cities. The dataset included CTA images from 4 scanner vendors and 16 different scanner models: 5 GE Medical Systems, 2 Philips, 6 Siemens and 3 Canon/Toshiba. From these patients, we identified the cohort based on the following criteria: 1) age ≥ 18 years, 2) chest or thoraco-abdominal CTA scans only, 3) slice thickness ≤ 3 mm with no gap between successive slices, and 4) soft tissue reconstruction kernel. Exclusion criteria were: 1) incompatibility with the recommended acquisition protocol, 2) thoracic aorta out of the field of view, and 3) significant acquisition artefacts impeding CTA interpretation.
2.2. The Ground Truth
Two U.S. board-certified expert radiologists (P.D.C. and D.S.C.), with 7 and 6 years of experience in clinical practice respectively, independently analysed the CTAs to determine the presence of AD. In case of disagreement, a third U.S. board-certified expert radiologist (J.E.S.), with 8 years of experience in clinical practice, analysed the case, and the ground truth (presence or absence of AD, along with the classification of AD type) was determined by majority agreement. All classifications of AD—hyperacute, acute, subacute, or chronic—were considered positive. Additionally, the radiologists documented observed confounding factors, such as thoracic or abdominal aneurysms, intramural haematoma, calcifications, and post-surgical findings (e.g., presence of stents).
2.2.1. AI Algorithm for AD
An FDA-approved and CE-marked commercially available DL-powered application for AD, CINA-CHEST v1.0.3 (Avicenna.AI, La Ciotat, France), was utilised in this study. The application automatically processes CTA scans and displays notifications of suspected findings (if any) alongside image series information. For cases flagged as positive by the application, the type of AD (Type A or B) is displayed and localised with a red bounding box. These cases are subsequently prioritised in the worklist according to their classification as either positive or negative. A more detailed explanation of the deep learning algorithms and the integration process of CINA-CHEST for AD was provided in a recent study [19].
2.3. Multi-Reader Multi-Case (MRMC) Study
A retrospective, multi-centre, fully-crossed MRMC study was conducted to evaluate the clinical efficacy of CINA-CHEST for AD in the clinical workflow. The study comprised two phases: the pre-AI phase (Unaided Arm), in which radiologists detected AD without the aid of the software's outputs, and the post-AI phase (Aided Arm), in which radiologists detected AD assisted by the application's outputs.
Three radiologists (J.C.J., B.W. and C.Z.; two U.S. board-certified radiologists and one fellow in general radiology), different from those who established the ground truth, participated in the study; they had 9, 5 and 2 years of experience in radiology clinical practice, respectively. The readers analysed every CTA twice, once unaided and once aided by the software's outputs. A 1-month washout period between the two sessions mitigated recall bias. The study design and the radiologist reading workflow are summarised in Figure 1.
In the pre-AI phase, cases were presented in random order without alerts, simulating a conventional first-in, first-out (FIFO) worklist [22]. Radiologists analysed the cases as they would in their routine daily practice, with each CTA appearing in the radiologists' worklist sequentially and the most recently completed examination positioned at the bottom of the list. In the post-AI phase, the AI application flagged potential positive AD cases, prioritising them at the top of the worklist for evaluation (Figure 1b, Figure 2).
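The two simulated worklist orderings can be sketched as follows; the `Study` class, its fields, and the sort keys are illustrative assumptions for this sketch, not part of the CINA-CHEST integration. A stable sort preserves arrival order among exams with equal priority:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Study:
    study_id: str
    arrival_order: int        # completion order of the examination
    ai_flagged: bool = False  # True if the AI tool flags a suspected AD

def fifo_worklist(studies: List[Study]) -> List[Study]:
    """Conventional first-in, first-out ordering: the oldest exam is read first."""
    return sorted(studies, key=lambda s: s.arrival_order)

def ai_prioritised_worklist(studies: List[Study]) -> List[Study]:
    """AI-flagged exams move to the top; unflagged exams keep FIFO order below them."""
    return sorted(studies, key=lambda s: (not s.ai_flagged, s.arrival_order))
```

In the pre-AI arm every reader works through `fifo_worklist`; in the post-AI arm the flagged (suspected-positive) exams are read first.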
All readers assessed the CTAs for AD. Uncertain cases were marked as indeterminate, while cases without dissection were labelled "No dissection." Indeterminate cases were excluded from the analysis. Readers were blinded to each other, the ground truth, and patient clinical data.
The time taken by each reader to analyse each case was automatically measured, starting from the moment the reader began the analysis and ending when they validated the result and proceeded to the next case. Based on these data, we evaluated two key metrics: the primary endpoint, equivalent to Scan-to-Assessment Time (STAT) for true positive cases, defined as the cumulative duration (in minutes) from when a study becomes available for interpretation on the clinical workstation to the moment the final diagnosis is made; and the secondary endpoint, per-case Interpretation Time (IT), the time interval (in minutes) beginning when a radiologist opens the corresponding CTA and ending when they submit their final response. The latter represents the duration required for the radiologist to analyse one case, interpret it, and confirm their findings before moving on to the next case.
The effectiveness of CINA-CHEST for AD in reducing the time required to identify and assess AD on CTA was evaluated under the assumption that all the scans were previously acquired and became available for interpretation on the workstation simultaneously, simulating the most critical situation in an emergency hospital department with high workload.
2.4. Statistical Analysis
Initially, the results computed by CINA-CHEST for AD were compared to the ground truth, with the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and accuracy calculated for the entire dataset. The 95% confidence intervals (CI) for sensitivity, specificity, and accuracy were determined using the Clopper–Pearson method, based on the exact binomial distribution. The performance of each of the three readers was first individually evaluated against the ground truth and subsequently assessed across the two phases of the study, before and after the implementation of AI, by measuring the AUROC, sensitivity, specificity, and accuracy in each phase.
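For illustration, the exact Clopper–Pearson interval can be computed with standard-library Python by bisection on the binomial CDF; this is an illustrative reimplementation, not the MedCalc routine used in the study:

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) CI for a binomial proportion x/n,
    found by bisection on the binomial CDF (pure standard library)."""
    def bisect(f, lo=0.0, hi=1.0, tol=1e-9):
        # f is positive at lo and negative at hi; find its root.
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # Lower bound: p such that P(X >= x | p) = alpha/2
    lower = 0.0 if x == 0 else bisect(lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # Upper bound: p such that P(X <= x | p) = alpha/2
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper
```

For example, `clopper_pearson(33, 35)` would give the exact 95% CI for a sensitivity of 33/35 detected cases.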
Furthermore, to compare workflow metrics pre- and post-AI, the STAT for each true positive case was calculated for each arm as follows (Equation 1): STAT(n) (in minutes) = ∑_{i=1}^{n} IT(i), where STAT(n) represents the scan-to-assessment time for the n-th case in the worklist and IT(i) denotes the interpretation time for the i-th case. Stratified analyses were conducted to explore potential variations across subgroups based on readers' expertise, with a per-group comparison of junior (< 5 years of clinical experience) versus senior readers (≥ 5 years of experience). Additionally, a comparison of STAT between the pre- and post-AI phases including all positive and negative cases was performed. In addition, the per-case IT was calculated for each arm as follows (Equation 2): IT (in minutes) = T_Diagnosis − T_Study-open. The differences between the aided and unaided arms for mean STAT and mean IT were assessed.
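Under the study's assumption that all scans become available simultaneously (Section 2.3), Equation 1 amounts to a running sum of interpretation times down the worklist: the n-th case is reached only after the first n−1 cases have been read. A minimal sketch with made-up IT values:

```python
from typing import List

def scan_to_assessment_times(interpretation_times: List[float]) -> List[float]:
    """STAT(n) = sum of IT(i) for i = 1..n: each case's scan-to-assessment
    time is the cumulative interpretation time of all cases up to and
    including itself, in worklist order."""
    stats, running = [], 0.0
    for it in interpretation_times:
        running += it
        stats.append(running)
    return stats

# Illustrative (made-up) per-case ITs in minutes for a 4-case worklist:
its = [2.0, 3.5, 1.5, 4.0]
print(scan_to_assessment_times(its))  # [2.0, 5.5, 7.0, 11.0]
```

Reordering the worklist so that positive cases come first therefore shortens their STAT, since fewer ITs accumulate before they are opened.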
To test for a statistically significant reduction (α = 0.05, two-sided) in the aided arm relative to the unaided arm, a mixed-effects repeated-measures model was implemented, with reader, case and AI usage (Aided versus Unaided) included as fixed-effect terms, and a paired-sample t-test was conducted [23]. A p-value < 0.05 was considered statistically significant. All statistical analyses were conducted using MedCalc Statistical Software (v22.023, MedCalc Software Ltd).
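For illustration, the paired-sample t statistic on per-case time differences can be computed as follows; this sketch is not the MedCalc implementation and omits the mixed-effects model:

```python
from math import sqrt
from typing import List, Tuple

def paired_t_statistic(aided: List[float], unaided: List[float]) -> Tuple[float, int]:
    """Paired-sample t statistic for per-case differences (unaided - aided);
    a positive t indicates a time reduction with AI. Returns (t, df)."""
    diffs = [u - a for a, u in zip(aided, unaided)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / sqrt(var / n), n - 1
```

The resulting t is compared against the Student t distribution with n − 1 degrees of freedom for the two-sided p-value.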
4. Discussion
In this retrospective, multi-centre, fully-crossed MRMC study, three radiologists interpreted 95 cases twice to evaluate the clinical efficacy of CINA-CHEST for AD. The primary goal was to determine if the AI tool could reduce the STAT for AD-positive cases, compared to the STAT of a traditional FIFO workflow. This study underscored the challenges radiologists face in promptly identifying urgent cases in critical situations with very busy worklists. To the best of our knowledge, this study is the first to demonstrate improved STAT and IT in identification and prioritisation of AD on CTA scans using AI.
The CINA-CHEST for AD software showed robust standalone performance, achieving an AUROC of 0.971, 97.89% accuracy, 94.29% sensitivity, and 100% specificity. In the post-AI phase, readers' performance remained consistently high. The junior reader maintained 100% sensitivity across both phases, with only a minor non-significant drop in specificity (98.33% to 96.66%). The two senior readers experienced a small improvement in accuracy post-AI, with one of them reaching 98.94% accuracy compared with 97.89% pre-AI. The overall AUROC remained high for all readers in both phases, indicating that the software effectively supported accurate diagnoses without significant changes in their performance. Notably, in the pre-AI phase, all readers misclassified one case as positive that was later confirmed as a true negative with the help of the device in the post-AI phase. Additionally, two of the three readers classified two cases as indeterminate in the pre-AI phase, later identified as true negatives post-AI, though these two cases were excluded from the final analysis. These findings underscore the AI tool's ability to maintain diagnostic accuracy and reliability, even for a pathology generally considered easy to identify, without introducing variability in clinical judgment.
By simulating a real-world clinical workflow, we found that AI expedited the identification and prioritisation of the most critical AD cases. With AI-assisted detection, radiologists identified all positive cases in an average of 5.07 minutes, approximately 11 minutes faster than with the traditional FIFO method, corresponding to a 68% improvement in efficiency. Similarly, a global comparison of STAT including all positive and negative cases showed a significant reduction of 26.8% (4.6 minutes) in the aided arm, demonstrating that AI not only streamlines the detection of critical cases but also enhances overall workflow efficiency for both urgent and routine cases.
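As an illustrative arithmetic check of the reported figures (the 11-minute difference and the baseline it implies are taken from the text; the script itself is not part of the analysis):

```python
# Reported: mean STAT of 5.07 min for true positives with AI, ~11 min faster than FIFO.
aided_stat = 5.07
unaided_stat = aided_stat + 11.0            # implied FIFO baseline ≈ 16.07 min
reduction = (unaided_stat - aided_stat) / unaided_stat
print(f"STAT reduction: {reduction:.0%}")   # prints "STAT reduction: 68%"

# Reported: 26.8% (4.6 min) overall STAT reduction implies an unaided baseline of:
baseline_all = 4.6 / 0.268                  # ≈ 17.2 min
print(f"Implied overall baseline: {baseline_all:.1f} min")
```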
Prioritising radiology worklists may enhance patient care and reduce radiologist workload, in contrast to the traditional FIFO workflow, which is driven by often incomplete and ambiguous priority categories (e.g., stat, ASAP, now, critical) determined by the ordering physician's urgency information [24]. By actively prioritising positive cases, AI may enable radiologists to spend more time on critical cases, which are often misdiagnosed due to their nonspecific symptoms [25,26]. According to the International Registry of Acute Aortic Dissection, the median time from presentation in the emergency department to a definitive diagnosis of acute aortic dissection is 4.3 hours, owing in part to the high patient load [27,28]. Hence, the implementation of an AI algorithm capable of detecting AD features on CT images could greatly reduce delays in treating potentially serious aortic lesions and shorten the patient's hospital length of stay [17,29].
To date, no other studies have focused specifically on the prioritisation of AD detection in relation to STAT. Previous studies have assessed the performance of AI triage solutions for AD detection, mostly evaluating the diagnostic ability and accuracy of the algorithm compared with those of radiologists. However, these findings might not capture the true clinical benefit for radiology workflow and patient management [19,30,31,32]. Harris et al. developed a convolutional neural network model trained to detect AD and rupture, resulting in a median reduction of 395 seconds in the delay time, the interval between when a study is received by the system and when it is opened by a radiologist. This is the only study in the literature that begins to address study prioritisation for AD; however, its primary focus is on evaluating the technical processing performance of the device [21].
Conversely, several studies have evaluated the effectiveness of AI-based prioritisation for other pathologies, such as pulmonary embolism (PE), intracranial haemorrhage (ICH) and cancer. For instance, AI-prioritised worklists significantly reduced the time to diagnosis of incidental PE on CT scans in cancer patients: the median turnaround time (TAT) for true positive examinations flagged by the AI software was 91 minutes, a significant reduction from several days to 1.5 hours compared with the traditional FIFO workflow [33]. Moreover, the implementation of AI in radiology significantly reduced scan-to-alert time (from scan initiation to AI alert) for PE patients, with an average AI alert time of under 6 minutes. These findings highlight the critical importance of a prioritisation model, which may improve patients' chance of survival [34]. Similarly, for ICH, the incorporation of a machine learning algorithm into the clinical radiology workflow significantly decreased the mean report TAT from 75 to 69 minutes in emergency settings, leading to faster critical case identification and improved patient outcomes in urgent care scenarios [35]. Even though these studies addressed pathologies other than AD, they demonstrate that AI-based prioritisation can significantly reduce TATs and improve patient outcomes, highlighting the potential for similar benefits in other diagnostic areas.
Regarding individual reader performance, our stratified analyses revealed that each of the three readers experienced a significant reduction in STAT for true positive cases during the post-AI phase, with reductions ranging from 5 to 15 minutes compared with the pre-AI phase. When positive and negative cases were both included in the analysis, one of the three readers showed no significant difference between the aided and unaided study arms, and even a negligible, non-significant increase in STAT. This variability reflects differences in how AI integration affects individual readers' workflows; however, despite individual discrepancies, the overall STAT was significantly reduced on average. In addition, for true positive cases, the senior group experienced a greater gain in STAT (71.5%) from the AI tool than the junior group (53.8%), suggesting that more experienced radiologists gained a greater efficiency boost from the AI implementation than their less experienced counterparts. Other studies evaluating IT rather than TAT have shown similar trends: Bennani et al. reported greater IT improvements for general radiologists (34%) than for residents (30%) in an MRMC study on thoracic abnormalities, and Muller et al. noted a small increase in IT for one resident with AI aid, though residents perceived a better overview of cases [36]. The greater efficiency improvement observed in senior radiologists suggests that their advanced skills and familiarity with complex cases enable them to leverage AI tools more effectively than junior radiologists. However, the impact of AI can depend on user experience and clinical context.
Finally, an analysis was performed specifically on IT, the average time a radiologist spends interpreting a case regardless of its position in the worklist. We found that the aid of the AI software improved radiologists' IT by 33%, which highlights the helpfulness of AI in AD detection and, consequently, in identifying the most complex cases, enabling radiologists to spend more time on these critical instances. Previous studies have shown that AI tools can reduce IT for chest CT scans by 7% to 44% compared with traditional methods in detecting and measuring lung nodules. These results align with our findings, as we used a similar paired study design where the same reader analysed each case twice [37]. In a prospective study of 390 patients, cardiothoracic radiologists interpreting chest CT exams with and without AI assistance observed a 22.1% reduction in IT. The AI system automatically labelled, segmented, and measured both normal structures and abnormalities across the cardiac, pulmonary, and musculoskeletal systems [38]. These findings suggest that AI automation enhances efficiency by streamlining diagnostics and optimising radiology workflows.
In summary, the significant reduction in IT and STAT underscores the transformative impact of AI on radiology. By streamlining the diagnostic process and improving STAT, AI not only enhances efficiency but also shifts the paradigm from traditional FIFO methods to more effective prioritisation. This advancement highlights the potential of AI to reshape radiology workflows, making it a valuable tool in modern medical practice. Because it was conducted across multiple clinical sites and scanner vendors, this reader study benefits from a diverse dataset encompassing various imaging parameters and patient profiles. Moreover, it engaged a panel of readers representing the different expertise levels encountered in real-world clinical practice.
This study has a few limitations. First, the workflow impact of the AI tool was assessed among only three radiologists of varying experience, so the effect may differ among other radiologists. Future research should evaluate AI's impact across a broader group of radiologists. Additionally, the analysed database consisted of only 100 cases, with 5% excluded due to 'Indeterminate' responses; future studies should include a larger sample size. Conducting a prospective study in the future would provide a more robust validation of these findings in a real-world setting.
Author Contributions
Conceptualization, Y.C., P.D.C., A.A. and S.Q.; Methodology, A.A. and S.Q.; Software, M.R.-S.; Validation, Y.C., A.A. and V.L.; Formal analysis, A.A. and M.C.; Investigation, Y.C., J.C.J., B.W. and C.Z.; Resources, P.D.C., D.S.C., J.E.S., J.C.J., B.W. and C.Z.; Data curation, M.R.-S. and A.A.; Writing—original draft preparation, M.C.; Writing—review and editing, M.C., A.A., Y.C., M.R.-S., J.E.S. and S.Q.; Visualization, M.C.; Supervision, Y.C. and A.A.; Project administration, S.Q. All authors have read and agreed to the published version of the manuscript.