1. Introduction
In the medical domain, localizing a disease and determining its extent is of major benefit to treatment. Ever since imaging modalities became available for cancer therapy, the precise delineation of organs and target volumes has been of great interest. The manual generation of these contours, however, is often time-consuming, requires intensive prior training, and often lacks consistency between observers, especially for target volumes [1,2]. Because contour annotations are so important in clinical routine, considerable research has been conducted in this area. Widespread early approaches to automating medical image segmentation were atlas-based methods [3,4,5]. In these methods, reference images are first contoured to build an atlas. The atlas images are then registered onto the new image, and the resulting deformation field is applied to the atlas contours, yielding a segmentation of the new image. While this approach proved successful in reducing manual labor [6,7], it showed drawbacks in segmentation quality whenever the image quality or the individual anatomy deviated from the atlas.
With the rise of deep learning (DL) methods capable of accurate contouring, automated segmentation (auto-segmentation) is being applied in ever more areas in which medical images are analyzed. The most popular network architecture for automatic medical image segmentation is the U-Net, introduced by Ronneberger et al. [8]. The deployment of this architecture in a framework with self-configuring hyperparameters, the nnU-Net [9], increased the accuracy and accessibility of DL-based segmentation methods. With the nnU-Net, it is possible to train a state-of-the-art deep learning model for medical image segmentation on custom data-label pairs, eliminating the need to explore task-specific hyperparameter settings.
While DL methods were at first optimized to predict single volumes of interest, models for multi-organ segmentation are becoming increasingly important [10,11]. Recently, Version 2 of the TotalSegmentator toolkit was released. The TotalSegmentator is a pre-trained, open-access toolkit, based on the nnU-Net framework, for the auto-segmentation of 117 anatomical structures throughout the whole body [11].
Multi-label segmentation models have been shown to improve the segmentation accuracy for individual organs and the robustness of DL methods compared to single-label models [12]. Currently, most multi-organ segmentation models are trained on sparse labels (i.e. most voxels of an image are not labeled) because dense annotations are missing in the available medical image data sets. To further increase segmentation accuracy, dense segmentation of the human body is necessary, i.e. the segmentation of every anatomical structure and its substructures. Gare et al. [13] showed that, for ultrasound images, dense pixel labeling improves disease classification compared to models trained on only sparsely labeled images.
DL-based auto-segmentation benefits many tasks that require medical image segmentation, whether through improved standardization, time savings, or refined precision. Relevant tasks are found in radiology, surgery [14], and radiotherapy. It also facilitates research fields such as biomechanical modelling [15] and the generation of synthetic medical image data sets [16], which in turn improve results in clinical applications. Nevertheless, the main application of automatic medical image segmentation lies in cancer diagnosis and treatment planning [17]. In cancer therapy, common auto-segmentation tasks are the segmentation of organs at risk (OARs) [18,19], target volumes [20,21,22,23], and metastases [24]. For example, Nikolov et al. [19] trained a DL-based auto-segmentation model that delineates 21 OARs in the head and neck area with expert-level performance.
In the field of radiation therapy, the exact contouring of OARs as well as target volumes is of major importance for the treatment outcome. Only with precise delineation of target volumes and OARs can optimal tumor control be achieved while adjacent healthy tissue is preserved. This is particularly pronounced in the head and neck region, where anatomical structures exhibit close spatial proximity paired with high anatomical flexibility. Target volumes as well as OARs are delineated by experts on the planning CT scans. These volumes form the basis of the objective function in the optimization of the radiation treatment plan.
Different target volumes are defined in radiotherapy. Following [25], the gross target volume is the visible and palpable, innermost tumor extent. It is surrounded by the clinical target volume (CTV), which comprises tissue that is potentially infiltrated by microscopic tumor cells. The CTV can itself be subdivided into the primary CTV and the nodal CTV. The primary CTV is drawn as a margin of 0.5 – 1 cm around the gross target volume, while the nodal CTV follows the lymphatic pathways and includes all areas found to harbor microscopic tumor cells with a probability of 10% or more [26,27,28]. The outermost target volume is the planning target volume, which surrounds the union of all aforementioned target volumes and compensates for beam parameter uncertainties, patient placement errors, organ motion, and other motion-induced variations [29].
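As a purely geometric illustration of such a margin expansion (not part of the guideline-based workflow discussed in this study; the function name and toy volume are hypothetical), a binary gross target volume mask can be expanded by an isotropic margin while accounting for anisotropic voxel spacing:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def expand_margin(gtv_mask: np.ndarray, spacing_mm: tuple, margin_mm: float) -> np.ndarray:
    """Expand a binary GTV mask by an isotropic margin given in mm."""
    # Distance (in mm) from every voxel to the nearest GTV voxel;
    # GTV voxels themselves receive a distance of 0.
    dist_to_gtv = distance_transform_edt(~gtv_mask, sampling=spacing_mm)
    return dist_to_gtv <= margin_mm

# Toy example: expand a small cube by 5 mm (0.5 cm) on a 3 x 0.98 x 0.98 mm grid.
gtv = np.zeros((40, 64, 64), dtype=bool)
gtv[18:22, 30:34, 30:34] = True
ctv_primary = expand_margin(gtv, spacing_mm=(3.0, 0.98, 0.98), margin_mm=5.0)
```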
The extent of the CTV is not visible with modern imaging techniques, since it comprises normal tissue infiltrated by microscopic tumor cells. Its outline is instead defined on the basis of recurrence studies and thus on empirically built clinical experience [30,31]. This makes the delineation of CTVs a difficult task for clinicians, one that requires many years of training [32]. Its complexity is reflected not only in the training needed to perform the task, but also in the time needed to produce acceptable delineations and in their resulting divergence. Given the same CT scan, the manual CTV delineations of different experts show a large inter- and intra-observer variability of up to 200% difference in volume [1].
The quality of manual labels heavily affects the training, and thus the prediction accuracy, of supervised learning methods. Inconsistent manual delineations of CTVs therefore have a negative impact on the auto-segmentation of target volumes [33,34]. For this reason, researchers in this field focus on curating consistent data sets, either through extensive peer review of the manual contouring process or by incorporating contours from only a minimal number of clinical experts or institutes [21,22,23]. For CTV delineation, the predicted labels are still reported to require intensive pre- and post-processing [35,36,37,38], and they are not easily adaptable to changes in segmentation standards or to patient-individual requirements. All of this aims at improved spatial conformance of the predicted contour with the manual delineation, even though manual delineations are known not to be well standardized.
Not only the comparison against labels that depend strongly on the expert who generated them, but also recent studies on evaluation metrics raise critiques of the current state of the art. Reinke et al. [39] point out that measures of pure spatial overlap (i.e. the DICE) do not necessarily quantify the actual quality of interest in medical image segmentation tasks. For the delineation of CTVs, the quality of interest is the conformance of the delineation with the expert guidelines.
To overcome the variability in CTV delineation, detailed clinical knowledge about the extent of the CTVs has been collected in international consensus expert guidelines, including those for head and neck treatments [27,28]. These expert guidelines provide a commonly accepted, rule-based delineation scheme for the CTVs and thus standardize their segmentation. As one example, Grégoire et al. [27] focus on the delineation of the nodal CTV in the head and neck area. In these expert guidelines, the nodal CTV is subdivided into ten levels with some additional subdivisions. The extent of each level is described by bordering anatomical structures. Thus, the expert guidelines convert the difficult problem of delineating the extent of cancerous infiltration, which is not visible in CT scans, into a contouring task of anatomical structures. The selection of levels to be irradiated is based on the location of the primary tumor.
Toward the goal of evaluating guideline conformance of CTV delineations, the 71 most important anatomical structures mentioned in the expert guidelines were chosen for an auto-segmentation task in this study. All 71 structures were manually delineated and used to train nnU-Net models for auto-segmentation. The predictions for 18 unseen data sets are evaluated against the manual labels as well as against segmentations generated by the TotalSegmentator, and compared to previously reported segmentation results. So far, studies on the segmentation of anatomical structures have published results on only a small subset of the necessary 71 structures, and the existing results are widely scattered over multiple unrelated publications. Finally, the impact of the segmentation accuracy on the construction of CTV delineations according to the expert guidelines is discussed.
2. Materials and Methods
2.1. Image properties of the data set
The planning CT scans for this study were aggregated from four different study cohorts.
Figure 1 shows an exemplary CT scan from each cohort. All patients received radiotherapy for head and neck cancer, and exactly one planning CT scan per patient was considered in this study. Each CT scan consists of 90 to 220 slices (mean: 141 ± 24) of 512 × 512 voxels each. The voxel size ranged from 0.98 × 0.98 × 2 mm³ to 1.27 × 1.27 × 3 mm³.
The training data set and the test data set are mutually exclusive. The training data set (86 scans) included (a) 84 in-house HNC patients from three different cohorts (varying setup, positioning, devices, and protocols) [43,44], and (b) 2 open-access HNC data sets [40,41,42]. The test data set (18 scans) was curated from the same three study cohorts (14, and 4 scans, respectively). The patient selection for the test data set was based on available meta-information in order to best represent the variety of the data cohorts. Factors for the selection were study cohort, location of the primary tumor, gender, presence of a tracheostoma, size of the nodal CTV, and estimated age and weight of the patient.
2.2. Label selection and generation of the manual labels
The 71 structures were chosen based on their frequency of occurrence in the Grégoire et al. [27] expert guidelines. The resulting set of anatomical structures is visualized in Figure 2. Manual labels of the 71 anatomical structures were generated for all 104 CT scans by six different trained observers on a Wacom Cintiq 24HD display in RayStation 8B(R) SP1. The observers followed a standard operating procedure for the delineations that included (a) the unambiguous definition of each structure's extent (e.g. mandible without teeth), (b) windowing, and (c) spatial restrictions based on other anatomical structures (mostly cranial and caudal). The complete standard operating procedure can be found in Appendix A.1. Each data set was reviewed at least once, and adjusted if necessary, by one of the other observers before it was accepted for the study.
For one patient data set, 41 selected structures were segmented a second time by one of the trained observers who had not been involved in the initial segmentation or the review of this patient. Based on these two sets of contours, the inter-observer variability was assessed approximately.
Due to the field of view of our CT scans, the esophagus, the sternum (corpus and manubrium), the lobes of the lung, the trachea, the trapezius muscles, the brachiocephalic veins, and the skin are never or not always completely captured on our patient scans, but are cut off at the caudal edge of the scan. The sternum corpus is sometimes not present at all. Further, in cases where the patients were irradiated post-operatively, or where the extent of the primary tumor distorted surrounding anatomical structures, the respective missing anatomical structures were not segmented. In total, 30 anatomical structures were missing. Fifteen of these missing structures accumulated in two test patients (#8, #7), and three other patients had at least two missing structures. Nine of the 18 test patients were not missing any structure and thus had the full set of 71 anatomical structures manually segmented.
2.3. Network training and label prediction
For the automatic segmentation, the nnU-Net framework Version 1 was chosen and trained with one adaptation to the default parameters: mirroring was removed from the data augmentation to keep the left-right orientation of the patients consistent during training. The final training data set provided for the nnU-Net training was instead generated by additionally mirroring all 86 training data sets; in the mirrored copies, the labels of left and right instances of anatomical structures were swapped so that the left-right assignment remained anatomically consistent.
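As an illustration of this augmentation step, the following minimal sketch (assuming NumPy image and label volumes and a hypothetical mapping of left to right label IDs) mirrors an image-label pair along the left-right axis and swaps the paired label IDs:

```python
import numpy as np

def mirror_and_swap(image: np.ndarray, labels: np.ndarray,
                    lr_pairs: dict, lr_axis: int = -1):
    """Mirror an image/label pair along the left-right axis and swap paired
    label IDs so that anatomical sides remain correct after mirroring."""
    mirrored_img = np.flip(image, axis=lr_axis)
    mirrored_lab = np.flip(labels, axis=lr_axis)
    swapped = mirrored_lab.copy()
    for left_id, right_id in lr_pairs.items():
        swapped[mirrored_lab == left_id] = right_id
        swapped[mirrored_lab == right_id] = left_id
    return mirrored_img, swapped

# Hypothetical label IDs: 12 = internal carotid artery (l), 13 = (r).
# img_mirror, lab_mirror = mirror_and_swap(ct_volume, label_volume, {12: 13})
```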
Since a network in nnU-Net Version 1 can only be trained on non-overlapping structures, the labels of all 71 anatomical structures were subdivided into three disjoint subsets of mutually non-overlapping structures, containing (a) the labels of all bones, muscles, vessels, air-related structures, glands, and the esophagus (64 labels), (b) the labels of all cavities (6 labels), and (c) the skin label (1 label). According to the authors, nnU-Net Version 2 has no accuracy advantages over Version 1 [45].
Following the nnU-Net's five-fold cross-validation standard, five 3D full-resolution models were trained with the trainer V2 for each of the three subsets. Folds 1 and 2 used 137 data sets for training and 35 for validation, while folds 3–5 used 138 data sets for training and 34 for validation. Each fold was trained for 1000 epochs. Predictions for all 18 previously unseen test data sets were generated in the nnU-Net's default manner of ensembling all five folds. No postprocessing was applied.
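For orientation, such a per-subset training and ensembled prediction can be launched with the nnU-Net Version 1 command-line interface roughly as sketched below (the task identifier and folder names are hypothetical placeholders, and exact flags may vary between versions):

```python
import subprocess

task = "Task600_HN64"  # hypothetical task ID for the 64-label subset

# Train the five folds of the 3D full-resolution configuration with trainer V2.
for fold in range(5):
    subprocess.run(["nnUNet_train", "3d_fullres", "nnUNetTrainerV2", task, str(fold)],
                   check=True)

# Predict the unseen test scans; by default nnU-Net ensembles all five folds.
subprocess.run(["nnUNet_predict", "-i", "test_images/", "-o", "predictions/",
                "-t", task, "-m", "3d_fullres"], check=True)
```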
All computations used nnU-Net Version 1.7.0 with Python 3.9.7 and PyTorch 1.10.2 with CUDA 11.3.1. Training and prediction were executed on a computer with an AMD Ryzen™ 9 3900X processor, 128 GB RAM, and an NVIDIA GeForce RTX 3090 with 24 GB VRAM.
For 16 of our anatomical structures, segmentations can also be retrieved with the pre-trained TotalSegmentator toolkit. We employed the TotalSegmentator as a Python library on our 18 test patients with its default configuration. The TotalSegmentator predictions were run on a computer with an Intel® Core™ i7 processor, 64 GB RAM, and an NVIDIA GeForce RTX 2070 with 8 GB VRAM.
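As an illustration, the TotalSegmentator Python API can be invoked roughly as follows (paths are placeholders; the exact call signature may differ between toolkit versions):

```python
# Minimal sketch of running the TotalSegmentator on one test CT.
from totalsegmentator.python_api import totalsegmentator

input_ct = "test_patients/patient_01/planning_ct.nii.gz"  # placeholder path
output_dir = "ts_predictions/patient_01"                   # placeholder path

# Default configuration: full-resolution model, all available structures.
totalsegmentator(input_ct, output_dir)
```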
2.4. Evaluation of predicted labels
We assess the similarity and distance between two labels of the same structure using three metrics: (a) their volumetric overlap, measured with the Sørensen–Dice coefficient (DICE) [46,47], (b) the distance between the two contours, evaluated with the Hausdorff distance (HD) [48], and (c) the agreement of the contour surfaces within a tolerance of 2 mm, quantified with the surface DICE (sDICE) as defined by Nikolov et al. [19]. For the HD we chose the 95th percentile (HD (95)). The tolerance of 2 mm was chosen for the sDICE because of its clinical relevance in radiation therapy with photons; the sDICE (2 mm) is considered to indicate the correction effort needed for the predicted CTVs. This selection of metrics is consistent with the Metrics Reloaded framework [39]. Structures that are not present in the manual labels, in the predicted labels, or in both sets of labels are excluded from the analyses. All metrics were calculated with the library surface-distance-based-measures, Version 0.1.
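As a minimal sketch of this evaluation, assuming boolean masks on a common voxel grid and the open-source surface-distance library (the import name may differ depending on how the package is installed; the helper function is ours):

```python
import numpy as np
import surface_distance as sd  # DeepMind surface-distance package

def evaluate_pair(mask_manual: np.ndarray, mask_pred: np.ndarray, spacing_mm: tuple):
    """Compute DICE, HD (95) and sDICE (2 mm) for one structure of one patient."""
    dice = sd.compute_dice_coefficient(mask_manual, mask_pred)
    dists = sd.compute_surface_distances(mask_manual, mask_pred, spacing_mm)
    hd95 = sd.compute_robust_hausdorff(dists, 95)             # 95th-percentile HD in mm
    sdice = sd.compute_surface_dice_at_tolerance(dists, 2.0)  # surface agreement within 2 mm
    return dice, hd95, sdice
```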
4. Discussion
When comparing the grouped DICEm between tissue types, groups with good contrast on CT scans, such as air-related structures and bones, show higher accuracy than the other groups. Noticeably, the variation in DICEm is largest for the group of muscles. First, this group has the largest number of distinct anatomical instances. Further, the soft-tissue contrast on CT scans is not sufficient to identify most muscles completely. Finally, the group of muscles is also the most diverse, ranging from structures with an average volume of 550 voxels (digastric muscle) to 55,000 voxels (trapezius muscle).
4.1. Reasons for impaired prediction accuracy
We visually analyzed cases of impaired prediction accuracy for the anatomical structures highlighted above. In the following section, the reasons for this impaired prediction accuracy are summarized.
The visual analysis of cases in which the internal carotid artery (ICA) shows especially low DICE and sDICE values on both sides reveals four common reasons for deviations between the manual segmentation and its prediction: (a) the ICA is a thin structure, (b) the transition between the internal carotid artery and the common carotid artery varies, (c) the most cranial slice on which the ICA is contoured varies, and (d) metal in the mouth causes CT artifacts in this area. Figure 5 shows the deviation between the manual and predicted segmentation of the ICA caused by an inconsistent choice of the most cranial slice, and the bottom row of Figure 6 shows metal artifacts.
For the subclavian artery, similar reasons result in small DICEm and sDICEm values: (a) the subclavian artery is a thin structure, (b) the transition between the right subclavian artery and the brachiocephalic artery varies, and (c) the lateral extent varies.
The visual analysis of the superior and middle constrictor muscles likewise reveals clear confusion in the area of transition between the two structures, as well as in the transition between the middle and inferior constrictor muscles. This observation is supported by the above-median performance of their combination (i.e. constrictors (s., m., i.)). Training on their combination and differentiating the substructures in a rule-based post-processing step might therefore benefit the auto-segmentation of the constrictor muscles and similar cases.
The digastric muscles and the posterior scalene muscles show (almost) below-Q1 performance in DICEm and sDICEm, with large standard deviations among test patients. DICE values range from 0 to 0.83 for the digastric muscles and from 0 to 0.71 (0.81) for the posterior scalene muscles. sDICE values differ by more than 0.68 (digastric muscles) and 0.85 (posterior scalene muscles) between minimum and maximum. All predictions show greater agreement with the manual labels than the segmentations generated by the second observer do (high inter-observer variability).
The tongue has an above-median DICEm but a noticeably low sDICEm. Since the tongue is, in theory, an easy-to-locate structure of above-average volume, the DICEm only marginally indicates problems with its segmentation; the sDICEm, however, signals inconsistencies in its precise outline. The reason is metal artifacts, which occur predominantly in the area of the mouth and impair the precise segmentation of the tongue.
The right platysma muscle is an outlier in HDm. The analysis of individual cases shows deviations of the manual labels in the frontal-dorsal and the cranial-caudal direction. Since the platysma muscle is a thin cutaneous muscle, it is sometimes barely visible in its most frontal and most dorsal extent. Thus, the network is trained on only a few fully extended examples, and the auto-segmentations depict only the mostly visible inner extent of the platysma muscles.
4.2. Inter-observer variability, and tracheostomy analysis
Anatomical structures with an inter-observer variability outside the 3σ interval around the mean in any of the three metrics, or with a value below Q1 in DICEm or sDICEm or above Q3 in HDm, were visually analyzed. Two systematic reasons were found that explain the deviations. First, the lateral extent of the subclavian artery was delineated inconsistently. Second, muscular structures were systematically segmented wider by one observer than by the other; this holds for the prevertebral muscles, the sternocleidomastoid muscles, the trapezius muscles, and the digastric muscles. The deviations for the scalene muscles and the tonsils did not follow systematic reasons. These structures are barely or not at all visible in the planning CT scans; Figure 6 shows this for the tonsil (green arrows). This results in largely deviating contours between both observers, as visualized in the right column of Figure 7. No unambiguous reason can be given for the right internal carotid artery. As it is a thin structure that is difficult to segment, deviations occur in some central slices, while its left counterpart is much better aligned between both observers; no clear difference is visible between the two sides of the patient's CT scan.
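A minimal sketch of the 3σ flag used for the asterisks in Tables 1 and 2 (the helper name and example values are ours; it assumes the per-patient prediction-vs-manual values of one structure and metric, plus the single inter-observer value):

```python
import numpy as np

def flag_interobserver_outlier(pred_vs_manual: np.ndarray, interobserver: float) -> bool:
    """Flag a structure when the inter-observer value lies outside
    mean +/- 3*std of the prediction-vs-manual values."""
    mean, std = pred_vs_manual.mean(), pred_vs_manual.std()
    return abs(interobserver - mean) > 3 * std

# Example with made-up per-patient DICE values (mean ~0.90, std ~0.03) and an
# inter-observer DICE of 0.65, which would be flagged.
values = np.array([0.91, 0.88, 0.93, 0.90, 0.87, 0.92, 0.89, 0.90])
print(flag_interobserver_outlier(values, 0.65))  # True
```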
Although the DL models were trained on a distinct number of patient data sets with tracheostomy, leaving those patients out of the analysis noticeably improves seventeen selected structures in almost all three metrics, while the DICEm and sDICEm of all other anatomical structures show almost no change. Most of the 17 structures lie in close proximity to the tracheostoma or to the distortions of the larynx caused by the tracheostomy.
4.3. Comparison to TotalSegmentator
Most anatomical structures that are automatically segmented by the TotalSegmentator framework (TS) are very similar to our own segmentations. For the structures that deviate noticeably, visual analysis of the segmentations reveals a common reason. Figure 5 includes the 3D comparison of these structures. The most common reason is disagreement on the start and end positions of elongated structures such as the common carotid artery, the trachea, and the subclavian artery. Our manual segmentations of the common carotid arteries end cranially at the artery's bifurcation. Although starting caudally at a very similar position, the TS segmentations end approximately halfway to the artery's bifurcation, close to the cranial edge of the esophagus and the trachea. For the trachea, our manual labels exclude the bronchi, while the TS segmentations include the right and left primary bronchi. Our manual labels for the subclavian artery extend further laterally than the TS labels.
Deviations in the auto-segmentation of the thyroid gland result from patient-individual differences rather than from a systematic difference in the definition. Especially in patients with a tracheostoma, the TS predictions deviate more from the manual segmentations than our own predictions do. It may be that the data set on which the TS model was trained contained few or no patients with a tracheostoma.
4.4. Impact on CTV delineation
The delineation of CTVs should be targeted for auto-segmentation using DL algorithms. Following the international consensus guidelines of Grégoire et al. [27], this study can form the basis for improved standardization and reduced workload. In the following section, we analyze the implications that the previously described systematic deviations in the auto-segmentation of anatomical structures have on clinical target volume delineation when following Grégoire et al. [27].
The predicted contour of the internal carotid artery (ICA) deviates caudally at its transition into the common carotid artery (CCA) and cranially in its most cranial slice, as well as in regions with metal artifacts. Within the expert guidelines [27], the ICA is needed as the medial edge of Level II, the lateral edge of Level VIIa, and the medial edge of Level VIIb. All these levels transition into each other, and the precise boundary only becomes relevant if some, but not all, of these levels are irradiated. Since Level II begins caudally approximately where the CCA and the ICA transition into each other, one might add the CCA as a boundary to the rules when automating the delineation of Level II. The cranial edge of Level II is given either by the lateral process of C1, which the ICA always exceeds, or by Level VIIb. The cranial edge of Level VIIb is the base of the skull (jugular foramen), which the ICA reaches in all our test patients. Thus, the deviations introduced by the auto-segmentation of the ICA do not affect the delineation of the CTVs.
The predicted contour of the subclavian artery (SuA) deviates laterally and at its transition to the brachiocephalic artery. Within the expert guidelines [27], the SuA is needed as the posterior edge of Level IVb. Caudally, this posterior boundary combines the SuA and the brachiocephalic artery, so that their transition does not affect the delineation of the CTV. Cranially, the lateral deviation of the SuA segmentation does not affect the posterior edge of Level IVb either, because the SuA's extent always exceeds the necessary boundary of Level IVb.
The predicted contours of the inferior, middle, and superior constrictor muscles (CM) deviate caudally and cranially at the transitions between the individual muscles. Within the expert guidelines [27], the CM is needed as the anterior edge of Level VIIa, which borders the superior or middle pharyngeal constrictor muscle. This boundary combines the superior and middle CM, so that their transition does not affect the delineation of the CTV.
The predicted contour of the platysma muscle (PM) deviates in the frontal-dorsal as well as the cranial-caudal direction. Within the expert guidelines [27], the PM is needed as the caudal edge of Levels Ia and Ib, the lateral edge of Level Ib and Level V, and the anterior edge of Level VIa. The caudal edge of Level Ia requires sufficient delineation of the PM in its central regions, which is achieved consistently. The caudal edge of Level Ib is described by a plane independent of the PM; the PM only intersects this plane where it forms the lateral border of Level Ib. For this, the central parts of the PM are relevant, and those are well predicted. In the boundary descriptions of Level V and Level VIa, the skin is given as an alternative edge. Since the PM is a thin cutaneous muscle, the expert guidelines already account for its potential invisibility. In this case, there are no further implications for the CTV delineation beyond the irradiation of the PM itself.
The predicted contour of the anterior belly of the digastric muscle (aDM) deviates unsystematically. Within the expert guidelines [27], the aDM is needed as the caudal and lateral edge of Level Ia and as the medial edge of Level Ib. For the caudal edge of Level Ia, the aDM is not the primary boundary but a substitute for the PM if the PM is not visible. Due to inconsistent delineations of the aDM, substituting the PM in this case might cause deviations in the caudal boundary of Level Ia. Nevertheless, as discussed above, the PM is often delineated well in this region. Visual analysis of the data shows that the mandible is often chosen as the lateral edge of Level Ia, and Level Ia itself is often chosen as the medial edge of Level Ib. Thus, the delineations we received from the clinics do not always spare the aDM. With our inconsistent delineations, we cannot improve this situation and spare the aDM reliably. No solution can be provided for cases in which Level Ib is irradiated while Level Ia is not.
The predicted contour of the posterior scalene muscle (pSM) deviates unsystematically. Within the expert guidelines [27], the scalene muscles are needed as the medial edge of Level II, Level III, Level IVa, Level V, and Level Vc, the posterior edge of Level IVa, and the lateral edge of Level IVb. Although not specified precisely, the visual analysis shows that most of these boundaries are given by the anterior scalene muscle. The pSM potentially plays a role in delineating the medial edge of Level V caudally. Here, the confusion between the different scalene muscles does not affect CTV delineation, but the pSM could be unintentionally irradiated if contoured erroneously.
The predicted contours of the tongue and the tonsils deviate unsystematically due to metal artifacts and missing soft-tissue contrast. Since both structures are not used as boundary definitions but only as selection criteria for nodal levels in the expert guidelines [27], the CTV delineation is not affected by distortions of these two structures.
4.5. Limitations
In our study, we segmented 71 anatomical structures. With additional tools such as the TotalSegmentator, the set of structures can be extended further. Nevertheless, even when combining multiple models, there remain anatomical structures that have been segmented neither previously nor in this study. Thus, the dense segmentation of all anatomical structures in the human body remains an open issue. Even if dense annotations become more feasible, the large inter-observer variability points to upcoming problems in this area. Before dense annotations can be generated, better agreement on the structures' definitions should be reached. Their precise delineation could be supported by additional multi-modal images.
Not all structures necessary for the auto-segmentation of all CTV levels in the head and neck area are covered. Structures such as the posterior belly of the digastric muscle, the mylohyoid muscle, the transverse cervical vessels, and the infrahyoid (strap) muscles are missing for completeness. Further, some segmented structures are not predicted accurately enough to be spared (e.g. the anterior belly of the digastric muscle).
Although our training data set was very diverse, the number of training and test samples was too low to train the models to identify every image feature and every patient condition. Thus, patients with tracheostomy led to lower segmentation accuracy. The same might hold for postoperative patients, different contrast agent phases, or different CT scan resolutions.
Author Contributions
Conceptualization, A.W., M.F. and K.G.; methodology, A.W.; software, A.W., G.S. and J.R.; validation, A.W. and P.H.; formal analysis, A.W.; investigation, A.W.; resources, P.H. and S.A.; data curation, A.W.; writing—original draft preparation, A.W.; writing—review and editing, all authors; visualization, A.W. and K.G.; supervision, S.A., O.J. and M.F.; project administration, A.W.; funding acquisition, A.W., O.J., M.F. and K.G. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Screenshots of planning CT scans from exemplary patients of all four cohorts in sagittal, coronal, and transversal view. (a) Open-access HNC data set [40,41,42], (b)–(d) in-house HNC data sets. All cohorts differ in their scanning set-up, using different treatment couches and immobilization devices. (b) shows artifacts due to dental implants, and (c) shows artifacts due to the stereotactic frame; this patient underwent tracheostomy.
Figure 2.
Visualization of all 71 anatomical structures manually delineated. Abbreviations: a. artery, an. anterior, i. inferior, m. middle, me. medius, p. posterior, s. superior, v. vein.
Figure 3.
Mean DICE values between the manual delineation and the predicted label for each anatomical structure, grouped by tissue type. Means are calculated over all test patients for which the structure is present (maximum 18 test patients). Box plots show the median (cyan) and outliers (cross); boxes (blue) reach from the first quartile (Q1) to the third quartile (Q3), and whiskers extend to 1.5 times the interquartile range. Quantities per group were: Air (6), Bones (11), Cartilages (2), Glands (3), Muscles (26), and Vessels (11).
Figure 4.
Mean HD and mean sDICE values between the manual delineation and the predicted label for each anatomical structure, grouped by tissue type. Means are calculated over all test patients for which the structure is present (maximum 18 test patients). Box plots show the median (cyan) and outliers (cross); boxes (blue) reach from the first quartile (Q1) to the third quartile (Q3), and whiskers extend to 1.5 times the interquartile range. Quantities per group were: Air (6), Bones (11), Cartilages (2), Glands (3), Muscles (26), and Vessels (11).
Figure 5.
3D visualization of the subclavian artery (orange, green), the common carotid artery (yellow, brown), the internal carotid artery (dark green, cyan), the trachea (teal), and the constrictor muscles (pink, light green, blue). Contours were generated manually (left), by our trained nnU-Net models (middle), and by the TotalSegmentator (right). Black lines are shown for height comparison.
Figure 6.
CT slices of two different patients with contours generated manually (left), contours generated by our trained nnU-Net models (middle), and the comparison of both contours without CT slice (right). White arrows indicate large deviations between both contours in the platysma (top row) and the tongue (bottom row). Deviations in the segmentations of the internal carotid artery are indicated by pink arrows (manual labels) and yellow arrows (predicted labels). The right tonsil (green arrow) is not visible.
Figure 7.
CT slice (top) with contours generated manually (area) for comparison (outline) with contours predicted by our trained nnU-Net models (left), and with contours manually delineated by another trained observer (right). The second set of contours does not contain all 71 structures (no outlines). Green (right) and yellow (left) arrows point to corresponding segmentations of the posterior scalene muscle generated by one observer (darker color) or the other (lighter color). The same contours without the CT slice are visualized in the bottom row.
Table 1.
List of all segmented anatomical structures (right (r), left (l)) and their combinations (e.g. sternum (M., C.)), sorted by tissue type. For each structure, the DICE (mean ± standard deviation) between the manual contours and our models' predicted contours (pred.) is given, as well as the inter-observer variability in DICE (calculated on a single patient data set). Asterisks (*) indicate inter-observer variability values outside the 3σ interval given by the mean and standard deviation of the models' comparison to the manual labels. The last column shows previously reported DICE results as mean ± standard deviation (single comparison) or as the range of means (multiple comparisons). Superscript numbers indicate differences between the structure's definition in the literature and the definition used in this paper; explanations are given as footnotes at the end of the table.
| Tissue | Structure | pred. vs. manual | interobserver | literature |
| Air | Auditory Canal (l) | 0.77 ± 0.09 | | 0.83 ± 0.02 [50]² |
| | Auditory Canal (r) | 0.80 ± 0.10 | | 0.83 ± 0.02 [50]² |
| | Larynx (air) | 0.86 ± 0.06 | | |
| | Lung (l) | 0.99 ± 0.01 | | 0.98 [53]¹,² |
| | Lung (r) | 0.99 ± 0.01 | | 0.98 [53]¹,² |
| | Trachea | 0.90 ± 0.07 | | |
| Bones | Cheek Bone (l) | 0.78 ± 0.04 | | |
| | Cheek Bone (r) | 0.78 ± 0.06 | | |
| | Clavicle (l) | 0.93 ± 0.02 | | |
| | Clavicle (r) | 0.93 ± 0.01 | | |
| | Hyoid Bone | 0.82 ± 0.07 | 0.76 | |
| | Mandible | 0.88 ± 0.06 | 0.78 | [0.86 - 0.99] [52,54,55,56] |
| | Sternum (M., C.) | 0.93 ± 0.04 | | 0.83 [57]¹ |
| | Sternum Corpus | 0.82 ± 0.22 | | 0.90 ± 0.03 [58]¹ |
| | Sternum Manubrium | 0.90 ± 0.06 | 0.88 | |
| | Styloid Process (l) | 0.72 ± 0.14 | | |
| | Styloid Process (r) | 0.77 ± 0.08 | | |
| | Vertebra C1 | 0.86 ± 0.04 | 0.84 | |
| Ca. | Cricoid Cartilage | 0.69 ± 0.15 | 0.78 | 0.66 ± 0.12 [52] |
| | Thyroid Cartilage | 0.85 ± 0.06 | 0.85 | |
| Gland | Submandibular Gland (l) | 0.77 ± 0.17 | | [0.70 - 0.97] [51,52,54,55] |
| | Submandibular Gland (r) | 0.78 ± 0.13 | | [0.73 - 0.98] [51,52,54,55] |
| | Thyroid Gland | 0.81 ± 0.13 | | 0.83 ± 0.08 [52] |
| Vessels | Brachiocephalic Artery | 0.84 ± 0.06 | 0.85 | |
| | Brachiocephalic Vein (l) | 0.82 ± 0.10 | 0.77 | |
| | Brachiocephalic Vein (r) | 0.82 ± 0.07 | 0.76 | |
| | Common Carotid Artery (l) | 0.81 ± 0.08 | 0.72 | |
| | Common Carotid Artery (r) | 0.78 ± 0.10 | 0.50 | |
| | Internal Carotid Artery (l) | 0.61 ± 0.15 | 0.25 | 0.81, 0.86 [49,50]² |
| | Internal Carotid Artery (r) | 0.55 ± 0.22 | 0.49 | 0.81, 0.86 [49,50]² |
| | Internal Jugular Vein (l) | 0.78 ± 0.13 | 0.45 | |
| | Internal Jugular Vein (r) | 0.75 ± 0.18 | 0.53 | |
| | Subclavian Artery (l) | 0.74 ± 0.09 | 0.54 | |
| | Subclavian Artery (r) | 0.74 ± 0.13 | 0.34 | |
| Muscles | Constrictors (s., m., i.) | 0.56 ± 0.12 | 0.74 | 0.52, 0.68 [51,52] |
| | Inferior Constrictor | 0.44 ± 0.16 | 0.54 | [0.65 - 0.80] [55,59] |
| | Middle Constrictor | 0.45 ± 0.18 | 0.66 | [0.60 - 0.84] [55,59] |
| | Superior Constrictor | 0.48 ± 0.19 | 0.42 | [0.67 - 0.83] [55,59] |
| | Digastric (l) | 0.52 ± 0.24 | 0.39 | |
| | Digastric (r) | 0.46 ± 0.28 | 0.33 | |
| | Levator Scapulae (l) | 0.87 ± 0.05 | | 0.76 ± 0.01 [60] |
| | Levator Scapulae (r) | 0.83 ± 0.07 | | 0.76 ± 0.01 [60] |
| | Platysma (l) | 0.59 ± 0.12 | | |
| | Platysma (r) | 0.52 ± 0.16 | | |
| | Prevertebral (l) | 0.74 ± 0.07 | 0.53 | 0.70 ± 0.01 [60] |
| | Prevertebral (r) | 0.76 ± 0.06 | 0.50 | 0.71 ± 0.01 [60] |
| | Scalene (an., me., p.) (l) | 0.74 ± 0.09 | 0.44 | |
| | Scalene (an., me., p.) (r) | 0.71 ± 0.11 | 0.03 | |
| | Anterior Scalene (l) | 0.82 ± 0.06 | 0.60 | |
| | Anterior Scalene (r) | 0.80 ± 0.06 | 0.00 | |
| | Medius Scalene (l) | 0.68 ± 0.10 | 0.14 | |
| | Medius Scalene (r) | 0.66 ± 0.16 | 0.03 | |
| | Posterior Scalene (l) | 0.40 ± 0.20 | 0.01 | |
| | Posterior Scalene (r) | 0.42 ± 0.28 | 0.00 | |
| | Sternothyroid (l) | 0.58 ± 0.08 | | |
| | Sternothyroid (r) | 0.59 ± 0.09 | | |
| | Sternocleidomastoid (l) | 0.84 ± 0.07 | 0.51 | 0.73 ± 0.02 [60] |
| | Sternocleidomastoid (r) | 0.81 ± 0.15 | 0.52 | 0.74 ± 0.02 [60] |
| | Thyrohyoid (l) | 0.50 ± 0.17 | 0.48 | |
| | Thyrohyoid (r) | 0.56 ± 0.12 | 0.56 | |
| | Trapezius (l) | 0.90 ± 0.03 | 0.65* | 0.41 ± 0.04 [60] |
| | Trapezius (r) | 0.89 ± 0.04 | 0.72* | 0.45 ± 0.04 [60] |
| | Tongue | 0.63 ± 0.17 | | |
| | Esophagus | 0.80 ± 0.10 | | [0.55 - 0.83] [52,55]³ |
| | Hard Palate | 0.63 ± 0.13 | | |
| | Hypopharynx | 0.64 ± 0.15 | 0.71 | |
| | Nasal Cavity (l) | 0.86 ± 0.03 | | |
| | Nasal Cavity (r) | 0.86 ± 0.03 | | |
| | Nasopharynx | 0.83 ± 0.09 | 0.74 | |
| | Oral Cavity | 0.85 ± 0.07 | | [0.85 - 0.93] [52,55] |
| | Oropharynx | 0.84 ± 0.09 | 0.83 | |
| | Pharynx (nasop., orop., hyp.) | 0.82 ± 0.07 | 0.83 | 0.69 ± 0.06 [54] |
| | Skin | 0.99 ± 0.00 | | |
| | Soft Palate | 0.61 ± 0.19 | | |
| | Tonsil (l) | 0.08 ± 0.13 | 0.12 | |
| | Tonsil (r) | 0.12 ± 0.15 | 0.15 | |
Table 2.
List of all segmented anatomical structures (right (r), left (l)) and their combinations (e.g. sternum (M., C.)), sorted by tissue type. For each structure, the HD (95) and sDICE (2 mm) (mean ± standard deviation) between the manual contours and our models' predicted contours (pred.) are given, as well as the inter-observer variability in HD (95) and sDICE (2 mm) (calculated on a single patient data set). Asterisks (*) indicate inter-observer variability values outside the 3σ interval given by the mean and standard deviation of the models' comparison to the manual labels.
| Tissue | Structure | HD (95) [mm] pred. vs. manual | HD (95) [mm] interobserver | sDICE (2 mm) pred. vs. manual | sDICE (2 mm) interobserver |
| Air | Auditory Canal (l) | 5.16 ± 2.94 | | 0.88 ± 0.08 | |
| | Auditory Canal (r) | 4.76 ± 3.16 | | 0.89 ± 0.09 | |
| | Larynx (air) | 6.74 ± 4.13 | | 0.89 ± 0.06 | |
| | Lung (l) | 1.42 ± 1.00 | | 0.97 ± 0.03 | |
| | Lung (r) | 1.50 ± 0.86 | | 0.98 ± 0.02 | |
| | Trachea | 6.87 ± 5.49 | | 0.90 ± 0.08 | |
| Bones | Cheek Bone (l) | 4.23 ± 2.89 | | 0.92 ± 0.05 | |
| | Cheek Bone (r) | 4.36 ± 3.37 | | 0.92 ± 0.07 | |
| | Clavicle (l) | 1.33 ± 0.67 | | 0.98 ± 0.02 | |
| | Clavicle (r) | 1.25 ± 0.49 | | 0.98 ± 0.01 | |
| | Hyoid Bone | 3.23 ± 3.77 | 1.96 | 0.95 ± 0.06 | 0.97 |
| | Mandible | 2.31 ± 1.67 | 2.77 | 0.96 ± 0.04 | 0.88 |
| | Sternum (M., C.) | 1.98 ± 1.63 | | 0.97 ± 0.04 | |
| | Sternum Corpus | 5.87 ± 6.69 | | 0.87 ± 0.20 | |
| | Sternum Manubrium | 3.99 ± 4.18 | 3.00 | 0.93 ± 0.08 | 0.93 |
| | Styloid Process (l) | 5.72 ± 9.58 | | 0.92 ± 0.13 | |
| | Styloid Process (r) | 2.01 ± 0.97 | | 0.97 ± 0.03 | |
| | Vertebra C1 | 3.07 ± 1.24 | 3.16 | 0.93 ± 0.04 | 0.90 |
| Ca. | Cricoid Cartilage | 6.15 ± 3.30 | 3.16 | 0.82 ± 0.14 | 0.92 |
| | Thyroid Cartilage | 2.40 ± 2.10 | 0.98 | 0.96 ± 0.04 | 0.98 |
| Gland | Submandibular Gland (l) | 5.04 ± 4.28 | | 0.85 ± 0.15 | |
| | Submandibular Gland (r) | 4.50 ± 2.69 | | 0.80 ± 0.23 | |
| | Thyroid Gland | 6.12 ± 9.45 | | 0.89 ± 0.13 | |
| Vessels | Brachiocephalic Artery | 3.90 ± 2.66 | 3.00 | 0.89 ± 0.09 | 0.96 |
| | Brachiocephalic Vein (l) | 3.53 ± 1.58 | 6.00 | 0.90 ± 0.08 | 0.88 |
| | Brachiocephalic Vein (r) | 4.88 ± 2.09 | 4.08 | 0.86 ± 0.07 | 0.85 |
| | Common Carotid Artery (l) | 5.01 ± 7.04 | 2.94 | 0.94 ± 0.06 | 0.94 |
| | Common Carotid Artery (r) | 3.48 ± 2.69 | 4.38 | 0.92 ± 0.07 | 0.81 |
| | Internal Carotid Artery (l) | 7.53 ± 8.95 | 11.17 | 0.84 ± 0.12 | 0.38* |
| | Internal Carotid Artery (r) | 13.85 ± 15.86 | 4.38 | 0.75 ± 0.20 | 0.80 |
| | Internal Jugular Vein (l) | 9.57 ± 23.20 | 9.00 | 0.91 ± 0.10 | 0.64 |
| | Internal Jugular Vein (r) | 8.25 ± 14.72 | 6.20 | 0.87 ± 0.14 | 0.73 |
| | Subclavian Artery (l) | 16.36 ± 19.40 | 81.22* | 0.84 ± 0.11 | 0.54 |
| | Subclavian Artery (r) | 10.27 ± 12.35 | 75.01* | 0.83 ± 0.12 | 0.42* |
| Muscles | Constrictors (s., m., i.) | 7.19 ± 6.40 | 3.00 | 0.89 ± 0.08 | 0.95 |
| | Inferior Constrictor | 7.10 ± 6.16 | 2.77 | 0.82 ± 0.16 | 0.95 |
| | Middle Constrictor | 9.66 ± 6.41 | 9.00 | 0.72 ± 0.18 | 0.88 |
| | Superior Constrictor | 11.23 ± 8.38 | 9.00 | 0.73 ± 0.22 | 0.75 |
| | Digastric (l) | 6.08 ± 3.90 | 6.30 | 0.73 ± 0.22 | 0.58 |
| | Digastric (r) | 8.52 ± 5.28 | 6.96 | 0.64 ± 0.30 | 0.52 |
| | Levator Scapulae (l) | 3.86 ± 2.05 | | 0.92 ± 0.05 | |
| | Levator Scapulae (r) | 5.26 ± 2.87 | | 0.88 ± 0.07 | |
| | Platysma (l) | 13.02 ± 9.59 | | 0.82 ± 0.12 | |
| | Platysma (r) | 19.40 ± 11.75 | | 0.75 ± 0.17 | |
| | Prevertebral (l) | 7.35 ± 8.25 | 6.86 | 0.90 ± 0.05 | 0.75 |
| | Prevertebral (r) | 7.29 ± 8.51 | 6.28 | 0.91 ± 0.05 | 0.73* |
| | Scalene (an., me., p.) (l) | 5.74 ± 3.20 | 13.09 | 0.86 ± 0.08 | 0.64 |
| | Scalene (an., me., p.) (r) | 7.59 ± 5.19 | 15.80 | 0.82 ± 0.10 | 0.21* |
| | Anterior Scalene (l) | 7.36 ± 9.67 | 15.00 | 0.92 ± 0.07 | 0.85 |
| | Anterior Scalene (r) | 8.19 ± 9.73 | 16.69 | 0.89 ± 0.07 | 0.17* |
| | Medius Scalene (l) | 6.06 ± 2.84 | 9.82 | 0.81 ± 0.10 | 0.42* |
| | Medius Scalene (r) | 7.63 ± 4.11 | 19.16 | 0.78 ± 0.11 | 0.21 |
| | Posterior Scalene (l) | 14.84 ± 8.84 | 17.71 | 0.56 ± 0.23 | 0.14 |
| | Posterior Scalene (r) | 17.16 ± 16.53 | 19.45 | 0.57 ± 0.30 | 0.10 |
| | Sternothyroid (l) | 4.48 ± 2.36 | | 0.89 ± 0.08 | |
| | Sternothyroid (r) | 4.87 ± 2.03 | | 0.89 ± 0.08 | |
| | Sternocleidomastoid (l) | 4.94 ± 5.34 | 22.57 | 0.92 ± 0.08 | 0.50 |
| | Sternocleidomastoid (r) | 12.31 ± 24.65 | 20.98 | 0.88 ± 0.15 | 0.54 |
| | Thyrohyoid (l) | 4.16 ± 2.68 | 3.10 | 0.86 ± 0.12 | 0.91 |
| | Thyrohyoid (r) | 3.08 ± 1.18 | 4.04 | 0.90 ± 0.07 | 0.87 |
| | Trapezius (l) | 2.38 ± 0.76 | 12.96 | 0.96 ± 0.03 | 0.69 |
| | Trapezius (r) | 2.43 ± 0.59 | 9.42 | 0.95 ± 0.04 | 0.71 |
| | Tongue | 13.29 ± 5.51 | | 0.43 ± 0.17 | |
| | Esophagus | 6.15 ± 5.92 | | 0.88 ± 0.10 | |
| | Hard Palate | 7.60 ± 4.08 | | 0.73 ± 0.12 | |
| | Hypopharynx | 6.74 ± 3.85 | 2.94 | 0.83 ± 0.12 | 0.93 |
| | Nasal Cavity (l) | 2.30 ± 0.79 | | 0.96 ± 0.02 | |
| | Nasal Cavity (r) | 2.26 ± 0.74 | | 0.96 ± 0.02 | |
| | Nasopharynx | 4.84 ± 3.35 | 4.94 | 0.84 ± 0.12 | 0.72 |
| | Oral Cavity | 7.56 ± 3.80 | | 0.67 ± 0.12 | |
| | Oropharynx | 6.40 ± 4.89 | 6.00 | 0.88 ± 0.09 | 0.83 |
| | Pharynx (nasop., orop., hyp.) | 5.15 ± 2.78 | 3.30 | 0.89 ± 0.06 | 0.91 |
| | Skin | 1.88 ± 1.08 | | 0.96 ± 0.05 | |
| | Soft Palate | 9.33 ± 7.89 | | 0.75 ± 0.18 | |
| | Tonsil (l) | 10.57 ± 8.90 | 15.00 | 0.20 ± 0.23 | 0.26 |
| | Tonsil (r) | 11.15 ± 8.19 | 15.13 | 0.28 ± 0.27 | 0.31 |
Table 4.
Subset of segmented anatomical structures of this study for which segmentation labels are also available in the TotalSegmentator toolkit [11]. For each structure, the DICE (mean ± standard deviation) between the TS predicted contour (pred.) and the manual contour is given, as well as the decline in mean DICE (diff.) between the TS predicted contour and our models' predicted contour.
| Structure | pred. vs. manual | diff. |
| Lung (l) | 0.98 ± 0.01 | -0.01 |
| Lung (r) | 0.98 ± 0.01 | -0.01 |
| Trachea | 0.80 ± 0.06 | -0.10 |
| Clavicle (l) | 0.89 ± 0.03 | -0.04 |
| Clavicle (r) | 0.88 ± 0.02 | -0.06 |
| Sternum (M., C.) | 0.90 ± 0.02 | -0.02 |
| Vertebra C1 | 0.81 ± 0.04 | -0.05 |
| Thyroid Gland | 0.71 ± 0.14 | -0.10 |
| Brachiocephalic Artery | 0.75 ± 0.07 | -0.09 |
| Brachiocephalic Vein (l) | 0.76 ± 0.10 | -0.05 |
| Brachiocephalic Vein (r) | 0.72 ± 0.08 | -0.10 |
| Common Carotid Artery (l) | 0.64 ± 0.13 | -0.17 |
| Common Carotid Artery (r) | 0.55 ± 0.18 | -0.23 |
| Subclavian Artery (l) | 0.67 ± 0.10 | -0.07 |
| Subclavian Artery (r) | 0.65 ± 0.14 | -0.09 |
| Esophagus | 0.77 ± 0.09 | -0.04 |
Table 5.
Subset of segmented anatomical structures of this study for which segmentation labels are also available in the TotalSegmentator toolkit [11]. For each structure, the HD and the sDICE (mean ± standard deviation each) between the TS predicted contour (pred.) and the manual contour are given, as well as the difference in mean HD and sDICE (diff.) between the TS predicted contour and our models' predicted contour.
| Structure | HD (95) [mm] pred. vs. manual | HD (95) [mm] diff. | sDICE (2 mm) pred. vs. manual | sDICE (2 mm) diff. |
| Lung (l) | 2.18 ± 1.31 | 0.76 | 0.97 ± 0.03 | -0.01 |
| Lung (r) | 1.91 ± 1.31 | 0.41 | 0.97 ± 0.01 | 0.00 |
| Trachea | 16.04 ± 6.73 | 9.17 | 0.80 ± 0.09 | -0.10 |
| Clavicle (l) | 2.54 ± 1.82 | 1.21 | 0.96 ± 0.03 | -0.02 |
| Clavicle (r) | 2.83 ± 1.69 | 1.57 | 0.94 ± 0.03 | -0.04 |
| Sternum (M., C.) | 2.98 ± 1.45 | 1.00 | 0.94 ± 0.03 | -0.03 |
| Vertebra C1 | 3.70 ± 1.52 | 0.63 | 0.90 ± 0.06 | -0.03 |
| Thyroid Gland | 8.89 ± 8.70 | 2.77 | 0.79 ± 0.15 | -0.11 |
| Brachiocephalic Artery | 9.29 ± 5.16 | 5.39 | 0.80 ± 0.08 | -0.09 |
| Brachiocephalic Vein (l) | 5.82 ± 2.07 | 2.28 | 0.86 ± 0.08 | -0.04 |
| Brachiocephalic Vein (r) | 7.68 ± 2.96 | 2.80 | 0.79 ± 0.08 | -0.07 |
| Common Carotid Artery (l) | 25.15 ± 17.16 | 20.14 | 0.80 ± 0.13 | -0.13 |
| Common Carotid Artery (r) | 28.41 ± 20.01 | 24.94 | 0.71 ± 0.17 | -0.22 |
| Subclavian Artery (l) | 23.94 ± 16.66 | 7.58 | 0.79 ± 0.10 | -0.05 |
| Subclavian Artery (r) | 20.88 ± 17.13 | 10.61 | 0.75 ± 0.14 | -0.08 |
| Esophagus | 9.80 ± 9.62 | 3.65 | 0.85 ± 0.10 | -0.03 |