1. Introduction
Primary liver cancer was ranked as the third most frequent cause of cancer death and the sixth most commonly diagnosed cancer in 2020, according to the Global Cancer Observatory [
1]. Hepatocellular carcinoma (HCC) is the most common type of primary liver cancer, accounting for approximately 75-85% of cases [
2], representing a major public health burden worldwide. HCC most often arises in the setting of chronic liver disease, and cirrhosis is the primary risk factor, with one third of cirrhotic patients reported to develop liver cancer during their lifetime [
2]. In Europe the most common cause of chronic liver disease is hepatitis C virus, followed by excessive alcohol intake [
3]. There is a male predominance, with a male-to-female ratio of 2:1 [
2].
While reducing the incidence of chronic viral hepatitis remains an important goal in preventing chronic liver disease and HCC, nonviral risk factors other than alcohol consumption are emerging as public health issues in developed countries. Non-alcoholic fatty liver disease (NAFLD) is reported as the most common cause of hepatic dysfunction worldwide [
4], with a prevalence of 25-25% in the general population [
5] and it is projected to reach around 33.5% in 2030 [
6]. NAFLD and non-alcoholic steatohepatitis (NASH) further contribute to the development of HCC [
2] as these two entities have a similar potential to progress to advanced liver fibrosis [
4].
HCC has so far been the main indication for transplantation in patients with oncologic disease. Together with NASH and NAFLD, it is described as the fastest rising indication for hepatic transplant [
7]. In theory, transplantation is the optimal treatment option because it serves a double role: eliminating the underlying liver disease while also removing the lesion [
8]. However, the selection of transplant candidates who have developed HCC needs to be rigorous, as there is a general organ shortage. The United Network for Organ Sharing (UNOS) has described a drop-out rate of 20% in patients awaiting transplantation [
9]. To reduce these figures, extended donor criteria have been adopted, such as older donors, fatty livers and donors after cardiac arrest, with unavoidably inferior post-transplant outcomes [
9]. These factors further stress the importance of patient selection and organ allocation to reduce mortality and improve post-transplantation survival.
The demand for precision medicine and personalized treatments, together with technological advances, has led to an increasing amount of research on the application of artificial intelligence (AI) to medical images. Neither the term nor the technology is new, as the first artificial neuron was described in 1943 [
10]. Today, AI is a large field of study that incorporates algorithms capable of solving tasks that normally require human intelligence. Machine learning (ML) is a subset of AI that involves extracting patterns from data without explicit programming [
11]. When the algorithm has a more complex structure with multiple components, or “layers”, the term deep learning (DL) is used; DL is thus a subset of ML [
11]. A simplified version of the relationship between these AI divisions is presented in
Figure 1.
Radiomics is a type of ML that has gained attention because it can extract complex imaging data that could reflect underlying biological properties of tissues [
12]. The algorithm obtains quantitative features such as histogram, shape, texture, radial and transform-based characteristics, which are too detailed for normal human vision to analyse [
13]. Researchers then analyse these extracted features using other AI techniques and select the most relevant ones for implementation [
13]. A simplified radiomics model workflow is portrayed in
Figure 2.
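To make the notion of histogram-type features more concrete, the following is a minimal sketch of how a few first-order features could be computed from the intensities inside a region of interest. The function name, the choice of features and the fixed-count binning are illustrative assumptions and are not taken from any of the cited studies; dedicated radiomics software computes many more features under standardised definitions.

```python
import math
from collections import Counter


def first_order_features(roi_values, n_bins=8):
    """Compute a few first-order (histogram) features from ROI intensities.

    `roi_values` is a flat list of voxel intensities inside the region of
    interest. Hypothetical helper for illustration only.
    """
    n = len(roi_values)
    if n == 0:
        raise ValueError("ROI is empty")
    mean = sum(roi_values) / n
    variance = sum((v - mean) ** 2 for v in roi_values) / n
    lo, hi = min(roi_values), max(roi_values)
    # Discretise intensities into a fixed number of bins before computing
    # entropy; fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    bins = Counter(min(int((v - lo) / width), n_bins - 1) for v in roi_values)
    # Shannon entropy of the discretised intensity histogram, in bits.
    entropy = sum((c / n) * math.log2(n / c) for c in bins.values())
    return {"mean": mean, "variance": variance,
            "min": lo, "max": hi, "entropy": entropy}


features = first_order_features([10, 10, 20, 20], n_bins=2)
print(features)
```

In a full radiomics workflow, a feature vector of this kind would be computed for every lesion and then passed to a feature selection step before model building.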
Convolutional neural networks (CNNs) represent one of the most successful types of DL algorithms; they work explicitly with images and have great potential in the radiology field [
14]. Compared to radiomics, which needs a “human-in-the-loop” approach to analyse the features, CNNs provide a more “end-to-end” approach, as they can segment, analyse and provide an output without human intervention [
15]. A simplified DL model workflow is portrayed in
Figure 3.
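The basic operation that gives CNNs their name can be sketched in a few lines. The example below is a simplified illustration, assuming a single-channel image stored as nested lists and a hand-set kernel; in a trained network the kernel weights are learned from data and many such layers are stacked. The function names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    As in most DL frameworks, this is technically a cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Element-wise rectified linear unit: negative responses become 0."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy image with a dark-to-bright vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# Hand-set kernel that responds to a left-to-right intensity increase.
edge_kernel = [[-1, 1]]

feature_map = relu(conv2d_valid(image, edge_kernel))
print(feature_map)  # the response is non-zero only at the edge column
```

The resulting feature map highlights where the pattern encoded by the kernel occurs, which is the kind of intermediate representation that deeper layers combine into an output.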
The advent of Electronic Health Records (EHR) and digital imaging has led to an increase in medical data with an estimated annual growth of 48% from 2013 to 2020 [
16]. The steadily growing volume of radiology data makes it a main field of application for these algorithms, which promise to alleviate the imaging burden and help provide better patient care. In addition, because medical images contain a great deal of embedded information, there is hope that more quantitative data can be extracted at a voxel-wise level for diagnostic, staging and prediction purposes.
Several AI models have been tested on clinical and laboratory data to better stratify organ allocation strategies and graft matching [
17,
18,
19]. It has been stated that the dynamic nature of their testing and validation allows such models to adapt better to different populations [
20].
Generally, there are four primary tasks that can be pursued in medical image analysis and interpretation: classification, localization, detection and segmentation [
21]. Classification involves assigning a label to the image (e.g., hepatocellular carcinoma, cholangiocarcinoma, haemangioma, etc.). Localization and detection involve the application of bounding boxes to the structures or lesions of interest and are often a preliminary step to other functions. Segmentation is a more complex task, as each pixel in a given image must be assigned to a class (e.g., lesion) in order to delineate precise boundaries against surrounding tissues (e.g., liver).
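The difference between a detection output and a segmentation output can be illustrated with a toy binary mask. The sketch below assumes a 2D mask stored as nested lists, where 1 marks lesion pixels, and derives both a bounding box and a physical area from it; the function names and the pixel spacing are hypothetical.

```python
def bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) of the lesion pixels,
    or None if the mask contains no lesion."""
    coords = [(r, c)
              for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


def lesion_area_mm2(mask, pixel_spacing_mm):
    """Lesion area from the pixel count and the (row, col) spacing in mm."""
    n_pixels = sum(1 for row in mask for v in row if v)
    return n_pixels * pixel_spacing_mm[0] * pixel_spacing_mm[1]


# Segmentation output: one class label per pixel (1 = lesion, 0 = liver).
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]

print(bounding_box(mask))                  # detection-style output
print(lesion_area_mm2(mask, (0.5, 0.5)))   # measurement from the mask
```

The bounding box covers four pixels while the mask marks only three, which is why measurements derived from a segmentation follow irregular lesion shapes more closely than those derived from a box.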
The abdominal region was ranked third in the application of DL to radiology in an 8-year timespan between 2012 and 2020 [
22]. A more focused review of AI hepatic imaging applications ranked diagnosis as the most researched function, followed by prognosis and segmentation; HCC was by far the most common research interest [
23]. However, the number of clinically approved AI applications is limited, owing to the lack of external and prospective validation as well as the scarcity of well-curated, available datasets. Furthermore, fewer clinically approved algorithms exist for the liver than for other organs and structures, with only two applications described in an analysis of 100 commercially available radiology products [
24].
Our objectives are to review the current imaging protocols and guidelines for liver transplantation in the setting of HCC and to provide an overview of emerging AI applications that can be applied for better patient management.
2. Liver Transplant in HCC
In the following section we provide a brief review of imaging protocols and guidelines for liver transplantation in the setting of HCC.
Liver transplant is an optimal treatment for patients with HCC and cirrhosis because it targets both the underlying disease and the tumour [
8]. However, the patients eligible for this treatment have to be carefully selected because there is a general organ shortage and these patients will require lifelong immunosuppression.
The most widely used criteria for orthotopic liver transplant (OLT) selection in patients with HCC are the Milan criteria [
25], developed in 1996. They are recommended by the European Association for the Study of the Liver (EASL) [
26], the European Society of Medical Oncology (ESMO) [
27], the National Comprehensive Cancer Network (NCCN) [
28] and the American Association for the Study of Liver Diseases (AASLD) [
29]. According to the Milan criteria, liver transplantation (LT) is recommended in patients with one lesion less than or equal to 5 cm, or up to 3 lesions, each less than or equal to 3 cm. Using these criteria, the five-year survival rates are 65-80% [
26]. Because they delineate a group of patients with cirrhosis and HCC whose transplant results are similar to those of patients with cirrhosis alone, they have been adopted since 2002 by UNOS, the organisation that handles organ transplant in the USA. The presence of extrahepatic disease or vascular tumour invasion is an absolute contraindication to LT [
26,
27,
28]. These criteria apply to patients who are unsuitable for resection most often because of advanced underlying hepatic disease.
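The size and number rule of the Milan criteria described above can be written as a short check. This is a minimal sketch that encodes only what is stated in this section (one lesion of at most 5 cm, or up to 3 lesions of at most 3 cm each, with extrahepatic disease and vascular invasion as absolute contraindications); the function and parameter names are hypothetical, and real eligibility assessment involves further clinical factors.

```python
def within_milan(lesion_diameters_cm,
                 extrahepatic_disease=False,
                 vascular_invasion=False):
    """Return True if the tumour burden is within the Milan criteria.

    `lesion_diameters_cm` lists the maximum diameter of each HCC lesion.
    """
    if not lesion_diameters_cm:
        raise ValueError("at least one lesion diameter is required")
    # Absolute contraindications override any size or number rule.
    if extrahepatic_disease or vascular_invasion:
        return False
    n_lesions = len(lesion_diameters_cm)
    if n_lesions == 1:
        return lesion_diameters_cm[0] <= 5.0
    return n_lesions <= 3 and all(d <= 3.0 for d in lesion_diameters_cm)


print(within_milan([4.5]))            # single lesion of 4.5 cm
print(within_milan([3.5, 2.0]))       # two lesions, one above 3 cm
print(within_milan([2.0, 2.0, 1.5]))  # three lesions, each at most 3 cm
```

Because the rule depends only on lesion count and maximum diameters, it can be evaluated directly from measurements reported by a radiologist or derived from an automated segmentation.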
Living donor liver transplantation is uncommon in Europe, representing around 6-7% of all LT performed yearly, according to Eurotransplant statistics for 2020-2021 [
30]. However, together with marginal grafts it remains an option that can be applied in selected patients and in centres with experience [
26,
27].
3. Extending Milan
Because LT is the therapy with the highest probability of curing HCC [
31], considerable research has sought ways to extend the Milan criteria and to find new markers that better stratify patients and thus improve candidate selection for this treatment option [
32,
33,
34,
35,
36,
37,
38,
39,
40]. All these criteria describe the presence of extrahepatic disease or macrovascular tumour invasion as absolute contraindications to LT. A summary of transplant criteria is provided in
Table 1.
Several studies have investigated the relevance of the alpha-fetoprotein (AFP) tumour marker in the management of these patients [
41,
42], reporting a higher risk of recurrence in patients with high AFP levels. AFP has therefore been included in the Metroticket V2.0, the AFP model and the Hangzhou criteria, and a threshold of 1000 ng/mL is currently applied in the UNOS criteria.
Other emerging criteria incorporate more robust data such as histopathological information, with tumour differentiation included in the Extended Toronto criteria and tumour grading evaluated in the Hangzhou criteria. Volumetric information can also be used, since lesions can have a variable shape; accordingly, a volume threshold has been developed in the TTV criteria.
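A volume-based criterion can be sketched as follows. The example assumes, for illustration only, that each lesion is approximated as a sphere from its maximum diameter; a segmentation-derived volume would avoid this assumption. The threshold is deliberately left as a caller-supplied parameter because its value is not given in this section, and the function names are hypothetical.

```python
import math


def total_tumour_volume_cm3(lesion_diameters_cm):
    """Sum of lesion volumes, approximating each lesion as a sphere.

    Sphere volume: (4/3) * pi * r**3, with r = diameter / 2.
    """
    return sum((4.0 / 3.0) * math.pi * (d / 2.0) ** 3
               for d in lesion_diameters_cm)


def within_volume_threshold(lesion_diameters_cm, max_volume_cm3):
    """True if the total tumour volume does not exceed `max_volume_cm3`.

    `max_volume_cm3` must be taken from the criteria being applied;
    no default is assumed here.
    """
    return total_tumour_volume_cm3(lesion_diameters_cm) <= max_volume_cm3


diameters = [2.0, 2.0]  # two lesions of 2 cm each
print(round(total_tumour_volume_cm3(diameters), 2))
```

Since volume grows with the cube of the diameter, a single large lesion contributes far more to the total than several small ones, which is one reason volumetric and diameter-based criteria can classify the same patient differently.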
While the Milan criteria remain the most widely recommended in the international guidelines, national policies have allowed the adoption of other models as well [
43]. The AFP model has been used in France since 2012. In Italy the Milan, UCSF, TTV, Up-to-7 criteria and the AFP model are all accepted. In Spain both Milan and Up-to-7 criteria are used.
To increase the chance for transplant in patients with HCC the use of loco-regional treatments is supported either to reduce the risk of drop-out in patients within Milan criteria (“Bridging”) or to downstage patients beyond Milan criteria [
26,
27,
28]. The response to loco-regional treatments can be used as a marker for transplant outcome prediction [
44] and it has been included in the TRAIN criteria [
40] because a good response is associated with a lower probability of microvascular invasion or low tumour grading [
44]. The TRAIN score also proved to be the best predictor for microvascular invasion.
Although a consensus on the best option for expanding the LT criteria has not been reached, biological and dynamic markers are likely to replace morphological data [
45]. In this context, AI comes as a potential aid to process complex information, better stratify these patients and provide AI-markers that reflect tumour biology and aggressiveness. Furthermore, complex AI-algorithms can integrate heterogeneous information such as demographic, clinical, laboratory, genetic and imaging data into a single model.
There is a clear increase in the number of articles regarding AI solutions in hepatic transplant, as shown in the graph below (
Figure 4), which analyses data from 2019 to 2022 on PubMed using the keywords “Liver transplant”, “Artificial intelligence”, “Machine learning”, “Neural networks” and the Boolean operators AND, OR.
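A search of this kind could be assembled programmatically. The sketch below is only an assumption about how the keywords might be combined, since the exact query string is not reported: it requires the transplant term, accepts any of the AI terms, and restricts the publication date using what is assumed to be PubMed's `[dp]` field tag with a year range. The function name is hypothetical.

```python
def build_query(required_term, any_of_terms, year_from, year_to):
    """Build a Boolean search string: required AND (a OR b OR ...) AND dates.

    The `[dp]` date tag and the `from:to` range syntax are assumptions
    about the search engine's field-tag conventions.
    """
    any_clause = " OR ".join(f'"{term}"' for term in any_of_terms)
    return (f'"{required_term}" AND ({any_clause}) '
            f"AND {year_from}:{year_to}[dp]")


query = build_query(
    "Liver transplant",
    ["Artificial intelligence", "Machine learning", "Neural networks"],
    2019,
    2022,
)
print(query)
```

Grouping the AI terms in parentheses matters, because without them the AND and OR operators could be evaluated in an unintended order.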
4. Discussion and limitations
The ideal scenario for developing AI-applications is to integrate multiple functions in one model that could help with transplant patients. Such a model would provide segmentations with volumetric data and simultaneously analyse pixel-level details regarding tumour features that would stratify patient risk. The individual assessment of characteristics like grading, microvascular invasion (MVI) or the presence of CK19 and GPC3 could be combined in a complex model for risk assessment, since some of these markers probably have common imaging features. For example, qualitative evaluation of rim arterial enhancement and irregular margins can be associated with both MVI [
90] and the presence of CK19 [
91] while T1 hypointensity is associated with both low grade lesions [
92] and MVI [
90]. Similarly, reduced hepatobiliary phase intensity can be found in both low grade lesions [
92] and CK19+ lesions [
91].
One barrier to the development of these models is the lack of public datasets. For segmentation there are some public datasets available for CT like the LITS [
56] and 3DIRCAD [
93] but only CHAOS [
94] for MRI. Increasing the amount of training data while providing multi-center acquisitions can boost the performance of AI-models [
95]. These multi-center datasets have to be heterogeneous enough to eliminate biases relating to race, gender, ethnicity and age. With regard to the development of biological markers like MVI, GPC3, CK19 or recurrence prediction, all of the studies were done retrospectively and no public databases exist. Open competitions on a common dataset would allow better comparison between models, so that performance could be evaluated consistently.
For the development of reproducible and transparent models, authors have to provide enough details to allow for full comprehension of the scope and methods used. Some AI-adapted publishing guidelines, like CLAIM (Checklist for Artificial Intelligence in Medical Imaging) [
96], have been made available for both authors and reviewers to ensure quality standards are met.
AI-models are frequently described as “black-boxes” because even though they provide predictive answers, at least for DL, no explanation for these outputs exists. Therefore, the question of why some predictions are made remains unanswered. Having a transparent feature selection process that can be evaluated by radiologists might help bridge the gap between clinicians and AI-models. In response to this, the concept of Explainable Artificial Intelligence (XAI) [
97] has emerged to ensure that models not only perform well but also highlight the imaging features with the greatest impact on the decision for further study.
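One simple way to highlight influential image regions is occlusion sensitivity, shown here as an illustration rather than as the method used in the cited work. The sketch masks one patch of the image at a time and records how much the model output drops; the model is replaced by a toy scoring function, and all names are hypothetical.

```python
def occlusion_map(image, predict, patch=2, fill=0.0):
    """Return a heat map of output drops when each patch is occluded.

    `predict` is any callable mapping an image (nested lists) to a score.
    Larger values mark regions the score depends on more strongly.
    """
    base = predict(image)
    h, w = len(image), len(image[0])
    heat = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            occluded = [row[:] for row in image]  # copy, keep original intact
            for r in range(top, min(top + patch, h)):
                for c in range(left, min(left + patch, w)):
                    occluded[r][c] = fill
            drop = base - predict(occluded)
            for r in range(top, min(top + patch, h)):
                for c in range(left, min(left + patch, w)):
                    heat[r][c] = drop
    return heat


def toy_score(image):
    """Stand-in for a trained model: only the top-left 2x2 region matters."""
    return sum(image[r][c] for r in range(2) for c in range(2))


image = [[1.0] * 4 for _ in range(4)]
for row in occlusion_map(image, toy_score, patch=2):
    print(row)
```

In this toy case only the top-left patch receives a non-zero value, matching the region the scoring function actually uses; with a real model, a radiologist could compare such a map against known imaging features.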