Performance reporting design in artificial intelligence studies using image-based TNM staging and prognostic parameters in rectal cancer: a systematic review

Article information

Ann Coloproctol. 2024;40(1):13-26

Publication date (electronic) : 2024 February 28

doi : https://doi.org/10.3393/ac.2023.00892.0127

Minsung Kim¹, Taeyong Park², Bo Young Oh¹, Min Jeong Kim³, Bum-Joo Cho², Il Tae Son¹

¹Department of Surgery, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang, Korea

²Medical Artificial Intelligence Center, Hallym University Medical Center, Anyang, Korea

³Department of Radiology, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang, Korea

Correspondence to: Il Tae Son, MD, Department of Surgery, Hallym University Medical Center, Hallym University College of Medicine, 22 Gwanpyeong-ro 170beon-gil, Dongan-gu, Anyang 14068, Korea. Email: 1tae99@hanmail.net

Received 2023 December 26; Revised 2024 January 15; Accepted 2024 January 16.

Abstract

Purpose

The integration of artificial intelligence (AI) and magnetic resonance imaging in rectal cancer has the potential to enhance diagnostic accuracy by identifying subtle patterns and aiding tumor delineation and lymph node assessment. According to our systematic review focusing on convolutional neural networks, AI-driven tumor staging and the prediction of treatment response facilitate tailored treatment strategies for patients with rectal cancer.

Methods

This paper summarizes the current landscape of AI in the imaging field of rectal cancer, emphasizing the performance reporting design based on the quality of the dataset, model performance, and external validation.

Results

AI-driven tumor segmentation has demonstrated promising results using various convolutional neural network models. AI-based predictions of staging and treatment response have exhibited potential as auxiliary tools for personalized treatment strategies. Some studies have indicated performance superior to that of conventional models in predicting microsatellite instability and KRAS status, offering noninvasive and cost-effective alternatives for identifying genetic mutations.

Conclusion

Image-based AI studies for rectal cancer have shown acceptable diagnostic performance but face several challenges, including limited dataset sizes with standardized data, the need for multicenter studies, and the absence of oncologic relevance and external validation for clinical implementation. Overcoming these pitfalls and hurdles is essential for the feasible integration of AI models in clinical settings for rectal cancer, warranting further research.

Keywords: Rectal neoplasms; Artificial intelligence; Convolutional neural network

INTRODUCTION

Over 1 million people worldwide die due to colorectal cancer (CRC) each year. According to statistics from the US National Cancer Institute, CRC is the third most common cancer in both men and women [1]. Despite advancements in treatment and implementation of a nationwide screening program, the mortality rate has steadily increased [2]. The treatment of patients with rectal cancer currently requires a multidisciplinary strategy encompassing local excision, total mesorectal excision (TME), chemotherapy, and radiotherapy [3]. For patients with early rectal cancer, local excision is an optional treatment within the spectrum of organ-preserving strategies [4, 5]. In patients with advanced rectal cancer, TME is the treatment of choice [6, 7]. Recently, minimally invasive surgical techniques, such as robotic surgery and transanal surgery, have been increasingly performed [8, 9]. In patients with advanced rectal cancer, if there is a good response to treatment after preoperative chemoradiotherapy (CRT), local excision or nonoperative management, also known as the “watch-and-wait” strategy, may be attempted [10, 11]. It is important to select the appropriate candidate for tailored treatment in rectal cancer, which requires accurate diagnosis and assessment of response to preoperative treatment [12, 13].

Magnetic resonance imaging (MRI) is considered to be the most valuable imaging modality for primary staging and restaging after CRT, guiding subsequent medical decisions in rectal cancer management [14–18]. With advancements in artificial intelligence (AI), researchers and clinicians have been exploring innovative ways to utilize MRI data through AI algorithms to improve diagnostic accuracy and prognostic prediction. By harnessing the power of AI, it is possible to analyze MR images with high speed and accuracy [19]. AI algorithms can identify subtle patterns within images, aiding in the precise delineation of tumor borders, assessment of lymph node involvement, and evaluation of potential metastases. Accurate staging through AI-driven imaging diagnosis can enable tailored treatment strategies, optimizing the selection of patients for neoadjuvant therapies and guiding clinicians in making decisions regarding the extent of surgical resection [20, 21]. Radiomics, an AI method, aims to establish models that enhance diagnostic accuracy by extracting and analyzing first- and high-order features from medical images [22]. Radiomics has advantages for small datasets; however, it may not perform as effectively as neural networks, particularly for large datasets [22, 23]. One of the most commonly used neural network architectures in medical imaging is the convolutional neural network (CNN). CNNs consist of convolutional, pooling, and fully connected layers. CNNs progressively identify abstract and intricate features, making them pivotal in medical imaging research for tasks such as the differential diagnosis of tumors, tumor segmentation, lesion detection, and accelerated imaging [23, 24].
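To make the layer types named above concrete, the following is a minimal sketch of one forward pass through a convolutional layer, a pooling layer, and a fully connected layer. It is written in plain Python for readability and does not reproduce any model from the reviewed studies; the function names, the 5×5 toy image, the 2×2 kernel, and the weights are all illustrative assumptions.

```python
# Toy forward pass: convolution -> ReLU -> max pooling -> fully connected.
# All sizes and weights are illustrative assumptions, not taken from any cited model.

def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def dense(features, weights, bias):
    """Single-output fully connected layer (weighted sum plus bias)."""
    return sum(f * w for f, w in zip(features, weights)) + bias

def forward(image, kernel, weights, bias):
    pooled = max_pool(relu(conv2d(image, kernel)))
    flat = [v for row in pooled for v in row]  # flatten the pooled feature map
    return dense(flat, weights, bias)

# Hypothetical 5x5 "image" with a vertical edge and a kernel that responds to it.
toy_image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]
score = forward(toy_image, edge_kernel, weights=[0.25] * 4, bias=0.0)
```

In a trained CNN the kernel and the dense weights are learned from data, and many such layers are stacked; the sketch fixes them by hand only to show how an image is reduced step by step to a single score.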

With the rapid advancement of AI in medical imaging of rectal cancer, this study aimed to provide a summary of several important topics in current image-based studies using AI for patients with rectal cancer by investigating the performance reporting design based on the quality of the dataset, annotation with ground-truth labelling, model performance, and external validation. In this study, we focused on neural networks that take as input the same images that clinicians view during examination. We excluded studies using radiomics to transform actual images.

METHODS

This study analyzed the image types used, architectures employed, dataset composition, annotation, and diagnostic performance of each study. We reviewed tumor segmentation, TNM staging, genotyping, risk factors such as circumferential resection margin (CRM), and treatment response corresponding to the current guidelines for rectal cancer treatment [15, 25]. A systematic search of the Cochrane Library, PubMed (MEDLINE), Embase, and IEEE Xplore databases was performed for studies published between 2017 and 2023. This study focused on neural network models for medical images and excluded studies on surgical procedures using only radiomics.

RESULTS

AI in rectal cancer imaging

Segmentation

As AI technologies continue to advance, researchers have begun to explore the feasibility of applying AI techniques to tumor segmentation. Because of individual variations in the perception of the disease, manual labelling may be subjective and time-consuming. AI-based tumor segmentation can be more objective than manual labelling and can reduce labor burden. Although MRI provides clear delineation of rectal structures and tumor appearance, accurate segmentation is challenging because of the complex background of rectal images [23].

Table 1 summarizes AI studies on segmentation of rectal cancer [26–35]. A CNN was initially applied to the segmentation of rectal cancer. Trebeschi et al. [26] used a CNN model based on T2-weighted image (T2WI) and diffusion-weighted image (DWI) for 132 patients with rectal cancer. The performance of the AI model achieved a dice similarity coefficient (DSC) of 0.70 and an area under the receiver operating characteristic curve (AUROC) of 0.99. As an excellent segmentation framework for medical images, U-Net [36] has been applied for the segmentation of rectal cancer. Wang et al. [27] used a 2-dimensional (2D) U-Net based on T2WI obtained from 113 patients with rectal cancer. The DSC, Hausdorff distance, average surface distance, and Jaccard index values were 0.74, 20.44, 3.25, and 0.60, respectively. Kim et al. [28] obtained MRI of 133 patients with rectal cancer. The ground truth of the tumor for all 2D MRI was defined by 2 gastrointestinal radiologists, and 2D slices were manually selected by gastrointestinal radiologists. U-Net [36], FCN-8 (fully convolutional networks, 8 pixels) [37], and SegNet [38] were used for tumor segmentation, and their performances were compared. U-Net was superior to the other models and achieved a DSC of 0.81, sensitivity of 0.79, and specificity of 0.98. Pang et al. [29] used U-Net based on T2WI obtained from 134 patients with rectal cancer and externally validated 34 patients from different hospitals. The DSC, sensitivity, and specificity values were 0.95, 0.97, and 0.96, respectively. Knuth et al. [30] collected 2 cohorts of patients with rectal cancer from different hospitals and used 2D U-Net to perform tumor segmentation on T2WI. The DSC was 0.78. DeSilvio et al. [31] used region-specific U-Net and multiclass U-Net based on T2WI obtained from 92 patients with rectal cancer and performed external validation with 11 patients from different hospitals. 
The performance of the region-specific U-Net was superior to that of the multiclass U-Net. The region-specific U-Net achieved a DSC of 0.91 and a Hausdorff distance of 2.45. Researchers have also used models other than U-Net. Zhang et al. [32] used a 3D V-Net [39] based on T2WI and DWI obtained from 202 patients with rectal cancer. The DSC of T2WI was 0.89±0.21 and that of DWI was 0.96±0.06. Fig. 1 shows examples of tumor segmentation by the AI model [32].
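The overlap and distance metrics reported in these segmentation studies can be illustrated with a short sketch. It assumes that each mask is a 2D binary grid and computes the DSC, the Jaccard index, and a Hausdorff distance over all mask pixels in pixel units. The reviewed studies may instead use boundary points, physical units, or 3D volumes, so the function names and toy masks below are illustrative assumptions rather than any study's evaluation code.

```python
import math

def mask_to_set(mask):
    """Convert a binary 2D mask into a set of (row, col) foreground pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}

def dice(pred, truth):
    """Dice similarity coefficient: 2|A∩B| / (|A| + |B|)."""
    if not pred and not truth:
        return 1.0  # two empty masks are treated as a perfect match
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def jaccard(pred, truth):
    """Jaccard index: |A∩B| / |A∪B|."""
    if not pred and not truth:
        return 1.0
    return len(pred & truth) / len(pred | truth)

def hausdorff(pred, truth):
    """Symmetric Hausdorff distance in pixel units; both sets must be non-empty."""
    def directed(a, b):
        return max(min(math.dist(p, q) for q in b) for p in a)
    return max(directed(pred, truth), directed(truth, pred))

# Hypothetical ground truth and prediction: two 2x2 squares offset by one column.
truth_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred_mask = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
truth, pred = mask_to_set(truth_mask), mask_to_set(pred_mask)
d = dice(pred, truth)        # 0.5
j = jaccard(pred, truth)     # 1/3
h = hausdorff(pred, truth)   # 1.0
```

For a single case the two overlap metrics are linked by J = D / (2 − D). Applied to the values reported by Wang et al. [27], a DSC of 0.74 corresponds to a Jaccard index of about 0.59, close to the reported 0.60; averages taken over many cases need not satisfy the identity exactly.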

Table 1.

Artificial intelligence studies on segmentation of rectal cancer

Fig. 1.

Examples of rectal cancer segmentation. Illustration of automated segmentation using 3-dimensional V-Net versus ground truth on rectal magnetic resonance images of a 51-year-old man. Purple indicates tumor, yellow indicates normal rectal wall, and blue indicates lumen. Reprinted from Zhang et al. [32], available under the Creative Commons Attribution License.

Studies have also been conducted on lymph node segmentation using AI. Zhao et al. [33] developed an AI model for lymph node detection and segmentation based on multiparametric MRI scans obtained from 293 patients with rectal cancer. For lymph node detection, the AI model achieved a sensitivity, positive predictive value (PPV), and false-positive rate per case of 0.80, 0.74, and 8.6 in internal testing, and 0.63, 0.65, and 8.2 in external testing. The detection performance was superior to that of junior radiologists with less than 10 years of experience. For lymph node segmentation, the DSC was in the range of 0.81 to 0.82. Fig. 2 shows examples of lymph node segmentation by the AI model [33].

Fig. 2.

Examples of lymph node segmentation. Ground truth results are shown in yellow, and segmentation results by the artificial intelligence model are shown in red. The number beside the lymph node is the corresponding Dice similarity coefficient. Reprinted from Zhao et al. [33], available under the Creative Commons License.

Because manual annotation is time-consuming and labor-intensive and AI segmentation models have achieved good performance, interest in these models has increased. Through continuous iteration and optimization of the automatic segmentation model, radiologists can be effectively assisted for faster and more accurate annotation [23]. Jian et al. [34] used U-Net and VGG-16 [40] based on T2WI obtained from 512 patients with colorectal cancer. VGG-16 was used as the base model to extract features from tumor images, and 5 side-output blocks were used to obtain accurate tumor segmentation results. After cropping the region of interest, the VGG-16 model performed automatic segmentation without intervention. The segmentation performance of VGG-16 was superior to that of U-Net. The VGG-16 achieved a DSC, PPV, specificity, sensitivity, Hammoude distance, and Hausdorff distance of 0.84, 0.83, 0.97, 0.88, 0.27, and 8.2, respectively. Zhu et al. [35] used a 3D U-Net based on DWI obtained from 300 patients with rectal cancer. The region of the rectal tumor was delineated using the DWI by experienced radiologists as the ground truth. Automatic segmentation was compared with semiautomatic segmentation. Semiautomatic segmentation first requires manually assigning a threshold value for voxel selection and then automatically segmenting the largest connected region as the tumor region. The automatic segmentation model exhibited a DSC of 0.68±0.14, and the semiautomatic segmentation model showed a DSC of 0.61±0.23. The automatic segmentation model was superior to the semiautomatic segmentation model (P=0.035). As automated segmentation models continue to advance, they will offer substantial assistance in clinical practice.
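The semiautomatic procedure described above (a manually chosen threshold followed by selection of the largest connected region) can be sketched as follows. This is a simplified 2D illustration, not the method of Zhu et al. [35]: the function name, the use of 4-connectivity, the `>=` comparison, and the toy intensity grid are assumptions, and the original work operates on 3D DWI voxels.

```python
from collections import deque

def largest_connected_region(image, threshold):
    """Return the set of (row, col) pixels forming the largest 4-connected
    region whose intensity is >= threshold (empty set if none qualify)."""
    rows, cols = len(image), len(image[0])
    selected = {(r, c) for r in range(rows) for c in range(cols)
                if image[r][c] >= threshold}
    seen, best = set(), set()
    for start in sorted(selected):
        if start in seen:
            continue
        # Breadth-first search collects one connected region.
        region, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            r, c = queue.popleft()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in selected and nb not in seen:
                    seen.add(nb)
                    region.add(nb)
                    queue.append(nb)
        if len(region) > len(best):
            best = region
    return best

# Hypothetical intensities: a 2x2 bright lesion plus one isolated bright pixel.
toy_dwi = [
    [0, 0, 0, 0, 9],
    [0, 8, 7, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 0, 0, 0, 0],
]
tumor = largest_connected_region(toy_dwi, threshold=5)
```

The isolated bright pixel at the top right passes the threshold but is discarded because it does not belong to the largest region. The result depends on the manually chosen threshold, which may partly explain the larger spread of DSC values reported for the semiautomatic approach.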

Staging

According to current guidelines for the treatment of rectal cancer, accurate staging is essential. MRI is considered the optimal modality for local tumor and lymph node staging of rectal cancer [16, 23, 28]. The predictive accuracy of pathological T categorization of rectal cancer using MRI has been reported to be approximately 71% to 91% [28, 41]. MRI evaluates lymph node status by measuring the short-axis diameter and can achieve 58% to 70% sensitivity and 75% to 85% specificity for identifying metastasis [42]. However, a surplus of imaging data coupled with a shortage of radiologists has led to an increased workload for radiologists. This has led to highly stressful environments and disparities in diagnostic accuracy among radiologists. MRI interpretation depends on the individual radiologist, occasionally leading to misdiagnoses [43].

Table 2 summarizes AI studies on the staging of rectal cancer [28, 43–49]. AI has been rapidly applied in MRI-based T staging, and most studies have focused on binary classification to distinguish between T1–T2 and T3–T4 to determine whether preoperative CRT should be performed. Kim et al. [28] used U-Net [36], AlexNet [50], and InceptionV3 [51] to discriminate between T2 and T3 based on the MR images obtained from 133 patients with rectal cancer. These models performed better when the input images were used as a tumor segmentation map. The inception model exhibited the best performance, with an accuracy of 0.94, sensitivity of 0.88, and specificity of 1.00. Wu et al. [43] used a Faster R-CNN model [52] for T staging based on T2WI obtained from 183 patients with rectal cancer. After 50 epochs of learning, the diagnostic performances were AUROC of 0.95 to 1.00 for the T1, T2, T3, and T4 stages in horizontal, sagittal, and coronal planes.

Table 2.

AI studies on rectal cancer staging

MRI offers limited semantic diagnostic clues for lymph node metastasis, such as the size, shape, and margins of lymph nodes, which are insufficient for the precise diagnosis of N staging in rectal cancer. The prediction of lymph node status has recently become a significant area of interest [23]. Several studies have been conducted using the Faster R-CNN model [52] for N staging [44, 45]. Lu et al. [44] reported that AI-based staging of lymph node metastasis is accurate and fast. After a 4-step iteration training of the Faster R-CNN using T2WI and DWI obtained from 351 patients with rectal cancer, 414 patients with rectal cancer from other institutions were tested to verify the diagnostic performance. The AUROC was 0.91, and the diagnostic time was 20 seconds, which was far shorter than that of the radiologists (10 minutes). Ding et al. [45] used a Faster R-CNN [52] to detect lymph node metastases and predict prognosis. The T2WI and DWI of 414 patients with rectal cancer were analyzed using the AI model, and the results were compared with those of radiologists and pathologists. The correlation between the AI model and radiologists was 0.91, that between the AI model and pathologists was 0.45, and that between the radiologists and pathologists was only 0.13. The κ coefficient for N staging between the AI model and pathologists was 0.57, and that between the radiologists and pathologists was 0.47. The AI model was superior to the radiologists in terms of N staging, but its accuracy was inferior to that of the pathologists. Zhou et al. [46] used another CNN model to diagnose lymph node metastases based on pelvic MRI in 301 patients with rectal cancer. The accuracy of the AI model was not different from that of the radiologists; however, the AI model was much faster than the radiologists (10 vs. 600 seconds). Li et al.
[47] used the InceptionV3 model [51] for the recognition and detection of the lymph node status via transfer learning based on T2WI obtained from 129 patients with rectal cancer. The sensitivity, specificity, PPV, and negative predictive value (NPV) were 0.95, 0.95, 0.95, and 0.95, respectively. The AUROC was 0.99. The diagnostic performance of the AI model was compared with that of 2 radiologists, and it was found to be superior to the radiologists in all aspects. The more advanced the image diagnosis system based on AI models, the more efficient, accurate, and stable the staging system, and the errors caused by differences in radiologists’ diagnostic abilities will be reduced to a certain extent.
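The κ coefficients quoted for N staging measure agreement beyond what chance alone would produce. The sketch below computes the unweighted Cohen κ for two raters; whether Ding et al. [45] used a weighted or unweighted variant is not stated here, so the unweighted form is an assumption, and the ten N-stage labels are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen kappa: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical N-stage labels for 10 patients (not data from any cited study).
model_stage = ["N0", "N0", "N0", "N0", "N1", "N1", "N1", "N2", "N2", "N2"]
pathology_stage = ["N0", "N0", "N0", "N1", "N1", "N1", "N2", "N2", "N2", "N0"]
kappa = cohen_kappa(model_stage, pathology_stage)  # about 0.55
```

In this toy example the raw agreement is 0.70, but the expected chance agreement is 0.34, so κ is noticeably lower than the raw agreement. This is why κ is a stricter summary than simple accuracy when stage categories are unevenly distributed.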

Several characteristics of rectal cancer, such as involvement of the CRM, may influence recurrence and metastasis [18, 23]. An accurate preoperative diagnosis of these factors may be helpful in establishing a tailored treatment plan. MRI is regarded as the best examination method for evaluating the involvement of CRM, with a specificity of 0.94; however, it requires considerable experience and time because of the large amount of imaging data [48, 53]. An AI model for the diagnosis of positive CRM may provide a reliable solution. Wang et al. [48] used a Faster R-CNN model [52] based on T2WI obtained from 240 patients with rectal cancer. The proportion of positive and negative CRM was 1:2. The accuracy, sensitivity, and specificity of the model were 0.93, 0.84, and 0.96, respectively. The AUROC was 0.93. Xu et al. [49] also used a Faster R-CNN model [52] based on T2WI obtained from 350 patients with rectal cancer. The accuracy, sensitivity, specificity, PPV, and NPV of this model were 0.88, 0.86, 0.90, 0.81, and 0.93, respectively. The AUROC was 0.93. The automatic recognition time of the model for a single image was 0.2 seconds. Thus, AI has the potential to predict the risk factors for rectal cancer and may be a good tool for personalized treatment strategies.
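The accuracy, sensitivity, specificity, PPV, and NPV reported for CRM prediction all derive from the same four confusion-matrix counts. The sketch below shows the relationship. The counts are hypothetical: they were chosen to mirror a 1:2 ratio of positive to negative cases together with a sensitivity of 0.84 and a specificity of 0.96, and they are not the actual test-set counts of Wang et al. [48], which are not given here.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic metrics from confusion-matrix counts.
    Assumes every denominator is non-zero."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts: 100 CRM-positive and 200 CRM-negative cases (1:2 ratio).
metrics = diagnostic_metrics(tp=84, fp=8, tn=192, fn=16)
# accuracy = 276/300 = 0.92, ppv = 84/92 (about 0.91), npv = 192/208 (about 0.92)
```

Sensitivity and specificity do not depend on the proportion of positive cases, whereas accuracy, PPV, and NPV do. Values reported from a dataset with a fixed class ratio may therefore not transfer directly to a population with a different prevalence of positive CRM.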

Genotyping

The US National Comprehensive Cancer Network (NCCN) and European Society for Medical Oncology (ESMO) guidelines recommend that all patients with rectal cancer should be tested for microsatellite instability (MSI) and KRAS mutations to establish individualized treatment strategies, thus maximizing the benefits for patients with rectal cancer [25, 54, 55]. Invasive colonoscopic biopsies or surgical specimens are essential for genetic testing using immunohistochemistry- or polymerase chain reaction (PCR)-based assays in clinical practice [56, 57]. However, this approach has several limitations. Genetic testing is not commonly performed due to its tedious procedures and heavy financial burden [58]. Moreover, the sampling procedure has potential complications [59]. Sampling errors may occur because of insufficient sampling or tumor heterogeneity [60]. Therefore, noninvasive, feasible, low-cost, and timely methods to identify genetic mutations in rectal cancer have aroused widespread interest [23].

Table 3 summarizes AI studies on genetic mutations in rectal cancer [61–63]. MSI, a consequence of the loss of one or more mismatch repair genes, has gained considerable attention because of its significance in rectal cancer prognosis and treatment. Patients with MSI tumors have shown no benefit from 5-fluorouracil-based adjuvant chemotherapy [64, 65]. More importantly, recent studies have demonstrated that MSI is a predictive biomarker for immunotherapy [66, 67]. Zhang et al. [61] used the 3D MobilenetV2 model [68] based on T2WI obtained from 491 patients with rectal cancer to predict the MSI status. The performance of the AI model was compared with that of a clinical model, which was a multivariate binary logistic regression classifier based on clinical characteristics. The sensitivity and specificity of the AI model were 0.89 and 0.74, respectively. The AUROC of the AI model was 0.82. The sensitivity and specificity of the clinical model were 1.0 and 0.31, respectively. The AUROC of the clinical model was 0.61. The imaging model was superior to the clinical model in predicting the MSI status. Cao et al. [62] used the Resnet101 model [69] based on enhanced abdominopelvic computed tomography obtained from 1,606 patients with CRC to develop an AI model and from 206 patients with CRC for external validation. The AI model achieved an accuracy of 0.99, sensitivity of 1.0, and specificity of 0.97 in the internal validation, and an accuracy of 0.91, sensitivity of 0.90, and specificity of 0.93 in the external validation. The AUROC of the AI model was 0.99 in internal validation and 0.92 in external validation.

Table 3.

AI studies of genetic mutations in rectal cancer

KRAS is a small G-protein that plays a role in the epidermal growth factor receptor (EGFR) pathway. Patients with rectal cancer and the KRAS mutant type show a lower response to anti-EGFR monoclonal antibodies and worse prognosis [55]. He et al. [63] used the ResNet model [69] based on enhanced abdominopelvic computed tomography images obtained from 157 patients with CRC. The diagnostic performance of the ResNet model was compared to that of a radiomics model using a random forest classifier. The ResNet model achieved a sensitivity of 0.59, specificity of 1.0, and AUROC of 0.93. The radiomics model achieved a sensitivity of 0.7, a specificity of 0.85, and an AUROC of 0.82. The ResNet model showed a superior predictive ability.
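Because the genotyping studies compare models mainly by AUROC, a brief sketch of how this value can be computed may help. It uses the rank-based (Mann-Whitney) interpretation: the AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name, scores, and labels are hypothetical and are not taken from the cited studies.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half. Labels must be 1 (positive) or 0 (negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of a mutation and the true status.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
area = auroc(toy_scores, toy_labels)  # 8 of 9 pairs ranked correctly
```

Sensitivity and specificity describe a single decision threshold, whereas the AUROC summarizes ranking quality across all thresholds. A model can therefore combine a high AUROC with low sensitivity at its chosen operating point, as in the ResNet results above (sensitivity 0.59, specificity 1.0, AUROC 0.93).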

Response to therapy

Neoadjuvant CRT is a critical treatment strategy for locally advanced rectal cancer. Following CRT, 15% to 27% of patients undergo surgery despite achieving a pathological complete response (pCR) [70]. For such patients, it remains a challenge to decide whether to perform TME, which is associated with significant complications and morbidity. Various studies have shown that patients who achieve a pCR have significantly reduced local recurrence rates. Consequently, less invasive treatment options such as sphincter-saving local excision and “watch-and-wait” approaches are becoming favored in clinical practice [71, 72]. MRI plays an essential role in identifying tumor regression grade (TRG) and predicting pCR [15]. Recently, the development of radiomics and deep learning based on MRI has demonstrated impressive results in the prediction of pCR or good response (GR), defined by downstaging to ypT0–1N0 or TRG0–1 [23, 73–76]. However, these studies involved handcrafted segmentation, manual labelling, and feature definition without any deformability [77, 78]. This review focuses on image-based AI models for predicting pCR or GR. Table 4 summarizes the AI studies for predicting treatment response after neoadjuvant chemoradiotherapy in rectal cancer [78–81]. Shi et al. [79] used a CNN based on T2WI obtained from 51 patients with rectal cancer, both before and after 3 or 4 weeks of CRT. The performance of the CNN model was compared to that of the radiomics model. For predicting pCR, the AUROC of the CNN model was 0.83 and that of the radiomics model was 0.81. For predicting GR, the AUROC of the CNN model was 0.74 and that of the radiomics model was 0.92. There were no differences in predicting pCR; however, the CNN model was inferior to the radiomics model in predicting GR (P=0.04). Zhang et al. [80] used a CNN model based on T2WI obtained from 290 patients with rectal cancer, both before and after CRT. 
The performance of the CNN model was evaluated using T2WI obtained from 93 patients at external institutions. The accuracy, sensitivity, specificity, and AUROC of predicting pCR were 0.98, 1.0, 0.97, and 0.99, respectively. Zhu et al. [81] used a CNN model based on pre-CRT T2WI obtained from 700 patients with rectal cancer. The sensitivity, specificity, and AUROC for predicting the GR were 0.93, 0.62, and 0.81, respectively. Jang et al. [78] used ShuffleNet [82] based on the post-CRT T2WI obtained from 466 patients with rectal cancer. The prediction performance of the AI model was compared with that of senior radiologists and radiation oncologists. The accuracy, sensitivity, and specificity of the AI model for predicting a pCR were 0.85, 0.3, and 0.96, respectively. The accuracy, sensitivity, and specificity of GR prediction of the AI model were 0.72, 0.54, and 0.81, respectively. The AI model had superior predictive performance compared to radiologists and radiation oncologists. Thus, an image-based AI model for predicting treatment response has the potential to help establish tailored treatment strategies for patients with locally advanced rectal cancer.

Table 4.

AI studies for predicting treatment response after neoadjuvant CRT in rectal cancer

DISCUSSION

In this systematic review, most image-based AI studies for rectal cancer research focused on tumor segmentation, staging, and treatment response. According to our review, these parameters, which could have heterogeneity according to the diagnostic tool or a radiologist’s interpretations, still require human effort over time, such as manual annotation rather than automatization. Furthermore, we hypothesized that the diagnostic performance of AI models could surpass that of humans within a brief timeframe, as indicated in some studies. In the era of a plethora of AI algorithms, there has been a recent emphasis on the importance of the performance reporting design of AI research. At the same time, the significance of AI technologies tends to be overhyped in their clinical application and approval, necessitating consideration not only of ethical issues but also of stringent conditions such as the US Food and Drug Administration (FDA) 510(k) clearance [83]. In this study, we recognize that the current status of image-based AI studies for rectal cancer might be limited owing to various issues, including the sample size of the dataset, robust validation such as external validation or randomized controlled studies in comparison with conventional algorithms or human assessment, automated processing, and prognostic relevance based on the analysis of survival outcomes.

According to our systematic review, the sample size of the test dataset used to evaluate performance was <100 in most studies. The key to improving the performance of an AI model is obtaining more data; if possible, it is better to have a standard reference, preferably based on normal cases, for a robust dataset. However, large datasets collected from multiple institutions have some pitfalls in terms of data heterogeneity and noise, which might induce bias or loss of function and necessitate a normalization process. The process of distinguishing signals from noise has become more challenging over time. A major push towards unsupervised learning techniques might enable the full utilization of vast archives and address difficulties in curating and labelling data. Moreover, various investigators have experienced hurdles in transferring data to external institutions because specific approval is required. This could hamper multicenter study design and even external validation, with a risk of low-quality and low-quantity data. In this systematic review, we found only a few multicenter studies, although multicenter design and external validation are helpful and mandatory for the clinical use of AI devices. Recently, despite the shift towards non-handcrafted engineering without human interference, only a few image-based AI studies for rectal cancer achieved their performance through automatization. Although the benefit of automatization over handcrafted segmentation remains unclear, a fully automatized process might be challenging in image-based studies of rectal cancer using neural networks.

The current TNM staging system with parameters, including T and N risk factors, CRM, and tumor response, has been supported by confirmatory evidence from many previous studies. We noted the absence of studies demonstrating the relationship between AI performance and prognosis through oncologic analyses involving survival outcomes, including overall survival, disease-free survival, or local recurrence. Furthermore, the opacity of the algorithms in the studies included in our systematic review poses a challenge for their clinical implementation, given the use of black-box algorithms. The prognostic significance of AI-based imaging parameters cannot be applied to patients when clinicians do not understand how the parameters were derived. Therefore, the reporting system of AI research should extend beyond reporting diagnostic values such as accuracy, specificity, sensitivity, and AUROC. Instead, it should strive to elucidate the underlying reasons for predictions, aiming to enhance understanding and knowledge of the algorithm. Although such questioning is possible with explicitly programmed mathematical models such as conventional algorithms, neural networks based on deep learning have opaque inner workings [20]. Furthermore, in terms of image texture, feature map images displayed by AI still require more specific, clear annotation and image texture when compared to annotations for image parameters characterized by radiologists.

In conclusion, we have found that the current status of image-based AI models for rectal cancer faces multiple challenges. These include limited dataset sizes, absence of standardized references, and the difficulty of designing multicenter studies with external validation due to insufficient datasets, despite achieving acceptable diagnostic performance. These findings suggest that the application of these models may not currently be feasible in clinical practice. Furthermore, an oncologic association between AI-driven classes and the prognosis of patients with rectal cancer needs to be established. Overcoming these pitfalls and hurdles is essential for the feasible integration of AI models into clinical settings for rectal cancer, and further research is warranted based on advanced techniques, such as unsupervised learning, and robust validation using high-quality, labelled, large datasets with standard references.

Notes

Conflict of interest

Bo Young Oh and Il Tae Son are Editorial Board members of Annals of Coloproctology, but were not involved in the peer reviewer selection, evaluation, or decision process of this article. No other potential conflict of interest relevant to this article was reported.

Funding

None.

Author contributions

Conceptualization: ITS, BJC; Formal analysis: MK, PT; Investigation: MK, PT, MJK, BYO; Methodology: PT; Project administration: ITS, BJC; Supervision: MJK, BYO; Validation: MJK, BYO; Writing–original draft: MK; Writing–review & editing: all authors. All authors read and approved the final manuscript.

References

1. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2020. CA Cancer J Clin 2020;70:7–30.

2. Kim MH, Park S, Yi N, Kang B, Park IJ. Colorectal cancer mortality trends in the era of cancer survivorship in Korea: 2000-2020. Ann Coloproctol 2022;38:343–52.

3. Dulskas A, Caushaj PF, Grigoravicius D, Zheng L, Fortunato R, Nunoo-Mensah JW, et al. International Society of University Colon and Rectal Surgeons survey of surgeons’ preference on rectal cancer treatment. Ann Coloproctol 2023;39:307–14.

4. Hyun JH, Alhanafy MK, Park HC, Park SM, Park SC, Sohn DK, et al. Initial local excision for clinical T1 rectal cancer showed comparable overall survival despite high local recurrence rate: a propensity-matched analysis. Ann Coloproctol 2022;38:166–75.

5. Kim CH. The risk-benefit trade-off in local excision of early rectal cancer. Ann Coloproctol 2022;38:95–6.

6. Heald RJ, Husband EM, Ryall RD. The mesorectum in rectal cancer surgery: the clue to pelvic recurrence? Br J Surg 1982;69:613–6.

7. Mahendran B, Balasubramanya S, Sebastiani S, Smolarek S. Extended lymphadenectomy in locally advanced rectal cancers: a systematic review. Ann Coloproctol 2022;38:3–12.

8. Nasir IU, Shah MF, Panteleimonitis S, Figueiredo N, Parvaiz A. Spotlight on laparoscopy in the surgical resection of locally advanced rectal cancer: multicenter propensity score match study. Ann Coloproctol 2022;38:307–13.

9. Oh BY. Advances in surgery for locally advanced rectal cancer. Ann Coloproctol 2022;38:279–80.

10. Park MY, Yu CS, Kim TW, Kim JH, Park JH, Lee JL, et al. Efficacy of preoperative chemoradiotherapy in patients with cT2N0 distal rectal cancer. Ann Coloproctol 2023;39:250–9.

11. Son GM. Organ preservation for early rectal cancer using preoperative chemoradiotherapy. Ann Coloproctol 2023;39:191–2.

12. Laohawiriyakamol S, Chaochankit W, Wanichsuwan W, Kanjanapradit K, Laohawiriyakamol T. An investigation into tumor regression grade as a parameter for locally advanced rectal cancer and 5-year overall survival rate. Ann Coloproctol 2023;39:59–70.

13. Park IJ. Precision medicine for primary rectal cancer will become a reality. Ann Coloproctol 2022;38:281–2.

14. Glynne-Jones R, Wyrwicz L, Tiret E, Brown G, Rödel C, Cervantes A, et al. Rectal cancer: ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Ann Oncol 2017;28(suppl_4):iv22–40.

15. Benson AB, Venook AP, Al-Hawary MM, Azad N, Chen YJ, Ciombor KK, et al. Rectal cancer, version 2.2022, NCCN clinical practice guidelines in oncology. J Natl Compr Canc Netw 2022;20:1139–67.

16. Horvat N, Carlos Tavares Rocha C, Clemente Oliveira B, Petkovska I, Gollub MJ. MRI of rectal cancer: tumor staging, imaging techniques, and management. Radiographics 2019;39:367–87.

17. Dieguez A. Rectal cancer staging: focus on the prognostic significance of the findings described by high-resolution magnetic resonance imaging. Cancer Imaging 2013;13:277–97.

18. Wang PP, Deng CL, Wu B. Magnetic resonance imaging-based artificial intelligence model in rectal cancer. World J Gastroenterol 2021;27:2122–30.

19. Koh DM, Papanikolaou N, Bick U, Illing R, Kahn CE Jr, Kalpathy-Cramer J, et al. Artificial intelligence and machine learning in cancer imaging. Commun Med (Lond) 2022;2:133.

20. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJ. Artificial intelligence in radiology. Nat Rev Cancer 2018;18:500–10.

21. Shimizu H, Nakayama KI. Artificial intelligence in oncology. Cancer Sci 2020;111:1452–60.

22. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer 2012;48:441–6.

23. Wong C, Fu Y, Li M, Mu S, Chu X, Fu J, et al. MRI-based artificial intelligence in rectal cancer. J Magn Reson Imaging 2023;57:45–56.

24. Anwar SM, Majid M, Qayyum A, Awais M, Alnowami M, Khan MK. Medical image analysis using convolutional neural networks: a review. J Med Syst 2018;42:226.

25. Luchini C, Bibeau F, Ligtenberg MJ, Singh N, Nottegar A, Bosse T, et al. ESMO recommendations on microsatellite instability testing for immunotherapy in cancer, and its relationship with PD-1/PD-L1 expression and tumour mutational burden: a systematic review-based approach. Ann Oncol 2019;30:1232–43.

26. Trebeschi S, van Griethuysen JJ, Lambregts DM, Lahaye MJ, Parmar C, Bakers FC, et al. Deep learning for fully-automated localization and segmentation of rectal cancer on multiparametric MR. Sci Rep 2017;7:5301.

27. Wang J, Lu J, Qin G, Shen L, Sun Y, Ying H, et al. Technical note: a deep learning-based autosegmentation of rectal tumors in MR images. Med Phys 2018;45:2560–4.

28. Kim J, Oh JE, Lee J, Kim MJ, Hur BY, Sohn DK, et al. Rectal cancer: toward fully automatic discrimination of T2 and T3 rectal cancers using deep convolutional neural network. Int J Imaging Syst Technol 2019;29:247–59.

29. Pang X, Wang F, Zhang Q, Li Y, Huang R, Yin X, et al. A pipeline for predicting the treatment response of neoadjuvant chemoradiotherapy for locally advanced rectal cancer using single MRI modality: combining deep segmentation network and radiomics analysis based on “suspicious region”. Front Oncol 2021;11:711747.

30. Knuth F, Adde IA, Huynh BN, Groendahl AR, Winter RM, Negård A, et al. MRI-based automatic segmentation of rectal cancer using 2D U-Net on two independent cohorts. Acta Oncol 2022;61:255–63.

31. DeSilvio T, Antunes JT, Bera K, Chirra P, Le H, Liska D, et al. Region-specific deep learning models for accurate segmentation of rectal structures on post-chemoradiation T2w MRI: a multi-institutional, multi-reader study. Front Med (Lausanne) 2023;10:1149056.

32. Zhang G, Chen L, Liu A, Pan X, Shu J, Han Y, et al. Comparable performance of deep learning-based to manual-based tumor segmentation in KRAS/NRAS/BRAF mutation prediction with MR-based radiomics in rectal cancer. Front Oncol 2021;11:696706.

33. Zhao X, Xie P, Wang M, Li W, Pickhardt PJ, Xia W, et al. Deep learning-based fully automated detection and segmentation of lymph nodes on multiparametric-MRI for rectal cancer: a multicentre study. EBioMedicine 2020;56:102780.

34. Jian J, Xiong F, Xia W, Zhang R, Gu J, Wu X, et al. Fully convolutional networks (FCNs)-based segmentation method for colorectal tumors on T2-weighted magnetic resonance images. Australas Phys Eng Sci Med 2018;41:393–401.

35. Zhu HT, Zhang XY, Shi YJ, Li XT, Sun YS. Automatic segmentation of rectal tumor on diffusion-weighted images by deep learning with U-Net. J Appl Clin Med Phys 2021;22:324–31.

36. Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation [Preprint]. Posted 2015 May 18. arXiv:1505.04597. https://doi.org/10.48550/arXiv.1505.04597.

37. Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 2017;39:640–51.

38. Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 2017;39:2481–95.

39. Milletari F, Navab N, Ahmadi SA. V-net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV); 2016 Oct 25–28; Stanford, CA, USA. IEEE; 2016. p. 565–71.

40. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition [Preprint]. Posted 2015 Apr 10. arXiv:1409.1556. https://doi.org/10.48550/arXiv.1409.1556.

41. Taylor FG, Swift RI, Blomqvist L, Brown G. A systematic approach to the interpretation of preoperative staging MRI for rectal cancer. AJR Am J Roentgenol 2008;191:1827–35.

42. Park JS, Jang YJ, Choi GS, Park SY, Kim HJ, Kang H, et al. Accuracy of preoperative MRI in predicting pathology stage in rectal cancers: node-for-node matched histopathology validation of MRI features. Dis Colon Rectum 2014;57:32–8.

43. Wu QY, Liu SL, Sun P, Li Y, Liu GW, Liu SS, et al. Establishment and clinical application value of an automatic diagnosis platform for rectal cancer T-staging based on a deep neural network. Chin Med J (Engl) 2021;134:821–8.

44. Lu Y, Yu Q, Gao Y, Zhou Y, Liu G, Dong Q, et al. Identification of metastatic lymph nodes in MR imaging with faster region-based convolutional neural networks. Cancer Res 2018;78:5135–43.

45. Ding L, Liu GW, Zhao BC, Zhou YP, Li S, Zhang ZD, et al. Artificial intelligence system of faster region-based convolutional neural network surpassing senior radiologists in evaluation of metastatic lymph nodes of rectal cancer. Chin Med J (Engl) 2019;132:379–87.

46. Zhou YP, Li S, Zhang XX, Zhang ZD, Gao YX, Ding L, et al. High definition MRI rectal lymph node aided diagnostic system based on deep neural network. Zhonghua Wai Ke Za Zhi 2019;57:108–13.

47. Li J, Zhou Y, Wang P, Zhao H, Wang X, Tang N, et al. Deep transfer learning based on magnetic resonance imaging can improve the diagnosis of lymph node metastasis in patients with rectal cancer. Quant Imaging Med Surg 2021;11:2477–85.

48. Wang D, Xu J, Zhang Z, Li S, Zhang X, Zhou Y, et al. Evaluation of rectal cancer circumferential resection margin using faster region-based convolutional neural network in high-resolution magnetic resonance images. Dis Colon Rectum 2020;63:143–51.

49. Xu JH, Zhou XM, Ma JL, Liu SS, Zhang MS, Zheng XF, et al. Application of convolutional neural network to risk evaluation of positive circumferential resection margin of rectal cancer by magnetic resonance imaging. Zhonghua Wei Chang Wai Ke Za Zhi 2020;23:572–7.

50. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJ, Bottou L, Weinberger KQ, editors. Proceedings of the 25th International Conference on Neural Information Processing Systems; 2012 Dec 3–6; Lake Tahoe, NV, USA. Curran Associates Inc; 2012. p. 1097–105.

51. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015 Jun 7–12; Boston, MA, USA. IEEE; 2015. p. 1–9.

52. Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 2017;39:1137–49.

53. Al-Sukhni E, Milot L, Fruitman M, Beyene J, Victor JC, Schmocker S, et al. Diagnostic accuracy of MRI for assessment of T category, lymph node metastases, and circumferential resection margin involvement in patients with rectal cancer: a systematic review and meta-analysis. Ann Surg Oncol 2012;19:2212–23.

54. Benson AB, Venook AP, Al-Hawary MM, Arain MA, Chen YJ, Ciombor KK, et al. NCCN guidelines insights: rectal cancer, version 6.2020. J Natl Compr Canc Netw 2020;18:806–15.

55. Van Cutsem E, Lenz HJ, Köhne CH, Heinemann V, Tejpar S, Melezínek I, et al. Fluorouracil, leucovorin, and irinotecan plus cetuximab treatment and RAS mutations in colorectal cancer. J Clin Oncol 2015;33:692–700.

56. Cerretelli G, Ager A, Arends MJ, Frayling IM. Molecular pathology of Lynch syndrome. J Pathol 2020;250:518–31.

57. Kim JC, Bodmer WF. Genotypic and phenotypic characteristics of hereditary colorectal cancer. Ann Coloproctol 2021;37:368–81.

58. Kawakami H, Zaanan A, Sinicrope FA. Microsatellite instability testing and its role in the management of colorectal cancer. Curr Treat Options Oncol 2015;16:30.

59. Meng X, Xia W, Xie P, Zhang R, Li W, Wang M, et al. Preoperative radiomic signature based on multiparametric magnetic resonance imaging for noninvasive evaluation of biological characteristics in rectal cancer. Eur Radiol 2019;29:3200–9.

60. Greenbaum A, Martin DR, Bocklage T, Lee JH, Ness SA, Rajput A. Tumor heterogeneity as a predictor of response to neoadjuvant chemotherapy in locally advanced rectal cancer. Clin Colorectal Cancer 2019;18:102–9.

61. Zhang W, Yin H, Huang Z, Zhao J, Zheng H, He D, et al. Development and validation of MRI-based deep learning models for prediction of microsatellite instability in rectal cancer. Cancer Med 2021;10:4164–73.

62. Cao W, Hu H, Guo J, Qin Q, Lian Y, Li J, et al. CT-based deep learning model for the prediction of DNA mismatch repair deficient colorectal cancer: a diagnostic study. J Transl Med 2023;21:214.

63. He K, Liu X, Li M, Li X, Yang H, Zhang H. Noninvasive KRAS mutation estimation in colorectal cancer using a deep learning method based on CT imaging. BMC Med Imaging 2020;20:59.

64. Sargent DJ, Marsoni S, Monges G, Thibodeau SN, Labianca R, Hamilton SR, et al. Defective mismatch repair as a predictive marker for lack of efficacy of fluorouracil-based adjuvant therapy in colon cancer. J Clin Oncol 2010;28:3219–26.

65. Ribic CM, Sargent DJ, Moore MJ, Thibodeau SN, French AJ, Goldberg RM, et al. Tumor microsatellite-instability status as a predictor of benefit from fluorouracil-based adjuvant chemotherapy for colon cancer. N Engl J Med 2003;349:247–57.

66. Le DT, Uram JN, Wang H, Bartlett BR, Kemberling H, Eyring AD, et al. PD-1 blockade in tumors with mismatch-repair deficiency. N Engl J Med 2015;372:2509–20.

67. Llosa NJ, Cruise M, Tam A, Wicks EC, Hechenbleikner EM, Taube JM, et al. The vigorous immune microenvironment of microsatellite instable colon cancer is balanced by multiple counter-inhibitory checkpoints. Cancer Discov 2015;5:43–51.

68. Howard A, Sandler M, Chen B, Wang W, Chen LC, Tan M, et al. Searching for MobileNetV3. In: 2019 Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2019 Oct 27–Nov 2; Seoul, Korea. IEEE; 2020. p. 1314–24.

69. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 29th IEEE Conference on Computer Vision and Pattern Recognition; 2016 Jun 27–30; Las Vegas, NV, USA. IEEE; 2016. p. 770–8.

70. Maas M, Nelemans PJ, Valentini V, Das P, Rödel C, Kuo LJ, et al. Long-term outcome in patients with a pathological complete response after chemoradiation for rectal cancer: a pooled analysis of individual patient data. Lancet Oncol 2010;11:835–44.

71. Renehan AG, Malcomson L, Emsley R, Gollins S, Maw A, Myint AS, et al. Watch-and-wait approach versus surgical resection after chemoradiotherapy for patients with rectal cancer (the OnCoRe project): a propensity-score matched cohort analysis. Lancet Oncol 2016;17:174–83.

72. Marijnen CA. Organ preservation in rectal cancer: have all questions been answered? Lancet Oncol 2015;16:e13–22.

73. Bulens P, Couwenberg A, Intven M, Debucquoy A, Vandecaveye V, Van Cutsem E, et al. Predicting the tumor response to chemoradiotherapy for rectal cancer: model development and external validation using MRI radiomics. Radiother Oncol 2020;142:246–52.

74. Horvat N, Veeraraghavan H, Khan M, Blazic I, Zheng J, Capanu M, et al. MR imaging of rectal cancer: radiomics analysis to assess treatment response after neoadjuvant therapy. Radiology 2018;287:833–43.

75. Yi X, Pei Q, Zhang Y, Zhu H, Wang Z, Chen C, et al. MRI-based radiomics predicts tumor response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer. Front Oncol 2019;9:552.

76. Ferrari R, Mancini-Terracciano C, Voena C, Rengo M, Zerunian M, Ciardiello A, et al. MR-based artificial intelligence model to assess response to therapy in locally advanced rectal cancer. Eur J Radiol 2019;118:1–9.

77. Lou B, Doken S, Zhuang T, Wingerter D, Gidwani M, Mistry N, et al. An image-based deep learning framework for individualizing radiotherapy dose. Lancet Digit Health 2019;1:e136–47.

78. Jang BS, Lim YJ, Song C, Jeon SH, Lee KW, Kang SB, et al. Image-based deep learning model for predicting pathological response in rectal cancer using post-chemoradiotherapy magnetic resonance imaging. Radiother Oncol 2021;161:183–90.

79. Shi L, Zhang Y, Nie K, Sun X, Niu T, Yue N, et al. Machine learning for prediction of chemoradiation therapy response in rectal cancer using pre-treatment and mid-radiation multi-parametric MRI. Magn Reson Imaging 2019;61:33–40.

80. Zhang XY, Wang L, Zhu HT, Li ZW, Ye M, Li XT, et al. Predicting rectal cancer response to neoadjuvant chemoradiotherapy using deep learning of diffusion kurtosis MRI. Radiology 2020;296:56–64.

81. Zhu HT, Zhang XY, Shi YJ, Li XT, Sun YS. A deep learning model to predict the response to neoadjuvant chemoradiotherapy by the pretreatment apparent diffusion coefficient images of locally advanced rectal cancer. Front Oncol 2020;10:574337.

82. Zhang X, Zhou X, Lin M, Sun J. ShuffleNet: an extremely efficient convolutional neural network for mobile devices. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018 Jun 18–23; Salt Lake City, UT, USA. IEEE; 2018. p. 6848–56.

83. Muehlematter UJ, Bluethgen C, Vokinger KN. FDA-cleared artificial intelligence and machine learning-based medical devices and their 510(k) predicate networks. Lancet Digit Health 2023;5:e618–26.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Study	Country	Modality	Architecture	Internal dataset	External dataset	Automatization	Performance	Comparison
Tumor segmentation
Trebeschi et al. [26] (2017)	The Netherlands	T2WI, DWI	CNN	Training, 56	None	Manual annotation	DSC, 0.70	None
				Validation, 62			AUROC, 0.99
				Test, 14
Wang et al. [27] (2018)	China	T2WI	CNN (2D U-Net)	Training, 84	None	Manual annotation	DSC, 0.74	None
				Validation, 9			Hausdorff distance, 20.44
				Test, 20			Average surface distance, 3.25
							Jaccard index, 0.60
Kim et al. [28] (2019)	Korea	T2WI	CNN (U-Net, FCN-8, SegNet)	133	None	Manual annotation	U-Net:	U-Net is superior to other models
							DSC, 0.81
							Sensitivity, 0.79
							Specificity, 0.98
Pang et al. [29] (2021)	China	T2WI	CNN (U-Net)	Training, 88	34	Manual annotation	DSC, 0.95	None
				Test, 46			Sensitivity, 0.97
							Specificity, 0.96
Knuth et al. [30] (2022)	Norway	T2WI	CNN (2D U-Net)	109	83	Manual annotation	DSC, 0.78	None
DeSilvio et al. [31] (2023)	USA	T2WI	CNN (region-specific U-Net, multiclass U-Net)	Training, 44	11	Manual annotation	Region-specific U-Net:	Region-specific U-Net is superior to multiclass U-Net
				Validation, 44			DSC, 0.91
				Test, 49			Hausdorff distance, 2.45
Zhang et al. [32] (2021)	China	T2WI, DWI	CNN (3D V-Net)	Training, 108	None	Manual annotation	DSC, 0.96	None
				Test, 94
Jian et al. [34] (2018)	China	T2WI	CNN (VGG-16)	Training, 410	None	Fully automatic	DSC, 0.84	VGG-16 is superior to U-Net
				Test, 102			PPV, 0.83
							Sensitivity, 0.88
							Specificity, 0.97
							Hammoude distance, 0.27
							Hausdorff distance, 8.2
Zhu et al. [35] (2021)	China	DWI	CNN (3D U-Net)	Training, 180	None	Fully automatic	DSC, 0.68	Automatic model is superior to semiautomatic model
				Validation, 60
				Test, 60
LN segmentation
Zhao et al. [33] (2020)	China	T2WI, DWI	CNN (Mask R-CNN)	293	50	Manual annotation	Detection:	Mask R-CNN is superior to junior radiologists in LN detection
							Sensitivity, 0.63
							PPV, 0.65
							False-positive rate per case, 8.2
							Segmentation:
							DSC, 0.81-0.82
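
The segmentation studies above report overlap mainly as the Dice similarity coefficient (DSC), with one study also reporting the Jaccard index. As a minimal sketch of how these two overlap measures are commonly defined for a pair of binary masks, the following Python example may help when comparing values across rows. The function name `dice_and_jaccard`, the toy masks, and the convention for two empty masks are illustrative assumptions and are not taken from any of the reviewed studies.

```python
def dice_and_jaccard(pred, truth):
    """Return (DSC, Jaccard) for two equally sized binary masks.

    Masks are flat sequences of 0/1 values (hypothetical toy format).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels labelled as tumor in both the prediction and the reference.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection

    if union == 0:
        # Both masks are empty; treating this as perfect agreement
        # is an assumed convention, not a universal rule.
        return 1.0, 1.0

    dsc = 2 * intersection / (pred_size + truth_size)
    jaccard = intersection / union
    return dsc, jaccard


# Hypothetical toy masks: 1 marks a tumor pixel, 0 marks background.
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 0, 1, 1, 1, 0]
dsc, jaccard = dice_and_jaccard(pred, truth)
print(f"DSC = {dsc:.2f}, Jaccard = {jaccard:.2f}")
```

For a single pair of masks the two measures are linked by Jaccard = DSC / (2 − DSC), so the Jaccard index is never larger than the DSC. This identity need not hold exactly for values averaged over many patients, which is how the studies typically report them.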

Study	Country	Modality	Architecture	Internal dataset	External dataset	Performance	Comparison
T stage
Kim et al. [28] (2019)	Korea	T2WI	CNN (AlexNet, Inception)	133	None	Inception:	Inception is superior to AlexNet
						Accuracy, 0.94
						Sensitivity, 0.88
						Specificity, 1.0
Wu et al. [43] (2021)	China	T2WI	CNN (Faster R-CNN)	183	None	Coronal view:	None
						AUROC (T1), 0.96
						AUROC (T2), 0.97
						AUROC (T3), 0.97
						AUROC (T4), 0.97
N stage
Lu et al. [44] (2018)	China	T2WI, DWI	CNN (Faster R-CNN)	351	414	AUROC, 0.91	No difference in accuracy compared with radiologists
						Diagnostic time, 20 sec vs. 600 sec	Faster than radiologists
Ding et al. [45] (2019)	China	T2WI, DWI	CNN (Faster R-CNN)	414	None	κ coefficient:	AI model is superior to radiologists
						AI vs. pathologist, 0.57
						Radiologist vs. pathologist, 0.47
Zhou et al. [46] (2019)	China	Pelvic HD MRI	CNN	Training, 201	None	AUROC, 0.89	No difference in accuracy compared with radiologists
				Test, 100		Diagnostic time, 10 sec vs. 600 sec	Faster than radiologists
Li et al. [47] (2021)	China	T2WI	CNN (InceptionV3)	129	None	Sensitivity, 0.95	AI is superior to radiologists
						Specificity, 0.95
						PPV, 0.95
						NPV, 0.95
						AUROC, 0.99
Circumferential resection margin
Wang et al. [48] (2020)	China	T2WI	CNN (Faster R-CNN)	Training, 192	None	Accuracy, 0.93	None
				Test, 48		Sensitivity, 0.84
						Specificity, 0.96
						AUROC, 0.95
Xu et al. [49] (2020)	China	T2WI	CNN (Faster R-CNN)	Training, 300	None	Accuracy, 0.88	None
				Test, 50		Sensitivity, 0.86
						Specificity, 0.90
						AUROC, 0.93

Study	Country	Modality	Architecture	Internal dataset	External dataset	Performance	Comparison
Microsatellite instability
Zhang et al. [61] (2021)	China	T2WI	CNN (3D MobileNetV2)	Training, 395	None	Image model:	Image model is superior to clinical model
				Validation, 395		Sensitivity, 0.89
				Test, 96		Specificity, 0.74
						AUROC, 0.82
						Clinical model:
						Sensitivity, 1.00
						Specificity, 0.31
						AUROC, 0.61
Cao et al. [62] (2023)	China	Enhanced APCT	CNN (ResNet101)	Training, 1,124	206	Internal validation:	None
				Test, 482		Accuracy, 0.99
						Sensitivity, 1.0
						Specificity, 0.97
						AUROC, 0.99
						External validation:
						Accuracy, 0.91
						Sensitivity, 0.90
						Specificity, 0.93
						AUROC, 0.92
KRAS mutation
He et al. [63] (2020)	China	Enhanced APCT	CNN (ResNet)	Training, 117	None	AI model:	AI model is superior to radiomics model
				Test, 40		Sensitivity, 0.59
						Specificity, 1.0
						AUROC, 0.93
						Radiomics model:
						Sensitivity, 0.70
						Specificity, 0.85
						AUROC, 0.82

Study	Country	Modality	Architecture	Internal dataset	External dataset	Performance	Comparison
Shi et al. [79] (2019)	China	T2WI (pre- and mid-CRT)	CNN	51	None	pCR:	No difference in predicting pCR
						AUROC (CNN), 0.83	CNN model is inferior to radiomics model in predicting GR
						AUROC (radiomics), 0.81
						GR:
						AUROC (CNN), 0.74
						AUROC (radiomics), 0.92
Zhang et al. [80] (2020)	China	T2WI (pre- and post-CRT)	CNN (models A, B, C)	Training, 290	93	pCR:	Model A is superior to the other models
						Accuracy, 0.98
						Sensitivity, 1.00
						Specificity, 0.97
						AUROC, 0.99
Zhu et al. [81] (2020)	China	T2WI (pre-CRT)	CNN	Training, 400	None	GR:	None
				Validation, 100		Sensitivity, 0.93
				Test, 200		Specificity, 0.62
						AUROC, 0.81
Jang et al. [78] (2021)	Korea	T2WI (post-CRT)	CNN (ShuffleNet)	Training, 303	None	pCR:	AI model is superior to human for predicting pCR and GR
				Validation, 46		Accuracy, 0.85
				Test, 117		Sensitivity, 0.30
						Specificity, 0.96
						GR:
						Accuracy, 0.72
						Sensitivity, 0.54
						Specificity, 0.81