Development of a Machine Learning Algorithm for Predicting Survival in Patients Undergoing Surgery for Parotid Gland Cancer

Machine Learning-Based Predictor for Treatment Outcomes of Patients With Salivary Gland Cancer After Operation

Article information

Korean J Otorhinolaryngol-Head Neck Surg. 2022;65(6):334-342

Publication date (electronic) : 2022 May 12

doi : https://doi.org/10.3342/kjorl-hns.2021.00871

Min Cheol Jeong¹, Yoon Woo Koh², Eun Chang Choi², Jae-Yol Lim¹, Se-Heon Kim², Young Min Park¹

¹Department of Otorhinolaryngology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea

²Department of Otorhinolaryngology, Yonsei University College of Medicine, Seoul, Korea


Address for correspondence: Young Min Park, MD, PhD, Department of Otorhinolaryngology, Gangnam Severance Hospital, Yonsei University College of Medicine, 211 Eonju-ro, Gangnam-gu, Seoul 06273, Korea. Tel: +82-2-2019-3460, Fax: +82-2-3463-4750, E-mail: autumnfe@daum.net

Received 2021 August 29; Revised 2021 November 4; Accepted 2021 November 4.

Abstract

Background and Objectives

The purpose of this study was to analyze the survival data of patients with salivary gland cancers (SGCs), to construct machine learning and deep learning models that predict survival, and to use these models to stratify SGC patients by estimated risk.

Subjects and Method

We retrospectively analyzed the clinicopathologic data from 460 patients with SGCs from 2006 to 2018.

Results

In the Cox proportional hazard (CPH) model, pM, stage, lymphovascular invasion, lymph node ratio, and age exhibited significant correlations with patient survival. In the CPH model, the c-index for the training set was 0.85, and that for the test set was 0.81. In the Random Survival Forest model, the c-index for the training set was 0.86, and that for the test set was 0.82. Stage and age exhibited high importance in both the Random Survival Forest and CPH models. In the deep learning-based model, the c-index was 0.72 for both the training set and the test set. Among the three models, the Random Survival Forest model exhibited the highest performance in predicting the survival of SGC patients.

Conclusion

A survival prediction model using machine learning techniques showed acceptable performance in predicting the survival of SGC patients. Although large-scale, multicenter clinical studies should be conducted to establish a more powerful predictive model, we expect that individualized treatment can be realized according to risk stratification made by the machine learning model.

Keywords: Deep learning; Machine learning; Prognosis; Salivary gland cancer

Introduction

Salivary gland cancers (SGCs) are rare carcinomas that occur in approximately 3 cases per 100,000 people per year and account for less than 3% of all head and neck cancers. SGC is a highly heterogeneous disease comprising about 22 different histologic types of carcinoma. Approximately 70% of SGCs occur in the parotid gland, but they may also occur in other major and minor salivary glands, and the clinical course and prognosis may vary depending on the primary subsite or histologic grade [1]. Histologic grade, advanced T classification, lymph node (LN) metastasis, lymphovascular invasion (LVI), and perineural invasion (PNI), among others, are known prognostic factors of SGCs [2-4]. However, the current TNM staging system reflects only the extent of the primary tumor (T), nodal factors (N), and distant metastasis (M), and does not reflect any other significant factors. Because of the simplicity of the current TNM staging system, some SGC patients show treatment outcomes that do not match their stage. As inaccurate staging may put patients at risk of insufficient treatment or overtreatment, a robust and accurate prediction model should be established.

The log-rank test and the Cox proportional hazard (CPH) model are used to analyze survival data of cancer patients and to identify prognostic factors that have a significant effect on survival. However, because these models assume a linear combination of variables, they do not fit the survival data of cancer patients well and are difficult to apply, especially when multiple variables are correlated with each other. In addition, survival data of cancer patients do not have a binary structure but are characterized by censoring and ‘time to event.’ These unique characteristics should be considered when analyzing high-dimensional data of cancer patients. State-of-the-art machine learning techniques, which are better suited to modeling nonlinear relationships between variables, can overcome the limitations of the traditional statistical methods mentioned above and have exhibited superior performance to existing models in previous studies, mainly conducted on oral cancer patients [5,6]. To our knowledge, no studies have used machine learning or deep learning models to predict the survival of SGC patients. In this study, we analyzed the survival data of SGC patients and constructed machine learning and deep learning models for predicting their survival.

Subjects and Methods

Study participants

From January 2006 to December 2018, data of patients diagnosed with SGCs and treated at Severance Hospital were retrospectively analyzed. The inclusion criteria were as follows: 1) primary malignant tumor in a major or minor salivary gland and 2) diagnosis of and surgery for SGC, with sufficient clinical and pathological information available in the medical records. The exclusion criteria were as follows: 1) distant metastasis at the time of diagnosis, 2) previous surgery or radiotherapy in the head and neck area, and 3) loss to follow-up after surgery. Finally, a total of 460 patients were included in the study, consisting of 240 males and 220 females. Ages ranged from 14 to 99 years, with a mean of 53.7 years. Tumor stage was classified based on the 8th American Joint Committee on Cancer (AJCC) staging system. This study was approved by the Institutional Review Board (IRB) of Yonsei University (3-2020-0504). The need for individual informed consent was waived because of the retrospective design of the study.

Machine learning model

Of the data set, 80% was assigned to the training set and used for learning, while the remaining 20% was used as the test set to validate the performance of the machine learning models. Survival of cancer patients cannot be predicted by binary classification; ‘time to event’ and censored data must be considered. Metrics such as accuracy, precision, recall, and area under the receiver operating characteristic curve (AUC), which are commonly used to estimate the performance of binary classification, are not suitable for cancer survival analysis. Therefore, Harrell’s c-index was used to evaluate the performance of the survival prediction models for SGC patients [7]. Harrell’s c-index is the most widely used index for evaluating the predictive accuracy of a survival analysis model. It indicates whether the order of death times predicted by the model matches the order of the death times actually observed in patients. A c-index of 0.5 corresponds to the expected value for a random model, and a c-index of 1 indicates a perfect match of the death time ranking. A c-index over 0.8 usually indicates a strong predictive model [7,8].
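As an illustration of how Harrell’s c-index handles censoring, the following is a minimal pure-Python sketch and not the implementation used in this study. The function name harrell_c_index and the toy data are hypothetical. It assumes that a higher risk score means a shorter expected survival, and it simply skips pairs with tied follow-up times, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up time of each patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time ended in an observed
        # death; tied times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death was given the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.7, 0.4, 0.1]))  # perfect ranking
print(harrell_c_index(times, events, [0.5, 0.5, 0.5, 0.5]))  # uninformative model
```

In this toy example the censored patient still contributes to pairs in which the other patient died earlier, which is what distinguishes the c-index from metrics for binary classification.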

We chose the following three representative models for predicting patient survival: Random Survival Forest (RSF), CPH, and a deep learning-based survival model (DeepSurv) [8-10]. The linear CPH model, the nonlinear RSF model, and the deep learning DeepSurv model were used to predict the survival of SGC patients and to determine which model shows the best performance in predicting the survival of these patients. As Random Survival Forest uses all variables for constructing a model and assesses their nonlinear effects, it can reduce variance and bias. To construct the Random Survival Forest models, version 0.14.0 of scikit-survival was used. The DeepSurv model devised by Katzman, et al. [8] is an open-source Python module; it is a multi-layer feed-forward neural network whose weights are updated by backpropagation, using the negative log partial likelihood as the loss. The CPH model was constructed in R with the moonBook package.
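The negative log partial likelihood mentioned above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the DeepSurv source code: the function name is hypothetical, the loss is averaged over observed events, tied event times receive no special handling, and the regularization term is omitted. The argument log_risks stands for the network output for each patient.

```python
import math


def negative_log_partial_likelihood(times, events, log_risks):
    """Average negative log Cox partial likelihood over observed events.

    times:     observed follow-up time of each patient
    events:    1 if death was observed, 0 if censored
    log_risks: predicted log-risk (e.g., a network output) for each patient
    """
    total = 0.0
    n_events = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients enter only through the risk sets
        # Risk set: patients still under observation at the time of this death.
        risk_set = [log_risks[j] for j in range(len(times)) if times[j] >= times[i]]
        # Numerically stable log-sum-exp over the risk set.
        m = max(risk_set)
        log_sum = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += log_risks[i] - log_sum
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return -total / n_events


# Toy data (hypothetical): earlier deaths given higher log-risk yield a lower loss.
times, events = [1, 2, 3], [1, 1, 1]
good = negative_log_partial_likelihood(times, events, [2.0, 1.0, 0.0])
bad = negative_log_partial_likelihood(times, events, [0.0, 1.0, 2.0])
print(good < bad)
```

Minimizing this quantity pushes the model to assign higher risk to patients who die earlier than the others remaining in their risk set, which is why the same loss serves both the CPH model and its neural network extension.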

Statistical analysis

Patient demographic information, tumor location and stage, pathologic findings, recurrence, date of recurrence, recurrence site, death, date of death, and cause of death were collected and analyzed. The chi-square test or Fisher’s exact test was used to evaluate differences in categorical variables between two independent groups. An independent two-sample t-test was used to assess differences in continuous variables between two independent groups. Kaplan-Meier curves were used to analyze patient survival, and differences were assessed using the log-rank test. A p-value <0.05 was considered to indicate statistical significance. Statistical analyses were performed using R version 4.0.3.
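For readers unfamiliar with the Kaplan-Meier estimator, the product-limit calculation can be sketched as follows. This is a minimal illustration in Python with a hypothetical function name and toy data; the analyses in this study were performed in R, and this sketch is not that code.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times:  observed follow-up time of each patient
    events: 1 if death was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ended at t leaves the risk set.
        at_risk -= leaving
    return curve


# Toy data (hypothetical): the second patient is censored at time 2.
for t, s in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
    print(t, s)
```

The censored patient lowers the number at risk for later time points without producing a step in the curve, which is how the estimator uses partial follow-up information.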

Results

Clinical information of patients

A total of 460 patients were included in this study, and all patients underwent surgery as the initial treatment. After surgical treatment, 184 patients (40.0%) received postoperative radiotherapy, and 70 patients (15.2%) underwent concurrent chemoradiation. The primary site of cancer was the parotid gland in 388 patients (84.3%), the submandibular gland in 46 patients (10.0%), the sublingual gland in 20 patients (4.3%), and a minor salivary gland in 6 patients (1.3%). There were 34 cases (7.4%) with facial nerve (FN) palsy before surgery. On pathologic examination of surgical specimens, 146 patients (31.7%) showed positive surgical margins, and 314 patients (68.3%) showed negative margins. LVI was observed in 74 patients (16.1%), PNI in 117 patients (25.4%), and extranodal extension (ENE) in 70 patients (15.2%). By pT classification, 145 patients (31.5%) were T1, 195 (42.4%) were T2, 70 (15.2%) were T3, and 50 (10.9%) were T4. By pN classification, 354 patients (77.0%) were N0, 28 (6.1%) were N1, 72 (15.7%) were N2, and 6 (1.3%) were N3. By TNM stage, 129 patients (28.0%) were stage I, 153 (33.3%) were stage II, 75 (16.3%) were stage III, and 103 (22.4%) were stage IV. On survival analysis using the Kaplan-Meier curve, significant differences were observed between stage IV and the other stages, but there was no significant difference between stages II and III (Fig. 1A). Other information on all patients is summarized in Table 1. In addition, baseline differences between the training and test data sets were analyzed and summarized in Table 1, and the overall survival difference between the two groups was analyzed through Kaplan-Meier survival analysis with the log-rank test (Fig. 1B). There were no significant differences between the training and test data sets.

Fig. 1.

Survival analysis of patients using Kaplan-Meier curve. A: Kaplan-Meier curve using the 8th TNM stage of all salivary gland cancer patients. B: Kaplan-Meier curves of training and test sets.

Table 1.

Information of all patients with salivary gland cancer enrolled in the study

Clinical prognostic factors

The clinical prognostic factors used were patient age, sex, primary site, FN palsy, adjuvant treatment, margin status, LVI, PNI, ENE, pT, pN, number of metastatic LNs, lymph node ratio (LNR), and histologic subtype. Because multicollinearity may be a problem in a linear model, the GridSearchCV method was used to select the variables to be used in the CPH model. For the nonlinear RSF and DeepSurv models, all variables were used to build the model without variable selection because multicollinearity was not an issue.

CPH model

The CPH model was constructed in R with the moonBook package. First, univariate analysis was performed to evaluate prognostic factors for survival in SGC patients. pM, LVI, ENE, FN palsy, PNI, stage, adjuvant therapy, pN, surgical margin, pT, LNR, number of metastatic LNs, age, subsite, and sex showed significant correlations with the survival of SGC patients. On multivariate analysis, pM, stage, LVI, LNR, and age exhibited significant correlations with patient survival (Fig. 2). When the CPH model was built using all the parameters mentioned above, the c-index was 0.85 for the training set and 0.81 for the test set. After tuning through the GridSearchCV function, optimal performance was achieved when 8 parameters (age, LVI, PNI, ENE, pT, pN, stage, and number of metastatic LNs) were input into the model. When the CPH model was built based on these 8 parameters, the c-index was 0.85 for the training set and 0.80 for the test set.

Fig. 2.

Cox proportional hazard model. A: The results of univariate analysis. B: The results of multivariate analysis. *p<0.05; **p<0.01; ***p<0.001. LVI, lymphovascular invasion; ECS, extracapsular spread; FN, facial nerve; PNI, perineural invasion; LNR, lymph node ratio; HR, hazard ratio; CI, confidence interval.

Random Survival Forest model

In the Random Survival Forest model, the c-index for the training set was 0.86, and that for the test set was 0.82. Prediction error was calculated using out-of-bag (OOB) data (Fig. 3). The variable importance (VIMP) was obtained by measuring the decrease in prediction accuracy when a particular variable was randomized. Variables with higher VIMP tend to contribute more to predictive accuracy. In order of feature importance, the variables were stage, number of metastatic LNs, age, pN, pT, FN palsy, LNR, ENE, and surgical margin. Stage and age exhibited high importance in both the Random Survival Forest and CPH models. Interaction between variables was measured based on minimal depth. Stage and the number of metastatic LNs showed the lowest minimal depth and are expected to be associated with other variables.
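The idea behind VIMP, the drop in predictive accuracy after randomizing one variable, can be sketched as a permutation importance calculation. The sketch below is a simplified illustration and not the procedure used to produce Fig. 3: the names c_index, permutation_vimp, and predict_risk are hypothetical, accuracy is measured with a simplified c-index on the supplied data rather than on OOB data, and the toy model and data are invented for the example.

```python
import random
from itertools import combinations


def c_index(times, events, risks):
    """Simplified Harrell's c-index (pairs with tied times are skipped)."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    return concordant / comparable


def permutation_vimp(predict_risk, rows, times, events, n_repeats=20, seed=0):
    """Mean drop in c-index when each variable is shuffled across patients."""
    rng = random.Random(seed)
    baseline = c_index(times, events, [predict_risk(r) for r in rows])
    importance = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            # Copy each row with only the chosen variable replaced.
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
            score = c_index(times, events, [predict_risk(r) for r in permuted])
            drops.append(baseline - score)
        importance[name] = sum(drops) / n_repeats
    return importance


# Toy model and data (hypothetical): risk depends on age only.
rows = [{"age": a, "noise": n} for a, n in
        [(30, 5), (41, 2), (52, 9), (63, 1), (74, 7), (85, 3)]]
times = [60, 50, 40, 30, 20, 10]
events = [1, 1, 1, 1, 1, 1]
vimp = permutation_vimp(lambda r: r["age"], rows, times, events)
print(vimp)
```

In this toy setting, shuffling the variable that the model ignores leaves the c-index unchanged, so its importance is zero, while shuffling the variable that drives the prediction lowers the c-index.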

Fig. 3.

Random Survival Forest model. A: Random Forest OOB prediction error estimates as a function of the number of trees in the forest. B: Estimated survival of the test set. C: VIMP. Blue bars indicate positive VIMP, and red bars indicate negative VIMP. Importance is relative to the positive length of the bars. D: Variable interaction plot. OOB, out-of-bag; FN, facial nerve; LNR, lymph node ratio; ECS, extracapsular spread; LVI, lymphovascular invasion; PNI, perineural invasion; VIMP, variable importance.

Deep learning-based model

In the DeepSurv model, the c-index was 0.72 for both the training set and the test set. The learning process of the DeepSurv model is visualized in Fig. 4, and overfitting was not observed. Among the three models mentioned above, Random Survival Forest exhibited the highest performance in predicting the survival of SGC patients.

Fig. 4.

Learning process of DeepSurv model. A plot of loss (A) and a plot of accuracy (B) on training and testing sets.

Discussion

Due to the histological diversity and rarity of SGCs, there have been no high-quality randomized clinical trials establishing optimal treatment guidelines or analyzing prognostic factors. In a previous study, pT, pN, sex, PNI, and histology were reported as risk factors for prognosis and distant metastasis in SGCs [11]. Another retrospective study reported that tumor site and presence of FN palsy were important prognostic factors [12]. In this study, we confirmed that T subsites and LVI findings were important prognostic factors related to disease recurrence, and that pM, stage, LVI, LNR, and age were important prognostic factors related to death on multivariate analysis. As mentioned above, the reported prognostic factors for SGCs differ between studies. Therefore, further research is needed to determine why they differ. Nevertheless, these heterogeneous prognostic factors should be considered when constructing a survival prediction model for SGC patients.

Currently, the TNM staging system is the most widely used tool for predicting the prognosis of solid cancers. The system classifies patient stage based on the anatomical extent of the tumor but ignores other important factors such as patient age, pain, and histological grade [13]. In addition, since the TNM staging system does not reflect the biologic behavior of the tumor, some SGC patients show unexpected treatment outcomes that do not fit their tumor stage. Therefore, some researchers have tried to build prognostic nomograms that reflect anatomical, biologic, and biochemical prognostic factors for predicting the survival of SGC patients. A representative prognostic nomogram system for predicting recurrence after treatment of SGCs was proposed by three independent institutions. The nomogram system proposed by Memorial Sloan Kettering Cancer Center to predict the possibility of 5-year recurrence of SGCs showed a c-index of 0.84 in the validation group [14]. Mannelli, et al. [15] developed a prognostic nomogram to predict the possibility of 5-year recurrence after treatment of SGCs. The c-index was 0.82, but no external validation was performed. Although these prognostic nomograms reflect various prognostic factors and predict patient prognosis better than the TNM stage alone, further research should be performed to validate their clinical usefulness for predicting treatment outcomes of SGC patients [16].

Traditional hazard-based models used to analyze prognostic factors in cancer patients assume linear proportional hazards and analyze the impact of variables on the survival curve. However, when there is multicollinearity between variables, the impact of a variable on the result can be diluted. In contrast, machine learning algorithms are suitable for constructing nonlinear interaction models and are less affected by multicollinearity between variables. Therefore, it is possible to use all variables as inputs while decreasing bias and variance. Such models can also be used to identify the treatment modalities that benefit patients most or to predict the outcome of treatment. Among the machine learning models used in our study, Random Survival Forest exhibited the highest performance in predicting the survival of SGC patients. The CPH model demonstrated the second highest performance, and the deep learning-based model showed the lowest performance. To confirm that the machine learning models did not overfit the training set, the performance of each model was measured and compared using the test set. No model overfitted the training set in our study.

In recent years, research on radiomics based on the analysis of imaging tests such as CT, MRI, and PET has been actively conducted. Radiomics allows analysis of tumor aggressiveness or behavior based on features extracted from imaging studies and can predict disease recurrence and survival [17-19]. Also, next-generation sequencing has been widely used in cancer genomics to obtain genetic information such as mutations and gene expression profiles. If imaging features and genetic information related to the prognosis of SGCs can be incorporated into our machine learning models, a more robust and accurate prediction model can be expected. Patients could then be stratified by risk using such a model, and precision medicine could be offered. In other words, individualized therapy, such as intensified therapy for high-risk patients and deintensified therapy for low-risk patients, could be performed.

It is well known that the performance of machine learning models improves when high-quality, large-scale data are available. In general, sufficient performance can be expected from a machine learning model if about 10,000 patients are secured. Also, considering that prognosis differs significantly according to the histological grade of SGC, it would be ideal to establish separate models for low-grade and high-grade tumor types. However, due to the rarity of SGC, it is practically difficult to collect a sufficient number of patients in a single-institution study. This limitation needs to be addressed through collaborative research involving not only multiple institutions but also several countries. Before such large-scale research can be conducted, background clinical studies of this kind and reports on the clinical applicability of machine learning models must accumulate.

To overcome these limitations, we will secure a sufficient number of patients through a planned multicenter clinical study. Through this additional multicenter research, we will conduct external validation of the established model, construct a more robust predictive model, and verify its performance.

In conclusion, a survival prediction model using machine learning techniques showed acceptable performance in predicting the survival of SGC patients. Although large-scale, multicenter clinical studies should be conducted to establish a more powerful predictive model, we expect that individualized treatment will be possible according to risk stratification using a machine learning model.

Acknowledgements

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2020R1F1A1052903).

Notes

Author Contribution

Conceptualization: Min Cheol Jeong, Young Min Park, Yoon Woo Koh, Eun Chang Choi, Se-Heon Kim. Data curation: Min Cheol Jeong, Young Min Park, Jae-Yol Lim, Yoon Woo Koh, Se-Heon Kim. Formal analysis: Young Min Park, Yoon Woo Koh, Eun Chang Choi, Se-Heon Kim. Funding acquisition: Young Min Park, Eun Chang Choi, Se-Heon Kim. Investigation: Young Min Park, Eun Chang Choi. Methodology: Young Min Park, Jae-Yol Lim. Project administration: Young Min Park, Yoon Woo Koh, Se-Heon Kim. Resources: Min Cheol Jeong, Young Min Park, Eun Chang Choi. Software: Young Min Park. Supervision: Young Min Park, Yoon Woo Koh, Eun Chang Choi, Se-Heon Kim. Validation: Young Min Park, Eun Chang Choi, Se-Heon Kim. Writing—original draft: Min Cheol Jeong, Young Min Park, Yoon Woo Koh, Se-Heon Kim. Writing—review & editing: Young Min Park, Jae-Yol Lim, Yoon Woo Koh, Eun Chang Choi, Se-Heon Kim.

References

1. Lewis AG, Tong T, Maghami E. Diagnosis and management of malignant salivary gland tumors of the parotid gland. Otolaryngol Clin North Am 2016;49(2):343–80.

2. Ali S, Palmer FL, Yu C, DiLorenzo M, Shah JP, Kattan MW, et al. Postoperative nomograms predictive of survival after surgical management of malignant tumors of the major salivary glands. Ann Surg Oncol 2014;21(2):637–42.

3. Ali S, Bryant R, Palmer FL, DiLorenzo M, Shah JP, Patel SG, et al. Distant metastases in patients with carcinoma of the major salivary glands. Ann Surg Oncol 2015;22(12):4014–9.

4. Ettl T, Gosau M, Brockhoff G, Schwarz-Furlan S, Agaimy A, Reichert TE, et al. Predictors of cervical lymph node metastasis in salivary gland cancer. Head Neck 2014;36(4):517–23.

5. Saintigny P, Zhang L, Fan YH, El-Naggar AK, Papadimitrakopoulou VA, Feng L, et al. Gene expression profiling predicts the development of oral cancer. Cancer Prev Res (Phila) 2011;4(2):218–29.

6. Kann BH, Aneja S, Loganadane GV, Kelly JR, Smith SM, Decker RH, et al. Pretreatment identification of head and neck cancer nodal metastasis and extranodal extension using deep learning neural networks. Sci Rep 2018;8(1):14036.

7. Harrell FE Jr, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. JAMA 1982;247(18):2543–6.

8. Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol 2018;18(1):24.

9. Ishwaran H, Gerds TA, Kogalur UB, Moore RD, Gange SJ, Lau BM. Random survival forests for competing risks. Biostatistics 2014;15(4):757–73.

10. Su X, Zhou T, Yan X, Fan J, Yang S. Interaction trees with censored survival data. Int J Biostat 2008;4(1):Article 2.

11. Lombardi D, McGurk M, Vander Poorten V, Guzzo M, Accorona R, Rampinelli V, et al. Surgical treatment of salivary malignant tumors. Oral Oncol 2017;65:102–13.

12. Godballe C, Schultz JH, Krogdahl A, Møller-Grøntved A, Johansen J. Parotid carcinoma: Impact of clinical factors on prognosis in a histologically revised series. Laryngoscope 2003;113(8):1411–7.

13. Vander Poorten VL, Balm AJ, Hilgers FJ, Tan IB, Loftus-Coll BM, Keus RB, et al. The development of a prognostic score for patients with parotid carcinoma. Cancer 1999;85(9):2057–67.

14. Hay A, Migliacci J, Zanoni DK, Patel S, Yu C, Kattan MW, et al. Validation of nomograms for overall survival, cancer-specific survival, and recurrence in carcinoma of the major salivary glands. Head Neck 2018;40(5):1008–15.

15. Mannelli G, Alessandro F, Martina F, Lorenzo C, Bettiol A, Vannacci A, et al. Nomograms predictive for oncological outcomes in malignant parotid tumours: Recurrence and mortality rates of 228 patients from a single institution. Eur Arch Otorhinolaryngol 2019. In press.

16. Balachandran VP, Gonen M, Smith JJ, DeMatteo RP. Nomograms in oncology: More than meets the eye. Lancet Oncol 2015;16(4):e173–80.

17. Chen F, Ma X, Li S, Li Z, Jia Y, Xia Y, et al. MRI-based radiomics of rectal cancer: Assessment of the local recurrence at the site of anastomosis. Acad Radiol 2021;28 Suppl 1:S87–94.

18. Zhang H, Hu S, Wang X, He J, Liu W, Yu C, et al. Prediction of cervical lymph node metastasis using MRI radiomics approach in papillary thyroid carcinoma: A feasibility study. Technol Cancer Res Treat 2020;19:1533033820969451.

19. Zhang M, Bao Y, Rui W, Shangguan C, Liu J, Xu J, et al. Performance of 18F-FDG PET/CT radiomics for predicting EGFR mutation status in patients with non-small cell lung cancer. Front Oncol 2020;10:568857.


This is an open access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 1.

Information of all patients with salivary gland cancer enrolled in the study

Variables	All patients (n=460)	Test set (n=115)	Training set (n=345)	p-value
Sex				0.332
Female	220 (47.8)	50 (43.5)	170 (49.3)
Male	240 (52.2)	65 (56.5)	175 (50.7)
Age (yr), mean (range)	53.7 (14-99)	52.5±17.1	54.0±17.1	0.399
Subsite				0.722
Minor	6 (1.3)	1 (0.9)	5 (1.4)
Parotid	388 (84.3)	96 (83.5)	292 (84.6)
SLG	20 (4.3)	7 (6.1)	13 (3.8)
SMG	46 (10.0)	11 (9.6)	35 (10.1)
FN palsy				0.415
No	426 (92.6)	104 (90.4)	322 (93.3)
Yes	34 (7.4)	11 (9.6)	23 (6.7)
Adjuvant Tx				0.573
RTx	184 (40.0)	45 (39.1)	139 (40.3)
CCRTx	70 (15.2)	21 (18.3)	49 (14.2)
None	206 (44.8)	49 (42.6)	157 (45.5)
Pathology				0.133
Acinic cell ca	59	7 (6.1)	52 (15.1)
Adenoca, NOS	18	4 (3.5)	14 (4.1)
Adenoid cystic ca	63	19 (16.5)	44 (12.8)
Basal cell adenoca	14	4 (3.5)	10 (2.9)
CXPA	36	8 (7.0)	28 (8.1)
Cribriform cystadenocarcinoma	1	0 (0.0)	1 (0.3)
Epithelial-myoepithelial ca	20	5 (4.3)	15 (4.3)
Lymphoepithelial ca	4	0 (0.0)	4 (1.2)
Mucoepidermoid ca	157	52 (45.2)	105 (30.4)
Oncocytic ca	3	1 (0.9)	2 (0.6)
Salivary duct ca	53	10 (8.7)	43 (12.5)
SCCa	26	5 (4.3)	21 (6.1)
Secretory ca	6	0 (0.0)	6 (1.7)
TNM stage				0.614
I	129 (28.0)	37 (32.2)	92 (26.7)
II	153 (33.3)	35 (30.4)	118 (34.2)
III	75 (16.3)	20 (17.4)	55 (15.9)
IV	103 (22.4)	23 (20.0)	80 (23.2)
LVI				0.241
No	386 (83.9)	101 (87.8)	285 (82.6)
Yes	74 (16.1)	14 (12.2)	60 (17.4)
PNI				0.951
No	343 (74.6)	85 (73.9)	258 (74.8)
Yes	117 (25.4)	30 (26.1)	87 (25.2)
ENE				0.999
No	390 (84.8)	98 (85.2)	292 (84.6)
Yes	70 (15.2)	17 (14.8)	53 (15.4)
Margin				0.105
Negative	314 (68.3)	71 (61.7)	243 (70.4)
Positive	146 (31.7)	44 (38.3)	102 (29.6)
Recurrence	105 (22.8)	27 (23.5)	78 (22.6)	0.949
Death events	69 (15.0)	16 (13.9)	53 (15.4)	0.821

Data are presented as No. of pts (%). SLG, sublingual gland; SMG, submandibular gland; FN, facial nerve; Tx, treatment; RTx, radiotherapy; CCRTx, concurrent chemoradiation; ca, carcinoma; NOS, not otherwise specified; CXPA, carcinoma ex pleomorphic adenoma; SCCa, squamous cell carcinoma; LVI, lymphovascular invasion; PNI, perineural invasion; ENE, extranodal extension