A Semiparametric Approach to Mitigating the Impact of Outliers in ROC Curve Generation for Image Analysis

Regis Cortez Bueno; Renan Gimeniz Marques; Raphael Antonio de Souza; Sivanilza Teixeira Machado

doi:10.5753/jbcs.2025.5288

Authors

Regis Cortez Bueno Federal Institute of São Paulo https://orcid.org/0000-0002-2923-4930
Renan Gimeniz Marques Federal Institute of São Paulo https://orcid.org/0009-0008-0782-0614
Raphael Antonio de Souza Federal Institute of São Paulo https://orcid.org/0000-0002-0952-1887
Sivanilza Teixeira Machado Federal Institute of São Paulo https://orcid.org/0000-0003-2746-7885


Keywords:

Receiver Operating Characteristic, ROC Curve, ROC Analysis, Image Analysis, Outliers

Abstract

Artificial intelligence enables the development of machine learning algorithms that identify and categorize patterns in large amounts of data across many domains. Computational tools have been created to analyze these algorithms, allowing their results to be validated and compared. Receiver Operating Characteristic (ROC) analysis is an important statistical technique for evaluating binary classification models. In image analysis, the ROC curve is commonly used as a validation metric that compares images generated by a classification model with reference images created by humans, referred to as Ground Truth (GT). Currently, machine learning algorithms produce ROC curves with a limited number of points, even when trained on large-scale datasets. As a result, outliers can significantly distort the ROC curve and potentially lead to inaccurate conclusions about the model's performance. This study introduces a novel method for preventing outliers in the creation of ROC curves, providing a reliable and robust evaluation of image classification models. We implemented our algorithm in Python and evaluated it on a dataset of 1000 grayscale contour images. Performance was compared against Logistic Regression, SVM, Random Forests, and scikit-learn using ROC curves, AUC, precision, accuracy, and F1-score. Statistical significance was assessed with paired t-tests, effect size was measured with Cohen's d, and outliers were detected with the Local Outlier Factor. The results show that the proposed method, SPROC, produced a more refined curve with more precise AUC values on noisy images than the machine learning approaches.
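To make the abstract's starting point concrete, the following minimal sketch computes an ordinary empirical ROC curve and its trapezoidal AUC by thresholding pixel scores against a binary Ground Truth. It is not the SPROC method described in the paper; the function names `roc_points` and `auc_trapezoid`, the sample pixel values, and the thresholds are all hypothetical and chosen only for illustration. With only three thresholds the curve has three points, so a single misclassified pixel visibly shifts the AUC.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float],
               truth: Sequence[int],
               thresholds: Sequence[float]) -> List[Point]:
    """Return one (FPR, TPR) pair per threshold for pixel-wise scores.

    A pixel is predicted positive when its score is >= the threshold.
    `truth` holds the Ground Truth labels (1 = contour, 0 = background).
    """
    positives = sum(truth)
    negatives = len(truth) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ground truth must contain both classes")
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, truth) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, truth) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points: Sequence[Point]) -> float:
    """Area under the ROC polyline, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical flattened grayscale scores (0-255) and Ground Truth labels.
scores = [200, 180, 90, 60, 220, 30, 120, 160]
truth = [1, 1, 0, 0, 1, 0, 1, 0]

points = roc_points(scores, truth, thresholds=[50, 100, 150])
auc = auc_trapezoid(points)
print(points, auc)
```

In this toy example the background pixel with score 160 acts as an outlier: it is the only negative that survives the two higher thresholds, and it alone pulls the AUC below 1.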


Author Biographies

Renan Gimeniz Marques, Federal Institute of São Paulo

Undergraduate Student in Control and Automation Engineering.

He is assigned to the image pattern recognition laboratory on campus, where he conducted an undergraduate research project in the field of image processing and ROC curves.

Raphael Antonio de Souza, Federal Institute of São Paulo

He holds a master's degree in computer science, is a professor in engineering and business administration programs, and is a researcher at the image pattern recognition laboratory, working on research involving dropout prediction and machine learning. He uses metrics derived from ROC curves to measure the performance of algorithms.

Sivanilza Teixeira Machado, Federal Institute of São Paulo

She is a professor at the Federal Institute of São Paulo (IFSP) – Suzano Campus. She holds a degree in Logistics with an emphasis on Transportation from the Faculty of Technology of São Paulo (FATEC). She is a specialist in Agribusiness Marketing from the Federal University of Paraná (UFPR), holds a master's degree in Agricultural Engineering from the Federal University of Grande Dourados (UFGD), and a Ph.D. in Production Engineering from Paulista University (UNIP). Furthermore, she conducts research in data analysis in the fields of logistics, agribusiness supply chains, and supply networks.
