Empirical Comparison of EEG Signal Classification Techniques through Genetic Programming-based AutoML: An Extended Study
DOI: https://doi.org/10.5753/jidm.2024.3369
Keywords: AutoML, Classification, EEG, End-to-end Machine Learning, Genetic Programming, Sleep Spindles
Abstract
Machine Learning (ML) applications that use complex data often require multiple preprocessing techniques and predictive models to reach a satisfactory solution. In this context, Automated Machine Learning (AutoML) techniques automate data preparation and modeling, helping to improve ML pipelines. AutoML can follow different strategies, among them Genetic Programming (GP). GP stands out for its ability to create pipelines of arbitrary structure with high interpretability, and to incorporate information from the application domain. This paper presents a comparative study of two GP-optimized AutoML approaches for time series classification, in which the data are characterized through four domain-based feature sets. We selected electroencephalogram (EEG) signals as a case study due to their high complexity, spatial and temporal covariance, and non-stationarity. Our data characterization shows that using only spectral or only time-domain features is insufficient to obtain high-performance pipelines. Our results show that AutoML can generate more accurate and interpretable solutions than the complex or ad hoc models found in the literature. The proposed approach also facilitates the analysis of dimensionality reduction through fitness convergence, tree depth, and generated features.
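The abstract contrasts spectral and time-domain characterizations of EEG. As a rough illustration only, the sketch below computes a few time-domain statistics and relative band powers for a single EEG epoch. The specific features, the function names (`time_domain_features`, `relative_band_powers`, `extract_features`), the sampling rate, and the band limits (including a sigma band of 11 to 16 Hz, the range commonly associated with sleep spindles) are assumptions for illustration. They are not the four feature sets used in the study.

```python
import math
from typing import Dict, List, Tuple

Bands = Dict[str, Tuple[float, float]]


def time_domain_features(x: List[float]) -> Dict[str, float]:
    """A few time-domain statistics of one EEG epoch (illustrative choice)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    std = math.sqrt(sum(c * c for c in centered) / n)
    # Number of sign changes in the mean-removed signal.
    zero_crossings = sum(1 for a, b in zip(centered, centered[1:]) if (a < 0) != (b < 0))
    return {"mean": mean, "std": std, "zero_crossings": float(zero_crossings)}


def power_spectrum(x: List[float], fs: float) -> Tuple[List[float], List[float]]:
    """Squared DFT magnitude for bins 0..n/2, via a naive O(n^2) DFT (fine for short epochs)."""
    n = len(x)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(x))
        freqs.append(k * fs / n)
        powers.append(re * re + im * im)
    return freqs, powers


def relative_band_powers(x: List[float], fs: float, bands: Bands) -> Dict[str, float]:
    """Fraction of non-DC spectral power that falls inside each [low, high) band."""
    freqs, powers = power_spectrum(x, fs)
    total = sum(p for f, p in zip(freqs, powers) if f > 0)  # skip the DC bin
    out = {}
    for name, (lo, hi) in bands.items():
        band = sum(p for f, p in zip(freqs, powers) if lo <= f < hi)
        out[f"rel_{name}"] = band / total if total > 0 else 0.0
    return out


def extract_features(x: List[float], fs: float, bands: Bands) -> Dict[str, float]:
    """Concatenate time-domain and spectral descriptors into one feature vector."""
    feats = time_domain_features(x)
    feats.update(relative_band_powers(x, fs, bands))
    return feats


if __name__ == "__main__":
    fs = 128.0  # hypothetical sampling rate (Hz)
    # Synthetic 1 s epoch: a 13 Hz spindle-like tone plus a weaker 3 Hz component.
    epoch = [math.sin(2 * math.pi * 13 * t / fs) + 0.5 * math.sin(2 * math.pi * 3 * t / fs)
             for t in range(128)]
    bands = {"low": (0.5, 11.0), "sigma": (11.0, 16.0)}  # assumed band limits
    print(extract_features(epoch, fs, bands))
```

With this synthetic input, about 80% of the non-DC power falls in the assumed sigma band. A practical pipeline would more likely use an FFT-based or Welch spectral estimate (for example `scipy.signal.welch`). The naive DFT is used here only to keep the sketch dependency-free.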