Anomaly Detection in Sound Activity with Generative Adversarial Network Models

Wilson A. de Oliveira Neto; Elloá B. Guedes; Carlos Maurício S. Figueiredo

doi:10.5753/jisa.2024.3897

Authors

Wilson A. de Oliveira Neto, Amazonas Federal University, https://orcid.org/0000-0001-9027-3966
Elloá B. Guedes, Amazonas State University, https://orcid.org/0000-0002-7264-701X
Carlos Maurício S. Figueiredo, Amazonas State University, https://orcid.org/0000-0002-4484-4411

Keywords:

Anomaly Detection, Sound Activity, Generative Adversarial Networks, Deep Learning

Abstract

State-of-the-art anomaly detection research predominantly employs Generative Adversarial Networks and Autoencoders for image-based applications. Despite their demonstrated efficacy in the visual domain, few studies apply these architectures to anomaly detection in the sound domain. This paper introduces adaptations of these architectures for anomaly detection in audio and conducts a comparative analysis to assess the viability of the approach. The evaluation is performed on the DCASE 2020 dataset, which comprises over 180 hours of industrial machinery sound recordings. Our results indicate superior anomaly classification, with an average Area Under the Curve (AUC) of 88.16% and a partial AUC of 78.05%, surpassing the established baselines. This study thus extends the applicability of these architectures to the audio domain and demonstrates their effectiveness in industrial sound anomaly detection.
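
The AUC and partial AUC metrics named in the abstract can be computed from per-recording anomaly scores. The following is a minimal sketch in plain Python, not the authors' evaluation code: the function names, the toy labels and scores, and the choice of `max_fpr=0.1` are illustrative assumptions. The partial AUC is standardized with the McClish correction, so that 0.5 corresponds to chance and 1.0 to a perfect detector. Both classes are assumed to be present in the labels.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr); label 1 = anomalous, higher score = more anomalous."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sweep the decision threshold from the highest score to the lowest.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Tied scores are consumed together so they form a single ROC point.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def raw_auc(labels, scores, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the FPR cut-off.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def partial_auc(labels, scores, max_fpr=0.1):
    """McClish-standardized partial AUC over FPR in [0, max_fpr]."""
    min_area = 0.5 * max_fpr ** 2   # area under the chance diagonal
    max_area = max_fpr              # area under a perfect detector
    area = raw_auc(labels, scores, max_fpr)
    return 0.5 * (1 + (area - min_area) / (max_area - min_area))


# Toy data (hypothetical): four normal and four anomalous recordings.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
print(f"AUC  = {raw_auc(labels, scores):.4f}")      # AUC  = 0.9375
print(f"pAUC = {partial_auc(labels, scores):.4f}")  # pAUC = 0.8684
```

The partial AUC restricts the evaluation to the low false-positive region, which is why it is usually lower than the full AUC for the same detector, as in the figures reported above.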
