Unsupervised Heterogeneous Graph Neural Networks for One-Class Tasks: Exploring Early Fusion Operators

Marcos Paulo Silva Gôlo; Marcelo Isaias de Moraes Junior; Rudinei Goularte; Ricardo Marcondes Marcacini

doi:10.5753/jis.2024.4109

Authors

Marcos Paulo Silva Gôlo University of São Paulo https://orcid.org/0000-0002-9093-8195
Marcelo Isaias de Moraes Junior University of São Paulo https://orcid.org/0000-0002-7831-2165
Rudinei Goularte University of São Paulo https://orcid.org/0000-0003-1531-1576
Ricardo Marcondes Marcacini University of São Paulo https://orcid.org/0000-0002-2309-3487


Keywords:

Heterogeneous Early Fusion, One-Class Learning, Heterogeneous Graphs, Multimodal Graphs

Abstract

Heterogeneous graphs are an essential structure for modeling real-world data through different types of nodes and the relationships between them, including multimodality, which comprises different types of data such as text, image, and audio. Graph Neural Networks (GNNs) are a prominent graph representation learning method that takes advantage of the graph structure and its attributes; when applied to a multimodal heterogeneous graph, they learn a single semantic space for the different modalities. Consequently, they allow multimodal fusion through simple operators such as sum, average, or multiplication, generating unified representations that consider the supplementary and complementary relationships between the modalities. In multimodal heterogeneous graphs, the labeling process tends to be even more costly because of the multiple modalities analyzed, in addition to the class imbalance inherent to some applications. To overcome these problems in applications that involve a single class of interest, One-Class Learning (OCL) is used. Given the lack of studies on multimodal early fusion in heterogeneous graphs for OCL tasks, we proposed a method based on unsupervised GNNs for heterogeneous graphs and evaluated different early fusion operators. In this paper, we extend a previous work by evaluating the behavior of the main GNN convolutions in the method. We highlight that average, addition, and subtraction were the best early fusion operators. In addition, GNN layers that do not use an attention mechanism performed better. We therefore argue for heterogeneous graph neural networks in multimodal settings that use simple early fusion operators, instead of the commonly used concatenation, and less complex convolutions.
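The early fusion operators named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes two modality embeddings (here called text_emb and image_emb, both hypothetical names with made-up values) that already live in the same semantic space and have the same dimension, which is what makes element-wise operators applicable. Concatenation is included only for contrast, since it doubles the dimension instead of merging the values.

```python
from typing import Callable, Dict, List

Vector = List[float]

# Element-wise operators: each returns a vector of the same dimension as the inputs.
def fuse_sum(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]

def fuse_average(a: Vector, b: Vector) -> Vector:
    return [(x + y) / 2 for x, y in zip(a, b)]

def fuse_subtraction(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]

def fuse_multiplication(a: Vector, b: Vector) -> Vector:
    return [x * y for x, y in zip(a, b)]

# Concatenation keeps both vectors side by side, so the dimension grows.
def fuse_concatenation(a: Vector, b: Vector) -> Vector:
    return list(a) + list(b)

OPERATORS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "sum": fuse_sum,
    "average": fuse_average,
    "subtraction": fuse_subtraction,
    "multiplication": fuse_multiplication,
    "concatenation": fuse_concatenation,
}

def early_fusion(a: Vector, b: Vector, operator: str) -> Vector:
    """Fuse two modality embeddings with the named operator."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    # Element-wise operators require embeddings of equal dimension.
    if operator != "concatenation" and len(a) != len(b):
        raise ValueError("element-wise fusion needs equal-length embeddings")
    return OPERATORS[operator](a, b)

# Hypothetical embeddings for one node, assumed to share a semantic space.
text_emb: Vector = [0.5, 1.0, -2.0]
image_emb: Vector = [1.5, 3.0, 4.0]

if __name__ == "__main__":
    for name in OPERATORS:
        print(name, early_fusion(text_emb, image_emb, name))
```

In this sketch the element-wise operators keep the fused representation at the original dimension, whereas concatenation produces a vector twice as long; the choice among them is the design question the abstract evaluates.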


Author Biography

Marcos Paulo Silva Gôlo, University of São Paulo

Holds a bachelor's degree in Information Systems, with an emphasis on Artificial Intelligence, from the Universidade Federal de Mato Grosso do Sul, Três Lagoas campus. He is a master's student in Computer Science and Computational Mathematics at the Instituto de Ciências de Computação e Matemática Computacional of the Universidade de São Paulo in São Carlos, in the Artificial Intelligence research line, and has already passed his qualification exam. His research interest is one-class classification for texts.
