Adapting Convolutions for Effective Omnidirectional Image Processing

Romulo Marconato Stringhini; Thiago Lopes Trugillo da Silveira; Claudio Rosito Jung

doi:10.5753/jbcs.2026.5654

Authors

Romulo Marconato Stringhini, Federal University of Rio Grande do Sul, https://orcid.org/0000-0003-3230-3092
Thiago Lopes Trugillo da Silveira, Federal University of Santa Maria, https://orcid.org/0000-0001-6788-2667
Claudio Rosito Jung, Federal University of Rio Grande do Sul, https://orcid.org/0000-0002-4711-5783

DOI:

https://doi.org/10.5753/jbcs.2026.5654

Keywords:

Omnidirectional Images, Convolutional Neural Network, Distortions, Dilated Convolution, Object Classification, Gravity Alignment

Abstract

Omnidirectional images present unique challenges for traditional convolutional neural networks (CNNs) due to the non-uniform sampling inherent in the equirectangular projection (ERP). This projection introduces distortions, especially near the poles, and the fixed-size kernels of conventional planar CNNs are not designed to handle them effectively. To address this issue, we previously introduced the Spherically-Weighted Horizontally Dilated Convolutions (SWHDC) module, which adjusts dilated convolutions by applying appropriate weights to account for the varying sampling density across ERP image rows. In this extended work, we provide a more comprehensive evaluation of the SWHDC module by benchmarking it against several state-of-the-art methods on the 3D object classification task. Additionally, we integrate SWHDCs into different backbones to further investigate their effectiveness in tackling the gravity alignment problem. Experimental results confirm that our approach improves both classification accuracy and gravity alignment performance without increasing the number of trainable parameters of the baseline backbones. These findings further establish SWHDC as a robust alternative for processing omnidirectional images in different visual computing applications.
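
The non-uniform sampling described above can be illustrated with a short sketch. In an ERP image each row corresponds to a latitude, and a row at latitude phi spans a circle whose circumference scales with cos(phi), so content is stretched horizontally by roughly 1/cos(phi) toward the poles. The code below computes this per-row factor. The function names `erp_row_stretch` and `illustrative_dilation`, the `max_rate` cap, and the rounding of the stretch factor to an integer horizontal dilation rate are assumptions made for illustration only; they are not the SWHDC weighting scheme, which is defined in the paper itself.

```python
import numpy as np


def erp_row_stretch(height: int) -> np.ndarray:
    """Horizontal stretch factor 1/cos(latitude) at the center of each ERP row."""
    rows = np.arange(height)
    # Latitude of each row center, running from near +pi/2 (top) to near -pi/2 (bottom).
    latitude = np.pi / 2 - (rows + 0.5) * np.pi / height
    return 1.0 / np.cos(latitude)


def illustrative_dilation(height: int, max_rate: int = 8) -> np.ndarray:
    """Hypothetical mapping from stretch factor to an integer dilation rate per row.

    This is an assumption for illustration, not the authors' formulation.
    """
    stretch = erp_row_stretch(height)
    return np.clip(np.rint(stretch), 1, max_rate).astype(int)


if __name__ == "__main__":
    # Rows near the equator get a factor close to 1; rows near the poles get larger values.
    print(np.round(erp_row_stretch(8), 3))
    print(illustrative_dilation(8))
```

Under these assumptions, rows near the equator keep a rate of 1 (an ordinary convolution), while rows near the poles receive wider horizontal spacing, which is the intuition behind adapting dilation to the row position.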
