Enhancing Video Quality Using a Multi-Domain Spatio-Temporal Deformable Fusion Approach

Authors

G. da S. Júnior, G. Kreisler, B. Zatt, D. Palomino, and G. Corrêa

DOI:

https://doi.org/10.5753/jbcs.2025.5385

Keywords:

Video Compression, Video Quality Enhancement, Multi-Domain Learning

Abstract

Video compression is essential for efficient data management but often introduces artifacts that degrade visual quality. This work presents the Multi-Domain Spatio-Temporal Deformable Fusion (MD-STDF) architecture, which employs a multi-domain learning approach to enhance the quality of videos compressed with codecs such as HEVC, VVC, AV1, and VP9. By training on multiple domains, the model adapts effectively to diverse artifact patterns and compression scenarios. Experimental results show that MD-STDF achieves significant quality improvements, with average ∆PSNR gains of 0.764 dB for HEVC, 0.695 dB for VVC, 0.359 dB for VP9, and 0.228 dB for AV1. The model also demonstrates resilience under different compression rates, with BD-Rate values indicating that video quality can be efficiently restored in high-compression scenarios for VP9 (-16.50%) and HEVC (-14.59%). Visual analysis shows a reduction in artifacts, resulting in perceptible improvements in subjective quality.
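The ∆PSNR figures above are the average gain, in dB, of the enhanced frames over the decoded (compressed) frames, both measured against the uncompressed originals. The minimal Python/NumPy sketch below illustrates how such a per-frame ∆PSNR can be computed; the function names and the synthetic frames are illustrative assumptions, not part of the authors' implementation.

    import numpy as np

    def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
        """Peak signal-to-noise ratio in dB between two frames of equal shape."""
        mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
        if mse == 0:
            return float("inf")
        return 10.0 * np.log10(peak ** 2 / mse)

    def delta_psnr(original: np.ndarray, compressed: np.ndarray, enhanced: np.ndarray) -> float:
        """Quality gain of the enhanced frame over the decoded (compressed) frame."""
        return psnr(original, enhanced) - psnr(original, compressed)

    # Synthetic 8-bit luma frames, for illustration only.
    rng = np.random.default_rng(0)
    original = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    compressed = np.clip(original.astype(int) + rng.integers(-8, 9, size=(64, 64)), 0, 255)
    enhanced = np.clip(original.astype(int) + rng.integers(-4, 5, size=(64, 64)), 0, 255)
    print(f"Delta-PSNR: {delta_psnr(original, compressed, enhanced):.3f} dB")

In practice the metric is averaged over all frames of a sequence and over the test set, while BD-Rate summarizes bitrate savings at equal quality from rate-distortion curves following Bjøntegaard's method.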

Published

2025-08-06

How to Cite

Júnior, G. da S., Kreisler, G., Zatt, B., Palomino, D., & Corrêa, G. (2025). Enhancing Video Quality Using a Multi-Domain Spatio-Temporal Deformable Fusion Approach. Journal of the Brazilian Computer Society, 31(1), 583–597. https://doi.org/10.5753/jbcs.2025.5385

Issue

Vol. 31 No. 1 (2025)

Section

Articles