ENDLESS: An End-to-End Framework for Urban Synthetic Dataset Generation
DOI: https://doi.org/10.5753/jbcs.2025.5869

Keywords: Synthetic Data, Smart Cities, Computer Vision

Abstract
Computer vision models are fundamental for smart city applications. They enable a city to interpret visual data obtained from sensors such as surveillance cameras, optimizing municipal services and improving citizens' lives. However, these models require ever-growing amounts of labeled training data, which is expensive to produce and raises ethical concerns when collected in the real world. Conversely, 3D engines and simulators allow the cost-effective, large-scale generation of automatically annotated synthetic data. This work proposes a synthetic dataset generator for the smart cities field built on the CARLA simulator. The generator produces massive datasets end to end with a single command, covering the simulation of city assets, such as vehicles and pedestrians, as well as the recording and annotation of visual data. To demonstrate the generator's effectiveness, a dataset with over 300K annotated frames was generated and compared with other state-of-the-art datasets. The comparison shows that the generator produces datasets comparable to the state of the art in data volume and number of annotations. The generator is expected to support the creation of useful datasets for training and evaluating computer vision models in the smart cities field, and this work is also expected to draw attention to the use of synthetic data in smart city models.
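As a rough illustration of the kind of end-to-end pipeline the abstract describes, the sketch below uses CARLA's standard Python API to populate a scene and record annotated frames. It is a minimal sketch under stated assumptions: a CARLA server running on localhost:2000, illustrative asset and sensor choices, and an arbitrary output path. It does not reproduce the ENDLESS implementation itself.

    import carla

    # Minimal sketch: connect to a CARLA server assumed to run on localhost:2000.
    client = carla.Client('localhost', 2000)
    client.set_timeout(10.0)
    world = client.get_world()
    blueprints = world.get_blueprint_library()

    # Populate the scene with one autopilot vehicle (a city "asset").
    vehicle_bp = blueprints.filter('vehicle.*')[0]
    spawn_point = world.get_map().get_spawn_points()[0]
    vehicle = world.spawn_actor(vehicle_bp, spawn_point)
    vehicle.set_autopilot(True)

    # Attach an RGB camera for raw frames and a semantic segmentation
    # camera for pixel-level labels; both save every frame to disk.
    cam_transform = carla.Transform(carla.Location(x=1.5, z=2.4))
    rgb_bp = blueprints.find('sensor.camera.rgb')
    rgb_cam = world.spawn_actor(rgb_bp, cam_transform, attach_to=vehicle)
    rgb_cam.listen(lambda img: img.save_to_disk('out/rgb_%06d.png' % img.frame))

    seg_bp = blueprints.find('sensor.camera.semantic_segmentation')
    seg_cam = world.spawn_actor(seg_bp, cam_transform, attach_to=vehicle)
    seg_cam.listen(lambda img: img.save_to_disk('out/seg_%06d.png' % img.frame))

Wrapping a loop like this behind a single command-line entry point is what makes single-command, end-to-end generation practical: the number of vehicles and pedestrians, the weather, and the recording length become parameters rather than manual steps.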
License
Copyright (c) 2025 Amadeo Tato Cota Neto, Willams de Lima Costa, Veronica Teichrieb, Joao Marcelo Teixeira

This work is licensed under a Creative Commons Attribution 4.0 International License.

