Foundation Models for Time Series Forecasting: Evidence from the Fuel Sector
DOI: https://doi.org/10.5753/jbcs.2026.6840

Keywords: Forecasting, Foundation models, Zero-shot, Fine-tuning, Fuel demand

Abstract
Foundation Models (FMs), typically based on large pre-trained architectures such as Transformers, have significantly advanced the fields of Natural Language Processing and Computer Vision and are increasingly being adapted to time series analysis, particularly for forecasting. However, systematic empirical evidence on their performance compared to traditional statistical, machine learning, and deep learning models on truly unseen time series data is limited, as many benchmark datasets may have been partially exposed during pre-training. This study provides empirical evidence from the fuel sector by benchmarking six state-of-the-art FMs against ten traditional forecasting methods, using 34 years of monthly fuel demand data with diverse and complex patterns. Accurate short-term forecasting of fuel demand is critical for decision-making across transportation, energy, and industry, making this domain particularly suitable for evaluating FMs’ capabilities. To this end, we assess both zero-shot inference and multiple fine-tuning strategies. Our results show that certain FMs, including Chronos and TimesFM, rank among the top-performing models in terms of RRMSE and POCID across zero-shot and fine-tuning settings, while classical statistical models such as ETS remain competitive. These findings have the potential to guide model selection in the fuel domain and similar real-world applications.
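Since the results above are reported in terms of RRMSE and POCID, here is a minimal sketch of how these two metrics are commonly computed (definitions follow Parmezan et al., 2019, listed in the references; the function names and the choice of a naive baseline for RRMSE are ours, not from the paper):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between actual and forecast values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rrmse(y_true, y_pred, y_baseline):
    """Relative RMSE: the model's RMSE divided by a reference forecast's
    RMSE (e.g. a naive last-value forecast). Values below 1 mean the
    model outperforms the baseline."""
    return rmse(y_true, y_pred) / rmse(y_true, y_baseline)

def pocid(y_true, y_pred):
    """Prediction Of Change In Direction: percentage of steps at which
    the forecast moves in the same direction as the actual series."""
    hits = sum(
        1
        for i in range(1, len(y_true))
        if (y_pred[i] - y_pred[i - 1]) * (y_true[i] - y_true[i - 1]) > 0
    )
    return 100.0 * hits / (len(y_true) - 1)
```

RRMSE normalizes accuracy against a reference method, which makes scores comparable across fuel series with very different scales, while POCID captures whether a model at least gets the direction of month-over-month demand changes right.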
References
Akiba, T., Sano, S., Yanase, T., Ohta, T., and Koyama, M. (2019). Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2623-2631. DOI: 10.1145/3292500.3330701.
Ansari, A. F., Stella, L., Turkmen, A. C., Zhang, X., Mercado, P., Shen, H., Shchur, O., Rangapuram, S. S., Arango, S. P., Kapoor, S., Zschiegner, J., Maddix, D. C., Wang, H., Mahoney, M. W., Torkkola, K., Wilson, A. G., Bohlke-Schneider, M., and Wang, B. (2024). Chronos: Learning the language of time series. Transactions on Machine Learning Research, pages 1-42. DOI: 10.48550/arXiv.2403.07815.
Benavoli, A., Corani, G., and Mangili, F. (2016). Should we really use post-hoc tests based on mean-ranks? The Journal of Machine Learning Research, 17(1):152-161. DOI: 10.5555/2946645.2946650.
Blázquez-García, A., Conde, A., Mori, U., and Lozano, J. A. (2021). A review on outlier/anomaly detection in time series data. ACM Computing Surveys, 54(3):1-33. DOI: 10.1145/3444690.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877-1901. DOI: 10.5555/3495724.3495883.
Das, A., Kong, W., Sen, R., and Zhou, Y. (2024). A decoder-only foundation model for time-series forecasting. In Proceedings of the International Conference on Machine Learning. JMLR.org. DOI: 10.5555/3692070.3692474.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7(Jan):1-30. DOI: 10.5555/1248547.1248548.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Burstein, J., Doran, C., and Solorio, T., editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171-4186. Association for Computational Linguistics. DOI: 10.18653/v1/N19-1423.
Elsharkawy, A. et al. (2017). Analyzing gasoline consumption in the U.S.: Evidence of seasonal patterns and driving behavior. Transportation Research Part D: Transport and Environment, 53:49-62. DOI: 10.1016/j.trd.2016.12.010.
Frutuoso, F., Alves, C., Araújo, S., Serra, D., Barros, A., Cavalcante, F., Araújo, R., Policarpo, N., and Oliveira, M. (2023). Assessing light flex-fuel vehicle emissions with ethanol/gasoline blends along an urban corridor: a case of fortaleza/brazil. International Journal of Transportation Science and Technology, 12(2):447-459. DOI: 10.1016/j.ijtst.2022.04.001.
Garza, A. and Mergenthaler-Canseco, M. (2023). TimeGPT-1. arXiv. DOI: 10.48550/arXiv.2310.03589.
Godahewa, R., Bergmeir, C., Webb, G. I., Hyndman, R. J., and Montero-Manso, P. (2021). Monash time series forecasting archive. In Neural Information Processing Systems Track on Datasets and Benchmarks. DOI: 10.48550/arXiv.2105.06643.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780. DOI: 10.1162/neco.1997.9.8.1735.
Hyndman, R., Koehler, A., Ord, K., and Snyder, R. (2008). Forecasting with exponential smoothing: the state space approach. Springer.
Ismail-Fawaz, A., Dempster, A., Tan, C. W., Herrmann, M., Miller, L., Schmidt, D. F., Berretti, S., Weber, J., Devanne, M., Forestier, G., et al. (2023). An approach to multiple comparison benchmark evaluations that is stable under manipulation of the comparate set. arXiv preprint arXiv:2305.11921. DOI: 10.48550/arXiv.2305.11921.
Ko, A. H., Sabourin, R., and Britto Jr, A. S. (2008). From dynamic classifier selection to dynamic ensemble selection. Pattern Recognition, 41(5):1718-1731. DOI: 10.1016/j.patcog.2007.10.015.
Krause, J., Beiruth, A. C., Barddal, J. P., Britto Jr, A. S., and Souza, V. M. A. (2024). Fuels demand forecasting: Identifying leading feature sets, prediction strategy, and regressors. In Proceedings of the International Conference on Machine Learning and Applications, pages 957-962. DOI: 10.1109/ICMLA61862.2024.00141.
Kruger, R., Mueen, A., and Souza, V. M. A. (2024). Peak prediction in time series: Comparing approaches for energy high-load prediction. In Proceedings of the International Joint Conference on Neural Networks, pages 1-8. IEEE. DOI: 10.1109/IJCNN60899.2024.10651140.
Lepot, M., Aubin, J.-B., and Clemens, F. H. (2017). Interpolation in time series: An introductive overview of existing methods, their performance criteria and uncertainty assessment. Water, 9(10):796. DOI: 10.3390/w9100796.
Liang, Y., Wen, H., Nie, Y., Jiang, Y., Jin, M., Song, D., Pan, S., and Wen, Q. (2024). Foundation models for time series analysis: A tutorial and survey. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 6555-6565. DOI: 10.1145/3637528.3671451.
Lima, F. T. and Souza, V. M. A. (2023). A large comparison of normalization methods on time series. Big Data Research, 34:100407. DOI: 10.1016/j.bdr.2023.100407.
Makridakis, S. and Hibon, M. (2000). The M3-Competition: Results, conclusions and implications. International Journal of Forecasting, 16(4):451-476. DOI: 10.1016/S0169-2070(00)00057-1.
Makridakis, S., Spiliotis, E., and Assimakopoulos, V. (2020). The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1):54-74. DOI: 10.1016/j.ijforecast.2019.04.014.
Miller, J. A., Aldosari, M., Saeed, F., Barna, N. H., Rana, S., Arpinar, I. B., and Liu, N. (2024). A survey of deep learning and foundation models for time series forecasting. arXiv preprint arXiv:2401.13912. DOI: 10.48550/arXiv.2401.13912.
Oreshkin, B. N., Carpov, D., Chapados, N., and Bengio, Y. (2020). N-BEATS: neural basis expansion analysis for interpretable time series forecasting. In Proceedings of the International Conference on Learning Representations. DOI: 10.48550/arXiv.1905.10437.
Parmezan, A. R. S., Souza, V. M. A., and Batista, G. E. (2022). Time series prediction via similarity search: Exploring invariances, distance measures and ensemble functions. IEEE Access, 10:78022-78043. DOI: 10.1109/ACCESS.2022.3192849.
Parmezan, A. R. S., Souza, V. M. A., and Batista, G. E. A. P. A. (2019). Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model. Information Sciences, 484(5):302-337. DOI: 10.1016/j.ins.2019.01.076.
Parsons, S. D. (1980). Estimating fuel requirements for field operations. Technical Report AE-110, Purdue University Cooperative Extension Service, West Lafayette, IN.
Policarpo, N. A., Silva, C., Lopes, T. F. A., dos Santos Araújo, R., Cavalcante, F. S. Á., Pitombo, C. S., and de Oliveira, M. L. M. (2018). Road vehicle emission inventory of a brazilian metropolitan area and insights for other emerging economies. Transportation Research Part D: Transport and Environment, 58:172-185. DOI: 10.1016/j.trd.2017.12.004.
Rasul, K., Ashok, A., Williams, A. R., Khorasani, A., Adamopoulos, G., Bhagwatkar, R., Biloš, M., Ghonia, H., Hassen, N., Schneider, A., Garg, S., Drouin, A., Chapados, N., Nevmyvaka, Y., and Rish, I. (2023). Lag-Llama: Towards foundation models for time series forecasting. In R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models. DOI: 10.48550/arXiv.2310.08278.
Serrano, A. L. M., dos Santos Martins, P. H., Vergara, G. F., Bispo, G. D., Rodrigues, G. A. P., Mosquéra, L. R., Oliveira, M. N. d., Neumann, C., Peixoto, M. G. M., and Gonçalves, V. P. (2025). Forecasting ethanol and gasoline consumption in brazil: Advanced temporal models for sustainable energy management. Energies, 18(6):1501. DOI: 10.3390/en18061501.
Shi, X., Wang, S., Nie, Y., Li, D., Ye, Z., Wen, Q., and Jin, M. (2024). Time-MoE: Billion-scale time series foundation models with mixture of experts. In Proceedings of the International Conference on Learning Representations, pages 1-16. DOI: 10.48550/arXiv.2409.16040.
Siddiquee, M. A., Souza, V. M., Baker, G. E., and Mueen, A. (2022). Septor: Seismic depth estimation using hierarchical neural networks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3889-3897. DOI: 10.1145/3534678.3539166.
Taylor, S. J. and Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1):37-45. DOI: 10.1080/00031305.2017.1380080.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. DOI: 10.48550/arXiv.1706.03762.
Wang, W., Zheng, V. W., Yu, H., and Miao, C. (2019). A survey of zero-shot learning: Settings, methods, and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2):1-37. DOI: 10.1145/3293318.
Woo, G., Liu, C., Kumar, A., Xiong, C., Savarese, S., and Sahoo, D. (2024). Unified training of universal time series forecasting transformers. In Proceedings of the 41st International Conference on Machine Learning, volume 234, pages 53140-53164. DOI: 10.5555/3692070.3694248.
Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. (2014). How transferable are features in deep neural networks? Advances in Neural Information Processing Systems, 27. DOI: 10.48550/arXiv.1411.1792.
Zeng, A., Chen, M., Zhang, L., and Xu, Q. (2023). Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 11121-11128. DOI: 10.1609/aaai.v37i9.26317.
Zhu, Z., Chen, H., Qu, Q., and Chung, V. (2025). Fincast: A foundation model for financial time-series forecasting. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, pages 4539-4549. DOI: 10.1145/3746252.3761261.
License
Copyright (c) 2026 Jonas Krause, Alex C. D. Lopes, Lucas G. M. Castro, André G. R. Ribeiro, Marcos A. Mochinski, Emerson Cabrera Paraiso, Fabrício Enembreck, Jean Paul Barddal, Alceu de Souza Britto Jr, Vinicius M. A. Souza

This work is licensed under a Creative Commons Attribution 4.0 International License.

