Towards a Statistical Evaluation of PigLatin Joins

Authors

  • Renato Javier M. Mogrovejo No affiliation declared
  • José Maria Monteiro No affiliation declared
  • Javam C. Machado No affiliation declared
  • Carlos Juliano M. Viana No affiliation declared
  • Sérgio Lifschitz PUC-Rio

DOI:

https://doi.org/10.5753/jidm.2013.1511

Keywords:

Cloud databases, MapReduce, PigLatin, Hadoop, Join Evaluation, Multiple Regression

Abstract

Several scalable data management solutions can take advantage of cloud features, which makes them attractive for deployment in such environments. Joining large data sets is one of the most critical operations in data processing, and also one of the most expensive and hardest to optimize. We are mainly concerned here with join operations expressed in PigLatin, a query language for the high-level platform Pig, which generates MapReduce programs that run on Hadoop. In this work we explore statistical methods, namely multiple regression analysis, to predict the execution times of three distinct join types, and we compare the predictions with actual running times.

Published

2013-09-25

How to Cite

Mogrovejo, R. J. M., Monteiro, J. M., Machado, J. C., Viana, C. J. M., & Lifschitz, S. (2013). Towards a Statistical Evaluation of PigLatin Joins. Journal of Information and Data Management, 4(3), 483. https://doi.org/10.5753/jidm.2013.1511

Section

SBBD Articles