\section{Introduction}


The Web 2.0  is characterized by the central role played by  users in the creation and sharing of  their own content. 
This content usually comprises a main \textit{object} (e.g., a video) and several sources of data associated with it, referred to as its  \textit{features}.  An object's \textit{textual features} are well defined blocks of text such as title, tags, description and user comments. Among all textual features, tags have attracted special attention  as they offer an effective data source  for Information Retrieval (IR) services such as search \cite{liwww2008} and classification \cite{IPM_flavio}.
Thus, there is a large interest in developing strategies to recommend tags to users, improving the quality of the generated tags and, indirectly, of the IR services that rely on them as data source.

Most existing tag recommendation strategies treat the problem as a multiple candidate tag ranking problem, sorting tag candidates according to some metric of  relevance and recommending tags that are in the top positions of the generated ranking \cite{belem_sigir2011,lipczak11,wu_www09}.  The relevance of a candidate usually refers to how well it \textit{describes} the content of the target object or helps to \textit{discriminate} it from other objects. 
This modeling of the problem motivates the use of Learning-to-Rank (L2R) based strategies to automatically ``learn'' good tag ranking functions. However, previous work on tag recommendation has explored only three different L2R techniques, namely, Genetic Programming (GP), RankSVM and RankBoost.  While \cite{cao_2009} proposed the use of RankSVM, \cite{wu_www09} explored RankBoost, although neither compared the adopted approach with alternative L2R methods. Similarly, in \cite{belem_sigir2011}, we investigated the use of both GP and RankSVM  to design tag recommendation methods, comparing them against each other as well as novel unsupervised heuristic methods that extended previously proposed strategies. Our results indicated that L2R techniques provide some gains over the best heuristic. Moreover, the flexibility of  the  L2R framework in terms of the incorporation of new attributes and ability to maximize different target measures makes it an attractive solution for the tag recommendation problem.  

These previous studies of L2R  techniques for tag recommendation are somewhat limited since: (1)  only three techniques have been considered,  (2)  only two  techniques have been directly compared against each other, and (3) the considered techniques have been evaluated with respect to   effectiveness only (efficiency aspects have been disregarded). However, given the success of L2R on several other domains \cite{gomes_2013,faria_mir2010}, there is currently a  large body of work on  L2R techniques. Moreover, the efficiency of these techniques  in terms of recommendation time, which is a time sensitive online task, should be taken into account when comparing them. This is an important performance measure since the user must wait for recommendations to be provided and may abandon the system if the waiting time is beyond a certain level.


%In \cite{belem_sigir2011}, we extended   existing tag recommendation methods that are based on tag co-occurrence patterns  to generate candidates not only from tags that have been previously assigned to the objects (including the target object), but also from terms contained in  other textual features. We also exploited  several heuristic metrics to capture the  \textit{relevance} of each such term for a target object, focusing on object-centered recommendations.  That is, we developed  functions to estimate the relevance of a candidate term as a  recommendation for a given object, thus enabling us to  rank the candidates according to such estimations and recommend the most relevant ones as tags for the object.  

%Learning-to-Rank (L2R) based strategies have emerged as a promising and flexible solution for this problem. In common, these strategies exploit a list of relevance metrics as atributes and a training dataset to ``learn'' the best way to combine these metrics into a recommendation function. One such effort is our  prior investigation \cite{belem_sigir2011},  where  both RankSVM and Genetic Programming were applied to the tag recommendation problem. Two other related efforts are those by  Cao {\it et al.} \cite{cao_2009} and Wu {\it et al}. \cite{wu_www09}, which exploit RankSVM and RankBoost as learning-to-rank techniques, respectively. 

%However, previous work proposed only three different L2R techniques, namely, \textit{Genetic Programming} ($GP$), $RankSVM$ and $RankBoost$, comparing at most two of them with relation to their effectiveness (e.g., precision) only. 


%In contrast, we here compare not only the effectiveness, but also the efficiency (in recommendation time) of eight L2R techniques: \textit{Random Forest} ($RF$), $MART$, $\lambda$-$MART$,  $ListNet$, $AdaRank$ and the three aforementioned techniques.

In this context, we here perform an extensive evaluation of the use of different L2R techniques for tag recommendation. Our main contributions are threefold. First, we compare eight L2R techniques, including  the three aforementioned methods as well as five techniques that have not been previously exploited for tag recomendation. These techniques are \textit{Random Forest} ($RF$), $MART$, $\lambda$-$MART$,  $ListNet$ and $AdaRank$. Second, our evaluation comprises not only  effectiveness (e.g.,  precision, NDCG \cite{MIR} and recall of recommended tags), but also efficiency (i.e., recommendation time), which has been ignored in previous work. Third, our experiments exploit real data collected from five popular Web 2.0 applications, namely, Bibsonomy, LastFM, MovieLens, YahooVideo and YouTube\footnote{\url{http://www.bibsonomy.org}, \url{http://www.last.fm}, \url{http://www.movielens.org}, \url{http://www.yahoo.com/video}, and \url{http://www.youtube.com}, respectively.}, which is a much larger number of datasets than used by  previous work. We compare all eight L2R methods against each other as well as against a state-of-the-art heuristic (the best unsupervised method proposed in \cite{belem_sigir2011}).  To our knowledge, this is the most comprehensive experimental evaluation of the use of L2R methods for tag recommendation. 

Our results show that, unlike  existing comparisons of different L2R techniques in other domains (e.g., document ranking) \cite{gomes_2013}, for tag recommendation, there is a clear winning group of methods ($RF$, $MART$ and $\lambda$-$MART$), with a slight advantage of two ($RF$ and $\lambda$-$MART$) over the other, with gains in NDCG \cite{MIR} ranging from 4\% to 12\% over the best of the remaining  L2R techniques considered.  We also find that recommendation time, despite some variation across methods, is under 1.3 seconds, on average, for all L2R techniques, in a worst case scenario in which no pre-computed data is available in cache. Thus, this recommendation time is reasonably short for an interactive task. Finally, we find that the best L2R method significantly outperforms the best unsupervised solution, with gains in NDCG of 29\%, on average. These results confirm   the  feasibility and effectiveness of the L2R approach for the tag recommendation task.


The rest of this article is organized as follows. Section 2 discusses related work, whereas Section 3 defines the problem. Section 4 introduces the methods analyzed here, while our experimental methodology and results are presented in Sections 5 and 6, respectively. Finally, conclusions and future work are offered in Section 7.