\section{Tag Recommendation: Problem Statement} \label{sec:rec}

A Web 2.0 \textit{object} is an instance of a media item (e.g., text, audio, video, or image) in an application. Each object has various associated features.
{\it Textual features}, the main source of information exploited by the tag recommendation strategies considered here, are self-contained blocks of text, usually with a well-defined functionality \cite{IPM_flavio}.
Here we exploit the following textual features, which are commonly found in various applications: \textit{tags}, \textit{title}, and \textit{description}.
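As a minimal illustration, the Python sketch below represents the three textual features of an object as sets of terms. The tokenizer (lowercasing and keeping runs of ASCII letters and digits) and the function names \texttt{to\_terms} and \texttt{textual\_features} are hypothetical choices made only for illustration; the preprocessing actually applied to each feature may differ.

```python
import re

def to_terms(text):
    """Hypothetical tokenizer: lowercase the text and keep runs of ASCII letters/digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def textual_features(tags, title, description):
    """Represent an object's textual features (tags, title, description) as term sets."""
    return {
        "tags": {t.lower() for t in tags},
        "title": to_terms(title),
        "description": to_terms(description),
    }

obj = textual_features(["Rock", "live"], "Live at the Apollo",
                       "A live rock concert, recorded in 1968.")
print(sorted(obj["title"]))  # ['apollo', 'at', 'live', 'the']
```

Representing each feature as a set (rather than a list) discards term order and repetitions, which matches the view of a textual feature as a set of terms.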



As in \cite{belem_sigir2011}, we define the tag recommendation problem as follows. Given a target object $o$, a set of input tags $I_o$ associated with it, and a set of (other) textual features $F_o = \{F_o^1, F_o^2, \ldots, F_o^n\}$, where each element $F_o^i$ is the set of terms in textual feature $i$ of object $o$, generate a set of candidate tags $C_o$ (with $C_o \cap I_o = \emptyset$), sorted according to their relevance to $o$\footnote{We focus on the scenario in which the target object already has some initial tags (i.e., $I_o \neq \emptyset$) and we want to recommend new (different) tags for it. Our methods can also recommend relevant tags to an object with no tags by exploiting other textual features and relevance metrics. We plan to investigate this scenario in future work.}. In this scenario, the tag recommendation methods that have consistently produced the most competitive results often exploit term co-occurrence patterns with the tags previously assigned to the target object (i.e., tags in $I_o$), possibly combined with other metrics of tag relevance \cite{belem_sigir2011,menezes2010,lipczak11}.
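To make the definition concrete, the following Python sketch ranks candidate tags for a target object by their co-occurrence with the input tags $I_o$ in a collection of previously tagged objects. The scoring function, a sum over $t \in I_o$ of the fraction of objects tagged with $t$ that are also tagged with candidate $c$ (i.e., the confidence of the association rule $t \rightarrow c$), is an illustrative assumption and not necessarily the metric used by the cited methods; the names \texttt{build\_cooccurrence} and \texttt{recommend}, and the cutoff \texttt{k}, are likewise hypothetical. Candidates already in $I_o$ are excluded, so that $C_o \cap I_o = \emptyset$.

```python
from collections import Counter, defaultdict

def build_cooccurrence(training_tag_sets):
    """Count tag frequencies and pairwise co-occurrences over training objects."""
    tag_freq = Counter()       # tag_freq[t]: number of objects tagged with t
    co = defaultdict(Counter)  # co[t][c]: number of objects tagged with both t and c
    for tags in training_tag_sets:
        tags = set(tags)
        tag_freq.update(tags)
        for t in tags:
            for c in tags:
                if c != t:
                    co[t][c] += 1
    return tag_freq, co

def recommend(input_tags, tag_freq, co, k=5):
    """Rank candidates c not in input_tags by the sum over t of co[t][c] / tag_freq[t]."""
    input_tags = set(input_tags)
    scores = Counter()
    for t in input_tags:
        if tag_freq[t] == 0:
            continue  # input tag unseen in training: contributes no evidence
        for c, n in co[t].items():
            if c not in input_tags:  # enforce C_o ∩ I_o = ∅
                scores[c] += n / tag_freq[t]
    # Decreasing score; ties broken alphabetically for a deterministic order.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:k]

training = [{"rock", "live", "concert"}, {"rock", "guitar"},
            {"live", "concert"}, {"rock", "live"}]
tag_freq, co = build_cooccurrence(training)
print(recommend({"rock"}, tag_freq, co))          # ['live', 'concert', 'guitar']
print(recommend({"rock", "live"}, tag_freq, co))  # ['concert', 'guitar']
```

Summing over input tags rewards candidates that co-occur with several of them (here, \texttt{concert} rises to the top once \texttt{live} is also given as input); alphabetical tie-breaking only serves to make the ranking reproducible.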
Here, we include metrics that capture such co-occurrence patterns among the attributes exploited by the L2R methods under consideration (see Section \ref{sec:metrics}).
 
To learn such co-occurrence patterns and compute relevance metrics, we exploit a training set $\mathcal{D} = \{ \langle I_d, F_d \rangle \}$, where $I_d$ ($I_d \neq \emptyset$) contains all tags assigned to object $d$, and $F_d$ contains the term sets of the other textual features associated with $d$. There is also a test set $\mathcal{O}$, which is a collection of tuples $\{\langle I_o, F_o, Y_o\rangle\}$, where both $I_o$ and $Y_o$ are sets of tags associated with object $o$. However, while the tags in $I_o$ are known (and given as input to the recommender), the tags in $Y_o$ are assumed to be unknown and are taken as the relevant recommendations for the target object $o$ (i.e., the {\it gold standard}). Splitting the tags of each test object in this way enables an automatic assessment of the recommendations, as performed in various previous studies \cite{pairwise2010,yin_wsdm2013} and further discussed in Section \ref{sec:metodologia}. Similarly, a validation set $\mathcal{V}$ is used for tuning parameters and ``learning'' recommendation functions (see Section \ref{sec:metodologia}). Thus, the tag set of each object $v$ in $\mathcal{V}$ is also split into input tags ($I_v$) and gold standard ($Y_v$).
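The Python sketch below illustrates how the tags of a test (or validation) object could be split into $I_o$ and $Y_o$, and how a ranked list of candidates could then be scored against the gold standard. The seeded random split, the \texttt{input\_fraction} parameter, the function names, and the choice of precision at $k$ are all assumptions made for illustration only; the split and evaluation procedure actually adopted are those discussed in Section \ref{sec:metodologia}.

```python
import random

def split_tags(tags, input_fraction=0.5, seed=0):
    """Hypothetical split of an object's tags into input tags (I_o) and gold standard (Y_o)."""
    tags = sorted(set(tags))  # canonical order so the seeded shuffle is reproducible
    random.Random(seed).shuffle(tags)
    # At least one input tag whenever the object has any tags (I_o non-empty).
    n_input = max(1, int(len(tags) * input_fraction))
    return set(tags[:n_input]), set(tags[n_input:])

def precision_at_k(ranked, gold, k):
    """Fraction of the top-k recommended tags that belong to the gold standard."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for c in ranked[:k] if c in gold) / k

input_tags, gold = split_tags(["rock", "live", "concert", "guitar"])
print(len(input_tags), len(gold))  # 2 2
print(precision_at_k(["concert", "drums", "guitar"], {"concert", "guitar"}, k=2))  # 0.5
```

Dividing by $k$ rather than by the length of the returned list penalizes recommenders that return fewer than $k$ candidates; this is one common convention among several.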


%Next, we briefly present several metrics that can be used to estimate the relevance of a candidate for tag recommendation. They have been  previously applied for recommending tags \cite{belem_sigir2011, sigur2008ftr, menezes2010, heymann_sigir08}, and here are used as attributes of various different L2R based methods.