A Relevance Measure for Multivalued Attributes

Authors

  • Mariana Tasca, Universidade Federal Fluminense
  • Bianca Zadrozny, IBM Research
  • Alexandre Plastino, Universidade Federal Fluminense

DOI:

https://doi.org/10.5753/jidm.2013.1507

Keywords:

attribute selection, classification, multi-relational data mining, multivalued attributes, relevance measures

Abstract

An important step in the knowledge discovery in databases (KDD) process is attribute selection, which aims to choose a subset of attributes that represents the important information in the data. Most existing attribute selection methods handle only simple attribute types, such as categorical and numerical ones. In particular, they cannot be applied to multivalued attributes, which take multiple values simultaneously for the same instance in the dataset. Many real datasets nevertheless contain multivalued attributes; for example, the types of books owned by a person may be represented by a multivalued attribute. This article proposes a relevance measure for multivalued attributes that quantifies their importance for classification. The proposed measure takes into account the attribute's ability to determine the instance class. To evaluate the measure, experiments were conducted with several datasets submitted to multi-relational classifiers. The experiments show that the resulting accuracy values follow, in most cases, the values of the proposed relevance measure. This is evidence that the proposed measure can be a good indicator of the relevance of multivalued attributes for classification.
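
To make the notion of a multivalued attribute concrete, the following minimal Python sketch represents the book-types example as one set of values per instance. The toy data, the class labels, and the function names are assumptions made for illustration only. The score computed here is the ordinary information gain of the binary test "does the instance contain this value?"; it is a generic baseline and not the relevance measure proposed in the article, which the abstract does not define.

```python
from collections import Counter
from math import log2

# Hypothetical toy data: each instance pairs a multivalued attribute
# (the set of book types a person owns) with a class label.
instances = [
    ({"fiction", "science"}, "yes"),
    ({"fiction"}, "no"),
    ({"science", "history"}, "yes"),
    ({"history"}, "no"),
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain_per_value(data):
    """Information gain of the test 'value in attribute set', for each value.

    Generic baseline only; NOT the measure proposed in the article.
    """
    labels = [label for _, label in data]
    base = entropy(labels)
    all_values = set().union(*(values for values, _ in data))
    gains = {}
    for value in all_values:
        present = [label for values, label in data if value in values]
        absent = [label for values, label in data if value not in values]
        # Weighted entropy of the two partitions (empty partitions are skipped).
        remainder = sum(
            len(part) / len(data) * entropy(part)
            for part in (present, absent)
            if part
        )
        gains[value] = base - remainder
    return gains


gains = info_gain_per_value(instances)
for value, gain in sorted(gains.items()):
    print(f"{value}: {gain:.3f}")
```

Because a single instance can hold several values at once, the attribute cannot be treated as one categorical column; scoring each value separately, as above, is one simple workaround, but it yields a score per value rather than a single relevance score for the attribute as a whole.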

Published

2013-09-25

How to Cite

Tasca, M., Zadrozny, B., & Plastino, A. (2013). A Relevance Measure for Multivalued Attributes. Journal of Information and Data Management, 4(3), 421. https://doi.org/10.5753/jidm.2013.1507

Issue

Vol. 4 No. 3 (2013)

Section

SBBD Articles