Volume 8 : Number 2 : Paper 1

December 2005 Regular Issue with two papers selected from CIESC 2004
Title:
Accuracy and Diversity in Ensembles of Text Categorisers

Authors and Affiliations:
Juan Jose Garcia Adeva, School of Electrical and Information Engineering, University of Sydney, NSW 2006, Australia
Ulises Cervino, Instituto de Física, Santa Fe, Argentina
Rafael A. Calvo, School of Electrical and Information Engineering, University of Sydney, NSW 2006, Australia

Abstract:
Error-Correcting Output Codes (ECOC) ensembles of binary classifiers are used in Text Categorisation to improve accuracy while benefiting from learning algorithms that only support two classes. An accurate ensemble relies on the quality of its decomposition matrix, which in turn depends on the separation between the categories and on the diversity of the dichotomies represented by the binary classifiers. Important open questions include finding a good definition of diversity between two dichotomies and a way of combining all the pairwise diversity values into a single indicator, which we call the decomposition quality. In this work we introduce a new measure to estimate the diversity between two learners and compare it to the well-known Hamming distance. We also examine three functions to evaluate the decomposition quality. We present a set of experiments in which these measures and functions are tested on two distinct document corpora, with several configurations for each. The analysis of the results shows a weak relationship between ensemble accuracy and diversity.
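The abstract does not define the new diversity measure or the three decomposition-quality functions, so the following is only a minimal Python sketch of the baseline they are compared against: the Hamming distance between two dichotomies (columns of the decomposition matrix), with the pairwise values combined into one score. The ±1 matrix `M` and the `quality_min` / `quality_mean` aggregations are hypothetical illustrations chosen for this sketch, not the paper's functions. `row_separation` illustrates the separation between categories (minimum Hamming distance between codewords).

```python
from itertools import combinations
from statistics import mean

# Hypothetical decomposition matrix for 4 categories and 4 dichotomies:
# one row per category (its codeword), one column per binary classifier,
# entries +1 / -1 saying which side of the dichotomy the category falls on.
M = [
    [+1, +1, +1, +1],
    [+1, -1, -1, +1],
    [-1, +1, -1, +1],
    [-1, -1, +1, -1],
]

def column(matrix, j):
    """Return dichotomy j as a tuple of per-category labels."""
    return tuple(row[j] for row in matrix)

def hamming(a, b):
    """Number of positions at which two label vectors disagree."""
    return sum(x != y for x, y in zip(a, b))

def pairwise_diversity(matrix):
    """Hamming distance for every unordered pair of dichotomies (columns)."""
    n_cols = len(matrix[0])
    cols = [column(matrix, j) for j in range(n_cols)]
    return {
        (i, j): hamming(cols[i], cols[j])
        for i, j in combinations(range(n_cols), 2)
    }

def quality_min(matrix):
    """Illustrative decomposition quality: diversity of the least diverse pair."""
    return min(pairwise_diversity(matrix).values())

def quality_mean(matrix):
    """Illustrative decomposition quality: average pairwise diversity."""
    return mean(pairwise_diversity(matrix).values())

def row_separation(matrix):
    """Category separation: minimum Hamming distance between codewords (rows)."""
    return min(hamming(a, b) for a, b in combinations(matrix, 2))

if __name__ == "__main__":
    print("pairwise:", pairwise_diversity(M))
    print("min quality:", quality_min(M))
    print("mean quality:", quality_mean(M))
    print("row separation:", row_separation(M))
```

Note that plain Hamming distance rates a column and its complement as maximally distant even though they define the same binary problem; the example matrix avoids complementary columns so the numbers stay easy to interpret.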

Received July 16, 2005; Revised December 2, 2005; Editor: Mauricio Solar
Full paper, 12 pages (PDF, 1209 Kb)