The Ferrety algorithm for the KDD Cup 2005 problem

Abstract

In this paper, we present a general solution for the KDD Cup 2005 problem. It uses the Internet as source of knowledge and extends it to categorize very short (less than 5 words) documents with reasonable accuracy. Our approach consists of three main parts: i.) a central knowledge filter ii.) an on-demand web crawler and iii.) a very efficient categorizer system. Our solution obtained Creativity and Precision Runner-up Awards at the competition. The main idea of Ferrety Algorithm can be generalized for mapping one taxonomy to another if training documents are available.

Topics

5 Figures and Tables

Download Full PDF Version (Non-Commercial Use)