Wu describes a heuristic, attribute-based, noise-tolerant data mining program based on the extension matrix approach. Given n negative examples and p positive examples, where each example has a attributes, one can define the negative example matrix NEM to be the matrix whose rows are the negative examples and whose columns give the attribute values. For each positive example P, one constructs an extension matrix whose elements are * (a so-called dead element) when P has the same value as the corresponding attribute of the negative example, and whose elements have the value of the attribute of the negative example otherwise. If one can find a set of n non-dead elements, one in each row, this defines a conjunctive formula that separates the given example P from the negative example. The algorithm HCV uses heuristics to find, in polynomial time ( O ( p n a3 + p2 n a ) ), a disjunctive formula separating all the positive examples from the negative examples whenever at least one such formula exists.
The paper contains a brief discussion of how the algorithm can be modified to cope with noisy data. It concludes with the results of experimental comparisons of HCV with other data mining algorithms, including C4.5. The author observes that HCV is much faster than C4.5 on some sets and slower on other sets.