In semi-supervised learning, a small amount of labeled data is combined with a large pool of unlabeled data, which can significantly improve learning accuracy. One technique for exploiting the unlabeled data is co-training, in which multiple learners are trained on the same examples, either with different learning methods on a single feature set (single-view) or with different feature sets (multi-view). The authors propose a combination of the two: the multiple-view multiple-learner (MVML) framework.
MVML draws a sample from the pool of unlabeled data and trains multiple learner ensembles, each on its own view of the training set. After the ensembles label the sampled examples, all confidently predicted examples are added to the expanded training set, and the sample is replenished at random from the remaining unlabeled data. As this process repeats, the performance of MVML generally improves.
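The loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual configuration: the nearest-centroid learner, the column-slice views, the bootstrap ensembles, the unanimity confidence threshold, and the names `CentroidLearner` and `mvml` are all simplifying assumptions made here for concreteness.

```python
import numpy as np

class CentroidLearner:
    """Deliberately simple stand-in learner: nearest class centroid."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        # Distance from each example to each class centroid; pick the nearest.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

def mvml(X_labeled, y_labeled, X_unlabeled, views, n_learners=3,
         pool_size=20, conf_threshold=1.0, rounds=5, rng=None):
    """Sketch of the MVML loop: per-view ensembles label a sampled pool of
    unlabeled examples; confidently (here: unanimously) predicted examples
    join the training set, and the sample is replenished at random."""
    rng = rng or np.random.default_rng(0)
    X, y = X_labeled.copy(), y_labeled.copy()
    unlab = X_unlabeled.copy()
    for _ in range(rounds):
        if len(unlab) == 0:
            break
        # Sample a pool from the remaining unlabeled data.
        idx = rng.choice(len(unlab), size=min(pool_size, len(unlab)), replace=False)
        pool = unlab[idx]
        # Each view trains an ensemble of learners via bootstrap resampling.
        votes = []
        for v in views:
            for _ in range(n_learners):
                b = rng.choice(len(X), size=len(X), replace=True)
                learner = CentroidLearner().fit(X[b][:, v], y[b])
                votes.append(learner.predict(pool[:, v]))
        votes = np.array(votes)
        # Majority label per pool example, and the fraction of agreeing learners.
        maj = np.array([np.bincount(col).argmax() for col in votes.T])
        agreement = (votes == maj).mean(axis=0)
        keep = agreement >= conf_threshold
        # Add confidently labeled examples to the expanded training set;
        # the pool is re-sampled from what remains on the next round.
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, maj[keep]])
        unlab = np.delete(unlab, idx[keep], axis=0)
    return X, y
```

The `conf_threshold` parameter controls how aggressively pseudo-labels are accepted; requiring unanimity keeps label noise low at the cost of slower growth of the training set.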
The authors test this framework against the single-view and multi-view approaches, with encouraging results. The three experiments cover text classification, advertisement image classification, and handwritten digit recognition. In all three, MVML classified more accurately than the other two frameworks. The authors acknowledge that the results were not always as good as they had hoped, as in the case of handwritten digits, but were nonetheless an improvement over existing methods.
The paper assumes familiarity with various learning methods, although brief explanations and references for further reading are given. The emphasis is on describing the framework and experimentally comparing its performance with that of simpler methods, rather than on a theoretical treatment of the underlying algorithms. The authors provide sufficient information on the comparison methods and the essential criteria for the different test cases.
While combining multiple views and multiple learners into a single approach seems reasonably straightforward, the authors appear to be the first to investigate this combination. The initial results look promising and will hopefully lead to further experiments and a more formal analysis.