This machine-learning framework develops and trains models without requiring a fully labeled training data set. In classification, designing learning algorithms that produce performant models with less human supervision is a long-sought goal. The pervasive practice in supervised learning is to train a classifier on a data-label pair for every sample in a training set. However, obtaining large, fully labeled training sets is expensive, time-consuming, and inefficient, and the industry has yet to address the technical challenge of providing processes and systems that efficiently and directly obtain data capturing the label information sufficient for developing and training machine-learning models.
Researchers at the University of Florida have developed a machine-learning classification framework that uses sufficiently labeled data. Inspired by the principle of sufficiency in statistics, sufficiently labeled data is a summary of a fully labeled training set: it captures the information relevant for classification while being easier to obtain directly from annotators and preserving user privacy.
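As a sketch of what such a summary might look like, the snippet below converts a fully labeled toy set into pairwise same-class/different-class annotations. The pairwise encoding and the `make_sufficient_labels` helper are assumptions for illustration only; the document does not specify how sufficiently labeled data is represented. Note that the per-sample class labels themselves are never stored, which is one way such a summary can preserve privacy.

```python
import random

def make_sufficient_labels(samples, labels, n_pairs=1000, seed=0):
    """Summarize a fully labeled set as pairwise same/different-class
    annotations. The individual class labels are never stored."""
    rng = random.Random(seed)
    n = len(samples)
    pairs = []
    for _ in range(n_pairs):
        i, j = rng.randrange(n), rng.randrange(n)
        # 1 if the two samples share a class, 0 otherwise.
        pairs.append((samples[i], samples[j], int(labels[i] == labels[j])))
    return pairs

# Toy usage: six samples from two classes.
X = [[0.1], [0.2], [0.9], [1.1], [0.15], [1.0]]
y = [0, 0, 1, 1, 0, 1]
for a, b, same in make_sufficient_labels(X, y, n_pairs=5):
    print(a, b, "same" if same else "different")
```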
A development framework enables training of a machine-learning model without needing fully labeled training data sets
Researchers at the University of Florida developed a framework for training machine-learning models. The model comprises a hidden module and an output module configured to predict the original labels. One or more processors then train the model on the sufficiently labeled data and automatically provide the trained model for use in prediction tasks. The framework rests on an alternative view of neural networks that turns their layers into linear models in feature spaces, a workflow with demonstrated benefits in a transfer learning setting.
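The following minimal PyTorch sketch shows how such a two-module model could be trained, assuming the pairwise sufficient labels from the earlier sketch: the hidden module learns features from same/different-class pairs under an illustrative contrastive-style loss, and the output module, a plain linear model in the learned feature space, is then fit with only a handful of fully labeled samples. The architecture, loss function, and synthetic data are all assumptions for illustration, not the patented procedure.

```python
import torch
import torch.nn as nn

# Hypothetical two-module model: a nonlinear hidden module (feature
# extractor) and a linear output module, matching the "layers as
# linear models in feature spaces" view described above.
hidden = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 8))
output = nn.Linear(8, 2)  # linear model in the learned feature space

def pair_loss(f_a, f_b, same, margin=1.0):
    """Contrastive-style objective on sufficiently labeled pairs:
    pull same-class features together, push different-class features
    apart. This particular loss is an illustrative assumption."""
    d = (f_a - f_b).pow(2).sum(dim=1)
    return (same * d
            + (1 - same) * (margin - (d + 1e-9).sqrt()).clamp(min=0).pow(2)).mean()

# Stage 1: train the hidden module on sufficiently labeled pairs.
opt = torch.optim.Adam(hidden.parameters(), lr=1e-2)
for _ in range(200):
    a, b = torch.randn(64, 2), torch.randn(64, 2)
    # Synthetic same/different-class pair labels for demonstration.
    same = (torch.sign(a[:, 0]) == torch.sign(b[:, 0])).float()
    loss = pair_loss(hidden(a), hidden(b), same)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: fit only the linear output module with a handful of
# fully labeled samples (here, one per class).
x_few = torch.tensor([[-1.0, 0.0], [1.0, 0.0]])
y_few = torch.tensor([0, 1])
opt2 = torch.optim.Adam(output.parameters(), lr=1e-1)
for _ in range(100):
    logits = output(hidden(x_few).detach())
    loss = nn.functional.cross_entropy(logits, y_few)
    opt2.zero_grad(); loss.backward(); opt2.step()
```

This split also suggests the transfer-learning benefit noted above: because the output module is linear in the learned feature space, a hidden module trained once could be reused on a related task by refitting only the small linear output layer.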