Research on classification, regression, imputation, and data augmentation methods

Classification methods

Supervised classification is the most common task in ML. It addresses the question of assigning a class to an input, where the class is selected from a set of categorical values. The classifier is learned from a set of annotated examples, for each of which the value of the associated class is known.
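
As a minimal illustration of this setting, the sketch below fits a classifier to a synthetic set of annotated examples and then assigns classes to unseen inputs. The data and the choice of logistic regression are arbitrary placeholders, not the methods from the works cited below.

```python
# Minimal supervised classification sketch: learn a mapping from inputs to
# categorical labels using annotated examples, then assign classes to new inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # 200 examples, 5 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # synthetic binary class labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)   # learn from annotated examples
print(clf.predict(X_test[:5]))                     # class assignment for new inputs
print(clf.score(X_test, y_test))                   # accuracy on held-out examples
```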

Work on the conception of new classification methods based on vine copula models has been presented in Carrera_et_al:2016, Carrera_et_al:2019. These vine copula classifiers exploit the capacity of vine copulas to capture diverse patterns of interaction between variables. Classification methods based on classifiers evolved by means of evolutionary algorithms have been presented in Santana_et_al:2011, Santana_et_al:2012c, Roman_et_al:2019, Santana_et_al:2019.
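
The core idea behind such copula-based classifiers is generative: estimate one joint density per class and assign each input to the class with the highest prior-weighted class-conditional density. The sketch below illustrates that scheme with a Gaussian kernel density estimate standing in for the vine copula density model of the cited work; the stand-in estimator and the data are assumptions for illustration only.

```python
# Generative classification sketch: one density model per class, Bayes rule for
# the class assignment. A Gaussian KDE is used here as a placeholder density.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
X0 = rng.normal(loc=0.0, size=(100, 2))      # class-0 training examples
X1 = rng.normal(loc=2.0, size=(100, 2))      # class-1 training examples

densities = [gaussian_kde(X0.T), gaussian_kde(X1.T)]   # one density per class
priors = [0.5, 0.5]

def classify(x):
    # Bayes rule: argmax_c  p(x | c) * p(c)
    scores = [p * d(x.reshape(2, 1))[0] for d, p in zip(densities, priors)]
    return int(np.argmax(scores))

print(classify(np.array([-0.5, 0.1])))   # expected: class 0
print(classify(np.array([2.3, 1.8])))    # expected: class 1
```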

A variety of classification approaches to problems from neuroscience are presented in Santana_et_al:2011f, Santana:2013a, Santana_et_al:2012, Zhang_et_al:2015, Santana_et_al:2015d, Santana_et_al:2019. In some cases, the introduced classifiers are based on unsupervised learning algorithms, as in the application of the affinity propagation algorithm to neuron morphology classification Santana_et_al:2013d. Recent work on the design of multi-task prediction models based on deep neural networks has been presented in Garciarena_et_al:2020c, Garciarena_et_al:2021b.
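
For the unsupervised case, the sketch below groups synthetic feature vectors with scikit-learn's affinity propagation implementation; the features are mere placeholders for the morphological descriptors used in the cited neuron study.

```python
# Affinity propagation sketch: cluster feature vectors without fixing the number
# of clusters in advance. Data are synthetic placeholders.
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, n_features=4, random_state=0)
ap = AffinityPropagation(random_state=0).fit(X)
print(len(ap.cluster_centers_indices_))   # number of clusters found
print(ap.labels_[:10])                    # cluster assignment per example
```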

Regression methods

Regression is one of the two most common tasks in ML. Given a set of inputs with the corresponding values of one or more target variables, the problem consists of predicting the value of the target variables for unlabeled examples. Usually, the target variables take values in a continuous domain. Most of the research on regression methods has focused on the solution of real-world problems with particular characteristics (e.g., feature extraction is required, multiple target variables need to be predicted, etc.).
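
A minimal sketch of this setting, with a single continuous target and an arbitrary regressor; both the data and the model are illustrative placeholders, not methods from the cited papers.

```python
# Minimal regression sketch: learn to predict a continuous target from labeled
# examples, then estimate the target for unlabeled inputs.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))                                    # inputs
y = 2.0 * X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=150)    # continuous target

reg = LinearRegression().fit(X, y)
X_new = rng.normal(size=(3, 3))        # "unlabeled" examples
print(reg.predict(X_new))              # predicted target values
```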

For example, in Murua_et_al:2018, the tool wear prediction problem for the Inconel 718 material was addressed. For this problem, feature extraction from the cutting forces is necessary for a more accurate prediction, and different regression methods were evaluated. In Khargharia_et_al:2020, we investigate the trade-off between the accuracy and the overall complexity of sets of RNNs that are used together to predict the volume of vehicles in a network of gas stations. In GarciaRodriguez_et_al:2021, isotonic regression and regressors based on different multi-layer perceptron architectures are compared to other traditional regression methods for the prediction of the award price in the public procurement process.
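
As a hedged illustration of the kind of comparison performed in GarciaRodriguez_et_al:2021, the sketch below fits an isotonic regressor and a small multi-layer perceptron to the same synthetic one-dimensional data; the data, network size, and error metric are assumptions for illustration, not those of the paper.

```python
# Compare isotonic regression with a small MLP regressor on synthetic monotone data.
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, size=200))            # 1-D input
y = np.log1p(x) + rng.normal(scale=0.05, size=200)   # noisy monotone target

iso = IsotonicRegression(out_of_bounds="clip").fit(x, y)
mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                   random_state=0).fit(x.reshape(-1, 1), y)

print(mean_absolute_error(y, iso.predict(x)))                  # isotonic fit error
print(mean_absolute_error(y, mlp.predict(x.reshape(-1, 1))))   # MLP fit error
```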

More recently, in Roman_et_al:2021, a three-objective regression problem is addressed in the context of post-editing effort estimation from sentence embedding representations. An approach based on genetic programming is used to evolve kernels that are suitable for predicting several metrics at the same time.
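
The sketch below shows the multi-target kernel regression setting in which such an evolved kernel would operate, using kernel ridge regression with a fixed RBF kernel as a stand-in for the kernels evolved by genetic programming in the cited work; the eight input features and three synthetic targets are placeholders for the embedding features and effort-estimation metrics.

```python
# Multi-target kernel regression sketch: a single kernel model predicts three
# target metrics at once. The fixed RBF kernel is an illustrative stand-in.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 8))                  # placeholder embedding features
Y = np.column_stack([X[:, 0] ** 2,             # three synthetic target metrics
                     np.abs(X[:, 1]),
                     X[:, 2] + X[:, 3]])

model = KernelRidge(kernel="rbf", alpha=0.1).fit(X, Y)
print(model.predict(X[:2]))                    # three predictions per example
```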

Imputation and data augmentation methods

The goal of imputation is to fill in missing data in incomplete or corrupted examples. Data augmentation approaches are used to generate new examples that resemble those in the training set. Both imputation and data augmentation methods are very important in scenarios in which the data available for training the ML model are scarce.
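
A minimal sketch of the imputation step: replace the missing entries of incomplete examples with values estimated from the observed data. The mean strategy used here is just one simple choice, picked for illustration.

```python
# Basic imputation sketch: fill in missing (NaN) entries with per-feature means
# computed from the observed values.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0, np.nan],
              [4.0, np.nan, 6.0],
              [7.0, 8.0, 9.0]])

imputer = SimpleImputer(strategy="mean")
print(imputer.fit_transform(X))    # NaNs replaced by column means
```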

A review of the most common types of missing data, and of the imputation methods used to address them, is presented in Garciarena_and_Santana:2017. Methods that incorporate the automatic selection of the imputation strategy as part of the design of ML pipelines for classification problems are introduced in Garciarena_et_al:2018, Garciarena_et_al:2018c.
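
A hedged sketch of the underlying idea of treating the imputation strategy as one more pipeline choice to be optimized: a small scikit-learn pipeline whose imputation strategy is selected by cross-validated search. The specific components and search grid are assumptions for illustration, not the pipelines from the cited papers.

```python
# Select the imputation strategy as part of a classification pipeline via
# cross-validated grid search over the imputer's strategy parameter.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.default_rng(5)
X[rng.random(X.shape) < 0.1] = np.nan            # corrupt 10% of the entries

pipe = Pipeline([("impute", SimpleImputer()),
                 ("clf", LogisticRegression(max_iter=1000))])
grid = {"impute__strategy": ["mean", "median", "most_frequent"]}

search = GridSearchCV(pipe, grid, cv=5).fit(X, y)
print(search.best_params_)                       # selected imputation strategy
print(search.best_score_)                        # cross-validated accuracy
```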

Data generation approaches for topic classification were presented in Montenegro_et_al:2019a.