Fontoura VD, Pozo AT and Santana R (2017), "Automated design of hyper-heuristics components to solve the PSP problem with HP model", In 2017 IEEE Congress on Evolutionary Computation (CEC). , pp. 1848-1855.
[Abstract] [BibTeX] [URL]
|
Abstract: The Protein Structure Prediction (PSP) problem is one of the most challenging problems in modern science. Simplified protein models are usually applied to simulate and study some characteristics of the protein folding process. Hence, many heuristic strategies have been applied in order to find simplified protein structures whose configuration has minimal energy. However, these strategies have difficulties in finding optimal solutions for longer amino-acid sequences, due to the complexity of the problem and the huge number of local optima. Hyper-heuristics have proved to be useful in this type of context since they try to combine the strengths of different heuristics into a single framework. However, there is a lack of work addressing the automated design of hyper-heuristic components. This paper proposes GEHyPSP, an approach that aims to generate, through grammatical evolution, selection mechanisms and acceptance criteria for a hyper-heuristic framework applied to the PSP problem. We investigate the strengths and weaknesses of our approach on a benchmark of simplified protein models. GEHyPSP was able to reach the best known results for 7 of the 11 instances that compose the benchmark set used to evaluate the approach. |
BibTeX:
@inproceedings{Fontoura_et_al:2017,
author = {Fontoura, Vidal D and Pozo, Aurora TR and Santana, Roberto},
title = {Automated design of hyper-heuristics components to solve the PSP problem with HP model},
booktitle = {2017 IEEE Congress on Evolutionary Computation (CEC)},
year = {2017},
pages = {1848--1855},
url = {https://ieeexplore.ieee.org/document/7969526}
}
|
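The HP model mentioned in the abstract scores a folded conformation by the number of non-consecutive hydrophobic (H) contacts on a lattice; lower energy means more H-H contacts. A minimal sketch of that energy function for a 2D square lattice (an illustration of the model only; the paper's own encoding and hyper-heuristic components are not reproduced here):

```python
# Energy of a 2D lattice HP-model conformation: each pair of hydrophobic
# (H) residues that are not chain neighbours but occupy adjacent lattice
# sites contributes -1; polar (P) residues contribute nothing.
def hp_energy(sequence, coords):
    """sequence: string over {'H', 'P'}; coords: list of (x, y) lattice sites."""
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip consecutive chain neighbours
            if sequence[i] == 'H' and sequence[j] == 'H':
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:  # adjacent lattice sites
                    energy -= 1
    return energy

# A 4-residue U-shaped fold: residues 0 and 3 end up on adjacent sites.
print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # -1
```

Heuristics for the PSP problem under this model search the space of self-avoiding conformations for the one minimizing this energy.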
Garciarena U and Santana R (2017), "An extensive analysis of the interaction between missing data types, imputation methods, and supervised classifiers", Expert Systems with Applications. Vol. 89, pp. 52-65. Elsevier.
[Abstract] [BibTeX] [URL]
|
Abstract: When applying data-mining techniques to real-world data, we often find ourselves facing observations that have no value recorded for some attributes. This can be caused by several phenomena, such as a machine’s inability to record certain characteristics or a person refusing to answer a question in a poll. Depending on that motivation, values gone missing may follow one kind of pattern or another, or describe no regularity at all. One approach to palliate the effect of missing data on machine learning tasks is to replace the missing observations. Imputation algorithms attempt to calculate a value for a missing gap, using information associated with it, i.e., the attribute and/or other values in the same observation. While several imputation methods have been proposed in the literature, few works have addressed the question of the relationship between the type of missing data, the choice of the imputation method, and the effectiveness of classification algorithms that use the imputed data. In this paper we address the relationship among these three factors. By constructing a benchmark of hundreds of databases containing different types of missing data, and applying several imputation methods and classification algorithms, we empirically show that an interaction between imputation methods and supervised classification can be deduced. Moreover, differences in terms of classification performance for the same imputation method in different missing data patterns have been found. This points to the convenience of considering the combined choice of the imputation method and the classifier algorithm according to the missing data type. |
BibTeX:
@article{Garciarena_and_Santana:2017,
author = {Garciarena, Unai and Santana, Roberto},
title = {An extensive analysis of the interaction between missing data types, imputation methods, and supervised classifiers},
journal = {Expert Systems with Applications},
publisher = {Elsevier},
year = {2017},
volume = {89},
pages = {52--65},
url = {https://www.sciencedirect.com/science/article/pii/S095741741730502X}
}
|
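The simplest baseline among imputation methods of the kind the study compares is per-attribute mean imputation. The sketch below (a generic baseline, not the paper's benchmark code) shows the idea of restoring missing entries before handing the data to a classifier:

```python
import numpy as np

def mean_impute(X):
    """Replace NaN entries with the column (attribute) mean --
    a common baseline step before applying a supervised classifier."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)   # per-attribute means, ignoring NaN
    missing = np.isnan(X)
    # For each missing cell, take the mean of its own column.
    X[missing] = np.take(col_means, np.where(missing)[1])
    return X

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
print(mean_impute(X))  # column means are [2.0, 6.0]; NaNs replaced accordingly
```

Richer methods (e.g. regression- or similarity-based imputation) exploit the other values in the same observation rather than only the attribute's marginal distribution, which is precisely where the interaction with the downstream classifier studied in the paper arises.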
Garciarena U, Santana R and Mendiburu A (2017), "Evolving imputation strategies for missing data in classification problems with TPOT", CoRR. Vol. abs/1706.01120
[Abstract] [BibTeX] [URL]
|
Abstract: Missing data has a ubiquitous presence in real-life applications of machine learning techniques. Imputation methods are algorithms conceived for restoring missing values in the data, based on other entries in the database. The choice of the imputation method has an influence on the performance of the machine learning technique, e.g., it influences the accuracy of the classification algorithm applied to the data. Therefore, selecting and applying the right imputation method is important and usually requires a substantial amount of human intervention. In this paper we propose the use of genetic programming techniques to search for the right combination of imputation and classification algorithms. We build our work on the recently introduced Python-based TPOT library, and incorporate a heterogeneous set of imputation algorithms as part of the machine learning pipeline search. We show that genetic programming can automatically find increasingly better pipelines that include the most effective combinations of imputation methods, feature pre-processing, and classifiers for a variety of classification problems with missing data. |
BibTeX:
@article{Garciarena_et_al:2017a,
author = {Garciarena, Unai and Santana, Roberto and Mendiburu, Alexander},
title = {Evolving imputation strategies for missing data in classification problems with TPOT},
journal = {CoRR},
year = {2017},
volume = {abs/1706.01120},
url = {http://arxiv.org/abs/1706.01120}
}
|
Martins MSR, Delgado M, Lüders R, Santana R, Gonçalves RA and de Almeida CP (2017), "Probabilistic analysis of Pareto Front approximation for a hybrid multi-objective Bayesian estimation of distribution algorithm", In 2017 Brazilian Conference on Intelligent Systems (BRACIS). , pp. 384-389.
[Abstract] [BibTeX] [URL]
|
Abstract: Metaheuristics that explore the decision variables space to construct probabilistic models from promising solutions, like estimation of distribution algorithms (EDAs), are becoming very popular in the context of multi-objective evolutionary algorithms (MOEAs). The probabilistic model used in EDAs captures certain statistics of problem variables and their interdependencies. Moreover, the incorporation of local search methods tends to achieve synergy of MOEAs' operators and local heuristics aiming to improve the performance. In this work, we aim to scrutinize the probabilistic graphical model (PGM) presented in the Hybrid Multi-objective Bayesian Estimation of Distribution Algorithm (HMOBEDA), which is based on a Bayesian network. Different from traditional EDA-based approaches, the PGM of HMOBEDA provides the joint probability of decision variables, objectives, and configuration parameters of an embedded local search. HMOBEDA has shown to be very competitive on instances of the Multi-Objective Knapsack Problem (MOKP), outperforming state-of-the-art approaches. Two variants of HMOBEDA are proposed in this paper using different sampling methods. We aim to compare the learnt structure in terms of the probabilistic Pareto front approximation produced at the end of evolution. Results on instances of MOKP with 2 to 8 objectives show that both proposed variants outperform the original approach, providing not only the best values for the hypervolume and inverted generational distance indicators, but also a higher diversity in the solution set. |
BibTeX:
@inproceedings{Martins_et_al:2017a,
author = {Martins, Marcella Scoczynski Ribeiro and Delgado, Myriam and Lüders, Ricardo and Santana, Roberto and Gonçalves, Richard A and de Almeida, Carolina P},
title = {Probabilistic analysis of Pareto Front approximation for a hybrid multi-objective Bayesian estimation of distribution algorithm},
booktitle = {2017 Brazilian Conference on Intelligent Systems (BRACIS)},
year = {2017},
pages = {384--389},
url = {https://ieeexplore.ieee.org/document/8247084}
}
|
Castro OR, Lozano JA, Santana R and Pozo A (2017), "Combining CMA-ES and MOEA/DD for many-objective optimization", In 2017 IEEE Congress on Evolutionary Computation (CEC). , pp. 1451-1458.
[Abstract] [BibTeX] [URL]
|
Abstract: Multi-objective Estimation of Distribution Algorithms (MOEDAs) have been successfully applied to solve Multi-objective Optimization Problems (MOPs) since they are able to model dependencies between variables of the problem and then sample new solutions to guide the search to promising areas. A state-of-the-art optimizer for single-objective continuous functions that also uses probabilistic modeling is the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Different variants of CMA-ES have been proposed for MOPs; however, most of them are based on Pareto dominance as the main selection criterion. Recently, a new multi-objective CMA-ES called MOEA/D-CMA was proposed, combining the strengths of CMA-ES with those of the multi-objective evolutionary algorithm based on decomposition (MOEA/D). Nowadays, however, researchers on MOEAs agree that combining Pareto dominance and decomposition can be beneficial for the search on MOPs. As a result, a new MOEA has been proposed, called MOEA/DD. This algorithm modifies MOEA/D by including a new Pareto dominance update mechanism that brings more diversity into the search. In this study, we extend MOEA/D-CMA by replacing its update mechanism with that of MOEA/DD. The hypothesis is that this update mechanism will improve the performance of MOEA/D-CMA as it improved MOEA/D. MOEA/D-CMA and MOEA/DD-CMA are implemented and evaluated through an experimental study. The experimental study involves two well-known families of benchmark problems whose objective numbers scale from two to fifteen. Then, an extensive statistical analysis of the results is made to extract sound, statistically supported conclusions about the performance of the algorithms as the number of objectives scales. |
BibTeX:
@inproceedings{Rodrigues_et_al:2017,
author = {Castro, Olacir R and Lozano, Jose A and Santana, Roberto and Pozo, Aurora},
title = {Combining CMA-ES and MOEA/DD for many-objective optimization},
booktitle = {2017 IEEE Congress on Evolutionary Computation (CEC)},
year = {2017},
pages = {1451--1458},
url = {https://ieeexplore.ieee.org/document/7969474}
}
|
Castro OR, Pozo A, Lozano JA and Santana R (2017), "An investigation of clustering strategies in many-objective optimization: the I-Multi algorithm as a case study", Swarm Intelligence. Vol. 11(2), pp. 101-130. Springer.
[Abstract] [BibTeX] [URL]
|
Abstract: A variety of general strategies have been applied to enhance the performance of multi-objective optimization algorithms for many-objective optimization problems (those with more than three objectives). One of these strategies is to split the solutions to cover different regions of the search space (clusters) and apply an optimizer to each region with the aim of producing more diverse solutions and achieving a better distributed approximation of the Pareto front. However, the effectiveness of clustering in this context depends on a number of issues, including the characteristics of the objective functions. In this paper we show how the choice of the clustering strategy can greatly influence the behavior of an optimizer. We investigate the relation between the characteristics of a multi-objective optimization problem and the efficiency of the use of a clustering combination (clustering space, metric) in the resolution of this problem. Using as a case study the Iterated Multi-swarm (I-Multi) algorithm, a recently introduced multi-objective particle swarm optimization algorithm, we scrutinize the impact that clustering in different spaces (of variables, objectives and a combination of both) can have on the approximations of the Pareto front. Furthermore, employing two difficult multi-objective benchmarks of problems with up to 20 objectives, we evaluate the effect of using different metrics for determining the similarity between the solutions during the clustering process. Our results confirm the important effect of the clustering strategy on the behavior of multi-objective optimizers. Moreover, we present evidence that some problem characteristics can be used to select the most effective clustering strategy, significantly improving the quality of the Pareto front approximations produced by I-Multi. |
BibTeX:
@article{Rodrigues_et_al:2017a,
author = {Castro, Olacir R and Pozo, Aurora and Lozano, Jose A and Santana, Roberto},
title = {An investigation of clustering strategies in many-objective optimization: the I-Multi algorithm as a case study},
journal = {Swarm Intelligence},
publisher = {Springer},
year = {2017},
volume = {11},
number = {2},
pages = {101--130},
url = {https://link.springer.com/article/10.1007/s11721-017-0134-9}
}
|
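The central contrast in the abstract, clustering solutions in variable space versus objective space, can be made concrete with a toy example: the same set of solutions can have different nearest neighbours depending on the space in which distances are measured. The data below are illustrative values, not taken from the paper's benchmarks:

```python
import numpy as np

# Three solutions of a bi-objective problem: rows of X are decision
# vectors, rows of F are their objective values (illustrative data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
F = np.array([[1.0, 9.0], [8.0, 1.5], [8.2, 1.4]])

def nearest_neighbour(D):
    """Index of each row's nearest other row under Euclidean distance."""
    dist = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)   # a solution is not its own neighbour
    return dist.argmin(axis=1)

print(nearest_neighbour(X))  # groupings suggested by variable space
print(nearest_neighbour(F))  # groupings suggested by objective space
```

Here solutions 0 and 1 are close in decision variables but far apart in objectives, so a clustering-based optimizer like I-Multi would split its swarms differently depending on the chosen clustering space and metric, which is the effect the paper quantifies.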
Santana R and Lozano JA (2017), "Different scenarios for survival analysis of evolutionary algorithms", In Proceedings of the Genetic and Evolutionary Computation Conference. , pp. 825-832.
[Abstract] [BibTeX] [URL]
|
Abstract: Empirical analysis of evolutionary algorithms (EAs) behavior is usually approached by computing relatively simple descriptive statistics like mean fitness and mean number of evaluations to convergence, or more theoretically sound statistical tests for finding significant differences between algorithms. However, these analyses do not consider situations where the EA failed to finish due to numerical errors or excessive computational time. Furthermore, the ability of an EA to continuously make search improvements is usually overlooked. In this paper we propose the use of the theory from survival analysis for empirically investigating the behavior of EAs, even in situations where not all the experiments finish in a reasonable time. We introduce two scenarios for the application of survival analysis in EAs. Survival trees, a machine learning technique adapted to the survival analysis scenario, are applied to automatically identify combinations of EA parameters with similar effect in the behavior of the algorithm. |
BibTeX:
@inproceedings{Santana_and_Lozano:2017,
author = {Santana, Roberto and Lozano, Jose A},
title = {Different scenarios for survival analysis of evolutionary algorithms},
booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference},
year = {2017},
pages = {825--832},
url = {https://dl.acm.org/doi/10.1145/3071178.3071250}
}
|
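The survival-analysis view described above treats an EA run that fails to finish as a right-censored observation. The sketch below implements the standard Kaplan-Meier estimator for such data (a generic textbook version without tie handling, not the authors' implementation):

```python
# Kaplan-Meier survival estimate for EA running times, where a run that
# was stopped before reaching the optimum is right-censored (event = 0).
def kaplan_meier(times, events):
    """times: observed run durations; events: 1 = run converged, 0 = censored.
    Returns a list of (event time, survival probability) pairs."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv, curve = 1.0, []
    for i in order:
        if events[i] == 1:                   # convergence observed
            surv *= (at_risk - 1) / at_risk  # S(t) drops only at event times
            curve.append((times[i], surv))
        at_risk -= 1                         # censored runs simply leave the risk set
    return curve

# 4 runs: three converged at t = 2, 5, 9; one was cut off at t = 7.
print(kaplan_meier([2, 5, 7, 9], [1, 1, 0, 1]))
```

Unlike a mean of completed runtimes, this estimate uses the information that the censored run survived at least until t = 7, which is exactly what makes survival methods suitable when not all experiments finish.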
Santana R, Sirbiladze G, Ghvaberidze B and Matsaberidze B (2017), "A comparison of probabilistic-based optimization approaches for vehicle routing problems", In 2017 IEEE Congress on Evolutionary Computation (CEC). , pp. 2606-2613.
[Abstract] [BibTeX] [URL]
|
Abstract: Estimation of distribution algorithms (EDAs) are evolutionary algorithms that use probabilistic modeling to lead a more efficient search for optimal solutions. While EDAs have been applied to several types of optimization problems, they exhibit some limitations to deal with constrained optimization problems. More study and understanding of how EDAs can deal with these problems is required. In this paper we investigate the application of EDAs to a version of the vehicle routing problem in which solutions should satisfy a number of constraints involving the customers, the vehicle fleet, and the items to be delivered. For this problem, we compare two different representations of the solutions, and apply EDAs that use three probabilistic models with different characteristics. Our results show that the combination of an integer representation with a tree-based probabilistic model produces the best results and is able to solve vehicle routing problems that contain thousands of promising paths. |
BibTeX:
@inproceedings{Santana_et_al:2017,
author = {Santana, Roberto and Sirbiladze, Gia and Ghvaberidze, Bezhan and Matsaberidze, Bidzina},
title = {A comparison of probabilistic-based optimization approaches for vehicle routing problems},
booktitle = {2017 IEEE Congress on Evolutionary Computation (CEC)},
year = {2017},
pages = {2606--2613},
url = {https://ieeexplore.ieee.org/document/7969622}
}
|
Santana R (2017), "Reproducing and learning new algebraic operations on word embeddings using genetic programming", CoRR. Vol. abs/1702.05624
[Abstract] [BibTeX] [URL]
|
|
BibTeX:
@article{Santana:2017,
author = {Roberto Santana},
title = {Reproducing and learning new algebraic operations on word embeddings using genetic programming},
journal = {CoRR},
year = {2017},
volume = {abs/1702.05624},
url = {https://arxiv.org/abs/1702.05624}
}
|
Santana R (2017), "Gray-box optimization and factorized distribution algorithms: where two worlds collide", CoRR. Vol. abs/1707.03093
[Abstract] [BibTeX] [URL]
|
Abstract: The concept of gray-box optimization, in juxtaposition to black-box optimization, revolves around the idea of exploiting the problem structure to implement more efficient evolutionary algorithms (EAs). Work on factorized distribution algorithms (FDAs), whose factorizations are directly derived from the problem structure, has also contributed to show how exploiting the problem structure produces important gains in the efficiency of EAs. In this paper we analyze the general question of using problem structure in EAs, focusing on confronting work done in gray-box optimization with related research accomplished in FDAs. This contrasted analysis helps us to identify, in current studies on the use of problem structure in EAs, two distinct analytical characterizations of how these algorithms work. Moreover, we claim that these two characterizations collide and compete when it comes to providing a coherent framework to investigate this type of algorithm. To illustrate this claim, we present a contrasted analysis of formalisms, questions, and results produced in FDAs and gray-box optimization. Common underlying principles in the two approaches, which are usually overlooked, are identified and discussed. Besides, an extensive review of previous research related to different uses of the problem structure in EAs is presented. The paper also elaborates on some of the questions that arise when extending the use of problem structure in EAs, such as the question of evolvability, high cardinality of the variables and large definition sets, constrained and multi-objective problems, etc. Finally, emergent approaches that exploit neural models to capture the problem structure are covered. |
BibTeX:
@article{Santana:2017a,
author = {Roberto Santana},
title = {Gray-box optimization and factorized distribution algorithms: where two worlds collide},
journal = {CoRR},
year = {2017},
volume = {abs/1707.03093},
url = {https://arxiv.org/abs/1707.03093}
}
|
de-Souza MZ, Santana R, Mendiburu A and Pozo ATR (2017), "Not all PBILs are the same: Unveiling the different learning mechanisms of PBIL variants", Applied Soft Computing. Vol. 53, pp. 88-96.
[Abstract] [BibTeX] [URL]
|
Abstract: Model-based optimization using probabilistic modeling of the search space is one of the areas where research on evolutionary algorithms (EAs) has considerably advanced in recent years. The population-based incremental learning (PBIL) algorithm is one of the first algorithms of its kind and it has been extensively applied to many optimization problems. In this paper we show that the different applications of PBIL reported in the literature correspond, in fact, to two essentially different algorithms, which are defined by the way the learning step is implemented. We analytically and empirically study the impact of the learning method on the search behavior of the algorithm. As a result of our research, we show examples in which the choice of a PBIL variant can produce qualitatively different outputs of the search process. |
BibTeX:
@article{Zangari_et_al:2017,
author = {Murilo Zangari-de-Souza and Roberto Santana and Alexander Mendiburu and Aurora Trinidad Ramirez Pozo},
title = {Not all PBILs are the same: Unveiling the different learning mechanisms of PBIL variants},
journal = {Applied Soft Computing},
year = {2017},
volume = {53},
pages = {88--96},
url = {https://www.sciencedirect.com/science/article/pii/S1568494616306743}
}
|
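The paper's distinction between PBIL variants comes down to the learning step: updating the probability vector from the single best solution versus from the marginal frequencies of a selected set. Both updates are sketched below for binary strings (an illustrative reconstruction; the parameter values and selection scheme are assumptions, not taken from the paper):

```python
def pbil_update_best(p, population, fitness, alpha=0.1):
    """Variant A: shift the probability vector towards the single best solution."""
    best = max(population, key=fitness)
    return [(1 - alpha) * pi + alpha * xi for pi, xi in zip(p, best)]

def pbil_update_marginal(p, population, fitness, alpha=0.1, n_sel=2):
    """Variant B: shift towards the marginal frequencies of the selected set."""
    selected = sorted(population, key=fitness, reverse=True)[:n_sel]
    freq = [sum(x[i] for x in selected) / n_sel for i in range(len(p))]
    return [(1 - alpha) * pi + alpha * fi for pi, fi in zip(p, freq)]

onemax = sum  # fitness: number of ones in the string
pop = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
p = [0.5, 0.5, 0.5]
print(pbil_update_best(p, pop, onemax))      # learns only from the first best string
print(pbil_update_marginal(p, pop, onemax))  # averages over the two best strings
```

With tied best solutions, as here, variant A commits to one of them while variant B blends their marginals, which already illustrates how the two learning rules can drive the model towards qualitatively different distributions.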
Zangari-de-Souza M, Mendiburu A, Santana R and Pozo A (2017), "Multiobjective Decomposition-based Mallows Models Estimation of Distribution Algorithm. A case of study for Permutation Flowshop Scheduling Problem", Information Sciences. Vol. 397-398, pp. 137-154. Elsevier.
[Abstract] [BibTeX] [URL]
|
Abstract: Estimation of distribution algorithms (EDAs) have become a reliable alternative to solve a broad range of single and multi-objective optimization problems. Recently, distance-based exponential models, such as the Mallows Model (MM) and the Generalized Mallows Model (GMM), have demonstrated their validity in the context of EDAs to deal with permutation-based optimization problems. The aim of this paper is two-fold. First, we introduce a novel general multi-objective decomposition-based EDA using kernels of Mallows models (the MEDA/D-MK framework) for solving multi-objective permutation-based optimization problems. Second, in order to demonstrate the validity of MEDA/D-MK, we have applied it to solve the multi-objective permutation flowshop scheduling problem (MoPFSP) minimizing the total flow time and the makespan. The permutation flowshop scheduling problem is one of the most studied problems of this kind due to its fields of application and algorithmic challenge. The results of our experiments show that MEDA/D-MK outperforms an improved MOEA/D variant specifically tailored for minimizing makespan and total flow time. Furthermore, our approach achieves competitive results compared to the best-known approximated Pareto fronts reported in the literature for the benchmark considered. |
BibTeX:
@article{Zangari_et_al:2017a,
author = {Zangari-de-Souza, Murilo and Mendiburu, Alexander and Santana, Roberto and Pozo, Aurora},
title = {Multiobjective Decomposition-based Mallows Models Estimation of Distribution Algorithm. A case of study for Permutation Flowshop Scheduling Problem},
journal = {Information Sciences},
publisher = {Elsevier},
year = {2017},
volume = {397--398},
pages = {137--154},
url = {https://www.sciencedirect.com/science/article/pii/S0020025517305352}
}
|
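The Mallows model underlying the MEDA/D-MK framework assigns a permutation sigma a probability proportional to exp(-theta * d(sigma, sigma0)), where d is typically the Kendall tau distance to a central permutation sigma0. A sketch of that distance, the model's key ingredient (the paper's kernel-based sampling is not reproduced here):

```python
from itertools import combinations

def kendall_tau_distance(sigma, sigma0):
    """Number of pairwise discordances between two permutations --
    the distance underlying the Mallows model P(sigma) ~ exp(-theta * d)."""
    pos = {item: i for i, item in enumerate(sigma0)}
    relabelled = [pos[item] for item in sigma]  # express sigma relative to sigma0
    # Count inversions: pairs that appear in the opposite order to sigma0.
    return sum(1 for i, j in combinations(range(len(relabelled)), 2)
               if relabelled[i] > relabelled[j])

print(kendall_tau_distance([2, 0, 1, 3], [0, 1, 2, 3]))  # 2 discordant pairs
```

The concentration parameter theta controls how sharply the probability mass peaks around sigma0, which is what lets a Mallows-based EDA focus sampling on promising regions of the permutation space.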
Zangari-de-Souza M, Pozo A, Santana R and Mendiburu A (2017), "A decomposition-based binary ACO algorithm for the multiobjective UBQP", Neurocomputing. Vol. 246, pp. 58-68. Elsevier.
[Abstract] [BibTeX] [URL]
|
Abstract: The multiobjective unconstrained binary quadratic programming (mUBQP) problem is a combinatorial optimization problem which is able to represent several multiobjective optimization problems (MOPs). The problem can be characterized by the number of variables, the number of objectives and the objective correlation strength. Multiobjective evolutionary algorithms (MOEAs) are known as an efficient technique for solving MOPs. Moreover, several recent studies have shown the effectiveness of the MOEA/D framework applied to different MOPs. Previously, we presented a preliminary study of an algorithm based on the MOEA/D framework and the bio-inspired metaheuristic called binary ant colony optimization (BACO). The metaheuristic uses a positive feedback mechanism according to the best solutions found so far to update a probabilistic model which maintains the learned information. This paper presents the improved MOEA/D-BACO framework for solving the mUBQP. Two components, (i) a mutation-like effect and (ii) a diversity-preserving method, are incorporated into the framework to enhance its search ability, avoiding the premature convergence of the model and consequently maintaining a more diverse population of solutions. Experimental studies were conducted on a set of mUBQP instances. The results show that the proposed MOEA/D-BACO outperformed MOEA/D, which uses genetic operators, in most of the test instances. Moreover, the algorithm produced competitive results in comparison to the best approximated Pareto fronts from the literature. |
BibTeX:
@article{Zangari_et_al:2017b,
author = {Zangari-de-Souza, Murilo and Pozo, Aurora and Santana, Roberto and Mendiburu, Alexander},
title = {A decomposition-based binary ACO algorithm for the multiobjective UBQP},
journal = {Neurocomputing},
publisher = {Elsevier},
year = {2017},
volume = {246},
pages = {58--68},
url = {https://www.sciencedirect.com/science/article/pii/S0925231217302254}
}
|
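An mUBQP instance, as characterized in the abstract, is defined by one quadratic matrix per objective, and a binary solution x is evaluated as the vector of quadratic forms f_k(x) = x' Q_k x. A sketch of that evaluation (the matrices below are illustrative values, not a benchmark instance):

```python
import numpy as np

def mubqp_evaluate(x, Qs):
    """Objective vector of a multiobjective UBQP solution:
    f_k(x) = x^T Q_k x for each objective matrix Q_k."""
    x = np.asarray(x, dtype=float)
    return np.array([x @ Q @ x for Q in Qs])

# Two toy 3x3 objective matrices (illustrative values only).
Q1 = np.array([[1, 2, 0], [0, -1, 0], [0, 0, 3]])
Q2 = np.array([[0, 0, 1], [0, 2, 0], [0, 0, -1]])
x = [1, 0, 1]
print(mubqp_evaluate(x, [Q1, Q2]))  # objective vector [f1, f2]
```

The objective correlation strength mentioned in the abstract is controlled, in the standard mUBQP benchmark generator, by how the entries of the different Q_k matrices are correlated with each other.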