💬 Natural Language Processing (NLP)
Research on hierarchical text classification, large language models, dialogue systems, and computational models of language in the brain.
Natural language processing (NLP) is the area of artificial intelligence concerned with the interaction between computers and human language. My NLP research ranges from novel classification methods for structured label spaces, through the exploration of large language models for industrial applications, to the computational modeling of how the human brain processes language meaning.
Multi-dimensional Hierarchical Classification
This line of work develops multi-dimensional hierarchical classification (MHC) methods for NLP tasks. In MHC, each instance is simultaneously assigned one category from each of several hierarchical classification trees. This setting is more general than standard multi-label or hierarchical classification, and it arises naturally in spoken dialogue systems, product categorization, and medical coding.
The research addresses the characterization of MHC problems, the development of dedicated solving strategies (local vs. global approaches, label correlation modeling), and the design of appropriate performance measures for evaluating MHC systems.
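To make the MHC setting concrete, the following minimal Python sketch represents an instance's label as one node per hierarchy and scores a prediction with an ancestor-overlap F1 averaged over the hierarchies. It is an illustration under stated assumptions, not the method or measure used in the research: the two toy hierarchies (`TOPIC`, `INTENT`), the function names, and the choice of an unweighted average across dimensions are all hypothetical.

```python
# Each hierarchy maps a node to its parent; the root has parent None.
# Both trees are hypothetical toy examples, not taken from any real system.
TOPIC = {
    "root": None,
    "health": "root",
    "nutrition": "health",
    "sport": "health",
    "leisure": "root",
    "travel": "leisure",
}
INTENT = {
    "root": None,
    "question": "root",
    "yes_no": "question",
    "open": "question",
    "statement": "root",
}
TREES = {"topic": TOPIC, "intent": INTENT}


def path_to_root(tree, node):
    """Return the node together with all its ancestors, excluding the root."""
    path = set()
    while node is not None and tree[node] is not None:
        path.add(node)
        node = tree[node]
    return path


def hierarchical_f1(tree, gold, pred):
    """Per-instance F1 on the overlap of the gold and predicted root paths.

    Assumes both labels are non-root nodes of `tree`; a prediction in the
    right subtree but at the wrong leaf receives partial credit.
    """
    gold_path = path_to_root(tree, gold)
    pred_path = path_to_root(tree, pred)
    overlap = len(gold_path & pred_path)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_path)
    recall = overlap / len(gold_path)
    return 2 * precision * recall / (precision + recall)


def mhc_score(trees, gold, pred):
    """Unweighted mean of the per-dimension hierarchical F1 scores.

    `gold` and `pred` map each dimension name to exactly one node, which is
    the defining constraint of the MHC setting described above.
    """
    scores = [hierarchical_f1(trees[d], gold[d], pred[d]) for d in trees]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = {"topic": "nutrition", "intent": "yes_no"}
    pred = {"topic": "sport", "intent": "yes_no"}
    # Topic shares only the "health" ancestor (F1 = 0.5); intent is exact (F1 = 1.0).
    print(mhc_score(TREES, gold, pred))  # prints 0.75
```

Averaging per-dimension scores treats the hierarchies independently, which corresponds loosely to evaluating a local approach; a measure that rewards getting all dimensions right jointly, or that weights deeper nodes differently, would be an equally reasonable design choice.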
Large Language Models (LLMs)
Dialogue Systems & Virtual Coaches
Brain & Language
ENIA Chair in AI & Language Technology
Selected Publications
- Santana R (2017). Reproducing and learning new algebraic operations on word embeddings using genetic programming. GECCO 2017.
- Santana R (2021). Semantic Composition of Word-Embeddings with Genetic Programming. GECCO 2021.
- Magalhães RPL, Santana R and Pozo A (2019). An empirical comparison of distance/similarity measures for Natural Language Processing. BRACIS 2019.
- Montenegro M, Marafioti A, Santana R, Alcaide JB, Bolaños M and Cuayahuitl H (2019). Data generation approaches for topic classification in multilingual spoken dialog system. INTERSPEECH 2019.
- Montenegro M, Santana R, Bolaños M and Cuayahuitl H (2020). Transfer learning in hierarchical dialogue topic classification with neural networks. IberSPEECH 2020.
- Montenegro M, Santana R, Bolaños M and Cuayahuitl H (2021). Analysis of the sensitivity of the End-Of-Turn Detection task to errors generated by the Automatic Speech Recognition module. INTERSPEECH 2021.
- Roman I, Santana R, Mendiburu A and Lozano JA (2019). Sentiment analysis with genetically evolved Gaussian kernels. GECCO 2019.
- Roman I, Santana R, Mendiburu A and Lozano JA (2021). Evolution of Gaussian Process kernels for machine translation post-editing effort estimation. Applied Soft Computing.
- Mei N, Carreiras M, Santana R and Pylkkänen L (2019). How the brain encodes meaning: Comparing word embedding and computer vision models to predict fMRI data. NeurIPS Workshop.
- Santana R (2022). An embedding space for SARS-CoV-2 epitope-based vaccines. European Journal of Clinical Investigation.
- Romero M and Santana R (2022). Creating wordless meaning in word-embedding. In preparation.