Natural language processing (NLP) is the area of artificial intelligence concerned with the interaction between computers and human language. My NLP research covers a range of topics, from the development of novel classification methods for structured label spaces, to the exploration of large language models for industrial applications, to the computational modeling of how the human brain processes language meaning.

Multi-label Hierarchical Classification

Multi-dimensional Hierarchical Classification

Development of multi-dimensional hierarchical classification (MHC) methods for NLP tasks. In MHC, each instance is assigned to one category from each of multiple hierarchical classification trees simultaneously. This is a more general setting than standard multi-label or hierarchical classification, and arises naturally in spoken dialogue systems, product categorization, and medical coding.
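
As an illustration of the setting, the following minimal sketch encodes each hierarchy as a child-to-parent map and an MHC label as exactly one category per hierarchy, expanded to its path to the root. The two trees (`TOPIC`, `INTENT`) and the function names are hypothetical, chosen only to show the data structure.

```python
from typing import Dict, List, Optional, Tuple

# Each hierarchy is a child -> parent map; the root maps to None.
# Both example trees are hypothetical, for illustration only.
Tree = Dict[str, Optional[str]]

TOPIC: Tree = {"root": None, "billing": "root", "refund": "billing",
               "invoice": "billing", "technical": "root", "login": "technical"}
INTENT: Tree = {"root": None, "request": "root", "complaint": "root",
                "question": "request"}

def path_to_root(tree: Tree, node: str) -> List[str]:
    """Return the node followed by all of its ancestors, deepest first."""
    if node not in tree:
        raise KeyError(f"unknown category: {node}")
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = tree[current]
    return path

def mhc_label(trees: List[Tree], categories: Tuple[str, ...]) -> List[List[str]]:
    """An MHC label picks exactly one category per tree; expand each to its path."""
    if len(categories) != len(trees):
        raise ValueError("need exactly one category per hierarchy")
    return [path_to_root(t, c) for t, c in zip(trees, categories)]

label = mhc_label([TOPIC, INTENT], ("refund", "complaint"))
print(label)  # [['refund', 'billing', 'root'], ['complaint', 'root']]
```

Keeping the full path in every dimension is one way to let local (per-node) strategies, global (whole-label) strategies, and hierarchy-aware metrics all work from the same representation.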

The research addresses the characterization of MHC problems, the development of dedicated solving strategies (local vs. global approaches, label correlation modeling), and the design of appropriate performance measures for evaluating MHC systems.
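
The text does not specify which performance measures were designed. As one familiar baseline, the sketch below computes hierarchical F1 on ancestor-augmented label sets within each tree and averages it across dimensions, so that predicting a sibling of the correct category earns partial credit. Both the averaging step and the example trees are assumptions for illustration, not the measures proposed in this research.

```python
from typing import Dict, List, Optional, Set

Tree = Dict[str, Optional[str]]  # child -> parent map; the root maps to None

def ancestors(tree: Tree, node: str) -> Set[str]:
    """Node plus its ancestors, excluding the root (which every label shares)."""
    out: Set[str] = set()
    current = node
    while tree[current] is not None:  # stop before adding the root
        out.add(current)
        current = tree[current]
    return out

def hierarchical_f1(tree: Tree, gold: str, pred: str) -> float:
    """F1 between the ancestor-augmented gold and predicted sets of one tree."""
    g, p = ancestors(tree, gold), ancestors(tree, pred)
    if not g and not p:
        return 1.0
    overlap = len(g & p)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def mhc_score(trees: List[Tree], gold: List[str], pred: List[str]) -> float:
    """Average per-tree hierarchical F1 over all dimensions (one possible aggregation)."""
    scores = [hierarchical_f1(t, g, p) for t, g, p in zip(trees, gold, pred)]
    return sum(scores) / len(scores)

# Hypothetical trees for the example.
TOPIC: Tree = {"root": None, "billing": "root", "refund": "billing",
               "invoice": "billing", "technical": "root"}
INTENT: Tree = {"root": None, "request": "root", "complaint": "root"}

# TOPIC: sibling error (refund vs. invoice) gets F1 = 0.5; INTENT is exact.
print(mhc_score([TOPIC, INTENT], ["refund", "complaint"], ["invoice", "complaint"]))  # 0.75
```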

Data Generation for Multilingual Classification
Research on data generation approaches for topic classification in multilingual spoken dialogue systems. Development of methods to augment training data for NLP classifiers in low-resource languages, including back-translation, paraphrasing, and generation of synthetic dialogue turns.
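
Of the augmentation methods listed, back-translation is the simplest to sketch: each training utterance is translated into a pivot language and back, and the round-trip variant inherits the original topic label. In the sketch below, the `Translator` interface is a hypothetical placeholder for a real machine translation system, and `toy_translate` is a lookup table used only to make the example runnable; paraphrasing and synthetic dialogue-turn generation are not shown.

```python
from typing import Callable, List, Tuple

# (text, source_lang, target_lang) -> translated text. Hypothetical interface:
# in practice this would wrap a machine translation model or service.
Translator = Callable[[str, str, str], str]

def back_translate(examples: List[Tuple[str, str]], translate: Translator,
                   src: str, pivots: List[str]) -> List[Tuple[str, str]]:
    """Return new (text, label) pairs obtained by round-tripping through each pivot."""
    seen = {text for text, _ in examples}
    augmented: List[Tuple[str, str]] = []
    for text, label in examples:
        for pivot in pivots:
            variant = translate(translate(text, src, pivot), pivot, src)
            if variant not in seen:                  # skip identical round trips and repeats
                seen.add(variant)
                augmented.append((variant, label))   # the variant keeps the source label
    return augmented

# Toy lookup table standing in for a real MT system (demonstration only).
TABLE = {
    ("I want a refund", "en", "es"): "Quiero un reembolso",
    ("Quiero un reembolso", "es", "en"): "I would like a refund",
    ("I want a refund", "en", "fr"): "Je veux un remboursement",
    ("Je veux un remboursement", "fr", "en"): "I want a refund",
}

def toy_translate(text: str, src: str, tgt: str) -> str:
    return TABLE.get((text, src, tgt), text)  # unknown inputs pass through unchanged

print(back_translate([("I want a refund", "billing")], toy_translate, "en", ["es", "fr"]))
# [('I would like a refund', 'billing')]
```

Here the French round trip reproduces the original sentence and is discarded, while the Spanish one yields a new training example with the same label.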

Large Language Models (LLMs)

LLMs for Feature Selection and Anomaly Classification
Exploration of large language models (LLMs) as tools for feature selection and anomaly classification in industrial settings. Research on stacking the predictions of several LLMs (an LLM ensemble) to improve accuracy and explainability in anomaly detection tasks, particularly for building management systems.
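
Stacking, in its usual form, trains a meta-learner on the outputs of base models. The minimal sketch below assumes each LLM has already been prompted to return an anomaly probability for every observation (the numbers are made up) and fits a plain logistic-regression meta-learner by gradient descent. It illustrates the technique only; it is not the pipeline used in this research, and `fit_stacker` and `predict_stacked` are hypothetical names.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_stacked(w: List[float], x: List[float]) -> float:
    """Meta-learner output: anomaly probability from the base LLMs' probabilities."""
    return sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], x)))

def fit_stacker(base_probs: List[List[float]], labels: List[int],
                lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights [bias, w_1, ..., w_K] by batch gradient descent.

    base_probs[i][k] is base model k's anomaly probability for observation i.
    """
    w = [0.0] * (len(base_probs[0]) + 1)
    n = len(labels)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(base_probs, labels):
            err = predict_stacked(w, x) - y  # gradient of the log-loss w.r.t. the logit
            grad[0] += err
            for k, xk in enumerate(x):
                grad[k + 1] += err * xk
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical anomaly probabilities from three LLMs (columns) for six observations.
train_probs = [[0.9, 0.8, 0.4], [0.8, 0.7, 0.6], [0.7, 0.9, 0.3],
               [0.2, 0.3, 0.6], [0.1, 0.2, 0.5], [0.3, 0.1, 0.4]]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = anomaly

w = fit_stacker(train_probs, train_labels)
print([round(v, 2) for v in w])  # bias, then one weight per LLM
print(round(predict_stacked(w, [0.85, 0.8, 0.5]), 3))
```

The learned weights give a rough, model-level indication of how much each LLM contributes to the final decision. In practice the meta-learner would be fit on one held-out split and evaluated on another, rather than on a handful of examples as here.
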
Causality Explanation with LLMs
Use of large language models to generate causal explanations for anomaly classification decisions. Research on how LLMs can provide human-readable explanations of why a particular observation is classified as anomalous, improving the transparency and trustworthiness of anomaly detection systems.

Dialogue Systems & Virtual Coaches

Dialogue-Act Taxonomy for Virtual Coaches
Development and annotation of a dialogue-act taxonomy for a virtual coach system designed to improve the quality of life of elderly people. Research on how dialogue acts can be used to structure the conversational behavior of virtual assistants that support elderly users in their daily activities.

Distance/Similarity Measures for NLP
Empirical comparison of distance and similarity measures for natural language processing tasks. Evaluation of different measures (including edit distance, cosine similarity, and neural embedding distances) for tasks such as sentence similarity, question answering, and dialogue act classification.
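
Two of the measures mentioned can be written in a few lines. The sketch below gives character-level Levenshtein edit distance and cosine similarity between bag-of-words count vectors. These are textbook definitions; the tokenization (lowercasing plus whitespace splitting) is an assumption, not the configuration used in the comparison.

```python
import math
from collections import Counter

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]

def cosine_bow(s: str, t: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two sentences."""
    u, v = Counter(s.lower().split()), Counter(t.lower().split())
    dot = sum(u[w] * v[w] for w in u)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

print(levenshtein("kitten", "sitting"))  # 3
print(cosine_bow("the cat sat", "the cat slept"))
```

The same cosine formula applies unchanged to dense neural sentence embeddings; only the vectors being compared differ.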

Brain & Language

Word Embeddings and Brain Representations
Research comparing word embedding models and computer vision models in their ability to predict fMRI brain activity during visual word recognition. The study examines how well different computational representations of word meaning correspond to the neural representations observed in human brains during language processing.
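
A common way to test whether a representation predicts fMRI activity is a linear encoding model: ridge regression from the features of each word to a voxel's response, scored by the correlation between predicted and observed responses on held-out words. The sketch below implements that standard setup in plain Python with synthetic numbers; the study's actual pipeline (regularization strength, cross-validation, voxel selection) is not described here, so every value is an assumption.

```python
import math
from typing import List

Matrix = List[List[float]]

def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_ridge(X: Matrix, y: List[float], lam: float) -> List[float]:
    """Closed-form ridge weights w = (X^T X + lam I)^{-1} X^T y (no intercept)."""
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)

def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation (assumes neither input is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# Synthetic 3-d "word embeddings" and one voxel whose response is, by construction,
# a linear function of them (real fMRI data is far noisier).
X_train = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
y_train = [2.0, -1.0, 0.5, 1.0, -0.5, 2.5]
X_test = [[1, 1, 1], [2, 0, 1], [0, 2, 1]]
y_test = [1.5, 4.5, -1.5]

w = fit_ridge(X_train, y_train, lam=0.01)
pred = [sum(wi * xi for wi, xi in zip(w, x)) for x in X_test]
print(round(pearson(pred, y_test), 3))
```

Comparing embedding models with computer vision models then amounts to swapping the feature matrix and comparing held-out correlations across the same voxels.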

ENIA Chair in AI & Language Technology

National AI Strategy (ENIA) Chair
Participation in the ENIA (Estrategia Nacional de Inteligencia Artificial) Chair at UPV/EHU, hosted within the HiTZ Center for Language Technology. The chair focuses on artificial intelligence and language technology, with activities covering research, teaching, and knowledge transfer in AI for language processing tasks.

Selected Publications