publications | Eliseo Bao

conferences

2024

Adapting Large Language Models for Underrepresented Languages

Eliseo Bao, Anxo Pérez, and Javier Parapar

In VII Congreso XoveTIC: impulsando el talento científico, 2024

@inproceedings{bao2024adapting,
  title = {Adapting Large Language Models for Underrepresented Languages},
  author = {Bao, Eliseo and P{\'e}rez, Anxo and Parapar, Javier},
  booktitle = {VII Congreso XoveTIC: impulsando el talento cient{\'\i}fico},
  year = {2024},
  organization = {Universidade da Coru{\~n}a, Servizo de Publicaci{\'o}ns},
  abstract = {The popularization of Large Language Models (LLMs), especially with the development of conversational systems, makes it necessary to consider how to make artificial intelligence (AI) accessible to everyone. Most models neglect minority languages, prioritizing widely spoken ones. This exacerbates their underrepresentation in the digital world and negatively affects their speakers. We present two resources aimed at improving natural language processing (NLP) for Galician: (i) a Llama 3.1 instruct model adapted through continuous pre-training on the CorpusNos dataset; and (ii) a Galician version of the Alpaca dataset, used to assess the improvement over the base model. In this evaluation, our model outperformed both the base model and another Galician model in quantitative and qualitative terms.},
}

journals

2024

HISC
Explainable depression symptom detection in social media

Eliseo Bao, Anxo Pérez, and Javier Parapar

Health Information Science and Systems, Sep 2024


Users of social platforms often perceive these sites as supportive spaces to post about their mental health issues. Those conversations contain important traces about individuals’ health risks. Recently, researchers have exploited this online information to construct mental health detection models, which aim to identify users at risk on platforms like Twitter, Reddit or Facebook. Most of these models are centred on achieving good classification results, ignoring the explainability and interpretability of the decisions. Recent research has pointed out the importance of using clinical markers, such as symptoms, to improve health professionals’ trust in computational models. In this paper, we propose using transformer-based architectures to detect and explain the appearance of depressive symptom markers in users’ writings. We present two approaches: (i) training one model to classify and a separate one to explain the classifier’s decision, and (ii) unifying the two tasks in a single model. For the latter approach, we also investigate the performance of recent conversational LLMs with in-context learning. Our natural language explanations enable clinicians to interpret the models’ decisions based on validated symptoms, enhancing trust in the automated process. We evaluate our approach using recent symptom-based datasets, employing both offline and expert-in-the-loop metrics to assess the quality of the explanations generated by our models. The experimental results show that it is possible to achieve good classification results while generating interpretable symptom-based explanations.
@article{bao2024explainable,
  title = {Explainable depression symptom detection in social media},
  author = {Bao, Eliseo and P{\'e}rez, Anxo and Parapar, Javier},
  year = {2024},
  month = sep,
  day = {06},
  journal = {Health Information Science and Systems},
  volume = {12},
  number = {1},
  pages = {47},
  doi = {10.1007/s13755-024-00303-9},
  issn = {2047-2501},
  url = {https://doi.org/10.1007/s13755-024-00303-9},
}

theses

2024

M.Sc. Thesis
MindWell: an open-source chatbot to assist in the detection and monitoring of depressive disorders

Eliseo Bao

Feb 2024


Social media users often perceive these platforms as supportive spaces in which to share and discuss their daily problems, and thus their activity can provide clues to their mental health status. Research on Information Retrieval (IR), Natural Language Processing (NLP) and Machine Learning (ML) has recently used this online information to develop screening models that aim to identify at-risk individuals on platforms such as Twitter, Reddit or Facebook. Recent research has also highlighted the importance of using clinical markers, such as validated symptoms, to improve health professionals’ confidence in computational models. This work presents an open-source chatbot designed as an assistant that provides explanations, aligned with validated clinical markers, for the presence of depressive symptoms in social media posts, taking into account the temporality of these symptoms. The aim is to develop a tool that includes the functionality needed to deliver these explanations. With this approach, professionals gain a support tool that relieves them of the tedious and time-consuming task of manually reviewing a subject’s posting history. We evaluated our proposal using expert knowledge to measure the quality of the chatbot’s explanations and their applicability to a real clinical setting, and the results demonstrated the usefulness of the system for generating analyses based on the subject’s feelings.
@mastersthesis{bao2024mindwell,
  title = {MindWell: an open-source chatbot to assist in the detection and monitoring of depressive disorders},
  author = {Bao, Eliseo},
  year = {2024},
  month = feb,
  school = {Universidade da Coru{\~n}a},
  url = {http://hdl.handle.net/2183/35971},
}

2022

B.Sc. Thesis
Ranking of Reddit users using Relevance Models for depressive disorders

Eliseo Bao

Jul 2022


Depressive disorders are among the most common groups of illnesses in the world. Although effective treatments exist, a lack of resources and the stigma still associated with these disorders mean that, in many cases, the consequences for those who suffer from them are devastating. Since the language used by people with these disorders can reveal evidence of their mental health, the aim of this project is to exploit Relevance-Based Language Models for early detection. Specifically, taking the CLEF eRisk collections as a starting point, the goal is to build depression vocabularies. These vocabularies identify heavily weighted, relevant terms in the writing of people with depressive tendencies, and must undergo phases of evaluation and comparison with other validated lexicons. In addition, we focus on ranking: given texts written by a number of people, we order those people according to their estimated degree of depression. The project was managed with an agile methodology, which made it possible to adapt it according to the experimental results. Satisfactory results have been achieved, especially in ranking, and new avenues for experimentation and extension have been identified.
@mastersthesis{bao2022ranking,
  title = {Ranking of Reddit users using Relevance Models for depressive disorders},
  author = {Bao, Eliseo},
  year = {2022},
  month = jul,
  type = {B.Sc. Thesis},
  school = {Universidade da Coru{\~n}a},
  url = {http://hdl.handle.net/2183/31267},
}