Lucía Ormaechea Grijalba

About Me.

Ph.D. Candidate, Computational Linguist & Researcher in NLP

Welcome to my website 👋 My name is Lucía and I am a Researcher in NLP and Ph.D. Candidate in Multilingual Information Processing at the University of Geneva (FTI/TIM) and the Grenoble Computer Science Laboratory (LIG/GETALP), as part of the ANR/FNS PROPICTO project.

I hold a B.A. in Hispanic Philology from the University of Navarre (Pamplona, Spain) and an M.Sc. in Natural Language Processing from the Institut National des Langues et Civilisations Orientales (Paris, France). My research focuses on Automatic Spoken Language Simplification in Low-Resource Conditions.

Feel free to contact me for any further information 😀



Personal Information

  • Name: Lucía
  • Last Name: Ormaechea Grijalba
  • From: Pamplona, Spain
  • Residence: Geneva, Switzerland

Research Interests

Speech Recognition

Text Simplification

Multimodal Systems

Resume.

Education

  • Ph.D. – Multilingual Information Processing (FNS Candoc Fellow)

    University of Geneva & University of Grenoble-Alpes | Geneva, Switzerland

    Currently pursuing a joint Ph.D. between the Department of Translation Technology at the University of Geneva and the GETALP research team at the University of Grenoble-Alpes.

    My work falls within PROPICTO, a research project that aims to create Speech-to-Pictograph cross-modal translation systems, with a special focus on French as an input language.


    Ph.D. Project: Exploring Automatic Spoken Language Simplification in Low-Resource Conditions.

    06.2022 – Present
  • Master of Science – Natural Language Processing

    Inalco & University Sorbonne-Nouvelle & University Paris Nanterre | Paris, France

    Graduated with High Honors.


    Coursework: Scripting (Bash, Python, Perl) – Object-Oriented Programming (C++, Java) – Statistical Methods for Corpus Exploitation – Text Mining – Convolutional Neural Networks for Language Identification – Corpus Linguistics – Mark-up Languages (XML, XSLT) – Databases (SQL, Neo4j).

    Master's Thesis: "Mise en place d'un système robuste de reconnaissance automatique de la parole appliqué au domaine médical". GPA: 19/20.

    09.2018 – 09.2020
  • Bachelor of Arts – Hispanic Philology

    University of Navarre | Pamplona, Spain

    Extraordinary End-of-Degree Award Nominee.


    Coursework: Phonetics and Phonology – Lexicology and Semantics – Sociolinguistics and Dialectal Variation – Discourse Analysis – Morphology and Syntax.

    09.2014 – 06.2018

Experience

  • Research & Teaching Assistant

    University of Geneva | Geneva, Switzerland

    Contributed to BabelDr:
    • Developed a specialized Automatic Speech Recognition (ASR) system.
    • Deployed a Docker application to perform ASR inside the BabelDr medical translation device.

    Participated in PROPICTO:
    • Compiled complex-simple sentence pairs from comparable corpora.
    • Developed and maintained the front end of the project's website.


    Assisted in the Localization M.A. course: prepared course materials – graded assignments – supported students during practical sessions.

    12.2020 – Present
  • NLP Research Intern

    Grenoble Computer Science Lab. | Grenoble, France

    Participated in the BabelDr project: developed an Automatic Speech Recognition (ASR) system for medical applications.


    Main tasks:
    • Created a complete pipeline for injecting grammar-based language models into the Kaldi Speech Processing Toolkit.
    • Containerized ASR-related tools using Docker.
    • Trained HMM-DNN-based acoustic models using open-source corpora.
    • Developed a Kaldi web server API for ASR applications.
    • Conducted prototype testing and evaluation.

    02.2020 – 09.2020
  • Translation Intern

    New York Habitat | Remote

    ENG > ESP translation of commercial texts: video transcriptions – travel articles – apartment reviews – client testimonials.

    07.2018 – 09.2018
  • Undergraduate Research Assistant

    University of Navarre | Pamplona, Spain

    Collaborated as a research student in the Department of Philology.

    Main tasks:
    • Developed educational materials for automatically evaluating L2 learners' knowledge of Spanish.
    • Provided one-on-one tutoring and assessment on academic writing.
    • Taught targeted classes for non-native Spanish speakers.
    • Proofread academic papers and classified documents.

    09.2016 – 06.2017

Languages

Spanish

100%

English

95%

French

95%

Italian

33%

Skills

Programming

  • Python
  • Bash
  • Perl
  • C++
  • Java
  • SQL

Libraries

  • OpenFst
  • Keras
  • Pandas
  • NLTK
  • spaCy

Web dev

  • HTML
  • CSS
  • Jekyll
  • Flask
  • XML
  • XSLT

Tools

  • Kaldi
  • Git
  • LaTeX
  • Docker
  • Praat
  • SRILM

Publications.



Conference Papers

Simplification Strategies in French Spontaneous Speech

Lucía Ormaechea, Nikos Tsourakis, Didier Schwab, Pierrette Bouillon and Benjamin Lecouteux.
In: Proceedings of the 1st Workshop on Evaluating Text Difficulty in a Multilingual Context (DeTermIt) within the Joint International Conference LREC-COLING, Torino (Italy).
To appear



Simple, Simpler and Beyond: A Fine-Tuning BERT-Based Approach to Enhance Sentence Complexity Assessment for Text Simplification

Lucía Ormaechea, Nikos Tsourakis, Didier Schwab, Pierrette Bouillon and Benjamin Lecouteux.
In: Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP), Trento (Italy).
December 2023



Extracting Sentence Simplification Pairs from French Comparable Corpora Using a Two-Step Filtering Method

Lucía Ormaechea and Nikos Tsourakis.
In: Proceedings of the 8th Swiss Text Analytics Conference 2023 (SwissText), Neuchâtel (Switzerland).
June 2023



PROPICTO: Developing Speech-to-Pictograph Translation Systems to Enhance Communication Accessibility

Lucía Ormaechea, Pierrette Bouillon, Maximin Coavoux, Emmanuelle Esperança-Rodier, Johanna Gerlach, Jérôme Goulian, Benjamin Lecouteux, Cécile Macaire, Jonathan Mutal, Magali Norré, Adrien Pupier and Didier Schwab.
In: Proceedings of the 24th Annual Conference of The European Association for Machine Translation (EAMT), Tampere (Finland).
June 2023



Une chaîne de traitements pour la simplification automatique de la parole et sa traduction automatique vers des pictogrammes

Cécile Macaire, Lucía Ormaechea Grijalba and Adrien Pupier.
In: 29ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), Avignon (France).
June 2022



Presentations

Vers une simplification automatique de la parole en français : Les enjeux de l’extraction des données d’apprentissage pour la simplification linguistique

Lucía Ormaechea, Pierrette Bouillon, Benjamin Lecouteux and Didier Schwab.
In: Colloque de l'Association for French Language Studies (AFLS) — Le français et ses frontières, Lille (France).
September 2023



PROPICTO : Développer des systèmes de traduction de la parole vers des séquences de pictogrammes pour améliorer l'accessibilité de la communication

Lucía Ormaechea, Pierrette Bouillon, Maximin Coavoux, Emmanuelle Esperança-Rodier, Johanna Gerlach, Jérôme Goulian, Benjamin Lecouteux, Cécile Macaire, Jonathan Mutal, Magali Norré, Adrien Pupier, Didier Schwab and Hervé Spechbach.
In: 30ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), Paris (France).
June 2023



A Tool for Easily Integrating Grammars as Language Models into the Kaldi Speech Recognition Toolkit

Lucía Ormaechea Grijalba, Benjamin Lecouteux, Pierrette Bouillon and Didier Schwab.
In: Bridges and Gaps between Formal and Computational Linguistics (ESSLLI 2022 workshop), Galway (Ireland).
August 2022



Reconnaissance vocale du discours spontané pour le domaine médical

Lucía Ormaechea Grijalba, Pierrette Bouillon, Johanna Gerlach, Benjamin Lecouteux, Didier Schwab and Hervé Spechbach.
In: Journée Commune AFIA/TLH: Technologies du Langage Humain et Santé (Remote Event).
February 2021



Posters

Integrating Grammar-Based Language Models into Domain-Specific Speech Recognition Systems

Lucía Ormaechea Grijalba
In: Second Advanced Language Processing School (ALPS), co-organized by Univ. Grenoble-Alpes and Naver Labs Europe (Remote Event).
January 2022



Master's Thesis

Mise en place d'un système robuste de reconnaissance automatique de la parole appliqué au domaine médical

Lucía Ormaechea Grijalba
September 2020

