I am a researcher in natural language processing (NLP) in the Lattice team (CNRS). I received my PhD in 2019 from the University of Lille (France), where I worked in the Magnet team at Inria Lille under the supervision of Pascal Denis and Marc Tommasi. I then spent 18 months as a postdoc at the University of A Coruña (Spain) in Carlos Gómez-Rodríguez's FastParse team, until December 2020.
Research Interests
I am interested in several aspects of natural language. Most of my work has focused on syntactic analysis in multilingual settings, but I am also interested in morphological analysis, low-resource languages (whether they have large populations of speakers or very small ones), historical linguistics, epigraphy and cross-lingual annotation consistency, among other topics.
I am also interested in computability and complexity theory and related areas of computer science and mathematics.
In Progress
I am the main annotator behind the 𐌉𐌊𐌖𐌅𐌉𐌍𐌀 dependency treebank, a Universal Dependencies treebank for Umbrian.
I have a project on Oïl languages that aims to understand the influence of neighbouring languages (Germanic, Celtic and other Romance languages) on the syntactic evolution of these languages of the northern half of France, Belgium, Luxembourg, Switzerland and the Channel Islands.
Internships
I am always happy to host interns. I have a running language evolution project whose position is currently filled but might eventually reopen; see this for more details.
Publications
2025
- Comparative Concepts or Descriptive Categories: a UD Case study.
Matthieu Pierre Boyer, Mathieu Dehouck.
The Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), Tallinn, Estonia.
[ACL]
- Lattice @MultiGEC-2025: A Spitful Multilingual Language Error Correction System Using LLaMA.
Olga Seminck, Yoann Dupont, Mathieu Dehouck, Qi Wang, Noé Durandard, Margo Novikov.
The 14th Workshop on Natural Language Processing for Computer Assisted Language Learning.
(NoDaLiDa/Baltic-HLT 2025), Tallinn, Estonia.
[ACL]
- Rule-based Approaches to the Automatic Generation of Puns Based on Given Names in French.
Mathieu Dehouck and Marine Delaborde.
The 1st Workshop on Computational Humor (CHum 2025), Online, co-located with The 31st International Conference on Computational Linguistics (COLING 25), Abu Dhabi, United Arab Emirates.
[ACL]
2024
- Profiterole : un corpus morpho-syntaxique et syntaxique de français médiéval.
Sophie Prévost, Loïc Grobol, Mathieu Dehouck, Alexei Lavrentiev, Serge Heiden.
Corpus, 2024, La constitution de corpus en diachronie longue. Méthodologies, objectifs et exploitations linguistiques et stylistiques.
[Corpus]
2023
- EvoSem: A database of polysemous cognate sets.
Mathieu Dehouck, Alexandre François, David Kletz, Siva Kalyan, Martial Pastor.
4th Workshop on Computational Approaches to Historical Language Change (LChange’23), Dec 2023, Singapore, Singapore.
[ACL] [HAL]
- Challenging the “One Single Vector per Token” Assumption.
Mathieu Dehouck.
The SIGNLL Conference on Computational Natural Language Learning, Dec 2023, Singapore, Singapore.
[ACL] [HAL]
- Génération automatique de jeux de mots à base de prénoms.
Mathieu Dehouck and Marine Delaborde.
CORIA-TALN 2023, Paris, France.
[TALN]
2022
- The 𐌉𐌊𐌖𐌅𐌉𐌍𐌀 Treebank.
Mathieu Dehouck.
LT4HALA (LREC 2022 workshop), Marseille, France.
[lrec]
2021
- A Falta de Pan, Buenas Son Tortas: The Efficacy of Predicted UPOS Tags for Low Resource UD Parsing.
Mark Anderson, Mathieu Dehouck and Carlos Gómez-Rodríguez.
IWPT 2021, Covid-19 virtual venue.
[arXiv]
- Revisiting modal sense classification with state-of-the-art language models.
Mathieu Dehouck and Pascal Denis.
ISLE 6 workshops: Rethinking English modal constructions: From feature-based paradigms to usage-based probabilistic representations, June 2021, Covid-19 virtual venue. (abstract)
[isle]
- La phylogénie des langues au service de l’analyse automatique.
Mathieu Dehouck and Pascal Denis.
La lettre de l’InSHS, n. 69, January 2021.
[cnrs]
2020
- Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages.
Mathieu Dehouck and Carlos Gómez-Rodríguez.
Coling 2020, Covid-19 virtual venue.
[acl]
- Efficient EUD Parsing.
Mark Anderson, Mathieu Dehouck and Carlos Gómez-Rodríguez.
IWPT 2020 shared task, Covid-19 virtual venue.
[iwpt]
2019
- Phylogenetic Multi-Lingual Dependency Parsing.
Mathieu Dehouck and Pascal Denis.
NAACL 2019, Minneapolis, USA.
[acl]
- Modal sense classification with task-specific context embeddings.
Bo Li, Mathieu Dehouck and Pascal Denis.
ESANN 2019, Bruges, Belgium.
[esann]
2018
- A Framework for Understanding the Role of Morphology in Universal Dependency Parsing.
Mathieu Dehouck and Pascal Denis.
EMNLP 2018, Brussels, Belgium.
[acl]
2017
- Delexicalized Word Embeddings for Cross-lingual Dependency Parsing.
Mathieu Dehouck and Pascal Denis.
EACL 2017, Valencia, Spain.
[acl]
- Learning Morpho-Syntactic Attributes Representation for Cross-Lingual Dependency Parsing.
Mathieu Dehouck and Pascal Denis.
CLIN27, Leuven, Belgium. (abstract)
[clin]
2013
- Pragmatic Visualizations for Roassal: a Florilegium.
Mathieu Dehouck, Usman Bhatti, Alexandre Bergel, and Stéphane Ducasse.
IWST 2013, Annecy, France.
[hal]
Thesis
- Multi-Lingual Dependency Parsing: Word Representation and Joint Training for Syntactic Analysis.
Mathieu Dehouck.
PhD thesis, 2019.
[theses]
Chapters in books
- Models of Modals: From Pragmatics and Corpus Linguistics to Machine Learning.
Ilse Depraetere, Bert Cappelle, Martin Hilpert, Ludovic De Cuypere, Mathieu Dehouck, Pascal Denis, Susanne Flach, Natalia Grabar, Cyril Grandin, Thierry Hamon, Clemens Hufeld, Benoît Leclercq, and Hans-Jörg Schmid.
De Gruyter Mouton, 2023.
[De Gruyter Mouton]
Communications
2024
- Comparaison de deux approches pour l’analyse syntaxique du français et du latin en diachronie.
Mathieu Dehouck, Sophie Prévost, Mathilde Regnault, Loïc Grobol.
Concordial 2024, Lyon, France.
2022
- Profiterole : un corpus morpho-syntaxique et syntaxique de français médiéval.
Sophie Prévost, Loïc Grobol, Mathieu Dehouck, Alexei Lavrentiev and Serge Heiden.
Concordial 2022, Grenoble, France.
2021
- Multi-Lingual Dependency Parsing: Word Representation and Joint Training for Syntactic Analysis.
TALN-RECITAL 2021, Lille, France (Covid-19, online).
Invited speaker as recipient of the ATALA thesis of the year award 2020.
Teaching
- Université Sorbonne Nouvelle (Paris):
Programmation et Algorithmique. (L8DN003). (2023-2024, 2024-2025)
Traitement automatique du langage. (LZSY005) (2021-2022, 2022-2023, 2023-2024)
Traitement automatique du langage (ENEAD). (LYSY009) (2022-2023)
- PSL, CPES Science des données, art et culture (Paris):
Traitement automatique du langage et littérature. (2024-2025)
- Centrale Lille, Mines de Douai, Université de Lille (Villeneuve d’Ascq):
Natural Language Processing. (2021-2022, 2022-2023, 2023-2024)
- Université de Lille (Villeneuve d’Ascq):
Foundation of Computation (3rd year Bachelor degree) (Fall 2017).
Projects in Computer Science for Humanities (2nd year Bachelor degree) (Fall 2016, Fall 2017).
Principles of Network Technologies (1st year Bachelor degree) (Spring 2017, Spring 2018).
Computer Science and Technologies (2nd year Bachelor degree) (Spring 2017).