I am a researcher in natural language processing (NLP) in the Lattice team (CNRS). I received my PhD in 2019 from the University of Lille (France). I carried it out in the Magnet team at Inria Lille, under the supervision of Pascal Denis and Marc Tommasi. I then spent 18 months as a post-doc at the University of A Coruña (Spain), in Carlos Gómez-Rodríguez's FastParse team, until December 2020.
Research Interests
I am interested in several aspects of natural languages. Most of my work has focused on syntactic analysis in multilingual settings, but I am also interested in morphological analysis, low-resource languages (whether spoken by large or by very small populations), historical linguistics, epigraphy and cross-lingual annotation consistency, among others.
I am also interested in computability and complexity theory and related areas of computer science and mathematics.
In Progress
I am the main annotator behind the 𐌉𐌊𐌖𐌅𐌉𐌍𐌀 dependency treebank, a Universal Dependencies treebank for Umbrian.
Internships
I am always happy to host interns. I have an ongoing internship project on language evolution; the position is currently filled but may eventually reopen. See this page for more details.
Publications
2023
- EvoSem: A database of polysemous cognate sets.
Mathieu Dehouck, Alexandre François, David Kletz, Siva Kalyan, Martial Pastor.
4th Workshop on Computational Approaches to Historical Language Change (LChange’23), Dec 2023, Singapore.
[ACL] [HAL]
- Challenging the “One Single Vector per Token” Assumption.
Mathieu Dehouck.
The SIGNLL Conference on Computational Natural Language Learning (CoNLL 2023), Dec 2023, Singapore.
[ACL] [HAL]
- Génération automatique de jeux de mots à base de prénoms.
Mathieu Dehouck and Marine Delaborde.
CORIA-TALN 2023, Paris, France.
[TALN]
2022
- The 𐌉𐌊𐌖𐌅𐌉𐌍𐌀 Treebank.
Mathieu Dehouck.
LT4HALA (LREC 2022 workshop), Marseille, France.
[lrec]
2021
- A Falta de Pan, Buenas Son Tortas: The Efficacy of Predicted UPOS Tags for Low Resource UD Parsing.
Mark Anderson, Mathieu Dehouck and Carlos Gómez-Rodríguez.
IWPT 2021, Covid-19 virtual venue.
[arXiv]
- Revisiting modal sense classification with state-of-the-art language models.
Mathieu Dehouck and Pascal Denis.
ISLE 6 workshop: Rethinking English modal constructions: From feature-based paradigms to usage-based probabilistic representations, June 2021, Covid-19 virtual venue. (abstract)
[isle]
- La phylogénie des langues au service de l’analyse automatique.
Mathieu Dehouck and Pascal Denis.
La lettre de l’InSHS, n. 69, January 2021.
[cnrs]
2020
- Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages.
Mathieu Dehouck and Carlos Gómez-Rodríguez.
COLING 2020, Covid-19 virtual venue.
[acl]
- Efficient EUD Parsing.
Mark Anderson, Mathieu Dehouck and Carlos Gómez-Rodríguez.
IWPT 2020 shared task, Covid-19 virtual venue.
[iwpt]
2019
- Phylogenetic Multi-Lingual Dependency Parsing.
Mathieu Dehouck and Pascal Denis.
NAACL 2019, Minneapolis, USA.
[acl]
- Modal sense classification with task-specific context embeddings.
Bo Li, Mathieu Dehouck and Pascal Denis.
ESANN 2019, Bruges, Belgium.
[esann]
2018
- A Framework for Understanding the Role of Morphology in Universal Dependency Parsing.
Mathieu Dehouck and Pascal Denis.
EMNLP 2018, Brussels, Belgium.
[acl]
2017
- Delexicalized Word Embeddings for Cross-lingual Dependency Parsing.
Mathieu Dehouck and Pascal Denis.
EACL 2017, Valencia, Spain.
[acl]
- Learning Morpho-Syntactic Attributes Representation for Cross-Lingual Dependency Parsing.
Mathieu Dehouck and Pascal Denis.
CLIN27, Leuven, Belgium. (abstract)
[clin]
2013
- Pragmatic Visualizations for Roassal: a Florilegium.
Mathieu Dehouck, Usman Bhatti, Alexandre Bergel, and Stéphane Ducasse.
IWST 2013, Annecy, France.
[hal]
Thesis
- Multi-Lingual Dependency Parsing: Word Representation and Joint Training for Syntactic Analysis.
Mathieu Dehouck.
PhD thesis, University of Lille, 2019.
[theses]
Chapters in books
- Models of Modals: From Pragmatics and Corpus Linguistics to Machine Learning.
Ilse Depraetere, Bert Cappelle, Martin Hilpert, Ludovic De Cuypere, Mathieu Dehouck, Pascal Denis, Susanne Flach, Natalia Grabar, Cyril Grandin, Thierry Hamon, Clemens Hufeld, Benoît Leclercq, and Hans-Jörg Schmid.
De Gruyter Mouton, 2023.
[De Gruyter Mouton]
Communications
2022
- Profiterole : un corpus morpho-syntaxique et syntaxique de français médiéval.
Sophie Prévost, Loïc Grobol, Mathieu Dehouck, Alexey Lavrentev and Serge Heiden.
Concordial 2022, Grenoble, France.
2021
- Multi-Lingual Dependency Parsing: Word Representation and Joint Training for Syntactic Analysis.
TALN-RECITAL 2021, Lille, France (held online due to COVID-19).
Invited speaker as the recipient of the 2020 ATALA thesis award.
Teaching
- Université Sorbonne Nouvelle (Paris):
Programmation et Algorithmique (L8DN003) (2023-2024)
Traitement automatique du langage (LZSY005) (2021-2022, 2022-2023, 2023-2024)
Traitement automatique du langage (ENEAD) (LYSY009) (2022-2023)
- Centrale Lille, Mines de Douai, Université de Lille (Villeneuve d’Ascq):
Natural Language Processing (2021-2022, 2022-2023, 2023-2024)
- Université de Lille (Villeneuve d’Ascq):
Foundation of Computation (3rd-year Bachelor's degree) (Fall 2017)
Projects in Computer Science for Humanities (2nd-year Bachelor's degree) (Fall 2016, Fall 2017)
Principles of Network Technologies (1st-year Bachelor's degree) (Spring 2017, Spring 2018)
Computer Science and Technologies (2nd-year Bachelor's degree) (Spring 2017)