Dagstuhl Seminar: Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics


from June 22, 2020 to June 27, 2020

  • 2020, Schloss Dagstuhl, Wadern, Germany

Timothy Baldwin (University of Melbourne, Australia)
William Croft (University of New Mexico, Albuquerque, USA)
Joakim Nivre (Uppsala University, Sweden)
Agata Savary (University of Tours, France)

Computational linguistics builds models that can usefully process and produce language and that can increase our
understanding of linguistic phenomena. From a computational perspective, language is particularly challenging,
notably because of its variable degree of idiosyncrasy (unexpected properties shared by few peer objects) and the
pervasiveness of non-compositional phenomena such as multiword expressions (whose meaning cannot be
straightforwardly deduced from the meanings of their components, e.g. "red tape", "by and large", "to pay a visit"
and "to pull one's leg") and constructions (conventional associations of forms and meanings). Additionally, if
models and methods are to be consistent and valid across languages, they have to face specificities inherent either
to particular languages or to various linguistic traditions.

A few existing initiatives, such as Universal Dependencies (UD), PARSEME and UniMorph, have been addressing
these challenges with the aim of revealing the universals of idiosyncrasy in language, proposing cross-lingually
applicable typologies and methodologies for language modelling, and creating highly multilingual language
resources and tools. These efforts have been carried out relatively independently, resulting in partly diverging
terminologies and methods.

The objectives of this seminar are threefold:
- Theoretical: To deepen the understanding of language universals, and of how they apply to linguistic
idiosyncrasy, so as to further promote unified modelling while preserving diversity.
- Practical: To improve the treatment of idiosyncrasy in treebanking frameworks, in computationally
tractable ways and, thus, to foster high-quality NLP tools for more languages with greater typological
diversity.
- Networking: To promote a higher degree of convergence across typology-driven initiatives, while focusing
on three main aspects of language modelling: morphology, syntax, and semantics.

The expected outcomes of the seminar include: (i) enhanced unified versions of the already existing annotation
guidelines put forward by UD, PARSEME and UniMorph, (ii) criteria for applying unified guidelines to specific
languages, (iii) recommendations on syntactic and semantic representation of MWEs in lexicons, and (iv)
recommendations on how to cover grammatical constructions within treebanking frameworks and NLP tools.

The list of invitees includes researchers in NLP, linguistics and typology, with expertise in morphology, syntax,
semantics, MWEs, constructions, annotation, parsing, and dozens of languages from diverse language families.
They are based in 22 countries, spread across 4 continents.

For more details about this event, visit its wiki pages.