Locating Language-Specific Information in Contextualized Embeddings

Liang, Sheng; Dufter, Philipp; Schütze, Hinrich

arXiv:2109.08040 (cs)

[Submitted on 16 Sep 2021]

Abstract: Multilingual pretrained language models (MPLMs) exhibit multilinguality and are well suited for transfer across languages. Most MPLMs are trained in an unsupervised fashion, and the relationship between their objective and multilinguality is unclear. More specifically, the question arises whether MPLM representations are language-agnostic or whether they simply interleave well with learned task prediction heads. In this work, we locate language-specific information in MPLMs and identify its dimensionality and the layers where this information occurs. We show that language-specific information is scattered across many dimensions, which can be projected into a linear subspace. Our study contributes to a better understanding of MPLM representations, going beyond treating them as unanalyzable blobs of information.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2109.08040 [cs.CL]
	(or arXiv:2109.08040v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2109.08040

Submission history

From: Sheng Liang
[v1] Thu, 16 Sep 2021 15:11:55 UTC (77 KB)
