Collaborative Construction of a Good Quality, Broad Coverage and Copyright Free Japanese-French Dictionary
International Journal of Lexicography (IF 0.652), Pub Date: 2016-09-10, DOI: 10.1093/ijl/ecw035
Mathieu Mangeot-Nagata

Although French and Japanese are each regarded as well-resourced languages in terms of tools and linguistic resources, French-Japanese is considered an under-resourced language pair as far as Web availability is concerned. Indeed, few bilingual electronic lexical resources are both of good quality and free of royalties and copyright. French-Japanese aligned bilingual corpora and machine translation systems are, consequently, just as rare. Fortunately, there are printed French-Japanese dictionaries of good quality that are old enough to be royalty-free. It should be possible to reuse these resources as part of our project to build a good-quality, broad-coverage dictionary available on the Web. To update this data, whose vocabulary may be dated, we could reuse existing electronic resources such as Wikipedia or Japanese-English electronic resources. The resulting resource could then be made available on the Web for lookup and correction by volunteer contributors. This methodology could be applied to other language pairs in a similar situation, with good printed dictionaries but few electronic resources. We first take an inventory of Japanese bilingual dictionaries (printed or electronic) and their historical evolution. Then we describe the resource we want to build. The next part concerns the conversion of three resources: the Cesselin Japanese-French printed dictionary, the language links between Japanese, French and English Wikipedia pages, and the JMdict Japanese-English electronic dictionary. The Cesselin dictionary was scanned, OCRed and parsed to detect headwords and entries. Several error corrections were then performed on the French and Japanese text. New entries were created from Wikipedia links, and JMdict entries missing from the resulting resource were converted and added. Finally, we released the resource on a Web site built around the Jibiki platform, which allows articles to be viewed and edited online. A French-Japanese bilingual corpus and an active reading module are also available. The resulting resources (dictionaries and corpora) are available for download on the project website. The data is released into the public domain.

Updated: 2016-09-10