Comparison of Entropy and Dictionary Based Text Compression in English, German, French, Italian, Czech, Hungarian, Finnish, and Croatian
Mathematics (IF 2.3) Pub Date: 2020-07-01, DOI: 10.3390/math8071059
Matea Ignatoski , Jonatan Lerga , Ljubiša Stanković , Miloš Daković

The rapid growth in the amount of data in the digital world creates a need for data compression, that is, for reducing the number of bits needed to represent a text file, an image, or audio or video content. Compressing data saves storage capacity and speeds up data transmission. In this paper, we focus on text compression and compare algorithms (in particular, the entropy-based arithmetic method and the dictionary-based Lempel–Ziv–Welch (LZW) method) for compressing text in different languages (Croatian, Finnish, Hungarian, Czech, Italian, French, German, and English). The main goal is to answer the question: "How does the language of a text affect the compression ratio?" The results indicate that the compression ratio is affected by the size of the language's alphabet and by the size or type of the text. For example, the text of The European Green Deal was compressed by 75.79%, 76.17%, 77.33%, 76.84%, 73.25%, 74.63%, 75.14%, and 74.51% using the LZW algorithm, and by 72.54%, 71.47%, 72.87%, 73.43%, 69.62%, 69.94%, 72.42%, and 72% using the arithmetic algorithm, for the English, German, French, Italian, Czech, Hungarian, Finnish, and Croatian versions, respectively.
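The two approaches compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices in it are assumptions. Texts are compressed as UTF-8 bytes, so accented letters in Croatian, Czech, or Hungarian take two bytes each. LZW codes are stored with one fixed width of at least 9 bits. The compression percentage is taken to be space savings, 100 × (1 − compressed size / original size). The sketch does not implement arithmetic coding. Instead it computes the order-0 (single-symbol) Shannon entropy, the size that a static order-0 arithmetic coder approaches when model overhead is ignored. All function names are hypothetical.

```python
import math
from collections import Counter


def lzw_compress(data: bytes) -> list[int]:
    """LZW: emit the code of the longest known prefix, then learn prefix + next byte."""
    dictionary = {bytes([i]): i for i in range(256)}  # every single byte
    next_code = 256
    current = b""
    codes: list[int] = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate  # keep extending the match
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new phrase
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])  # flush the final match
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Inverse of lzw_compress; rebuilds the same dictionary on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Code not in the dictionary yet: the classic "cScSc" special case.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


def lzw_compressed_bits(codes: list[int]) -> int:
    # Assumption: all codes share one fixed width, wide enough for the
    # largest code emitted and never narrower than 9 bits.
    if not codes:
        return 0
    width = max(9, max(codes).bit_length())
    return len(codes) * width


def order0_entropy(data: bytes) -> float:
    """Shannon entropy, in bits per byte, of the empirical byte distribution."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def space_savings(original_bits: int, compressed_bits: float) -> float:
    # Percentage by which the text shrank: 100 * (1 - compressed / original).
    return 100.0 * (1.0 - compressed_bits / original_bits)


if __name__ == "__main__":
    # Hypothetical sample; substitute one language version of a text.
    text = "Europski zeleni plan. Europski zeleni plan.".encode("utf-8")
    codes = lzw_compress(text)
    assert lzw_decompress(codes) == text  # lossless round trip
    original_bits = 8 * len(text)
    print(f"LZW: {space_savings(original_bits, lzw_compressed_bits(codes)):.2f}%")
    bound = order0_entropy(text) * len(text)
    print(f"Order-0 entropy bound: {space_savings(original_bits, bound):.2f}%")
```

Both the fixed-width assumption and the byte-level alphabet affect the result, so percentages from this sketch should not be expected to match the figures reported in the abstract. For very short inputs, 9-bit LZW codes can even make the output larger than the original text.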

Updated: 2020-07-01