Efficient computation of positional population counts using SIMD instructions
Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2021-05-03, DOI: 10.1002/cpe.6304
Marcus D. R. Klarqvist, Wojciech Muła, Daniel Lemire

In several fields such as statistics, machine learning, and bioinformatics, categorical variables are frequently represented as one-hot encoded vectors. For example, given eight distinct values, we map each value to a byte in which only a single bit is set. We are motivated to quickly compute statistics over such encodings. Given a stream of k-bit words, we seek to compute k distinct sums corresponding to the bit values at indexes 0, 1, 2, …, k − 1. If the k-bit words are one-hot encoded, then the sums correspond to a frequency histogram. This multiple-sum problem is a generalization of the population-count problem, where we seek the sum of all bit values. Accordingly, we refer to the multiple-sum problem as a positional population count. Using SIMD (Single Instruction, Multiple Data) instructions from recent Intel processors, we describe algorithms for computing the 16-bit positional population count using less than half of a CPU cycle per 16-bit word. Our best approach uses up to 400 times fewer instructions and is up to 50 times faster than baseline code using only regular (non-SIMD) instructions, for sufficiently large inputs.
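
To make the problem statement concrete, the following is a minimal scalar sketch of a positional population count in Python. It is a reference definition only, not the SIMD algorithms described in the paper, and it makes no attempt at performance. The function names `positional_popcount` and `one_hot`, and the sample data, are hypothetical and chosen for illustration.

```python
from typing import Iterable, List


def positional_popcount(words: Iterable[int], k: int = 16) -> List[int]:
    """Return k sums; counts[i] is the number of words with bit i set."""
    counts = [0] * k
    for word in words:
        for i in range(k):
            # Extract bit i of the word and add it to the i-th running sum.
            counts[i] += (word >> i) & 1
    return counts


def one_hot(value: int) -> int:
    """Encode a category index as a word with only that bit set (assumed encoding)."""
    return 1 << value


# Hypothetical sample: category indexes drawn from eight distinct values.
categories = [0, 3, 3, 7, 1, 3, 0]

# With one-hot inputs, the k sums form a frequency histogram of the categories.
histogram = positional_popcount((one_hot(c) for c in categories), k=8)
```

Summing the k results gives the ordinary population count of the whole stream, which is the sense in which the positional count generalizes it.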

Updated: 2021-05-03