research-article
Open Access

Polynomial Evaluation on Superscalar Architecture, Applied to the Elementary Function e^x

Published: 15 September 2020

Abstract

The evaluation of small-degree polynomials is critical for the computation of elementary functions. It has been extensively studied and is well documented. In this article, we evaluate existing methods for polynomial evaluation on superscalar architectures. In addition, we complement this work with a factorization method, which is surprisingly neglected in the literature. This work focuses on out-of-order processors, Intel processors among others, whose computational units are available. Moreover, we apply our work to the elementary function e^x, which in the current implementation requires the evaluation of a polynomial of degree 10 to reach satisfactory precision and performance. Our results show that the factorization scheme is the fastest in benchmarks, and that latency and throughput are intrinsically dependent on each other on superscalar architectures.
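
As a rough illustration of what "methods for polynomial evaluation" means here, the sketch below evaluates a degree-10 approximation of e^x with two classical schemes, Horner's rule and Estrin's scheme. The abstract does not list the schemes it compares, so this choice is an assumption, and the factorization method itself is not reproduced. The coefficients are plain Taylor coefficients 1/k!, used as a stand-in for whatever coefficients the article's implementation uses; the names `COEFFS`, `horner`, and `estrin` are hypothetical.

```python
import math

# Assumption: Taylor coefficients 1/k! of e^x, k = 0..10, stand in for the
# coefficients a production implementation would use.
COEFFS = [1.0 / math.factorial(k) for k in range(11)]


def horner(coeffs, x):
    """Horner's rule: a single serial chain of multiply-adds."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c  # each step depends on the previous one
    return acc


def estrin(coeffs, x):
    """Estrin's scheme: pair adjacent coefficients, square x, and repeat."""
    level = list(coeffs)
    power = x
    while len(level) > 1:
        if len(level) % 2:
            level.append(0.0)  # pad so every coefficient has a partner
        # The pairs within one level do not depend on each other, so an
        # out-of-order core could, in principle, issue them in parallel.
        level = [level[i] + level[i + 1] * power
                 for i in range(0, len(level), 2)]
        power *= power
    return level[0]


if __name__ == "__main__":
    x = 0.25
    print(horner(COEFFS, x), estrin(COEFFS, x), math.exp(x))
```

Both functions compute the same polynomial; they differ only in the shape of the dependency graph. Horner's rule has the fewest operations but the longest dependency chain, while Estrin's scheme does slightly more work with a shorter chain. Python hides any such hardware effect, so the sketch shows the structure of the schemes, not their speed.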


    • Published in

      ACM Transactions on Mathematical Software  Volume 46, Issue 3
      September 2020
      267 pages
      ISSN:0098-3500
      EISSN:1557-7295
      DOI:10.1145/3410509

      Copyright © 2020 Owner/Author

      This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 15 September 2020
      • Accepted: 1 June 2020
      • Revised: 1 September 2019
      • Received: 1 July 2018

      Qualifiers

      • research-article
      • Research
      • Refereed
