Optimized Pipelined Fast Fourier Transform Using Split and Merge Parallel Processing Units for OFDM

Prasanna Kumar, G.; Krishna, B. T.; Pushpa, K.

doi:10.1007/s11277-020-07471-3

Optimized Pipelined Fast Fourier Transform Using Split and Merge Parallel Processing Units for OFDM

Published: 19 May 2020

Volume 117, pages 3067–3089, (2021)
Cite this article

Wireless Personal Communications Aims and scope Submit manuscript

G. Prasanna Kumar¹,
B. T. Krishna¹ &
K. Pushpa²

372 Accesses
5 Citations
Explore all metrics

Abstract

This paper presents a new optimized design to implement Fast Fourier Transform in orthogonal frequency division multiplexing for real valued signals. The proposed method is a new pipelined design, it is the combination of serial as well as parallel split and merge processing butterfly units. The proposed design initially has a single processing unit, in second stage, processing unit splits into two parallel processing units, and thereafter from stage three onwards, merges into a single processing until the last stage. The proposed design is optimized to reduce the latency as well as the hardware complexity for real inputs and complex inputs, as real valued signals itself reduce half of the computations, redundant data paths are eliminated effectively by this design. The parallel processing unit speeds up the processing at the initial stage, whereas single processing units reduce the hardware complexity. This design is a solution for ambiguity between Latency and hardware. The proposed design is very effective for real valued signals. The proposed design is implemented in Radix-2² model, in which multipliers are required for every two stages. The proposed design reduces Latency parameter more than 50% with a small cost of adders and multipliers, also power consumption for higher order FFT’s is low.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An Area and Power-Efficient Serial Commutator FFT with Recursive LUT Multiplier

Canonic FFT flow graphs for real-valued even/odd symmetric inputs

Article Open access 14 June 2017

Throughput Radix-8 Based FFT Architecture

References

Rabiner, L. R., & Gold, B. (1975). Theory and application of digital signal processing. Englewood Cliffs, NJ: Prentice-Hall Inc.
Google Scholar
Guan, W., & Xiang, H. (2012). Low-complexity channel estimation using short-point FFT/IFFT for an iterative receiver in an LDPC-coded OFDM system. Wireless Personal Communications, 64(4), 739–747.
Article Google Scholar
Cooley, J. W., & Tukey, J. W. (1965). An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19(90), 297–301.
Article MathSciNet Google Scholar
Wold, E. H., & Despain, A. M. (1984). Pipeline and parallel-pipeline FFT processors for VLSI implementations. IEEE Transactions on Computers, 5, 414–426.
Article Google Scholar
Hassan, S. L. M., Sulaiman, N., & Halim, I. S. A. (2018) Low power pipelined FFT processor architecture on FPGA. In 2018 9th IEEE control and system graduate research colloquium (ICSGRC) (pp. 31–34).
Bi, G., & Jones, E. V. (1989). A pipelined FFT processor for word-sequential data. IEEE Transactions on Acoustics, Speech and Signal Processing, 37(12), 1982–1985.
Article Google Scholar
Parhi, K. K., Wang, C.-Y., & Brown, A. P. (1992). Synthesis of control circuits in folded pipelined DSP architectures. IEEE Journal of Solid-State Circuits, 27(1), 29–43.
Article Google Scholar
Swartzlander, E. E., Jain, V. K., & Hikawa, H. (1992). A radix-8 wafer scale FFT processor. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 4(2–3), 165–176.
Article Google Scholar
Sundararajan, M., & Govindaswamy, U. (2014). Multicarrier spread spectrum modulation schemes and efficient FFT algorithms for cognitive radio systems. Electronics, 3(3), 419–443.
Article Google Scholar
Thyagaturu, A. S., Alharbi, Z., & Reisslein, M. (2018). R-FFT: Function split at IFFT/FFT in unified LTE CRAN and cable access network. IEEE Transactions on Broadcasting, 64(3), 648–665.
Article Google Scholar
Shih, X.-Y., Chou, H.-R., & Liu, Y.-Q. (2017). VLSI design and implementation of reconfigurable 46-mode combined-radix-based FFT hardware architecture for 3GPP-LTE applications. IEEE Transactions on Circuits and Systems I: Regular Papers, 65(1), 118–129.
Article Google Scholar
Kanders, H., Mellqvist, T., Garrido, M., Palmkvist, K., & Gustafsson, O. (2019). A 1 million-Point FFT on a single FPGA. IEEE Transactions on Circuits and Systems I: Regular Papers, 66(10), 3863–3873.
Article MathSciNet Google Scholar
Yazicioglu, R. F., Merken, P., Puers, R., & Van Hoof, C. (2006). Low-power low-noise 8-channel EEG front-end ASIC for ambulatory acquisition systems. In 2006 proceedings of the 32nd European solid-state circuits conference (pp. 247–250).
Noor, S. M., John, E., & Panday, M. (2018). Design and implementation of an ultralow-energy FFT ASIC for processing ECG in cardiac pacemakers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 27(4), 983–987.
Article Google Scholar
Sekhar, B. R., & Prabhu, K. M. M. (1999). Radix-2 decimation-in-frequency algorithm for the computation of the real-valued FFT. IEEE Transactions on Signal Processing, 47(4), 1181–1184.
Article Google Scholar
Andersson, R. (2013). FFT hardware architectures with reduced twiddle factor sets.
Lin, Y.-T., Tsai, P.-Y., & Chiueh, T.-D. (2005). Low-power variable-length fast Fourier transform processor. IEE Proceedings-Computers and Digital Techniques, 152(4), 499–506.
Article Google Scholar
Yeh, W.-C., & Jen, C.-W. (2003). High-speed and low-power split-radix FFT. IEEE Transactions on Signal Processing, 51(3), 864–874.
Article MathSciNet Google Scholar
Yang, L., Zhang, K., Liu, H., Huang, J., & Huang, S. (2006). An efficient locally pipelined FFT processor. IEEE Transactions on Circuits and Systems II Express Briefs, 53(7), 585–589.
Article Google Scholar
Shin, M., & Lee, H. (2008). A high-speed four-parallel radix-2 4 FFT/IFFT processor for UWB applications. In 2008 IEEE international symposium on circuits and systems (pp. 960–963).
Ayinala, M., & Parhi, K. K. (2013). FFT architectures for real-valued signals based on radix-23 and radix-24 algorithms. IEEE Transactions on Circuits and Systems I: Regular Papers, 60(9), 2422–2430.
Article MathSciNet Google Scholar
Garrido, M., Parhi, K. K., & Grajal, J. (2009). A pipelined FFT architecture for real-valued signals. IEEE Transactions on Circuits and Systems I: Regular Papers, 56(12), 2634–2643.
Article MathSciNet Google Scholar
Yang, K.-J., Tsai, S.-H., & Chuang, G. C. H. (2012). MDC FFT/IFFT processor with variable length for MIMO–OFDM systems. IEEE Transactions on Circuits and Systems I: Regular Papers, 21(4), 720–731.
Google Scholar
Langemeyer, S., Pirsch, P., & Blume, H. (2011). A FPGA architecture for real-time processing of variable-length FFTs. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 1705–1708).
Ayinala, M., Brown, M., & Parhi, K. K. (2012). Pipelined parallel FFT architectures via folding transformation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 20(6), 1068–1081.
Article Google Scholar
Garrido, M., Grajal, J., Sanchez, M. A., & Gustafsson, O. (2011). Pipelined radix-2k feedforward FFT architectures. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 21(1), 23–32.
Article Google Scholar
Salehi, S. A., Amirfattahi, R., & Parhi, K. K. (2013). Pipelined architectures for real-valued FFT and hermitian-symmetric IFFT with real datapaths. IEEE Transactions on Circuits and Systems II: Express Briefs, 60(8), 507–511.
Article Google Scholar
Glittas, A. X., Sellathurai, M., & Lakshminarayanan, G. (2016). A normal I/O order radix-2 FFT architecture to process twin data streams for MIMO. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24(6), 2402–2406.
Article Google Scholar
Nash, J. (2018). Distributed-memory-based FFT architecture and FPGA implementations. Electronics, 7(7), 116.
Article Google Scholar
Nguyen, H., Khan, S., Kim, C.-H., & Kim, J.-M. (2018). A Pipelined FFT processor using an optimal hybrid rotation scheme for complex multiplication: Design, FPGA implementation and analysis. Electronics, 7(8), 137.
Article Google Scholar
Chen, S.-G., Huang, S.-J., Garrido, M., & Jou, S.-J. (2014). Continuous-flow parallel bit-reversal circuit for MDF and MDC FFT architectures. IEEE Transactions on Circuits and Systems I: Regular Papers, 61(10), 2869–2877.
Article Google Scholar
Tang, T., Pan, J., Yang, D., & Wang, E. (2018). Frequency measurement by FFT based on FPGA. In 2018 IEEE 3rd advanced information technology, electronic and automation control conference (IAEAC) (pp. 453–456).

Download references

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, JNTUK, Kakinada, India
G. Prasanna Kumar & B. T. Krishna
Department of Electronics and Communication Engineering, Shri Vishnu Engineering College, Bhimavaram, India
K. Pushpa

Authors

G. Prasanna Kumar
View author publications
You can also search for this author in PubMed Google Scholar
B. T. Krishna
View author publications
You can also search for this author in PubMed Google Scholar
K. Pushpa
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to G. Prasanna Kumar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Latency Convergence Phenomenon

The latency convergence curve for 16-input pipelined FFT is with respect to parallel units is shown in Fig. 16, 1-parallel to 8-parallel units along x-axis and latency along y-axis. The curve clearly shows that 1-parallel designs has very high latency and 8-parallel design has no latency and as the number of parallel units increases Latency reduces.

The proposed design is in between 1 and 2 parallel designs, so that in the it is represented as 1.5. The proposed design with additional one BU unit placed based on the reduction of latency. The optimum reduction in latency is obtained by splitting the BU units in appropriate stage.

1.2 Optimum Reduction in Latency

Figure 17 shows the latency comparison of proposed split and merge designs, with different stages of splitting. The one parallel design has very high latency, split and merge in stage 2(sp2) has lower latency compared to third(sp3) and fourth stage(sp4) splitting.so that optimum reduction in latency is obtained with the splitting section in 1-parallel pipelined architecture is placed in stage-2.

1.3 Statistical Data Analysis of the Proposed Method

Figure 18 shows correlation analysis of the proposed design with some existing designs. in almost all designs throughput is highly correlated and utilization of hardware is moderately correlated. The statistical data analysis of the number of adders and latency is for N = 16 is measured in Rx643.5.3 with estimation package.

The histogram of number adders and latency is shown in Fig. 19, it clearly shows that histogram is uniform, QQ plot is shown in Fig. 20. The correlation analysis shown in Fig. 21, it shows that the latency of proposed design is highly uncorrelated to existing designs, number of adders are weakly correlated to existing designs.

1.4 Butterfly Units (BU)

The internal operation of the Butterfly Unit is shown in Fig. 22, BU1, BU21 and BU22 are the butterfly units for processing real signals, Bu1 is the butterfly unit in stage -1, BU21 and BU22 are the two parallel butterfly units in stage-2, BU3 and BU4 are butterfly units in stage -3 and stage-4. The internal design of BU unit for real signal consists of two real adders, whereas for complex signals requires four real adders.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Prasanna Kumar, G., Krishna, B.T. & Pushpa, K. Optimized Pipelined Fast Fourier Transform Using Split and Merge Parallel Processing Units for OFDM. Wireless Pers Commun 117, 3067–3089 (2021). https://doi.org/10.1007/s11277-020-07471-3

Download citation

Published: 19 May 2020
Issue Date: April 2021
DOI: https://doi.org/10.1007/s11277-020-07471-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Optimized Pipelined Fast Fourier Transform Using Split and Merge Parallel Processing Units for OFDM

Abstract

Access this article

Similar content being viewed by others

An Area and Power-Efficient Serial Commutator FFT with Recursive LUT Multiplier

Canonic FFT flow graphs for real-valued even/odd symmetric inputs

Throughput Radix-8 Based FFT Architecture

References