Volume: 08 Issue: 03 | Mar 2021 www.irjet.net p-ISSN: 2395-0072

# A 32-BIT PIPELINED FFT PROCESSOR FOR OFDM IN COMMUNICATION SYSTEM

Ms. R.MOULIKA<sup>1</sup>, Dr. R.POOVENDRAN<sup>2</sup>

<sup>1</sup> Student of Adhiyamaan College of Engineering (Autonomous), Hosur <sup>2</sup>Faculty of Adhiyamaan College of Engineering (Autonomous), Hosur

**Abstract:** This paper presents an investigation of Fast Fourier Transform (FFT) designs, which are the foundation of any OFDM-based wireless network. Using FFT concepts, we develop efficient structures for the wireless networks that are now in common use. The paper specifically addresses the power-efficient design of an FFT processor as it relates to emerging OFDM communications such as cognitive radio. The increasing speed and complexity of wireless communication systems have required the development and advancement of high-performance signal processing elements. Our designs are developed on a field-programmable gate array (FPGA), whose resources include power, memory, and chip area. Ongoing research seeks to improve resource utilization as well as performance. Design becomes a balance and compromise of flexibility, performance, complexity, and cost.

#### Keywords: FFT, OFDM, FPGA.

# 1. INTRODUCTION:

Cognitive radio is a method of wireless communication that dynamically adapts its subcarriers to changing transmission conditions in the communication channels. The subcarriers are enabled by a modulation scheme known as orthogonal frequency division multiplexing (OFDM). OFDM converts a high-data-rate signal into several lower-data-rate signals for simultaneous transmission over multiple channels. The Fast Fourier Transform (FFT) processor is the core of OFDM and enables fast and efficient modulation of signals. The FFT algorithm is a fast computation of the Discrete Fourier Transform (DFT), which is a fundamental part of the modulation scheme used in OFDM. As the FFT processor is the most computationally intensive component in OFDM communication, an improvement in the power efficiency of this component can significantly affect the overall system. These effects matter given the number of mobile and remote communication devices that rely on limited battery-powered operation. This project serves as an investigation of current FFT processor algorithms and architectures, as well as optimization techniques that aim to reduce the power consumption of these devices. Ultra-Wideband (UWB) technology brings the convenience and mobility of wireless communications to high-speed interconnects in devices throughout the digital home and office. Instead of a wired connection, this technology enables wireless connections for transmitting video, audio, and other data at high data rates and with low power consumption. After IEEE 802.15.3a was withdrawn in the spring of 2006, Multiband-OFDM has been managed by ECMA International. However, some key issues need to be solved in designing a CMOS-based Multiband-OFDM UWB solution that meets the low-power requirement. One of these issues concerns the FFT block, which accounts for 25% of the design complexity of the total digital baseband transceiver [1].
Although many results have been published in this research area in the past few years [4], some key problems remain, and speed, area, and power consumption still need to be improved. Under ECMA-368, the FFT processor of a Multiband-OFDM system must operate at a few hundred MHz, which makes it difficult to implement. Furthermore, since this system targets portable wireless devices, small area and low power consumption are also critical. Hence, this work focuses on area and power optimization under the requirements of the ECMA-368 standard. To achieve this goal, several steps must be followed. The first step is to determine the specifications for this FFT processor, which are dictated by the Multiband-OFDM UWB standard. This step requires a study of OFDM and UWB technology and of the requirements on its FFT processor. After the specifications are defined, an optimized FFT algorithm and architecture should be developed for them. There are a large number of FFT algorithms and architectures in the signal processing literature [7]. Consequently, the state-of-the-art algorithms and architectures should be analyzed and compared. Different algorithms and architectures lead to different power consumption, area, and speed of the processor, so their suitability for ASIC implementation should be examined, and the effort should be focused on selecting and optimizing algorithms and architectures. Furthermore, the optimization space should be explored and the architecture further improved. The proposed algorithm and architecture should be validated by simulation before implementation. After that, the circuit is implemented in VHDL. Synthesis follows, using both Synplify Pro targeted for FPGA and Design Compiler for ASIC. Each carrier may use one of several available digital modulation techniques.

e-ISSN: 2395-0056

# 2. Literature survey:

FFT pruning algorithms and architectures [9] are designed for computing specific frequency bins of the spectrum, but not for reflecting the resource allocation scheme in the OFDMA system. Hence, in this brief, we propose the partial cached-FFT algorithm for typical resource block (RB) distributions in the OFDMA system. In addition, we design a mixed pipelined/cached-FFT processor with constellation and power awareness. Finally, we implement and measure the FFT processor chip to demonstrate its energy efficiency.

#### 2.1. PARTIAL CACHED-FFT ALGORITHM FOR RESOURCE BLOCK ALLOCATION:

In general, considering the actual hardware design, the accuracy of the FFT/IFFT module is an important design factor for system performance. In practice, fixed-point arithmetic is used to implement FFT algorithms in hardware, because it is impossible to keep infinite resolution of coefficients and operations. All coefficients and input signals must be represented with a finite number of bits in binary format, depending on the tradeoff between hardware cost (memory usage) and the accuracy of the output signals. In general, every multiplication may introduce an error due to rounding or truncation, which is referred to as arithmetic quantization error. The theoretical performance evaluation has been given in previous works, several of which have analyzed the effect of fixed-point arithmetic for the radix-2 FFT. In this section, we derive the equivalent matrix form of both the decimation-in-frequency (DIF) and decimation-in-time (DIT) FFT algorithms. Although the DIT and DIF FFT algorithms have the same multiplicative complexity, the sequence of butterfly stages and twiddle factor stages is reversed. In other words, the signal flows of the two representations mirror each other.
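The mirrored structure of the two algorithms can be illustrated with a minimal floating-point sketch in Python. It is not the fixed-point hardware design discussed here; the function names `dft`, `fft_dit`, `fft_dif`, and `max_error` are hypothetical and chosen only for illustration. In the DIT version the twiddle multiplication comes before the butterfly, while in the DIF version the butterfly comes first and the twiddle multiplication follows.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


def fft_dit(x):
    """Radix-2 decimation in time: sub-FFTs first, then twiddle and butterfly."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_dit(x[0::2])
    odd = fft_dit(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle multiplication
        out[k] = even[k] + t                            # butterfly
        out[k + n // 2] = even[k] - t
    return out


def fft_dif(x):
    """Radix-2 decimation in frequency: butterfly and twiddle first, then sub-FFTs."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    top = [x[k] + x[k + half] for k in range(half)]  # butterfly sum
    bot = [(x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
           for k in range(half)]                     # butterfly difference, then twiddle
    out = [0j] * n
    out[0::2] = fft_dif(top)  # even-indexed outputs
    out[1::2] = fft_dif(bot)  # odd-indexed outputs
    return out


def max_error(a, b):
    """Largest magnitude difference between two equal-length sequences."""
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    x = [complex(i % 5, -(i % 3)) for i in range(16)]
    ref = dft(x)
    print(max_error(fft_dit(x), ref), max_error(fft_dif(x), ref))
```

Both functions perform the same number of twiddle multiplications per stage; only the position of the multiplication relative to the butterfly differs.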




The proposed 64- to 1024-point cached-FFT processor, shown in Fig. 1, consists of a radix-2/$2^2$ pipelined butterfly processor, cache sets, and main memory. We propose a dual-delay-feedback (DDF) butterfly architecture to avoid the idling of the butterfly unit (BU) that occurs in the traditional single-delay-feedback architecture. Two radix-2 sequences share the butterfly units by using a delay register in the first butterfly. In the time schedule for the dual-stream processing, the gray and slashed blocks of time slots represent the operations of the BF2 D and BF2 2D butterfly units, respectively. Thus, the butterfly processor can achieve a 100% utilization rate. If a radix-$2^3$ butterfly processor is used for the dual streams, the latency increases, because more delay registers must be inserted for 100% utilization. If radix-4 and four streams are used, the cost grows significantly. Therefore, the radix-2/$2^2$ dual-stream architecture is a compromise between cost and throughput.

#### 2.2. CACHE/MEMORY ARCHITECTURE:

We use two cache sets to facilitate the data transfers between the butterfly processor and the cache sets, and between the cache sets and the main memory. The cache sets must perform two write operations and two read operations in one clock cycle for the dual-stream processing. To avoid the use of a four-port cache and to simplify the control complexity, we divide each cache set into two banks for even and odd addresses.

The odd/even address detector detects whether the data are accessed in the correct bank and exchanges the access positions if necessary. Thereby, we can carefully arrange the computational time schedule to ensure that one even address and one odd address are written in one cycle. Therefore, we divide the main memory into two banks of single-port 512-word static random-access memory (SRAM) for even and odd memory addresses.
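The even/odd banking can be sketched in a few lines of Python. This is an illustrative model only: the names `BANK_WORDS`, `bank_and_offset`, and `schedule_pair` are hypothetical, and the mapping (least significant address bit selects the bank, remaining bits select the word) is an assumption consistent with two 512-word banks covering 1024 addresses. How the actual detector handles two addresses of the same parity is not described in the text, so the sketch simply reports such a pair as a conflict.

```python
BANK_WORDS = 512  # words per single-port bank, as stated in the text


def bank_and_offset(addr):
    """Map an address to (bank, offset): bank 0 holds even, bank 1 odd addresses."""
    return addr & 1, addr >> 1


def schedule_pair(addr_a, addr_b):
    """Order two same-cycle accesses so that the even address comes first.

    Returns None when both addresses fall in the same bank, which two
    single-port banks cannot serve in a single cycle.
    """
    bank_a, _ = bank_and_offset(addr_a)
    bank_b, _ = bank_and_offset(addr_b)
    if bank_a == bank_b:
        return None
    # Exchange the access positions if the first address is the odd one.
    return (addr_a, addr_b) if bank_a == 0 else (addr_b, addr_a)


if __name__ == "__main__":
    print(bank_and_offset(1023))
    print(schedule_pair(5, 8))
    print(schedule_pair(2, 4))
```

With this split, any pair consisting of one even and one odd address can be served in one cycle by two single-port memories, which is the property the time schedule relies on.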



Fig-2 CACHE/MEMORY ARCHITECTURE

## 3. Existing System

The input data stream is split into N parallel low-bandwidth modulated data streams. Owing to the orthogonality of the subcarriers, they do not interfere with each other. Each subcarrier has a low symbol rate, but the combination of subcarriers carrying information in parallel allows for high data rates. The low symbol rate is used to reduce the problem of inter-symbol interference (ISI) [8-10]. Before modulation, the transmitter stage of an OFDM transceiver takes the input data, converts it, and encodes it into a serial stream. The OFDM signal is generated using the Inverse Fast Fourier Transform (IFFT). The reverse process takes place in the receiver stage.

#### 3.1 Transmitter




Fig.3 Block diagram of OFDM Transmitter and receiver.

#### 3.1.1 BPSK Modulation

The BPSK modulation scheme is a phase-shift keying technique in which the two symbols are separated by a 180° phase shift.

#### 3.1.2 Serial to Parallel

The serial-to-parallel block converts the serial input data into parallel outputs; the following block can operate only after all inputs of the parallel word have been received.

#### 3.1.3 IFFT Block

The IFFT block converts the input data from the frequency domain to the time domain.

#### 3.1.4 Cyclic Prefix

The cyclic prefix block reserves additional guard space alongside the input data: a fraction 1/N of the input length is allocated to the cyclic prefix memory.
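The transmitter blocks of Sections 3.1.1 to 3.1.4 can be chained as in the following minimal Python sketch. All function names and the parameters `n` and `cp_len` are hypothetical. Two assumptions go beyond the text: the bit-to-symbol mapping (0 to +1, 1 to -1) is one common BPSK convention, and the guard space is filled with a copy of the last `cp_len` time-domain samples, which is the usual definition of a cyclic prefix. A direct inverse DFT stands in for the IFFT hardware.

```python
import cmath


def bpsk_map(bits):
    """Map bit 0 to +1 and bit 1 to -1 (two phases 180 degrees apart)."""
    return [1 - 2 * b for b in bits]


def serial_to_parallel(symbols, n):
    """Group the serial stream into blocks of n values; a partial block is dropped."""
    return [symbols[i:i + n] for i in range(0, len(symbols) - n + 1, n)]


def idft(X):
    """Direct inverse DFT; a hardware design would use an IFFT instead."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def add_cyclic_prefix(block, cp_len):
    """Prepend a copy of the last cp_len time-domain samples (assumed CP content)."""
    return block[len(block) - cp_len:] + block


def ofdm_transmit(bits, n=8, cp_len=2):
    """BPSK mapping, serial-to-parallel, inverse transform, cyclic prefix."""
    blocks = serial_to_parallel(bpsk_map(bits), n)
    return [add_cyclic_prefix(idft(b), cp_len) for b in blocks]


if __name__ == "__main__":
    tx = ofdm_transmit([0, 1, 1, 0, 1, 0, 0, 1] * 2)
    print(len(tx), len(tx[0]))
```

Each transmitted symbol then has `n + cp_len` samples, and its first `cp_len` samples repeat its last `cp_len` samples.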

#### 3.1.5 Receiver Block

The receiver block contains the remove-cyclic-prefix block, the serial-to-parallel block, the FFT block, the parallel-to-serial block, and the demodulator.

#### 3.1.6 Remove Cyclic Prefix

The remove-cyclic-prefix block removes the guard space that was created in the transmitter block. The cyclic prefix memory contains noise and errors, which are discarded in this block.

#### 3.1.7 Fast Fourier Transform Block

The Fast Fourier Transform block converts the input from the time domain to the frequency domain. FFT blocks are built from different types of radix-based FFT architectures, as follows:

- Radix-2
- Radix-4
- Mixed-Radix
- Split-Radix

In this paper we propose a pipelined implementation in which the radix-2 single-delay feedback (R2SDF) and radix-2 single-delay commutator (R2SDC) architectures are combined and implemented within the receiver architecture of OFDM.
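The receiver chain of Sections 3.1.5 to 3.1.7 can be sketched in the same spirit. The Python below is a behavioural illustration, not the R2SDF/R2SDC hardware: a direct DFT stands in for the FFT block, all function names are hypothetical, and the BPSK decision rule (non-negative real part means bit 0) is an assumed convention. The `idft` helper is included only to build a test symbol.

```python
import cmath


def dft(x):
    """Direct DFT standing in for the receiver FFT block."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


def idft(X):
    """Inverse DFT, used here only to construct a test symbol."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def remove_cyclic_prefix(samples, cp_len):
    """Discard the first cp_len samples of a received symbol."""
    return samples[cp_len:]


def bpsk_demap(values):
    """Decide bit 0 for a non-negative real part, bit 1 otherwise."""
    return [0 if complex(v).real >= 0 else 1 for v in values]


def ofdm_receive(rx_symbols, cp_len):
    """Remove the prefix, transform to the frequency domain, and demodulate."""
    bits = []
    for sym in rx_symbols:
        freq = dft(remove_cyclic_prefix(sym, cp_len))
        bits.extend(bpsk_demap(freq))  # parallel-to-serial plus demodulation
    return bits


if __name__ == "__main__":
    sent = [0, 1, 1, 0, 1, 0, 0, 1]
    time_block = idft([1 - 2 * b for b in sent])
    symbol = time_block[-2:] + time_block  # prefix of two samples
    print(ofdm_receive([symbol], cp_len=2) == sent)
```

Over an ideal channel the recovered bits equal the transmitted bits, because the DFT exactly inverts the transmitter's inverse transform once the prefix is discarded.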

## 4.1 Computation Time Schedule:

Since the cache flush/refresh time is smaller than the one-pass processing time, the butterfly processor can compute continuously without an idle time slot. However, the timing overlaps between the cache flushing/refreshing and part, as shown in Table 1.



Fig.4 Proposed system block diagram.

The corresponding post-stage reshuffles the new sequence back to the complex format. The SDC stage $t$ ($t = 1, 2, \ldots, \log_2 N - 1$) contains an SDC processing element (PE), which can achieve 100% arithmetic resource utilization of both complex adders and complex multipliers. The last stage is identical to the radix-2 single-delay feedback (SDF) stage, containing a complex adder and a complex subtractor. Using the modified addressing method [12], data with an even index are written into memory in normal order and then retrieved from memory in bit-reversed order, while data with an odd index are written in bit-reversed order. Last, the even data are retrieved in normal order. In this way, the bit reverser requires only an N/2 data buffer.
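To make the role of the stages and of the bit reverser concrete, the following Python sketch runs the $\log_2 N$ radix-2 DIF stages one after another and then restores normal output order. It is a functional model only: it does not model the SDC/SDF delay lines or the N/2-buffer addressing scheme described above, it uses a full-length reordering instead, and the names `bit_reverse`, `fft_dif_stages`, and `reorder` are hypothetical.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_dif_stages(x):
    """Apply log2(N) radix-2 DIF stages; the result is in bit-reversed order."""
    data = list(x)
    n = len(data)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                a = data[start + k]
                b = data[start + k + span]
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))  # stage twiddle
                data[start + k] = a + b                 # complex adder
                data[start + k + span] = (a - b) * w    # subtractor, then multiplier
        span //= 2
    return data


def reorder(data):
    """Bit reverser: move output p to index bit_reverse(p)."""
    n = len(data)
    bits = n.bit_length() - 1
    return [data[bit_reverse(i, bits)] for i in range(n)]


if __name__ == "__main__":
    spectrum = reorder(fft_dif_stages([0, 1, 0, 0, 0, 0, 0, 0]))
    print([round(abs(v), 6) for v in spectrum])
```

In the last stage the twiddle factor is 1, so that stage needs only an adder and a subtractor, which matches the description of the final SDF stage.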

Normally, FFT architectures are based on parallel architectures, so they consume large area and have high latency. To achieve low area and latency, we develop pruned FFTs with the help of pipelined datapaths using feedforward and feedback structures. Among the various architectures for reducing the area and delay of FFT processors, there are several radix-2 based architectures such as R2MDC, R2SDF, and R2SDC; these architectures are very useful for pipelined operation of FFTs.


The proposed system consists of an architecture with stage-1, stage-2, stage-3, stage-4, and stage-5, all designed for 64-point FFT architectures, as shown in the figure.

Fig.5 Block diagram showing pipelined based radix-2 FFT for 8-point

The proposed system consists of five stages and the stage-t, which is also present in the 64-point FFT architecture. All FFT-based architectures operate on parallel architectures, so we propose a pipelined operation of radix-2 FFT architectures with a combination of SDF and SDC architectures in order to achieve a higher data rate.

four $8 \times 13$ caches, and four $128 \times 13$ ROMs. The function of the chip is verified, and its performance is measured using a digital test station. The chip can operate at a maximum of 51 MHz with 33.3 mW. We use the worst- and best-case patterns for measuring the energy-saving capability of different RB allocation schemes, as shown in Fig. 4. Each curve represents a series of $\{1, 3/4, 1/2, 1/4, 1/8, 1/16, 1/32, 1/64\}$ computed RBs in the whole OFDMA spectrum. We can see that the energy dissipation scales with the modulation order and the amount of the allocated resource




Fig.6 FFT radix

The comb-distributed scheme has the best performance, whereas the localized scheme has the worst performance. Note that the RB-level distributed scheme is a compromise between the localized and comb-distributed schemes. Thus, its energy dissipation also lies between them. The energy scaling ranges from 1.90 to 0.64 nJ/FFT point for the 1024-point FFT, i.e., 67% energy can be saved in the FFT processor for minimal RB transmission.

This result outperforms other cached-memory-based FFT chips. The latency for processing a full 1024-point FFT is 61 $\mu$s, which meets the 66.7-$\mu$s and 91.5-$\mu$s OFDM symbol durations of the 3GPP-LTE and Mobile WiMAX standards, respectively.



| Supply power parameter | Existing FFT | Modified FFT |
|------------------------|--------------|--------------|
| Quiescent power (mW)   | 6.40E-04     | 4.95E-03     |
| Dynamic power (mW)     | 8.06E-02     | 8.70E-03     |
| Total power (mW)       | 8.12E-02     | 8.57E-02     |

#### **6. RESULT DISCUSSION:**

The input length of the proposed pipelined FFT processor is a parameter that can be set to 128, 512, 1024, or 2048 points. Take the 1024-point FFT as an example. First, the 1024-point FFT is coded in MATLAB. After the chosen FFT algorithm is validated, the architecture of the processor is modeled in VHDL and functionally verified using Xilinx 12.3i software, with timing simulation using ISIM SE software. For the timing simulation, a test bench file using the TEXTIO package was written to read the input data and write the FFT result. The behavioral simulation waveforms for the processor are shown in Fig. 7.




Fig.7 Simulation of 1024 point FFT



Fig.8 Comparison chart showing OFDM with Radix-2 based FFT with OFDM based optimum multiplier and their area (slices) and LUTs



Fig.9 Comparison chart showing OFDM with Radix-2 based FFT with OFDM based optimum multiplier and their frequency




Fig.10 Comparison chart showing OFDM with Radix-2 based FFT with OFDM based optimum multiplier and their delay

# Output waveform of OFDM in Tx and Rx



Fig.11 Screenshot of output in OFDM



Fig.12 Simulation waveform of OFDM in transmitter



Fig.13 Simulation waveform of OFDM in receiver

| Area parameter         | Existing FFT | Modified FFT |
|------------------------|--------------|--------------|
| No. of LUTs            | 3115         | 4393         |
| No. of slice registers | 1764         | 2092         |


| Timing parameter | Existing FFT | Modified FFT |
|------------------|--------------|--------------|
| Delay (ns)       | 26.458       | 22.742       |
| Frequency (MHz)  | 37.796       | 43.972       |
| Latency (cycles) | 28           | 20           |
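The frequency row of the timing table is the reciprocal of the delay row, which the short Python check below reproduces. The function names are hypothetical. The latency in nanoseconds is not reported in the paper; it is derived here under the assumption that one clock period equals the reported delay.

```python
def clock_frequency_mhz(delay_ns):
    """Maximum clock frequency implied by a critical-path delay given in ns."""
    return 1000.0 / delay_ns


def latency_time_ns(cycles, delay_ns):
    """Latency in ns, assuming one clock period equals the reported delay."""
    return cycles * delay_ns


# (delay in ns, latency in cycles), copied from the timing table
DESIGNS = {"existing": (26.458, 28), "modified": (22.742, 20)}

if __name__ == "__main__":
    for name, (delay_ns, cycles) in DESIGNS.items():
        print(name,
              round(clock_frequency_mhz(delay_ns), 3), "MHz,",
              round(latency_time_ns(cycles, delay_ns), 3), "ns")
```

Under that assumption, the modified design is faster both in clock rate and in the number of cycles needed, so the two improvements multiply in the overall latency.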

# 7. Acknowledgements:

The authors would like to thank the anonymous reviewers for their comments, which were helpful in improving the quality and presentation of this paper. The results show that the pipelined implementation of this architecture has low area, high delay, and lower frequency; the delay is high because the design is implemented with pipelined structures.

# References

- [1] Cimini LJ. Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing. IEEE Trans Commun. 1985; 665–75.
- [2] Lin YW, Liu HY, Lee CY. A 1-GS/s FFT/IFFT processor for UWB applications. IEEE J Sol-State Circ. 2005; 40(8):1726–35.
- [3] Cheng C, Parhi KK. High throughput VLSI architecture for FFT computation. IEEE Trans. Circuits Syst II, Exp Briefs. 2007 Oct; 54(10):339–44.

- Tang SN, Tsai JW, Chang TY. A 2.4-GS/s FFT Processor for OFDM-Based WPAN Applications. IEEE Trans Circuits Syst II, Exp Briefs. 2010 Jun; 57(6):451–5.



- [4] Jung Y, Yoon H, Kim J. New efficient FFT algorithm and pipeline implementation results for OFDM/DMT applications. IEEE Trans Consum Electron. 2005; 49:14–20.
- [5] Shin M, Lee H. A high-speed, four-parallel radix-$2^4$ FFT processor for UWB applications. Proceedings of IEEE ISCAS; 2008. p. 960–3.
- [6] Y. W. Lin, H. Y. Liu, and C. Y. Lee, "A dynamic scaling FFT processor for DVB-T applications," IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 2005–2013, Nov. 2004.
- [7] T. H. Yu, C. Z. Zhan, Y. J. Cho, C. L. Yu, and A. Y. Wu, "Efficient fast Fourier transform processor design for DVB-H system," in Proc. 18<sup>th</sup> VLSI/CAD Symp.,
- [8] C. P. Fan and G. A. Su, "A grouped fast Fourier transform algorithm design for selective transformed outputs," in Proc. IEEE APCCAS, 2006, pp. 1939–1950.
- [9] C80216m-08\_503, Motorola IEEE Downlink Resource Mapping, IEEE, May 2008.
- [10] 3GPP, R1-071091, Philips Resource- Block Mapping of Distributed Transmissions in E-UTRA Downlink, Feb. 2007.
