# VLSI Implementation of Low Power FIR Filter using Variable Precision Two-Dimensional Pipeline Gating Multiplier

Satish Bojjawar<sup>1\*</sup>, Prabhu G Benakop<sup>2</sup>

<sup>1</sup>Associate Professor, EIE Department, CVR College of Engineering, India.
 <sup>1</sup>Research Scholar, ECE Department, JNTUH, Hyderabad, India.
 \*Corresponding author: satishbojjawar@gmail.com
 <sup>2</sup> ECE Department, Methodist College of Engineering and Technology, India.

### **Abstract**

Low-power FIR filters are required in many DSP applications, and the multiplier is the most critical and power-hungry block in the filter. To implement a low-power FIR filter, a two-dimensional variable precision fine-grain pipeline gating technique is introduced in the multiplier. The optimized multiplier is used to implement a transposed-form FIR filter of order N = 8 with Cadence ASIC design tools in 45 nm CMOS technology. The designed FIR filter is compared with FIR filters based on existing multipliers. The proposed filter achieves a power saving of 22% without any degradation in speed. The area penalty due to the variable precision two-dimensional pipeline gating technique is only 3%.

*Keywords:* FIR filter, Fine grain pipeline, dynamic power, clock gating, low power multiplier, VLSI, and variable precision.

### 1. Introduction

Low power is a major concern in IC technology because of the growth of portable computing and communication devices and the shrinking of device feature sizes. Power decreases as switching activity decreases. Dynamic power depends on the clock frequency, the supply voltage, and the switching activity [1]. Lowering the frequency or the supply voltage reduces power but also reduces circuit performance. Reducing the switching activity does not degrade performance, so it is considered an important means of saving power [2].
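The dependence described above is commonly summarized by the first-order expression for dynamic power in CMOS circuits. The symbols below are the conventional ones and are not defined elsewhere in this paper:

$$P_{dyn} = \alpha \, C_L \, V_{DD}^{2} \, f_{clk}$$

Here $\alpha$ is the switching activity factor, $C_L$ is the switched load capacitance, $V_{DD}$ is the supply voltage, and $f_{clk}$ is the clock frequency. Lowering $V_{DD}$ or $f_{clk}$ reduces $P_{dyn}$ but slows the circuit, whereas lowering $\alpha$ reduces $P_{dyn}$ without changing the timing.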

Benini et al. [3] minimized glitching power in VLSI circuits by replacing existing gate cells with functionally equivalent cells. This approach applies at the layout level only, so it yields a small power saving. In [4], part of a functional block is disabled according to the dynamic range of the input operands; this is called the Partially Guarded Computation (PGC) method. Bhardwaj et al. [5] proposed a power-aware multiplier design that exploits the time-varying nature of the input signal and adjusts the output quality accordingly. It is well suited to non-pipelined arithmetic circuits but occupies more area. In [6], an array multiplier using a two-dimensional gating technique is introduced, in which gating signals are applied to the multiplicand and the multiplier. This method is suitable only for non-pipelined multipliers, so it cannot reduce the switching activity of pipeline registers.

Chua-Chin Wang et al. [7][8] discussed a two-dimensional bypassing method for non-pipelined multipliers. In this method, several regions of the multiplier are bypassed to reduce the switching activity and hence the power consumption, but it cannot reduce register activity. The existing works [8], [10]-[14] presented pipeline gating techniques to reduce the power consumption of multipliers. These are two-dimensional pipeline gating techniques, but they cannot maintain the speed and efficiency of pipelining.

In this paper, variable precision two-dimensional fine-grain pipeline gating is proposed to reduce the switching activity in the registers as well as in the multiplier. The optimized low-power multiplier is then applied to an FIR filter. A transposed (data-broadcast) FIR filter of order N = 8 is designed, and pipelining is introduced into its structure. The optimized multiplier is divided into two sub-multipliers, with registers placed between them. As a result, the critical path is shortened and the total power of the FIR filter is reduced.

The paper is organized as follows. Section 2 presents the basic architecture of the one-dimensional and two-dimensional pipeline gating techniques. Section 3 explains the array multiplier implementation based on two-dimensional pipeline gating. In Section 4, the 8-tap transposed FIR filter is implemented using the optimized array multiplier. Section 5 gives the results, and Section 6 concludes the paper.

### 2. Fine Grain Pipeline Gating Techniques

In this section, the variable precision fine-grain pipeline gating technique used in the FIR filter implementation to reduce power consumption is introduced. The multiplier is the most power-hungry block in the FIR filter structure, and its power consumption depends on the number of signal transitions. A parallel (array) multiplier is chosen for the transposed FIR filter to increase throughput and speed. The transposed-form FIR filter has inherent pipelining, which shortens the critical path. Variable precision pipeline gating is applied to the pipelined filter to reduce the power further by reducing the switching activity.

Gated clocks are applied to the registers in both the horizontal and vertical directions relative to the data flow of the filter. The gating signals are generated from the current precision of the input data and are combined with the system clock to form the sub-clocks. The basic pipeline gating structure is shown in Fig. 1. This technique disables selected blocks using the gated clocks and still delivers correct results with the help of multiplexers. It gives good power and latency results at a small area penalty, because extra multiplexers and AND gates are required to implement pipeline gating. This technique is called one-dimensional pipeline gating because it gates the clocks of unnecessary stages along the data-flow direction only.
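The gating principle described above can be illustrated with a small behavioral model. This is only a sketch and not the authors' hardware description: the names `sub_clocks` and `run_gated_pipeline` are hypothetical, each pipeline stage is assumed to be a plain Python function, and a multiplexer is assumed to forward the earlier result past any stage whose gating signal is 0.

```python
from typing import Callable, List, Tuple


def sub_clocks(clk: int, gates: List[int]) -> List[int]:
    """AND the system clock with each gating signal to form the sub-clocks."""
    return [clk & g for g in gates]


def run_gated_pipeline(
    value: int,
    stages: List[Callable[[int], int]],
    gates: List[int],
) -> Tuple[int, int]:
    """Push one value through the pipeline during an active clock phase.

    Returns the output value and the number of stages that were clocked.
    A gated-off stage is bypassed (the multiplexer forwards its input).
    """
    clocked = 0
    for stage, clk in zip(stages, sub_clocks(1, gates)):
        if clk:
            value = stage(value)
            clocked += 1
    return value, clocked


if __name__ == "__main__":
    # Four identical stages; the last two are gated off.
    stages = [lambda v: v + 1] * 4
    print(run_gated_pipeline(3, stages, [1, 1, 0, 0]))
```

In this model only two of the four stages receive a clock edge, which is the source of the power saving; the extra multiplexers and AND gates correspond to the area penalty mentioned above.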

#### 2.1. Two-dimensional pipeline gating

Two-dimensional pipeline gating gates the clock in both the vertical and horizontal directions, whereas one-dimensional pipeline gating gates the clock in the vertical direction only. The architecture and principle of general two-dimensional pipeline gating are shown in Fig. 2. As in the one-dimensional structure, the system clock is gated by several gating signals to generate sub-clocks. Each pipeline stage is connected to one sub-clock, which drives all the registers in that stage. Under a particular condition, stage-4 can be disabled; registers in the previous stages can also be disabled if the data that flows through stage-4, or the output of stage-3, is bypassed. Hence, both the switching activity and the latency are reduced. To improve the power saving, pipelining is applied to multipliers and adders. In such a pipeline, if the data in a pipeline register is correlated with the data in the previous register, then a whole pipeline stage or some registers in that stage may be disabled. In this case, only part of the data is processed in the current stage and the remaining data is passed on to the next stage. Two-dimensional pipeline gating is applied to these arithmetic blocks to reduce the power consumption.



Figure 1. The basic structure of one-dimensional pipeline gating.

In the two-dimensional pipeline gating structure, stage-4 can be disabled under particular conditions, and some registers of stage-1, stage-2, and stage-3 can be disabled as well, specifically the first two registers of each of those stages. The data corresponding to these registers is processed in stage-4 only. Clock-4 is used to disable these redundant registers in the stages preceding stage-4. Similarly, for the third stage, the 3<sup>rd</sup> and 4<sup>th</sup> registers of stage-1 and stage-2 can be disabled. The number of switching transitions is thereby reduced, which trims the power consumption. As the total number of registers and the number of registers per stage increase, a greater power saving can be achieved.
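The effect of gating in two directions can be sketched by counting how many registers still receive a clock. The model below is a simplification made for illustration: it assumes a 4 × 4 grid of registers (four stages with four registers each), one gating signal per stage, and one gating signal per register position, and the names `register_enables` and `active_registers` are hypothetical.

```python
from typing import List


def register_enables(stage_gates: List[int], column_gates: List[int]) -> List[List[int]]:
    """Enable bit of every register: its stage gate ANDed with its column gate."""
    return [[s & c for c in column_gates] for s in stage_gates]


def active_registers(enables: List[List[int]]) -> int:
    """Number of registers that still receive a clock edge."""
    return sum(sum(row) for row in enables)


if __name__ == "__main__":
    # One-dimensional gating: stage-4 is off, every remaining register is clocked.
    one_d = register_enables([1, 1, 1, 0], [1, 1, 1, 1])
    # Two-dimensional gating: stage-4 is off and the first two registers
    # of the remaining stages are off as well.
    two_d = register_enables([1, 1, 1, 0], [0, 0, 1, 1])
    print(active_registers(one_d), active_registers(two_d))
```

Under these assumptions, gating in the second direction halves the number of clocked registers in the stages that remain active, and the benefit grows with the number of registers per stage.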

### 3. Pipelined Array Multiplier Implementation

Two-dimensional pipeline gating is applied to the array multiplier, and the resulting power-aware multiplier is used in the FIR filter. The multiplication process of a 4 × 4 multiplier, shown in Fig. 3, is examined to explain the concept. In Fig. 3, X and Y are the inputs and S is the output of the multiplier. When the input precision is 4, for example 1111 × 1111, the result S is the sum of all the partial products. If the precision is 3, i.e., 0111 × 0111, the output has only six digits because the $X_3$ and $Y_3$ bits are zero.

The computation corresponding to the zero bits is not required, so the clocks connected to the associated registers can be disabled. For precision 2, e.g., 0011 × 0011, the partial products containing $X_2$ and $Y_2$ can also be disabled; they are marked with square boxes in the figure. If the precision is 1, the partial products containing $X_1$ and $Y_1$, highlighted with ellipses, can be disabled as well. Fewer pipeline stages are required as the length of the output S decreases. The 4-bit array multiplier with two-dimensional pipeline gating is shown in Fig. 4.
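The relationship between input precision and the partial products that must be computed can be checked with a short behavioral sketch. It is an illustration only, not the authors' design: the function names are hypothetical, and the precision is assumed to be the bit length of the larger operand, so that a single precision value applies to both inputs as in the examples above.

```python
from typing import List, Tuple

WIDTH = 4  # operand width of the 4 x 4 example


def precision(x: int, y: int) -> int:
    """Number of low-order bits needed to represent the larger operand."""
    return max(x.bit_length(), y.bit_length())


def active_partial_products(x: int, y: int) -> List[Tuple[int, int]]:
    """Index pairs (i, j) of the partial products X_i * Y_j that must be computed."""
    p = precision(x, y)
    return [(i, j) for i in range(WIDTH) for j in range(WIDTH) if i < p and j < p]


def gated_multiply(x: int, y: int) -> int:
    """Sum only the active partial products, each weighted by 2**(i + j)."""
    return sum(
        (((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
        for i, j in active_partial_products(x, y)
    )


if __name__ == "__main__":
    for operand in (0b1111, 0b0111, 0b0011, 0b0001):
        terms = len(active_partial_products(operand, operand))
        product = gated_multiply(operand, operand)
        print(f"{operand:04b} x {operand:04b}: {terms} partial products, S = {product:b}")
```

In this model the number of active partial products falls from 16 at precision 4 to 9, 4, and 1 at precisions 3, 2, and 1, while the product is unchanged because the skipped terms are all zero.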



Figure 2. The general architecture of the two-dimensional pipeline gating method.



Figure 3. Basic 4 X 4 multiplication process.



Figure 4. Two-dimensional pipeline clock gating Array multiplier.

The array multiplier consists of full adders (FA), half adders (HA), and registers (Reg), as shown in Fig. 4. The label n-2 on a multiplexer denotes an n-to-1 Mux. The four gating signals are combined with the system clock to form the sub-clocks. The CPU supplies the input precision through the gating signals. The sub-clocks are connected to the registers of each pipeline stage. Depending on the precision, certain sub-clocks are disabled, and the registers driven by them do not operate during the multiplication. Finally, the multiplexers select the correct outputs from the blocks that are operating.

If the multiplier and multiplicand are 0111 and 0111, S0 is a useful output value, and it is selected directly from the pipeline stage that follows the AND matrix. The remaining registers that are not needed, including those to the left of the $X_0 Y_0$ term that forms S0, do not operate because their clocks are disabled. Because the disabled clocks switch off registers in many stages of the multiplier, the power consumption for a given precision and set of inputs is reduced significantly. When the precision changes, the corresponding register clocks are disabled according to the new precision and input operands, so both power and latency are reduced.

### 4. Implementation of FIR Filter using Optimized Multiplier

In this section, a power-aware FIR filter is implemented using the two-dimensional pipeline gating array multiplier. Among the many FIR filter structures, the inherently pipelined transposed-form FIR filter is selected for the implementation because it offers high throughput and a short critical path [9]. The 8-tap transposed-form FIR filter is shown in Fig. 5.



Figure 5. Transpose form of 8-tap FIR filter.

The output of the FIR filter is given by equation (1)

$$y(n) = h_0 x(n) + h_1 x(n-1) + \dots + h_{N-1} x(n-N+1) \tag{1}$$

where x(n) and y(n) are the input and output of the filter, respectively, and $h_0, h_1, \dots, h_{N-1}$ are the coefficients of the N-tap filter. To improve the throughput of this filter, pipelining is used in the multipliers.
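Equation (1) and the transposed structure can be related with a short behavioral sketch. It assumes integer samples and coefficients and ignores the multiplier pipelining and clock gating; the function names `fir_transposed` and `fir_direct` are hypothetical. In the transposed form, the current input is broadcast to all multipliers and the delay registers hold partial sums.

```python
from typing import List


def fir_transposed(h: List[int], xs: List[int]) -> List[int]:
    """Transposed-form FIR filter: broadcast x(n), accumulate through registers."""
    n = len(h)
    regs = [0] * (n - 1)  # registers between the adders, initially cleared
    out = []
    for x in xs:
        products = [c * x for c in h]  # every tap multiplies the same input
        y = products[0] + (regs[0] if regs else 0)
        # Update in increasing order so regs[k + 1] still holds its old value.
        for k in range(n - 2):
            regs[k] = products[k + 1] + regs[k + 1]
        if regs:
            regs[-1] = products[-1]
        out.append(y)
    return out


def fir_direct(h: List[int], xs: List[int]) -> List[int]:
    """Direct evaluation of equation (1), with x(n) = 0 for n < 0."""
    return [
        sum(h[k] * xs[i - k] for k in range(len(h)) if i - k >= 0)
        for i in range(len(xs))
    ]


if __name__ == "__main__":
    h = [1, 2, 3, 4, 4, 3, 2, 1]  # example 8-tap coefficients (arbitrary values)
    xs = [5, -1, 0, 2, 7, 0, 0, 0, 0, 0]
    print(fir_transposed(h, xs) == fir_direct(h, xs))
```

Both functions produce the same output sequence; the transposed form differs only in where the delays are placed, which is what keeps the path between registers short.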

To achieve a shorter critical path, the pipeline stages are carefully balanced. The multiplication time is denoted by $T_M$ and the adder time by $T_A$. Because $T_M$ is greater than $T_A$, the critical path is shortened by pipelining the multiplier. Fig. 6 shows the pipelining technique applied to the multipliers of the FIR structure. Each multiplier of the original filter is divided into two pipelined sub-multipliers, with registers placed between them. $T_{M1}$ and $T_{M2}$ are the delays of the two pipelined sub-multipliers, and $T_R$ is the register delay. The dashed line shows the critical path, which is given in equation (2),

$$T_{M1} = T_R + T_{M2} + T_A \tag{2}$$

The pipelining of the multiplier sets the limit on the throughput of the circuit. Because the multiplier is divided, the time taken by each sub-multiplier is less than that of the adder. This approach works well when the word length of the coefficients or the filter inputs is short, and the resulting sampling frequency is adequate for the filter. If the word length of the filter input data or coefficients is increased, the adder also takes significant time and must be considered when determining the critical path. In that case, the adders need to be pipelined along with the multipliers.



Figure 6. FIR filter using Pipelining multiplication

### 5. Implementation Results

In this paper, a variable precision two-dimensional pipeline gating multiplier is proposed and implemented. For comparison, a basic array multiplier without any optimization and a two-dimensional pipeline gating multiplier are also implemented. Each of the three multipliers is placed in an 8-tap FIR filter and implemented individually, and the power, delay, and area are compared. The architectures are described in Verilog HDL. Simulation and synthesis are carried out with Cadence ASIC tools. The Genus synthesis tool is used with a generic library in 45 nm CMOS technology, and it generates the area, delay, and power reports.

The VLSI design metrics, namely area, delay, and power consumption, of the three multipliers with 8-bit input data and different input precisions are compared in Table 1.

Table 1. Comparison of area, delay, and power of the proposed multiplier with other existing multipliers

| Metrics               | Array Multiplier | Pipeline Gating Multiplier | Proposed Multiplier |
|-----------------------|------------------|----------------------------|---------------------|
| Delay (ns)            | 28.6             | 9.92                       | 9.1                 |
| Area (µm<sup>2</sup>) | 1660             | 1780                       | 1793                |
| Power (µW)            | 12.37            | 10.97                      | 8.11                |



Figure 7. Power consumption of three multipliers; Array = Array multiplier, PGM = Pipeline Gating Multiplier, VPGM = Variable Precision Pipeline Gating Multiplier.

The proposed multiplier saves 25% of the total power compared with the normal pipeline gating multiplier for the case of 1-bit precision. Fig. 8 and Fig. 9 show the delay and area comparisons of the proposed multiplier with the existing multipliers, respectively. The proposed architecture reduces the power consumption without significantly affecting the delay and area of the multiplier.
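The relative saving can be recomputed from the power row of Table 1 with a few lines of arithmetic. The helper name `percent_saving` is hypothetical, and the values are copied from the table as printed.

```python
def percent_saving(reference: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


# Power values in microwatts, copied from Table 1.
POWER_UW = {"Array": 12.37, "PGM": 10.97, "VPGM": 8.11}

if __name__ == "__main__":
    for name in ("Array", "PGM"):
        saving = percent_saving(POWER_UW[name], POWER_UW["VPGM"])
        print(f"VPGM vs {name}: {saving:.1f}% lower power")
```

With the Table 1 values, the saving relative to the pipeline gating multiplier comes out at roughly 26%, in line with the figure of about 25% quoted above.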



Figure 8. Delay comparison of three multipliers



The proposed multiplier and the existing multipliers are used to implement 8-tap FIR filters. The functionality of the FIR filters is verified and the results are generated. The 8-tap FIR filters with the Variable Precision two-dimensional Pipeline Gating Multiplier (FIR-VPGM), the Pipeline Gating Multiplier (FIR-PGM), and the Array Multiplier (FIR-Array) are implemented and compared.

Table 2. Comparison of 8-tap FIR filters using three types of multipliers.

| Metrics               | FIR-Array Multiplier | FIR-PGM | Proposed FIR-VPGM |
|-----------------------|----------------------|---------|-------------------|
| Delay (ns)            | 32.3                 | 12.87   | 12.32             |
| Area (µm<sup>2</sup>) | 9982                 | 11201   | 11980             |
| Power (µW)            | 13.87                | 12.13   | 10.035            |





Fig. 10 shows the delay comparison of the three types of FIR filters. Fig. 11 and Fig. 12 show the comparison graphs of the gate area and power consumption of the three FIR filters, respectively. The proposed FIR filter using the variable precision fine-grain pipeline gating multiplier achieves lower power consumption and delay than the FIR-PGM filter, at the cost of a larger area.



Figure 11. Comparison of gate count or area of three types of FIR filters.



Figure 12. Power consumption of 8-tap FIR filters.

The graphs for the FIR filters and Table 2 show that FIR-VPGM achieves a significant power saving (18%) with an area penalty of 3%, without any degradation in the speed or functionality of the filter.

### 6. Conclusions

In this paper, an optimized multiplier with variable precision two-dimensional pipeline gating is designed and implemented. The multiplier is used in an 8-tap transposed-form FIR filter, which is tested with all combinations of input data at different precisions. The results are compared with those of 8-tap FIR filters using existing multipliers. The comparison shows that the proposed FIR-VPGM achieves an 18% power saving with an additional area of 3% and without any degradation in speed.

### References

- [1] Roy K. and Prasad S. 'Low-Power CMOS VLSI Circuit Design', John Wiley & Sons, 2000.
- [2] Pedram M. 'Power minimization in IC design: Principles and applications', ACM Transactions on Design Automation of Electronic Systems (TODAES), Vol. 1, No. 1, pp. 3-56, 1996.
- [3] Benini L., Micheli G.D., Macii A., Macii E., Poncino M. and Scarsi R. 'Glitching power minimization by selective gate freezing', IEEE Transactions on Very Large Scale Integration (VLSI) Syst., Vol. 8, No. 3, pp. 287–297, 2000.
- [4] Junghwan Choi, Jinhwan Jeon and Kiyoung Choi 'Power Minimization of Functional Units by Partially Guarded Computation', Int. Symp. Low-Power Electronic Design (ISLPED-2000), pp. 131–136.
- [5] Manish Bhardwaj, Min R. and Chandrakasan A.P. 'Quantifying and enhancing power awareness of VLSI systems', IEEE Trans. VLSI System, Vol. 9, No. 6, pp. 757-772, 2001.
- [6] Huang Z. and Ercegovac M.D. 'Two-dimensional signal gating for low-power array multiplier design', IEEE International Symposium on Circuits and Systems, ISCAS'2002, IEEE Computer Society, Washington DC., USA, pp. 489-492, 2002.
- [7] Chua-Chin Wang and Gang-Neng Sung 'A Low-Power 2-Dimensional Bypassing Multiplier Using 0.35 um CMOS Technology', IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), Karlsruhe, Germany, pp. 405-410, 2006.
- [8] Michel Keating, David Flynn, Robert Aitken, Allen Gibbons and Kaijian Shi 'Low Power Methodology Manual for System on Chip Design', Springer, New York, USA, 2007.
- [9] K. K. Parhi, VLSI Digital Signal Processing Systems, John Wiley & Sons Inc., 1999.
- [10] Bansal, Bharat Naresh, Amanpreet Singh, and Jaskarn Singh Bhullar. "A Review of FIR Filter Designs." *Networking Communication and Data Knowledge Engineering*. Springer, Singapore, 2018. 125-140.
- [11] Shearer, Robert, Matthew Tubbs, and Ryan Haraden. "Variable precision in hardware pipelines for power conservation." U.S. Patent No. 9,927,862. 27 Mar. 2018.
- [12] Tu, Jin-Hao, and Lan-Da Van. "Power-efficient pipelined reconfigurable fixed-width Baugh-Wooley multipliers." *IEEE Transactions on Computers* 58.10 (2009): 1346-1355.
- [13] Paidimarri, Arun, et al. "FPGA implementation of a single-precision floating-point multiply-accumulator with single-cycle accumulation." 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines. IEEE, 2009.
- [14] Rashidi, Bahram, Bahman Rashidi, and Majid Pourormazd. "Design and implementation of low power digital FIR filter based on low power multipliers and adders on Xilinx FPGA." 2011 3rd International Conference on Electronics Computer Technology. Vol. 2. IEEE, 2011.