I. English Original

An Assessment of the Suitability of FPGA-Based Systems for use in Digital Signal Processing

Russell J. Petersen and Brad L. Hutchings
Brigham Young University, Dept. of Electrical and Computer Engineering, 459 CB, Provo UT 84602, USA

Abstract. FPGAs have been proposed as high-performance alternatives to DSP processors. This paper quantitatively compares FPGA performance against DSP processors and ASICs using actual applications and existing CAD tools and devices. Performance measures were based on actual multiplier performance with FPGAs, DSP processors and ASICs. This study demonstrates that FPGAs can provide an order of magnitude better performance than DSP processors and can in many cases approach or exceed ASIC levels of performance.

1 Introduction

To meet the intensive computation and I/O demands imposed by DSP systems, many custom digital hardware systems utilizing ASICs have been designed and built. Custom hardware solutions have been necessary due to the low performance of other approaches such as microprocessor-based systems, but they have the disadvantages of inflexibility and a high cost of development. The DSP processor attempts to overcome the inflexibility and development costs of custom hardware. It provides flexibility through software instruction decoding and execution while providing high-performance arithmetic components such as fast array multipliers and multiple memory banks to increase data throughput. The
FPGA has also recently generated interest for use in implementing digital signal processing systems due to its ability to implement custom hardware solutions while still maintaining flexibility through device reprogramming [2]. Using the FPGA it is hoped that a significant performance improvement can be obtained over the DSP processor without sacrificing system flexibility. This paper is an attempt to quantify the ability of the FPGA to provide an acceptable performance improvement over the DSP processor in the area of digital signal processing.

To be published in 5th International Workshop on Field-Programmable Logic and Applications, Oxford, England, Aug. 1995. This work was supported by ARPA/CSTO under contract number DABT63-94-C-0085 under a subcontract to National Semiconductor.

2 Multiplication and digital signal processing

A core operation in digital signal processing algorithms is multiplication. Often, the computational performance of a DSP system is limited by its multiplication performance; hence the multiplication rate of the system must be maximized. Custom hardware systems based on ASICs and DSP processors maximize multiplication performance by using fast parallel-array multipliers either singly or in parallel. FPGAs also have the ability to implement multipliers singly or in parallel according to the needs of the application. Thus, in order to understand the performance of the FPGA relative to the ASIC and the DSP processor, a comparison of FPGA multiplication alternatives and their performance relative to custom multiplier solutions is needed. This section presents the basic alternatives for multiplier implementations and their performance when implemented on FPGAs.

2.1 Multiplier architecture alternatives

When implementing multipliers in hardware, two basic alternatives are available. The multiplier can be implemented as a fully parallel-array
multiplier or as a fully bit-serial multiplier, as shown in Figure 1. The advantage of the fully parallel approach is that all of the product bits are produced at once, which generally results in a faster multiplication rate. The multiplication rate for a parallel multiplier is set by the delay through the combinational logic. However, parallel multipliers also require a large amount of area to implement. Bit-serial multipliers, on the other hand, generally require only 1/N of the area of an equivalent parallel multiplier but take 2N bit times to compute the entire product (N is the number of bits of multiplier precision). This often leads one to believe that the bit-serial approach is thus 2N times slower than an equivalent parallel multiplier, but this is not true. The bit times (clock cycles for synchronous bit-serial multipliers) are very short in duration due to the reduced size, and hence shorter propagation paths, of the multiplier. This results in a bit-serial multiplier achieving about 1/2 the multiplication rate of an equivalent parallel multiplier on average, even exceeding the performance of the parallel multiplier in some cases.

Fig. 1. Block diagrams of basic multiplier alternatives

2.2 FPGA multiplication results

Table 1 lists the performance of several multipliers implemented on three different FPGAs. The FPGAs used were a Xilinx 4010, an Altera Flex8000 81188, and a National Semiconductor CLAy31. The first two FPGAs can be characterized as medium-grained architectures and are approximately equivalent in logic density, while the last FPGA is a fine-grained architecture utilizing smaller but more numerous cells. The multiplication rate of each multiplier is listed in MHz, as well as the percentage of the FPGA required to implement the multiplier. The
bit-serial multipliers have both their clock rate (bit rate) and their effective multiplication rate (clock rate/2N) listed.

2.3 Multiplier table contents

The majority of the multipliers in this study used common architectures such as the Baugh-Wooley two's complement parallel-array multiplier [5] and pipelined versions of the bit-serial multiplier [6] shown in Figure 1. In addition, several custom parallel multipliers were built that take advantage of the special features available on the Altera and Xilinx FPGAs. These are intended to represent close to the absolute maximum multiplier performance that can be achieved with these current FPGAs. These specific customizations will be discussed below.

Table 1. FPGA Multiplier Performance Results

Type of Multiplier                  # CLB/LC's   % of FPGA   Mult. Speed
Altera 81188 Parallel Multipliers
8-bit unsigned fast-adder              133          13        14.8 MHz
8-bit signed fast-adder                150          14        12.8 MHz
8-bit unsigned synthesis               129          12        7 MHz
8-bit signed synthesis                 135          13        6.84 MHz
8-bit signed complex synthesis         584          57        5.86 MHz
16-bit unsigned fast-adder             645          63        3.34 MHz
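
As a rough illustration of how the figures above can be read, the following Python sketch applies the effective-rate relation quoted in Section 2.2 (clock rate/2N) and derives an area-only upper bound on how many copies of each Altera 81188 parallel multiplier would fit on one device. The row values are copied from Table 1; the class and function names, the 64 MHz bit clock, and the idea of dividing 100% by the "% of FPGA" column are illustrative assumptions, not part of the original study, and the bound ignores routing and I/O limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelMultiplier:
    """One row of Table 1 (hypothetical container, for illustration only)."""
    name: str
    logic_cells: int      # "# CLB/LC's" column
    percent_of_fpga: int  # "% of FPGA" column
    speed_mhz: float      # "Mult. Speed" column


# Altera 81188 parallel-multiplier rows as listed in Table 1.
ALTERA_81188_PARALLEL = [
    ParallelMultiplier("8-bit unsigned fast-adder", 133, 13, 14.8),
    ParallelMultiplier("8-bit signed fast-adder", 150, 14, 12.8),
    ParallelMultiplier("8-bit unsigned synthesis", 129, 12, 7.0),
    ParallelMultiplier("8-bit signed synthesis", 135, 13, 6.84),
    ParallelMultiplier("8-bit signed complex synthesis", 584, 57, 5.86),
    ParallelMultiplier("16-bit unsigned fast-adder", 645, 63, 3.34),
]


def bit_serial_effective_rate_mhz(clock_mhz: float, n_bits: int) -> float:
    """Effective multiplication rate of a bit-serial multiplier: clock rate / 2N."""
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    return clock_mhz / (2 * n_bits)


def copies_per_device(multiplier: ParallelMultiplier) -> int:
    """Area-only upper bound on copies per device (assumption: ignores routing and I/O)."""
    return 100 // multiplier.percent_of_fpga


if __name__ == "__main__":
    # Hypothetical 64 MHz bit clock, chosen only to exercise the formula.
    print(bit_serial_effective_rate_mhz(64.0, 8), "MHz effective")
    for row in ALTERA_81188_PARALLEL:
        print(f"{row.name}: at most {copies_per_device(row)} per device")
```

The area-only bound is deliberately optimistic: it says nothing about whether several multipliers placed together would still run at the speed listed for a single one.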