
A Subword-Parallel Multiplication and Sum-of-Squares Unit

Shankar Krithivasan†, Michael J. Schulte†, and John Glossner‡

† Dept. of ECE, University of Wisconsin-Madison, Madison, WI

‡ Sandbridge Technologies, White Plains, NY

skrithi@…, schulte@…, jglossner@…

Abstract

Several recent digital signal processors, multimedia processors, and general-purpose processors with multimedia extensions support subword parallelism. With subword parallelism, each operand is partitioned into multiple lower-precision operands, called subwords. A single subword-parallel instruction performs the same operation on multiple sets of subwords in parallel. This paper presents the design of a subword-parallel multiplication and sum-of-squares unit (SPMSSU). The SPMSSU uses novel subword partitioning and partial product mapping techniques to perform one 32-bit, two 16-bit, or four 8-bit multiplications or sum-of-squares operations in parallel. The SPMSSU efficiently performs subword-parallel operations with area and delay estimates that are comparable to those of a conventional 32-bit multiplier.

1 SPMSSU

The design of a novel combined multiplication and sum-of-squares unit (CMSSU) was introduced in [1]. The CMSSU performs either multiplications ($P = A \cdot B$) or sum-of-squares operations ($P = A^2 + B^2$), and has only slightly more area and delay than a conventional multiplier. Since the CMSSU performs two squares and their addition with roughly the same delay as a multiplication, it improves the performance of many DSP and multimedia applications. The CMSSU presented in [1], however, does not support subword-parallel operations, and by enhancing the CMSSU to support subword-parallelism

This paper presents the design and implementation of a subword-parallel multiplication and sum-of-squares unit (SPMSSU), which performs one 32-bit, two 16-bit, or four 8-bit multiplications or sum-of-squares operations in parallel. Compared to previous subword-parallel multipliers, the SPMSSU uses a novel subword-partitioning technique, introduced in [2], that can be applied to multiplier trees that are not Booth encoded. An extension of the partial product mapping technique presented in [1] allows the SPMSSU to perform either subword-parallel multiplication or subword-parallel sum-of-squares operations, based on an input control signal. Although the SPMSSU presented in this paper is designed to operate on two's complement operands, which frequently occur in DSP and multimedia applications, it can easily be extended to also operate on unsigned operands. The basic techniques presented in this paper can also be extended to other operand and subword sizes.
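As a point of reference for the three operating modes, the following is a minimal behavioral sketch in Python. It models only the intended results (one 32-bit, two 16-bit, or four 8-bit operations on two's complement subwords), not the hardware structure. The function names `to_signed`, `split`, and `spmssu_reference`, and the assumption that a sum-of-squares operation pairs corresponding subwords of the two operands, are illustrative and not taken from the paper.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def split(word, width, total=32):
    """Split a `total`-bit word into signed subwords, least significant first."""
    return [to_signed(word >> off, width) for off in range(0, total, width)]


def spmssu_reference(a, b, width, s):
    """Behavioral model: s = 0 gives per-subword products,
    s = 1 gives per-subword sums of squares (assumed pairing)."""
    results = []
    for x, y in zip(split(a, width), split(b, width)):
        results.append(x * x + y * y if s else x * y)
    return results


# Four 8-bit subwords per operand, least significant subword first:
# a = (3, 2, -1, 1), b = (4, 3, 2, 2)
print(spmssu_reference(0x01FF0203, 0x02020304, 8, 0))  # [12, 6, -2, 2]
print(spmssu_reference(0x01FF0203, 0x02020304, 8, 1))  # [25, 13, 5, 5]
```

Each result in the list is a full-precision value; a hardware unit would pack these into a 64-bit result word, with the bit alignment depending on the operation.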

To design the SPMSSU, the multiplication matrix of a conventional multiplier is modified to allow it to also perform subword-parallel multiplication, using the technique presented in [2]. The technique used to design the two's complement combined multiplication and sum-of-squares unit described in [1] takes advantage of similarities between the partial product matrices for $P = A \cdot B$ and $P = A^2 + B^2$, where $A = a_{n-1}a_{n-2} \ldots a_1a_0$ and $B = b_{n-1}b_{n-2} \ldots b_1b_0$ are two $n$-bit two's complement binary integers. This yields a single unit that performs either multiplication or sum-of-squares computations, based on an input control signal $s$, where $s$ is set to '1' for sum-of-squares computations and '0' for multiplication. This signal is then used to define the following variables:

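$$c_j = a_j s \vee b_j \bar{s} \quad (\text{for } 0 \le j \le n-1) \qquad (1)$$

$$d_j = a_j \bar{s} \vee b_j s \quad (\text{for } 0 \le j \le n-1) \qquad (2)$$

$$e_i = s\,(a_i \oplus b_i) \quad (\text{for } 0 \le i \le n-1) \qquad (3)$$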

Using $c_j$, $d_j$, and $e_i$ results in the following combined equation for two's complement multiplication or sum-of-squares computations:

$$
\begin{aligned}
P ={}& 2^{n} + 2^{2n-1} + \sum_{j=0}^{n-2}\left(\overline{a_{n-1}c_j} + \overline{d_j b_{n-1}}\right)2^{n+j-1} + \sum_{i=0}^{n-1} a_i b_i 2^{2i} \\
&+ \sum_{i=1}^{n-2}\sum_{j=0}^{i-1}\left(a_i c_j + d_j b_i\right)2^{i+j} + \sum_{i=0}^{n-1} e_i 2^{2i-1}
\end{aligned}
\qquad (4)
$$

Figure 1. Mapping Subword-Parallel Multiplications and Sum-of-Squares Operations. Panels: (a) 16-bit subwords; (b) 8-bit subwords.
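The combined equation (4) can be checked numerically with a short sketch. The Python below assumes the reading in which $c_j = a_j s \vee b_j \bar{s}$, $d_j = a_j \bar{s} \vee b_j s$, and both products in the row involving the sign bits are complemented, as in a Baugh-Wooley style sign treatment; this reading is an interpretation of the equations, and the function names are hypothetical. To keep the $2^{-1}$ weight of $e_0$ an integer, the code accumulates $2P$ rather than $P$.

```python
def bits(x, n):
    """The n two's complement bits of x, least significant first."""
    return [(x >> i) & 1 for i in range(n)]


def combined_unit(a, b, s, n=8):
    """Evaluate equation (4) bit by bit and return 2P mod 2^(2n+1).

    For s = 1 this value equals a*a + b*b.
    For s = 0 it equals 2 * ((a*b) mod 2^(2n)).
    """
    A, B = bits(a, n), bits(b, n)
    ns = 1 - s
    c = [(A[j] & s) | (B[j] & ns) for j in range(n)]   # equation (1)
    d = [(A[j] & ns) | (B[j] & s) for j in range(n)]   # equation (2)
    e = [s & (A[i] ^ B[i]) for i in range(n)]          # equation (3)

    # Every weight 2^k in equation (4) becomes 2^(k+1) because we sum 2P.
    twice_p = (1 << (n + 1)) + (1 << (2 * n))          # constants 2^n + 2^(2n-1)
    for j in range(n - 1):                             # complemented sign-row terms
        inv1 = 1 - (A[n - 1] & c[j])
        inv2 = 1 - (d[j] & B[n - 1])
        twice_p += (inv1 + inv2) << (n + j)
    for i in range(n):                                 # diagonal terms and e_i row
        twice_p += (A[i] & B[i]) << (2 * i + 1)
        twice_p += e[i] << (2 * i)
    for i in range(1, n - 1):                          # off-diagonal terms
        for j in range(i):
            twice_p += ((A[i] & c[j]) + (d[j] & B[i])) << (i + j + 1)
    return twice_p % (1 << (2 * n + 1))


# Exhaustive check for 4-bit operands.
for a in range(-8, 8):
    for b in range(-8, 8):
        assert combined_unit(a, b, 1, n=4) == a * a + b * b
        assert combined_unit(a, b, 0, n=4) >> 1 == (a * b) % 256
```

The check reflects the one-bit offset between the two modes: in multiplication mode the result occupies weights $2^0$ and above, while in sum-of-squares mode the least significant result bit has weight $2^{-1}$.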

Figure 1 shows how the subword-parallel multiplications and sum-of-squares operations are mapped onto a 32-bit partial product matrix. The regions marked by $C_i$ and $D_i$ contain partial products of the form $a_i c_j$ and $d_j b_i$, respectively. An additional row that adds in the $e_i$ bits needed for sum-of-squares computations is shown as the subwords $E_i$ in the figure. Based on the subword size, partial product bits are set to zero or inverted, product bits are inverted, and ones are added to the partial product matrix using the techniques presented in [2].

As illustrated in Figure 1, when performing multiplications (MUL), $s = 0$ and the subword results are available from $p_{63}p_{62} \ldots p_1p_0$. When performing sum-of-squares (SOS) operations, $s = 1$ and the subword results are available from $p_{62}p_{61} \ldots p_0p_{-1}$. Since the product subwords are one bit to the left of the sum-of-squares subwords, the most significant bit of each product subword overlaps with the least significant bit of the next sum-of-squares subword. To correctly handle this situation when the subword size is $k$ and there are $m$ words, the multiplexers in these bit positions select $e_{i \cdot k/2}$ when $s = 1$ and $p_{i \cdot k - 1}$ when $s = 0$, for $1 \le i \le m-1$. In the examples that follow, $k$ corresponds to the width of each result subword, which is twice the operand subword size. For example, with a subword size of 8 bits, $e_8$, $e_{16}$, and $e_{24}$ are selected as the bits for the final result, instead of $p_{15}$, $p_{31}$, and $p_{47}$. With a subword size of 16 bits, $e_{16}$ is used instead of $p_{31}$.
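The multiplexer positions described above can be enumerated with a small sketch. It assumes that $k$ denotes the result-subword width (16 for 8-bit operand subwords, 32 for 16-bit operand subwords), which is the reading that reproduces the two examples in the text, and that the full result is 64 bits wide. The function name `overlap_mux_pairs` is hypothetical.

```python
def overlap_mux_pairs(k, total_bits=64):
    """Return (e_index, p_index) pairs for the shared bit positions.

    At each returned position the multiplexer selects e[e_index] when
    s = 1 (sum of squares) and p[p_index] when s = 0 (multiplication).
    k is assumed to be the result-subword width in bits.
    """
    m = total_bits // k                      # number of result subwords
    return [(i * k // 2, i * k - 1) for i in range(1, m)]


print(overlap_mux_pairs(16))  # [(8, 15), (16, 31), (24, 47)]
print(overlap_mux_pairs(32))  # [(16, 31)]
print(overlap_mux_pairs(64))  # []
```

The last call shows that a single 32-bit operation needs no such multiplexers, since there is only one result subword.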

The design of the SPMSSU can be modified, or additional functional units can be added, to support a wide range of instructions. For example, multimedia processors often return only the less significant or more significant half of the product to avoid growth in subword size. To handle these types of instructions, additional multiplexers can be added to the output of the SPMSSU.

Another useful extension is a multioperand adder that sums the subwords of the SPMSSU result operand plus an optional accumulator value. This extension can quickly compute vector dot products by having the SPMSSU perform subword-parallel multiplications, or

Functional Unit

Delay (ns)

122927126301132321
