Number Representation and Computer Arithmetic (B. Parhami / UCSB) 1
Number Representation and Computer Arithmetic
Article to appear in Encyclopedia of Information Systems, Academic Press, 2001.
*********** The following outline is not part of the article *********** # pages
Introductory paragraph 0.5
I Natural Numbers 4
Brief history of number representation,
positional number systems, number radix conversion
II Digit Sets and Encodings 1.5
Binary, BCD, ASCII, standard or nonstandard, irredundant or redundant
III Integers 2.5
Representing negative whole numbers, signed magnitude,
1’s- and 2’s-complement numbers, biasing
IV Counting 1.5
Incrementers and decrementers, synchronous counters,
asynchronous counters, ultrafast counters
V Fixed-Point Numbers 3
Radix point, radix conversion for fractions, range and precision
VI Addition and Subtraction 3
Half- and full-adders, digit-serial adders, ripple-carry adders,
2’s-complement addition
VII Fast Adders 3.5
Carry recurrence, carry-lookahead adders, other fast adders
VIII Multiplication 3
Shift-add multiplication algorithms, programmed multiplication,
hardware binary multipliers, high-radix multipliers
IX Fast Multipliers 2.5
Tree, partial-tree, and array multipliers
X Division 2.5
Shift-subtract division algorithms, programmed division,
hardware binary dividers, faster dividers, shared multiply/divide
XI Real Numbers 2.5
Floating-point numbers, ANSI/IEEE standard, rounding modes
XII Floating-Point Arithmetic 3.5
Addition, subtraction, multiplication, and division
of floating-point numbers, floating-point hardware
XIII Function Evaluation 2.5
Square root, trig functions, logarithms, exponentiation,
approximating functions, table lookup
XIV Precision and Errors 3
Representation and computation errors, error accumulation,
surprises in floating-point arithmetic, certifiable arithmetic
XV Speed and Reliability 3
Ultrafast arithmetic circuits, pipelining techniques,
fault tolerance, error checking
XVI Unconventional Number Systems 3.5
Exotic representations, redundancy in arithmetic,
rational numbers, logarithmic numbers, RNS
XVII Research Directions 1
Further Reading 0.5
******************* end of outline ********************* Total pages 47
Arithmetic is a branch of mathematics that deals with numbers and numerical
computation. Arithmetic operations on pairs of numbers x and y include addition,
producing the sum s = x + y, subtraction, yielding the difference d = x – y, multiplication,
resulting in the product p = x × y, and division, generating the quotient q = x / y (and, in
the case of integer division, the remainder z = x mod y). Subtraction and division can be
viewed as operations that undo the effects of addition and multiplication, respectively.
Computer arithmetic is a branch of computer engineering that deals with methods of
representing integers and real values (e.g., fixed- and floating-point numbers) in digital
systems and efficient algorithms for manipulating such numbers by means of hardware
circuits or software routines. On the hardware side, various types of adders, subtractors,
multipliers, dividers, square-rooters, and circuit techniques for function evaluation are
considered. Both abstract structures and technology-specific designs are dealt with.
Software aspects of computer arithmetic include complexity, error characteristics,
stability, and certifiability of computational algorithms.
I Natural Numbers
When we think of numbers, it is usually the natural numbers that first come to our mind;
the type of numbers that sequence book or calendar pages, mark clock dials, flash on
stadium scoreboards, and guide deliveries to our houses. The set {0, 1, 2, 3, . . .} of
natural numbers, also known as whole numbers or unsigned integers, forms the basis of
arithmetic. We use natural numbers for counting and for answering questions that ask
“how many?”.
Four-thousand years ago, Babylonians knew about natural numbers and were proficient
in arithmetic. Since then, representations of natural numbers have advanced in parallel
with the evolution of language. Ancient civilizations used sticks and pebbles to record
inventories or accounts. When the need for larger numbers arose, the idea of grouping
sticks or pebbles simplified counting and comparisons; for example, 27 was represented
by five groups of five sticks, plus two sticks. Eventually, objects of different shapes or
colors were used to denote such groups, leading to more compact representations.
Numbers must be differentiated from their representations, sometimes called numerals.
For example, the number “twenty-seven” can be represented in different ways using
various numerals or numeration systems; these include:
||||| ||||| ||||| ||||| ||||| || sticks or unary code
27 radix-10 or decimal code
11011 radix-2 or binary code
XXVII Roman numerals
However, we don’t always make the distinction between numbers and numerals and may
use “decimal numbers” in lieu of “decimal numerals.”
Roman numerals were used in Europe during the Middle Ages. Even when Europeans
learned that the Arabs had a better way of representing numbers, it took them centuries to
adopt the Arabic system based on numerals, or digits, 0-9 and a radix of 10. In these
decimal numbers, the worth of each position is 10 times that of the adjacent position to its
right, so that the string of digits “5327” represents five thousands, plus three hundreds,
plus two tens, plus seven ones.
Other radices have also appeared over the ages (see A. Glaser’s History of Binary and
Other Nondecimal Numeration, Tomash Publishers, 1981). Babylonians used radix-60
numbers, which make dealing with time easy. The radices 12 (duodecimal) and 5
(quinary) have also been used. The use of radix-2 (binary) numbers became popular with
the onset of electronic computers, because their use of binary digits, or bits, having only
two possible values 0 and 1, is compatible with electronic signals. Radix-8 (octal) and
radix-16 (hexadecimal) numbers have been used as shorthand notation for binary
numbers. For example, a 24-bit binary number can be represented as an 8-digit octal or a
6-digit hexadecimal number by taking the bits in groups of threes and fours, respectively.
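The grouping of bits into octal and hexadecimal digits can be sketched in Python (a minimal illustration; the helper name `regroup` is mine, not from the article):

```python
def regroup(bits: str, group: int, digits: str) -> str:
    """Take the bits of a binary string in groups of `group` from the left
    and map each group to one digit of the target radix."""
    return "".join(digits[int(bits[i:i + group], 2)]
                   for i in range(0, len(bits), group))

bits = "101011110000101011110000"             # a 24-bit binary number
octal = regroup(bits, 3, "01234567")          # 8 octal digits
hexa = regroup(bits, 4, "0123456789ABCDEF")   # 6 hexadecimal digits
```

All three strings denote the same value, since 3 and 4 bits carry exactly one radix-8 and radix-16 digit, respectively.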
In a general radix-r positional number system, with a fixed word width of k, a number x is
represented by a string of k digits x_i, with 0 ≤ x_i ≤ r – 1:
x = Σ_{i=0}^{k–1} x_i r^i = (x_{k–1} x_{k–2} . . . x_1 x_0)_r (1)
For example:
27 = (1 × 2^4) + (1 × 2^3) + (0 × 2^2) + (1 × 2^1) + (1 × 2^0) = (11011)two
In a k-digit radix-r number system, natural numbers from 0 to r^k – 1 can be represented.
Conversely, given a desired representation range [0, M – 1], the required number k of
digits in radix r is obtained from the following equation:
k = ⌈log_r M⌉ = ⌊log_r(M – 1)⌋ + 1 (2)
For example, representing any of the 3125 natural numbers in [0, 3124] requires 12 bits
in radix 2, five digits in radix 5, and four digits in radix 8.
Given a number x represented in radix r, one can obtain its radix-R representation in two
ways. If we wish to perform arithmetic in the new radix R, we simply evaluate a
polynomial in r whose coefficients are the digits xi. This corresponds to Equation (1) and
can be performed more efficiently by using Horner’s rule which involves repeated steps
of multiplying by r followed by addition:
(x_{k–1} x_{k–2} . . . x_1 x_0)_r = ((. . .(x_{k–1} r + x_{k–2}) r + . . . + x_1) r + x_0) (3)
This method is particularly suitable for manual conversion from an arbitrary radix r to
radix 10, given the relative ease with which we can perform radix-10 arithmetic.
To perform the radix conversion using arithmetic in the old radix r, we repeatedly divide
the number x by the new radix R, keeping track of the remainder in each step. These
remainders correspond to the radix-R digits Xi, beginning from X0. For example, we
convert the decimal number 23 to radix 2 as follows:
23 divided by 2 yields 11 with remainder 1
11 divided by 2 yields 5 with remainder 1
5 divided by 2 yields 2 with remainder 1
2 divided by 2 yields 1 with remainder 0
1 divided by 2 yields 0 with remainder 1
Reading the computed remainders from bottom to top, we find 23 = (10111)two. Using the
same process, we can convert 23 to radix 5 to get 23 = (43)five.
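The two conversion directions can be sketched in Python (the function names `to_digits` and `from_digits` are mine; the logic follows repeated division and Horner's rule, Equation (3)):

```python
def to_digits(x: int, r: int) -> list[int]:
    """Radix-r digits of natural number x, most-significant first,
    obtained by repeated division; remainders are read last to first."""
    if x == 0:
        return [0]
    digits = []
    while x > 0:
        x, rem = divmod(x, r)
        digits.append(rem)
    return digits[::-1]

def from_digits(digits: list[int], r: int) -> int:
    """Evaluate a radix-r digit list with Horner's rule:
    repeatedly multiply by r and add the next digit."""
    x = 0
    for d in digits:
        x = x * r + d
    return x

# The worked examples: 23 = (10111)two = (43)five
assert to_digits(23, 2) == [1, 0, 1, 1, 1]
assert to_digits(23, 5) == [4, 3]
```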
II Digit Sets and Encodings
The standard digit set used for radix-r numbers is {0, 1, . . . , r – 1}. This digit set leads to
a unique representation for every natural number. The binary or radix-2 digit set is {0, 1}
which is conveniently representable by electronic signals. Typically, low voltage is used
to represent 0 and high voltage denotes 1, but the reverse polarity can also be used.
Conceptually, the decimal digit values 0 through 9 could be represented by 10 different
voltage levels. However, encoding everything with 0s and 1s makes it easier for
electronic circuits to interpret and process the information speedily and reliably.
One way to encode decimal digits using binary signals is to encode each of the digits 0-9
by means of its 4-bit binary representation. The resulting binary-coded decimal (BCD)
representation is shown below:
Digit   BCD representation
0       0000
1       0001
2       0010
3       0011
4       0100
5       0101
6       0110
7       0111
8       1000
9       1001
The use of digit values 0 through r – 1 in radix r is just a convention. We could use more
than r digit values (for example, digit values –2 to 2 in radix 4) or use r digit values that
do not start with 0 (for example, digit set {–1, 0, 1} in radix 3). In the first instance, the
resulting number system possesses redundancy in that some numbers will have multiple
representations. More on this in Section XVI.
Of course, any finite set of symbols, not just digits, can be encoded using binary signals.
The American Standard Code for Information Interchange (ASCII) is one such
convention that represents upper- and lower-case letters, numerals, punctuation marks,
and other symbols in an 8-bit byte. For example, the 8-bit ASCII codes for the ten
decimal digits are of the form 0011xxxx, where the “xxxx” part is identical to the BCD
code discussed earlier. ASCII digits take twice as much space as BCD digits and thus are
not used in arithmetic units. Even less compact than ASCII is the 16-bit Unicode which
can accommodate symbols from many different languages.
III Integers
The set { . . . , –3, –2, –1, 0, 1, 2, 3, . . . } of integers is also referred to as signed or
directed (whole) numbers. The most straightforward representation of integers consists of
attaching a sign bit to any desired representation of natural numbers, leading to signed-magnitude
representation. The standard convention is to use 0 for positive and 1 for
negative and attach the sign bit to the left end of the magnitude. Here are two examples:
+27 in 8-bit signed-magnitude binary code 00011011
–27 in 2-digit decimal code with BCD digits 1 0010 0111
Another option for encoding signed integers in the range [–N, P] is the biased
representation. If we add the positive value N (the bias) to all numbers in the desired
range, unsigned integers in the range [0, P + N] result. Any method for representing
natural numbers in [0, P + N] can then be used for representing the original signed
integers in [–N, P]. Biased representation has limited application in practice, mainly in
encoding the exponents of floating-point numbers (see Section XI).
By far the most common machine encoding of signed integers is the 2’s-complement
representation. In the k-bit 2’s-complement format, a negative value –x, with x > 0, is
encoded as the unsigned number 2^k – x. Figure 1 shows encodings of positive and
negative integers in the 4-bit 2’s-complement format. Note that the positive integers 0
through 7 (or 2^(k–1) – 1, in general) have the standard binary encoding, whereas negative
values –1 through –8 (or –2^(k–1), in general) have been transformed to unsigned values by
adding 16 (or 2^k, in general) to them.
[Figure: the sixteen 4-bit patterns 0000 through 1111 arranged on a circle; patterns 0000–0111 represent +0 through +7, and patterns 1000–1111 represent –8 through –1.]
Fig. 1. Schematic representation of 4-bit 2’s-complement code for integers in [–8, +7].
Two important properties of 2’s-complement numbers are worth noting. First, the
leftmost bit of the representation acts as the sign bit (0 for positive values, 1 for negative
ones). Second, the value represented by a particular bit pattern can be derived without a
need to follow different procedures for negative and positive values. We simply use
Equation (1), as we did for unsigned integers, except that the sign bit is considered to
have a negative weight. Here are two examples:
(01011)2’s-compl = (–0 × 2^4) + (1 × 2^3) + (0 × 2^2) + (1 × 2^1) + (1 × 2^0) = +11
(11011)2’s-compl = (–1 × 2^4) + (1 × 2^3) + (0 × 2^2) + (1 × 2^1) + (1 × 2^0) = –5
The reason for the popularity of 2’s-complement representation will become clear when
we discuss addition and subtraction in Section VI.
Other complement representation systems can also be devised, but none is in widespread
use. Choosing any complementation constant M that is at least as large as N + P + 1
allows us to represent signed integers in the range [–N, P], with the positive numbers in
[0, +P] corresponding to the unsigned values in [0, P] and negative numbers in [–N, –1]
represented as unsigned values in [M – N, M – 1]. Sometimes, M itself is used as an
alternate code for 0 (actually, –0). For example, the k-bit 1’s-complement system is based
on M = 2^k – 1 and includes numbers in the range [–(2^(k–1) – 1), 2^(k–1) – 1], with 0
having two representations: the all-0s string and the all-1s string.
IV Counting
The natural numbers are ordered. Each natural number x has a successor succ(x), which is
the next number in this order and, except when x = 0, it has a predecessor pred(x).
Counting is the act of going through the successor or predecessor sequence, beginning
with some initial value. So, (3, 4, 5, 6, . . . ) is an up-counting sequence and (15, 14, 13,
12, . . . ) is a down-counting sequence (for a detailed treatment of various types of
counters, see R.M.M. Oberman’s Counting and Counters, Macmillan, 1981). Hardware
circuits that mimic this process, advancing to the successor number or retrogressing to the
predecessor each time a count control signal is asserted, are known as up-counters and
down-counters, respectively. A k-bit modulo-2^k down-counter would go into negative
2’s-complement values if it is decremented past zero. An up/down-counter can go in
either direction, depending on the value of a direction control signal.
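The wraparound behavior of such a counter can be sketched in Python (a behavioral model only; the class name `UpDownCounter` is mine):

```python
class UpDownCounter:
    """Modulo-2^k up/down-counter. Decrementing past zero wraps to the
    all-1s pattern, which reads as -1 in 2's-complement."""
    def __init__(self, k: int):
        self.k, self.count = k, 0
    def step(self, up: bool = True) -> int:
        # Keep only k bits, so the count wraps around in both directions.
        self.count = (self.count + (1 if up else -1)) & ((1 << self.k) - 1)
        return self.count

c = UpDownCounter(4)
c.step(up=False)           # decrement past zero
assert c.count == 0b1111   # reads as -1 in 4-bit 2's-complement
```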
In what follows, we focus on up-counters that hold unsigned values. Asynchronous
counters can be built of edge-triggered storage elements (flip-flops) and nothing else. The
more commonly used synchronous counters are built of a register, a hardware
incrementer, and some logic that allows the incremented count or an initial value to be
loaded into the register when appropriate control signals are asserted. Figure 2 shows the
block diagram for a synchronous counter.
[Figure: a count register whose output x feeds an incrementer (+1); a two-way mux selects between the incremented count x + 1 and an external Input value, and the selected value is loaded back into the register under the Load and Incr/Init control signals.]
Fig. 2. Synchronous binary counter with initialization capability.
The counter design shown in Fig. 2 is adequate for most applications. It can be made
faster by using a fast incrementer with carry-lookahead feature similar to that used in fast
adders, to be discussed in Section VII. If still higher speed is required, the counter can be
divided into blocks. A short initial block (say, 3 bits wide) can easily keep up with the
fast incoming signals. Increasingly wider blocks to the left of the initial block need not be
as fast because they are adjusted less and less frequently.
V Fixed-Point Numbers
A fixed-point number consists of a whole or integral part and a fractional part, with the
two parts separated by a radix point (decimal point in radix 10, binary point in radix 2,
and so on). The position of the radix point is almost always implied and thus the point is
not explicitly shown. If a fixed-point number has k whole digits and l fractional digits, its
value is obtained from the formula:
x = Σ_{i=–l}^{k–1} x_i r^i = (x_{k–1} x_{k–2} . . . x_1 x_0 . x_{–1} x_{–2} . . . x_{–l})_r (4)
In other words, the digits to the right of the radix point are given negative indices and
their weights are negative powers of the radix. For example:
2.375 = (1 × 2^1) + (0 × 2^0) + (0 × 2^–1) + (1 × 2^–2) + (1 × 2^–3) = (10.011)two
In a (k + l)-digit radix-r fixed-point number system with k whole digits, numbers from 0
to r^k – r^–l, in increments of r^–l, can be represented. The step size or resolution r^–l is often
referred to as ulp, or unit in least position. For example, in a (2 + 3)-bit binary fixed-point
number system, we have ulp = 2^–3 and the values 0 = (00.000)two through 2^2 – 2^–3 = 3.875
= (11.111)two are representable. For the same total number k + l of digits in a fixed-point
number system, increasing k will lead to enlarged range of numbers, whereas increasing l
leads to greater precision. Therefore, there is a tradeoff between range and precision.
Signed fixed-point numbers can be represented by the same methods discussed for signed
integers: signed-magnitude, biased format, and complement method. In particular, for
2’s-complement format, a negative value –x is represented as the unsigned value 2^k – x.
Figure 3 shows encodings of positive and negative values in the (1 + 3)-bit fixed-point
2’s-complement format. Note that the positive values 0 to 7/8 (or 2^(k–1) – 2^–l, in general)
have the standard binary encoding, whereas negative values –1/8 to –1 (or –2^–l to –2^(k–1),
in general) are transformed to unsigned values by adding 2 (or 2^k, in general) to them.
[Figure: the sixteen 4-bit patterns 0.000 through 1.111 arranged on a circle; patterns 0.000–0.111 represent +0 through +.875 in steps of .125, and patterns 1.000–1.111 represent –1 through –.125.]
Fig. 3. Schematic representation of 4-bit 2’s-complement encoding for (1 + 3)-bit fixed-point
numbers in the range [–1, +7/8].
The two important properties of 2’s-complement numbers, previously mentioned in
connection with integers, are valid here as well; namely, the leftmost bit of the number
acts as the sign bit, and the value represented by a particular bit pattern can be derived by
considering the sign bit as having a negative weight. Here are two examples:
(01.011)2’s-compl = (–0 × 2^1) + (1 × 2^0) + (0 × 2^–1) + (1 × 2^–2) + (1 × 2^–3) = +1.375
(11.011)2’s-compl = (–1 × 2^1) + (1 × 2^0) + (0 × 2^–1) + (1 × 2^–2) + (1 × 2^–3) = –0.625
Conversion of fixed-point numbers from radix r to another radix R is done separately for
the whole and fractional parts. Converting the whole part was discussed in Section I. To
convert the fractional part, we can again use arithmetic in the new radix R or in the old
radix r, whichever is more convenient. With radix-R arithmetic, we simply evaluate a
polynomial in r^–1 whose coefficients are the digits x_i. The simplest way to do this is to
view the fractional part as an l-digit integer, convert this integer to radix R, and divide the
result by r^l.
To perform radix conversion using arithmetic in the old radix r, we repeatedly multiply
the fraction y by the new radix R, noting and removing the integer part in each step.
These integer parts correspond to the radix-R digits Xi, beginning from X–1. For example,
we convert .175 to radix 2 as follows:
.175 multiplied by 2 yields .350 with integer part 0
.350 multiplied by 2 yields .700 with integer part 0
.700 multiplied by 2 yields .400 with integer part 1
.400 multiplied by 2 yields .800 with integer part 0
.800 multiplied by 2 yields .600 with integer part 1
.600 multiplied by 2 yields .200 with integer part 1
.200 multiplied by 2 yields .400 with integer part 0
.400 multiplied by 2 yields .800 with integer part 0
Reading the recorded integer parts from top to bottom, we find .175 = (.00101100)two.
This equality is approximate because the remaining fraction did not converge to 0. In general, a
fraction in one radix may not have an exact representation in another radix. In any case,
we simply carry out the process above until the required number of digits in the new
radix have been obtained.
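The repeated-multiplication process above can be sketched in Python (the function name `fraction_to_radix` is mine):

```python
def fraction_to_radix(y: float, R: int, ndigits: int) -> list[int]:
    """Repeatedly multiply the fraction y by the new radix R, peeling off
    the integer part in each step as the next digit X_-1, X_-2, ...
    The result may be approximate when y has no exact radix-R form."""
    digits = []
    for _ in range(ndigits):
        y *= R
        d = int(y)       # integer part becomes the next digit
        digits.append(d)
        y -= d           # continue with the remaining fraction
    return digits

# The worked example: .175 is approximately (.00101100)two
assert fraction_to_radix(0.175, 2, 8) == [0, 0, 1, 0, 1, 1, 0, 0]
```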
VI Addition and Subtraction
We will cover only integer addition and subtraction. Fixed-point numbers that are both in
the same format can be added like integers by simply ignoring the implied radix point.
When two bits are added, the sum is a value in the range [0, 2] which can be represented
by a sum bit and a carry bit. The circuit that can compute the sum and carry bits is known
as a half-adder (HA) with its truth table and symbolic representation shown in Fig. 4. The
carry output is the logical AND of the two inputs, while the sum output is the exclusive
OR (XOR) of the inputs. By adding a carry input to a half-adder, we get a binary full-adder
(FA) whose truth table and schematic diagram are depicted in Fig. 5.
Inputs   Outputs
x y      c s
----------------
0 0      0 0
0 1      0 1
1 0      0 1
1 1      1 0
[Figure: HA block with inputs x and y, outputs c and s.]
Fig. 4. Truth table and schematic diagram for a binary half-adder.
Inputs       Outputs
x y c_in     c_out s
----------------------
0 0 0        0 0
0 0 1        0 1
0 1 0        0 1
0 1 1        1 0
1 0 0        0 1
1 0 1        1 0
1 1 0        1 0
1 1 1        1 1
[Figure: FA block with inputs x, y, and c_in, outputs s and c_out.]
Fig. 5. Truth table and schematic diagram for a binary full-adder.
A full-adder, connected to a flip-flop for holding the carry bit from one cycle to the next,
functions as a bit-serial adder. The inputs of a bit-serial adder are supplied in synchrony
with a clock signal, one bit from each operand per clock cycle, beginning from the least-significant
bits. One bit of the output is produced per clock cycle and the carry from one
cycle is held and used as input in the next cycle. A ripple-carry adder, on the other hand,
unfolds this sequential behavior into space, using a cascade of k full-adders to add two k-bit
numbers (Fig. 6).
[Figure: four FA stages cascaded from right to left; stage i receives x_i, y_i, and carry c_i and produces sum bit s_i and carry c_{i+1}; c_0 is the carry-in and c_4 the carry-out.]
Fig. 6. Four-bit ripple-carry binary adder.
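The full-adder and its ripple-carry cascade can be sketched as a behavioral model in Python (function names are mine; this models the logic of Fig. 6, not its timing):

```python
def full_adder(x: int, y: int, c: int) -> tuple[int, int]:
    """One-bit full-adder: sum is the XOR of the three inputs,
    carry-out is their majority function."""
    s = x ^ y ^ c
    c_out = (x & y) | (x & c) | (y & c)
    return s, c_out

def ripple_carry_add(xs: list, ys: list, c_in: int = 0):
    """Add two equal-width bit lists (least-significant bit first)
    by cascading full-adders; returns (sum bits, carry-out)."""
    out, c = [], c_in
    for x, y in zip(xs, ys):
        s, c = full_adder(x, y, c)
        out.append(s)
    return out, c

# 0101 + 0011 = 1000 (5 + 3 = 8), bits listed LSB first:
assert ripple_carry_add([1, 0, 1, 0], [1, 1, 0, 0]) == ([0, 0, 0, 1], 0)
```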
The ripple-carry design of Fig. 6 becomes a radix-r adder if each binary full-adder is
replaced by a radix-r full-adder that accepts two radix-r digits (each encoded in binary)
and a carry-in signal, producing a radix-r sum digit and a carry-out signal. Because the
carry signals are always binary and their propagation is independent of the radix r, in the
rest of this section and in Section VII, we do not deal with radices other than 2.
An adder/subtractor for 2’s-complement numbers can be built by adding a controlled
complementation circuit (a two-way multiplexer with y and its bitwise complement as the
two inputs) to an adder of any type. The resulting circuit is shown in Fig. 7. To justify
this design, note that x – y can be computed by forming the 2’s-complement of y and
adding it to x. However, the 2’s-complement of y, which is 2^k – y, can be computed by
adding 1 to (2^k – 1) – y, the bitwise complement of y. The addition of 1 is accomplished
by inserting a carry-in of 1 into the adder when it is used to perform subtraction.
[Figure: a two-way mux (controlled complementation) selects y or its bitwise complement; the add/sub control signal (0 for addition, 1 for subtraction) drives both the mux and the adder’s carry-in c_in; the adder combines x with the selected input to form s.]
Fig. 7. Two’s-complement adder/subtractor.
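The adder/subtractor of Fig. 7 can be modeled in Python (the function name is mine; the bitwise complement plus carry-in of 1 implements the justification given above):

```python
def add_sub_2c(x: int, y: int, sub: bool, k: int = 8) -> int:
    """2's-complement adder/subtractor: for subtraction, feed the adder
    the bitwise complement of y together with a carry-in of 1."""
    mask = (1 << k) - 1
    y_in = (~y & mask) if sub else (y & mask)   # controlled complementation
    c_in = 1 if sub else 0
    return (x + y_in + c_in) & mask             # keep k bits; wraps on overflow

assert add_sub_2c(12, 5, sub=True) == 7
assert add_sub_2c(5, 12, sub=True) == (5 - 12) & 0xFF   # pattern for -7
```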
VII Fast Adders
Ripple-carry adders are quite simple and easily expandable to any desired width.
However, they are rather slow, because carries may propagate across the full width of the
adder. This happens, for example, when the two 8-bit numbers 10101011 and 01010101
are added. Because each full-adder requires some time to generate its carry output,
cascading k such units together implies k times as much signal delay in the worst case.
This linear amount of time becomes unacceptable for wide words (say, 32 or 64 bits) or
in high-performance computers, though it might be acceptable in an embedded system
that is dedicated to a single task and is not expected to be fast.
A variety of fast adders can be designed that require logarithmic, rather than linear, time.
In other words, the delay of such fast adders grows as the logarithm of k. The best-known
and most widely used such adders are carry-lookahead adders. The basic idea in carry-lookahead
addition is to form the required intermediate carries directly from the inputs
rather than from the previous carries, as was done in Fig. 6. For example, the carry c3 in
Fig. 6, which is normally expressed in terms of c2 using the carry recurrence
c3 = x2 y2 + (x2 ⊕ y2) c2
can be directly derived from the inputs based on the logical expression:
c3 = x2 y2 + (x2 ⊕ y2) x1 y1 + (x2 ⊕ y2)(x1 ⊕ y1) x0 y0 + (x2 ⊕ y2)(x1 ⊕ y1)(x0 ⊕ y0) c0
To simplify the rest of our discussion of fast adders, we define the carry generate and
carry propagate signals and use them in writing a carry recurrence that relates ci+1 to ci:
gi = xi yi
pi = xi ⊕ yi
ci+1 = gi + pi ci
The expanded form of c3, derived earlier, now becomes:
c3 = g2 + p2 g1 + p2 p1 g0 + p2 p1 p0 c0
It is easy to see that the expanded expression above would grow quite large for a wider
adder that requires c31 or c52, say, to be derived. A variety of lookahead carry networks
exist that systematize the preceding derivation for all the intermediate carries in parallel
and make the computation efficient by sharing parts of the required circuits whenever
possible. Various designs offer tradeoffs in speed, cost, VLSI chip area, and power
consumption. Information on the design of fast carry networks and other types of fast
adders can be found in books on computer arithmetic.
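The generate/propagate recurrence can be sketched in Python (a behavioral model; hardware evaluates the unrolled products in parallel, whereas this loop is sequential for clarity):

```python
def lookahead_carries(xs: list, ys: list, c0: int = 0) -> list:
    """All carries from g_i = x_i AND y_i and p_i = x_i XOR y_i via the
    recurrence c_{i+1} = g_i OR (p_i AND c_i); bits are LSB first."""
    carries = [c0]
    for x, y in zip(xs, ys):
        g, p = x & y, x ^ y
        carries.append(g | (p & carries[-1]))
    return carries   # c_0 through c_k

# The worst case from the text: 10101011 + 01010101 (bits LSB first);
# every position propagates, so all carries become 1.
xs = [1, 1, 0, 1, 0, 1, 0, 1]
ys = [1, 0, 1, 0, 1, 0, 1, 0]
```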
Here, we present just one example of a fast carry network. The building block of this
network is the carry operator ¢, which combines the generate and propagate signals for
two adjacent blocks [i + 1, j] and [h, i] of digit positions into the respective signals for the
wider combined block [h, j]. In other words
[i + 1, j] ¢ [h, i] = [h, j]
where [a, b] stands for (g[a,b], p[a,b]) representing the pair of generate and propagate
signals for the block extending from digit position a to digit position b. Because the
problem of determining all the carries ci is the same as computing the cumulative
generate signals g[0,i], a network built of ¢ operator blocks, such as the one depicted in
Fig. 9, can be used to derive all the carries in parallel. If a cin signal is required for the
adder, it can be accommodated as the generate signal g–1 of an extra position on the right.
[Figure: an 8-input network of ¢ blocks; the per-position pairs [7, 7] through [0, 0] at the top are combined over several levels into the cumulative block signals [0, 7] through [0, 0] at the bottom, via intermediate blocks such as [2, 3], [4, 5], [6, 7], [4, 7], [0, 3], and [0, 1]; each block [a, b] carries the pair (g[a,b], p[a,b]).]
Fig. 9. Brent-Kung lookahead carry network for an 8-digit adder.
For a k-digit adder, the number of operator blocks on the critical path of the carry
network exemplified by Fig. 9 is 2(⌈log2 k⌉ – 1). Many other carry networks can be
designed that offer speed-cost tradeoffs.
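The ¢ operator itself is a two-gate-level function, sketched below in Python (the linear fold shown is only for clarity; the Brent-Kung network arranges the same ¢ applications as a tree):

```python
def carry_op(left, right):
    """The carry operator: combine (g, p) of the left block [i+1, j] with
    (g, p) of the right block [h, i] into (g, p) of the merged block [h, j]."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)

# Carries as cumulative generate signals g[0, i] = c_{i+1}, for the
# LSB-first operands 1011 (11) and 0101 (5), assuming c_in = 0:
xs, ys = [1, 1, 0, 1], [1, 0, 1, 0]
gps = [(x & y, x ^ y) for x, y in zip(xs, ys)]   # per-position [i, i] pairs
prefix = [gps[0]]                                 # block [0, 0]
for gp in gps[1:]:
    prefix.append(carry_op(gp, prefix[-1]))       # [i, i] ¢ [0, i-1] = [0, i]
carries = [g for g, p in prefix]                  # c_1 through c_4
```

Since 11 + 5 = 16, every carry here comes out 1, with c_4 = 1 the carry-out.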
An important method for fast adder design, that often complements the carry-lookahead
scheme, is carry-select. In the simplest application of the carry-select method, a k-bit
adder is built of a (k/2)-bit adder in the lower half, two (k/2)-bit adders in the upper half
(forming two versions of the k/2 upper sum bits with ck/2 = 0 and ck/2 = 1), and a
multiplexer for choosing the correct set of values once ck/2 becomes known. A hybrid
design, in which some of the carries (say, c8, c16, and c24 in a 32-bit adder) are derived via
carry-lookahead and are then used to select one of two versions of the sum bits that are
produced for 8-bit blocks concurrently with the operation of the carry network, is quite
popular in modern arithmetic units.
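The carry-select idea can be sketched in Python (a behavioral model; the function name is mine, and Python's `+` stands in for the three half-width adders):

```python
def carry_select_add(x: int, y: int, k: int = 8) -> int:
    """Carry-select sketch: the upper half is computed twice, once
    assuming a mid carry of 0 and once assuming 1; the actual carry
    out of the lower half then selects one via a multiplexer."""
    h = k // 2
    mask = (1 << h) - 1
    lo_sum = (x & mask) + (y & mask)
    c_mid = lo_sum >> h                    # carry out of the lower half
    xh, yh = (x >> h) & mask, (y >> h) & mask
    hi0 = (xh + yh) & mask                 # upper sum with carry-in 0
    hi1 = (xh + yh + 1) & mask             # upper sum with carry-in 1
    hi = hi1 if c_mid else hi0             # the multiplexer
    return (hi << h) | (lo_sum & mask)

assert carry_select_add(0xAB, 0x55) == 0   # 171 + 85 = 256 wraps in 8 bits
```

In hardware the two upper-half additions proceed in parallel with the lower half, so only a multiplexer delay is added after the mid carry settles.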
VIII Multiplication
The simplest machine multipliers are designed to follow a variant of the pencil-and-paper
multiplication algorithm depicted in Fig. 9, where each row of dots in the partial
products bit-matrix is either all 0s (if the corresponding yi = 0) or the same as x (if yi = 1).
When we perform a k × k multiplication manually, we form all of the k partial products
and add the resulting k numbers to obtain the product p.
      o o o o      Multiplicand x
      o o o o      Multiplier y
   ---------------
      o o o o
     o o o o       Partial products
    o o o o        bit-matrix
   o o o o
 ---------------
 o o o o o o o o   Product p
Fig. 9. Multiplication of 4-bit numbers in dot notation.
For machine execution, it is easier if the cumulative partial product is initialized to 0,
each row of the bit-matrix added to it as the corresponding term is generated, and the
result of addition shifted to the right by one bit to achieve proper alignment with the next
term, as depicted in Fig. 9. In fact, this is exactly how programmed multiplication is
performed on a machine that does not have a hardware multiply unit. The recurrence
equation describing the process above is:
p(j+1) = (p(j) + yj x 2^k) 2^–1 with p(0) = 0 and p(k) = p (5)
where adding yj x 2^k to the cumulative partial product is the add step and the
multiplication by 2^–1 is the one-bit right shift.
Because by the time we are done, the right shifts will have caused the first partial product
to be multiplied by 2^–k, we pre-multiply x by 2^k to offset the effect of these right shifts.
This is not an actual multiplication but is done by aligning x with the upper half of the 2k-bit
cumulative partial product in the addition steps. Figure 10 depicts a possible hardware
realization of the foregoing shift-add multiplication algorithm. The shifting of the partial
product need not be done in a separate step but can be incorporated in the connecting
wires that go from the adder output to the double-width register.
After k iterations, recurrence (5) leads to:
p(k) = xy + p(0) 2^–k
Thus if p(0) is initialized to 2^k z (z padded with k zeros) instead of 0, the expression xy + z
will be evaluated. This multiply-add operation is quite useful for many applications and
is performed at essentially no extra cost compared to plain shift-add multiplication.
[Figure: the multiplier y is held in a shift register that supplies yj; a mux selects 0 or the multiplicand x accordingly; the adder combines the selected value with the upper half of the double-width partial product p(j), and the result is written back with a one-bit right shift.]
Fig. 10. Hardware multiplier based on the shift-add algorithm.
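Recurrence (5), including the multiply-add variant, can be modeled directly in Python (the function name is mine; `x << k` is the pre-multiplication by 2^k and `>> 1` the right shift):

```python
def shift_add_multiply(x: int, y: int, k: int, z: int = 0) -> int:
    """Shift-add multiplication per recurrence (5): the double-width
    cumulative partial product starts at p(0) = 2^k * z (0 for a plain
    multiply); x, pre-shifted into the upper half, is added when y_j = 1,
    then p shifts right one bit. Returns x*y + z after k iterations."""
    p = z << k                            # p(0) = 2^k * z
    for j in range(k):
        y_j = (y >> j) & 1                # next multiplier bit, LSB first
        p = (p + (y_j * x << k)) >> 1     # add, then shift right
    return p

assert shift_add_multiply(11, 13, k=4) == 143          # 11 * 13
assert shift_add_multiply(11, 13, k=4, z=100) == 243   # 11 * 13 + 100
```

No product bit is ever lost in the right shifts: because x is pre-shifted by 2^k, the bit shifted out at each step is already a settled bit of the final product's lower half.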
The preceding bit-at-a-time multiplication scheme can be easily extended to a digit-at-a-time
algorithm in a higher radix such as 4, 8, or 16. In this case, the multiplexer in Fig. 10
(which is really a bit-by-number multiplier) must be replaced with a digit-by-number
multiplier circuit (perhaps implemented as a multioperand adder), the single-bit shifts
replaced by h-bit shifts for a radix-2^h algorithm, and the number of iterations reduced
from k to k/h. These faster high-radix multipliers may still be too slow for some
applications. In such cases, fully combinational tree multipliers are used in which the
addition of the partial products bit-matrix is done by means of a tree-structured
combinational circuit.
IX Fast Multipliers
Instead of developing the partial products one at a time in radix 2 or in radix 2^h, we can
form all of them simultaneously, thus reducing the multiplication problem to n-operand
addition, where n = k in radix 2, n = k/2 in radix 4, and so on. For example, a 16 × 16
multiplication becomes a 16-operand addition problem in radix 2 or an 8-operand
addition problem in radix 4.
In tree multipliers, the n operands thus formed are added in two stages. In stage 1, a tree
built of carry-save adders or similar compression circuits is used to reduce the n
operands to two operands that have the same sum as the original n numbers. A carry-save
adder (see Section XVI) reduces three values to two values, for a reduction factor of 1.5,
thus leading to a log_1.5(n/2)-level circuit for reducing n numbers to two. The two
numbers thus derived are then added by a fast logarithmic-time adder, leading to an
overall logarithmic latency for the multiplier (Fig. 11).
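A software model can make the reduction process concrete. The Python sketch below (all names are illustrative) reduces a list of numbers to two using 3-to-2 carry-save compression and reports the number of tree levels used:

```python
def compress_3_to_2(x, y, z):
    # A carry-save adder in bit-parallel form: sum bits by XOR,
    # carry bits by majority, shifted one position up in weight.
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def tree_reduce(operands):
    """Reduce n operands to two numbers with the same total."""
    levels = 0
    while len(operands) > 2:
        groups, leftover = divmod(len(operands), 3)
        nxt = []
        for i in range(groups):
            nxt.extend(compress_3_to_2(*operands[3*i:3*i+3]))
        nxt.extend(operands[len(operands) - leftover:])
        operands, levels = nxt, levels + 1
    return operands, levels

rows, depth = tree_reduce([5, 9, 12, 3, 7, 11, 2, 8])
# The two rows have the same sum as the original eight operands,
# ready for a final fast carry-propagate addition.
```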
Such a full-tree multiplier is rather complex and its speed may not be needed for all
applications. In such cases, more economical partial-tree multipliers might be
implemented. For example, if about half of the partial products are accommodated by the
tree part, then two passes through the tree can be used to form the two numbers
representing the desired product, with the results of the first pass fed back to the inputs
and combined with the second set of partial products (Fig. 11). A partial-tree multiplier
can be viewed as a (very) high-radix multiplier. For example, if 12 partial products are
combined in each pass, then a radix-2^12 multiplication is effectively performed.
Fig. 11. Schematic diagrams for full-tree and partial-tree multipliers.
An array multiplier uses the same two-stage computation scheme of a tree multiplier,
with the difference that the tree of carry-save adders is one-sided (has the maximum
possible depth) and the final adder is of ripple-carry type (quite slow). An example 4 × 4
array multiplier is depicted in Fig. 12. HA and FA cells are half- and full-adders defined
in Figs. 4 and 5, respectively, and MA cells are modified full-adders, one of whose inputs
is internally formed as the logical AND of xi and yj.
The reader may well ask why such a slow tree-type multiplier is of any interest at all. The
answer is that array multipliers are quite suitable for VLSI realization, given their highly
regular design and efficient wiring pattern. They can also be readily pipelined by
inserting latches between some of the rows of cells, thus allowing several multiplications
to be performed on the same hardware structure.
Fig. 12. Array multiplier for 4-bit unsigned operands.
X Division
Like multipliers, the simplest machine dividers are designed to follow a variant of the
pencil-and-paper division algorithm depicted in Fig. 13, where each row of dots in the
subtracted bit-matrix is either all 0s (if the corresponding q_i = 0) or the same as y (if q_i =
1). When we perform a 2k-by-k division (2k-bit dividend, k-bit divisor) manually, we form
the subtracted terms one at a time by “guessing” the value of the next quotient digit,
subtract the appropriate term (0 or a suitably shifted version of y) from the partial
remainder (initialized to the value of the dividend x), and proceed until all k bits of q have
been determined. At this time, the partial remainder becomes the final remainder z.
Fig. 13. Division of an 8-bit number by a 4-bit number in dot notation.
For hardware or software implementation, a recurrence equation describing the process
above is used:
z^(j+1) = 2 z^(j) − q_(k−j) y 2^k   with z^(0) = x and z^(k) = 2^k z   (6)
where the doubling of z^(j) is the "shift" step and the removal of the q_(k−j) y 2^k term is
the "subtract" step.
Because by the time we are done, the left shifts will have caused the partial remainder to
be multiplied by 2^k, the true remainder is obtained by multiplying the final partial
remainder by 2^(−k) (shifting it to the right by k bits). Figure 14 depicts a possible hardware
realization of the foregoing shift-subtract division algorithm. As in multiplication, the
shifting of the partial remainder need not be done in a separate step but can be
incorporated in the wires connecting the adder output to the partial remainder register.
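In software, the same restoring (shift-subtract) process can be sketched as follows in Python; the trial-subtraction test stands in for the quotient digit selector of Fig. 14 (names are illustrative):

```python
def shift_subtract_divide(x, y, k):
    """Divide a 2k-bit dividend x by a k-bit divisor y per recurrence (6).

    Requires x < y * 2^k so the quotient fits in k bits.  Returns the
    k-bit quotient and the remainder.
    """
    z, q = x, 0                           # z(0) = x
    for _ in range(k):
        z *= 2                            # 1-bit left shift
        bit = 1 if z >= (y << k) else 0   # guess the next quotient bit
        z -= bit * (y << k)               # subtract 0 or y * 2^k
        q = (q << 1) | bit
    return q, z >> k                      # z(k) = 2^k * remainder
```

For example, dividing 100 by 7 with k = 4 gives quotient 14 and remainder 2.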
Fig. 14. Hardware divider based on the shift-subtract algorithm.
A comparison of Figs. 10 and 14 reveals that multipliers and dividers are quite similar
and can be implemented with shared hardware within an arithmetic/logic unit (ALU) that
performs different operations based on an externally supplied function code.
As in the case of multipliers, high-radix dividers speed up the division process by
producing several bits of the quotient in each cycle. Whereas there is no counterpart to
fast tree multipliers for performing division, array dividers do exist and are structurally
quite similar to the array multiplier of Fig. 12. It is also possible to perform division by
using a sequence of multiplications instead of additions. Even though multiplications are
slower and costlier to implement than additions, advantage over additive schemes may be
gained because far fewer multiplications are needed to perform division. Details can be
found in any book on computer arithmetic.
XI Real Numbers
Integers in a prescribed range can be represented exactly for automatic processing, but
most real numbers must be approximated within the machine’s finite word width. Some
real numbers can be represented as, or approximated by, (k + l)-bit fixed-point numbers
(see Section V). A problem with fixed-point representations is that they are not very good
for dealing with very large and extremely small numbers at the same time. Consider the
two (8 + 8)-bit fixed-point numbers shown below:
x = (0000 0000 . 0000 1001)two   Small number
y = (1001 0000 . 0000 0000)two   Large number
The relative representation error due to truncation or rounding of digits beyond the −8th
position is quite significant for x, but it is much less severe for y. On the other hand,
neither y^2 nor y/x is representable in this number format.
The most common representation of numbers for computations dealing with values in a
wide range is the floating-point format. Old computers used many different floating-point
number formats, and some specialized machines still do, but for the most part, the IEEE
floating-point standard format (ANSI/IEEE Standard 754-1985; available from IEEE
Press) has been adopted by the computer industry. We thus formulate our discussion of
floating-point numbers and arithmetic exclusively in terms of this standard format. Other
formats will differ in their parameters and representation details, but the basic tradeoffs
and algorithms remain the same.
A floating-point number has three components: sign ±, exponent e, and significand s,
together representing the value ±2^e s. The exponent is a signed integer represented in
biased format (a fixed bias is added to it to make it into an unsigned number). The
significand is a fixed-point number in the range [1, 2). Because the binary representation
of the significand always starts with “1.”, this fixed 1 is hidden and only the fractional
part of the significand is explicitly represented.
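The bias and hidden 1 can be seen by unpacking a value with Python's struct module; the sketch below handles normalized numbers only (subnormals, zeros, infinities, and NaNs use the reserved exponent codes discussed below):

```python
import struct

def decode_single(value):
    """Split a short (32-bit) IEEE number into sign, unbiased exponent,
    and significand with the hidden 1 restored (normalized values only)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127     # remove the bias of 127
    significand = 1 + (bits & 0x7FFFFF) / 2**23
    return sign, exponent, significand

# -6.5 is -2^2 * 1.625, so it decodes to sign 1, exponent 2, significand 1.625.
```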
Figure 15 shows the details of short (32-bit) and long (64-bit) floating-point formats. The
short format has adequate range and precision for most common applications
(magnitudes ranging from 1.2 × 10^(−38) to 3.4 × 10^38). The long format is used for highly
precise computations or those involving extreme variations in magnitude (from about 2.2
× 10^(−308) to 1.8 × 10^308). Of course in both of these formats, as explained thus far, zero has
no proper representation. To remedy this problem, and to be able to represent certain
other special values, the smallest and largest exponent codes (all 0s and all 1s in the
biased exponent field) are not used for ordinary numbers. An all-0s word (0s in sign,
exponent, and significand fields) represents +0; similarly, –0 and ±∞ have special
representations, as does any nonsensical or indeterminate value, known as “not a
number” (NaN). Certain other details of this standard are beyond the scope of this article.
The short (32-bit) format consists of a sign bit, an 8-bit exponent field (bias = 127,
exponent range −126 to 127), and 23 bits for the fractional part of the significand (plus
the hidden 1 in the integer part). The long (64-bit) format consists of a sign bit, an 11-bit
exponent field (bias = 1023, exponent range −1022 to 1023), and 52 bits for the fractional
part (plus the hidden 1 in the integer part).
Fig. 15. The ANSI/IEEE standard floating-point formats.
When an arithmetic operation produces a result that is not exactly representable in the
format being used, the result must be rounded to some representable value. The
ANSI/IEEE standard prescribes four rounding options. The default rounding mode is
“round to nearest even”: choose the closest representable value and, in case of a tie,
choose the value with its least-significant bit 0. There are also three directed rounding
modes: “round toward +∞” (choose the next higher value), “round toward −∞” (choose
the next lower value), and “round toward 0” (choose the closest value that is less than the
value at hand in magnitude). With the round-to-nearest option, the maximum rounding
error is 0.5 ulp, while with directed rounding schemes, the error can be up to 1 ulp.
XII Floating-Point Arithmetic
The floating-point operations of multiplication and division are not much different from
their fixed-point counterparts. For multiplication, exponents of the two operands are
added and their significands are multiplied:
(±2^e1 s1) × (±2^e2 s2) = ±2^(e1+e2) (s1 × s2)
Thus, a hardware floating-point multiplier consists of a significand multiplier and an
exponent adder that together compute 2^e s, with e = e1 + e2 and s = s1 × s2. The result’s
sign is easily obtained from the signs of the operands. This is not all, however. With s1
and s2 in [1, 2), their product will lie in [1, 4) and may thus be outside the permitted
range. If the product of the two significands is in [2, 4), dividing it by 2 via a 1-bit right
shift will put it back into the desired range. When this normalization is needed, the
exponent e must be incremented by 1 to compensate for the division of s by 2.
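Ignoring rounding and exceptions, the exponent and significand handling can be sketched in a few lines of Python for positive operands (a toy model, not production code):

```python
def fp_multiply(e1, s1, e2, s2):
    """Multiply positive floating-point values given as (exponent,
    significand) pairs with significands in [1, 2).  A product in
    [2, 4) is normalized by a 1-bit right shift (divide by 2) with a
    compensating exponent increment."""
    e, s = e1 + e2, s1 * s2
    if s >= 2.0:
        s, e = s / 2, e + 1
    return e, s

# 2^3 * 1.5 times 2^2 * 1.5 (i.e., 12 * 6): the significand product
# 2.25 needs normalization, giving 2^6 * 1.125 = 72.
```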
Floating-point division is similar and is governed by the equation:
(±2^e1 s1) / (±2^e2 s2) = ±2^(e1−e2) (s1 / s2)
Again, given that the ratio s1 / s2 of the two significands lies in (0.5, 2), normalization
may be required for results that are less than 1. This normalization consists of multiplying
the significand by 2 via a 1-bit left shift and decrementing the resulting exponent by 1 to
compensate for the doubling of the significand. Of course, throughout the operation and
ensuing adjustments, for both multiplication and division, the hardware must check for
exceptions such as overflow (exponent too large) and underflow (exponent too small).
We next discuss floating-point addition. Subtraction can be converted to addition by
changing the sign of the subtrahend. To perform addition, the exponents of the two
operands must be equalized, if needed. Consider the addition of ±2^e1 s1 and ±2^e2 s2, with
e1 > e2. By making both exponents equal to e1, the addition becomes:
(±2^e1 s1) + (±2^e1 (s2 / 2^(e1−e2))) = ±2^e1 (s1 ± s2 / 2^(e1−e2))
We thus see that s2 must be right-shifted by e1 − e2 bits before being added to s1. This
alignment shift is also called preshift (so named to distinguish it from the postshift needed
for normalizing a floating-point result). Figure 16 shows a complete example of floating-point
addition, including preshift, addition of aligned significands, and final rounding. In
this example, no postshift is needed because the result is already normalized. In general,
though, the result may need a 1-bit right shift, when it is in [2, 4), or a multibit left shift
when the addition of operands with different signs leads to cancellation or loss of
significance and one or more leading 0s appear in the result.
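A toy Python model of this data path for positive operands (preshift, add, 1-bit normalizing postshift; rounding and the multibit left shift for cancellation are omitted):

```python
def fp_add(e1, s1, e2, s2):
    """Add positive floating-point values (exponent, significand) with
    significands in [1, 2).  The smaller-exponent operand is preshifted
    right for alignment; a sum in [2, 4) needs a postshift into [1, 2)."""
    if e1 < e2:
        e1, s1, e2, s2 = e2, s2, e1, s1   # swap so that e1 >= e2
    s = s1 + s2 / 2 ** (e1 - e2)          # alignment preshift, then add
    e = e1
    if s >= 2.0:                          # sum in [2, 4): postshift
        s, e = s / 2, e + 1
    return e, s
```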
Numbers to be added (the operand with the smaller exponent must be preshifted):
x = 2^5 × 1.00101101
y = 2^1 × 1.11101101
Operands after alignment shift:
x = 2^5 × 1.00101101
y = 2^5 × 0.000111101101
Result of addition, with the extra bits to be rounded off:
s = 2^5 × 1.010010111101
Rounded sum:
s = 2^5 × 1.01001100
Fig. 16. Alignment shift and rounding in floating-point addition.
A simplified block diagram for a hardware floating-point adder is shown in Fig. 17. It is
seen that once the operands are unpacked, their exponents and significands are processed
in two separate tracks whose functions are coordinated by the block labeled “Control &
sign logic.” For example, based on the difference of the two exponents, this unit decides
which operand must be preshifted. To economize on hardware, usually only one
preshifter is provided, say for the left operand of the adder. If the other operand needs to
be preshifted, the operands are physically swapped. Also, in adding operands with unlike
signs, the operand that is not preshifted is complemented. This time-saving strategy may
lead to the computation of y − x when in fact x − y is the desired result. The control unit
corrects this problem by forcing the complementation of the result if needed. Finally,
normalization and rounding are performed and the exponent is adjusted accordingly.
Fig. 17. Simplified schematic of a floating-point adder.
XIII Function Evaluation
In many numeric calculations, there is a need to evaluate functions such as square-root,
logarithm, sine, or tangent. One approach to function evaluation is the use of an
approximating function that is easier to evaluate than the original function. Polynomial
approximations, derived from Taylor-series and other expansions, allow function
evaluation by means of addition, multiplication, and division. Here are a few examples:
ln x = 2(z + z^3/3 + z^5/5 + z^7/7 + ...)   where z = (x − 1)/(x + 1)
e^x = 1 + x/1! + x^2/2! + x^3/3! + x^4/4! + ...
cos x = 1 − x^2/2! + x^4/4! − x^6/6! + x^8/8! − ...
tan^(−1) x = x − x^3/3 + x^5/5 − x^7/7 + x^9/9 − ...
A second approach is convergence computation: begin with a suitable approximation and
proceed to refine the value with iterative evaluation. For example, if q^(0) is an
approximation to the square root of x, the following recurrence can be used to refine the
value, using one addition, one division, and a 1-bit right shift per iteration:
q^(i+1) = 0.5 (q^(i) + x/q^(i))
The initial approximation can be obtained via table lookup based on a few high-order bits
of x or simply be taken to be a constant. As an example, suppose that we want to use this
method to extract the square root of a floating-point number. To do this, we must halve
the exponent and find the square root of the significand. If the exponent is odd, we can
subtract 1 from it and double the significand, before applying this method. As a result, the
adjusted significand will be in [1, 4). We may thus take 1.5 as our initial approximation.
The better the initial approximation, the fewer iterations are needed to achieve
a certain level of accuracy. A special case of convergence computation is when each
iteration leads to the determination of one digit of the result, beginning with the most-significant
digit. Such digit-recurrence schemes are similar in nature to the shift-subtract
division discussed in Section X.
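The square-root recurrence is easy to exercise in Python (a throwaway sketch; six iterations are far more than needed from a good starting point):

```python
def newton_sqrt(x, q0=1.5, iterations=6):
    """Refine q0 toward sqrt(x) via q(i+1) = 0.5 * (q(i) + x / q(i)),
    using one addition, one division, and one halving per iteration."""
    q = q0
    for _ in range(iterations):
        q = 0.5 * (q + x / q)
    return q
```

Starting from the constant 1.5, the approximation converges quadratically for any x in [1, 4), roughly doubling the number of correct digits with each iteration.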
There are many other methods for function evaluation. As our third and final example,
we consider a method based on lookup tables which is becoming increasingly popular,
given that tables can be implemented efficiently and compactly in VLSI technology.
Within a reasonably narrow interval [x^(i), x^(i+1)), a function f(x) can be approximated by
the linear function a + b(x − x^(i)). This interpolation scheme leads to the hardware
implementation depicted in Fig. 18. The range of x values is divided into 2^h intervals
based on the h high-order bits of x, which define the value x_H. For each of these intervals
[x_H, x_H + 2^(−h)), the corresponding a and b values of the approximating linear function a +
b(x − x_H) = a + b x_L are stored in two tables. Function evaluation with this method thus
involves two table accesses, one multiplication, and one addition.
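The scheme of Fig. 18 can be modeled in Python; here the tables approximate sin on [0, 1) with h = 6 and a k = 16-bit fixed-point input (all parameter choices are illustrative):

```python
import math

def build_tables(f, h):
    # a[i] = f at the interval's left endpoint x_H = i / 2^h;
    # b[i] = slope of the chord across the interval (scaled by 2^h).
    a = [f(i / 2**h) for i in range(2**h)]
    b = [(f((i + 1) / 2**h) - f(i / 2**h)) * 2**h for i in range(2**h)]
    return a, b

def lookup_interpolate(x_bits, a, b, h, k):
    """Evaluate a + b * x_L: two table reads, one multiply, one add."""
    hi = x_bits >> (k - h)                        # x_H selects the interval
    x_l = (x_bits & ((1 << (k - h)) - 1)) / 2**k  # offset within the interval
    return a[hi] + b[hi] * x_l

h, k = 6, 16
a_tab, b_tab = build_tables(math.sin, h)
approx = lookup_interpolate(int(0.3 * 2**k), a_tab, b_tab, h, k)
# approx agrees with math.sin(0.3) to roughly four decimal places
```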
Fig. 18. Function evaluation by table lookup and linear interpolation.
XIV Precision and Errors
Machine arithmetic is inexact for two different reasons. First, many numbers do not have
exact binary representations within a finite word format. This is referred to as
representation error. Second, even for values that are exactly representable, floating-point
arithmetic produces inexact results. For example, the exactly computed product of
two short floating-point numbers will have a 48-bit significand that must be rounded to fit
in 23 bits (plus the hidden 1). This is characterized as computation error.
It is important for both the designers of arithmetic circuits and for the users of machine
arithmetic to be mindful of these errors and to learn methods for estimating and
controlling them. There are documented instances of arithmetic errors leading to disasters
in computer-controlled critical systems. Even a small per-operation error of 0.5 ulp, when
accumulated over many millions, perhaps billions, of operations needed in some
applications, can lead to highly inaccurate or totally incorrect results.
We limit our discussion of errors, their sources, and countermeasures to a few examples
from floating-point arithmetic (see D. Goldberg’s article in “further reading” for more
detailed discussion and other examples).
One way to avoid excessive error accumulation is to carry extra precision in the course of
a computation. Even inexpensive calculators use extra digits that are invisible to the user
but help ensure greater accuracy for the results. Without these guard digits, the
computation of 1/3 will produce 0.333 333 333 3, assuming a 10-digit calculator.
Multiplying this value by 3 will yield 0.999 999 999 9, instead of the expected 1. In a
calculator with two guard digits, the value of 1/3 is evaluated and stored as 0.333 333 333
333, but still displayed as 0.333 333 333 3. If we now multiply the stored value by 3, and
use rounding to derive the result to be displayed, the expected value 1 will be obtained.
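Python's decimal module can replay this calculator experiment, using the context precision to model the presence or absence of two guard digits:

```python
from decimal import Decimal, getcontext

getcontext().prec = 10                       # 10-digit calculator, no guard digits
no_guard = (Decimal(1) / Decimal(3)) * 3     # yields 0.9999999999

getcontext().prec = 12                       # same calculator, 2 guard digits
internal = (Decimal(1) / Decimal(3)) * 3     # 0.999999999999 held internally
getcontext().prec = 10
displayed = +internal                        # unary + rounds to display precision

# no_guard falls just short of 1, while displayed rounds to exactly 1
```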
Use of guard digits improves the accuracy of floating-point arithmetic but does not totally
eliminate some incorrect and highly surprising results. For example, floating-point
addition is not associative in that the algebraically equivalent computations (x + y) + z
and x + (y + z) may yield very different results. Similarly, many other laws of algebra do
not hold for floating-point arithmetic, causing difficulties in result predictability and
certification. An optimizing compiler that switches the order of evaluation for the sake of
computation speed-up may inadvertently change the result obtained!
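A two-line Python demonstration of the non-associativity (the values are chosen so that the small addend y is absorbed in one grouping but not the other):

```python
x, y, z = 1.0, 1e-16, -1.0
left = (x + y) + z    # x + y rounds back to 1.0, so left is 0.0
right = x + (y + z)   # the small term survives in this grouping
# left is exactly 0.0 while right is a small nonzero value near 1e-16
```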
One of the sources of difficulties is loss of precision that occurs when subtraction is
performed with operands of comparable magnitudes. Such a subtraction produces a result
that is close to 0, making the effect of previous roundings performed on the operands
quite significant in relative terms. Such an event is referred to as catastrophic
cancellation. For example, when the algebraically correct equation
A = [s(s − a)(s − b)(s − c)]^(1/2)
with s = (a + b + c)/2, is used to calculate the area of a needlelike triangle (a triangle for
which one side a is approximately equal to the sum b + c of the other two sides), a large
error can be produced due to the catastrophic cancellation in computing s − a. A user or
programmer who is aware of this problem can use an alternate formula that is not prone
to producing such large errors.
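The loss of significance in a subtraction like s − a can be seen in miniature with standard doubles; here the rounding already committed in forming 1.0 + x dominates the tiny true difference:

```python
x = 1e-15
computed = (1.0 + x) - 1.0     # cancellation exposes the rounding of 1.0 + x
relative_error = abs(computed - x) / x
# relative_error is on the order of 0.1, enormous for a single subtraction
```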
Because of the anomalies and surprises associated with floating-point arithmetic, there is
some interest in certifiable arithmetic. An example is offered by interval arithmetic
whereby each number is represented by a pair of values, a lower bound and an upper
bound. We represent x by the interval [x_l, x_u] if we are certain that x_l ≤ x ≤ x_u. Given
interval representations of x and y, arithmetic operations can be defined in such a way as
to ensure containment of the result in the interval that is produced as output. For example:
[x_l, x_u] + [y_l, y_u] = [x_l + y_l, x_u + y_u]
In this way, we always have a guaranteed error bound and will know when a result is too
imprecise to be trusted.
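A minimal Python sketch of two interval operations (for clarity it uses ordinary rounding; a rigorous implementation would round lower bounds down and upper bounds up to preserve guaranteed containment):

```python
def interval_add(x, y):
    """[xl, xu] + [yl, yu] = [xl + yl, xu + yu]."""
    return (x[0] + y[0], x[1] + y[1])

def interval_mul(x, y):
    """Bounds of the product are the min and max of the four
    endpoint products, which is correct for any sign combination."""
    p = [a * b for a in x for b in y]
    return (min(p), max(p))
```

For example, interval_add((1.0, 2.0), (3.0, 4.0)) gives (4.0, 6.0), and interval_mul((-1.0, 2.0), (3.0, 4.0)) gives (-4.0, 8.0).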
The ultimate in result certification is exact arithmetic, which may be feasible in some
applications through the use of rational numbers or other forms of exact representation.
For example, if each value is represented by a sign and a pair of integers for the
numerator and denominator, then numbers such as 1/3 will have exact representations
and an expression such as (1/3) × 3 will always produce the exact result. However,
besides limited applicability, exact rational arithmetic also implies significant hardware
and/or time overhead.
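Python's fractions module provides exactly this sign/numerator/denominator representation, and shows both the exactness and the contrast with binary floating point:

```python
from fractions import Fraction

one_third = Fraction(1, 3)      # stored exactly as numerator 1, denominator 3
exact = one_third * 3           # exactly 1, with no rounding anywhere

# Identities that fail in binary floating point hold exactly here:
tenth_sum = Fraction(1, 10) + Fraction(2, 10)   # exactly 3/10
float_sum = 0.1 + 0.2                           # not equal to 0.3 in binary
```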
XV Speed and Reliability
We would, of course, like to design arithmetic circuits to be as fast as possible. The adder
or multiplier must indeed be very fast if a machine is to execute one billion arithmetic
operations per second. We now have this level of performance on some desktop
computers, with 10^3 times greater performance in the largest available supercomputers,
and 10^6 times more being worked on in research laboratories. Therefore, methods for
speeding up arithmetic algorithms and their circuit implementations form an important
part of the field of computer arithmetic.
With modern VLSI technologies, design for high speed is a challenging undertaking. The
crossing of the gigahertz milestone in microprocessor clock rates signals the fact that
many hardware operations are occurring with subnanosecond latencies. Because an
electronic signal can travel only a few centimeters in one nanosecond, the roles of
interconnects and package boundary crossings are becoming increasingly important.
Given that clock distribution accounts for a significant portion of long wires and power
dissipation on a modern VLSI chip, there is significant incentive to do away with the
clocked or synchronous design style and adopt a fully asynchronous approach.
Speed is but one of several parameters that a designer must consider. In recent years,
compactness and power economy have emerged as important attributes of an
implementation. Design for compactness requires careful attention to implementation and
packaging technologies and their various constraints. One important aspect to consider is
the pin limitation at the chip and other levels of the hardware packaging hierarchy. Power
economy is somewhat related to compactness, but it also depends on signal transition
activity and circuit speed (faster circuit technologies use more power). Signal transition
activity can be reduced via judicious choice of encoding schemes and careful algorithm
design. Circuit speed can be reduced, while still keeping the same level of performance,
through various architectural schemes.
A particularly useful method of designing high-throughput arithmetic circuits without a
need for ultrahigh-speed circuit elements is pipelining. We explain the use of pipelining
and the various tradeoffs involved in its implementation through an example. Consider
the array multiplier of Fig. 12. The critical (longest) signal path through this circuit goes
through four MA, one HA, and three FA blocks. This path begins at the upper left corner
of the diagram, proceeds diagonally to the lower right and then horizontally to the lower
left corner. Assuming, for simplicity, that all three block types have the same unit-time
latency, one multiplication can be performed every 8 time units.
By cutting the design of Fig. 12 in half, right below the last row of MA blocks, and
inserting temporary storage platforms (latches) to hold intermediate signal values at that
point, we can double the throughput of our multiplier. Once the intermediate signals from
one multiplication are stored in the latches, a second multiplication can begin in the upper
part of the circuit, while the lower part completes the first multiplication. Of course, if we
insert more latches, the throughput will increase even further. The improvement is not
indefinite, though, because the insertion of more latches introduces increasing overheads
in cost and time.
When results of an arithmetic unit are critical to the safety of humans or success of a
mission, some form of checking must be performed. Design methods for fault-tolerant
arithmetic form active research areas. Here, we just mention one simple method based on
residue checking (for more information on this and other types of checking, see T.R.N.
Rao’s Error Codes for Arithmetic Processors, Academic Press, 1974).
Suppose that each value is represented by appending to it the residue modulo A, where A
is a suitably chosen check constant. Then addition can be checked in the manner shown
in Fig. 19. If the mod-A sum of the two check tags does not match the mod-A value of the
computed sum s, then the presence of an error is signaled. Special attention must be paid
to the design of the various components in Fig. 19 to ensure that matching of the two
residues implies fault-free operation with very high probability.
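A behavioral Python model of Fig. 19 (the check modulus A = 7 and the function names are arbitrary choices for illustration):

```python
A = 7   # check modulus

def checked_add(x, y):
    """Add two (value, value mod A) pairs and verify the result:
    the mod-A sum of the check tags must equal the computed sum mod A."""
    (xv, xt), (yv, yt) = x, y
    s = xv + yv                # main adder
    tag = (xt + yt) % A        # small mod-A adder on the check tags
    if s % A != tag:           # comparator: a mismatch signals an error
        raise ValueError("arithmetic error detected")
    return (s, tag)
```

For example, checked_add((25, 25 % 7), (38, 38 % 7)) returns (63, 0); if a fault corrupts the sum or a tag so that the residues disagree, the comparator raises the error signal.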
Fig. 19. Arithmetic unit with residue checking.
XVI Unconventional Number Systems
Thus far, our discussion of computer arithmetic was based mostly on standard
representations that are widely used in digital systems; namely, 2’s-complement binary
and ANSI/IEEE floating-point format. Other number representation systems are either
invisible to the computer user (used to encode intermediate values for speeding up
arithmetic operations) or are applied in the design of application-specific systems. In this
section, we briefly review two examples of number systems in each of these categories.
These are only meant to give the reader a sense that other options exist (see B. Parhami’s
textbook in “further reading” for more detailed discussion and other examples).
The carry-save (or stored-carry) representation for binary numbers is extensively used in
the design of fast multipliers and other arithmetic circuits. A carry-save number is
essentially a pair of binary numbers whose sum is the value being represented. Given
such a carry-save value, it can be added to an ordinary binary number, yielding a carry-save
result at very high speed, because the addition does not involve any carry
propagation. Figure 20 shows the addition of a binary number x to a carry-save number
composed of y and z, yielding the carry-save number composed of s and c. Comparing
Fig. 20 to Fig. 6 reveals the origins of the name “carry-save”; note that here, unlike in
Fig. 6, carries are not connected to the next FA block but are saved along with the sum
bits to form a carry-save number.
Fig. 20. Three binary numbers reduced to two numbers by a carry-save adder.
Carry-save numbers can be viewed as radix-2 numbers that use the digit set {0, 1, 2}.
From this observation, it is easy to generalize to other redundant representations such as
signed-digit numbers. For example, radix-8 numbers with the digit set [–5, 5] can be
added without carry propagation chains; the carry only affects the next higher position
and nothing else. Details are beyond the scope of this article.
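The operation of Fig. 20 is compact in Python bitwise arithmetic; the sketch below verifies that the new pair (s, c) preserves the represented value without any carry rippling between positions:

```python
def carry_save_step(x, y, z):
    """Add binary number x to the carry-save pair (y, z), giving a new
    carry-save pair: per-bit XOR for sums, majority for saved carries."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

y, z = 0b1011, 0b0101          # carry-save number with value 11 + 5 = 16
s, c = carry_save_step(0b0110, y, z)
# s + c equals 6 + 16 = 22: the sum is preserved, carries merely "saved"
```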
The logarithmic number system (LNS) is an example of number representation for
application-specific systems. In LNS, a value x is represented by its sign and the
logarithm of its absolute value in some base. For example, if we use base-2 logarithms,
with 4 whole and 4 fractional bits, the numbers 8 and 11 are represented as 0011.0000
and 0011.0111, respectively. The key advantage of LNS is that it allows us to multiply or
divide numbers through addition or subtraction of their logarithms. For example, the
product and ratio of 11 and 8 are found as follows:
log2 11 + log2 8 = (0011.0111)two + (0011.0000)two = (0110.0111)two
log2 11 − log2 8 = (0011.0111)two − (0011.0000)two = (0000.0111)two
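A small Python sketch of this encoding (4 whole and 4 fractional bits, sign handling omitted; the function name is illustrative):

```python
from math import log2

def lns_code(x, frac_bits=4):
    """Encode positive x as round(log2(x) * 2^frac_bits), i.e. a
    fixed-point base-2 logarithm with 4 fractional bits."""
    return round(log2(x) * 2**frac_bits)

# 8 encodes as 0011.0000 (48) and 11 as 0011.0111 (55); adding the codes
# gives 103 = 0110.0111, the code of the product 88 (up to quantization).
```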
Addition and subtraction, on the other hand, become somewhat more complicated but
they are still manageable with help from lookup tables. Details are beyond the scope of
this article. Suffice it to say that practical applications of this representation have thus far
been limited to short word widths; however, available methods are improving and an
LNS with range equivalent to the short ANSI/IEEE floating-point format has recently
been implemented as part of a European microprocessor project.
The residue number system (RNS) has a long history dating back to the ancient Chinese.
Use of this representation method in computer arithmetic was proposed in the late 1950s.
Despite this long history, applications have remained limited due to the fact that RNS
makes some key arithmetic operations, such as division and magnitude comparison, quite
difficult. Its main advantages are simple addition and multiplication, which make RNS
best suited to applications in which these two operations predominate. In RNS, each
number is represented by a set of residues with respect to a
set of pairwise relatively prime moduli. For example, given the moduli set {3, 5, 7}, the
numbers 11 and 8 are represented by the set of their residues with respect to the moduli,
thus leading to the codes (2, 1, 4) and (2, 3, 1), respectively. The number of distinct
natural numbers that are represented is 3 × 5 × 7 = 105. This set of values can be used to
represent the natural numbers 0 to 104 or signed values –52 to +52.
In our example RNS with the moduli set {3, 5, 7}, adding or multiplying the numbers 11
and 8 is done by separately operating on the three residues, with each operation
performed modulo the corresponding modulus. We thus get:
(2, 1, 4) + (2, 3, 1) = (4 mod 3, 4 mod 5, 5 mod 7) = (1, 4, 5)
(2, 1, 4) × (2, 3, 1) = (4 mod 3, 3 mod 5, 4 mod 7) = (1, 3, 4)
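The componentwise nature of RNS addition and multiplication is easy to render in code. The decoding function below (a brute-force use of the Chinese remainder theorem, included only to check results) also illustrates why magnitude-related operations are costly in RNS: recovering the value requires far more work than the digit-parallel add or multiply.

```python
MODULI = (3, 5, 7)

def rns_add(a, b):
    """Add two RNS codes digit by digit, each modulo its own modulus."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def rns_mul(a, b):
    """Multiply two RNS codes digit by digit, each modulo its own modulus."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(code):
    """Brute-force CRT decode over the range 0..104 (for checking only)."""
    return next(x for x in range(3 * 5 * 7)
                if all(x % m == r for r, m in zip(code, MODULI)))
```

Decoding (1, 4, 5) indeed yields 19 = 11 + 8, and decoding (1, 3, 4) yields 88 = 11 × 8.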
Despite lack of widespread applications, both LNS and RNS remain important from a
theoretical standpoint. Additionally, there are indications that with advances in arithmetic
algorithms and VLSI technology, these two methods may find greater use in the future (for
information on applications of RNS arithmetic in signal processing, see Residue Number
System Arithmetic, edited by M.A. Soderstrand, W.K. Jenkins, G.A. Jullien, and F.J.
Taylor, IEEE Press, 1986).
XVII Research Directions
Computer arithmetic has played a central role in the development of digital computers.
A.W. Burks, H.H. Goldstine, and J. von Neumann, in their now-classic 1946 report
entitled “Preliminary Discussion of the Logical Design of an Electronic Computing
Instrument,” set the stage for many ingenious schemes to perform fast arithmetic on early
digital computers, at a time when even the simplest circuits were bulky and expensive.
After more than half a century, research and development is still continuing unabated, for
even though many of the theoretical underpinnings of the field are now well understood,
new application challenges must be faced and old solution schemes must be adapted to
emerging technological constraints and opportunities.
Examples of active research issues in computer arithmetic at the turn of the twenty-first
century include reducing power consumption, efficient handling of low-precision
multimedia data, subnanosecond operations, area-efficient implementations, configurable
processing elements, function evaluation with no error other than that dictated by the
mandatory rounding, certifiable arithmetic, compatibility/portability of numerical
software, applying novel computational paradigms, and fundamental theoretical limits.
New results in the field appear in Proceedings of the Symposia on Computer Arithmetic,
currently held in odd-numbered years, and in archival technical journals such as IEEE
Transactions on Computers, which occasionally devotes entire special issues to the topic.
Web resources can be accessed via the author’s home page at http://www.ece.ucsb.edu.
Further Reading
[1] Flynn, M.J. and S.S. Oberman, Advanced Computer Arithmetic Design, Wiley,
2001. [Reference for advanced topics in arithmetic design and implementation.]
[2] Goldberg, D., “What Every Computer Scientist Should Know About Floating-Point
Arithmetic,” ACM Computing Surveys, Vol. 23, No. 1, pp. 5-48, Mar. 1991.
[3] Knuth, D.E., The Art of Computer Programming – Vol. 2: Seminumerical
Algorithms, Addison-Wesley, 3rd Edition, 1997. [Authoritative reference work.]
[4] Parhami, B., Computer Arithmetic: Algorithms and Hardware Designs, Oxford
University Press, 2000. [Basic textbook on computer arithmetic.]
[5] Swartzlander, E.E., Jr., Computer Arithmetic, Vols. I & II, IEEE Computer Society
Press, 1990. [Compendium of key papers of historical or practical significance.]
Behrooz Parhami
Department of Electrical and Computer Engineering
University of California, Santa Barbara
Santa Barbara, CA 93106-9560
E-mail: parhami@ece.ucsb.edu
Web: http://www.ece.ucsb.edu/Faculty/Parhami/
List of terms specific to the topic
Array multiplier
Carry lookahead
Convergence method
Fast carry network
Fixed-point number
Floating-point number
Full-adder
High-radix arithmetic
Redundant number
Residue arithmetic
Ripple-carry adder
Rounding
Signed magnitude
Table-lookup scheme
Tree multiplier
Two’s complement
