These notes are intended for use by students in cs1501 at the University of Pittsburgh and no one else
More Integer Multiplication
Now consider the following product:
(XH + XL) * (YH + YL) = XH*YH + XH*YL + XL*YH + XL*YL
Using our M values from the previous slide, this equals
M1 + M2 + M3 + M4
The value we want is M2 + M3, so define M23
M23 = (XH + XL) * (YH + YL)
And our desired value is
M23 – M1 – M4
Ok, all I see here is wackiness! How does this help?
Let's go back to the original equation now, and plug this back in
XY = 2^N * XH*YH + 2^(N/2) * (XH*YL + XL*YH) + XL*YL
= 2^N * M1 + 2^(N/2) * (M23 – M1 – M4) + M4
Only 3 distinct multiplications are now needed!
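The three-multiplication scheme above can be sketched in Python (ordinary Python ints standing in for a big-integer type like ZZ). The function name, the splitting point, and the cutoff value are illustrative choices, not part of the notes, and nonnegative integers are assumed:

```python
def karatsuba(x, y, cutoff=64):
    # Below the cutoff (in bits), fall back to ordinary multiplication;
    # 64 is an arbitrary illustrative threshold, not a tuned constant.
    if x.bit_length() <= cutoff or y.bit_length() <= cutoff:
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    xh, xl = x >> half, x & mask                # X = XH * 2^half + XL
    yh, yl = y >> half, y & mask                # Y = YH * 2^half + YL
    m1 = karatsuba(xh, yh, cutoff)              # M1  = XH * YH
    m4 = karatsuba(xl, yl, cutoff)              # M4  = XL * YL
    m23 = karatsuba(xh + xl, yh + yl, cutoff)   # M23 = (XH + XL) * (YH + YL)
    # XY = 2^(2*half) * M1 + 2^half * (M23 - M1 - M4) + M4
    return (m1 << (2 * half)) + ((m23 - m1 - m4) << half) + m4
```

Only three recursive multiplications appear per level; the shifts, additions, and subtractions are the Theta(N) extra work.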
More Integer Multiplication
Looking back, we see that M23 involves multiplying integers of at most (N/2)+1 bits, so asymptotically they are the same size as our other recursive multiplications
We have to do some extra additions and two subtractions, but these are all Theta(N) operations
Thus, we now have the following recurrence:
T(N) = 3T(N/2) + Theta(N)
This solves to Theta(N^(lg 3)) ≈ Theta(N^1.585)
Now we have an asymptotic improvement over the Gradeschool algorithm
Still a lot of overhead, but for large enough N it will run faster than Gradeschool
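To get a feel for the gap, note that the ratio N^2 / N^(lg 3) grows as N^(2 – lg 3) ≈ N^0.415. A quick check of a few sizes (the sizes chosen here are arbitrary examples):

```python
import math

lg3 = math.log2(3)                  # ≈ 1.585, the exponent in Theta(N^lg 3)
for n in (512, 4096, 2**20):
    ratio = n**2 / n**lg3           # how many times more work N^2 is than N^lg 3
    print(n, round(ratio))
```

Even at 512 bits the asymptotic ratio is noticeable, and it keeps growing with N, which is why Karatsuba eventually wins despite its overhead.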
Can we do even better?
If we multiply the integers indirectly using the Fast Fourier Transform (FFT), we can achieve an even better run-time of Theta(N lg N)
Don't worry about the details of this algorithm
But if you are interested, read the text Ch. 41
For RSA, we will be ok using a GradeSchool / Karatsuba Hybrid
GradeSchool used for smaller sizes, switching to Karatsuba for larger sizes
Karatsuba has a lot of overhead, so GradeSchool is better for smaller numbers of bits
How about integer powers: X^Y
Natural approach: simple for loop
ZZ ans = 1;
for (ZZ ctr = 1; ctr <= Y; ctr++)
    ans = ans * X;
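The same loop in Python (ordinary Python ints standing in for ZZ; the function name is an illustrative choice):

```python
def slow_power(x, y):
    # One multiplication per loop iteration, y iterations in total
    ans = 1
    for _ in range(y):
        ans = ans * x
    return ans
```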
This seems ok – one for loop and a single multiplication inside – is it linear?
Let's look more closely
Total run-time is
1) Number of iterations of the loop *
2) Time per multiplication
We already know 2) since we just did it
Assuming GradeSchool, Theta(N^2) for N-bit ints
How about 1)
It seems linear, since it is a simple loop
In fact, it is LINEAR IN THE VALUE of Y
However, our calculations are based on N, the NUMBER OF BITS in Y
What's the difference?
We know an N-bit integer can have a value of up to 2^N
So linear in the value of Y is exponential in the bits of Y
Thus, the iterations of the for loop are actually Theta(2^N), and thus our total runtime is Theta(N^2 * 2^N), which is RIDICULOUSLY BAD
Consider N = 512 – we get (512)^2 * (2^512)
Just how big is this number?
Let's calculate in base 10, since we have a better intuition about size
Since every 10 powers of 2 is approximately 3 powers of 10, we can multiply the exponent by 3/10 to get the approximate base-10 exponent
So (512)^2 * (2^512) = (2^9)^2 * (2^512) = 2^530 ≈ 10^159
Let's assume we have a 1GHz machine (10^9 cyc/sec)
This would mean we need 10^150 seconds
(10^150 / 31536000) years ≈ 10^150 / 10^8 ≈ 10^142 years
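A quick sanity check of these estimates in log space (the repeated rounding above gives 10^142 years; computing the logarithms directly gives about 10^143, which is just as hopeless):

```python
import math

ops = 512**2 * 2**512                        # Theta(N^2 * 2^N) operations for N = 512
seconds_log10 = math.log10(ops) - 9          # dividing by 10^9 cyc/sec subtracts 9 in log space
years_log10 = seconds_log10 - math.log10(31536000)
print(round(seconds_log10, 1), round(years_log10, 1))   # roughly 150.5 and 143.0
```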
This is ridiculous!!
But we need exponentiation for RSA, so how can we do it more efficiently?
How about a divide and conquer algorithm
Divide and conquer is always worth a try
X^Y = (X^(Y/2))^2 when Y is even
How about when Y is odd?
X^Y = X * (X^(Y/2))^2 when Y is odd (Y/2 uses integer division)
We also need a base case
X^Y = 1 when Y = 0
We can easily code this into a recursive function
What is the run-time?
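The recursive definition above translates directly into code; a minimal Python sketch (the function name is illustrative, and Y/2 is done with integer division):

```python
def fast_power(x, y):
    if y == 0:
        return 1                    # base case: X^0 = 1
    half = fast_power(x, y // 2)    # one recursive call on Y/2
    if y % 2 == 0:
        return half * half          # X^Y = (X^(Y/2))^2 when Y is even
    else:
        return x * half * half      # X^Y = X * (X^(Y/2))^2 when Y is odd
```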
Let's see…our problem is to calculate the exponential X^Y for X and Y
Let's first calculate the answer in terms of the value of Y, then we will convert to bits later
So we have a recursive call with an argument of ½ the original size, plus a multiplication (again assume we will use GradeSchool)
We'll also put the multiplication time back in later
For now let's determine the number of function calls
How many times can we divide Y by 2 until we get to a base case?
So we have lg(Y) recursive calls
But remember that Y can have a value of up to 2^N
Converting back to N, we have
lg(Y) = lg(2^N) = N
Since we have one or two multiplications per call, we end up with a total runtime of
Theta(N^2 * N) = Theta(N^3)
This is an AMAZING improvement
Consider again N = 512
N^3 = 512^3 = 134217728 – less than a billion
On a 1GHz machine this would take less than a second (assuming one operation per cycle – in reality it may take a few seconds)
Can we improve even more?
Well removing the recursion can always help
If we start at X, then square repeatedly, we get the same effect as the recursive calls
Square at each step, and also multiply by X if there is a 1 in the binary representation of Y (from left to right)
Ex: X^45 = X^(101101 in binary):
bit 1: start with 1, multiply by X → X
bit 0: square → X^2
bit 1: square → X^4, multiply by X → X^5
bit 1: square → X^10, multiply by X → X^11
bit 0: square → X^22
bit 1: square → X^44, multiply by X → X^45
Same idea as the recursive algorithm but building from the "bottom up"
See Text p. 526
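A minimal Python sketch of this bottom-up version: square at every bit of Y from left to right, and multiply by X on each 1 bit (the function name is illustrative):

```python
def iterative_power(x, y):
    ans = 1
    for bit in bin(y)[2:]:      # binary digits of Y, most significant first
        ans = ans * ans         # square at every step
        if bit == '1':
            ans = ans * x       # multiply by X on a 1 bit
    return ans
```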