# These notes are intended for use by students in cs1501 at the University of Pittsburgh and no one else


## More Integer Multiplication

• Now consider the following product:
• (XH + XL) * (YH + YL) = XHYH + XHYL + XLYH + XLYL
• Using our M values from the previous slide, this equals
• M1 + M2 + M3 + M4
• The value we want is M2 + M3, so define M23
• M23 = (XH + XL) * (YH + YL)
• And our desired value is M23 – M1 – M4
• Ok, all I see here is wackiness! How does this help?
• Let's go back to the original equation now, and plug this back in
• XY = 2^N(XHYH) + 2^(N/2)(XHYL + XLYH) + XLYL
• = 2^N(M1) + 2^(N/2)(M23 – M1 – M4) + M4
• Only 3 distinct multiplications are now needed!
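The identity above is easy to check numerically. A minimal Python sketch (the sample 8-bit half-values are arbitrary illustrations, not from the notes):

```python
# Numeric check of the three-multiplication identity on sample 8-bit halves.
# XH, XL, YH, YL are arbitrary sample values chosen for illustration.
XH, XL, YH, YL = 0xAB, 0xCD, 0x12, 0x34

M1 = XH * YH
M4 = XL * YL
M23 = (XH + XL) * (YH + YL)

# M23 - M1 - M4 should equal the two cross products XH*YL + XL*YH
assert M23 - M1 - M4 == XH * YL + XL * YH

# Reassemble the full product X*Y (here N = 16 bits, so N/2 = 8)
N = 16
X = (XH << N // 2) + XL
Y = (YH << N // 2) + YL
assert (M1 << N) + ((M23 - M1 - M4) << N // 2) + M4 == X * Y
```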

## More Integer Multiplication

• Looking back, we see that M23 involves multiplying at most ((N/2)+1)-bit integers, so asymptotically it is the same size as our other recursive multiplications
• We have to do some extra additions and two subtractions, but these are all Theta(N) operations
• Thus, we now have the following recurrence:
• T(N) = 3T(N/2) + Theta(N)
• This solves to Theta(N^(lg 3)) ≈ Theta(N^1.58)
• Now we have an asymptotic improvement over the Gradeschool algorithm
• Still a lot of overhead, but for large enough N it will run faster than Gradeschool
• See karat.txt
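karat.txt is not reproduced here, but the recurrence above can be sketched in Python. The function name, the cutoff value, and the fallback to built-in multiplication below the cutoff are illustrative assumptions (the fallback also mirrors the GradeSchool/Karatsuba hybrid mentioned later), not the course code:

```python
def karatsuba(x, y, cutoff=64):
    """Multiply nonnegative integers x and y using Karatsuba's
    three-multiplication recurrence; fall back to built-in
    multiplication below the (illustrative) cutoff."""
    if x < (1 << cutoff) or y < (1 << cutoff):
        return x * y                                # base case: small operands
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    xh, xl = x >> half, x & ((1 << half) - 1)       # split into high/low halves
    yh, yl = y >> half, y & ((1 << half) - 1)
    m1 = karatsuba(xh, yh, cutoff)                  # high * high
    m4 = karatsuba(xl, yl, cutoff)                  # low * low
    m23 = karatsuba(xh + xl, yh + yl, cutoff)       # at most (half+1)-bit operands
    return (m1 << 2 * half) + ((m23 - m1 - m4) << half) + m4
```

Only three recursive multiplications occur per level, matching the T(N) = 3T(N/2) + Theta(N) recurrence.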

## Integer Multiplication

• Can we do even better?
• If we multiply the integers indirectly using the Fast Fourier Transform (FFT), we can achieve an even better run-time of Theta(N lg N)
• Don't worry about the details of this algorithm
• But if you are interested, read the text Ch. 41
• For RSA, we will be ok using a GradeSchool / Karatsuba Hybrid
• GradeSchool used for smaller sizes, switching to Karatsuba for larger sizes

## Exponentiation

• How about integer powers: X^Y
• Natural approach: simple for loop
```cpp
ZZ ans = to_ZZ(1);                        // NTL big integer, initialized via to_ZZ
for (ZZ ctr = to_ZZ(1); ctr <= Y; ctr++)
    ans = ans * X;                        // ans holds X^Y on exit
```
• This seems ok – one for loop and a single multiplication inside – is it linear?
• Let's look more closely
• Total run-time is:
• 1) the number of iterations of the loop, times
• 2) the time per multiplication

## Exponentiation

• We already know 2) since we just did it
• It seems linear, since it is a simple loop
• In fact, it is LINEAR IN THE VALUE of Y
• However, our calculations are based on N, the NUMBER OF BITS in Y
• What's the difference?
• We know an N-bit integer can have a value of up to ≈ 2^N
• So linear in the value of Y is exponential in the bits of Y
• Thus, the iterations of the for loop are actually Theta(2^N) and thus our total runtime is Theta(N^2 * 2^N)
• Consider N = 512 – we get (512)^2(2^512)
• Just how big is this number?

## Exponentiation

• Let's calculate in base 10, since we have a better intuition about size
• Since every 10 powers of 2 is approximately 3 powers of ten, we can multiply the exponent by 3/10 to get the base 10 number
• So (512)^2(2^512) = (2^9)^2(2^512) = 2^530 ≈ 10^159
• Let's assume we have a 1GHz machine (10^9 cyc/sec)
• This would mean we need ≈ 10^150 seconds
• (10^150 sec)(1hr/3600sec)(1day/24hr)(1yr/365days) =
• (10^150/31536000) years ≈ 10^150/10^8 ≈ 10^142 years
• This is ridiculous!!
• But we need exponentiation for RSA, so how can we do it more efficiently?
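The size estimate above is easy to confirm with Python's arbitrary-precision integers; a quick sanity check:

```python
# Sanity-check the estimate that 512^2 * 2^512 is on the order of 10^159.
ops = 512**2 * 2**512          # equals 2^18 * 2^512 = 2^530
print(len(str(ops)))           # 160 decimal digits, i.e. ops ≈ 10^159
```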

## Exponentiation

• How about a divide and conquer algorithm
• Divide and conquer is always worth a try
• Consider
• X^Y = (X^(Y/2))^2 when Y is even
• X^Y = X * (X^(Y/2))^2 when Y is odd (using integer division for Y/2)
• Naturally we need a base case
• X^Y = 1 when Y = 0
• We can easily code this into a recursive function
• What is the run-time?
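The recursive rules above can be coded directly. A minimal sketch (Python is used for its built-in big integers; the function name is illustrative):

```python
def power(x, y):
    """Divide-and-conquer exponentiation: x^y in Theta(lg y) multiplications."""
    if y == 0:
        return 1                      # base case: x^0 = 1
    half = power(x, y // 2)           # one recursive call on y/2 (integer division)
    if y % 2 == 0:
        return half * half            # x^y = (x^(y/2))^2 when y is even
    return x * half * half            # x^y = x * (x^(y/2))^2 when y is odd
```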

## Exponentiation

• Let's see…our problem is to calculate the exponential X^Y for given X and Y
• Let's first calculate the answer in terms of the value of Y, then we will convert to bits later
• So we have a recursive call with an argument of ½ the original size, plus a multiplication (again assume we will use GradeSchool)
• We'll also put the multiplication time back in later
• For now let's determine the number of function calls
• How many times can we divide Y by 2 until we get to a base case?
• lg(Y)

## Exponentiation

• So we have lg(Y) recursive calls
• But remember that Y can have a value of up to 2^N
• Converting back to N, we have
• lg(Y) = lg(2^N) = N
• Since we have one or two multiplications per call, we end up with a total runtime of
• Theta(N^2 * N) = Theta(N^3)
• This is an AMAZING improvement
• Consider again N = 512
• N^3 = 134217728 – less than a billion
• On a 1GHz machine this would take less than a second (assuming one operation per cycle – in reality it may take a few seconds)

## Exponentiation

• Can we improve even more?
• Well removing the recursion can always help
• If we start at X, then square repeatedly, we get the same effect as the recursive calls
• Square at each step, and also multiply by X if there is a 1 in the binary representation of Y (from left to right)
• Ex: X^45 = X^101101 (45 = 101101 in binary):

| bit of Y | 1 | 0 | 1 | 1 | 0 | 1 |
|---|---|---|---|---|---|---|
| running value | 1, X | X^2 | X^4, X^5 | X^10, X^11 | X^22 | X^44, X^45 |
• Same idea as the recursive algorithm but building from the "bottom up"
• See Text p. 526
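The bottom-up scheme above can be sketched as follows (illustrative Python, not the text's code):

```python
def power_iter(x, y):
    """Left-to-right square-and-multiply: square at every bit of y,
    and multiply by x whenever that bit is 1."""
    result = 1
    for bit in bin(y)[2:]:            # y's bits, most significant first
        result = result * result      # square at each step
        if bit == '1':
            result = result * x       # multiply by x on a 1 bit
    return result

# Tracing y = 45 = 101101 in binary reproduces the slide's sequence:
# 1 -> X -> X^2 -> X^4 -> X^5 -> X^10 -> X^11 -> X^22 -> X^44 -> X^45
```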