Tuesday, February 16, 2010

Problem 16: Sum of Digits

Another really simple one.
2^(15) = 32768 and the sum of its digits is 3 + 2 + 7 + 6 + 8 = 26.

What is the sum of the digits of the number 2^(1000)?

Actually this is the simplest problem so far. A simple conversion to a String and an iteration is all that is needed.
numberString = str(2**1000)
sum = 0
for number in numberString:
    sum += int(number)
print sum

Problem 10: Sum of Primes

Problem 10 is fairly interesting.
The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
Find the sum of all the primes below two million.

The solution is a lot easier if you are familiar with the Sieve of Eratosthenes. Wikipedia has a fairly good article on the Sieve, in which they sum up the steps as follows;
  1. Create a list of consecutive integers from two to n: (2, 3, 4, ..., n),
  2. Initially, let p equal 2, the first prime number,
  3. While enumerating all multiples of p starting from p2, strike them off from the original list,
  4. Find the first number remaining on the list after p (it's the next prime); let p equal this number,
  5. Repeat steps 3 and 4 until p2 is greater than n.
  6. All the remaining numbers in the list are prime.

Seems easy enough, here is an fairly direct translation in Python;
start = 2
numbers = range(start, n + 1)

for i in numbers:
    multiple = i
    while i * multiple <= n:
        try:
            numbers.remove(i * multiple)
        except ValueError:
            pass
        multiple += 1
        
return numbers

Python's ability to remove items from a List seems to be lacking here; at the very least the code is a little more terse then usual. The remove operation raises an error is the element is not present, so telling it to pass will simply ignore the resulting error. I am sure there is a terser way to represent the Sieve in Python, but the code above emphasizes readable and is comparable to pseudo code, except for the error handling.

The problem with the previous implementation is that it is slow. with a N of 10000 taking about a second, a N of 100000 taking ~101 seconds, and a N of 1000000 uncomfortably heating up my laptop, though it should clock in at about an hour.

Excuse my sloppy Big O notation, but according to Wikipedia, the complexity of the Sieve is O(n(logn)(loglogn)). 

The red flag here is the remove call. While I'm not sure how Python implements List.remove, according to the documentation it removes the first instance of the element it is searching for. So the method needs to iterate the list until it finds the element, and then readjust the remainder of the List, which is a costly operation, especially inside a for and while loop. I believe this would add another n to the complexity, O(n^2(logn)(loglogn)). (Edit: please disregard my sloppy Big O notation here, without knowing the implementation details behind the Python List remove operation I can only hazard a guess. We know that the remove operation stops at the first instance of the char, but then the List must be have an element removed, which can vary in cost depending on the data structure used. Analysis of this operation requires Python guru skills that I currently lack.)  I am really out of practice with Big O notation, so please correct me if I am wrong here.

It took me awhile to figure out how to get the run time of the sieve down, I eventually went online to try to find a hint of how to improve performance. Someone on the internet recommended replacing the composite numbers with zeros. It is painfully obvious once you see it; get rid of the remove operation altogether. Instead of removing the non-prime (composite) elements, replace them with a placeholder value.

Here is the improved implementation;
start = 2
numbers = range(start, n + 1)
placeholder = 0

#set composite numbers as a placeholder value
for i in range(len(numbers)):
    cur = numbers[i]
    if cur != placeholder:
        multiple = cur
        while cur*multiple <= n:
            listPos = (cur*multiple) - start
            numbers[listPos] = placeholder
            multiple += 1

#make a list of primes
primes = []
for i in numbers:
    if i != placeholder:
        primes.append(i)

return primes

This new implementation can handle the two million that we were looking for in problem in about 2 seconds. All that is left is to sum the primes. Here is the complete code;
def sieve(n):
    start = 2
    numbers = range(start, n + 1)
    placeholder = 0
    
    #set composite numbers as a placeholder value
    for i in range(len(numbers)):
        cur = numbers[i]
        if cur != placeholder:
            multiple = cur
            while cur*multiple <= n:
                listPos = (cur*multiple) - start
                numbers[listPos] = placeholder
                multiple += 1
    
    #make a list of primes
    primes = []
    for i in numbers:
        if i != placeholder:
            primes.append(i)
    
    return primes

if __name__ == '__main__':
    primes = sieve(2000000)
    sum = 0
    for i in primes:
        sum += i
    print sum

Wednesday, February 10, 2010

Problem 8: Consecutive Digits

Another fairly simple problem.
Find the greatest product of five consecutive digits in the 1000-digit number.
73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450

All you have to do is parse through the 1000 char string, taking chunks of 5 characters at a time and then seeing which subset has the largest product.
#raw input is large string from the problem
# surrounded by triple quotes
#remove whitespace from raw input
input = ''.join(rawInput.split())

position = 0
largestProduct = 0
while position < len(input) - 5:
    product = 1
    for i in range(position, position + 5):
        product *= int(input[i])
    if product > largestProduct:
        largestProduct = product
    position += 1
print 'Largest Product: ', largestProduct