Summation Of Four Primes/1002

1. Approach
2. full source
3. 문제점 - 4초 이내에 답이 나오지 않는다.
4. 최적화 2차 : 문제 영역을 줄이기
5. 최적화 3차 : PyRex
6. 한 일 대비 느낀점
7. 더 했었으면 하는 점. 보완

1. Approach ¶

맨 처음에 문제를 읽고 대략 연습장에 문제에의 각 변수들이 될만한 부분들을 보았다. 일단 소수들의 합이라 하고, 4자리의 합이라고 한다. 대략 pseudo code 를 다음와 같이 작성해보았다.

~cpp 
for eachGenNumberFour in GenNumbersFour
  isAllPrimeNumber:
      print eachGenNumberFour
      return
print "impossible"

GenNumbersFour = 합이 x 인 수 조합리스트

합이 x 인 수 조합리스트에 대해 어떻게 구할까 궁리하던중, 소수리스트를 먼저 만들고 소수리스트에서 4개를 골라서 합을 구한 결과가 n 인지를 비교하는 방법이 더 빨리 구현할 수 있겠구나라는 생각이 들었다. 이에 대해서 TDD로 진행.

~cpp 
class PrimeNumberList:
    def __init__(self,aNum):
        self.size = aNum
        self.resultTable = []
        self.createList()

    def createList(self):
        for i in range(2,self.size+1):
            if self.isPrimeNumber(i):
                self.resultTable.append(i)

    def isPrimeNumber(self,aNum):
        for eachValue in self.resultTable:
            if aNum % eachValue == 0:
                return False
        return True

    def getList(self):
        return self.resultTable

class PrimeNumberTest(unittest.TestCase):
    def testPrimeNumber(self):
        self.assertEquals([2,], PrimeNumberList(2).getList())
        self.assertEquals([2,3], PrimeNumberList(3).getList())
        self.assertEquals([2,3,5], PrimeNumberList(5).getList())
        self.assertEquals([2,3,5,7,11], PrimeNumberList(12).getList())
        self.assertEquals([2,3,5,7,11,13,17,19], PrimeNumberList(20).getList())

그리고 소수리스트로부터 4개를 구하는 방법에 대해 생각하다. 맨 처음에 대해서는 중복을 허용하면 안되는 줄 알고 구현하였다. 그러다가 문제에서 중복을 허용한다는 사실을 알고 다시 구현.

~cpp 
def selectionFour(aList):
    sizeOfList = len(aList)
    for i in range(sizeOfList):
        for j in range(sizeOfList):
            for k in range(sizeOfList):
                for l in range(sizeOfList):
                    yield [aList[i],aList[j],aList[k],aList[l]]

2. full source ¶

~cpp 
from __future__ import generators
import psyco

class PrimeNumberList:
    def __init__(self,aNum):
        self.size = aNum
        self.resultTable = []
        self.createList()

    def createList(self):
        for i in range(2,self.size+1):
            if self.isPrimeNumber(i):
                self.resultTable.append(i)

    def isPrimeNumber(self,aNum):
        for eachValue in self.resultTable:
            if aNum % eachValue == 0:
                return False
        return True

    def getList(self):
        return self.resultTable

def selectionFour(aList):
    sizeOfList = len(aList)
    for i in range(sizeOfList):
        for j in range(sizeOfList):
            for k in range(sizeOfList):
                for l in range(sizeOfList):
                    yield [aList[i],aList[j],aList[k],aList[l]]

def main():
    n = int(raw_input('input number : '))
    primeNumberList = PrimeNumberList(n)
    for eachPrimeNumberSeq in selectionFour(primeNumberList.getList()):
        if sum(eachPrimeNumberSeq) == n:
            print eachPrimeNumberSeq
            return
    print "Impossible"

if __name__=="__main__":
    #unittest.main(argv=('','-v'))
    psyco.full()
    main()

3. 문제점 - 4초 이내에 답이 나오지 않는다. ¶

스펙상 10000000 값 내에서 4초 이내에 답이 나와야 한다. 이에 대해서 현재의 병목지점에 대해 profiling 을 해 보았다.

~cpp 
C:\WINDOWS\system32\cmd.exe /c python SummationOfFourPrimes.py
input number : 10000
[2, 2, 23, 9973]
         21065 function calls in 6.387 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.075    0.075    1.201    1.201 summationoffourprimes.py:13(createList)
        0    0.000             0.000          profile:0(profiler)
        1    5.103    5.103    6.387    6.387 summationoffourprimes.py:88(main)
        1    0.000    0.000    1.201    1.201 summationoffourprimes.py:8(__init__)
     9999    1.126    0.000    1.126    0.000 summationoffourprimes.py:18(isPrimeNumber)
        1    0.000    0.000    0.000    0.000 summationoffourprimes.py:24(getList)
    11061    0.083    0.000    0.083    0.000 summationoffourprimes.py:79(selectionFour)
        1    0.000    0.000    6.387    6.387 <string>:1(?)

~cpp 
C:\WINDOWS\system32\cmd.exe /c python SummationOfFourPrimes.py
input number : 50000
[2, 2, 3, 49993]
         60269 function calls in 24.926 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.428    0.428   19.348   19.348 summationoffourprimes.py:13(createList)
        0    0.000             0.000          profile:0(profiler)
        1    5.492    5.492   24.923   24.923 summationoffourprimes.py:88(main)
        1    0.000    0.000   19.348   19.348 summationoffourprimes.py:8(__init__)
    49999   18.920    0.000   18.920    0.000 summationoffourprimes.py:18(isPrimeNumber)
        1    0.000    0.000    0.000    0.000 summationoffourprimes.py:24(getList)
    10265    0.083    0.000    0.083    0.000 summationoffourprimes.py:79(selectionFour)
        1    0.003    0.003   24.926   24.926 <string>:1(?)


Hit any key to close this window...

대부분의 시간은 소수테이블을 작성하는 부분이 된다. 그래서 이 부분에 대해서

PrimeNumber를 참고, 최적화된 알고리즘으로 수정하였다. 그리고 역시 psyco 를 이용하였다. 그 결과, 10000000 기준 10초. 기존 알고리즘의 경우 50000번 기준 24초 이상.

~cpp 
C:\WINDOWS\system32\cmd.exe /c python SummationOfFourPrimes.py
input number : 10000
[2, 2, 23, 9973]
         11067 function calls in 5.417 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.000    0.000 summationoffourprimes.py:48(getList)
        1    0.000    0.000    5.417    5.417 <string>:1(?)
        1    0.000    0.000    0.013    0.013 summationoffourprimes.py:36(createList)
        1    0.000    0.000    0.013    0.013 summationoffourprimes.py:31(__init__)
    11061    0.094    0.000    0.094    0.000 summationoffourprimes.py:103(selectionFour)
        1    5.309    5.309    5.416    5.416 summationoffourprimes.py:112(main)
        1    0.013    0.013    0.013    0.013 summationoffourprimes.py:8(prime2)


Hit any key to close this window...

~cpp 
C:\WINDOWS\system32\cmd.exe /c python SummationOfFourPrimes.py
input number : 50000
[2, 2, 3, 49993]
         10271 function calls in 7.878 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.000    0.000 summationoffourprimes.py:48(getList)
        1    0.002    0.002    7.878    7.878 <string>:1(?)
        1    0.002    0.002    0.090    0.090 summationoffourprimes.py:36(createList)
        1    0.000    0.000    0.090    0.090 summationoffourprimes.py:31(__init__)
    10265    0.098    0.000    0.098    0.000 summationoffourprimes.py:103(selectionFour)
        1    7.688    7.688    7.876    7.876 summationoffourprimes.py:112(main)
        1    0.088    0.088    0.088    0.088 summationoffourprimes.py:8(prime2)


Hit any key to close this window...

최적화 이후 50000번 기준으로는 병목에 대한 변별력이 없다.

~cpp 
C:\WINDOWS\system32\cmd.exe /c python SummationOfFourPrimes.py
input number : 1000000
[2, 2, 13, 999983]
         470994 function calls in 26.768 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.000    0.000 summationoffourprimes.py:48(getList)
        1    0.040    0.040   26.768   26.768 <string>:1(?)
        1    0.062    0.062    2.708    2.708 summationoffourprimes.py:36(createList)
        1    0.000    0.000    2.708    2.708 summationoffourprimes.py:31(__init__)
   470988    5.244    0.000    5.244    0.000 summationoffourprimes.py:103(selectionFour)
        1   18.777   18.777   26.728   26.728 summationoffourprimes.py:112(main)
        1    2.646    2.646    2.646    2.646 summationoffourprimes.py:8(prime2)


Hit any key to close this window...

여기서 selectionFour 의 경우는 percall 에 걸리는 시간은 적다. 하지만, call 횟수 자체가 470988 번으로 많다. 이 부분에 대해서 일종의 inlining 을 하였다.

~cpp 
C:\WINDOWS\system32\cmd.exe /c python SummationOfFourPrimes.py
input number : 1000000
(2, 2, 13, 999983)
         6 function calls in 16.671 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.000    0.000 summationoffourprimes.py:48(getList)
        1   13.736   13.736   16.540   16.540 summationoffourprimes.py:121(main2)
        1    0.067    0.067    2.804    2.804 summationoffourprimes.py:36(createList)
        1    2.738    2.738    2.738    2.738 summationoffourprimes.py:8(prime2)
        1    0.131    0.131   16.671   16.671 <string>:1(?)
        1    0.000    0.000    2.804    2.804 summationoffourprimes.py:31(__init__)


Hit any key to close this window...

10000000 건에 대해서는 7.49 초 기록 (profiler 를 이용할 경우 속도저하가 있기 때문에 profiler 를 끄고 psyco 만으로 실행)

4. 최적화 2차 : 문제 영역을 줄이기 ¶

비록 inlining 을 시켰지만 여전한 것은 selectionFour 부분이다. selectionFour 가 실행되는 경우의 최악의 경우는 (n 까지의 소수갯수)^4 가 된다. 이를 좀 더 smart 하게 할 방법이 없을까.
소수와 관련하여 좀 더 똑똑하게 검색할 방법이 존재하리라 생각한다.
가장 간단한 아이디어로, 4번째 값에 대해서는 바로 계산을 할 수가 있겠다.

selectionFour 를 다음과 같이 수정하였다.

~cpp 
def selectionFour(aList,aNum):
    sizeOfList = len(aList)
    for i in range(sizeOfList):
        for j in range(sizeOfList):
            for k in range(sizeOfList):
                for l in range(sizeOfList):
                    expectedLastValue = aNum-(aList[i]+aList[j]+aList[k])
                    if expectedLastValue in aList:
                        yield [aList[i],aList[j],aList[k],expectedLastValue]
                    else:
                        break

inlining 을 하지 않았음에도 6.4초대를 기록하였다. inlining 을 하면 5.7초대를 기록한다.

5. 최적화 3차 : PyRex ¶

PyRex 는 Python 코드를 C 코드로 전환해준다. 이를 이용, C 모듈로 만들어 컴파일하고 결과를 실행. 의외로 7.x 대로, PsyCo 를 썼을때보다 오히려 성능이 떨어졌다. C 코드를 보니 웬걸. 전혀 알아볼수가 없는 코드다. 차라리 깨끗하게 직접 작성해주는게 성능향상 상으로는 유리하겠다는 생각.

6. 한 일 대비 느낀점 ¶

이전에도 느낀 점이지만, 한가지 문제를 아주 깊게 풀어보려고 하는 것도 여러가지로 학습이 된다.
과정에 대한 기록. 무언가 잘 관리가 안되는 상황이 오래될때, 한 일들과 사실들을 기록해보는 것은 상당히 도움이 된다.

7. 더 했었으면 하는 점. 보완 ¶

PrimeNumber 의 최적화에 대해서. 기존에 있는 알고리즘이 아닌, 직접 최적화를 시도해보는 것으로 더 많은 것을 학습할 수 있으리라. 이번의 경우는 2시간 작업으로 계획을 잡았던 관계로.
이러한 문제의 경우 특정 알고리즘의 아주 최적화 된 결과물이 답이기 보다는, 무언가 다른 차원에서 봤을때 너무나 빨리 답이 나오게 되는 경우일것이라 추측. 전혀 다른 방법의 어프로치도 생각해보고 싶다.

SummationOfFourPrimes