## Friday, February 5, 2010

### SICP Exercise 1.22: Timed Prime Test

From SICP section 1.2.6, "Example: Testing for Primality."

Exercise 1.22 introduces a new primitive called `runtime`:

> Most Lisp implementations include a primitive called `runtime` that returns an integer that specifies the amount of time the system has been running (measured, for example, in microseconds).

Unfortunately, no such primitive is available in the Scheme implementation I'm using. After a bit of research, I found what I think is a suitable replacement: `current-inexact-milliseconds`. It's not as precise as I'd like, but there are a couple of workarounds to this problem that I'll explain shortly.

The exercise goes on to introduce the `timed-prime-test` procedure, which prints its input n and tests to see if n is prime. If n is prime, the procedure prints three asterisks followed by the amount of time used in performing the test. Here is the modified version that I'll be using, including all of the required support procedures from the book:
```scheme
(define (timed-prime-test n)
  (newline)
  (display n)
  (start-prime-test n (current-inexact-milliseconds)))

(define (start-prime-test n start-time)
  (cond ((prime? n)
         (report-prime (- (current-inexact-milliseconds) start-time)))))

(define (report-prime elapsed-time)
  (display " *** ")
  (display elapsed-time))

(define (smallest-divisor n)
  (find-divisor n 2))

(define (find-divisor n test-divisor)
  (cond ((> (square test-divisor) n) n)
        ((divides? test-divisor n) test-divisor)
        (else (find-divisor n (+ test-divisor 1)))))

(define (divides? a b)
  (= (remainder b a) 0))

(define (square x)
  (* x x))

(define (prime? n)
  (= n (smallest-divisor n)))
```

Our task (finally) is to use `timed-prime-test` to write a procedure named `search-for-primes` that checks the primality of consecutive odd integers in a specified range. We're instructed to use this procedure to find the three smallest primes larger than 1000, 10,000, 100,000, and 1,000,000. Since the `prime?` procedure has an order of growth of O(√n), we should expect the run time to increase by a factor of √10 at each step. We'll use the output from `timed-prime-test` to check this assumption.

The following procedure checks the primality of consecutive odd integers in the range from `start` to `end`.

```scheme
(define (search-for-primes start end)
  (if (even? start)
      (search-for-primes (+ start 1) end)
      (cond ((< start end)
             (timed-prime-test start)
             (search-for-primes (+ start 2) end)))))

(define (even? n)
  (= (remainder n 2) 0))
```

The procedure first checks whether the `start` argument is even. If it is, the procedure simply starts over with the next (odd) integer. Next it checks whether `start` is less than `end`. If so, it performs a `timed-prime-test` on `start`, then recursively calls itself with the next odd integer as the starting value. This process continues until `start` meets or exceeds `end`.

With a little bit of trial and error we can find start and end values that surround the first three primes larger than our target values. Here's a sample run starting at 1000.
```scheme
> (search-for-primes 1000 1020)
1001
1003
1005
1007
1009 *** 1.0
1011
1013 *** 1.0
1015
1017
1019 *** 0.0
```

We can tell from this output that our timer's resolution isn't fine enough to give us meaningful results on modern desktop hardware (even cheap hardware). I see two things we can do to work around this problem. First, we could ignore the suggested inputs from the exercise and just keep increasing the input values until we get meaningful measurements. Second, we could run `prime?` in a loop and measure how long it takes to test a value 1000 (or maybe more) times. The code is ready as-is for the first option, so that's what I'm going to try. (If anyone implements the second option, please leave a comment so we can compare results.)
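For anyone curious, a rough sketch of that second option might look like the following (assuming Racket; `avg-prime-test-time` is a made-up name, and `prime?` is the same trial-division test defined above, repeated here so the snippet stands alone):

```scheme
;; Same trial-division primality test as in the post.
(define (prime? n)
  (define (divides? a b) (= (remainder b a) 0))
  (define (find-divisor d)
    (cond ((> (* d d) n) n)
          ((divides? d n) d)
          (else (find-divisor (+ d 1)))))
  (= n (find-divisor 2)))

;; Run prime? on n `reps` times and return the average elapsed time
;; per test in milliseconds, amortizing the timer's coarse resolution.
(define (avg-prime-test-time n reps)
  (define start (current-inexact-milliseconds))
  (let loop ((i reps))
    (when (> i 0)
      (prime? n)
      (loop (- i 1))))
  (/ (- (current-inexact-milliseconds) start) reps))
```

A call like `(avg-prime-test-time 1009 1000)` should give a much steadier per-test figure than a single timed run.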

The first consistent results I got were with a start value of 10^10. It took an average of 155.67 milliseconds to test each of the first three primes after that value. The following table shows details of the time required to test the primality of numbers spread over several orders of magnitude.

| start | primes           | time (ms) | avg. time (ms) | ratio |
|-------|------------------|-----------|----------------|-------|
| 10^10 | 10000000019      | 141       | 155.67         | -     |
|       | 10000000033      | 172       |                |       |
|       | 10000000061      | 154       |                |       |
| 10^11 | 100000000003     | 516       | 512            | 3.289 |
|       | 100000000019     | 493       |                |       |
|       | 100000000057     | 527       |                |       |
| 10^12 | 1000000000039    | 1627      | 1578.33        | 3.082 |
|       | 1000000000061    | 1559      |                |       |
|       | 1000000000063    | 1549      |                |       |
| 10^13 | 10000000000037   | 5014      | 4933.67        | 3.126 |
|       | 10000000000051   | 4932      |                |       |
|       | 10000000000099   | 4855      |                |       |
| 10^14 | 100000000000031  | 15745     | 15876          | 3.218 |
|       | 100000000000067  | 16022     |                |       |
|       | 100000000000097  | 15861     |                |       |
| 10^15 | 1000000000000037 | 48950     | 48931.33       | 3.082 |
|       | 1000000000000091 | 48836     |                |       |
|       | 1000000000000159 | 49008     |                |       |

The last column shows the ratio of the average time for each row to the average time for the preceding row. These ratios stay very close to √10, which is approximately 3.162. These results do seem to verify that programs on my machine run in time proportional to the number of steps required for the computation.
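As a quick sanity check on that last column, we can recompute the ratios from the average times in the table (the list below is just those averages copied over) and compare them against √10:

```scheme
;; Average times (ms) from the table, one per order of magnitude.
(define averages '(155.67 512.0 1578.33 4933.67 15876.0 48931.33))

;; Ratio of each average time to the preceding one.
(define (ratios lst)
  (if (null? (cdr lst))
      '()
      (cons (/ (cadr lst) (car lst)) (ratios (cdr lst)))))

(ratios averages)   ; each ratio should land near (sqrt 10), about 3.162
```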

Related:

For links to all of the SICP lecture notes and exercises that I've done so far, see The SICP Challenge.

Troy said...

I think you have some transcription errors in your table. You list the first three primes after 10**10 as

10000000019
10000000019
10000000019

Also, all of the "primes" after 10**15 are divisible by 10.

pbewig said...

Use the time function. See http://programmingpraxis.codepad.org/b4BtoRwq.

Bill the Lizard said...

Troy,
You have a sharp eye! The first set of errors were all my bad, since I just copied the first row to fill in the table, then somehow forgot to correct that group. In the second set (in the 10^15 group), the numbers were actually too big for Excel, where I had saved them temporarily, so the values were rounded off. Still, that was my fault for blindly copying and pasting them into my blog editor. I really appreciate the correction.

Bill the Lizard said...

pbewig,
Good suggestion. That is one of the options that I looked at when I realized 'runtime' wasn't going to work. I'd hoped I'd be able to rewrite 'timed-prime-test' using 'time', but I couldn't figure out a clean way to conditionally report the timing stats only if the input was prime. The only way I could get it to work was to test once for primality, then test again wrapped in a 'time' call.
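[For what it's worth, one way to get conditional reporting from a single primality test is Racket's `time-apply`, which returns the procedure's results along with its timings. This is only a sketch, not the approach used in the post, and it repeats the post's `prime?` so it stands alone:]

```scheme
;; Same trial-division primality test as in the post.
(define (prime? n)
  (define (divides? a b) (= (remainder b a) 0))
  (define (find-divisor d)
    (cond ((> (* d d) n) n)
          ((divides? d n) d)
          (else (find-divisor (+ d 1)))))
  (= n (find-divisor 2)))

;; time-apply calls prime? once, handing back the result list plus
;; cpu/real/gc times, so we report timing only when n is prime.
(define (timed-prime-test n)
  (newline)
  (display n)
  (let-values (((results cpu real gc) (time-apply prime? (list n))))
    (when (car results)
      (display " *** ")
      (display real))))
```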

Alex Marandon said...

The 'runtime' function is available in DrRacket with http://www.neilvandyke.org/racket-sicp/

James W Dunne said...

As suggested above and also on a StackOverflow question (see link below), there is a Racket package for SICP.

The install process for this package has got to be the smoothest I've ever experienced when installing packages for a language:

Replace `#lang racket` with `#lang planet neil/sicp`

That's it!

Credit goes to orange80's answer on the following StackOverflow question:

http://stackoverflow.com/questions/3597781/dr-racket-problems-with-sicp

Benoît Fleury said...

I wonder if you start getting consistent results after 1e10 because it is the first order of magnitude that requires a double (1e10 > 2^32 > 1e9).

Naeblis said...

Hmm. I get somewhat accurate results for the time measurements using `current-inexact-milliseconds`:

```scheme
;; Time taken to find the first 3 primes larger than 1000:
;; 1009 *** 0.032958984375
;; 1013 *** 0.03515625
;; 1019 *** 0.032958984375

;; Time taken to find the first 3 primes larger than 10000:
;; 10007 *** 0.10498046875
;; 10009 *** 0.10498046875
;; 10037 *** 0.10400390625

;; Time taken to find the first 3 primes larger than 100000:
;; 100003 *** 0.328857421875
;; 100019 *** 0.328125
;; 100043 *** 0.328125
```

=Y= said...

I just used this function in this exercise (on Racket):

```scheme
(define (runtime)
  (inexact->exact (* (current-inexact-milliseconds) 1000)))
```

That returns time in microseconds. Of course there are rough edges, maybe because of the GC, but timing the same set of inputs multiple times got me what I wanted. Here is my data:

| Range               | Avg. time (μs) | Factor of increase |
|---------------------|----------------|--------------------|
| 100-200             | 6.0            |                    |
| 1,000-1,100         | 6.3            | x 1.3              |
| 10,000-10,100       | 12.0           | x 1.9              |
| 100,000-100,100     | 66.0           | x 5.5              |
| 1,000,000-1,000,100 | 210.0          | x 3.2              |
| 10e7-...            | 350.0          | x 1.66             |
| 10e8-...            | 1250.0         | x 3.57             |
| 10e9-...            | 3800.0         | x 3.04             |
| 10e10-...           | 10800.0        | x 2.88             |

I can see that in the lower ranges the O(sqrt(n)) behaviour is not apparent, but as the ranges increase it gets closer to the expected behaviour. One possible explanation is that for smaller values the dominant cost isn't the number of steps but the overhead of function calls and other runtime overheads; as the values increase, the iteration comes to dominate those constant overheads.

Vishy Ranganath said...
This comment has been removed by the author.
Vishy Ranganath said...
This comment has been removed by the author.
Vishy Ranganath said...

Hi Bill,