I said,
"You can speed it up by a factor of four by converting to base 10**4."

I think that's true as long as you're taking better advantage of what a machine can do with a single word in one cycle, but... for that matter, why not generate all the digits in one pass?

    base = 10 ** 35  # < 32!
    c = 0
    for j in range(32, 1, -1):  # 32 down to 2.
        c = (c + base) / j      # flooring int division.
    print "2." + str(c)
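For anyone running this today, here is a Python 3 rendering of the same one-pass idea, generalized to take the digit and term counts as parameters. The function name and signature are my own invention, not necessarily what the calc_e_one_pass timed below looks like; it's only a sketch of the technique.

    def e_one_pass(digits, terms):
        # Horner-style evaluation of e - 2 = 1/2! + 1/3! + ... + 1/terms!,
        # scaled by 10**digits; pick terms so that terms! > 10**digits
        # (e.g. 32! > 10**35) and let Python's automatic bignums do the work.
        base = 10 ** digits
        c = 0
        for j in range(terms, 1, -1):   # terms down to 2
            c = (c + base) // j         # // is the flooring division Python 2's / did
        # Accumulated flooring error can nudge the final digit; use an extra
        # term or two and drop a guard digit if that matters.
        return "2." + str(c)

    print(e_one_pass(35, 32))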
This reminds me of Minsky's two-register Turing machine... but it's a typical (Horner-style) way to evaluate polynomials. In Python it looks simpler than the nested loop with factorial notation, but that's possible because Python switches to bignums automatically. For 35 digits, the short loop runs faster than the one-digit-at-a-time factorial-notation version:

    35 digits, 32 terms:
        calc_e_fact_nota   0.00173 sec
        calc_e_one_pass    0.00005 sec

That's unfair, though, racing the bignum math written in C against the inner loop in Python. But as the number of digits goes up, even the Python factorial-notation version is faster than the bignum math. For instance:

    100000 digits, 25206 terms:
        calc_e_fact_nota    1.04 sec
        calc_e_one_pass    19.09 sec

--Steve
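Here is a minimal sketch of the factorial-notation (spigot) idea being compared above, assuming an implementation along the usual lines; the real calc_e_fact_nota is surely more refined, and the name and the chunk parameter are mine. The fractional part of e is held as mixed-radix digits a[i]/i!; each pass multiplies it by the output base, and the carry that falls out of position 2 is the next group of decimal digits. chunk=10 emits one digit per pass, and chunk=10**4 is the base-10**4 speedup quoted at the top.

    def e_spigot(digits, chunk=10):
        # chunk must be a power of 10; it sets how many decimal digits
        # each pass over the array produces.
        width = len(str(chunk)) - 1
        # Smallest n with n! comfortably above 10**digits (a couple of guard terms).
        n, fact = 2, 2
        while fact < 10 ** (digits + 2):
            n += 1
            fact *= n
        a = [1] * (n + 1)               # a[2..n] = 1, i.e. sum of 1/i! for i = 2..n
        out = []
        while len(out) * width < digits:
            carry = 0
            for i in range(n, 1, -1):   # carry ripples from the small terms leftward
                t = a[i] * chunk + carry
                a[i] = t % i
                carry = t // i
            out.append(str(carry).zfill(width))   # carry < chunk: next digit group
        return "2." + "".join(out)[:digits]

    print(e_spigot(35))                 # one digit per pass
    print(e_spigot(35, chunk=10**4))    # four digits per pass

The per-pass work is the same either way; the base-10**4 version just makes a quarter as many passes over the array, which is where the factor-of-four claim comes from.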