* Schroeppel, Richard <rschroe@sandia.gov> [Apr 26. 2006 10:01]:
> You can use very rough approximate logarithms.
Yes, the code is extremely robust; you can even (in the prefill phase) just write zeros.
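For concreteness, here is a minimal sketch of what I mean (in C; BLOCK, SLACK, fb_entry and the byte logs are all made up for illustration, this is not my actual code): the prefill writes only zeros, the sieve loop only adds rough logs, and a single per-block target absorbs the rounding noise:

  #include <math.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  #define BLOCK 4096
  #define SLACK 4   /* tolerance for the rough (here: plain byte) logs */

  /* one entry per root: primes p == 1 (mod 4) appear twice,
     once for each root of X^2 == -1 (mod p) */
  typedef struct { uint32_t p; uint32_t root; uint8_t lg; } fb_entry;

  void sieve_block(uint64_t x0,  /* x0 >= 1 */
                   const fb_entry *fb, int nfb, uint8_t *acc)
  {
      memset(acc, 0, BLOCK);                 /* prefill phase: just zeros */
      for (int k = 0; k < nfb; ++k) {
          uint64_t p = fb[k].p;
          /* first X >= x0 with X == root (mod p), i.e. p divides X^2+1 */
          uint64_t x = (fb[k].root + p - x0 % p) % p;
          for (; x < BLOCK; x += p)
              acc[x] += fb[k].lg;            /* add rough log2(p) */
      }
      /* log2(X^2+1) ~ 2*log2(X) is nearly constant within a block, so
         compute the target once per block, not per X; SLACK absorbs
         both the rounding noise and the slight overestimate */
      int target = (int)(2.0 * log2((double)(x0 + BLOCK)));
      for (int i = 0; i < BLOCK; ++i)
          if (acc[i] + SLACK >= target)      /* exact work only on findings */
              printf("candidate X = %llu\n", (unsigned long long)(x0 + i));
  }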
> In my factor sieving code,
Guess why I am so eager to have fast sieving code! Which factoring algorithm do you use it in? I'll look into factoring in the near future.
> I used base cbrt2, and my logs all fit in a 9-bit field. Computing these logs is pretty fast: find the highest bit of N, left normalize, compare to the normalized 2^(1/6) and 2^(3/6).
However fast your rough-log is, my trick gives you more: I search over all X < 10^14. A logarithm costs, say, 250 cycles, and a logarithm is computed _only_ for each finding. I find fewer than 50,000 X, so, per test, the cost is (in cycles) less than 250*50000/10^14 == 1/8000000. Unless your rough-log computes more than 8 million rough-logs _per_cycle_, your code will be slower. But then, what is the set of primes you allow? Is it rather huge? And what is the density of findings? (mine: 50000/10^14)
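Just so I am sure I read your recipe above correctly, here is my sketch of it (assuming 64-bit words and GCC's __builtin_clzll; in real code the two constants would be precomputed). Base cbrt(2) means rough_log(N) == round(3*log2(N)): the highest bit gives the multiple of three, the two compares give the thirds:

  #include <math.h>
  #include <stdint.h>

  static unsigned rough_log_cbrt2(uint64_t n)   /* n >= 1 */
  {
      unsigned e = 63 - __builtin_clzll(n);     /* floor(log2(n)) */
      uint64_t m = n << (63 - e);               /* left normalize: top bit at 63 */
      /* normalized thresholds 2^(1/6)*2^63 and 2^(3/6)*2^63 */
      uint64_t t16 = (uint64_t)ldexp(pow(2.0, 1.0 / 6.0), 63);
      uint64_t t36 = (uint64_t)ldexp(sqrt(2.0), 63);
      /* with only two compares, mantissas above 2^(5/6) round down by
         one unit; the final compare tolerates that */
      return 3 * e + (m >= t16) + (m >= t36);   /* here at most 3*63+2 = 191 */
  }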
> The final quotient in your factorization of X^2+1 is either close to 1, or very far away; so you can tolerate a lot of rounding noise from subtracting approximate logs. On today's processors, there's likely no benefit to using bytes, but you still only need a couple of bits of mantissa precision for your logs. You can compute log(prime) with a table lookup on the high order few bits.
Note that the float instructions go completely in parallel with the integer instructions. Further, any conversion from float to int (or back) is still quite expensive; such conversions should _never_ occur in inner loops. I tried different data types for the quantities I use; it turns out unsigned long (machine word!) and double (machine word!) do best.
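And your table-lookup idea for log(prime) might look like so (again just my sketch; LGBITS and all names are made up):

  #include <math.h>
  #include <stdint.h>

  #define LGBITS 5                      /* how many high-order bits to use */
  static uint8_t lgtab[1 << LGBITS];    /* round(3*log2(1.m)) per bit pattern */

  void init_lgtab(void)
  {
      for (int i = 0; i < (1 << LGBITS); ++i)  /* midpoint of each bucket */
          lgtab[i] = (uint8_t)lround(3.0 * log2(1.0 + (i + 0.5) / (1 << LGBITS)));
  }

  static unsigned rough_log_tab(uint64_t p)    /* p >= 1 */
  {
      unsigned e = 63 - __builtin_clzll(p);
      /* the LGBITS bits right below the leading one */
      unsigned idx = (unsigned)((p << (63 - e)) >> (63 - LGBITS))
                     & ((1u << LGBITS) - 1);
      return 3 * e + lgtab[idx];
  }

Per prime that is one shift, one lookup, and one add, and it stays entirely in the integer unit.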
> But as you say, "fast enough".
> Rich
Thanks again for your very helpful advice!