[math-fun] A fast (written in assembly) library for continued fractions
I wrote for my own purposes the following X86 assembly code, intended to be called from GFORTRAN. Such a thing should only work if you have a similar configuration to mine, but anyway, it may interest you. I wanted a fast library for computing the length in the continued fraction expansion of sqrt(n) for any integer n that can fit in a REAL*8 gfortran type for statistical purposes. I also included a faster function that tells if this length is odd or even. Though I am not an assembly expert, there should be a significant speed improvement if you use this code rather than your own fortran functions, since the whole process is performed on the FPU stack with no load/store useless travels. It may help you, it may not. You know about these kinds of software... Instructions are at the beginning of the file, as comments. Read them carefully. Code is below. Regards. ------------------ FILE quad.s BEGINS BELOW -------------------- /* Compile your piece of code with something like gfortran -fno-underscoring mycode.f90 quad.s and add freely your own flags. Enjoy ! Of course, if you don't want to use the -fno-underscoring option for some reason, you can add an underscore at the end of the global functions below, like that: .globl initfpu_ initfpu_: it should work. Remember this piece of code has the following status: Seems to work for me, maybe it will also seem for you, maybe your computer will explode, maybe it will make the earth explode, maybe God will choose not to have created the world, think about it before using it. (Anyway, if you see some kind of bug, please let me know ;-) Remember that a) this is x86 assembly code; b) it uses the GNU AS (gas) syntax; c) it uses the gfortran calling conventions; (now you understand why you can't expect it will works ;-) Usage. In your program (or subroutine or whatever), declare the function you need like: logical :: oddperiod integer*8 :: period (the two subroutines don't need to be declared). Then use integer*4 :: oldfpustatus call initfpu(oldfpustatus) before any computation; if you don't do it, it won't work. When you don't need any more the library, call restorefpu(oldfpustatus) will restore the control register of the FPU. During all that time, you shouldn't be disturbed too much, for your own mathematical operations, but remember the rounding mode of the FPU has been set to "round down" (rather than the usual "nearest"). How does it work? The whole computation is performed on the stack of the FPU; only integer values are needed for these tasks, but they are stored as REAL*10, 80 bits reals. They are not converted to anything during the whole process, thus it should work for great integers stored in your own REAL*8, 64 bits reals. Of course, it may happen (I think it will never, but didn't check) that an intermediate result can not fit in such a REAL*10 register; in that case, read above what is written about the end of the world... */ /* call initfpu(integer*4) ===================== This subroutine should be called before any other call to the library; its purpose is to set the rounding control in the FPU control register to the "round down" mode. The subroutine puts in the integer variable the initial value of the FPU control register. If you want to use often this subroutine (which you shouldn't: speed being an issue, this subroutine is intended to be used once, then the functions billions of times, not the contrary ;-), maybe removing the "fnclex" can improve the speed, since I don't think it plays a great role here. */ .globl initfpu initfpu: pushl %ebp movl %esp, %ebp movl 8(%ebp), %eax fnstcw (%eax) fwait movw (%eax), %dx andb $0xf3, %dh orb $0x4, %dh movw %dx, 2(%eax) fnclex fldcw 2(%eax) fwait movw $0, 2(%eax) leave ret /* call restorefpu(integer*4) ======================== This subroutine should be used for restoring the initial state of the FPU control register; the integer variable must contain the right value (which should have been previously set by the subroutine initfpu). */ .globl restorefpu restorefpu: pushl %ebp movl %esp, %ebp movl 8(%ebp), %eax fclex fldcw (%eax) leave ret /* sqrtint( real double precision : REAL*8 ) ========================================= This function returns the integer part of the square root of the given real. The return value has itself the real type. */ .global sqrtint sqrtint: pushl %ebp movl %esp, %ebp movl 8(%ebp), %eax fldl (%eax) fsqrt frndint leave ret /* oddperiod( real double precision : REAL*8 ) =========================================== Despite the type of the real argument, the variable has to be some integer value stored as a double precision. The function has the logical type. It returns FALSE if the argument is a perfect square or if the continued fraction expansion of its square root has an even period length; it returns TRUE in the remaining case. oddperiod(dble(25)) --> FALSE oddperiod(dble(13)) --> TRUE oddperiod(dble(12)) --> FALSE If it may help, you can declare the function as an integer*4 type rather than a logical one (output will be 0 or 1). Though it seems to work with integer*8, I am not absolutely sure it *always* will (unless you add the following instruction movl $0, %edx at the beginning of the code (for instance between the current second and third lines of code), but really why would you do it? Maybe something in the calling conventions is such that the register EDX already contains 0, but, again, who cares? If you read the instructions for the period() function, you will learn about some 2^32 limitations, but you don't have to care about that here: the length is not kept in any register during the computation and the code should work as long as the values in the FPU stack are the right one. */ .global oddperiod oddperiod: pushl %ebp movl %esp, %ebp movl 8(%ebp), %eax fldln2 fldl (%eax) fld %st fsqrt frndint fld %st fld %st fmul %st, %st fld %st(3) fsubp %st, %st(1) fcom %st(4) fstsw %ax fwait sahf ja oddperiod0 ffree %st ffree %st(1) ffree %st(2) ffree %st(3) ffree %st(4) oddperiod2: movl $0, %eax leave ret oddperiod0: fld %st(1) fadd %st(3), %st fdiv %st(1), %st frndint fld %st(2) fld %st(1) fmul %st(3), %st fsubp %st, %st(1) fxch %st(3) fsub %st(3), %st fabs fcomp %st(6) fstsw %ax fwait sahf ja oddperiod1 fld %st(3) fadd %st, %st fsubp %st, %st(1) fcomp %st(5) fstsw %ax fwait ffree %st ffree %st(1) ffree %st(2) ffree %st(3) ffree %st(4) sahf ja oddperiod2 movl $1, %eax leave ret oddperiod1: ffree %st fincstp fld %st(1) fmul %st, %st fld %st(4) fsubp %st, %st(1) fdivp %st, %st(1) frndint jmp oddperiod0 /* period( real double precision : REAL*8 ) ======================================== Despite the type of the real argument, the variable has to be some integer value stored as a double precision. The function has integer*8 type, though the output can't be greater than 2^32-1 (this choice has been made for avoiding sign problem when the result is greater (or equal) then 2^31 and smaller then 2^32. The result is the length of period in the continued fraction expansion of the square root of the given integer. If speed is really for you an obsession, remember that it is highly unprobable that you use that library for periods between the two fatidic values. Maybe it will increase a little the speed to use integer*4 which allows to delete twice the line movl $0,%edx (not the first time it appears, but the second and third one). If you try such a thing and see some speed improvement, please let me know. */ .global period period: pushl %ebp movl %esp, %ebp movl 8(%ebp), %eax movl $0, %edx fldln2 fldl (%eax) fld %st fsqrt frndint fld %st fld %st fmul %st, %st fld %st(3) fsubp %st, %st(1) fcom %st(4) fstsw %ax fwait sahf ja period0 ffree %st ffree %st(1) ffree %st(2) ffree %st(3) ffree %st(4) movl %edx, %eax movl $0, %edx leave ret period0: inc %edx fld %st(1) fadd %st(3), %st fdiv %st(1), %st frndint fld %st(2) fld %st(1) fmul %st(3), %st fsubp %st, %st(1) fxch %st(3) fsub %st(3), %st fabs fcomp %st(6) fstsw %ax fwait sahf ja period1 fld %st(3) fadd %st, %st fsubp %st, %st(1) fcomp %st(5) fstsw %ax fwait ffree %st ffree %st(1) ffree %st(2) ffree %st(3) ffree %st(4) sahf jb period2 add %edx, %edx period2: movl %edx, %eax movl $0,%edx leave ret period1: ffree %st fincstp fld %st(1) fmul %st, %st fld %st(4) fsubp %st, %st(1) fdivp %st, %st(1) frndint jmp period0
participants (1)
-
Thomas Baruchel