New subject: [math-fun] arctan: update sieving and a fun relation

25 Apr 2006

      ...
...
For those still reading, I am using
% gcc --version
gcc (GCC) 3.3.5 20050117 (prerelease) (SUSE Linux)
which produces scalar but breathtakingly well
optimized (and often beautiful!) machine code.
Producing hand-coded assembler that beats the
gcc machine code is, if possible at all, extremely hard.

The machine is an AMD64 @ 2.2GHz, socket 939, dual-channel DDR 200;
memory transfer is 2GB/sec when far beyond cache.
Important CPU characteristics are:

  Name: AMD Athlon(tm) 64 Processor 3400+
    Family: 15,  Model: 15,  Stepping: 0
  Level 1 cache (data):  64 kB,  2-way associative.
    64 bytes per line,  lines per tag: 1.
  Level 1 cache (instr):  64 kB,  2-way associative.
    64 bytes per line,  lines per tag: 1.
  Level 2 cache:  512 kB,  16-way associative
    64 bytes per line,  lines per tag: 1.
    fpu: x87 FPU
    sse: Streaming SIMD Extensions
    sse2: Streaming SIMD Extensions-2
    cmov: CMOV instruction (plus FPU FCMOVCC and FCOMI)

If you are considering an x86 system, then go for AMD64.
The document http://swox.com/doc/x86-timing.pdf
tells you why, in every little detail.
...
...
Have you tried the PathScale compiler (www.pathscale.com)? It often reduces execution time by something on the order of 30% compared to gcc. The Intel compiler (icc) should also be faster than gcc.

If you want, I could compile your code with both compilers on an AMD64 (Opteron) machine and send you the binaries for benchmarking...

Christoph

RE: [math-fun] arctan: update sieving and a fun relation

Pacher Christoph

Joerg Arndt
