RE: [math-fun] arctan: update sieving and a fun relation
For those still reading, I am using
% gcc --version
gcc (GCC) 3.3.5 20050117 (prerelease) (SUSE Linux)

which produces scalar but breathtakingly well optimized (and often beautiful!) machine code. Producing hand-coded assembler that beats the gcc machine code is, if possible at all, extremely hard.

Machine is an AMD64 @2.2GHz, 939 socket, dual channel DDR 200, memtransfer is 2GB/sec when far beyond cache. Important CPU characteristics are

Name: AMD Athlon(tm) 64 Processor 3400+
Family: 15, Model: 15, Stepping: 0
Level 1 cache (data): 64 kB, 2-way associative. 64 bytes per line, lines per tag: 1.
Level 1 cache (instr): 64 kB, 2-way associative. 64 bytes per line, lines per tag: 1.
Level 2 cache: 512 kB, 16-way associative. 64 bytes per line, lines per tag: 1.
fpu: x87 FPU
sse: Streaming SIMD Extensions
sse2: Streaming SIMD Extensions-2
cmov: CMOV instruction (plus FPU FCMOVCC and FCOMI)

If you consider an x86 system, then go for AMD64. The document http://swox.com/doc/x86-timing.pdf tells you why, in every little detail.
Have you tried the PathScale compiler (www.pathscale.com)? Often it reduces execution time by about 30% compared to gcc. The Intel compiler (icc) should also be faster than gcc.

If you want, I could compile for you with both compilers on an AMD64 (Opteron) machine and send you the binaries for benchmarking...

Christoph
* Pacher Christoph <Christoph.Pacher@arcs.ac.at> [Apr 26. 2006 10:01]:
For those still reading, I am using
% gcc --version
gcc (GCC) 3.3.5 20050117 (prerelease) (SUSE Linux)
[...]
Have you tried the PathScale compiler (www.pathscale.com)? Often it reduces execution time by about 30% compared to gcc. The Intel compiler (icc) should also be faster than gcc.
If you want, I could compile for you on both compilers on an AMD64 (Opteron) machine and send you the binaries for benchmarking...
Christoph
Thanks for the offer! I just got a pathscale (trial, 30 day) license. And, no, it produces slower code:

// gcc:
./log-search > t  5.11s user 0.02s system 99% cpu 5.136 total
// pathscale:
./log-search > t  5.36s user 0.04s system 99% cpu 5.415 total

The machine code tells me why 8-) Tried various optimization levels. pathscale also isn't as good with inlining.

Tried (to try) Intel's compiler a while ago but its install script failed. Looked like it was done by someone who has no idea how to code solid scripts. Anyway, Intel's compiler really optimizes for Intel chips, which are very different, see the pdf I cited in my last mail.
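To put the two timings side by side, here is a minimal sketch that computes the relative difference in user time and what a 30% reduction would have looked like. The helper name relative_change is made up for illustration. The only inputs are the two user times quoted above, so this describes a single run and nothing more.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Return (candidate - baseline) / baseline; positive means candidate is slower."""
    return (candidate - baseline) / baseline

# User times in seconds, copied from the timing output quoted in the mail.
gcc_user = 5.11
pathscale_user = 5.36

slowdown = relative_change(gcc_user, pathscale_user)
print(f"pathscale vs gcc (user time): {slowdown:+.1%}")  # prints +4.9%

# User time that a 30% reduction relative to gcc would correspond to.
target = gcc_user * (1 - 0.30)
print(f"a 30% reduction would be about {target:.2f}s")  # prints 3.58s
```

So on this one program the pathscale binary is roughly 5% slower, not 30% faster. A single timing of each binary says little about run-to-run variance.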
participants (2): Joerg Arndt, Pacher Christoph