I was blown away by Gosper's weird use of memoization; I've never seen anything like it before. After spending a while grokking the Mathematica code, I coded an implementation in Lisp. This code is not optimized whatsoever -- not even optimization declarations! Here are my benchmarks, assuming my code is correct. :) This is being run on a mangy MacBook Air (not a Pro!).

I include two separate numbers for each benchmark. The first is the time with the cache cleared before the benchmark; the second, denoted NO-CLEAR, uses a cache persisting across all benchmarks. (So I actually ran 8 benchmarks in 2 runs.)

FIRST EXAMPLE: 0 to 1 by 1/6
(time (mapcar #'hilbert (range 0 1 1/6)))

Timing the evaluation of (MAPCAR (FUNCTION HILBERT) (RANGE 0 1 1/6))
User time    = 0.000 (NO-CLEAR: 0.00)
System time  = 0.000
Elapsed time = 0.000
Allocation   = 8736 bytes
0 Page faults

(0 #C(1/2 1/2) #C(0 1) #C(1/2 1/2) #C(1 1) #C(1/2 1/2) 1)

SECOND EXAMPLE: 4^-6/2 to 1 by 4^-6
(time (bench 6 1))

Timing the evaluation of (BENCH 6 1)
User time    = 0.068 (NO-CLEAR: 0.013)
System time  = 0.000
Elapsed time = 0.068
Allocation   = 18878392 bytes
5 Page faults

4096

THIRD EXAMPLE: 4^-7/2 to 1/4 by 4^-7
(time (bench 7 1/4))

Timing the evaluation of (BENCH 7 1/4)
User time    = 0.075 (NO-CLEAR: 0.014)
System time  = 0.000
Elapsed time = 0.077
Allocation   = 21321040 bytes
4 Page faults

4096

FOURTH EXAMPLE: 1/4 - 4^-7/2 to 1/2 by 4^-7
(time (bench2 7))

Timing the evaluation of (BENCH2 7)
User time    = 0.086 (NO-CLEAR: 0.009)
System time  = 0.000
Elapsed time = 0.087
Allocation   = 20872456 bytes
6 Page faults

4097

Much better than Mathematica. :)

Cheers,

Robert Smith

You're free to do as you wish with the code pasted below. I also keep a permanent record of it in the following repository, which also has the benefit of being updated:

https://bitbucket.org/tarballs_are_good/lisp-random/src/f2d5fa6dbb366f720ed7...

--- SOURCE CODE ---

(defconstant I #C(0 1))
(defconstant 1+I #C(1 1))

(declaim (inline half))
(defun half (x)
  (/ x 2))

(defparameter *cache* (make-hash-table :test 'equalp))

(defun clear-cache ()
  (setf *cache* (make-hash-table :test 'equalp)))

(defun hilbert-aux (tt a1 a0)
  (labels ((hilbert-core (n t4)
             (cond
               ((= n 0) (* I (half (- 1 (hilbert-aux (- 1 t4)
                                                     (- (* I (half a1)))
                                                     (+ a0 (half (* a1 I))))))))
               ((= n 1) (half (+ I (hilbert-aux t4
                                                (half a1)
                                                (+ a0 (half (* a1 I)))))))
               ((= n 2) (half (+ 1+I (hilbert-aux t4
                                                  (half a1)
                                                  (+ a0 (half (* a1 1+I)))))))
               ((= n 3) (1+ (half (* I (hilbert-aux (- 1 t4)
                                                    (half (* a1 I))
                                                    (+ a0 a1))))))
               ((= n 4) 1))))
    (let ((res (gethash tt *cache*)))
      (if res
          (funcall res a1 a0)
          (progn
            ;; Provisional entry: answers re-entrant calls at the same TT
            ;; via the affine fixed point.
            (setf (gethash tt *cache*)
                  (lambda (s1 s0)
                    (/ (- a0 s0) (- s1 a1))))
            (let* ((t4   (* 4 tt))
                   (n    (floor t4))
                   (t4-n (- t4 n))
                   (new  (hilbert-core n t4-n)))
              ;; Final entry: the settled value.
              (setf (gethash tt *cache*)
                    (lambda (b1 b0)
                      (declare (ignore b1 b0))
                      new))
              new))))))

(defun hilbert (x)
  (clear-cache)
  (hilbert-aux x 1 0))

--- BENCHMARK CODE ---

(defun range (start end step)
  (loop :for x :from start :to end :by step
        :collect x))

(defun bench (n &optional (upper 1))
  (loop :with bound := (/ (expt 4 n))
        :for k :from (half bound) :to upper :by bound
        :count (hilbert k)))

(defun bench2 (n &optional (upper 1/2))
  (loop :with bound := (/ (expt 4 n))
        :for k :from (- 1/4 (half bound)) :to upper :by bound
        :count (hilbert k)))

On Sat, May 4, 2013 at 4:54 PM, Bill Gosper <billgosper@gmail.com> wrote:
Cris Moore <moore@santafe.edu> wrote:
Of course, "memoization" is a silly neologism, redolent of 1950's office work ;-)
Only for the elderly.
when I teach this subject, I use the perfectly good word "memorization".
Indeed. "Memoization" has never achieved full cromulence, in my book. I've never had a problem with just "caching" results.
Phil [Carmody]
---------

I agree it's ugly. I disagree that it's superfluous. Foisting its meaning onto "memorizing" and "caching" overloads them and deprives them of connotation, so I fear we're stuck with "memoizing". Which, when typed into Mathematica's Help dialog, will soon find you
tutorial/FunctionsThatRememberValuesTheyHaveFound

[...]

"There is of course a trade-off involved in remembering values. It is faster to find a particular value, but it takes more memory space to store all of them. You should usually define functions to remember values only if the total number of different values that will be produced is comparatively small, or the expense of recomputing them is very great."
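In Python, the same remember-values idiom is a one-line decorator. A minimal sketch of the trade-off the tutorial describes (example mine, not from the thread):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # remember every value ever computed
def binom(n, k):
    """Binomial coefficient by Pascal's rule; exponential-time without the cache."""
    if k == 0 or k == n:
        return 1
    return binom(n - 1, k - 1) + binom(n - 1, k)

result = binom(60, 30)               # fast: each (n, k) pair is computed once
cached = binom.cache_info().currsize  # but every intermediate pair now occupies memory
```

The speed comes precisely from paying the memory: the cache holds one entry per distinct argument pair ever seen.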
This is an understatement on a par with
"Caution: Cigarette Smoking May be Hazardous to Your Health" (1966–1970).
The performance degradation increases steeply and linearly as the cache grows, rapidly becoming prohibitive.
E.g., the exact Hilbert spacefilling function

Clear[Hilbert];
Hilbert[t_, a1_: 1, a0_: 0] :=
 Hilbert[t, b1_: 1, b0_: 0] =
  (Hilbert[t, s1_: 1, s0_: 0] = ((a0 - s0)/(s1 - a1));
   Module[{t4 = 4*t, n},
    n = Floor[t4];
    t4 -= n;
    Switch[n,
     0, I*(1 - Hilbert[1 - t4, -I*a1/2, a0 + a1*I/2])/2,
     1, (I + Hilbert[t4, a1/2, a0 + a1*I/2])/2,
     2, (1 + I + Hilbert[t4, a1/2, a0 + a1*(1 + I)/2])/2,
     3, 1 + I*Hilbert[1 - t4, a1*I/2, a0 + a1]/2,
     4, 1]])
actually requires memoizing to find the fixed point of infinite recursions:
In[280]:= Hilbert/@Range[0, 1, 1/6]
Out[280]= {0, 1/2 + I/2, I, 1/2 + I/2, 1 + I, 1/2 + I/2, 1}
Note how 1/6, 1/2, and 5/6 all map to the center of the target square [0,1]×[0,i].
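The fixed-point-via-memoization trick is language-independent. Here is a Python sketch of the same algorithm (my translation, not code from the thread): the first cache entry for a given t is a provisional rule that, if the recursion re-enters at that same t, solves the affine fixed point a0 + a1*H = s0 + s1*H for H; it is then overwritten with the settled value.

```python
from fractions import Fraction

CACHE = {}  # maps exact t -> a function (s1, s0) -> value of Hilbert at t

def hilbert_aux(t, a1, a0):
    # a1, a0 accumulate the affine map relating Hilbert's value to this call.
    hit = CACHE.get(t)
    if hit is not None:
        return hit(a1, a0)
    # Provisional entry: a re-entrant call at the same t must satisfy
    # a0 + a1*H = s0 + s1*H, i.e. H = (a0 - s0)/(s1 - a1) -- the fixed
    # point that replaces an otherwise infinite recursion.
    CACHE[t] = lambda s1, s0: (a0 - s0) / (s1 - a1)
    t4 = 4 * t
    n = int(t4)          # floor; t is a nonnegative Fraction
    t4 -= n
    if n == 0:
        val = 1j * (1 - hilbert_aux(1 - t4, -1j * a1 / 2, a0 + a1 * 1j / 2)) / 2
    elif n == 1:
        val = (1j + hilbert_aux(t4, a1 / 2, a0 + a1 * 1j / 2)) / 2
    elif n == 2:
        val = (1 + 1j + hilbert_aux(t4, a1 / 2, a0 + a1 * (1 + 1j) / 2)) / 2
    elif n == 3:
        val = 1 + 1j * hilbert_aux(1 - t4, a1 * 1j / 2, a0 + a1) / 2
    else:                # n == 4, i.e. t == 1
        val = 1 + 0j
    CACHE[t] = lambda s1, s0, v=val: v   # final entry: the settled value
    return val

def hilbert(t):
    CACHE.clear()
    return hilbert_aux(Fraction(t), 1, 0)

# The seven values of Out[280]:
values = [hilbert(Fraction(k, 6)) for k in range(7)]
```

For t = 1/3, for instance, the n = 1 branch calls back to t = 1/3 and the provisional rule returns the fixed point, giving Hilbert(1/3) = I as in Out[280].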
So we can draw an "nth order Hilbert curve" as a polygon joining the centers of the 4^n subsquares of [0,1]×[0,i], whose inverse images are {c,c+1,c+2,...,c+4^n-1}/4^n, for c=1/6, 1/2, or 5/6. (3^4^n choices.) For a pretty but traditional picture, (choosing c=1/2) try
Timing[Graphics[
  Polygon[{Re[#], Im[#]} & /@
    Join[{1, 0}, Hilbert /@ Range[4^-3/2, 1, 4^-3]]]]]

{0.027261, ...}
Immediately rerunning this gave a timing of 0.004007, thanks to memoizing.

This choice c=1/2 restricts the calculation to dyadic rationals, permitting a much simpler, non-memoizing, non-fixed-pointing Hilbert with termination conditions just for 0 and 1. But what's not to like? Memoizing is working fine.

(Digression: for a non-traditional, non-dyadic-rational plot that really needs memoizing, try, e.g.,

Graphics[Polygon[{Re[#], Im[#]} & /@
   Append[Hilbert /@ Range[4^-6, 1, 5/2^11], 1]]] )
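For concreteness, that simpler dyadic-only Hilbert can be sketched in Python as follows (my sketch of the same four-case recursion, terminating only at 0 and 1; no memoizing, no fixed points):

```python
from fractions import Fraction

def hilbert_dyadic(t):
    """Hilbert map for t in [0,1] with denominator a power of 4.

    For such t the recursion always bottoms out at 0 or 1, so no
    cache and no fixed-point resolution are needed.
    """
    if t == 0:
        return 0j
    if t == 1:
        return 1 + 0j
    t4 = 4 * t
    n = int(t4)   # floor; t is a nonnegative Fraction
    t4 -= n
    if n == 0:
        return 1j * (1 - hilbert_dyadic(1 - t4)) / 2
    if n == 1:
        return (1j + hilbert_dyadic(t4)) / 2
    if n == 2:
        return (1 + 1j + hilbert_dyadic(t4)) / 2
    # n == 3
    return 1 + 1j * hilbert_dyadic(1 - t4) / 2

# Vertices of a 3rd-order curve: centers of the 4^3 subsquares,
# i.e. the points Hilbert[(2c+1)/(2*4^3)] for c = 0 .. 4^3 - 1.
points = [hilbert_dyadic(Fraction(2 * c + 1, 2 * 4**3)) for c in range(4**3)]
```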
But when Neil tried an 8th order dyadic curve, it would not finish. Even after supper. Clearing and redefining Hilbert, a mere 6th order curve took

In[25]:= Timing[Do[Hilbert[k], {k, 4^-6/2, 1, 4^-6}]]
Out[25]= {106.747152, Null}
That's the same number of points (4^6) as one quarter of a 7th order:
In[27]:= Timing[Do[Hilbert[k], {k, 4^-7/2, 1/4, 4^-7}]]
Out[27]= {336.662000, Null}
which takes more than thrice as long! Since Neil only needed the c=1/2 dyadic case, he disgustedly wrote a finite state machine (two bits of t in, one bit each of x and y out). But let us grimly proceed. The 2nd quadrant of 7th order,
In[32]:= Timing[Do[Hilbert[k], {k, 1/4 - 4^-7/2, 1/2, 4^-7}]]
Out[32]= {394.267917, Null}
is even slower, but should have been faster due to cached values from the first quadrant!
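One plausible cost model for this anomaly -- my conjecture, not a statement about Mathematica's internals -- is that the memoized definitions are stored as rules probed in sequence, so each lookup costs time proportional to the cache size and total time grows quadratically in the number of points. A toy Python comparison of the two lookup disciplines:

```python
import time

def build_and_probe(n):
    """Time n lookups in a hash-table cache vs. a scan-in-order rule list."""
    keys = list(range(n))

    dict_cache = {k: k * k for k in keys}      # hash table: O(1) per lookup
    t0 = time.perf_counter()
    for k in keys:
        dict_cache[k]
    dict_time = time.perf_counter() - t0

    rule_list = [(k, k * k) for k in keys]     # rules probed in order: O(n) per lookup
    t0 = time.perf_counter()
    for k in keys:
        for key, val in rule_list:
            if key == k:
                hit = val
                break
    list_time = time.perf_counter() - t0
    return dict_time, list_time
```

Under this model the scan-in-order cache gets linearly slower per probe as it fills, which matches a second quadrant running slower than the first despite the cache hits.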
Try something drastic--clear the cache and redefine the whole function for every point:
In[33]:= Timing[Do[
   Clear[Hilbert];
   Hilbert[t_, a1_: 1, a0_: 0] :=
    Hilbert[t, b1_: 1, b0_: 0] =
     (Hilbert[t, s1_: 1, s0_: 0] = ((a0 - s0)/(s1 - a1));
      Module[{t4 = 4*t, n},
       n = Floor[t4];
       t4 -= n;
       Switch[n,
        0, I*(1 - Hilbert[1 - t4, -I*a1/2, a0 + a1*I/2])/2,
        1, (I + Hilbert[t4, a1/2, a0 + a1*I/2])/2,
        2, (1 + I + Hilbert[t4, a1/2, a0 + a1*(1 + I)/2])/2,
        3, 1 + I*Hilbert[1 - t4, a1*I/2, a0 + a1]/2,
        4, 1]]);
   Hilbert[k],
   {k, 4^-7/2, 1, 4^-7}]]
Out[33]= {18.388266, Null}

for all four quadrants!
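Neil's actual finite state machine isn't shown in the thread, but the job it does -- two bits of t in, one bit each of x and y out -- is the classic bitwise Hilbert index-to-coordinates conversion. A Python sketch of that standard algorithm (my code, not Neil's; orientation may differ from his):

```python
def d2xy(order, d):
    """Map index d in [0, 4**order) to (x, y) on a 2**order x 2**order grid,
    consuming two bits of d per step and emitting one bit each of x and y."""
    x = y = 0
    s = 1
    t = d
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/reflect this quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4                          # next two bits of the index
        s *= 2
    return x, y
```

No cache at all: each point costs O(order) bit operations, independent of how many points have already been computed.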
I can't believe WRI can't easily and dramatically remediate this. Neil made some damning running-time plots, which I hope he will blog soon.

--rwg

(Saving and resuming this draft introduced many redundant line breaks. Apologies if GMail has undeleted them.)

_______________________________________________
math-fun mailing list
math-fun@mailman.xmission.com
http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun