Agreed. Blind men & elephants come from designing & evaluating at the same time. I'm trying to figure out what major mechanisms are required to achieve "essentially" optimal results w/o overdesigning. E.g., I've been resisting allowing "massive" simultaneous rearrangements, but they may be required so that worst-case behavior doesn't dominate average-case behavior.

Why I'm doing this: I've been unhappy with the standard RAM model, which assumes constant memory access time for all memory accesses -- which is complete hogwash. What I'm really trying to design/model here is a memory "cache", but a bizarre type of cache in which every cache line has a slightly different access time, with more "remote" elements being monotonically slower than "closer" elements. As in modern microprocessors, the fastest cache lines may be 3 orders of magnitude faster than the slowest ones. Furthermore, the cost of a cache line is directly correlated with its speed, so there are far more slow elements than fast ones.

I haven't settled on a particular cost model, because I wanted to play with different cost models. As usual, we want most of the "work" to be done in the closest/fastest cache lines, including any work of rearranging the order of the cache lines. This is because "small" rearrangements can be much faster than "large" rearrangements. These are all "exclusive" caches, meaning that a particular item lives in exactly one cache line.

I think I'm close to deciding that some massive rearrangements -- e.g., replacing the first N elements of the fastest part of the cache with N sequential remote elements -- will be a necessary built-in feature. However, it is likely that the cost of such a massive rearrangement will have to be correlated with the "distance" as well as the number of items. Furthermore, it may be necessary and/or convenient to have the number of elements (N) involved in each transfer be correlated with the distance. There is a tradeoff here: a large transfer from "far" may bring in something you need, but also a lot of other stuff you didn't want -- thus "polluting" the fast cache.

This is reminiscent of 1950s/1960s-style programming, where large blocks of data had to be brought in/out by "channel operations"; some of these channels were even smart enough to transfer data *within* main memory w/o even touching an I/O device.
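For concreteness, here's a toy sketch (Python) of the kind of parameterized cost model I've been playing with. Everything in it is an assumption for illustration -- the names, the exponent, and the per-item constant are invented, not a settled proposal: access cost monotone in line index, block-transfer cost correlated with both distance and item count, and the "natural" transfer size growing with distance.

    ALPHA = 0.5        # assumed: access_cost(i) grows like 1 + i**ALPHA
    PER_ITEM = 0.1     # assumed: incremental cost per extra item in a block move

    def access_cost(i):
        """Cost of touching the single cache line at index i (0 = fastest)."""
        return 1.0 + i ** ALPHA

    def block_transfer_cost(dist, n):
        """Cost of moving n consecutive lines whose far end is at index dist."""
        return access_cost(dist) + PER_ITEM * n

    def natural_block_size(dist, base=8):
        """Assumed rule: farther transfers move bigger blocks (more pollution risk)."""
        return max(1, int(base * (1 + dist) ** ALPHA))

    for d in (0, 10, 1000, 1000000):
        n = natural_block_size(d)
        print(d, n, round(block_transfer_cost(d, n), 1))

With ALPHA = 0.5 and about 10^6 lines, the slowest line comes out roughly 1000x the fastest, matching the 3-orders-of-magnitude figure; swapping in a logarithmic or linear access_cost changes the tradeoff between small and massive rearrangements, which is exactly the knob I want to be able to turn.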
At 12:44 PM 4/20/2018, Tomas Rokicki wrote:
I think we need some sort of explicit cost model here. Otherwise we are each blindly touching a different part of the elephant.