Re: [math-fun] Subset-lex: did we miss an order?
"Iterators" -- at least as currently envisioned -- is one of the worst ideas to ever hit computer science. The biggest problem with iterators is that they are _stateful_, which means that formal reasoning about them is at least 10x harder than reasoning about _stateless_ constructs like recursion. The earliest use of iterators that I can find was in Doug Ross's "AED-0" (Algol Extended for Design) language. Briefly, AED's "strings" (AED name for abstract sequences) allowed for looping/iterating over various types of sequences in an sequence-independent way. AED's iterators used callback mechanisms to allow what was then known as "co-routining", but is now simply called "call-backs". AED's implementation of strings was quite similar to C++, in that each type of string had a vector of subroutine addresses to string operations. Recursion, on the other hand, keeps all of its "state" in its arguments, even when that state is in the form of an abstract data type. If your compiler understands _tail recursion_, it can trivially convert simple tail-recursive loops into traditional C/Fortran loops, so there is NO overhead for expressing yourself recursively instead of iteratively. The major problem with doing tail recursions these days is the exceedingly bad idea of forcing the compiler to keep the useless stack frames around to support a very heavyweight exception mechanism in languages like C and even Common Lisp. (This heavyweight exception mechanism for C is now the target of hacking, because the gcc exception handler has a _Turing-complete_ programming language for walking the stack ! Hackers can easily take over this mechanism to do dastardly deeds.) Iterators can be trivially emulated using function _closures_. Since function closures are far more interesting than iterators, it was silly to waste all of that research horsepower for 25-30 years on iterators, instead of going directly to the big hammer: function closures. My entire 50-year career in computer science has been spent scratching my head about the silliness involved in the design of programming languages. Every computer language starts out as "simple", but is forced to add certain features in order to support reasonable programming. Fortran didn't think recursion was necessary. Oops! C didn't implement even "downward" closures, because the Bell Labs folks thought that Algol & PL/I's lexical scoping were too complicated. Oops! C/C++ didn't implement automatic garbage collection because they thought that the programmer knew better. We now know that most programmers haven't a clue about how hard memory management is, and we've now spent in excess of $1 trillion on memory management-related bugs, including the entire computer security field. Fortran, C/C++, etc., didn't bother with array bounds checking. Perhaps 90% of software bugs have to do with array bounds violations. (I may have been the only person to actually utilize PL/I's array bounds checking code; it found innumerable bugs.) Oops! The more you study programming languages, the more you appreciate the power and elegance of the lambda-calculus. 
The more you study programming languages, the more you appreciate the power and elegance of the lambda-calculus. Entire decades of computer science research have gone into the understanding of the power inherent in this extremely elegant system:

* lexical scoping of variables
* function closures
* partial application of functions
* lazy evaluation of argument expressions
* arbitrary recursion via the Y combinator
* tail recursion via the eta rule

About the only feature of the lambda calculus not normally found in modern computer languages is the lazy evaluation of argument expressions. While this feature can be partially emulated using function closures (see the sketch below), it can still provide an exceedingly elegant implementation of algorithms like "spigot" algorithms, where the print function decides how many digits of precision to output, and "sucks" these digits out of the rest of the program.
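A sketch of that partial emulation: lazy evaluation via a closure ("thunk"), in the spirit of a spigot, where the printing loop alone decides how many digits to suck out. (A toy digit stream for a rational number stands in for a real spigot algorithm; names are illustrative.)

  #include <cstdio>
  #include <functional>

  // Digits of num/den (with num < den) by long division, produced on
  // demand: nothing past the requested digit is ever computed.
  std::function<int()> digit_stream(int num, int den) {
      return [num, den]() mutable {
          num = (num % den) * 10;
          return num / den;
      };
  }

  int main() {
      auto digits = digit_stream(1, 7);   // 1/7 = 0.142857142857...
      std::printf("0.");
      for (int i = 0; i < 12; ++i)        // the consumer picks the precision
          std::printf("%d", digits());
      std::printf("\n");                  // prints: 0.142857142857
  }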
At 07:37 AM 6/10/2014, Fred Lunnon wrote:

I scratched my head over this, but now I think these "iterators" must be the same thing as loopless generators --- see http://en.wikipedia.org/wiki/Loopless_algorithm

This topic has surfaced previously on math-fun, with respect to various combinatorial problems; two insights have arisen during my investigations.
Firstly, that a decent object-oriented / ADT programming environment greatly facilitates the implementation and usability of such tools.
Secondly, recursion -- hoary standby of so many programming courses -- and looplessness seem mutually incompatible. Converting one type of algorithm into the other is a process which somehow should be mechanical, but in practice always proves next to impossible in nontrivial situations.
Fred Lunnon
On 6/10/14, Joerg Arndt <arndt@jjj.de> wrote:
I missed this message...
I agree that a project on "iterators" as suggested below appears worthwhile. The chances of adoption in other programs (say, Sage) appear low, however.

This is why I concentrate on finding new orders with interesting properties instead. "Interesting properties" here means "changes only at the end, and of bounded length" for these subset-lex and SL-Gray orders.
It seems hard to believe that the Gray code for compositions isn't already known; I tried hard to find it in the literature, to no avail.
Best regards, jj
* rcs@xmission.com <rcs@xmission.com> [May 28. 2014 12:13]:
Once upon a time, I worked at a Lisp-based company. The main technical guru had built a package of "iterators" that fit into the Lisp loop sub-language. Simple iterators were to count through a range, to march down a list or array, and to run through special sets relevant to the company.
I later wrote a long list of iterator-combiners that would take cross-products of iterations in various orders, and handle instructions like "next", and perhaps "previous", "skip to next row", "goto position N", and so on. I never wrote any actual code.
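A hedged sketch of what one such combiner might look like (Rich's package was Lisp; this is an illustrative C++ rendering, not his actual code): a closure-based cross product of two sequences, row-major, exposing a single "next" operation that signals exhaustion.

  #include <cstddef>
  #include <cstdio>
  #include <functional>
  #include <optional>
  #include <utility>
  #include <vector>

  using Pair = std::pair<int, int>;

  // Cross product of xs and ys in row-major order; each call is a "next",
  // and an empty optional means the iteration is exhausted.
  std::function<std::optional<Pair>()>
  cross(std::vector<int> xs, std::vector<int> ys) {
      std::size_t i = 0, j = 0;
      return [=]() mutable -> std::optional<Pair> {
          if (i >= xs.size() || ys.empty()) return std::nullopt;
          Pair p{xs[i], ys[j]};
          if (++j == ys.size()) { j = 0; ++i; }  // wrap: "skip to next row"
          return p;
      };
  }

  int main() {
      auto next = cross({1, 2}, {10, 20, 30});
      while (auto p = next())
          std::printf("(%d,%d) ", p->first, p->second);
      std::printf("\n");  // (1,10) (1,20) (1,30) (2,10) (2,20) (2,30)
  }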
I think this might be a worthwhile activity, at least as an intellectual exercise.
Rich
* Henry Baker <hbaker1@pipeline.com> [Jun 10. 2014 18:10]:
"Iterators" -- at least as currently envisioned -- is one of the worst ideas to ever hit computer science.
No. My search for those grid-filling curves would have been impossible without them. Even with iterators it would have taken > 7 years for _one_ of the searches near the end of what I have done (which took 41 hours). Recursively: not likely in < 20 years. Read the timings in "did we miss an order?".

Everybody and his dog now teaches parallelization, but few pay attention to the fact that, prior to that, your code should be non-crappy. That's why I do a course on "HPC" that does "everything" _but_ parallelization. My record so far: a factor of 10k. The guy could do in minutes on his laptop what 200+ cores did in weeks on a cluster. I have seen a grown man cry.
[...]
(This heavyweight exception mechanism for C is now the target of hacking, because the gcc exception handler has a _Turing-complete_ programming language for walking the stack! Hackers can easily take over this mechanism to do dastardly deeds.)
C++ has exceptions, C doesn't. C++ exceptions, to the best of my knowledge, are as cheap as it gets performance-wise.
[...]
C/C++ didn't implement automatic garbage collection because they thought that the programmer knew better.
Nope, rather the option of max performance was never abandoned. Good!
We now know that most programmers haven't a clue about how hard memory management is, and we've now spent in excess of $1 trillion on memory management-related bugs, including the entire computer security field.
Just keep those monkeys doing Java.
Fortran, C/C++, etc., didn't bother with array bounds checking.
I do not think that I want to check those bounds 10^9 times a second when I _know_ it is useless.
Perhaps 90% of software bugs have to do with array bounds violations. (I may have been the only person to actually utilize PL/I's array bounds checking code; it found innumerable bugs.) Oops!
Ach ja!: valgrind exists. (Yes, I did find a few bugs in my collection of high-performance crappiness; valgrind: good!) During runtime in a shipped product? Erm, please no. And then, maybe, yes: people are so used to Microsoft performance that they'd never notice.
The more you study programming languages, the more you appreciate the power and elegance of the lambda-calculus. Entire decades of computer science research have gone into the understanding of the power inherent in this extremely elegant system:

* lexical scoping of variables
* function closures
* partial application of functions
* lazy evaluation of argument expressions
* arbitrary recursion via the Y combinator
* tail recursion via the eta rule
Performance? Am I the only person alive who is massively annoyed by the idiotic lack of performance these days? Today, sucking at a _battery_, energy costs at least 1000x over mains energy. Yes, I own a dumb phone (recent poll: all students had a smartphone, requiring one recharge a day; good luck with your emergency call).
About the only feature of the lambda calculus not normally found in modern computer languages is the lazy evaluation of argument expressions. While this feature can be partially emulated using function closures, it can still provide exceedingly elegant implementation of algorithms like "spigot" algorithms, where the print function decides how many digits of precision to output, and "sucks" these digits out of the rest of the program. [...]
Best, jj (aka performance mongrel) P.S.: if I WDS'ed a bit too much... it's past beer o'clock (the Australian apology).
Decent compiler optimizations can move most array-bounds checks out of loops. If yours can't, you need a better compiler. In no case should you eliminate an array-bounds check that your compiler (or you) can't rigorously prove isn't required.

Part (most) of the problem with array bounds checking was the idiotic separation of the loop index from the array itself:

  for i=0 to n-1 do something with array[i]

The compiler has to figure out for itself that i is within the bounds of array, and this is (in general) very difficult. However, if I say map(lambda(x) do something with x, array), then map _already knows_ that we are within bounds, so no further checking is required. And no, there is no performance penalty for using a mapping operation plus a lambda; the mapping gets open-coded, as does the lambda. Believe it or not, this sort of optimization is actually easier than the Presburger arithmetic necessary to handle many array index optimizations. (A sketch of both forms appears below.)

So most of the stupid array checks are due to bad programming language design. Yes, there will be cases where the array index operations become more interesting; Knuth loves these sorts of things. Nevertheless, perhaps 95-98% of the array index operations can be done with various mapping & folding operations, which eliminate the need for run-time index checking.
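A hedged C++ rendering of the two shapes, with std::transform standing in for map (whether either form actually gets open-coded is of course the compiler's business; this only shows the shapes):

  #include <algorithm>
  #include <cstddef>
  #include <cstdio>
  #include <vector>

  int main() {
      std::vector<int> a = {1, 2, 3, 4, 5};

      // Indexed form: the compiler must prove 0 <= i < a.size() by itself
      // before it can hoist or delete a bounds check.
      for (std::size_t i = 0; i < a.size(); ++i)
          a[i] = 2 * a[i];

      // Mapping form: std::transform walks the container itself, so every
      // element it touches is in bounds by construction -- no index at all.
      std::transform(a.begin(), a.end(), a.begin(),
                     [](int x) { return x + 1; });

      for (int x : a)
          std::printf("%d ", x);   // prints: 3 5 7 9 11
      std::printf("\n");
  }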
At 10:45 AM 6/10/2014, Joerg Arndt wrote:

Fortran, C/C++, etc., didn't bother with array bounds checking.
I do not think that I want to check those bounds 10^9 times a second when I _know_ it is useless.
Perhaps 90% of software bugs have to do with array bounds violations. (I may have been the only person to actually utilize PL/I's array bounds checking code; it found innumerable bugs.) Oops!
Ach ja!: valgrind exists. (Yes, I did find a few bugs in my collection of high-performance crappiness; valgrind: good!)
During runtime in a shipped product? Erm, please no. And then, maybe, yes: people are so used to Microsoft performance that they'd never notice.
Since a large fraction of current bugs arise from bounds-check failures, another approach is to make the array pointer itself aware of the array bounds. Here is a paper describing a low-overhead approach to doing this, which we are implementing as part of a clean-slate processor/language design: http://www.crash-safe.org/sites/default/files/fatptr_ccs2013_0.pdf
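A minimal software caricature of the idea, with a hypothetical FatPtr type (the paper's actual scheme packs base and bounds into the pointer word and checks them in hardware; this sketch only shows the invariant being enforced):

  #include <cassert>
  #include <cstddef>

  // A pointer that carries its array's bounds and checks every access.
  template <typename T>
  struct FatPtr {
      T *base;
      std::size_t size;

      T &operator[](std::size_t i) const {
          assert(i < size && "bounds violation");  // the "trap"
          return base[i];
      }
  };

  int main() {
      int buf[10] = {0};
      FatPtr<int> p{buf, 10};
      p[3] = 42;      // in bounds: fine
      // p[10] = 1;   // would trap: one past the end
      return p[3] - 42;
  }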
On Jun 10, 2014, at 2:40 PM, Henry Baker <hbaker1@pipeline.com> wrote:

Decent compiler optimizations can move most array-bounds checks out of loops. If yours can't, you need a better compiler.
In no case should you eliminate an array-bounds check that your compiler (or you) can't rigorously prove isn't required.
Part (most) of the problem with array bounds checking was the idiotic separation of the loop index from the array itself.
for i=0 to n-1 do something with array[i]
The compiler has to figure out for itself that i is within the bounds of array, and this is (in general) very difficult.
However, if I say map(lambda(x) do something with x, array), then map _already knows_ that we are within bounds, so no further checking is required. And no, there is no performance penalty for using a mapping operation plus a lambda; the mapping gets open-coded, as does the lambda. Believe it or not, this sort of optimization is actually easier than the Presburger arithmetic necessary to handle many array index optimizations.
So most of the stupid array checks are due to bad programming language design.
Yes, there will be cases where the array index operations become more interesting; Knuth loves these sorts of things.
Nevertheless, perhaps 95-98% of the array index operations can be done with various mapping & folding operations, which eliminate the need for run-time index checking.
At 10:45 AM 6/10/2014, Joerg Arndt wrote:
Fortran, C/C++, etc., didn't bother with array bounds checking.
I do not think that I want to check those bounds 10^9 times a second when I _know_ it is useless.
Perhaps 90% of software bugs have to do with array bounds violations. (I may have been the only person to actually utilize PL/I's array bounds checking code; it found innumerable bugs.) Oops!
Ach ja!: valgrind exists. (Yes, I did find a few bugs in my collection of high-performance crappiness; valgrind: good!)
During runtime in a shipped product? Erm, please no. And then, maybe, yes: people are so used to Microsoft performance that they'd never notice.
Very interesting paper, Tom! Thanks for the link.

You didn't mention one advantage of the 7090/7040 index architecture: since indices were hardware _subtracted_ from the pointer instead of added, (upper) bounds checking was easy: "negative" indices were out of bounds!!

But you could achieve the same thing with additive index registers: arrange the index register to hold a "negative" index, such that any HW overflows into positive territory indicate a buffer overflow.

Example: suppose we have 15-bit index registers and byte addressing, and we want to reference a byte array of ten bytes, "bytearray[i]". The "base" of our bytearray is (@bytearray)-32758, so we are really accessing (@bytearray-32758)[i+32758]. If i gets too big, our indexing HW will overflow. (A worked version of this arithmetic appears below.)

From a systems security point of view, the interface to your memory management unit is critical; even one screwup, and the entire system will be compromised. The entire system needs a threat model, and a _formal proof_ that the system is secure. Since this is hardware, we don't have any opportunity to retrofit; we'll have to throw the hardware out if it has any security flaws.

The one thing that Defcon-type conferences have shown time & again is that even the smallest chink in the armor can be escalated to a full-fledged pwn. The details of the memory management unit, the instruction fetch unit, the TLU, and particularly the various kinds of hardware trapping, can conspire to enable hacking.

I'll have more thoughts later when I have had more time to think about your low-fat pointer scheme.
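A worked version of that arithmetic in ordinary C++ (we have no 15-bit index registers here, so the "hardware overflow" into bit 15 is tested explicitly):

  #include <cstdio>

  int main() {
      const unsigned size = 10;            // ten-byte array
      const unsigned bias = 32768 - size;  // 32758, as in the example above

      for (unsigned i = 8; i <= 11; ++i) {
          unsigned biased = i + bias;      // the index register contents
          bool overflow = biased >= 32768; // carries out of 15 bits?
          std::printf("i=%2u biased=%5u %s\n", i, biased,
                      overflow ? "TRAP (out of bounds)" : "ok");
      }
      // i=8,9 give 32766,32767 (ok); i=10,11 give 32768,32769 (trap).
  }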
At 11:54 AM 6/10/2014, Tom Knight wrote:

Since a large fraction of current bugs arise from bounds check failures, another approach is to make the array pointer itself aware of the array bounds. This is a paper describing a low-overhead approach to doing this which we are implementing as part of a clean-slate processor/language design: http://www.crash-safe.org/sites/default/files/fatptr_ccs2013_0.pdf
Not only that: map generalizes to any collection -- in fact, to any functor at all. It allows for much better modularity in the code. (A sketch with two functors appears below.)
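A hedged sketch of the same map at two different functors, with an illustrative fmap name (std::vector and std::optional as the two instances; the calling code never cares which container it has):

  #include <cstdio>
  #include <optional>
  #include <vector>

  // map over a vector: apply f to every element.
  template <typename F, typename T>
  auto fmap(F f, const std::vector<T> &xs) {
      std::vector<decltype(f(xs[0]))> ys;
      for (const auto &x : xs) ys.push_back(f(x));
      return ys;
  }

  // map over an optional: apply f to the value, if there is one.
  template <typename F, typename T>
  auto fmap(F f, const std::optional<T> &x)
      -> std::optional<decltype(f(*x))> {
      if (x) return f(*x);
      return std::nullopt;
  }

  int main() {
      auto twice = [](int n) { return 2 * n; };
      auto v = fmap(twice, std::vector<int>{1, 2, 3});  // {2, 4, 6}
      auto o = fmap(twice, std::optional<int>{21});     // 42
      std::printf("%d %d %d | %d\n", v[0], v[1], v[2], *o);
  }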
On Tue, Jun 10, 2014 at 12:40 PM, Henry Baker <hbaker1@pipeline.com> wrote:

Decent compiler optimizations can move most array-bounds checks out of loops. If yours can't, you need a better compiler.
In no case should you eliminate an array-bounds check that your compiler (or you) can't rigorously prove isn't required.
Part (most) of the problem with array bounds checking was the idiotic separation of the loop index from the array itself.
for i=0 to n-1 do something with array[i]
The compiler has to figure out for itself that i is within the bounds of array, and this is (in general) very difficult.
However, if I say map(lambda(x) do something with x, array), then map _already knows_ that we are within bounds, so no further checking is required. And no, there is no performance penalty for using a mapping operation plus a lambda; the mapping gets open-coded, as does the lambda. Believe it or not, this sort of optimization is actually easier than the Presburger arithmetic necessary to handle many array index optimizations.
So most of the stupid array checks are due to bad programming language design.
Yes, there will be cases where the array index operations become more interesting; Knuth loves these sorts of things.
Nevertheless, perhaps 95-98% of the array index operations can be done with various mapping & folding operations, which eliminate the need for run-time index checking.
At 10:45 AM 6/10/2014, Joerg Arndt wrote:
Fortran, C/C++, etc., didn't bother with array bounds checking.
I do not think that I want to check those bounds 10^9 times a second when I _know_ it is useless.
Perhaps 90% of software bugs have to do with array bounds violations. (I may have been the only person to actually utilize PL/I's array bounds checking code; it found innumerable bugs.) Oops!
Ach ja!: valgrind exists. (Yes, I did find a few bugs in my collection of high-performance crappiness; valgrind: good!)
During runtime in a shipped product? Erm, please no. And then, maybe, yes: people are so used to Microsoft performance that they'd never notice.
--
Mike Stay - metaweta@gmail.com
http://www.cs.auckland.ac.nz/~mike
http://reperiendi.wordpress.com
* Henry Baker <hbaker1@pipeline.com> [Jun 11. 2014 15:53]:
[...]
Part (most) of the problem with array bounds checking was the idiotic separation of the loop index from the array itself.
for i=0 to n-1 do something with array[i]
Using C++11's mechanism ("for_each" in about every other language):

Let X be a container (like an array or std::vector)

  for (auto t : X) do_something_with(t);

The re-spawned keyword "auto" means "whatever type, take the correct one". I dare say this is as clean as it gets.

Best, jj
[...]
Off-list message that I need to reply to (the non-innocent shall remain non-nameful): STD == Standard Template Library. Contagiously yours, jj.

Other off-list messages: _please_ be patient, I am seriously underwater; later, and thanks for your patience.

* Joerg Arndt <arndt@jjj.de> [Jun 11. 2014 18:53]:
* Henry Baker <hbaker1@pipeline.com> [Jun 11. 2014 15:53]:
[...]
Part (most) of the problem with array bounds checking was the idiotic separation of the loop index from the array itself.
for i=0 to n-1 do something with array[i]
Using C++11's mechanism ("for_each" in about every other language):
Let X be a container (like an array or std::vector)
for (auto t : X) do_something_with(t);
The re-spawned keyword "auto" means "whatever type, take the correct one".
I dare say this is as clean as it gets.
Best, jj
[...]
participants (4): Henry Baker, Joerg Arndt, Mike Stay, Tom Knight