Clinton Parker & I developed a microcode compiler for the Xerox Alto at the University of Rochester. The biggest part of the speedup was eliminating the slow instruction fetch for the usual non-microcode opcodes. Modern _instruction caches_ already eliminate instruction fetch as a bottleneck, so this particular advantage of microcode has been eliminated. Clinton then utilized this microcode compiler to build an on-the-fly text compression/decompression scheme which allowed the entire text for his "FLASH" text editor to live in main memory. Since the ASCII texts for most computer program files were quite small, FLASH could edit program code amazingly fast, probably 10X (??) faster than the heavyweight WYSIWYG multi-font standard text editor for the Alto.

http://home.pipeline.com/~hbaker1/Micro-SPL.txt
home.pipeline.com/~hbaker1/MicroSPL.ps.gz

The development of the diode matrix ROM as a uniform scheme for representing random logic, and the subsequent development of branching capabilities, were truly a revolution in computer design. It was then a small step to allowing the reprogramming of the microcode ROM with various schemes, including the extremely cool "writable" control store of the IBM System/360 Model 30, whose ROM was made out of specially designed _punch cards_, so that a standard IBM card punch could be used to "program" the control store. Unfortunately, one then had to take each card by hand and install it into the control "memory". The high-level language EULER was implemented in microcode for the Model 30 using this technique.

https://en.wikipedia.org/wiki/Microcode
https://research.microsoft.com/en-us/um/people/gbell/Computer_Structures__Re...
https://en.wikipedia.org/wiki/Euler_%28programming_language%29

I believe that FPGAs are programmed by a serial link which traces sort of a Hamiltonian path through all of the logic. Shifting the programming bits through this long path isn't a fast process; since FPGAs aren't expected to be reprogrammed very often, the FPGA is optimized for high performance once the program is installed, and the speed of getting the program installed isn't a major goal.

At 10:50 AM 6/23/2014, Tom Knight wrote:
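The serial-load behavior described above can be pictured as one long configuration shift register: every bit of the bitstream is clocked through a single chain, so load time is linear in the size of the device, while the configured logic runs at full speed afterward. A minimal Python sketch of that trade-off (the chain topology and bit values here are illustrative, not any vendor's actual bitstream format):

```python
# Sketch: loading an FPGA-style configuration through a serial shift chain.
# Every configuration bit passes through the chain one clock at a time, so
# programming cost is O(chain length) -- slow, but it happens rarely, which
# is the trade-off described above.

def shift_in_bitstream(chain, bitstream):
    """Clock each bit of `bitstream` into the head of `chain`,
    pushing the existing bits one position down the chain."""
    clocks = 0
    for bit in bitstream:
        # One clock: every cell takes its upstream neighbor's value.
        chain[1:] = chain[:-1]
        chain[0] = bit
        clocks += 1
    return clocks

# A tiny 8-cell configuration chain, initially blank.
chain = [0] * 8
bitstream = [1, 0, 1, 1, 0, 0, 1, 0]   # illustrative config bits

clocks = shift_in_bitstream(chain, bitstream)
print(clocks)                    # 8 clocks for 8 bits: strictly serial
print(chain == bitstream[::-1])  # last bit shifted in ends up at cell 0
```

Once the bits are in place, reading them is a parallel operation, which is why the configured device can run fast even though configuring it was slow.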
Almost all of the writable control store microprocessors were very poorly programmable. Most used the feature to allow bug repair of the microcode, and could barely do that. When I designed the initial Lisp machine, there was, to my knowledge, just the Alto and a modified PDP-11/45 at CMU with WCS. Neither machine had general shifter/maskers or a way to do multi-way branches on a register field.
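The missing hardware Knight describes — a general shifter/masker feeding a multi-way branch — amounts to "extract a field from a register and jump through a table indexed by it." A software rendering of that operation (the 16-bit instruction layout and opcode position are invented for illustration):

```python
# Sketch: multi-way dispatch on a register field, the operation the Alto
# and the CMU PDP-11/45 WCS could not do directly in microcode.
# The field layout below is invented for illustration.

def field(reg, shift, width):
    """Shifter/masker: extract a `width`-bit field starting at bit `shift`."""
    return (reg >> shift) & ((1 << width) - 1)

def dispatch(instr, handlers):
    """One shift+mask, then an N-way branch through a table."""
    op = field(instr, 12, 4)      # hypothetical: opcode in bits 12-15
    return handlers[op](instr)

# 16 handlers, one per opcode value of the 4-bit field.
handlers = [lambda i, n=n: ("op", n) for n in range(16)]
print(dispatch(0x3ABC, handlers))   # opcode field of 0x3ABC is 3
```

Without the shifter/masker, each field extraction costs a sequence of micro-ops, and without the multi-way branch, the dispatch becomes a chain of two-way tests — which is why those early WCS machines were so painful to program.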
I do agree with you, however, that the general capabilities were poorly used. We could, for example, have defined instructions on the PDP-10 which used the UUO traps. They would have been slow, but quite capable. In some sense, the ITS system calls behaved very much like instructions of this form.
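A PDP-10 UUO (unimplemented user operation) trapped to a handler that could decode the offending word and emulate whatever "instruction" it represented — slow, as noted above, but fully general. A rough software model of that mechanism (the opcode numbers and handlers below are made up):

```python
# Sketch: defining "new instructions" via a trap handler, in the spirit of
# PDP-10 UUOs / ITS system calls. Opcode values and handlers are invented.

IMPLEMENTED = {0o200}    # pretend only opcode 0o200 exists in hardware

def opcode_of(word):
    """Top 9 bits of a 36-bit PDP-10 word hold the opcode."""
    return (word >> 27) & 0o777

def uuo_handler(word, table):
    """Trap handler: decode the trapped word and emulate it in software."""
    return table[opcode_of(word)](word)

def execute(word, table):
    if opcode_of(word) in IMPLEMENTED:
        return "hardware"
    return uuo_handler(word, table)   # slow path: trap + software emulation

table = {0o100: lambda w: "emulated syscall"}
word = (0o100 << 27) | 0o42           # a word whose opcode traps
print(execute(word, table))           # handled by the UUO table
```

The trap indirection is exactly why such "instructions" were slow but quite capable: each one costs a full trap and software decode rather than a microcode cycle.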
On Jun 23, 2014, at 1:38 PM, Michael Greenwald <mbgreen@seas.upenn.edu> wrote:
On 2014-06-23 10:34, Michael Greenwald wrote:
On 2014-06-23 07:07, Henry Baker wrote:
FYI -- Lots of potential math applications...

https://communities.intel.com/community/itpeernetwork/datastack/blog/2014/06...

I'm curious whether you (or anyone here) thinks this is really so different from programmable micro-code -- e.g., writable control store?
To clarify: I meant "different" in terms of impact or use by customers.
[I know that Lisp Machines (well, Symbolics' machines until the XL series) had this capability, and so did some vaxen (the -WCS versions). I've heard that early IBM 360's or 370's had this too, but I have no direct experience with modifying the microcode on those.] So I wonder how revolutionary this Intel move will really be. My impression (no hard data, though) is that users are often really excited by the idea, but very, very few actually use this capability in practice.
And, I could just have the wrong impression: maybe writable control store *was* used a lot more than I believe.
Disrupting the Data Center to Create the Digital Services Economy
Posted by Diane Bryant in The Data Stack on Jun 18, 2014 12:56:03 PM

.... But what we find even more exciting is our next innovation in processor design that can dramatically increase application performance through fully custom accelerators. **** We are integrating our industry-leading Xeon processor with a coherent FPGA in a single package, socket compatible with our standard Xeon E5 processor offerings. ****

Why are we excited by this announcement? The FPGA provides our customers a programmable, high-performance coherent acceleration capability to turbo-charge their critical algorithms. And with down-the-wire reprogrammability, the algorithms can be changed as new workloads emerge and compute demands fluctuate.

Based on industry benchmarks, FPGA-based accelerators can deliver >10X performance gains. By integrating the FPGA with the Xeon processor, we estimate that customers will see an additional 2X in performance thanks to the low-latency, coherent interface.

Our new Xeon+FPGA solution provides yet another customized option, one more tool for customers to use to improve their critical data center metric of Performance/TCO. It highlights our commitment to delivering the very best solutions across all data center workloads and our passion to lead in the transformation of the industry to cloud services. ....