Clinton Parker & I developed a microcode compiler for the Xerox Alto at the University of Rochester. The biggest part of the speedup was eliminating the slow instruction fetch for the usual non-microcode opcodes. Modern _instruction caches_ already eliminate instruction fetch as a bottleneck, so this particular advantage of microcode has been eliminated. Clinton then utilized this microcode compiler to build an on-the-fly text compression/decompression scheme which allowed the entire text for his "FLASH" text editor to live in main memory. Since the ASCII texts for most computer program files were quite small, FLASH could edit program code amazingly fast, probably 10X (??) faster than the heavyweight WYSIWYG multi-font standard text editor for the Alto.

http://home.pipeline.com/~hbaker1/Micro-SPL.txt
home.pipeline.com/~hbaker1/MicroSPL.ps.gz

The development of the diode matrix ROM as a uniform scheme for representing random logic, and the subsequent development of branching capabilities, were truly a revolution in computer design. It was then a small step to allowing the reprogramming of the microcode ROM with various schemes, including the extremely cool "writable" control store of the IBM System/360 Model 30, whose ROM was made out of specially designed _punch cards_, so that a standard IBM card punch could be used to "program" the control store. Unfortunately, one then had to take each card by hand and install it into the control "memory". The high-level language EULER was implemented in microcode for the Model 30 using this technique.

https://en.wikipedia.org/wiki/Microcode
https://research.microsoft.com/en-us/um/people/gbell/Computer_Structures__Re...
https://en.wikipedia.org/wiki/Euler_%28programming_language%29

I believe that FPGAs are programmed by a serial link which traces sort of a Hamiltonian path through all of the logic. Shifting the programming bits through this long path isn't a fast process; since FPGAs aren't expected to be reprogrammed very often, the FPGA is optimized for high performance once the program is installed, and the speed of getting the program installed isn't a major goal.

At 10:50 AM 6/23/2014, Tom Knight wrote:
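The serial-load behavior described above can be pictured as one long configuration shift register: every bit of the bitstream is clocked through a single chain, so load time is linear in the size of the device, while the configured logic runs at full speed afterward. A minimal Python sketch of that trade-off (the chain topology and bit values here are illustrative, not any vendor's actual bitstream format):

```python
# Sketch: loading an FPGA-style configuration through a serial shift chain.
# Every configuration bit passes through the chain one clock at a time, so
# programming cost is O(chain length) -- slow, but it happens rarely, which
# is the trade-off described above.

def shift_in_bitstream(chain, bitstream):
    """Clock each bit of `bitstream` into the head of `chain`,
    pushing the existing bits one position down the chain."""
    clocks = 0
    for bit in bitstream:
        # One clock: every cell takes its upstream neighbor's value.
        chain[1:] = chain[:-1]
        chain[0] = bit
        clocks += 1
    return clocks

# A tiny 8-cell configuration chain, initially blank.
chain = [0] * 8
bitstream = [1, 0, 1, 1, 0, 0, 1, 0]   # illustrative config bits

clocks = shift_in_bitstream(chain, bitstream)
print(clocks)                    # 8 clocks for 8 bits: strictly serial
print(chain == bitstream[::-1])  # last bit shifted in ends up at cell 0
```

Once the bits are in place, reading them is a parallel operation, which is why the configured device can run fast even though configuring it was slow.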
Almost all of the writable control store microprocessors were very poorly programmable. Most used the feature to allow bug repair of the microcode, and could barely do that. When I designed the initial Lisp machine, there was, to my knowledge, just the Alto and a modified PDP-11/45 at CMU with WCS. Neither machine had general shifter/maskers or a way to do multi-way branches on a register field.
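The missing hardware Knight describes — a general shifter/masker feeding a multi-way branch — amounts to "extract a field from a register and jump through a table indexed by it." A software rendering of that operation (the 16-bit instruction layout and opcode position are invented for illustration):

```python
# Sketch: multi-way dispatch on a register field, the operation the Alto
# and the CMU PDP-11/45 WCS could not do directly in microcode.
# The field layout below is invented for illustration.

def field(reg, shift, width):
    """Shifter/masker: extract a `width`-bit field starting at bit `shift`."""
    return (reg >> shift) & ((1 << width) - 1)

def dispatch(instr, handlers):
    """One shift+mask, then an N-way branch through a table."""
    op = field(instr, 12, 4)      # hypothetical: opcode in bits 12-15
    return handlers[op](instr)

# 16 handlers, one per opcode value of the 4-bit field.
handlers = [lambda i, n=n: ("op", n) for n in range(16)]
print(dispatch(0x3ABC, handlers))   # opcode field of 0x3ABC is 3
```

Without the shifter/masker, each field extraction costs a sequence of micro-ops, and without the multi-way branch, the dispatch becomes a chain of two-way tests — which is why those early WCS machines were so painful to program.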
I do agree with you, however, that the general capabilities were poorly used. We could, for example, have defined instructions on the PDP-10 which used the UUO traps. They would have been slow, but quite capable. In some sense, the ITS system calls behaved very much like instructions of this form.
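A PDP-10 UUO (unimplemented user operation) trapped to a handler that could decode the offending word and emulate whatever "instruction" it represented — slow, as noted above, but fully general. A rough software model of that mechanism (the opcode numbers and handlers below are made up):

```python
# Sketch: defining "new instructions" via a trap handler, in the spirit of
# PDP-10 UUOs / ITS system calls. Opcode values and handlers are invented.

IMPLEMENTED = {0o200}    # pretend only opcode 0o200 exists in hardware

def opcode_of(word):
    """Top 9 bits of a 36-bit PDP-10 word hold the opcode."""
    return (word >> 27) & 0o777

def uuo_handler(word, table):
    """Trap handler: decode the trapped word and emulate it in software."""
    return table[opcode_of(word)](word)

def execute(word, table):
    if opcode_of(word) in IMPLEMENTED:
        return "hardware"
    return uuo_handler(word, table)   # slow path: trap + software emulation

table = {0o100: lambda w: "emulated syscall"}
word = (0o100 << 27) | 0o42           # a word whose opcode traps
print(execute(word, table))           # handled by the UUO table
```

The trap indirection is exactly why such "instructions" were slow but quite capable: each one costs a full trap and software decode rather than a microcode cycle.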
On Jun 23, 2014, at 1:38 PM, Michael Greenwald <mbgreen@seas.upenn.edu> wrote:
On 2014-06-23 10:34, Michael Greenwald wrote:
On 2014-06-23 07:07, Henry Baker wrote:
FYI -- Lots of potential math applications...

https://communities.intel.com/community/itpeernetwork/datastack/blog/2014/06...

I'm curious whether you (or anyone here) thinks this is really so different from programmable micro-code -- e.g., writable control store?
To clarify: I meant "different" in terms of impact or use by customers.
[I know that Lisp Machines (well, Symbolics' machines until the XL series) had this capability, and so did some vaxen (the -WCS versions). I've heard that early IBM 360's or 370's had this too, but I have no direct experience with modifying the microcode on those.] So I wonder how revolutionary this Intel move will really be. My impression (no hard data, though) is that users are often really excited by the idea, but very, very few actually use this capability in practice.
And, I could just have the wrong impression: maybe writable control store *was* used a lot more than I believe.
Disrupting the Data Center to Create the Digital Services Economy
Posted by Diane Bryant in The Data Stack on Jun 18, 2014 12:56:03 PM

.... But what we find even more exciting is our next innovation in processor design that can dramatically increase application performance through fully custom accelerators. **** We are integrating our industry-leading Xeon processor with a coherent FPGA in a single package, socket compatible with our standard Xeon E5 processor offerings. ****

Why are we excited by this announcement? The FPGA provides our customers a programmable, high-performance coherent acceleration capability to turbo-charge their critical algorithms. And with down-the-wire reprogrammability, the algorithms can be changed as new workloads emerge and compute demands fluctuate.

Based on industry benchmarks, FPGA-based accelerators can deliver >10X performance gains. By integrating the FPGA with the Xeon processor, we estimate that customers will see an additional 2X in performance thanks to the low-latency, coherent interface.

Our new Xeon+FPGA solution provides yet another customized option, one more tool for customers to use to improve their critical data center metric of Performance/TCO. It highlights our commitment to delivering the very best solutions across all data center workloads and our passion to lead in the transformation of the industry to cloud services. ....