From what I understand, NAND flash memory consists of rather large "eraseblocks" (a.k.a. "gcblocks") of perhaps 64KBytes, which can be read and written ("programmed") in smaller pages of perhaps 512 bytes. The key point is that programming can only change 1 bits to 0 bits, so the "write" operation here is really an AND with the block's existing contents; only an erase operation, which works on a whole eraseblock, resets the block to all 1's.
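To make these semantics concrete, here's a toy Python sketch of the model I have in mind (the 64KB/512-byte figures and all of the names here are just my assumptions, not any real controller's interface):

# Toy model of NAND flash semantics: programming can only clear bits,
# so a "write" is an AND with the existing contents; only an erase
# resets the whole block to 1's (and causes wear).

PAGE_SIZE  = 512          # bytes per programmable page (assumed)
BLOCK_SIZE = 64 * 1024    # bytes per eraseblock (assumed)
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE

class EraseBlock:
    def __init__(self):
        self.data = bytearray(b'\xff' * BLOCK_SIZE)   # erased = all 1's
        self.erase_count = 0                          # "wear" counter

    def program(self, page, payload):
        """Program one page: new contents = old contents AND payload."""
        assert len(payload) == PAGE_SIZE
        off = page * PAGE_SIZE
        for i, b in enumerate(payload):
            self.data[off + i] &= b

    def read(self, page):
        off = page * PAGE_SIZE
        return bytes(self.data[off:off + PAGE_SIZE])

    def erase(self):
        """Erasing is the only operation that causes wear."""
        self.data = bytearray(b'\xff' * BLOCK_SIZE)
        self.erase_count += 1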
Obviously, the usual "FTL" and file system layers cover all of this up -- for good reasons, including bad-block handling and "wear leveling", i.e., avoiding erasing any one block too many times. The question here is whether the ability to perform a simple logic operation -- AND, in this case -- can be put to good use. (Let's ignore for the moment that these devices are slower than RAM memories.)

The most obvious use which comes to mind is a bitvector for keeping track of which eraseblocks are free; the most obvious convention would be that a "1" bit indicates that the block is free! Thus, a single 64KB eraseblock (2^19 bits) could keep track of the free/allocated status of 2^19 x 64KB = 32GB of flash memory.

Note that simply _reading_ or _AND'ing into memory_ doesn't cause any "wear"; the wear is caused by _erasing_ a complete eraseblock. So in order to make good use of these blocks, we have to arrange our data structures so that things which all get erased at the same time live in the same eraseblock, and things which need to be erased independently live in separate eraseblocks.

Next, we notice that algorithms which live in a lattice and which compute LUBs or GLBs can be mapped onto such a memory -- e.g., lattice elements that are long bitvectors. If each bitvector occupied a separate eraseblock, then an optimizing algorithm that computed a GLB (or, with a complemented encoding, a LUB) could be mapped onto such a flash memory, and would only cause "wear" at the end of each computation, when the bitvectors are reset.

We could also do Connection Machine-style bit-serial processing with 2^19 one-bit processors, but for wear leveling we would treat the memory as WORM (write-once, read-many), and recycle the eraseblocks only after all of them had been used at least once. (Note that in CM-style processing, bitvector i holds the state of the i'th memory bit of all 2^19 processors in a SIMD array. Also, any ALU ops are "free", because they can be performed by the processor accessing the NAND flash memory.)

Can we do faster bit-matrix multiplication? Vaughan Pratt wrote a paper about long-bitvector machines, but I'm not sure how much of that would carry (!) over to this model. Can lattices other than Boolean lattices be emulated with these long AND operations? Any other ideas?
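Here's a rough sketch of the free-block bitmap (again Python, names hypothetical). Allocating a block is just an AND-write that clears one bit, so it causes no wear; the 2^19 bits of one 64KB block cover 2^19 x 64KB = 32GB of eraseblocks. Only marking blocks free again requires erasing the bitmap block:

# Free-block bitmap kept in a single 64KB eraseblock: bit i == 1 means
# eraseblock i is free.  Clearing a bit is an ordinary program (AND)
# operation; setting bits back to 1 would require an erase.

BITMAP_BYTES   = 64 * 1024                    # one eraseblock
BLOCKS_TRACKED = BITMAP_BYTES * 8             # 2**19 = 524,288 blocks
TRACKED_BYTES  = BLOCKS_TRACKED * 64 * 1024   # 2**35 bytes = 32GB

bitmap = bytearray(b'\xff' * BITMAP_BYTES)    # freshly erased: all free

def allocate(i):
    """Claim eraseblock i by clearing its bit (a wear-free AND-write)."""
    bitmap[i // 8] &= ~(1 << (i % 8)) & 0xff

def is_free(i):
    return bool(bitmap[i // 8] & (1 << (i % 8)))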
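And a sketch of the lattice idea, reusing the hypothetical EraseBlock class above. The accumulator block starts erased (all 1's = the top element), and each element is simply programmed over it; since programming is an AND, the block ends up holding the GLB of everything written into it, and wear occurs only once per computation, when the accumulator is (re-)erased:

# Meet (GLB) of long bitvectors, accumulated directly in flash.
# With a complemented encoding the same trick computes a LUB.

def glb_in_flash(block, elements):
    """elements: iterable of BLOCK_SIZE-byte bitvectors."""
    block.erase()                       # accumulator := top (all 1's)
    for vec in elements:
        for page in range(PAGES_PER_BLOCK):
            block.program(page, vec[page * PAGE_SIZE:(page + 1) * PAGE_SIZE])
    return bytes(block.data)            # reading back is wear-free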
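Finally, a sketch of the CM-style bit-serial arithmetic, with each bit-plane modeled as a 2^19-bit Python integer. The host processor does the Boolean ops "for free"; in the WORM scheme each result plane would be written once into a fresh, already-erased block, and blocks would be recycled only after all of them had been used:

# Bit-serial SIMD addition: plane j of an n-bit variable is one
# 2**19-bit vector holding bit j for all 2**19 virtual processors.
# Both operands are assumed to have the same number of planes.

NPROC = 1 << 19                     # one virtual processor per bit
MASK  = (1 << NPROC) - 1

def bit_serial_add(a_planes, b_planes):
    """Ripple-carry add, one bit-plane at a time (LSB plane first)."""
    carry = 0
    out = []
    for a, b in zip(a_planes, b_planes):
        out.append((a ^ b ^ carry) & MASK)        # sum plane
        carry = (a & b) | (carry & (a ^ b))       # carry plane
    out.append(carry & MASK)                      # final carry-out plane
    return out                                    # each plane -> one WORM write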
participants (2): Henry Baker, Whitfield Diffie