linux-kernel - Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 19 Mar 2007 17:32:05 -0500
From:	Matt Mackall <mpm@...enic.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Josh Boyer <jwboyer@...ux.vnet.ibm.com>,
	Artem Bityutskiy <dedekind@...radead.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Frank Haverkamp <haver@...t.ibm.com>,
	Christoph Hellwig <hch@...radead.org>,
	David Woodhouse <dwmw2@...radead.org>
Subject: Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images

On Mon, Mar 19, 2007 at 10:05:29PM +0100, Thomas Gleixner wrote:
> On Mon, 2007-03-19 at 14:54 -0500, Matt Mackall wrote:
> > > (UBI also has static volumes which LVM doesn't but that is an aside.)
> > 
> > If a static volume is simply a non-dynamic volume, then device mapper
> > can do that too. And countless other things. Which is not an aside.
> > UBI growing to do all the things that device mapper does is exactly
> > the thing we should be seeking to avoid.
> 
> No it can't and device mapper sits on top of block devices. FLASH is no
> block device. Period.

Which of the following two properties does it lack?

- discrete blocks
- non-sequential access to blocks

When you do the obvious s/blocks/eraseblocks/, this appears to be
true.

Saying "but I can't do I/O smaller than the blocksize" doesn't change
this any more than it would for disks.

Saying "but I can do smaller I/O efficiently in some circumstances"
also doesn't change it.

In historical UNIX, some tapes were block devices too. Because they
supported seek().

> Device mapper can not provide a simple easy to decode scheme for boot
> loaders. We need to be able to boot out of 512 - 2048 byte of NAND FLASH
> and be able to find the kernel or second stage boot loader in this
> unordered device.
> 
> And no, fixed addresses do not work. Do you want to implement device
> mapper into your Initialial Bootloader stage ?

This is exactly the same problem as booting on a desktop PC. But
somehow LILO manages. My first Linux box had a hell of a lot less disk
than the platform I bootstrapped (and wrote NAND drivers for) last
month had in NAND.

> > > That's why I suggested fixing the MTD layers that present block devices
> > > first in the part of my reply that you cut off.  It seems to me that
> > > you're really after getting flash to look like a block device, which
> > > would enable device mapper to be used for something similar to UBI.
> > > That's fine, but until someone does that work UBI fills a need, has
> > > users, and has an existing implementation.
> > 
> > False starts that get mainlined delay or prevent things getting done
> > right. The question is and remains "is UBI the right way to do
> > things?" Not "is UBI the easiest way to do things?" or "is UBI
> > something people have already adopted?"
> > 
> > If the right way is instead to extend the block layer and device
> > mapper to encompass the quirks of NAND in a sensible fashion, then UBI
> > should not go in.
> 
> No, block layer on top of FLASH needs 80% of the functionality of UBI in
> the first place.

Incorrect. A block-based filesystem on top of flash needs this
functionality. But a block device suitable to device mapper layering
(which then provides the functionality) does not.

> You need to implement a clever journalling block device
> emulator in order to keep the data alive and the FLASH not weared out
> within no time. You need the wear levelling, otherwise you can throw
> away your FLASH in no time.

And that's why it's in my picture.

> > Let me draw a picture so we have something to argue about:
> > 
> >                      iSCSI/nbd(6)
> >                           |
> > filesystem {        swap  |  ext3        ext3     jffs2
> >                       \   |   |            |       /
> >                /       \  | dm-crypt->snapshot(5) /
> > device mapper -|        \ \   |                  /
> >                |         partitioning           /
> >                |              |          partitioning(4)
> >                |        wear leveling(3)  /
> >                |              |          /
> >                |      block concatenation
> >                |       |    |    |     |
> >                \      bad block remapping(2)   
> >                        |    |    |     |
> > MTD raw block {     raw block devices with no smarts(1)
> >                       /     |     \      \
> > hardware {         NAND    NAND   NAND   NAND
> > 
> > Notes:
> > 1. This would provide a block device that allowed writing pages and
> >    a secondary method for erasing whole blocks as well as a method for
> >    querying/setting out of band information.
> 
> Forget about OOB data. OOB data is reserved for ECC. Please read the
> recommendations of the NAND FLASH manufacturers. NAND gets less reliable
> with higher density devices and smaller processes.
> 
> > 2. This would hide erase blocks either by using an embedded table or
> >    out of band info. This could stack on top of block concatenation if
> >    desired.
> 
> Hide erase blocks ? UBI does not hide anything. It maps logical
> eraseblocks, which are exposed to the clients to arbitrary physical
> eraseblocks on the FLASH device in order to provide across device wear
> levelling.

Sorry, I meant hiding bad blocks here. That's why this layer was
labeled "bad block remapping".

> > 3. This would provide wear leveling, and probably simultaneously
> >    provide relatively efficient and safe access to write sector 
> >    and page-sized I/O. Below this level, things had better be
> >    comfortable with the limitations of NAND if they want to work well.
> 
> I don't see how this provides across device wear levelling.

Because the layer immediately beneath it ("block concatenation") takes
N devices and presents one logical device.

> > 4. JFFS2 has its own wear-leving scheme, as do several other
> >    filesystems, so they probably want to bypass this piece of the stack.
> 
> JFFS2 on top of UBI delegates the wear levelling to UBI, as JFFS2s own
> wear levelling sucks. 

Ok, fine. How about LogFS, then?
 
> > 5. We don't reimplement higher pieces of the stack (dm-crypt,
> >    snapshot, etc.).
> 
> Why should we reimplement that ?

So that you can get encryption and snapshot, etc.?

> > 6. We make some things possible that simply aren't otherwise.
> >
> > And this picture isn't even interesting yet. Imagine a dm-cache layer
> > that caches data read from disks in high-speed flash. Or using
> > dm-mirror to mirror writes to local flash over NBD or to a USB drive.
> > Neither of these can be done 'right' in a stack split between device
> > mapper and UBI.
> 
> Err. Implement a clever block layer on top of UBI and use all the
> goodies you want including device mapper.

If I wanted to have both device mapper and device mapper's little
brother in my kernel, I wouldn't have started this thread.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/