linux-kernel - Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070319195442.GT4892@waste.org>
Date:	Mon, 19 Mar 2007 14:54:42 -0500
From:	Matt Mackall <mpm@...enic.com>
To:	Josh Boyer <jwboyer@...ux.vnet.ibm.com>
Cc:	Artem Bityutskiy <dedekind@...radead.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Frank Haverkamp <haver@...t.ibm.com>,
	Christoph Hellwig <hch@...radead.org>,
	David Woodhouse <dwmw2@...radead.org>
Subject: Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images

On Mon, Mar 19, 2007 at 01:16:28PM -0500, Josh Boyer wrote:
> On Mon, 2007-03-19 at 12:08 -0500, Matt Mackall wrote:
> > 
> > > > If the end goal is to end up with something that looks like a block
> > > > device (which seems to be implied by adding transparent wear leveling
> > > 
> > > Nope, not the end goal.  It's more about wear-leveling across the entire
> > > flash chip than it is presenting a "block like" device.
> > 
> > It seems to be about spanning devices and repartitioning as well.
> > Hence the analogy with LVM.
> 
> Yes, it can span multiple MTDs which spreads the wear-leveling even
> more.  Yes, it can create/resize/remove volumes.  It does that
> differently than LVM, but the ideas are related.  I don't see the issue
> here I guess.

The issue is 14000 lines of patch to make a parallel subsystem.

> (UBI also has static volumes which LVM doesn't but that is an aside.)

If a static volume is simply a non-dynamic volume, then device mapper
can do that too. And countless other things. Which is not an aside.
UBI growing to do all the things that device mapper does is exactly
the thing we should be seeking to avoid.
 
> > > > and bad block remapping), then I don't see any reason it can't be done
> > > > in device mapper. The 'smarts' of mtdblock could in fact be pulled up
> > > 
> > > There is nothing smart about mtdblock.  And mtdblock has nothing to do
> > > with UBI.
> > 
> > Note the scare quotes. Device mapper runs on top of a block device.
> > And mtdblock is currently the block interface that MTD exports. And it
> > has 'smarts' that hide handling of sub-eraseblock I/O. I'm clearly
> > talking about an approach that doesn't involve UBI at all.
> 
> Ok, but what I'm saying is that using device mapper on top of mtdblock
> is not a good solution.  mtdblock caches writes within an eraseblock to
> a DRAM buffer of eraseblock size.  If you get a power failure before
> that is flushed out, you lose an entire eraseblock's worth of data.

Sigh. That's precisely why I talked about moving said smarts. This is
nothing that a higher level remapping layer can't address.
 
> That's why I suggested fixing the MTD layers that present block devices
> first in the part of my reply that you cut off.  It seems to me that
> you're really after getting flash to look like a block device, which
> would enable device mapper to be used for something similar to UBI.
> That's fine, but until someone does that work UBI fills a need, has
> users, and has an existing implementation.

False starts that get mainlined delay or prevent things getting done
right. The question is and remains "is UBI the right way to do
things?" Not "is UBI the easiest way to do things?" or "is UBI
something people have already adopted?"

If the right way is instead to extend the block layer and device
mapper to encompass the quirks of NAND in a sensible fashion, then UBI
should not go in.

Let me draw a picture so we have something to argue about:

                     iSCSI/nbd(6)
                          |
filesystem {        swap  |  ext3        ext3     jffs2
                      \   |   |            |       /
               /       \  | dm-crypt->snapshot(5) /
device mapper -|        \ \   |                  /
               |         partitioning           /
               |              |          partitioning(4)
               |        wear leveling(3)  /
               |              |          /
               |      block concatenation
               |       |    |    |     |
               \      bad block remapping(2)   
                       |    |    |     |
MTD raw block {     raw block devices with no smarts(1)
                      /     |     \      \
hardware {         NAND    NAND   NAND   NAND

Notes:
1. This would provide a block device that allowed writing pages and
   a secondary method for erasing whole blocks as well as a method for
   querying/setting out of band information.
2. This would hide erase blocks either by using an embedded table or
   out of band info. This could stack on top of block concatenation if
   desired.
3. This would provide wear leveling, and probably simultaneously
   provide relatively efficient and safe access to write sector 
   and page-sized I/O. Below this level, things had better be
   comfortable with the limitations of NAND if they want to work well.
4. JFFS2 has its own wear-leving scheme, as do several other
   filesystems, so they probably want to bypass this piece of the stack.
5. We don't reimplement higher pieces of the stack (dm-crypt,
   snapshot, etc.).
6. We make some things possible that simply aren't otherwise.

And this picture isn't even interesting yet. Imagine a dm-cache layer
that caches data read from disks in high-speed flash. Or using
dm-mirror to mirror writes to local flash over NBD or to a USB drive.
Neither of these can be done 'right' in a stack split between device
mapper and UBI.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/