Message-Id: <201103221456.32151.arnd@arndb.de>
Date: Tue, 22 Mar 2011 14:56:31 +0100
From: Arnd Bergmann <arnd@...db.de>
To: Andreas Dilger <adilger@...ger.ca>
Cc: Andrei Warkentin <andreiw@...orola.com>, linux-mmc@...r.kernel.org,
linux-ext4@...r.kernel.org
Subject: Re: [RFC 4/5] MMC: Adjust unaligned write accesses.
On Tuesday 22 March 2011, Andreas Dilger wrote:
> On 2011-03-21, at 8:05 PM, Arnd Bergmann wrote:
> > On Monday 21 March 2011 19:03:09 Andreas Dilger wrote:
> >> Note that mballoc was specifically designed to handle allocation
> >> requests that are aligned on RAID stripe boundaries, so it should
> >> be able to handle this for MMC as well. What is needed is to tell
> >> the filesystem what the underlying alignment is. That can be done
> >> at format time with mke2fs or afterward with tune2fs by using the
> >> "-E stripe_width" option.
> >
> > Ah, that sounds useful. So would I set the stripe_width to the
> > erase block size, and the block group size to a multiple of that?
>
> When you write "block group size" do you mean the ext4 block group?
Yes.
> Then yes it would help. You could also consider setting the flex_bg
> size to a multiple of this, so that the bitmap blocks are grouped as
> a multiple of this size. However, they may not be aligned correctly,
> which needs extra effort that isn't obvious.
>
> I think it would be nice to have mke2fs take the stripe_width and/or
> flex_bg factor into account when sizing/aligning the bitmaps, but it
> doesn't yet.
A few more questions:
* On cards that can only write to a single erase block at a time,
should I make the block group size the same as the erase block
size? Otherwise, I suppose writing the block bitmap, inode and
data to separate erase blocks would cause multiple erase block
read-modify-write cycles for every single file.
* Is it guaranteed that inode bitmap, inode, block bitmap and
blocks are always written in low-to-high sector order within
one ext4 block group? A lot of the drives will do a garbage-collect
step (adding hundreds of milliseconds) every time you move back
to a lower address inside the erase block.
* Is there any way to make ext4 use effective blocks larger
than 4 KB? The most common size for a NAND flash page is 16
KB right now (effectively, ignoring what the hardware does), so
it would be good to never write anything smaller.
* Calling TRIM on SD cards is probably counterproductive unless
you trim entire erase blocks. Is that even possible with ext4,
assuming that we use block group == erase block?
* Is there a way to put the journal into specific parts of the
drive? Almost all SD cards have an area in the second 4 MB
(more for larger cards) that can be written using random access
without forcing garbage collection on other parts.
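To make the "block group == erase block" and whole-erase-block TRIM
questions above concrete, here is a rough sketch of the arithmetic
involved. The 4 MiB erase block and 4 KiB filesystem block are only
example values, and all function names are made up for illustration;
none of this is existing ext4 or mke2fs code. The group size check
only encodes two limits: the block count must be a multiple of 8,
and it cannot exceed 8 * block size because one bitmap block has one
bit per block in the group.

```python
# Sketch only: the sizes and all names below are illustrative assumptions.
FS_BLOCK = 4 * 1024              # ext4 block size in bytes (example value)
ERASE_BLOCK = 4 * 1024 * 1024    # assumed erase block size in bytes


def blocks_per_erase_block(erase_block=ERASE_BLOCK, fs_block=FS_BLOCK):
    """Number of filesystem blocks that fit in one erase block."""
    if erase_block % fs_block:
        raise ValueError("erase block is not a multiple of the fs block size")
    return erase_block // fs_block


def valid_group_size(blocks, fs_block=FS_BLOCK):
    """Check the two limits named above: multiple of 8, and no more
    blocks than a single bitmap block has bits (8 * fs_block)."""
    return blocks % 8 == 0 and 0 < blocks <= 8 * fs_block


def trim_whole_erase_blocks(start, end, erase_block=ERASE_BLOCK):
    """Shrink the free byte range [start, end) to whole erase blocks.

    Returns (first, last) as byte offsets, or None if the range does
    not contain a single complete erase block.
    """
    first = -(-start // erase_block) * erase_block   # round start up
    last = (end // erase_block) * erase_block        # round end down
    return (first, last) if first < last else None


if __name__ == "__main__":
    n = blocks_per_erase_block()
    print("blocks per erase block:", n, "usable as group size:", valid_group_size(n))
    print("trimmable range:", trim_whole_erase_blocks(1, 3 * ERASE_BLOCK - 1))
```

With these example values, one erase block holds 1024 filesystem
blocks, which would be the number to use for both the block group
size and the stripe width. A free range that starts or ends in the
middle of an erase block loses its partial ends before being trimmed.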
> > Does this also work in (rare) cases where the erase block size is
> > not a power of two?
>
> It does (or is supposed to), but that isn't code that is exercised
> very much (most installations use a power-of-two size).
Ok. Recently, cheap TLC (three-level cell, 3-bit MLC) NAND is
becoming popular. I've seen erase block sizes of 6 MiB, 1376 KiB
(4096 / 3, rounded up) and 4128 KiB (1376 * 3) because of this, in
place of the common 4096 KiB. The SD card standard specifies
values of 12 MB and 24 MB aside from the usual power-of-two values
up to 64 MB for large cards (>32GB), while smaller cards are allowed
only up to 4 MB erase blocks and need to be power-of-two. Many
cards do not use the size they claim in their registers.
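The arithmetic behind the TLC sizes above can be checked with a few
lines. This is only a sketch: the 16 KiB rounding granularity is an
assumption (chosen to match the NAND page size mentioned earlier and
because it reproduces the 1376 KiB figure), and the helper names are
invented for illustration.

```python
# Sketch only: PAGE_KIB as rounding granularity is an assumption.
PAGE_KIB = 16


def third_rounded_up(size_kib, granule_kib=PAGE_KIB):
    """One third of size_kib, rounded up to a multiple of granule_kib."""
    units = -(-size_kib // (3 * granule_kib))   # ceil(size / (3 * granule))
    return units * granule_kib


def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and n & (n - 1) == 0


if __name__ == "__main__":
    third = third_rounded_up(4096)
    # Sizes in KiB: the common case, the two derived TLC sizes, and 6 MiB.
    for size_kib in (4096, third, 3 * third, 6 * 1024):
        print(size_kib, "KiB",
              "power of two" if is_power_of_two(size_kib) else "not a power of two",
              "->", size_kib // 4, "blocks of 4 KiB")
```

Under that assumption, 4096 / 3 rounds up to 1376 KiB and three of
those give 4128 KiB. None of the TLC sizes is a power of two, but all
of them are still whole multiples of a 4 KiB filesystem block.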
Arnd