Message-ID: <CA+55aFzh_8HExAxaivwVFyyHTJu2R3Y=QL2r7T_7ht+yYPfzLw@mail.gmail.com>
Date: Thu, 29 Nov 2012 10:12:30 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Chris Mason <chris.mason@...ionio.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Chris Mason <clmason@...ionio.com>,
Mikulas Patocka <mpatocka@...hat.com>,
Jens Axboe <axboe@...nel.dk>,
Jeff Chua <jeff.chua.linux@...il.com>,
Lai Jiangshan <laijs@...fujitsu.com>, Jan Kara <jack@...e.cz>,
lkml <linux-kernel@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)

On Thu, Nov 29, 2012 at 9:51 AM, Chris Mason <chris.mason@...ionio.com> wrote:
>
> The bigger question is do we have users that expect to be able to set
> the blocksize after mmaping the block device (no writes required)? I
> actually feel a little bad for taking up internet bandwidth asking, but
> it is a change in behaviour.

Yeah, it is. That said, I don't think people will really notice.
Nobody mmaps block devices outside of some databases, afaik, and
nobody sane mounts a partition at the same time a DB is using it. So I
think the new EBUSY check is *ugly*, but I don't realistically believe
that it is a problem. The ugliness of the locking is why I'm not a
huge fan of it, but if it works I can live with it.

But yes, the mmap tests are new with the locking, and could in theory
be problematic if somebody reports that it breaks anything.

And like the locking, they'd just go away if we just do the
fs/buffer.c approach instead. Because doing things in fs/buffer.c
simply means that we don't care (and serialization is provided by the
page lock on a per-page basis, which is what mmap relies on anyway).

So doing the per-page fs/buffer.c approach (along with the
"ACCESS_ONCE()" on inode->i_blkbits to make sure we get *one*
consistent value, even if we don't care *which* value it is) would
basically revert to all the old semantics. The only thing it would
change is that we wouldn't see oopses.
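
To illustrate the ACCESS_ONCE() point, here is a minimal user-space sketch. The macro mirrors the kernel definition of that era (later kernels spell it READ_ONCE()); `sketch_inode`, `sketch_layout`, `sketch_layout_for()` and the 4 KiB page size are hypothetical stand-ins, not kernel code. The point is that the block size and the blocks-per-page count are both derived from a single load of i_blkbits. Reading the field twice during a concurrent set_blocksize() could pair one value's block size with the other value's block count, so the buffers would no longer tile the page exactly.

```c
#include <assert.h>

/* Kernel-style ACCESS_ONCE (the macro of that era): force exactly one load
 * through a volatile lvalue so the compiler cannot re-read the field. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

#define SKETCH_PAGE_SHIFT 12u   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct inode; only the field we need. */
struct sketch_inode {
    unsigned int i_blkbits;     /* log2 of the block size */
};

struct sketch_layout {
    unsigned int blocksize;        /* bytes per buffer */
    unsigned int blocks_per_page;  /* buffers needed to cover one page */
};

/* Derive every size from ONE snapshot of i_blkbits. If i_blkbits changes
 * concurrently, we get either the old or the new geometry, never a mix. */
static struct sketch_layout sketch_layout_for(struct sketch_inode *inode)
{
    unsigned int bits = ACCESS_ONCE(inode->i_blkbits);
    struct sketch_layout l;

    assert(bits >= 9 && bits <= SKETCH_PAGE_SHIFT); /* 512 B .. page size */
    l.blocksize = 1u << bits;
    l.blocks_per_page = 1u << (SKETCH_PAGE_SHIFT - bits);
    return l;
}
```

For example, with i_blkbits = 10 the sketch yields 1024-byte blocks and 4 blocks per page. Whichever value a racing writer leaves in the field, the two results stay consistent with each other, which is the "we don't care *which* value" property.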

(And in theory, it would allow us to actively mix-and-match different
block sizes for a block device, but realistically I don't think there
are any actual users of that - although I could imagine that a
filesystem would use a smaller block size for file tail blocks, etc.,
and still want to use the fs/buffer.c code, so it's *possible* that it
would be useful, but filesystems have been able to do things like that
by just doing their buffers by hand anyway, so it's not really
fundamentally new, just a possible generalization of code)
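
The per-page idea, including the mix-and-match case, can be modeled the same way. In this hypothetical sketch a spin on a C11 atomic_flag stands in for the kernel's sleeping page lock, and each `sketch_page` records the block size its buffers were built with. The first caller to attach buffers snapshots i_blkbits once under the page lock; after that the page keeps its size even if i_blkbits changes, so two pages of the same device can end up with different block sizes. None of these names (`sketch_page`, `sketch_lock_page()`, `sketch_page_blkbits()`) are kernel APIs, and the 512 B to 4 KiB range is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Kernel-style single-load macro (ACCESS_ONCE of that era). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for struct inode. */
struct sketch_inode {
    unsigned int i_blkbits;
};

/* Hypothetical stand-in for a page-cache page that may carry buffers. */
struct sketch_page {
    atomic_flag locked;    /* stand-in for the page lock */
    bool has_buffers;      /* buffers already attached? */
    unsigned int blkbits;  /* block size (log2) those buffers use */
};

static void sketch_lock_page(struct sketch_page *p)
{
    /* Spin until we own the flag; the real page lock sleeps instead. */
    while (atomic_flag_test_and_set_explicit(&p->locked, memory_order_acquire))
        ;
}

static void sketch_unlock_page(struct sketch_page *p)
{
    atomic_flag_clear_explicit(&p->locked, memory_order_release);
}

/* Return the block size (log2) used by this page. If the page has no
 * buffers yet, attach them using one snapshot of the inode's current
 * i_blkbits; otherwise keep whatever size they were built with. */
static unsigned int sketch_page_blkbits(struct sketch_page *p,
                                        struct sketch_inode *inode)
{
    unsigned int bits;

    sketch_lock_page(p);
    if (!p->has_buffers) {
        p->blkbits = ACCESS_ONCE(inode->i_blkbits);
        p->has_buffers = true;
    }
    bits = p->blkbits;
    sketch_unlock_page(p);

    assert(bits >= 9 && bits <= 12); /* assumed: 512 B up to a 4 KiB page */
    return bits;
}
```

With i_blkbits at 12, a page that gets buffers keeps 4096-byte blocks even after i_blkbits drops to 10, while a page first touched afterwards uses 1024-byte blocks. Serialization is purely per page, so no device-wide lock or mmap check is involved.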

Linus