linux-kernel - Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+55aFyXdtVJ=SG9bU4qfggCR6DPvz4vOFYxKJZx_WdyFv+3Fw@mail.gmail.com>
Date:	Wed, 28 Nov 2012 12:03:23 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Mikulas Patocka <mpatocka@...hat.com>
Cc:	Jens Axboe <axboe@...nel.dk>,
	Jeff Chua <jeff.chua.linux@...il.com>,
	Lai Jiangshan <laijs@...fujitsu.com>, Jan Kara <jack@...e.cz>,
	lkml <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] Introduce a method to catch mmap_region (was: Recent
 kernel "mount" slow)

On Wed, Nov 28, 2012 at 11:50 AM, Mikulas Patocka <mpatocka@...hat.com> wrote:
>
> mmap_region() doesn't care about the block size. But a lot of
> page-in/page-out code does.

That seems a bogus argument.

mmap() is in *no* way special. The exact same thing happens for
regular read/write. Yet somehow the mmap code is special-cased, while
the normal read-write code is not.

I suspect it might be *easier* to trigger some issues with mmap, but
that still isn't a good enough reason to special-case it. We don't add
locking to one please just because that one place shows some race
condition more easily. We fix the locking.

So for example, maybe the code that *actually* cares about the buffer
size (the stuff that allocates buffers in fs/buffer.c) needs to take
that new percpu read lock. Basically, any caller of
"alloc_page_buffers()/create_empty_buffers()" or whatever.

I also wonder whether we need it *at*all*. I suspect that we could
easily have multiple block-sizes these days for the same block device.
It *used* to be (millions of years ago, when dinosaurs roamed the
earth) that the block buffers were global and shared with all users of
a partition. But that hasn't been true since we started using the page
cache, and I suspect that some of the block size changing issues are
simply entirely stale.

Yeah, yeah, there could be some coherency issues if people write to
the block device through different block sizes, but I think we have
those coherency issues anyway. The page-cache is not coherent across
different mapping inodes anyway.

So I really suspect that some of this is "legacy logic". Or at least
perhaps _should_ be.

                    Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/