lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+55aFyXS0nkDB95c3pcdg1ctrNnUGO8aDuRNhKSEg3JrOqWhA@mail.gmail.com>
Date:	Wed, 28 Nov 2012 12:13:27 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Mikulas Patocka <mpatocka@...hat.com>
Cc:	Jens Axboe <axboe@...nel.dk>,
	Jeff Chua <jeff.chua.linux@...il.com>,
	Lai Jiangshan <laijs@...fujitsu.com>, Jan Kara <jack@...e.cz>,
	lkml <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] Introduce a method to catch mmap_region (was: Recent
 kernel "mount" slow)

On Wed, Nov 28, 2012 at 12:03 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> mmap() is in *no* way special. The exact same thing happens for
> regular read/write. Yet somehow the mmap code is special-cased, while
> the normal read-write code is not.

I just double-checked, because it's been a long time since I actually
looked at the code.

But yeah, block device read/write uses the pure page cache functions.
IOW, it has the *exact* same IO engine as mmap() would have.

So here's my suggestion:

 - get rid of *all* the locking in aio_read/write and the splice paths
 - get rid of all the stupid mmap games

 - instead, add them to the functions that actually use
"blkdev_get_block()" and "blkdev_get_blocks()" and nowhere else.

   That's a fairly limited number of functions:
blkdev_{read,write}page(), blkdev_direct_IO() and
blkdev_write_{begin,end}()

Doesn't that sounds simpler? And more logical: it protects the actual
places that use the block size of the device.

I dunno. Maybe there is some fundamental reason why the above is
broken, but it seems to be a much simpler approach. Sure, you need to
guarantee that the people who get the write-lock cannot possibly cause
IO while holding it, but since the only reason to get the write lock
would be to change the block size, that should be pretty simple, no?

Yeah, yeah, I'm probably missing something fundamental, but the above
sounds like the simple approach to fixing things. Aiming for having
the block size read-lock be taken by the things that pass in the
block-size itself.

It would be nice for things to be logical and straightforward.

                   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ