lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+55aFxYCxsRM==WMGihQGOA7f5u2X_8boJypsxS2v=WMnr70g@mail.gmail.com>
Date:	Wed, 28 Nov 2012 18:58:40 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Mikulas Patocka <mpatocka@...hat.com>
Cc:	Jens Axboe <axboe@...nel.dk>,
	Jeff Chua <jeff.chua.linux@...il.com>,
	Lai Jiangshan <laijs@...fujitsu.com>, Jan Kara <jack@...e.cz>,
	lkml <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] Introduce a method to catch mmap_region (was: Recent
 kernel "mount" slow)

On Wed, Nov 28, 2012 at 6:04 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> Interesting. The code *has* the block size (it's in "bh->b_size"), but
> it actually then uses the inode blocksize instead, and verifies the
> two against each other. It could just have used the block size
> directly (and then used the inode i_blkbits only when no buffers
> existed), avoiding that dependency entirely..

Looking more at this code, that really would be the nicest solution.

There's two cases for the whole get_block() thing:

 - filesystems. The block size will not change randomly, and
"get_block()" seriously depends on the block size.

 - the raw device. The block size *will* change, but to simplify the
problem, "get_block()" is a 1:1 mapping, so it doesn't even care about
the block size because it will always return "bh->b_blocknr = nr".

So we *could* just say that all the fs/buffer.c code should use
"inode->i_blkbits" for creating buffers (because that's the size new
buffers should always use), but use "bh->b_size" for any *existing*
buffer use.

And looking at it, it's even simple. Except for one *very* annoying
thing: several users really don't want the size of the buffer, they
really do want the *shift* of the buffer size.

In fact, that single issue seems to be the reason why
"inode->i_blkbits" is really used in fs/buffer.c.

Otherwise it would be fairly trivial to just make the pattern be just a simple

        if (!page_has_buffers(page))
                create_empty_buffers(page, 1 << inode->i_blkbits, 0);
        head = page_buffers(page);
        blocksize = head->b_size;

and just use the blocksize that way, without any other games. All
done, no silly WARN_ON() to verify against some global block-size, and
the fs/buffer.c code would be perfectly simple, and would have no
problem at all with multiple different blocksizes in different pages
(the page lock serializes the buffers and thus the blocksize at the
per-page level).

But the fact that the code wants to do things like

        block = (sector_t)page->index << (PAGE_CACHE_SHIFT - bbits);

seriously seems to be the main thing that keeps us using
'inode->i_blkbits'. Calculating bbits from bh->b_size is just costly
enough to hurt (not everywhere, but on some machines).

Very annoying.

                       Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ