Message-ID: <ZJoyz3ho7eR1ljHV@dread.disaster.area>
Date:   Tue, 27 Jun 2023 10:52:31 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Marcelo Tosatti <mtosatti@...hat.com>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>,
        Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
        Frederic Weisbecker <frederic@...nel.org>,
        Valentin Schneider <vschneid@...hat.com>,
        Leonardo Bras <leobras@...hat.com>,
        Yair Podemsky <ypodemsk@...hat.com>, P J P <ppandit@...hat.com>
Subject: Re: [PATCH] fs/buffer.c: remove per-CPU buffer_head lookup cache

On Tue, Jun 27, 2023 at 01:13:25AM +0100, Matthew Wilcox wrote:
> On Tue, Jun 27, 2023 at 09:30:09AM +1000, Dave Chinner wrote:
> > On Mon, Jun 26, 2023 at 07:47:42PM +0100, Matthew Wilcox wrote:
> > > On Mon, Jun 26, 2023 at 03:04:53PM -0300, Marcelo Tosatti wrote:
> > > > Upon closer investigation, it was found that in the current codebase, lookup_bh_lru
> > > > is slower than __find_get_block_slow:
> > > > 
> > > >  114 ns per __find_get_block
> > > >  68 ns per __find_get_block_slow
> > > > 
> > > > So remove the per-CPU buffer_head caching.
> > > 
> > > LOL.  That's amazing.  I can't even see why it's so expensive.  The
> > > local_irq_disable(), perhaps?  Your test case is the best possible
> > > one for lookup_bh_lru() where you're not even doing the copy.
> > 
> > I think it's even simpler than that.
> > 
> > i.e. the lookaside cache is being missed, so it's a pure cost and
> > the code is always having to call __find_get_block_slow() anyway.
> 
> How does that happen?
> 
> __find_get_block(struct block_device *bdev, sector_t block, unsigned size)
> {
>         struct buffer_head *bh = lookup_bh_lru(bdev, block, size);
> 
>         if (bh == NULL) {
>                 /* __find_get_block_slow will mark the page accessed */
>                 bh = __find_get_block_slow(bdev, block);
>                 if (bh)
>                         bh_lru_install(bh);
> 
> The second (and all subsequent) calls to __find_get_block() should find
> the BH in the LRU.
> 
> > IMO, this is an example of how lookaside caches are only a benefit
> > if the working set of items largely fits in the lookaside cache and
> > the cache lookup itself is much, much slower than a lookaside cache
> > miss.
> 
> But the test code he posted always asks for the same buffer each time.
> So it should find it in the lookaside cache?

Oh.

	for (i = 0; ....) {
		bh = __find_get_block(bdev, 1, 512);

That's a '1' being passed to __find_get_block, not 'i'.
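
Filling out the rest of that loop from memory (a sketch only, not
Marcelo's posted test; the iteration count, timing and bdev setup are
my assumptions):

	/*
	 * Time repeated lookups of a fixed block number.  The bdev is
	 * assumed to have been opened elsewhere; 100000 iterations is
	 * arbitrary.
	 */
	static void time_find_get_block(struct block_device *bdev)
	{
		struct buffer_head *bh;
		u64 start = ktime_get_ns();
		int i;

		for (i = 0; i < 100000; i++) {
			bh = __find_get_block(bdev, 1, 512);	/* always block 1 */
			if (bh)
				brelse(bh);	/* drop the lookup's reference */
		}
		pr_info("%llu ns per __find_get_block\n",
			(ktime_get_ns() - start) / 100000);
	}

So every iteration after the first should be hitting the same bh in the
per-cpu LRU.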

/me goes and gets more coffee.

Maybe it's CONFIG_PREEMPT_RT=y doing something to the locks that
isn't obvious here...
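
For reference, the lookaside path is roughly this shape (a simplified
sketch from memory, not the exact fs/buffer.c code; the move-to-front
on hit and the various debug checks are elided):

	#define BH_LRU_SIZE	16

	struct bh_lru {
		struct buffer_head *bhs[BH_LRU_SIZE];
	};
	static DEFINE_PER_CPU(struct bh_lru, bh_lrus);

	static struct buffer_head *lookup_bh_lru(struct block_device *bdev,
						 sector_t block, unsigned size)
	{
		struct buffer_head *ret = NULL;
		unsigned int i;

		bh_lru_lock();		/* local_irq_disable() on SMP builds */
		for (i = 0; i < BH_LRU_SIZE; i++) {
			struct buffer_head *bh = __this_cpu_read(bh_lrus.bhs[i]);

			if (bh && bh->b_bdev == bdev && bh->b_blocknr == block &&
			    bh->b_size == size) {
				get_bh(bh);	/* return a referenced bh */
				ret = bh;
				break;
			}
		}
		bh_lru_unlock();
		return ret;
	}

The scan itself is only sixteen pointer compares, so whatever
bh_lru_lock() ends up being on a CONFIG_PREEMPT_RT=y build would be the
first thing to look at.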

-Dave.
-- 
Dave Chinner
david@...morbit.com
