Message-ID: <ZJoppezn+EiLQvUm@casper.infradead.org>
Date: Tue, 27 Jun 2023 01:13:25 +0100
From: Matthew Wilcox <willy@...radead.org>
To: Dave Chinner <david@...morbit.com>
Cc: Marcelo Tosatti <mtosatti@...hat.com>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>,
Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
Frederic Weisbecker <frederic@...nel.org>,
Valentin Schneider <vschneid@...hat.com>,
Leonardo Bras <leobras@...hat.com>,
Yair Podemsky <ypodemsk@...hat.com>, P J P <ppandit@...hat.com>
Subject: Re: [PATCH] fs/buffer.c: remove per-CPU buffer_head lookup cache
On Tue, Jun 27, 2023 at 09:30:09AM +1000, Dave Chinner wrote:
> On Mon, Jun 26, 2023 at 07:47:42PM +0100, Matthew Wilcox wrote:
> > On Mon, Jun 26, 2023 at 03:04:53PM -0300, Marcelo Tosatti wrote:
> > > Upon closer investigation, it was found that in current codebase, lookup_bh_lru
> > > is slower than __find_get_block_slow:
> > >
> > > 114 ns per __find_get_block
> > > 68 ns per __find_get_block_slow
> > >
> > > So remove the per-CPU buffer_head caching.
> >
> > LOL. That's amazing. I can't even see why it's so expensive. The
> > local_irq_disable(), perhaps? Your test case is the best possible
> > one for lookup_bh_lru() where you're not even doing the copy.
>
> I think it's even simpler than that.
>
> i.e. the lookaside cache is being missed, so it's a pure cost and
> the code is always having to call __find_get_block_slow() anyway.

How does that happen?

__find_get_block(struct block_device *bdev, sector_t block, unsigned size)
{
	struct buffer_head *bh = lookup_bh_lru(bdev, block, size);

	if (bh == NULL) {
		/* __find_get_block_slow will mark the page accessed */
		bh = __find_get_block_slow(bdev, block);
		if (bh)
			bh_lru_install(bh);

The second (and all subsequent) calls to __find_get_block() should find
the BH in the LRU.

> IMO, this is an example of how lookaside caches are only a benefit
> if the working set of items largely fits in the lookaside cache and
> the cache lookup itself is much, much slower than a lookaside cache
> miss.
But the test code he posted always asks for the same buffer each time.
So it should find it in the lookaside cache?
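
To make that expectation concrete, here is a minimal userspace sketch of
the miss-then-install pattern quoted above.  It is not the kernel code:
every toy_* name, the 16-slot array and the 64-entry backing table are
made up for illustration, and there is no per-CPU handling, locking or
refcounting.  It only models the control flow, so it says nothing about
the timings reported in the patch.

```c
#include <stddef.h>

#define TOY_LRU_SIZE 16		/* hypothetical size, for illustration only */
#define TOY_NR_BLOCKS 64	/* hypothetical backing store size */

struct toy_buf {
	unsigned long block;
};

struct toy_lru {
	struct toy_buf *slots[TOY_LRU_SIZE];
	unsigned long hits;
	unsigned long misses;
};

/* Stand-in for the slow path: a plain table indexed by block number. */
static struct toy_buf toy_backing[TOY_NR_BLOCKS];

static struct toy_buf *toy_find_slow(unsigned long block)
{
	if (block >= TOY_NR_BLOCKS)
		return NULL;
	toy_backing[block].block = block;
	return &toy_backing[block];
}

/* Scan the lookaside array; on a hit, move the entry to slot 0. */
static struct toy_buf *toy_lookup_lru(struct toy_lru *lru, unsigned long block)
{
	for (size_t i = 0; i < TOY_LRU_SIZE; i++) {
		struct toy_buf *b = lru->slots[i];

		if (b && b->block == block) {
			/* Shift slots 0..i-1 down by one to make room at the front. */
			for (; i > 0; i--)
				lru->slots[i] = lru->slots[i - 1];
			lru->slots[0] = b;
			return b;
		}
	}
	return NULL;
}

/* Insert at the front; the entry in the last slot falls off the end. */
static void toy_lru_install(struct toy_lru *lru, struct toy_buf *b)
{
	for (size_t i = TOY_LRU_SIZE - 1; i > 0; i--)
		lru->slots[i] = lru->slots[i - 1];
	lru->slots[0] = b;
}

/* Same shape as the quoted snippet: try the LRU, fall back, then install. */
static struct toy_buf *toy_find_get_block(struct toy_lru *lru,
					  unsigned long block)
{
	struct toy_buf *b = toy_lookup_lru(lru, block);

	if (b == NULL) {
		lru->misses++;
		b = toy_find_slow(block);
		if (b)
			toy_lru_install(lru, b);
	} else {
		lru->hits++;
	}
	return b;
}
```

Calling toy_find_get_block() repeatedly with one block number on a
zero-initialised struct toy_lru takes the slow path only on the first
call; every later call is counted as a hit, which is the behaviour the
question above assumes for the posted test.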