 
Date:   Mon, 26 Jun 2023 23:23:29 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Marcelo Tosatti <mtosatti@...hat.com>
Cc:     linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>,
        Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
        Frederic Weisbecker <frederic@...nel.org>,
        Dave Chinner <david@...morbit.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Leonardo Bras <leobras@...hat.com>,
        Yair Podemsky <ypodemsk@...hat.com>, P J P <ppandit@...hat.com>
Subject: Re: [PATCH] fs/buffer.c: remove per-CPU buffer_head lookup cache

On Mon, Jun 26, 2023 at 03:04:53PM -0300, Marcelo Tosatti wrote:
> Upon closer investigation, it was found that in the current codebase
> lookup_bh_lru is slower than __find_get_block_slow:
> 
>  114 ns per __find_get_block
>  68 ns per __find_get_block_slow
> 
> So remove the per-CPU buffer_head caching.
> 
> Test program:
> 
> #define NRLOOPS 200000
> 
> static struct block_device *bdev;
> 
> static int __init example_init(void)
> {
>         ktime_t s, e;
>         s64 delta;
>         int i, suc;
> 
>         bdev = blkdev_get_by_path("/dev/loop0", FMODE_READ, NULL);
>         if (IS_ERR(bdev)) {
>                 printk(KERN_ERR "failed to load /dev/loop0\n");
>                 return -ENODEV;
>         }
> 
>         suc = 0;
>         delta = 0;
>         for (i=0; i < NRLOOPS; i++) {
>                 struct buffer_head *bh;
> 
>                 s = ktime_get();
>                 bh = __find_get_block(bdev, 1, 512);
>                 e = ktime_get();
>                 if (bh) {
>                         suc++;
>                         __brelse(bh);
>                 }
>                 delta = delta + ktime_to_ns(ktime_sub(e, s));
> 
>         }
>         printk(KERN_ERR "%lld ns per __find_get_block (suc=%d)\n", delta/NRLOOPS, suc);
> 
>         suc = 0;
>         delta = 0;
>         for (i=0; i < NRLOOPS; i++) {
>                 struct buffer_head *bh;
> 
>                 s = ktime_get();
>                 bh = __find_get_block_slow(bdev, 1);
>                 e = ktime_get();
>                 if (bh) {
>                         suc++;
>                         __brelse(bh);
>                 }
>                 delta = delta + ktime_to_ns(ktime_sub(e, s));
>         }
>         printk(KERN_ERR "%lld ns per __find_get_block_slow (suc=%d)\n", delta/NRLOOPS, suc);

It occurs to me that this is close to being the best-case scenario for
page-cache lookup as well as for lru lookup.  Can you re-run it with
block 4UL * 1024 * 1024 * 1024 instead of block 1?
