Message-ID: <ZQCLdzmtVcjxZWXt@casper.infradead.org>
Date:   Tue, 12 Sep 2023 17:01:59 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Feng Tang <feng.tang@...el.com>
Cc:     Chuck Lever III <chuck.lever@...cle.com>,
        "Sang, Oliver" <oliver.sang@...el.com>,
        "oe-lkp@...ts.linux.dev" <oe-lkp@...ts.linux.dev>,
        lkp <lkp@...el.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Christian Brauner <brauner@...nel.org>,
        linux-mm <linux-mm@...ck.org>,
        "Huang, Ying" <ying.huang@...el.com>,
        "Yin, Fengwei" <fengwei.yin@...el.com>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>
Subject: Re: [linus:master] [shmem]  a2e459555c:  aim9.disk_src.ops_per_sec
 -19.0% regression

On Tue, Sep 12, 2023 at 11:14:42PM +0800, Feng Tang wrote:
> > Well that's the problem. Since I can't run the reproducer, there's
> > nothing I can do to troubleshoot the problem myself.
> 
> We dug more into the perf and other profiling data from the 0Day server
> running this case, and it seems that the new simple_offset_add()
> called by shmem_mknod() brings extra cost related to slab,
> specifically 'radix_tree_node' allocations, which causes the regression.
> 
> Here is some slabinfo diff for commit a2e459555c5f and its parent:
> 
> 	23a31d87645c6527 a2e459555c5f9da3e619b7e47a6 
> 	---------------- --------------------------- 
>  
>      26363           +40.2%      36956        slabinfo.radix_tree_node.active_objs
>     941.00           +40.4%       1321        slabinfo.radix_tree_node.active_slabs
>      26363           +40.3%      37001        slabinfo.radix_tree_node.num_objs
>     941.00           +40.4%       1321        slabinfo.radix_tree_node.num_slabs

I can't find the benchmark source, but my suspicion is that this
creates and deletes a lot of files in a directory.  The 'stable
directory offsets' series uses xa_alloc_cyclic(), so we'll end up
with a very sparse radix tree, i.e. it'll look something like this:

0 - "."
1 - ".."
6 - "d"
27 - "y"
4000 - "fzz"
65537 - "czzz"
643289767 - "bzzzzzz"

(I didn't work out the names precisely here, but this is approximately
what you'd get if you create files a-z, aa-zz, aaa-zzz, etc. and delete
almost all of them)
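
For what it's worth, here's a tiny userspace sketch of the effect
(hypothetical code; a plain cursor stands in for the real
xa_alloc_cyclic() machinery): freed offsets are never handed out again
until the cursor wraps, so whatever survives heavy create/delete churn
sits at widely spaced indices.

#include <stdio.h>

/* Cursor-style allocator standing in for cyclic allocation. */
static unsigned int cursor = 2;		/* 0 and 1 reserved for "." and ".." */

static unsigned int alloc_offset(void)
{
	return cursor++;		/* never reuses a freed offset */
}

int main(void)
{
	unsigned int kept[8];
	int nkept = 0;
	unsigned int i;

	/* Create a million files, unlink all but a handful. */
	for (i = 0; i < 1000000; i++) {
		unsigned int off = alloc_offset();

		if (i % 200000 == 0 && nkept < 8)
			kept[nkept++] = off;	/* this one survives */
		/* otherwise the file is deleted; its offset is not recycled */
	}

	printf("surviving offsets:");
	for (i = 0; i < (unsigned int)nkept; i++)
		printf(" %u", kept[i]);
	printf("\n");
	return 0;
}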

The radix tree does not handle this well.  It'll allocate one node for:

entries 0-63 (covers the first 4 entries)
entries 0-4095
entries 3968-4031 (the first 5)
entries 0-262143
entries 65536-69631
entries 65536-65599 (the first 6)
entries 0-16777215
entries 0-1073741823
entries 637534208-654311423
entries 643039232-643301375
entries 643289088-643293183
entries 643289728-643289791 (all 7)

That ends up being 12 nodes (you get 7 nodes per page) to store 7
pointers.  Admittedly, to get here you have to do 643289765 creations
and nearly as many deletions, so are we going to see this in a
non-benchmark situation?
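
In case anyone wants to check the arithmetic, here's a quick userspace
approximation (hypothetical code, not the kernel implementation) that
counts the distinct 64-way node slots those seven indices touch at each
level of a tree tall enough to hold the largest one:

#include <stdio.h>

#define FANOUT_SHIFT	6	/* 64 slots per node, as in the radix tree */

int main(void)
{
	unsigned long idx[] = { 0, 1, 6, 27, 4000, 65537, 643289767 };
	int n = sizeof(idx) / sizeof(idx[0]);
	unsigned long max = idx[n - 1];		/* array is sorted */
	int height = 0, nodes = 0, shift, i, j;

	/* How many levels does the tree need to cover the largest index? */
	while (max >> (height * FANOUT_SHIFT))
		height++;

	/* At each level, count the distinct nodes these indices fall into. */
	for (shift = FANOUT_SHIFT; shift <= height * FANOUT_SHIFT;
	     shift += FANOUT_SHIFT) {
		unsigned long seen[16];
		int nseen = 0;

		for (i = 0; i < n; i++) {
			unsigned long prefix = idx[i] >> shift;

			for (j = 0; j < nseen; j++)
				if (seen[j] == prefix)
					break;
			if (j == nseen)
				seen[nseen++] = prefix;
		}
		nodes += nseen;
	}

	printf("%d nodes to hold %d entries\n", nodes, n);	/* 12 and 7 */
	return 0;
}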

The maple tree is more resilient against this kind of shenanigans, but
we're not there yet in terms of supporting the kind of allocation you
want.  With the maple tree, this allocation pattern would put all 7
pointers in a single 256-byte node.
