Message-ID: <20221223024321.itxwvcdyckepnyiz@revolver>
Date:   Fri, 23 Dec 2022 02:45:17 +0000
From:   Liam Howlett <liam.howlett@...cle.com>
To:     "Yin, Fengwei" <fengwei.yin@...el.com>
CC:     Yang Shi <shy828301@...il.com>, Yujie Liu <yujie.liu@...el.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "oe-lkp@...ts.linux.dev" <oe-lkp@...ts.linux.dev>,
        "lkp@...el.com" <lkp@...el.com>,
        Nathan Chancellor <nathan@...nel.org>,
        "Huang, Ying" <ying.huang@...el.com>,
        Rik van Riel <riel@...riel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "feng.tang@...el.com" <feng.tang@...el.com>,
        "zhengjun.xing@...ux.intel.com" <zhengjun.xing@...ux.intel.com>
Subject: Re: [linus:master] [mm] 0ba09b1733: will-it-scale.per_thread_ops
 -21.1% regression in mmap1 benchmark

* Yin, Fengwei <fengwei.yin@...el.com> [221221 20:19]:
> 
> 
> On 12/22/2022 12:45 AM, Yang Shi wrote:
> >> We caught two mmap1 regressions on mainline; please see the data below:
> >>
> >> 830b3c68c1fb1 Linux 6.1                                                              2085 2355 2088
> >> 76dcd734eca23 Linux 6.1-rc8                                                          2093 2082 2094 2073 2304 2088
> >> 0ba09b1733878 Revert "mm: align larger anonymous mappings on THP boundaries"         2124 2286 2086 2114 2065 2081
> >> 23393c6461422 char: tpm: Protect tpm_pm_suspend with locks                           2756 2711 2689 2696 2660 2665
> >> b7b275e60bcd5 Linux 6.1-rc7                                                          2670 2656 2720 2691 2667
> >> ...
> >> 9abf2313adc1c Linux 6.1-rc1                                                          2725 2717 2690 2691 2710
> >> 3b0e81a1cdc9a mmap: change zeroing of maple tree in __vma_adjust()                   2736 2781 2748
> >> 524e00b36e8c5 mm: remove rb tree.                                                    2747 2744 2747
> >> 0c563f1480435 proc: remove VMA rbtree use from nommu
> >> d0cf3dd47f0d5 damon: convert __damon_va_three_regions to use the VMA iterator
> >> 3499a13168da6 mm/mmap: use maple tree for unmapped_area{_topdown}
> >> 7fdbd37da5c6f mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree
> >> f39af05949a42 mm: add VMA iterator
> >> d4af56c5c7c67 mm: start tracking VMAs with maple tree
> >> e15e06a839232 lib/test_maple_tree: add testing for maple tree                        4638 4628 4502
> >> 9832fb87834e2 mm/demotion: expose memory tier details via sysfs                      4625 4509 4548
> >> 4fe89d07dcc28 Linux 6.0                                                              4385 4205 4348 4228 4504
> >>
> >>
> >> The first regression was between v6.0 and v6.1-rc1. The score dropped
> >> from 4600 to 2700, and bisected to the patches switching from rb tree to
> >> maple tree. This was reported at
> >> https://lore.kernel.org/oe-lkp/202212191714.524e00b3-yujie.liu@intel.com/
> >> Thanks for the explanation that it is an expected regression as a trade
> >> off to benefit read performance.
> >>
> >> The second regression was between v6.1-rc7 and v6.1-rc8. The score
> >> dropped from 2700 to 2100, and bisected to this "Revert "mm: align larger
> >> anonymous mappings on THP boundaries"" commit.
> > So it means "mm: align larger anonymous mappings on THP boundaries"
> > actually improved the mmap1 benchmark? But it caused regression for
> > other usecase, for example, building kernel with clang, which is a
> > regression for a real life usecase.
> Yes. The patch "mm: align larger anonymous mappings on THP boundaries"
> can improve the mmap1 benchmark.
> 

If the aligned VMAs cannot be merged, then they do not need to be split
on freeing.  This means we are just allocating a new vma, writing it to
the tree, removing it from the tree, and freeing the vma.  We can do this
4600 times a second, apparently.

If the VMAs do get merged, we go through __vma_adjust() to expand a
boundary, write it to the tree, allocate a new vma, __vma_adjust() the
vma boundary back, insert the new vma that covers the boundary area,
remove the new vma from the tree, and free the vma.  We can only do this
2700 times a second.  Note that this writes to the tree 3 times per
loop iteration vs 2 in the other case.
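
As a rough illustration of the two paths described above, the steps can
be written down and the tree writes counted.  This is a toy model, not
kernel code: the step lists paraphrase the text, and which steps are
flagged as tree writes is an assumption chosen to match the "3 vs 2"
count given above.

```python
# Toy model of the two paths described above; not kernel code.
# Each step is (description, writes_to_tree).  The write flags are an
# assumed reading of the message, chosen to match its "3 vs 2" count.
UNMERGED_PATH = [
    ("allocate a new vma", False),
    ("write the vma to the tree", True),
    ("remove the vma from the tree", True),
    ("free the vma", False),
]

MERGED_PATH = [
    ("__vma_adjust() to expand a boundary, write it to the tree", True),
    ("allocate a new vma", False),
    ("__vma_adjust() the boundary back, insert the new vma", True),
    ("remove the new vma from the tree", True),
    ("free the vma", False),
]


def tree_writes(path):
    """Count the steps in a path that modify the tree."""
    return sum(1 for _, writes in path if writes)


if __name__ == "__main__":
    for name, path in (("unmerged", UNMERGED_PATH), ("merged", MERGED_PATH)):
        print(f"{name}: {len(path)} steps, {tree_writes(path)} tree writes")
```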

So yes, merging/splitting is more work and always has been.  We do it
to avoid having too many VMAs, though.  There really isn't a good reason
for an application to repeat this map/unmap pattern for any meaningful
number of iterations.
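
For reference, the access pattern in question is roughly the following
loop.  This is a minimal sketch in Python, not the benchmark source: the
function name and the 128 MiB default size are assumptions, and
interpreter overhead means the rate it reports is not comparable to the
scores in the table above.  It relies on the Unix-only flags of the
standard mmap module.

```python
import mmap
import time

# Assumed mapping size for illustration; not stated in this thread.
SIZE = 128 * 1024 * 1024


def map_unmap_loop(iterations, size=SIZE):
    """Map and immediately unmap an anonymous private region, repeatedly.

    Returns the achieved map/unmap pairs per second.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        region = mmap.mmap(
            -1,
            size,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )
        region.close()  # unmaps the region
    elapsed = time.perf_counter() - start
    return iterations / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    print(f"{map_unmap_loop(10000):.0f} map/unmap pairs per second")
```

The memory is never touched, so each iteration only exercises VMA
creation and teardown, which is the cost being discussed here.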

> For the kernel build regression, it looks like it is not directly related
> to the patch "mm: align larger anonymous mappings on THP boundaries".
> It is another existing behavior made more visible by the patch.
> https://lore.kernel.org/all/a4bcddad-e56f-cedc-891a-916e86d9a02c@intel.com/
> 

Having a snapshot of the VMA layout would help here, since the THP
boundary alignment may change whether or not the VMAs can be merged.  I
suspect they cannot be merged, which fragments the VMA space; that
would speed up this benchmark at the expense of having more VMAs.
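
One way to take such a snapshot is to summarize /proc/<pid>/maps.  The
sketch below is only an illustration: the helper names are hypothetical,
the 2 MiB THP size is an assumption (the usual x86-64 value), and
"anonymous" here means a maps line with no pathname, so entries such as
[heap] and [stack] are not counted as anonymous.

```python
# Assumption: 2 MiB transparent huge pages (typical on x86-64).
THP_SIZE = 2 * 1024 * 1024


def parse_maps(text):
    """Parse /proc/<pid>/maps text into (start, end, perms, path) tuples."""
    vmas = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        vmas.append((int(start_s, 16), int(end_s, 16), fields[1], path))
    return vmas


def summarize(vmas, thp_size=THP_SIZE):
    """Count VMAs, unnamed (anonymous) VMAs, and THP-aligned anonymous VMAs."""
    anon = [v for v in vmas if v[3] == ""]
    aligned = [v for v in anon if v[0] % thp_size == 0]
    return {
        "total": len(vmas),
        "anonymous": len(anon),
        "anon_thp_aligned": len(aligned),
    }


def snapshot(pid="self"):
    """Summarize the VMA layout of a running process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return summarize(parse_maps(f.read()))
```

Comparing the "total" count with and without the alignment patch, for
the same workload, would show whether more VMAs are being left unmerged.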

Thanks,
Liam
