Message-ID: <404ae0cb-9c70-4aa1-99ef-b5e90c500140@lucifer.local>
Date: Thu, 7 Aug 2025 18:07:39 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Dev Jain <dev.jain@....com>
Cc: David Hildenbrand <david@...hat.com>,
        kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
        lkp@...el.com, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Barry Song <baohua@...nel.org>, Pedro Falcato <pfalcato@...e.de>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Bang Li <libang.li@...group.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        bibo mao <maobibo@...ngson.cn>, Hugh Dickins <hughd@...gle.com>,
        Ingo Molnar <mingo@...nel.org>, Jann Horn <jannh@...gle.com>,
        Lance Yang <ioworker0@...il.com>,
        Liam Howlett <liam.howlett@...cle.com>,
        Matthew Wilcox <willy@...radead.org>, Peter Xu <peterx@...hat.com>,
        Qi Zheng <zhengqi.arch@...edance.com>,
        Ryan Roberts <ryan.roberts@....com>, Vlastimil Babka <vbabka@...e.cz>,
        Yang Shi <yang@...amperecomputing.com>, Zi Yan <ziy@...dia.com>,
        linux-mm@...ck.org
Subject: Re: [linus:master] [mm] f822a9a81a:
 stress-ng.bigheap.realloc_calls_per_sec 37.3% regression

On Thu, Aug 07, 2025 at 10:34:43PM +0530, Dev Jain wrote:
>
> On 07/08/25 9:46 pm, Lorenzo Stoakes wrote:
> > On Thu, Aug 07, 2025 at 05:10:17PM +0100, Lorenzo Stoakes wrote:
> > > On Thu, Aug 07, 2025 at 09:36:38PM +0530, Dev Jain wrote:
> > >
> > > > > > > commit:
> > > > > > >     94dab12d86 ("mm: call pointers to ptes as ptep")
> > > > > > >     f822a9a81a ("mm: optimize mremap() by PTE batching")
> > > > > > >
> > > > > > > 94dab12d86cf77ff f822a9a81a31311d67f260aea96
> > > > > > > ---------------- ---------------------------
> > > > > > >            %stddev     %change         %stddev
> > > > > > >                \          |                \
> > > > > > >        13777 ± 37%     +45.0%      19979 ± 27%
> > > > > > > numa-vmstat.node1.nr_slab_reclaimable
> > > > > > >       367205            +2.3%     375703 vmstat.system.in
> > > > > > >        55106 ± 37%     +45.1%      79971 ± 27%
> > > > > > > numa-meminfo.node1.KReclaimable
> > > > > > >        55106 ± 37%     +45.1%      79971 ± 27%
> > > > > > > numa-meminfo.node1.SReclaimable
> > > > > > >       559381           -37.3%     350757
> > > > > > > stress-ng.bigheap.realloc_calls_per_sec
> > > > > > >        11468            +1.2%      11603 stress-ng.time.system_time
> > > > > > >       296.25            +4.5%     309.70 stress-ng.time.user_time
> > > > > > >         0.81 ±187%    -100.0%       0.00 perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> > > > > > >         9.36 ±165%    -100.0%       0.00 perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> > > > > > >         0.81 ±187%    -100.0%       0.00 perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> > > > > > >         9.36 ±165%    -100.0%       0.00 perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> > > Hm is lack of zap some kind of clue here?
> > >
> > > > > > >         5.50 ± 17%    +390.9%      27.00 ± 56% perf-c2c.DRAM.local
> > > > > > >       388.50 ± 10%    +114.7%     834.17 ± 33% perf-c2c.DRAM.remote
> > > > > > >         1214 ± 13%    +107.3%       2517 ± 31% perf-c2c.HITM.local
> > > > > > >       135.00 ± 19%    +130.9%     311.67 ± 32% perf-c2c.HITM.remote
> > > > > > >         1349 ± 13%    +109.6%       2829 ± 31% perf-c2c.HITM.total
> > > > > > Yeah this also looks pretty consistent too...
> > > > > It almost looks like some kind of NUMA effects?
> > > > >
> > > > > I would have expected that it's the overhead of the vm_normal_folio(),
> > > > > but not sure how that corresponds to the SLAB + local vs. remote stats.
> > > > > Maybe they are just noise?
> > > > Is there any way of making the robot test again? As you said, the only
> > > > suspect is vm_normal_folio(), nothing seems to pop up...
> > > >
> > > Not sure there's much point in that, these tests are run repeatedly and
> > > statistical analysis taken from them so what would another run accomplish unless
> > > there's something very consistently wrong with the box that happens only to
> > > trigger at your commit?
> > >
> > > Cheers, Lorenzo
> > Let me play around on my test box roughly and see if I can repro
>
> So I tested with
> ./stress-ng --timeout 1 --times --verify --metrics --no-rand-seed --oom-avoid --bigheap 20
> extracted the number out of the line containing the output "realloc calls per sec", did an
> avg and standard deviation over 20 runs. Before the patch:
>
> Average realloc calls/sec: 196907.380000
> Standard deviation        : 12685.721021
>
> After the patch:
>
> Average realloc calls/sec: 187894.300500
> Standard deviation        : 12494.153533
>
> which is 5% approx.
>

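For reference, the relative changes implied by the averages quoted above can be
checked with a short sketch. The helper names (summarize, percent_change) are
hypothetical, and the use of the sample standard deviation is an assumption,
since the quoted message does not say which estimator was used. Only the four
averages come from this thread; no per-run samples are reproduced here.

```python
from statistics import mean, stdev


def summarize(samples):
    """Return (mean, sample standard deviation) for per-run rates.

    Assumption: sample (n - 1) standard deviation; the thread does not
    state which estimator was used for the quoted figures.
    """
    return mean(samples), stdev(samples)


def percent_change(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100.0


# Averages quoted in this thread (realloc calls per second).
local = percent_change(196907.38, 187894.3005)  # 20-run local test
robot = percent_change(559381, 350757)          # kernel test robot report

print(f"local: {local:.1f}%  robot: {robot:.1f}%")
```

With the quoted averages this gives roughly -4.6% for the local runs against
-37.3% for the robot report, so the two measurements differ by far more than
rounding.
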
Are you testing that on x86-64 bare metal?

Anyway this is _not_ what I get.

I am testing on my test box, and seeing a _very significant_ regression as reported.

I am narrowing down the exact cause and will report back. Non-NUMA box, recent
uArch, dedicated machine.

Cheers, Lorenzo
