Message-ID: <7b4e14b1-c212-4207-aa4d-aa5610148abd@arm.com>
Date: Thu, 7 Aug 2025 22:41:21 +0530
From: Dev Jain <dev.jain@....com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: David Hildenbrand <david@...hat.com>,
kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
lkp@...el.com, linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>, Barry Song <baohua@...nel.org>,
Pedro Falcato <pfalcato@...e.de>,
Anshuman Khandual <anshuman.khandual@....com>,
Bang Li <libang.li@...group.com>, Baolin Wang
<baolin.wang@...ux.alibaba.com>, bibo mao <maobibo@...ngson.cn>,
Hugh Dickins <hughd@...gle.com>, Ingo Molnar <mingo@...nel.org>,
Jann Horn <jannh@...gle.com>, Lance Yang <ioworker0@...il.com>,
Liam Howlett <liam.howlett@...cle.com>, Matthew Wilcox
<willy@...radead.org>, Peter Xu <peterx@...hat.com>,
Qi Zheng <zhengqi.arch@...edance.com>, Ryan Roberts <ryan.roberts@....com>,
Vlastimil Babka <vbabka@...e.cz>, Yang Shi <yang@...amperecomputing.com>,
Zi Yan <ziy@...dia.com>, linux-mm@...ck.org
Subject: Re: [linus:master] [mm] f822a9a81a:
stress-ng.bigheap.realloc_calls_per_sec 37.3% regression
On 07/08/25 10:37 pm, Lorenzo Stoakes wrote:
> On Thu, Aug 07, 2025 at 10:34:43PM +0530, Dev Jain wrote:
>> On 07/08/25 9:46 pm, Lorenzo Stoakes wrote:
>>> On Thu, Aug 07, 2025 at 05:10:17PM +0100, Lorenzo Stoakes wrote:
>>>> On Thu, Aug 07, 2025 at 09:36:38PM +0530, Dev Jain wrote:
>>>>
>>>>>>>> commit:
>>>>>>>> 94dab12d86 ("mm: call pointers to ptes as ptep")
>>>>>>>> f822a9a81a ("mm: optimize mremap() by PTE batching")
>>>>>>>>
>>>>>>>> 94dab12d86cf77ff f822a9a81a31311d67f260aea96
>>>>>>>> ---------------- ---------------------------
>>>>>>>>         %stddev   %change          %stddev
>>>>>>>>             \         |                \
>>>>>>>>     13777 ± 37%    +45.0%      19979 ± 27%  numa-vmstat.node1.nr_slab_reclaimable
>>>>>>>>    367205           +2.3%     375703        vmstat.system.in
>>>>>>>>     55106 ± 37%    +45.1%      79971 ± 27%  numa-meminfo.node1.KReclaimable
>>>>>>>>     55106 ± 37%    +45.1%      79971 ± 27%  numa-meminfo.node1.SReclaimable
>>>>>>>>    559381          -37.3%     350757        stress-ng.bigheap.realloc_calls_per_sec
>>>>>>>>     11468           +1.2%      11603        stress-ng.time.system_time
>>>>>>>>    296.25           +4.5%     309.70        stress-ng.time.user_time
>>>>>>>>      0.81 ±187%   -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>>>>>>>>      9.36 ±165%   -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>>>>>>>>      0.81 ±187%   -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>>>>>>>>      9.36 ±165%   -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>>>> Hm is lack of zap some kind of clue here?
>>>>
>>>>>>>> 5.50 ± 17% +390.9% 27.00 ± 56% perf-c2c.DRAM.local
>>>>>>>> 388.50 ± 10% +114.7% 834.17 ± 33% perf-c2c.DRAM.remote
>>>>>>>> 1214 ± 13% +107.3% 2517 ± 31% perf-c2c.HITM.local
>>>>>>>> 135.00 ± 19% +130.9% 311.67 ± 32% perf-c2c.HITM.remote
>>>>>>>> 1349 ± 13% +109.6% 2829 ± 31% perf-c2c.HITM.total
>>>>>>> Yeah this also looks pretty consistent too...
>>>>>> It almost looks like some kind of NUMA effects?
>>>>>>
>>>>>> I would have expected that it's the overhead of the vm_normal_folio(),
>>>>>> but not sure how that corresponds to the SLAB + local vs. remote stats.
>>>>>> Maybe they are just noise?
>>>>> Is there any way of making the robot test again? As you said, the only
>>>>> suspect is vm_normal_folio(), nothing seems to pop up...
>>>>>
>>>> Not sure there's much point in that, these tests are run repeatedly and
>>>> statistical analysis taken from them so what would another run accomplish unless
>>>> there's something very consistently wrong with the box that happens only to
>>>> trigger at your commit?
>>>>
>>>> Cheers, Lorenzo
>>> Let me play around on my test box roughly and see if I can repro
>> So I tested with
>> ./stress-ng --timeout 1 --times --verify --metrics --no-rand-seed --oom-avoid --bigheap 20
>> extracted the number out of the line containing the output "realloc calls per sec", did an
>> avg and standard deviation over 20 runs. Before the patch:
>>
>> Average realloc calls/sec: 196907.380000
>> Standard deviation : 12685.721021
>>
>> After the patch:
>>
>> Average realloc calls/sec: 187894.300500
>> Standard deviation : 12494.153533
>>
>> which is a drop of roughly 5% (about 4.6%).
>>
> Are you testing that on x86-64 bare metal?
QEMU VM on x86-64.
>
> Anyway this is _not_ what I get.
>
> I am testing on my test box, and seeing a _very significant_ regression as reported.
>
> I am narrowing down the exact cause and will report back. Non-NUMA box, recent
> uArch, dedicated machine.
Oops. Thanks for testing. Lemme stare at my patch for some more time :)
>
> Cheers, Lorenzo
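
The comparison described above (pull the "realloc calls per sec" figure out of each run, then take the average and standard deviation over 20 runs) could be scripted roughly as follows. This is a minimal sketch, not the script actually used: the function names are hypothetical, the assumption that the number sits just before the label on the line is a guess about the stress-ng output layout, and the sample standard deviation is assumed because the thread does not say which estimator was used.

```python
import re
import statistics

# Matches integers and simple decimals such as 196907.38.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def extract_metric(output, label="realloc calls per sec"):
    """Return the last number that precedes `label` on the first line
    containing it, or None if no such line or number exists.

    Assumption: the value is printed before the label on the same line.
    """
    for line in output.splitlines():
        if label in line:
            prefix = line.split(label, 1)[0]
            numbers = _NUMBER.findall(prefix)
            return float(numbers[-1]) if numbers else None
    return None


def summarize(samples):
    """Return (mean, sample standard deviation) of the per-run values."""
    return statistics.mean(samples), statistics.stdev(samples)


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Averages quoted in this thread (QEMU VM, 20 runs each).
    print(f"{percent_change(196907.38, 187894.3005):.1f}%")  # -4.6%
    # Figures from the robot report.
    print(f"{percent_change(559381, 350757):.1f}%")  # -37.3%
```

Feeding the per-run values for each kernel into summarize() and the two means into percent_change() gives the kind of figures quoted above. With a standard deviation of around 12,500 on each side, reporting the spread alongside the means helps show how much of a difference of this size could be run-to-run noise.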