Message-ID: <96e931a6-c70e-4a11-9e8c-c5a08da7f512@arm.com>
Date: Thu, 7 Aug 2025 21:36:38 +0530
From: Dev Jain <dev.jain@....com>
To: David Hildenbrand <david@...hat.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>, Barry Song <baohua@...nel.org>,
Pedro Falcato <pfalcato@...e.de>,
Anshuman Khandual <anshuman.khandual@....com>,
Bang Li <libang.li@...group.com>, Baolin Wang
<baolin.wang@...ux.alibaba.com>, bibo mao <maobibo@...ngson.cn>,
Hugh Dickins <hughd@...gle.com>, Ingo Molnar <mingo@...nel.org>,
Jann Horn <jannh@...gle.com>, Lance Yang <ioworker0@...il.com>,
Liam Howlett <liam.howlett@...cle.com>, Matthew Wilcox
<willy@...radead.org>, Peter Xu <peterx@...hat.com>,
Qi Zheng <zhengqi.arch@...edance.com>, Ryan Roberts <ryan.roberts@....com>,
Vlastimil Babka <vbabka@...e.cz>, Yang Shi <yang@...amperecomputing.com>,
Zi Yan <ziy@...dia.com>, linux-mm@...ck.org
Subject: Re: [linus:master] [mm] f822a9a81a:
stress-ng.bigheap.realloc_calls_per_sec 37.3% regression
On 07/08/25 3:51 pm, David Hildenbrand wrote:
> On 07.08.25 10:27, Lorenzo Stoakes wrote:
>> On Thu, Aug 07, 2025 at 04:17:09PM +0800, kernel test robot wrote:
>>>
>>>
>>> Hello,
>>>
>>> kernel test robot noticed a 37.3% regression of
>>> stress-ng.bigheap.realloc_calls_per_sec on:
>>>
>>
>> Dev - could you please investigate and provide a fix for this as a
>> priority? These numbers are quite scary (unless they're somehow super
>> synthetic or otherwise not meaningful).
>>
>>>
>>> commit: f822a9a81a31311d67f260aea96005540b18ab07 ("mm: optimize mremap() by PTE batching")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> [still regression on linus/master 186f3edfdd41f2ae87fc40a9ccba52a3bf930994]
>>> [still regression on linux-next/master b9ddaa95fd283bce7041550ddbbe7e764c477110]
>>>
>>> testcase: stress-ng
>>> config: x86_64-rhel-9.4
>>> compiler: gcc-12
>>> test machine: 192 threads 2 sockets Intel(R) Xeon(R) Platinum 8468V
>>> CPU @ 2.4GHz (Sapphire Rapids) with 384G memory
>>> parameters:
>>>
>>> nr_threads: 100%
>>> testtime: 60s
>>> test: bigheap
>>> cpufreq_governor: performance
>>>
>>>
>>>
>>>
>>> If you fix the issue in a separate patch/commit (i.e. not just a new
>>> version of the same patch/commit), kindly add the following tags
>>> | Reported-by: kernel test robot <oliver.sang@...el.com>
>>> | Closes: https://lore.kernel.org/oe-lkp/202508071609.4e743d7c-lkp@intel.com
>>>
>>>
>>> Details are as below:
>>> -------------------------------------------------------------------------------------------------->
>>>
>>>
>>>
>>> The kernel config and materials to reproduce are available at:
>>> https://download.01.org/0day-ci/archive/20250807/202508071609.4e743d7c-lkp@intel.com
>>>
>>>
>>> =========================================================================================
>>>
>>> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>>>
>>> gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/igk-spr-2sp1/bigheap/stress-ng/60s
>>>
>>> commit:
>>> 94dab12d86 ("mm: call pointers to ptes as ptep")
>>> f822a9a81a ("mm: optimize mremap() by PTE batching")
>>>
>>> 94dab12d86cf77ff f822a9a81a31311d67f260aea96
>>> ---------------- ---------------------------
>>>          %stddev     %change         %stddev
>>>              \          |                \
>>>      13777 ± 37%     +45.0%      19979 ± 27%  numa-vmstat.node1.nr_slab_reclaimable
>>>     367205            +2.3%     375703        vmstat.system.in
>>>      55106 ± 37%     +45.1%      79971 ± 27%  numa-meminfo.node1.KReclaimable
>>>      55106 ± 37%     +45.1%      79971 ± 27%  numa-meminfo.node1.SReclaimable
>>>     559381           -37.3%     350757        stress-ng.bigheap.realloc_calls_per_sec
>>>      11468            +1.2%      11603        stress-ng.time.system_time
>>>     296.25            +4.5%     309.70        stress-ng.time.user_time
>>>       0.81 ±187%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>>>       9.36 ±165%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>>>       0.81 ±187%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>>>       9.36 ±165%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>>>       5.50 ± 17%    +390.9%      27.00 ± 56%  perf-c2c.DRAM.local
>>>     388.50 ± 10%    +114.7%     834.17 ± 33%  perf-c2c.DRAM.remote
>>>       1214 ± 13%    +107.3%       2517 ± 31%  perf-c2c.HITM.local
>>>     135.00 ± 19%    +130.9%     311.67 ± 32%  perf-c2c.HITM.remote
>>>       1349 ± 13%    +109.6%       2829 ± 31%  perf-c2c.HITM.total
>>
>> Yeah, this looks pretty consistent too...
>
> It almost looks like some kind of NUMA effect?
>
> I would have expected it to be the overhead of vm_normal_folio(), but
> I'm not sure how that corresponds to the SLAB and local vs. remote
> stats. Maybe they are just noise?
Is there any way to get the robot to run the test again? As you said, the
only suspect is vm_normal_folio(); nothing else seems to pop up...
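
For reference, a minimal sketch of the path under suspicion, reconstructed
from memory of the patch; the helper name and the exact folio_pte_batch()
signature may not match what is in mm/mremap.c, so treat this as an
assumption rather than the literal code:

	/*
	 * Sketch of the per-PTE work the batching adds on the mremap()
	 * move path. mremap_folio_pte_batch() and the folio_pte_batch()
	 * call below are quoted from memory and may differ from the tree.
	 */
	static int mremap_folio_pte_batch(struct vm_area_struct *vma,
			unsigned long addr, pte_t *ptep, pte_t pte,
			int max_nr)
	{
		struct folio *folio;

		/* Nothing to batch for a single-PTE move. */
		if (max_nr == 1)
			return 1;

		/*
		 * This lookup now runs for every present PTE. For order-0
		 * folios, which a realloc-heavy workload like bigheap
		 * should mostly produce, it is pure overhead: we pay for
		 * vm_normal_folio() and the large-folio test only to
		 * conclude that the batch length is 1.
		 */
		folio = vm_normal_folio(vma, addr, pte);
		if (!folio || !folio_test_large(folio))
			return 1;

		/* Only large folios amortize the lookup over several PTEs. */
		return folio_pte_batch(folio, ptep, pte, max_nr);
	}

If that lookup is indeed the cost, the regression should scale with the
number of order-0 PTEs moved, which fits bigheap's realloc pattern; a
re-run with a perf profile would show whether vm_normal_folio() dominates.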