Message-ID: <e4b6sjeh22uqhxhxudsbanlnyo2potwowuy7mkrp6tvxnftjn4@mcjyes2s3eu6>
Date: Wed, 17 Dec 2025 12:52:55 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Uladzislau Rezki <urezki@...il.com>
Cc: Oliver Sang <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
lkp@...el.com, linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>, Michal Hocko <mhocko@...e.com>, Baoquan He <bhe@...hat.com>,
Alexander Potapenko <glider@...gle.com>, Andrey Ryabinin <ryabinin.a.a@...il.com>,
Marco Elver <elver@...gle.com>, Michal Hocko <mhocko@...nel.org>, linux-mm@...ck.org
Subject: Re: [linus:master] [mm/vmalloc] 9c47753167:
stress-ng.bigheap.realloc_calls_per_sec 21.3% regression
On Wed, Dec 17, 2025 at 12:04:20PM +0100, Uladzislau Rezki wrote:
> Hello, Oliver.
>
> > > >
> > > > Hello,
> > > >
> > > > kernel test robot noticed a 21.3% regression of stress-ng.bigheap.realloc_calls_per_sec on:
> > > >
> > > >
> > > > commit: 9c47753167a6a585d0305663c6912f042e131c2d ("mm/vmalloc: defer freeing partly initialized vm_struct")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > >
> > > > [still regression on linus/master c9b47175e9131118e6f221cc8fb81397d62e7c91]
> > > > [still regression on linux-next/master 008d3547aae5bc86fac3eda317489169c3fda112]
> > > >
> > > > testcase: stress-ng
> > > > config: x86_64-rhel-9.4
> > > > compiler: gcc-14
> > > > test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6767P CPU @ 2.4GHz (Granite Rapids) with 256G memory
> > > > parameters:
> > > >
> > > > nr_threads: 100%
> > > > testtime: 60s
> > > > test: bigheap
> > > > cpufreq_governor: performance
> > > >
> > > >
> > > >
> > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > the same patch/commit), kindly add following tags
> > > > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > > > | Closes: https://lore.kernel.org/oe-lkp/202512121138.986f6a6b-lkp@intel.com
> > > >
> > > >
> >
> > [...]
> >
> > > >
> > > Could you please test below patch and confirm if it solves regression:
> >
> > we directly apply the patch upon 9c47753167, so our test branch looks like below
> >
> > * f7991e8a0136cb <---- below patch from you
> > * 9c47753167a6a5 mm/vmalloc: defer freeing partly initialized vm_struct
> > * 86e968d8ca6dc8 mm/vmalloc: support non-blocking GFP flags in alloc_vmap_area()
> >
> > but found it has little performance impacts
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> > gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp3/bigheap/stress-ng/60s
> >
> > 86e968d8ca6dc823 9c47753167a6a585d0305663c69 f7991e8a0136cb0fdf35f11e28a
> > ---------------- --------------------------- ---------------------------
> >          %stddev     %change         %stddev     %change         %stddev
> >              \          |                \          |                \
> >   48320196           -10.9%   43072080           -10.8%   43116499        stress-ng.bigheap.ops
> >     785159            -9.8%     708390            -9.7%     708644        stress-ng.bigheap.ops_per_sec
> >     879805           -21.3%     692805           -20.7%     697312        stress-ng.bigheap.realloc_calls_per_sec
> >
> Thank you for testing. I had the same expectation: no difference.
> Honestly, I cannot figure out how:
>
> * 9c47753167a6a5 mm/vmalloc: defer freeing partly initialized vm_struct
> * 86e968d8ca6dc8 mm/vmalloc: support non-blocking GFP flags in alloc_vmap_area()
>
> can affect performance. I am not doing anything related to performance.
> I would like to ask you if you could test one more thing. I see that
>
> [still regression on linus/master c9b47175e9131118e6f221cc8fb81397d62e7c91]
>
> contains also below patch:
>
> <snip>
> commit a0615780439938e8e61343f1f92a4c54a71dc6a5
> mm/vmalloc: request large order pages from buddy allocator
> <snip>
>
> where we try to use larger order for vmalloc. Could you please revert
> it and rerun same tests?
>
This being stress-ng, it is not doing what you think it is doing.

The profile shows increased contention on the spinlock taken in
si_swapinfo():
         %stddev     %change         %stddev
             \          |                \
     40.08 ±  2%      +9.8       49.92 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.si_swapinfo.do_sysinfo.__do_sys_sysinfo
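
For reference, the call chain above bottoms out in the sysinfo(2)
syscall. A minimal userspace sketch of a caller that would enter that
path is below. read_swap_bytes() is a hypothetical helper name, not
something taken from stress-ng, and how often stress-ng actually issues
the call is not shown here.

```c
#include <stdint.h>
#include <sys/sysinfo.h>

/*
 * Hypothetical helper: issue one sysinfo(2) call and report the swap
 * totals in bytes. Per the profile above, the kernel fills in the swap
 * fields via si_swapinfo(), which is where the contended lock is taken.
 * Returns 0 on success, -1 if the syscall fails.
 */
int read_swap_bytes(uint64_t *total, uint64_t *free_bytes)
{
	struct sysinfo si;

	if (sysinfo(&si) != 0)
		return -1;

	/* totalswap and freeswap are expressed in units of mem_unit bytes */
	*total = (uint64_t)si.totalswap * si.mem_unit;
	*free_bytes = (uint64_t)si.freeswap * si.mem_unit;
	return 0;
}
```

With 256 threads calling this in a loop, every call serializes on the
same lock, which is consistent with the slowpath dominating the profile.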

The spinlock and the data it operates on are not annotated.

The commit deferring the freeing adds 2 global variables, which most
likely shifted things around in memory and introduced cacheline
bouncing.

That's the 3rd such case in the last few weeks that I know of.

I asked the gcc people to do something about it, so far no takers:
https://gcc.gnu.org/pipermail/gcc/2024-October/245004.html
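
To illustrate the layout problem, here is a small userspace sketch,
assuming 64-byte cache lines. All names in it (hot_state,
padded_hot_state, shares_cacheline and the globals) are made up for the
example and it is not the kernel code. In the kernel the usual way to
get the same effect is an annotation such as
____cacheline_aligned_in_smp; the sketch uses plain C11 _Alignas
instead.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64 /* assumption: 64-byte lines, typical for x86-64 */

/* Hypothetical stand-in for a contended lock plus the data it protects. */
struct hot_state {
	int lock;
	long counter;
};

/*
 * Aligning the member to the line size also rounds sizeof() up to a
 * multiple of it, so the object owns its cache line(s) exclusively no
 * matter which other globals the linker places next to it.
 */
struct padded_hot_state {
	_Alignas(CACHELINE_SIZE) struct hot_state s;
};

struct padded_hot_state hot;
long unrelated_global_a; /* stand-ins for newly added, unrelated globals */
long unrelated_global_b;

/* Index of the cache line that an address falls into. */
static uintptr_t cacheline_of(const void *p)
{
	return (uintptr_t)p / CACHELINE_SIZE;
}

/* Nonzero if the two objects touch at least one common cache line. */
int shares_cacheline(const void *a, size_t a_len, const void *b, size_t b_len)
{
	uintptr_t a_first = cacheline_of(a);
	uintptr_t a_last = cacheline_of((const char *)a + a_len - 1);
	uintptr_t b_first = cacheline_of(b);
	uintptr_t b_last = cacheline_of((const char *)b + b_len - 1);

	return a_first <= b_last && b_first <= a_last;
}
```

Without the alignment, whether the lock shares a line with some
unrelated, frequently written variable depends on whatever else ends up
nearby, so adding globals elsewhere can change the outcome.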