Message-ID: <CANn89i+fM0k+=Qw0M0fso1f-Ya8--5+==gtcWqCpo=Gu-ca1Ow@mail.gmail.com>
Date: Sat, 12 Mar 2022 15:26:12 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: kernel test robot <oliver.sang@...el.com>,
Mel Gorman <mgorman@...hsingularity.net>,
0day robot <lkp@...el.com>, Michal Hocko <mhocko@...nel.org>,
Shakeel Butt <shakeelb@...gle.com>,
Wei Xu <weixugc@...gle.com>, Greg Thelen <gthelen@...gle.com>,
Hugh Dickins <hughd@...gle.com>,
David Rientjes <rientjes@...gle.com>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
"Huang, Ying" <ying.huang@...el.com>,
"Tang, Feng" <feng.tang@...el.com>, zhengjun.xing@...ux.intel.com,
fengwei.yin@...el.com, Eric Dumazet <eric.dumazet@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-mm <linux-mm@...ck.org>
Subject: Re: [mm/page_alloc] 8212a964ee: vm-scalability.throughput 30.5% improvement
On Sat, Mar 12, 2022 at 10:59 AM Vlastimil Babka <vbabka@...e.cz> wrote:
>
> On 3/12/22 16:43, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed a 30.5% improvement of vm-scalability.throughput due to commit:
> >
> >
> > commit: 8212a964ee020471104e34dce7029dec33c218a9 ("Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held")
> > url: https://github.com/0day-ci/linux/commits/Mel-Gorman/Re-PATCH-v2-mm-page_alloc-call-check_new_pages-while-zone-spinlock-is-not-held/20220309-203504
> > patch link: https://lore.kernel.org/lkml/20220309123245.GI15701@techsingularity.net
>
> Heh, that's weird. I would expect some improvement from Eric's patch,
> but this seems to be actually about Mel's "mm/page_alloc: check
> high-order pages for corruption during PCP operations" applied directly
> on 5.17-rc7 per the github url above. This was rather expected to make
> performance worse if anything, so maybe the improvement is due to some
> unexpected side-effect of different inlining decisions or cache alignment...
>
I doubt this has anything to do with inlining or cache alignment.

I am not familiar with the benchmark, but its name
(anon-w-rand-hugetlb) hints at hugetlb?

After Mel's fix, we walk over 512 'struct page' entries to perform sanity
checks, thus loading their 512 cache lines into the CPU caches.
This cache warming is done while no lock is held.

If, after this huge page allocation, some mm operation needs to access
these 512 struct pages while holding a lock, then sure, there is a huge
gain, since those cache misses have already been taken outside the lock.
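
To make the suspected effect concrete, here is a rough userspace sketch, not kernel code. Everything in it is a hypothetical stand-in: `struct fake_page` is padded to 64 bytes on the assumption that one 'struct page' occupies one cache line, `zone_lock` is an ordinary pthread mutex standing in for whatever lock the later mm operation takes, and the two function names are invented for illustration. It only shows the ordering: an unlocked sanity pass touches every entry first, and the locked pass then walks the same entries.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 512 sub-pages per huge page, as in the mail above. */
#define NR_SUBPAGES 512

/* Hypothetical stand-in for 'struct page', padded to one 64-byte line. */
struct fake_page {
    uint64_t flags;
    uint64_t pad[7];
};

/* Hypothetical lock; stands in for the lock held by the later mm operation. */
static pthread_mutex_t zone_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Sanity pass run with no lock held. Returns the number of entries whose
 * flags are non-zero. As a side effect, it reads one word from every entry,
 * which pulls each entry's cache line into the CPU caches.
 */
static size_t check_pages_unlocked(const struct fake_page *pages, size_t n)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++)
        if (pages[i].flags != 0)
            bad++;
    return bad;
}

/*
 * Later operation that touches the same entries while holding the lock.
 * If the lines are already cached, the time spent under the lock shrinks.
 * Returns the sum of the flags after setting bit 0 in each entry.
 */
static uint64_t touch_pages_locked(struct fake_page *pages, size_t n)
{
    uint64_t sum = 0;

    pthread_mutex_lock(&zone_lock);
    for (size_t i = 0; i < n; i++) {
        pages[i].flags |= 1;
        sum += pages[i].flags;
    }
    pthread_mutex_unlock(&zone_lock);
    return sum;
}
```

Calling check_pages_unlocked() on a zeroed array of NR_SUBPAGES entries and then touch_pages_locked() on the same array reproduces the order described above. Whether that ordering explains the measured 30.5% would still need to be confirmed by profiling the benchmark itself.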