Message-ID: <50a55a42-6d79-4e3c-992c-26a96dc12d81@redhat.com>
Date: Wed, 16 Apr 2025 11:16:15 +0200
From: David Hildenbrand <david@...hat.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, Ingo Molnar <mingo@...hat.com>,
Jann Horn <jannh@...gle.com>, Johannes Weiner <hannes@...xchg.org>,
Jonathan Corbet <corbet@....net>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Lance Yang <ioworker0@...il.com>, Liam Howlett <liam.howlett@...cle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Matthew Wilcox <willy@...radead.org>, Michal Koutný <mkoutny@...e.com>,
Muchun Song <muchun.song@...ux.dev>, Tejun Heo <tj@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>, Vlastimil Babka <vbabka@...e.cz>,
Zefan Li <lizefan.x@...edance.com>, linux-mm@...ck.org
Subject: Re: [linus:master] [mm/rmap] 6af8cb80d3: vm-scalability.throughput
7.8% regression
On 16.04.25 10:07, David Hildenbrand wrote:
> On 16.04.25 09:01, kernel test robot wrote:
>>
>>
>> Hello,
>>
>> kernel test robot noticed a 7.8% regression of vm-scalability.throughput on:
>>
>>
>> commit: 6af8cb80d3a9a6bbd521d8a7c949b4eafb7dba5d ("mm/rmap: basic MM owner tracking for large folios (!hugetlb)")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>>
>> testcase: vm-scalability
>> config: x86_64-rhel-9.4
>> compiler: gcc-12
>> test machine: 256 threads 2 sockets GENUINE INTEL(R) XEON(R) (Sierra Forest) with 128G memory
>> parameters:
>>
>> runtime: 300s
>> size: 8T
>> test: anon-cow-seq
>> cpufreq_governor: performance
>>
>
> This should be the scenario with THP enabled. At first, I thought the
> problem would be contention on the per-folio spinlock, but what makes me
> scratch my head is the following:
>
> 13401 -16.5% 11190 proc-vmstat.thp_fault_alloc
> ... 3430623 -16.5% 2864565 proc-vmstat.thp_split_pmd
>
>
> If we allocate fewer THPs, benchmark performance will obviously be worse.
>
> We allocated 2211 fewer THPs and had 566058 fewer THP PMD->PTE remappings.
>
> 566058 / 2211 ≈ 256, which matches the number of threads, i.e., the number
> of child processes vm-scalability fork'ed.
>
> So it was in fact the benchmark that was effectively using 16.5% fewer THPs.
>
> I don't see how this patch would affect the allocation of THPs in any
> way (and I don't think it does).
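Spelling out the numbers quoted above (a quick back-of-the-envelope check in
Python; the variable names are just mine, the values are taken from the
report):

  thp_fault_alloc_delta = 13401 - 11190      # 2211 fewer THPs allocated
  thp_split_pmd_delta = 3430623 - 2864565    # 566058 fewer PMD->PTE remappings
  print(thp_split_pmd_delta / thp_fault_alloc_delta)  # ~256.02, roughly one
                                                       # PMD split per forked child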
Thinking about this some more: assuming both runs perform the same number of
test executions, we would expect the number of allocated THPs not to change
(unless we really have fragmentation that results in fewer THPs getting
allocated).
Assuming we run into the 300s timeout and abort the test early, we could end
up with a difference in executions and, therefore, in THP allocations.
I recall that we usually try to have the same number of benchmark executions
and not run into the timeout (otherwise some of these stats, like THP
allocations, are completely unreliable).
Maybe
7.968e+09 -16.5% 6.652e+09 vm-scalability.workload
indicates that we ended up with fewer executions? At least the
"repro-script" seems to indicate that we always execute a fixed number
of executions, but maybe the repro-script is aborted by the benchmark
framework.
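For what it's worth, the relative drop in vm-scalability.workload matches the
relative drop in THP allocations (quick check in the same spirit as above,
numbers from the report):

  print(6.652e9 / 7.968e9)   # ~0.835, i.e. -16.5%
  print(11190 / 13401)       # ~0.835, i.e. -16.5%

which would be consistent with a proportional reduction in executions rather
than any change on the THP allocation side.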
--
Cheers,
David / dhildenb