Message-ID: <ec69adf2-4eb5-4e38-804f-804d1dde0e84@oracle.com>
Date: Thu, 24 Apr 2025 00:46:34 -0700
From: Libo Chen <libo.chen@...cle.com>
To: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>, akpm@...ux-foundation.org,
rostedt@...dmis.org, peterz@...radead.org, mgorman@...e.de,
mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
tj@...nel.org, llong@...hat.com
Cc: sraithal@....com, kprateek.nayak@....com, raghavendra.kt@....com,
yu.c.chen@...el.com, tim.c.chen@...el.com, vineethr@...ux.ibm.com,
chris.hyser@...cle.com, daniel.m.jordan@...cle.com,
lorenzo.stoakes@...cle.com, mkoutny@...e.com, linux-mm@...ck.org,
cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 0/2] sched/numa: Skip VMA scanning on memory pinned to
one NUMA node via cpuset.mems
On 4/24/25 00:05, Venkat Rao Bagalkote wrote:
>
> On 24/04/25 8:15 am, Libo Chen wrote:
>> v1->v2:
>> 1. add perf improvement numbers in the commit log. Yet to find a perf diff on
>> will-it-scale, so not included here. Plan to run more workloads.
>> 2. add tracepoint.
>> 3. To peterz's comment: this will make it impossible to attract tasks to
>> that memory, just like other VMA skippings. This is the current
>> implementation; I think we can improve it in the future, but at the
>> moment it's probably better to keep it consistent.
>>
>> v2->v3:
>> 1. add a cpusets_enabled() check based on Mel's suggestion, but again I
>> think it's redundant.
>> 2. print out nodemask with %*p.. format in the tracepoint.
>>
>> v3->v4:
>> 1. fix an unsafe dereference of a pointer to content not on the ring buffer,
>> namely mem_allowed_ptr in the tracepoint.
>>
>> v4->v5:
>> 1. add BUILD_BUG_ON() in TP_fast_assign() to guard against future
>> changes to nodemask_t (particularly in size); a sketch of the resulting
>> tracepoint follows the diffstat below.
>>
>> Libo Chen (2):
>> sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
>> cpuset.mems
>> sched/numa: Add tracepoint that tracks the skipping of numa balancing
>> due to cpuset memory pinning
>>
>> include/trace/events/sched.h | 33 +++++++++++++++++++++++++++++++++
>> kernel/sched/fair.c | 9 +++++++++
>> 2 files changed, 42 insertions(+)
>>
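For reference, the tracepoint these changelog items describe might look roughly
like the sketch below (it lives in include/trace/events/sched.h per the diffstat).
This is a minimal sketch, not the patch itself: the event name, field set, and
output layout are assumptions based on the cover letter; %*pbl is the kernel's
bitmap-list format specifier alluded to by the "%*p.." item above.

    TRACE_EVENT(sched_skip_cpuset_numa,

            TP_PROTO(struct task_struct *tsk, nodemask_t *mem_allowed_ptr),

            TP_ARGS(tsk, mem_allowed_ptr),

            TP_STRUCT__entry(
                    __array(char,          comm,        TASK_COMM_LEN)
                    __field(pid_t,         pid)
                    __array(unsigned long, mem_allowed, BITS_TO_LONGS(MAX_NUMNODES))
            ),

            TP_fast_assign(
                    memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
                    __entry->pid = task_pid_nr(tsk);

                    /*
                     * Copy the nodemask onto the ring buffer instead of
                     * recording mem_allowed_ptr itself (the v4 fix); the
                     * BUILD_BUG_ON() is the v5 guard against nodemask_t
                     * changing size in the future.
                     */
                    BUILD_BUG_ON(sizeof(nodemask_t) !=
                                 BITS_TO_LONGS(MAX_NUMNODES) * sizeof(unsigned long));
                    memcpy(__entry->mem_allowed, mem_allowed_ptr,
                           sizeof(nodemask_t));
            ),

            TP_printk("comm=%s pid=%d mem_nodes=%*pbl",
                      __entry->comm, __entry->pid,
                      MAX_NUMNODES, __entry->mem_allowed)
    );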
> Hello Libo,
>
>
> For some reason I am not able to apply this patch. I am trying to test the boot warning[1].
>
> I am trying to apply it on top of next-20250423. Below is the error. Am I missing anything?
>
> [1]: https://lore.kernel.org/all/20250422205740.02c4893a@canb.auug.org.au/
> Error:
>
> git am -i v5_20250423_libo_chen_sched_numa_skip_vma_scanning_on_memory_pinned_to_one_numa_node_via_cpuset_mems.mbx
> Commit Body is:
> --------------------------
> sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems
>
> When the memory of the current task is pinned to one NUMA node by cgroup,
> there is no point in continuing the rest of VMA scanning and hinting page
> faults, as they will just be overhead. With this change, there will be no
> more unnecessary PTE updates or page faults in this scenario (see the
> fair.c sketch after the quoted output below).
>
> We have seen up to a 6x improvement on a typical Java workload running on
> VMs with memory and CPU pinned to one NUMA node via cpuset in a two-socket
> AARCH64 system. With the same pinning, on an 18-cores-per-socket Intel
> platform, we have seen a 20% improvement in a microbenchmark that creates a
> 30-vCPU selftest KVM guest with 4GB memory, where each vCPU reads 4KB
> pages in a fixed number of loops.
>
> Signed-off-by: Libo Chen <libo.chen@...cle.com>
> Tested-by: Chen Yu <yu.c.chen@...el.com>
> Tested-by: K Prateek Nayak <kprateek.nayak@....com>
> --------------------------
> Apply? [y]es/[n]o/[e]dit/[v]iew patch/[a]ccept all: a
> Applying: sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems
> error: patch failed: kernel/sched/fair.c:3329
> error: kernel/sched/fair.c: patch does not apply
> Patch failed at 0001 sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems
>
>
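As an aside for readers following the thread: the skip described in the quoted
commit message amounts to an early return near the top of task_numa_work() in
kernel/sched/fair.c. A minimal sketch, assuming the guard combines the
cpusets_enabled() check mentioned in the v3 changelog with nodes_weight(), and
reusing the (assumed) tracepoint name from the sketch earlier on this page:

    /*
     * If the task's memory is pinned to a single NUMA node via
     * cpuset.mems, none of its pages can migrate anywhere else,
     * so VMA scanning and NUMA hinting faults are pure overhead.
     */
    if (cpusets_enabled() &&
        nodes_weight(cpuset_current_mems_allowed) == 1) {
            trace_sched_skip_cpuset_numa(current,
                                         &cpuset_current_mems_allowed);
            return;
    }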
Hi Venkat,
I just did git am -i t.mbox on top of next-20250423. Not sure why, but the second
patch was ahead of the first in apply order. Have you made sure the second patch
was not applied before the first one?
- Libo
> Regards,
>
> Venkat.
>
>
>