Message-ID: <46291879-83d4-4e03-9c3a-74872e44b0d6@linux.ibm.com>
Date: Thu, 17 Apr 2025 09:09:59 +0530
From: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
To: Chris Hyser <chris.hyser@...cle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        "longman@...hat.com" <longman@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Subject: Re: [PATCH 1/2] sched/numa: Add ability to override task's
 numa_preferred_nid.

On 17/04/25 02:43, Chris Hyser wrote:
>> From: Madadi Vineeth Reddy
>> Sent: Wednesday, April 16, 2025 3:00 AM
>> To: Chris Hyser
>> Cc: Peter Zijlstra; Mel Gorman; longman@...hat.com; linux-kernel@...r.kernel.org; Madadi Vineeth Reddy
>> Subject: Re: [PATCH 1/2] sched/numa: Add ability to override task's numa_preferred_nid.
>>
>>
>> Hi Chris,
>>
>> On 15/04/25 07:05, Chris Hyser wrote:
>>> From: chris hyser <chris.hyser@...cle.com>
>>>
>>
>> [..snip..]
>>
>>> The following results were from TPCC runs on an Oracle Database. The system
>>> was a 2-node Intel machine with a database running on each node with local
>>> memory allocations. No tasks or memory were pinned.
>>>
>>> There are four scenarios of interest:
>>>
>>> - Auto NUMA Balancing OFF.
>>>      base value
>>>
>>> - Auto NUMA Balancing ON.
>>>      1.2% - ANB ON better than ANB OFF.
>>>
>>> - Use the prctl(), ANB ON, parameters set to prevent faulting.
>>>      2.4% - prctl() better than ANB OFF.
>>>      1.2% - prctl() better than ANB ON.
>>>
>>> - Use the prctl(), ANB parameters normal.
>>>      3.1% - prctl() and ANB ON better than ANB OFF.
>>>      1.9% - prctl() and ANB ON better than just ANB ON.
>>>      0.7% - prctl() and ANB ON better than prctl() and ANB ON/faulting off
>>>
>>
>> Are you using prctl() to set the preferred node id for all the tasks of your run?
>> If yes, then how does the `prctl() and ANB ON better than prctl() and ANB ON/faulting off`
>> case happen?
> 
> Not every task in the system (including some DB tasks) has a prctl()-set preferred node, as the expected preference is not always known. That is part of it; however, the bigger influence, even with a prctl()-set preferred node, is that faulting drives physical page migration. You only want to migrate pages that the task is accessing. The fault tells you that a page was accessed and which node it is currently on, allowing a migration decision to be made.
> 

Yes, understood.

>> IIUC, when the preferred node is set via numa_preferred_nid_force, the original
>> numa_preferred_nid, which is derived from page faults, becomes a no-op, so
>> computing it should be pure overhead.
> 
> As mentioned above, faulting drives physical page migration, with the usual trade-off between faulting overhead and the benefits of consolidating pages on the same node.
> 
> One issue I've seen repeatedly is that if you monitor a task (NUMA fields in /proc/<pid>/sched), some tasks keep changing their preferred node. This makes sense since spatial access locality can change over time, but you also see the migrated page count going up independent of which node is currently preferred. So on a two-node system, there are pages being migrated back and forth (not necessarily the same ones). One possible effect of forcing the preferred node is that it no longer changes, so migrated pages should all be going the same way.
> 
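
The monitoring described above can be sketched roughly as follows. This is a minimal illustration, not the tooling used for the reported runs. It assumes that /proc/<pid>/sched prints "name : value" lines and that the relevant names include numa_preferred_nid and numa_pages_migrated; the exact field set depends on the kernel version and configuration. The helper names are hypothetical.

```python
import re
from pathlib import Path


def parse_numa_fields(sched_text):
    """Return {name: int} for 'name : value' lines whose name mentions numa.

    Assumption: integer-valued NUMA fields are printed one per line,
    with the name and value separated by a colon.
    """
    fields = {}
    for line in sched_text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and "numa" in key and re.fullmatch(r"-?\d+", value):
            fields[key] = int(value)
    return fields


def read_numa_fields(pid):
    """Read and parse the NUMA fields of one task (Linux only)."""
    return parse_numa_fields(Path(f"/proc/{pid}/sched").read_text())


def preferred_node_changes(samples):
    """Count how often numa_preferred_nid differs between consecutive samples."""
    nids = [s["numa_preferred_nid"] for s in samples if "numa_preferred_nid" in s]
    return sum(1 for a, b in zip(nids, nids[1:]) if a != b)
```

Calling read_numa_fields() on a task at intervals and passing the collected samples to preferred_node_changes() gives a rough count of preferred-node flips, which can be compared against the growth of the migrated page count.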
>> Let me know if my understanding is correct. Also, can you tell how to set the
>> parameters of ANB to prevent faulting.
> 
> Basically, I set the sampling periods to a large number of seconds. The sampling frequency is then 1/large, which is ~0. Monitoring the task again, it should show no NUMA faults and no pages migrated.
> 
> kernel.numa_balancing : 1
> scan_period_max_ms: 4294967295
> scan_period_min_ms: 4294967295
> scan_delay_ms: 4294967295
>
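
For reference, a rough sketch of what these values amount to and how they might be applied. The knob names and values are the ones quoted above; the file locations are an assumption, since the message does not say where the knobs live. On recent kernels they are typically found under /sys/kernel/debug/sched/numa_balancing/, while older kernels exposed them as kernel.numa_balancing_* sysctls. The function names are hypothetical.

```python
from pathlib import Path

# Value quoted in the message: the largest unsigned 32-bit number, in ms.
U32_MAX_MS = 4294967295

KNOBS = ("scan_period_max_ms", "scan_period_min_ms", "scan_delay_ms")


def period_days(period_ms):
    """Convert a scan period in milliseconds to days."""
    return period_ms / 1000 / 86400


def scans_per_second(period_ms):
    """Sampling frequency implied by a scan period (the '1/large' above)."""
    return 1000 / period_ms


def apply_settings(knob_dir, numa_balancing_file, period_ms=U32_MAX_MS):
    """Keep ANB enabled but push every scan period out to period_ms.

    Both paths are passed in by the caller because their location is an
    assumption; writing to the real files requires root.
    """
    Path(numa_balancing_file).write_text("1\n")
    for name in KNOBS:
        (Path(knob_dir) / name).write_text(f"{period_ms}\n")


# Hypothetical usage on a kernel with the debugfs layout (as root):
# apply_settings("/sys/kernel/debug/sched/numa_balancing",
#                "/proc/sys/kernel/numa_balancing")
```

With these values a scan period is about 49.7 days, so on the time scale of a benchmark run the sampling frequency is effectively zero.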

Got it. Thanks for the explanation.

Thanks,
Madadi Vineeth Reddy
 
> -chrish

