linux-kernel - Re: [PATCH v1] sched/fair: fix inconsistency in update_task_scan

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <251a903f-e5c3-3cb1-1af7-9ce469fdc7e9@126.com>
Date:   Wed, 26 Apr 2023 09:46:26 +0800
From:   zhaomengmeng <zhaomzhao@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org,
        Zhao Mengmeng <zhaomengmeng@...inos.cn>
Subject: Re: [PATCH v1] sched/fair: fix inconsistency in
 update_task_scan_period

On 2023/4/25 18:24, Peter Zijlstra wrote:
> On Tue, Apr 25, 2023 at 06:02:04AM -0400, zhaomzhao@....com wrote:
>> From: Zhao Mengmeng <zhaomengmeng@...inos.cn>
>>
>> During calculate numa_scan_period diff, the actual code
>> and the comment are inconsistent. The comment says it is
>> using shared faults ratio, but code uses private faults
>> ratio. This patch fixes it.
> 
> So for some reason you think the comment is correct ? You also don't
> discuss the performance changes caused by changing the code.

Sorry for the repeated mail and thanks for your comments.

First, I'd like to make some additions. When I was studying the 
numa-balancing code and found this inconsistency, I feel like the same 
as you: the comment is out-dated and not matching the code , so I was 
intended to change the comment. But after some git digging work,  I 
change my mind.

Function update_task_scan_period was added in "04bb2f9475054: 
sched/numa: Adjust scan rate in task_numa_placement", which set up the 
basis. Though the details logic inside the function may change a lot, 
this function's outer-side effect keep the same.
"""
Increase the scan period (slow down scanning) if the majority of our 
memory is
already on our local node, or if the majority of the page accesses are 
shared
with other processes
"""
Later, in commit "37ec97deb3a8c: sched/numa: Slow down scan rate if 
shared faults dominate", the author claims that
"""
    However, with the current code, all a high ratio of shared accesses
    does is slow down the rate at which scanning is made faster.

    This patch changes things so either lots of shared accesses or
    lots of local accesses will slow down scanning, and numa scanning
    is sped up only when there are lots of private faults on remote
    memory pages.
"""
But the actual code logic doesn't reflect his intention in commit log, 
which resulting the latest code used by kernel. Based on the this, I 
change the code and make this patch. That is *the reason I think the 
comment is correct*.

Second, the performance. I'm sorry that it is the shortage of my work. 
Which benchmark shall I use, like autonuma-benchmark?  I'm lack of 
physical server with multi numa node for testing,  is it enough to offer 
a virtual machine testing result? Besides, this patch just meant to 
achieve the initial design, not for optimization, is a performance 
result necessary?

Best regards.