Message-ID: <ccba1a65-fe4f-89d5-a32b-2efba30a1350@amd.com>
Date: Tue, 7 Feb 2023 12:11:47 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Ingo Molnar <mingo@...hat.com>, Mel Gorman <mgorman@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>, rppt@...nel.org,
Bharata B Rao <bharata@....com>,
Disha Talreja <dishaa.talreja@....com>
Subject: Re: [PATCH V2 2/3] sched/numa: Enhance vma scanning logic
On 2/4/2023 11:44 PM, Raghavendra K T wrote:
> On 2/3/2023 4:45 PM, Peter Zijlstra wrote:
>> On Wed, Feb 01, 2023 at 01:32:21PM +0530, Raghavendra K T wrote:
[...]
>
>>> + if (!vma_is_accessed(vma))
>>> + continue;
>>> +
>>> do {
>>> start = max(start, vma->vm_start);
>>> end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
>>
>>
>> This feels wrong, specifically we track numa_scan_offset per mm, now, if
>> we divide the threads into two dis-joint groups each only using their
>> own set of vmas (in fact quite common for workloads with proper data
>> partitioning) it is possible to consistently sample one set of threads
>> and thus not scan the other set of vmas.
>>
>> It seems somewhat unlikely, but not impossible to create significant
>> unfairness.
>>
>
> Agreed, but that is the reason why we want to allow the first few
> unconditional scans. Or am I missing something?
>
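
To make the unconditional-scan idea above concrete, the shape I have
in mind is roughly the following (sketch only; the helper and field
names here are assumptions, not the exact code in this series):

	/*
	 * Sketch: let the first few scan passes through
	 * unconditionally, so every vma gets prot_none faults
	 * installed at least once, and only after that restrict
	 * scanning to vmas the current task has recently faulted on.
	 */
	static bool vma_is_accessed(struct vm_area_struct *vma)
	{
		if (READ_ONCE(current->mm->numa_scan_seq) < 2)
			return true;

		return test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)),
				&vma->numab->access_pids);
	}
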
Thinking further, maybe we can summarize the different aspects of the
thread / two-disjoint-set case into:

1) Unfairness because of the way in which threads get the opportunity
to scan.
2) The disjoint sets of vmas in the partitions could be of different
sizes.
3) The disjoint sets of vmas could be associated with different
numbers of threads.

Each of the above can potentially benefit some threads or make some
thread do the heavy lifting, but (2) and (3) are what I think we are
trying to be okay with by making sure tasks mostly do not scan others'
vmas.

(1) could be a real issue (though I know there are many places where
we have mitigated this kind of issue by introducing an offset in
p->numa_next_scan), but how the distribution looks in practice I take
as a TODO and will post results.
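
For (1), the staggering I am referring to is roughly of this shape
(illustrative sketch only; the jitter computation and constants are
made up, the point is just offsetting per-task scan times):

	/*
	 * Sketch: offset each task's next scan time by a per-task
	 * jitter, so threads sharing an mm do not always sample the
	 * shared scan offset in the same order.
	 */
	unsigned long delay  = msecs_to_jiffies(p->numa_scan_period);
	unsigned long jitter = hash_32(p->pid, 10);	/* 0..1023 jiffies */

	p->numa_next_scan = jiffies + delay + jitter;
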