Message-ID: <933d1843-be6c-cbd4-ffb2-b0adcbeeccd5@amd.com>
Date:   Wed, 18 Jan 2023 10:13:18 +0530
From:   Bharata B Rao <bharata@....com>
To:     Mel Gorman <mgorman@...e.de>,
        Raghavendra K T <raghavendra.kt@....com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Peter Xu <peterx@...hat.com>,
        David Hildenbrand <david@...hat.com>,
        xu xin <cgel.zte@...il.com>, Yu Zhao <yuzhao@...gle.com>,
        Colin Cross <ccross@...gle.com>, Arnd Bergmann <arnd@...db.de>,
        Hugh Dickins <hughd@...gle.com>,
        Disha Talreja <dishaa.talreja@....com>,
        Sean Christopherson <seanjc@...gle.com>, jhubbard@...dia.com,
        ligang.bdlg@...edance.com, "Kalra, Ashish" <Ashish.Kalra@....com>
Subject: Re: [RFC PATCH V1 1/1] sched/numa: Enhance vma scanning logic

On 1/17/2023 8:29 PM, Mel Gorman wrote:
> Note that the cc list is excessive for the topic.

(Wasn't sure about pruning the CC list mid-thread, hence continuing with it)

<snip>

> 
> This is a build-tested only prototype to illustrate how VMA could track
> NUMA balancing state. It starts with applying the scan delay to every VMA
> instead of every task to avoid scanning new or very short-lived VMAs. I
> went back to my old notes on how I hoped to reduce excessive scanning in
> NUMA balancing and it happened to be second on my list and straight-forward
> to prototype in a few minutes.

While on the topic of improving the relevance of NUMA balancer scanning, the
following additional points may be worth noting:

Recently, there have been reports of NUMA balancing induced scanning and the
subsequent MMU notifier invalidations causing problems in different scenarios.

1. Currently, NUMA balancing does not check at scan time whether a page (or a
VMA) is non-migratable because the page (or the address range) is pinned. It
goes ahead with MMU invalidation notifications and changes the PTE protection to
PAGE_NONE, only to realize later that the pinned pages can't be migrated, and
then reinstalls the original PTE.

This was found to cause issues for SEV guests, whose pages are completely pinned.
This was discussed here - https://lore.kernel.org/all/20220927000729.498292-1-Ashish.Kalra@amd.com/

We could probably use page_maybe_dma_pinned() to determine whether a page is
long-term pinned and avoid the MMU invalidation and protection change for such
a page. However, we would then have to do per-page invalidations (as opposed to
the one-time PMD range invalidation that is done currently), which is probably
not desirable.

Also, MMU invalidations are expected to be issued in sleepable context (the main
exception being the OOM notification, which uses the non-blocking version,
AFAICS). This makes it difficult to check the pinned state of the page prior to
MMU invalidation. Some of this is discussed here: https://lore.kernel.org/linux-arm-kernel/YuEMkKY2RU%2F2KiZW@monolith.localdoman/

The current patchset, which attempts to restrict scanning to relevant VMAs, may
partially help with the above case, but are there any ideas for addressing this
issue comprehensively? Ideally, we would clearly identify such non-migratable
(long-term pinned) pages and exclude them entirely from scanning and protection
changes.

2. Applications that run on GPUs may want to avoid NUMA balancing activity
completely, and they would benefit from per-process enabling/disabling of NUMA
balancing. The patchset that helps with this (which has a different use case for
per-process control) is here - https://lore.kernel.org/all/49ed07b1-e167-7f94-9986-8e86fb60bb09@nvidia.com/

Improvements that make scanning more relevant can help this case to an extent,
but per-process control of NUMA balancing would still be a useful knob to have.

Regards,
Bharata.
