linux-kernel - Re: [PATCH v2 4/8] mm/gup: don't implicitly set FOLL_HONOR_NUMA

Message-ID: <20230802152847.c3pz5o4pfsmkuv3u@techsingularity.net>
Date:   Wed, 2 Aug 2023 16:28:47 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     David Hildenbrand <david@...hat.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, kvm@...r.kernel.org,
        linux-kselftest@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        liubo <liubo254@...wei.com>, Peter Xu <peterx@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Hugh Dickins <hughd@...gle.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        John Hubbard <jhubbard@...dia.com>,
        Mel Gorman <mgorman@...e.de>, Shuah Khan <shuah@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH v2 4/8] mm/gup: don't implicitly set FOLL_HONOR_NUMA_FAULT

On Tue, Aug 01, 2023 at 02:48:40PM +0200, David Hildenbrand wrote:
> Commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page faults from
> gup/gup_fast") from 2012 documented as the primary reason why we would want
> to handle NUMA hinting faults from GUP:
> 
>   KVM secondary MMU page faults will trigger the NUMA hinting page
>   faults through gup_fast -> get_user_pages -> follow_page ->
>   handle_mm_fault.
> 
> That is still the case today, and relevant KVM code has been converted to
> manually set FOLL_HONOR_NUMA_FAULT. So let's stop setting
> FOLL_HONOR_NUMA_FAULT for all GUP users and keep our fingers crossed that
> few other users that really require such handling for autonuma remain.
> 
> Possible interaction with MMU notifiers:
> 
>  Assume a driver obtains a page using get_user_pages() to map it into
>  a secondary MMU, and uses the MMU notifier framework to get notified on
>  changes.
> 
>  Assume get_user_pages() succeeded on a PROT_NONE-mapped page (because
>  FOLL_HONOR_NUMA_FAULT is not set) in an accessible VMA and the page is
>  mapped into a secondary MMU. Once user space turns that mapping
>  inaccessible using mprotect(PROT_NONE), the actual PTE in the page table
>  might not change. If an MMU notifier were smart and optimized for that
>  case ("why notify if the PTE didn't change?"), that could be problematic.
> 
>  At least change_pmd_range() with MMU_NOTIFY_PROTECTION_VMA for now does an
>  unconditional mmu_notifier_invalidate_range_start() ->
>  mmu_notifier_invalidate_range_end() and should be fine.
> 
>  Note that even if a PTE in an accessible VMA is pte_protnone(), the
>  underlying page might be accessed by a secondary MMU that does not set
>  FOLL_HONOR_NUMA_FAULT, and test_young() MMU notifiers would return "true".
> 
> Signed-off-by: David Hildenbrand <david@...hat.com>

Also seems sane, but a large portion of its correctness depends on
patch 3 being correct.

-- 
Mel Gorman
SUSE Labs