[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZtDBvAbcT-QI65Vt@x1n>
Date: Thu, 29 Aug 2024 14:45:16 -0400
From: Peter Xu <peterx@...hat.com>
To: David Hildenbrand <david@...hat.com>
Cc: Jason Gunthorpe <jgg@...dia.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, Gavin Shan <gshan@...hat.com>,
Catalin Marinas <catalin.marinas@....com>, x86@...nel.org,
Ingo Molnar <mingo@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Thomas Gleixner <tglx@...utronix.de>,
Alistair Popple <apopple@...dia.com>, kvm@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
Sean Christopherson <seanjc@...gle.com>,
Oscar Salvador <osalvador@...e.de>, Borislav Petkov <bp@...en8.de>,
Zi Yan <ziy@...dia.com>, Axel Rasmussen <axelrasmussen@...gle.com>,
Yan Zhao <yan.y.zhao@...el.com>, Will Deacon <will@...nel.org>,
Kefeng Wang <wangkefeng.wang@...wei.com>,
Alex Williamson <alex.williamson@...hat.com>
Subject: Re: [PATCH v2 06/19] mm/pagewalk: Check pfnmap for folio_walk_start()
On Thu, Aug 29, 2024 at 08:35:49AM +0200, David Hildenbrand wrote:
> Fortunately, we did an excellent job at documenting vm_normal_page():
>
> * There are 2 broad cases. Firstly, an architecture may define a pte_special()
> * pte bit, in which case this function is trivial. Secondly, an architecture
> * may not have a spare pte bit, which requires a more complicated scheme,
> * described below.
> *
> * A raw VM_PFNMAP mapping (ie. one that is not COWed) is always considered a
> * special mapping (even if there are underlying and valid "struct pages").
> * COWed pages of a VM_PFNMAP are always normal.
> *
> * The way we recognize COWed pages within VM_PFNMAP mappings is through the
> * rules set up by "remap_pfn_range()": the vma will have the VM_PFNMAP bit
> * set, and the vm_pgoff will point to the first PFN mapped: thus every special
> * mapping will always honor the rule
> *
> * pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
> *
> * And for normal mappings this is false.
> *
>
> remap_pfn_range_notrack() will currently handle that for us:
>
> if (is_cow_mapping(vma->vm_flags)) {
> if (addr != vma->vm_start || end != vma->vm_end)
> return -EINVAL;
> }
>
> Even if [1] would succeed, the is_cow_mapping() check will return NULL and it will
> all work as expected, even without pte_special().
IMHO referencing vm_pgoff is ambiguous, and could be wrong, if without a
clear contract.
For example, consider when the driver setup a MAP_PRIVATE + VM_PFNMAP vma,
vm_pgoff to be not the "base PFN" but some random value, then for a COWed
page it's possible the calculation accidentally satisfies "pfn ==
vma->vm_pgoff + off". Then it could wrongly return NULL rather than the
COWed anonymous page here. This is extremely unlikely, but just to show
why it's wrong to reference it at all.
>
> Because VM_PFNMAP is easy: in a !COW mapping, everything is special.
Yes it's safe for vfio-pci, as vfio-pci doesn't have private mappings. But
still, I don't think it's clear enough now on how VM_PFNMAP should be
mapped.
--
Peter Xu
Powered by blists - more mailing lists