[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zs8zBT1aDh1v9Eje@x1n>
Date: Wed, 28 Aug 2024 10:24:05 -0400
From: Peter Xu <peterx@...hat.com>
To: David Hildenbrand <david@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Gavin Shan <gshan@...hat.com>,
Catalin Marinas <catalin.marinas@....com>, x86@...nel.org,
Ingo Molnar <mingo@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Thomas Gleixner <tglx@...utronix.de>,
Alistair Popple <apopple@...dia.com>, kvm@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
Sean Christopherson <seanjc@...gle.com>,
Oscar Salvador <osalvador@...e.de>,
Jason Gunthorpe <jgg@...dia.com>, Borislav Petkov <bp@...en8.de>,
Zi Yan <ziy@...dia.com>, Axel Rasmussen <axelrasmussen@...gle.com>,
Yan Zhao <yan.y.zhao@...el.com>, Will Deacon <will@...nel.org>,
Kefeng Wang <wangkefeng.wang@...wei.com>,
Alex Williamson <alex.williamson@...hat.com>
Subject: Re: [PATCH v2 06/19] mm/pagewalk: Check pfnmap for folio_walk_start()
On Wed, Aug 28, 2024 at 09:44:04AM +0200, David Hildenbrand wrote:
> On 26.08.24 22:43, Peter Xu wrote:
> > Teach folio_walk_start() to recognize special pmd/pud mappings, and fail
> > them properly as it means there's no folio backing them.
> >
> > Cc: David Hildenbrand <david@...hat.com>
> > Signed-off-by: Peter Xu <peterx@...hat.com>
> > ---
> > mm/pagewalk.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> > index cd79fb3b89e5..12be5222d70e 100644
> > --- a/mm/pagewalk.c
> > +++ b/mm/pagewalk.c
> > @@ -753,7 +753,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> > fw->pudp = pudp;
> > fw->pud = pud;
> > - if (!pud_present(pud) || pud_devmap(pud)) {
> > + if (!pud_present(pud) || pud_devmap(pud) || pud_special(pud)) {
> > spin_unlock(ptl);
> > goto not_found;
> > } else if (!pud_leaf(pud)) {
> > @@ -783,7 +783,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> > fw->pmdp = pmdp;
> > fw->pmd = pmd;
> > - if (pmd_none(pmd)) {
> > + if (pmd_none(pmd) || pmd_special(pmd)) {
> > spin_unlock(ptl);
> > goto not_found;
> > } else if (!pmd_leaf(pmd)) {
>
> As raised, this is not the right way to to it. You should follow what
> CONFIG_ARCH_HAS_PTE_SPECIAL and vm_normal_page() does.
>
> It's even spelled out in vm_normal_page_pmd() that at the time it was
> introduced there was no pmd_special(), so there was no way to handle that.
I can try to do something like that, but even so it'll be mostly cosmetic
changes, and AFAICT there's no real functional difference.
Meanwhile, see below comment.
>
>
>
> diff --git a/mm/memory.c b/mm/memory.c
> index f0cf5d02b4740..272445e9db147 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -672,15 +672,29 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
> {
> unsigned long pfn = pmd_pfn(pmd);
> - /*
> - * There is no pmd_special() but there may be special pmds, e.g.
> - * in a direct-access (dax) mapping, so let's just replicate the
> - * !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here.
> - */
This one is correct; I overlooked this comment which can be obsolete. I
can either refine this patch or add one patch on top to refine the comment
at least.
> + if (IS_ENABLED(CONFIG_ARCH_HAS_PMD_SPECIAL)) {
We don't yet have CONFIG_ARCH_HAS_PMD_SPECIAL, but I get your point.
> + if (likely(!pmd_special(pmd)))
> + goto check_pfn;
> + if (vma->vm_ops && vma->vm_ops->find_special_page)
> + return vma->vm_ops->find_special_page(vma, addr);
Why do we ever need this? This is so far destined to be totally a waste of
cycles. I think it's better we leave that until either xen/gntdev.c or any
new driver start to use it, rather than keeping dead code around.
> + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> + return NULL;
> + if (is_huge_zero_pmd(pmd))
> + return NULL;
This is meaningless too until we make huge zero pmd apply special bit
first, which does sound like to be outside the scope of this series.
> + if (pmd_devmap(pmd))
> + /* See vm_normal_page() */
> + return NULL;
When will it be pmd_devmap() if it's already pmd_special()?
> + return NULL;
And see this one.. it's after:
if (xxx)
return NULL;
if (yyy)
return NULL;
if (zzz)
return NULL;
return NULL;
Hmm?? If so, what's the difference if we simply check pmd_special and
return NULL..
> + }
> +
> + /* !CONFIG_ARCH_HAS_PMD_SPECIAL case follows: */
> +
> if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
> if (vma->vm_flags & VM_MIXEDMAP) {
> if (!pfn_valid(pfn))
> return NULL;
> + if (is_huge_zero_pmd(pmd))
> + return NULL;
I'd rather not touch here as this series doesn't change anything for
MIXEDMAP yet..
> goto out;
> } else {
> unsigned long off;
> @@ -692,6 +706,11 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
> }
> }
> + /*
> + * For historical reasons, these might not have pmd_special() set,
> + * so we'll check them manually, in contrast to vm_normal_page().
> + */
> +check_pfn:
> if (pmd_devmap(pmd))
> return NULL;
> if (is_huge_zero_pmd(pmd))
>
>
>
> We should then look into mapping huge zeropages also with pmd_special.
> pmd_devmap we'll leave alone until removed. But that's indeoendent of your series.
This does look reasonable to match what we do with pte zeropage. Could you
remind me what might be the benefit when we switch to using special bit for
pmd zero pages?
>
> I wonder if CONFIG_ARCH_HAS_PTE_SPECIAL is sufficient and we don't need additional
> CONFIG_ARCH_HAS_PMD_SPECIAL.
The hope is we can always reuse the bit in the pte to work the same for
pmd/pud.
Now we require arch to select ARCH_SUPPORTS_HUGE_PFNMAP to say "pmd/pud has
the same special bit defined".
>
> As I said, if you need someone to add vm_normal_page_pud(), I can handle that.
I'm pretty confused why we need that for this series alone.
If you prefer vm_normal_page_pud() to be defined and check pud_special()
there, I can do that. But again, I don't yet see how that can make a
functional difference considering the so far very limited usage of the
special bit, and wonder whether we can do that on top when it became
necessary (and when we start to have functional requirement of such).
Thanks,
--
Peter Xu
Powered by blists - more mailing lists