Message-ID: <20240814221031.GA2032816@nvidia.com>
Date: Wed, 14 Aug 2024 19:10:31 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Peter Xu <peterx@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Oscar Salvador <osalvador@...e.de>,
Axel Rasmussen <axelrasmussen@...gle.com>,
linux-arm-kernel@...ts.infradead.org, x86@...nel.org,
Will Deacon <will@...nel.org>, Gavin Shan <gshan@...hat.com>,
Paolo Bonzini <pbonzini@...hat.com>, Zi Yan <ziy@...dia.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Catalin Marinas <catalin.marinas@....com>,
Ingo Molnar <mingo@...hat.com>,
Alistair Popple <apopple@...dia.com>,
Borislav Petkov <bp@...en8.de>,
David Hildenbrand <david@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>, kvm@...r.kernel.org,
Dave Hansen <dave.hansen@...ux.intel.com>,
Alex Williamson <alex.williamson@...hat.com>,
Yan Zhao <yan.y.zhao@...el.com>,
Oliver Upton <oliver.upton@...ux.dev>,
Marc Zyngier <maz@...nel.org>
Subject: Re: [PATCH 00/19] mm: Support huge pfnmaps

On Wed, Aug 14, 2024 at 01:54:04PM -0700, Sean Christopherson wrote:
> +Marc and Oliver
>
> On Wed, Aug 14, 2024, Jason Gunthorpe wrote:
> > On Wed, Aug 14, 2024 at 07:35:01AM -0700, Sean Christopherson wrote:
> > > On Wed, Aug 14, 2024, Jason Gunthorpe wrote:
> > > > On Fri, Aug 09, 2024 at 12:08:50PM -0400, Peter Xu wrote:
> > > > > Overview
> > > > > ========
> > > > >
> > > > > This series is based on mm-unstable, commit 98808d08fc0f (the latest as
> > > > > of Aug 7th), plus the dax 1g fix [1]. Note that this series should also
> > > > > apply without the dax 1g fix series, but in that case mprotect() will
> > > > > trigger similar errors on PUD mappings.
> > > > >
> > > > > This series implements huge pfnmap support for mm in general. Huge pfnmaps
> > > > > allow e.g. VM_PFNMAP vmas to be mapped at either PMD or PUD level, similar
> > > > > to what we already do with dax / thp / hugetlb, to benefit from TLB hits. Now
> > > > > we extend that idea to PFN mappings, e.g. PCI MMIO bars, which can be as
> > > > > large as 8GB or even bigger.
> > > >
> > > > FWIW, I've started to hear people talk about needing this in the VFIO
> > > > context with VMs.
> > > >
> > > > vfio/iommufd will reassemble the contiguous range from the 4k PFNs to
> > > > set up the IOMMU, but KVM is not able to do it so reliably.
> > >
> > > Heh, KVM should very reliably do the exact opposite, i.e. KVM should never create
> > > a huge page unless the mapping is huge in the primary MMU. And that's very much
> > > by design, as KVM has no knowledge of what actually resides at a given PFN, and
> > > thus can't determine whether or not it's safe to create a huge page if KVM happens
> > > to realize the VM has access to a contiguous range of memory.
> >
> > Oh? Someone told me recently x86 kvm had code to reassemble contiguous
> > ranges?
>
> Nope. KVM ARM does (see get_vma_page_shift()) but I strongly suspect that's only
> a win in very select use cases, and is overall a non-trivial loss.

Ah, that ARM behavior was probably what was being mentioned then! So
take my original remark as applying to this :)
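
For anyone skimming the thread, the shape of that arm64 helper, as I
understand it, is roughly the below. This is a paraphrased sketch from
memory, not the exact upstream code (hugetlb handling omitted): only go
bigger when the PA and the hva are congruent modulo the block size and
the whole block sits inside the vma.

#include <linux/mm.h>
#include <linux/pgtable.h>

static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva)
{
	unsigned long pa;

	if (!(vma->vm_flags & VM_PFNMAP))
		return PAGE_SHIFT;

	/* Physical address backing @hva, derived from the pfnmap's vm_pgoff */
	pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);

	/* PUD block: same offset within the block, block fully inside the vma */
	if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
	    ALIGN_DOWN(hva, PUD_SIZE) >= vma->vm_start &&
	    ALIGN(hva, PUD_SIZE) <= vma->vm_end)
		return PUD_SHIFT;

	/* Otherwise try a PMD block the same way */
	if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
	    ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start &&
	    ALIGN(hva, PMD_SIZE) <= vma->vm_end)
		return PMD_SHIFT;

	return PAGE_SHIFT;
}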

> > I don't quite understand your safety argument. If the VMA has 1G of
> > contiguous physical memory described with 4K PTEs, it is definitely safe
> > for KVM to reassemble that same memory and represent it as 1G.
>
> That would require taking mmap_lock to get the VMA, which would be a net negative,
> especially for workloads that are latency sensitive.

You can aggregate if the read and aggregation logic are protected by
mmu notifiers, I think. An invalidation would still have enough
information to clear the aggregate shadow entry. If you get a sequence
number collision then you'd throw away the aggregation.

But yes, I also think it would be slow to have aggregation logic in
KVM. Doing it in the main mmu is much better.
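
Something like the usual notifier sequence pattern is what I have in
mind; a minimal sketch below. The kvm_agg_*() helpers and the exact
locking are made up for illustration (mmu_invalidate_seq /
mmu_invalidate_retry() are the real KVM sequence primitives, the rest
is hypothetical):

#include <linux/kvm_host.h>

/* Hypothetical: try to install a PMD-sized block over 4k pfnmap PTEs */
static bool try_aggregate_block(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
{
	unsigned long mmu_seq;

	/* Snapshot the invalidation sequence before walking the PTEs */
	mmu_seq = kvm->mmu_invalidate_seq;
	smp_rmb();

	/* Illustrative helper: check the 4k pfns are physically contiguous */
	if (!kvm_agg_range_is_contiguous(kvm, gfn, pfn, PMD_SIZE))
		return false;

	write_lock(&kvm->mmu_lock);
	/* Sequence collision: an invalidation raced us, throw it away */
	if (mmu_invalidate_retry(kvm, mmu_seq)) {
		write_unlock(&kvm->mmu_lock);
		return false;
	}
	/* Illustrative helper: install the aggregate shadow entry */
	kvm_agg_install_block(kvm, gfn, pfn, PMD_SIZE);
	write_unlock(&kvm->mmu_lock);

	return true;
}

Any invalidation touching the range bumps the sequence, so a stale
aggregate never gets installed, and the invalidation side only needs
the (block-aligned) range to clear the aggregate entry.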

Jason