Message-ID: <Zr9jIKp_vWyfCzQs@x1n>
Date: Fri, 16 Aug 2024 10:33:04 -0400
From: Peter Xu <peterx@...hat.com>
To: Kefeng Wang <wangkefeng.wang@...wei.com>
Cc: Jason Gunthorpe <jgg@...dia.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Sean Christopherson <seanjc@...gle.com>,
Oscar Salvador <osalvador@...e.de>,
Axel Rasmussen <axelrasmussen@...gle.com>,
linux-arm-kernel@...ts.infradead.org, x86@...nel.org,
Will Deacon <will@...nel.org>, Gavin Shan <gshan@...hat.com>,
Paolo Bonzini <pbonzini@...hat.com>, Zi Yan <ziy@...dia.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Catalin Marinas <catalin.marinas@....com>,
Ingo Molnar <mingo@...hat.com>,
Alistair Popple <apopple@...dia.com>,
Borislav Petkov <bp@...en8.de>,
David Hildenbrand <david@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>, kvm@...r.kernel.org,
Dave Hansen <dave.hansen@...ux.intel.com>,
Alex Williamson <alex.williamson@...hat.com>,
Yan Zhao <yan.y.zhao@...el.com>
Subject: Re: [PATCH 00/19] mm: Support huge pfnmaps
On Fri, Aug 16, 2024 at 11:05:33AM +0800, Kefeng Wang wrote:
>
>
> On 2024/8/16 3:20, Peter Xu wrote:
> > On Wed, Aug 14, 2024 at 09:37:15AM -0300, Jason Gunthorpe wrote:
> > > > Currently, only x86_64 (1G+2M) and arm64 (2M) are supported.
> > >
> > > There is definitely interest here in extending ARM to support the 1G
> > > size too; what is missing?
> >
> > Currently PUD pfnmap relies on THP_PUD config option:
> >
> > config ARCH_SUPPORTS_PUD_PFNMAP
> > def_bool y
> > depends on ARCH_SUPPORTS_HUGE_PFNMAP && HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> >
> > Arm64 unfortunately doesn't yet support dax 1G, so not applicable yet.
> >
> > Ideally, pfnmaps are much simpler than real THPs and shouldn't need to
> > depend on THP at all, but we'll need things like the below to land
> > first:
> >
> > https://lore.kernel.org/r/20240717220219.3743374-1-peterx@redhat.com
> >
> > I sent that first a while ago, but I didn't collect enough inputs, and I
> > decided to unblock this series from that, so x86_64 shouldn't be affected,
> > and arm64 will at least start to have 2M.
> >
> > >
> > > > The other trick is how to make gup-fast work for such huge mappings
> > > > even if there's no direct way of knowing whether it's a normal page or
> > > > an MMIO mapping. This series chose to keep the pte_special solution: it
> > > > reuses the same idea, setting a special bit on pfnmap PMDs/PUDs so that
> > > > gup-fast can identify them and fail properly.
> > >
> > > Makes sense
> > >
> > > > More architectures / More page sizes
> > > > ------------------------------------
> > > >
> > > > Currently only x86_64 (2M+1G) and arm64 (2M) are supported.
> > > >
> > > > For example, if arm64 can start to support THP_PUD one day, the huge pfnmap
> > > > on 1G will be automatically enabled.
>
> Here is a draft patch to enable THP_PUD on arm64. It has only been tested
> with DEBUG_VM_PGTABLE so far, but with it we may test pud pfnmaps on arm64.

Thanks, Kefeng.  It'll be great if this already works, as simple as it is.

It might be interesting to know whether it already works if you have some
GPU with a few GBs of BAR around on the systems.

Logically, as long as you have HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD selected
below, 1G pfnmap will be automatically enabled when you rebuild the kernel.
You can double check that by looking for this:

CONFIG_ARCH_SUPPORTS_PUD_PFNMAP=y

And you can try to observe the mappings by enabling dynamic debug for
vfio_pci_mmap_huge_fault(), then mapping the BAR with vfio-pci and reading
something from it.
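
Enabling the print should be something like the below (standard dynamic
debug knob; it of course only shows anything if the debug print is there):

echo "func vfio_pci_mmap_huge_fault +p" > /sys/kernel/debug/dynamic_debug/control

And, totally untested, but a rough userspace sketch of the mmap-and-read
part could look like this.  The group number and BDF are placeholders,
error handling is mostly omitted, and it assumes the device is already
bound to vfio-pci:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

int main(void)
{
	struct vfio_region_info reg = {
		.argsz = sizeof(reg),
		/* pick a BAR that is large enough (>=2M, or >=1G for PUD) */
		.index = VFIO_PCI_BAR0_REGION_INDEX,
	};
	int container, group, device;
	volatile uint32_t *bar;

	container = open("/dev/vfio/vfio", O_RDWR);
	group = open("/dev/vfio/12", O_RDWR);	/* placeholder group number */
	if (container < 0 || group < 0)
		return 1;

	/* Attach the group to a type1 container, then grab the device fd */
	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:01:00.0");	/* placeholder BDF */

	/*
	 * Region info provides the mmap offset/size for the BAR
	 * (reg.flags should have VFIO_REGION_INFO_FLAG_MMAP set).
	 */
	ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg);

	/*
	 * Map the whole BAR.  For the huge paths the user VA needs to be
	 * suitably aligned as well (2M for PMD, 1G for PUD), so an aligned
	 * hint address may be needed instead of NULL.
	 */
	bar = mmap(NULL, reg.size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   device, (off_t)reg.offset);
	if (bar == MAP_FAILED)
		return 1;

	/* The read below should trigger vfio_pci_mmap_huge_fault() */
	printf("BAR0 first dword: 0x%x\n", bar[0]);
	return 0;
}
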
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2f8ff354ca6..ff0d27c72020 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -184,6 +184,7 @@ config ARM64
> select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> select HAVE_ARCH_TRACEHOOK
> select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> + select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if PGTABLE_LEVELS > 2
> select HAVE_ARCH_VMAP_STACK
> select HAVE_ARM_SMCCC
> select HAVE_ASM_MODVERSIONS
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7a4f5604be3f..e013fe458476 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -763,6 +763,25 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
> #define pud_valid(pud) pte_valid(pud_pte(pud))
> #define pud_user(pud) pte_user(pud_pte(pud))
> #define pud_user_exec(pud) pte_user_exec(pud_pte(pud))
> +#define pud_dirty(pud) pte_dirty(pud_pte(pud))
> +#define pud_devmap(pud) pte_devmap(pud_pte(pud))
> +#define pud_wrprotect(pud) pte_pud(pte_wrprotect(pud_pte(pud)))
> +#define pud_mkold(pud) pte_pud(pte_mkold(pud_pte(pud)))
> +#define pud_mkwrite(pud) pte_pud(pte_mkwrite_novma(pud_pte(pud)))
> +#define pud_mkclean(pud) pte_pud(pte_mkclean(pud_pte(pud)))
> +#define pud_mkdirty(pud) pte_pud(pte_mkdirty(pud_pte(pud)))
> +
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +static inline int pud_trans_huge(pud_t pud)
> +{
> + return pud_val(pud) && pud_present(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
> +}
> +
> +static inline pud_t pud_mkdevmap(pud_t pud)
> +{
> + return pte_pud(set_pte_bit(pud_pte(pud), __pgprot(PTE_DEVMAP)));
> +}
> +#endif
>
> static inline bool pgtable_l4_enabled(void);
>
> @@ -1137,10 +1156,20 @@ static inline int pmdp_set_access_flags(struct vm_area_struct *vma,
> pmd_pte(entry), dirty);
> }
>
> +static inline int pudp_set_access_flags(struct vm_area_struct *vma,
> + unsigned long address, pud_t *pudp,
> + pud_t entry, int dirty)
> +{
> + return __ptep_set_access_flags(vma, address, (pte_t *)pudp,
> + pud_pte(entry), dirty);
> +}
> +
> +#ifndef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> static inline int pud_devmap(pud_t pud)
> {
> return 0;
> }
> +#endif
>
> static inline int pgd_devmap(pgd_t pgd)
> {
> @@ -1213,6 +1242,13 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
> {
> return __ptep_test_and_clear_young(vma, address, (pte_t *)pmdp);
> }
> +
> +static inline int pudp_test_and_clear_young(struct vm_area_struct *vma,
> + unsigned long address,
> + pud_t *pudp)
> +{
> + return __ptep_test_and_clear_young(vma, address, (pte_t *)pudp);
> +}
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
> @@ -1433,6 +1469,7 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
> #define update_mmu_cache(vma, addr, ptep) \
> update_mmu_cache_range(NULL, vma, addr, ptep, 1)
> #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
> +#define update_mmu_cache_pud(vma, address, pud) do { } while (0)
>
> #ifdef CONFIG_ARM64_PA_BITS_52
> #define phys_to_ttbr(addr) (((addr) | ((addr) >> 46)) & TTBR_BADDR_MASK_52)
> --
> 2.27.0
--
Peter Xu