Message-ID: <CAPYmKFvSXRo5d3T3OwJ6tZEer-8o=G3uUVrd7mXOFjGsfVPy3w@mail.gmail.com>
Date: Thu, 7 Dec 2023 14:07:15 +0800
From: Xu Lu <luxu.kernel@...edance.com>
To: paul.walmsley@...ive.com, palmer@...belt.com,
aou@...s.berkeley.edu, ardb@...nel.org, anup@...infault.org,
atishp@...shpatra.org
Cc: dengliang.1214@...edance.com, xieyongji@...edance.com,
lihangjing@...edance.com, songmuchun@...edance.com,
punit.agrawal@...edance.com, linux-kernel@...r.kernel.org,
linux-riscv@...ts.infradead.org
Subject: Re: [RFC PATCH V1 00/11] riscv: Introduce 64K base page
A gentle ping.
On Thu, Nov 23, 2023 at 2:57 PM Xu Lu <luxu.kernel@...edance.com> wrote:
>
> Some existing architectures, such as ARM, support base pages larger
> than 4K because their MMUs support more page sizes. Thus, besides
> hugetlb pages and transparent huge pages, these architectures have
> another way to enjoy the benefits of fewer TLB misses without worrying
> about the cost of splitting and merging huge pages. However, on
> architectures whose MMU supports only 4K pages, a larger base page is
> currently unavailable.
>
> This patch series attempts to work around this MMU limitation and
> support a larger base page on RISC-V, which currently supports only a
> 4K page size.
>
> The key idea for implementing a larger base page on a 4K MMU is to
> decouple the MMU page from the base page as seen by the kernel mm,
> which we call the software page. In contrast, we call the MMU page the
> hardware page. The differences between these two kinds of pages are as
> follows.
>
> 1. The kernel memory management module manages, allocates and maps
> memory at the granularity of the software page, which should not be
> restricted by the MMU and can be larger than the hardware page.
>
> 2. Architecture page table operations should be carried out from the
> MMU's perspective, and page table entries are encoded at the
> granularity of the hardware page, which is currently 4K on the RISC-V
> MMU.
>
> Most of the work of decoupling these two kinds of pages lies in
> architecture code. For example, we turn the pte_t struct into an array
> of page table entries to match the software page, which can be larger
> than the hardware page, and adapt the page table operations
> accordingly. For a 64K software base page, the pte_t struct now
> contains 16 contiguous page table entries which point to 16 contiguous
> 4K hardware pages.
>
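For readers following along, the pte_t layout described in the quoted paragraph could be sketched roughly as below. This is only an illustration under stated assumptions: the names (sw_pte_t, sw_pte_make, sw_pte_hw_pfn and the macros) are hypothetical and are not taken from the patch series, and the sketch ignores the upper attribute bits of a real RISC-V PTE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the patch series. */
#define HW_PAGE_SHIFT        12   /* 4K hardware (MMU) page */
#define SW_PAGE_SHIFT        16   /* 64K software base page */
#define HW_PTES_PER_SW_PAGE  (1u << (SW_PAGE_SHIFT - HW_PAGE_SHIFT))  /* 16 */
#define PTE_PFN_SHIFT        10   /* PPN field starts at bit 10 of an Sv39/Sv48 PTE */

/* One software PTE is an array of 16 contiguous hardware PTEs. */
typedef struct {
    uint64_t ptes[HW_PTES_PER_SW_PAGE];
} sw_pte_t;

/*
 * Build a software PTE for the 64K page starting at hardware PFN hw_pfn
 * (which must be 16-aligned), using the low permission bits in prot.
 * Slot i points at the i-th 4K hardware page of the 64K range.
 */
static sw_pte_t sw_pte_make(uint64_t hw_pfn, uint64_t prot)
{
    sw_pte_t pte;

    assert((hw_pfn & (HW_PTES_PER_SW_PAGE - 1)) == 0);
    for (unsigned int i = 0; i < HW_PTES_PER_SW_PAGE; i++)
        pte.ptes[i] = ((hw_pfn + i) << PTE_PFN_SHIFT) | prot;
    return pte;
}

/* Hardware PFN mapped by the i-th 4K slot (upper attribute bits ignored). */
static uint64_t sw_pte_hw_pfn(const sw_pte_t *pte, unsigned int i)
{
    return pte->ptes[i] >> PTE_PFN_SHIFT;
}
```

With this layout, a call such as sw_pte_make(0x100, 0x7) fills a 128-byte structure whose slots map hardware PFNs 0x100 through 0x10f.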
> To obtain the benefits of a large base page, we apply Svnapot to each
> base page's mapping. The Svnapot extension on RISC-V is similar to
> contiguous PTEs on ARM64. It allows the PTEs of a naturally aligned
> power-of-2 (NAPOT) memory range to be encoded in the same format to
> save TLB space.
>
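As a rough illustration of the Svnapot encoding mentioned in the quoted paragraph: for a 64K range, the extension sets the N bit (bit 63) of the PTE and encodes ppn[3:0] as 0b1000, so all 16 PTEs of the range end up with the same value. The sketch below assumes that encoding; the helper names are hypothetical, and it ignores other upper attribute bits such as those used by Svpbmt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; only the bit positions follow Svnapot. */
#define PTE_PFN_SHIFT    10
#define PTE_NAPOT        (1ULL << 63)  /* N bit defined by Svnapot */
#define NAPOT_64K_ORDER  4             /* 2^4 contiguous 4K pages = 64K */

/* Turn an ordinary 4K PTE value into its 64K NAPOT encoding. */
static uint64_t pte_mknapot_64k(uint64_t pte)
{
    /* ppn[3:0] no longer selects a 4K page inside the 64K range. */
    uint64_t low_ppn_mask = ((1ULL << NAPOT_64K_ORDER) - 1) << PTE_PFN_SHIFT;
    /* ppn[3:0] = 0b1000 marks a 64K NAPOT region. */
    uint64_t napot_marker = 1ULL << (NAPOT_64K_ORDER - 1 + PTE_PFN_SHIFT);

    return (pte & ~low_ppn_mask) | napot_marker | PTE_NAPOT;
}

/* Recover the 64K-aligned base PFN from a 64K NAPOT PTE. */
static uint64_t napot_64k_base_pfn(uint64_t pte)
{
    uint64_t pfn = (pte & ~PTE_NAPOT) >> PTE_PFN_SHIFT;

    return pfn & ~((1ULL << NAPOT_64K_ORDER) - 1);
}
```

Because the low PPN bits are replaced by the marker, every 4K PTE within one aligned 64K range maps to the same encoded value, which is what lets the TLB hold the whole range in a single entry.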
> This patch series is the first version and is based on v6.7-rc1. This
> version supports both bare metal and virtualization scenarios.
>
> In future versions, we will continue with the following work:
>
> 1. Reduce the memory usage of page table pages, as each uses only 4K
> of space while consuming a whole base page.
>
> 2. When an IMSIC interrupt file is smaller than 64K, extra isolation
> measures for the interrupt file are needed. (S)PMP and IOPMP may be good
> choices.
>
> 3. More consideration is needed to make this patch series work better
> with folios.
>
> 4. Support 64K base page on IOMMU.
>
> 5. Performance tests are planned to verify the actual performance
> improvement and the decrease in TLB miss rate.
>
> Thanks in advance for comments.
>
> Xu Lu (11):
> mm: Fix misused APIs on huge pte
> riscv: Introduce concept of hardware base page
> riscv: Adapt pte struct to gap between hw page and sw page
> riscv: Adapt pte operations to gap between hw page and sw page
> riscv: Decouple pmd operations and pte operations
> riscv: Distinguish pmd huge pte and napot huge pte
> riscv: Adapt satp operations to gap between hw page and sw page
> riscv: Apply Svnapot for base page mapping
> riscv: Adjust fix_btmap slots number to match variable page size
> riscv: kvm: Adapt kvm to gap between hw page and sw page
> riscv: Introduce 64K page size
>
> arch/Kconfig | 1 +
> arch/riscv/Kconfig | 28 +++
> arch/riscv/include/asm/fixmap.h | 3 +-
> arch/riscv/include/asm/hugetlb.h | 71 ++++++-
> arch/riscv/include/asm/page.h | 16 +-
> arch/riscv/include/asm/pgalloc.h | 21 ++-
> arch/riscv/include/asm/pgtable-32.h | 2 +-
> arch/riscv/include/asm/pgtable-64.h | 45 +++--
> arch/riscv/include/asm/pgtable.h | 282 +++++++++++++++++++++++-----
> arch/riscv/kernel/efi.c | 2 +-
> arch/riscv/kernel/head.S | 4 +-
> arch/riscv/kernel/hibernate.c | 3 +-
> arch/riscv/kvm/mmu.c | 198 +++++++++++++------
> arch/riscv/mm/context.c | 7 +-
> arch/riscv/mm/fault.c | 1 +
> arch/riscv/mm/hugetlbpage.c | 42 +++--
> arch/riscv/mm/init.c | 25 +--
> arch/riscv/mm/kasan_init.c | 7 +-
> arch/riscv/mm/pageattr.c | 2 +-
> fs/proc/task_mmu.c | 2 +-
> include/asm-generic/hugetlb.h | 7 +
> include/asm-generic/pgtable-nopmd.h | 1 +
> include/linux/pgtable.h | 6 +
> mm/hugetlb.c | 2 +-
> mm/migrate.c | 5 +-
> mm/mprotect.c | 2 +-
> mm/rmap.c | 10 +-
> mm/vmalloc.c | 3 +-
> 28 files changed, 616 insertions(+), 182 deletions(-)
>
> --
> 2.20.1
>