Message-ID: <53fb3b26-4a28-48a2-8403-a9b8d2fe6c24@bytedance.com>
Date: Tue, 10 Dec 2024 16:57:04 +0800
From: Qi Zheng <zhengqi.arch@...edance.com>
To: akpm@...ux-foundation.org
Cc: david@...hat.com, jannh@...gle.com, hughd@...gle.com,
willy@...radead.org, muchun.song@...ux.dev, vbabka@...nel.org,
peterx@...hat.com, mgorman@...e.de, catalin.marinas@....com,
will@...nel.org, dave.hansen@...ux.intel.com, luto@...nel.org,
peterz@...radead.org, x86@...nel.org, lorenzo.stoakes@...cle.com,
zokeefe@...gle.com, rientjes@...gle.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 00/11] synchronously scan and reclaim empty user PTE pages
Hi Andrew,
I have sent patches [1][2][3] to fix the recently reported issues:
[1]. https://lore.kernel.org/lkml/20241210084156.89877-1-zhengqi.arch@bytedance.com/
     (fixes a warning; needs to be folded into [PATCH v4 02/11])
[2]. https://lore.kernel.org/lkml/20241206112348.51570-1-zhengqi.arch@bytedance.com/
     (fixes an uninitialized symbol; needs to be folded into [PATCH v4 09/11])
[3]. https://lore.kernel.org/lkml/20241210084431.91414-1-zhengqi.arch@bytedance.com/
     (fixes a UAF; needs to be placed before [PATCH v4 11/11])
If you need me to re-post a complete v5, please let me know.
Thanks,
Qi
On 2024/12/4 19:09, Qi Zheng wrote:
> Changes in v4:
> - update the process_addrs.rst in [PATCH v4 01/11]
> (suggested by Lorenzo Stoakes)
> - fix [PATCH v3 4/9] and move it after [PATCH v3 5/9]
> (pointed out by David Hildenbrand)
> - change to use any_skipped instead of rechecking pte_none() to detect empty
> user PTE pages (suggested by David Hildenbrand)
> - rebase onto the next-20241203
>
> Changes in v3:
> - recheck pmd state instead of pmd_same() in retract_page_tables()
> (suggested by Jann Horn)
> - recheck dst_pmd entry in move_pages_pte() (pointed out by Jann Horn)
> - introduce new skip_none_ptes() (suggested by David Hildenbrand)
> - minor changes in [PATCH v2 5/7]
> - remove tlb_remove_table_sync_one() if CONFIG_PT_RECLAIM is enabled.
> - use put_page() instead of free_page_and_swap_cache() in
> __tlb_remove_table_one_rcu() (pointed out by Jann Horn)
> - collect the Reviewed-bys and Acked-bys
> - rebase onto the next-20241112
>
> Changes in v2:
> - fix [PATCH v1 1/7] (Jann Horn)
> - reset force_flush and force_break to false in [PATCH v1 2/7] (Jann Horn)
> - introduce zap_nonpresent_ptes() and do_zap_pte_range()
> - check pte_none() instead of can_reclaim_pt after the processing of PTEs
> (remove [PATCH v1 3/7] and [PATCH v1 4/7])
> - reorder patches
> - rebase onto the next-20241031
>
> Changes in v1:
> - replace [RFC PATCH 1/7] with a separate series (already merged into mm-unstable):
> https://lore.kernel.org/lkml/cover.1727332572.git.zhengqi.arch@bytedance.com/
> (suggested by David Hildenbrand)
> - squash [RFC PATCH 2/7] into [RFC PATCH 4/7]
> (suggested by David Hildenbrand)
> - change to scan and reclaim empty user PTE pages in zap_pte_range()
> (suggested by David Hildenbrand)
> - sent a separate RFC patch to track the tlb flushing issue, and removed
> that part from this series ([RFC PATCH 3/7] and [RFC PATCH 6/7]).
> link: https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@bytedance.com/
> - add [PATCH v1 1/7] into this series
> - drop RFC tag
> - rebase onto the next-20241011
>
> Changes in RFC v2:
> - fix compilation errors in [RFC PATCH 5/7] and [RFC PATCH 7/7] reported by
> kernel test robot
> - use pte_offset_map_nolock() + pmd_same() instead of check_pmd_still_valid()
> in retract_page_tables() (in [RFC PATCH 4/7])
> - rebase onto the next-20240805
>
> Hi all,
>
> Previously, we tried to use a completely asynchronous method to reclaim empty
> user PTE pages [1]. After discussing with David Hildenbrand, we decided to
> implement synchronous reclamation in the case of madvise(MADV_DONTNEED) as the
> first step.
>
> So this series aims to synchronously free the empty PTE pages in the
> madvise(MADV_DONTNEED) case. We will detect and free empty PTE pages in
> zap_pte_range(), and will add zap_details.reclaim_pt to exclude cases other than
> madvise(MADV_DONTNEED).
>
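For readers who want to see the madvise(MADV_DONTNEED) case from user space,
here is a minimal sketch. It is not part of the series. The function name
zap_one_pmd_span() and the constants are made up for illustration, and it
assumes x86_64 with 4 KiB pages, where one PTE page maps a 2 MiB span. It
only checks the user-visible effect (the span reads back as zero). Whether
the now-empty PTE page is actually freed depends on the kernel configuration
and cannot be observed from this code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define PMD_SPAN (2UL << 20) /* assumed: 2 MiB mapped by one PTE page */
#define PAGE_SZ  4096UL      /* assumed: 4 KiB base pages */

/*
 * Map anonymous memory, pick a 2 MiB aligned span inside it, touch every
 * page of that span, then zap the whole span with MADV_DONTNEED.
 * Returns 0 if the span reads back as zero afterwards, -1 on any error.
 */
int zap_one_pmd_span(void)
{
    size_t len = 2 * PMD_SPAN; /* over-allocate so an aligned span fits */
    unsigned char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return -1;

    /* Round up to the next 2 MiB boundary. */
    unsigned char *span = (unsigned char *)
        (((uintptr_t)raw + PMD_SPAN - 1) & ~(uintptr_t)(PMD_SPAN - 1));

    /* Ask for base pages; the result is ignored because THP may be absent. */
    (void)madvise(span, PMD_SPAN, MADV_NOHUGEPAGE);

    /* Fault in every page so the span is fully populated. */
    for (size_t off = 0; off < PMD_SPAN; off += PAGE_SZ)
        span[off] = 0xaa;

    int ret = 0;
    if (madvise(span, PMD_SPAN, MADV_DONTNEED) != 0)
        ret = -1;

    /* Private anonymous memory must read as zero after MADV_DONTNEED. */
    for (size_t off = 0; ret == 0 && off < PMD_SPAN; off += PAGE_SZ)
        if (span[off] != 0)
            ret = -1;

    munmap(raw, len);
    return ret;
}
```

Calling zap_one_pmd_span() from a small main() and checking for a return
value of 0 is enough to exercise the path; comparing VmPTE in
/proc/self/status before and after is one way to look for the reclaim
effect on a kernel that has this series applied.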
> In zap_pte_range(), mmu_gather is used to perform batch tlb flushing and page
> freeing operations. Therefore, if we want to free the empty PTE page in this
> path, the most natural way is to add it to mmu_gather as well. Now, if
> CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, mmu_gather will free page table
> pages by semi RCU:
>
> - batch table freeing: asynchronous free by RCU
> - single table freeing: IPI + synchronous free
>
> But this is not enough to free the empty PTE page table pages in paths other
> than the munmap and exit_mmap paths, because IPI cannot be synchronized with
> rcu_read_lock() in pte_offset_map{_lock}(). So we should let a single table also
> be freed by RCU, like batch table freeing.
>
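The ordering requirement above can be pictured with a small user-space toy
model. This is only a sketch and not kernel code: the toy_* names are
hypothetical, and the "grace period" is simplified to "no read-side section
is open", which is stricter than real RCU (real RCU waits only for readers
that started before the free was queued). The point is just that a table
queued for freeing is not released while a reader may still be walking it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PENDING 8

/* Toy stand-in for a page table page; 'freed' records that it was released. */
struct toy_table {
    bool freed;
};

/* Toy stand-in for RCU state: open read-side sections plus deferred frees. */
struct toy_rcu {
    int readers;
    struct toy_table *pending[TOY_MAX_PENDING];
    size_t nr_pending;
};

/* Release all queued tables, but only once no reader section is open. */
static void toy_flush(struct toy_rcu *rcu)
{
    if (rcu->readers > 0)
        return;
    for (size_t i = 0; i < rcu->nr_pending; i++)
        rcu->pending[i]->freed = true;
    rcu->nr_pending = 0;
}

/* Models a page table walker entering its read-side section. */
void toy_read_lock(struct toy_rcu *rcu)
{
    rcu->readers++;
}

/* Models the walker leaving; deferred frees may now proceed. */
void toy_read_unlock(struct toy_rcu *rcu)
{
    assert(rcu->readers > 0);
    rcu->readers--;
    toy_flush(rcu);
}

/* Queue a table for deferred freeing; returns false if the queue is full. */
bool toy_free_table_deferred(struct toy_rcu *rcu, struct toy_table *table)
{
    if (rcu->nr_pending == TOY_MAX_PENDING)
        return false;
    rcu->pending[rcu->nr_pending++] = table;
    toy_flush(rcu);
    return true;
}
```

In this model, a table queued while a reader is inside its section stays
allocated until toy_read_unlock() runs, which is the property the
single-table path needs as well.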
> As a first step, we support this feature on x86_64 and select the newly
> introduced CONFIG_ARCH_SUPPORTS_PT_RECLAIM.
>
> For other cases such as madvise(MADV_FREE), we may consider scanning and
> freeing empty PTE pages asynchronously in the future.
>
> This series is based on next-20241112 (which contains the series [2]).
>
> Note: issues related to TLB flushing are not new to this series and are tracked
> in the separate RFC patch [3]. For more context, please refer to this
> thread [4].
>
> Comments and suggestions are welcome!
>
> Thanks,
> Qi
>
> [1]. https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
> [2]. https://lore.kernel.org/lkml/cover.1727332572.git.zhengqi.arch@bytedance.com/
> [3]. https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@bytedance.com/
> [4]. https://lore.kernel.org/lkml/6f38cb19-9847-4f70-bbe7-06881bb016be@bytedance.com/
>
> Qi Zheng (11):
> mm: khugepaged: recheck pmd state in retract_page_tables()
> mm: userfaultfd: recheck dst_pmd entry in move_pages_pte()
> mm: introduce zap_nonpresent_ptes()
> mm: introduce do_zap_pte_range()
> mm: skip over all consecutive none ptes in do_zap_pte_range()
> mm: zap_install_uffd_wp_if_needed: return whether uffd-wp pte has been
> re-installed
> mm: do_zap_pte_range: return any_skipped information to the caller
> mm: make zap_pte_range() handle full within-PMD range
> mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)
> x86: mm: free page table pages by RCU instead of semi RCU
> x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64
>
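The detection idea behind the patch titles above (any_skipped, the full
within-PMD range, and reclaim_pt) can be sketched as a toy model. This is a
simplified illustration and not the code in mm/memory.c: the toy_* names are
hypothetical, the three entry states are an assumption made for the model,
and the real series has more cases (for example re-installed uffd-wp
markers). The model treats a PTE page as reclaimable only if the caller
opted in, the zap covered every slot of the page, and nothing was skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PTRS_PER_PTE 512 /* assumed: slots per PTE page (x86_64, 4 KiB pages) */

enum toy_pte {
    TOY_NONE,     /* empty slot */
    TOY_ZAPPABLE, /* entry the zap is allowed to clear */
    TOY_KEPT,     /* entry the zap must leave in place */
};

/* Toy counterpart of the reclaim_pt opt-in carried in the zap details. */
struct toy_zap_details {
    bool reclaim_pt;
};

/*
 * Clear the zappable entries in slots [start, end) and report whether the
 * PTE page may be freed afterwards.
 */
bool toy_zap_range(enum toy_pte *ptes, size_t start, size_t end,
                   const struct toy_zap_details *details)
{
    bool any_skipped = false;
    bool full_range = (start == 0 && end == TOY_PTRS_PER_PTE);

    assert(start <= end && end <= TOY_PTRS_PER_PTE);

    for (size_t i = start; i < end; i++) {
        if (ptes[i] == TOY_NONE)
            continue;           /* already empty, nothing to do */
        if (ptes[i] == TOY_KEPT) {
            any_skipped = true; /* something stays behind in this page */
            continue;
        }
        ptes[i] = TOY_NONE;     /* zap the entry */
    }

    /* No second pass over the slots is needed: any_skipped already tells. */
    return details && details->reclaim_pt && full_range && !any_skipped;
}
```

Tracking any_skipped during the zap itself avoids rechecking every slot for
emptiness afterwards, which matches the v4 changelog item about using
any_skipped instead of rechecking pte_none().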
> Documentation/mm/process_addrs.rst | 4 +
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/tlb.h | 20 +++
> arch/x86/kernel/paravirt.c | 7 +
> arch/x86/mm/pgtable.c | 10 +-
> include/linux/mm.h | 1 +
> include/linux/mm_inline.h | 11 +-
> include/linux/mm_types.h | 4 +-
> mm/Kconfig | 15 ++
> mm/Makefile | 1 +
> mm/internal.h | 19 +++
> mm/khugepaged.c | 45 +++--
> mm/madvise.c | 7 +-
> mm/memory.c | 253 ++++++++++++++++++-----------
> mm/mmu_gather.c | 9 +-
> mm/pt_reclaim.c | 71 ++++++++
> mm/userfaultfd.c | 51 ++++--
> 17 files changed, 397 insertions(+), 132 deletions(-)
> create mode 100644 mm/pt_reclaim.c
>