Message-ID: <53fb3b26-4a28-48a2-8403-a9b8d2fe6c24@bytedance.com>
Date: Tue, 10 Dec 2024 16:57:04 +0800
From: Qi Zheng <zhengqi.arch@...edance.com>
To: akpm@...ux-foundation.org
Cc: david@...hat.com, jannh@...gle.com, hughd@...gle.com,
 willy@...radead.org, muchun.song@...ux.dev, vbabka@...nel.org,
 peterx@...hat.com, mgorman@...e.de, catalin.marinas@....com,
 will@...nel.org, dave.hansen@...ux.intel.com, luto@...nel.org,
 peterz@...radead.org, x86@...nel.org, lorenzo.stoakes@...cle.com,
 zokeefe@...gle.com, rientjes@...gle.com, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 00/11] synchronously scan and reclaim empty user PTE
 pages

Hi Andrew,

I have sent patches [1][2][3] to fix the recently reported issues:

[1]. 
https://lore.kernel.org/lkml/20241210084156.89877-1-zhengqi.arch@bytedance.com/
(Fixes a warning; needs to be folded into [PATCH v4 02/11])

[2]. 
https://lore.kernel.org/lkml/20241206112348.51570-1-zhengqi.arch@bytedance.com/
(Fixes an uninitialized symbol; needs to be folded into [PATCH v4 09/11])

[3]. 
https://lore.kernel.org/lkml/20241210084431.91414-1-zhengqi.arch@bytedance.com/
(Fixes a UAF; needs to be placed before [PATCH v4 11/11])

If you need me to re-post a complete v5, please let me know.

Thanks,
Qi


On 2024/12/4 19:09, Qi Zheng wrote:
> Changes in v4:
>   - update the process_addrs.rst in [PATCH v4 01/11]
>     (suggested by Lorenzo Stoakes)
>   - fix [PATCH v3 4/9] and move it after [PATCH v3 5/9]
>     (pointed out by David Hildenbrand)
>   - change to use any_skipped instead of rechecking pte_none() to detect empty
>     user PTE pages (suggested by David Hildenbrand)
>   - rebase onto the next-20241203
> 
> Changes in v3:
>   - recheck pmd state instead of pmd_same() in retract_page_tables()
>     (suggested by Jann Horn)
>   - recheck dst_pmd entry in move_pages_pte() (pointed out by Jann Horn)
>   - introduce new skip_none_ptes() (suggested by David Hildenbrand)
>   - minor changes in [PATCH v2 5/7]
>   - remove tlb_remove_table_sync_one() if CONFIG_PT_RECLAIM is enabled.
>   - use put_page() instead of free_page_and_swap_cache() in
>     __tlb_remove_table_one_rcu() (pointed out by Jann Horn)
>   - collect the Reviewed-bys and Acked-bys
>   - rebase onto the next-20241112
> 
> Changes in v2:
>   - fix [PATCH v1 1/7] (Jann Horn)
>   - reset force_flush and force_break to false in [PATCH v1 2/7] (Jann Horn)
>   - introduce zap_nonpresent_ptes() and do_zap_pte_range()
>   - check pte_none() instead of can_reclaim_pt after the processing of PTEs
>     (remove [PATCH v1 3/7] and [PATCH v1 4/7])
>   - reorder patches
>   - rebase onto the next-20241031
> 
> Changes in v1:
>   - replace [RFC PATCH 1/7] with a separate series (already merged into mm-unstable):
>     https://lore.kernel.org/lkml/cover.1727332572.git.zhengqi.arch@bytedance.com/
>     (suggested by David Hildenbrand)
>   - squash [RFC PATCH 2/7] into [RFC PATCH 4/7]
>     (suggested by David Hildenbrand)
>   - change to scan and reclaim empty user PTE pages in zap_pte_range()
>     (suggested by David Hildenbrand)
>   - sent a separate RFC patch to track the tlb flushing issue, and removed
>     that part from this series ([RFC PATCH 3/7] and [RFC PATCH 6/7]).
>     link: https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@bytedance.com/
>   - add [PATCH v1 1/7] into this series
>   - drop RFC tag
>   - rebase onto the next-20241011
> 
> Changes in RFC v2:
>   - fix compilation errors in [RFC PATCH 5/7] and [RFC PATCH 7/7] reported by
>     kernel test robot
>   - use pte_offset_map_nolock() + pmd_same() instead of check_pmd_still_valid()
>     in retract_page_tables() (in [RFC PATCH 4/7])
>   - rebase onto the next-20240805
> 
> Hi all,
> 
> Previously, we tried to use a completely asynchronous method to reclaim empty
> user PTE pages [1]. After discussing with David Hildenbrand, we decided to
> implement synchronous reclamation in the case of madvise(MADV_DONTNEED) as the
> first step.
> 
> So this series aims to synchronously free empty PTE pages in the
> madvise(MADV_DONTNEED) case. We will detect and free empty PTE pages in
> zap_pte_range(), and will add zap_details.reclaim_pt to exclude cases other than
> madvise(MADV_DONTNEED).
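> 
> As a rough illustration of the idea (a standalone userspace sketch, not
> the actual kernel code; the struct layout and the try_reclaim_pte_table()
> helper below are invented for this example), reclaim_pt simply gates
> whether an emptied PTE table gets freed after the zap:
> 
>   /* Minimal userspace model: all names here are illustrative only. */
>   #include <stdbool.h>
>   #include <stdio.h>
>   #include <stdlib.h>
> 
>   #define PTRS_PER_PTE 512
> 
>   struct zap_details {
>           bool reclaim_pt;        /* set only for madvise(MADV_DONTNEED) */
>   };
> 
>   struct pte_table {
>           unsigned long entries[PTRS_PER_PTE];    /* 0 == pte_none */
>   };
> 
>   /* After zapping a within-PMD range, free the table iff the caller asked
>    * for PTE reclaim and every entry is now none. */
>   static bool try_reclaim_pte_table(struct pte_table **pmd_slot,
>                                     const struct zap_details *details)
>   {
>           struct pte_table *pt = *pmd_slot;
> 
>           if (!details->reclaim_pt || !pt)
>                   return false;
> 
>           for (int i = 0; i < PTRS_PER_PTE; i++)
>                   if (pt->entries[i])
>                           return false;   /* still maps something */
> 
>           *pmd_slot = NULL;       /* clear the "pmd" entry ... */
>           free(pt);               /* ... then free the empty page table */
>           return true;
>   }
> 
>   int main(void)
>   {
>           struct pte_table *pt = calloc(1, sizeof(*pt));
>           struct zap_details dontneed = { .reclaim_pt = true };
> 
>           /* pretend zap_pte_range() just cleared every entry */
>           printf("reclaimed: %d\n", try_reclaim_pte_table(&pt, &dontneed));
>           return 0;
>   }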
> 
> In zap_pte_range(), mmu_gather is used to perform batch tlb flushing and page
> freeing operations. Therefore, if we want to free the empty PTE page in this
> path, the most natural way is to add it to mmu_gather as well. Now, if
> CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, mmu_gather will free page table
> pages by semi RCU:
> 
>   - batch table freeing: asynchronous free by RCU
>   - single table freeing: IPI + synchronous free
> 
> But this is not enough to free empty PTE page table pages in paths other
> than the munmap and exit_mmap paths, because the IPI cannot be synchronized
> with rcu_read_lock() in pte_offset_map{_lock}(). So we should let single
> tables also be freed by RCU, like batch table freeing.
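> 
> The intended behaviour change can be modelled very roughly like this
> (a conceptual userspace sketch only, not the mmu_gather code; the enum
> and pick_mode() are invented here):
> 
>   #include <stdbool.h>
>   #include <stdio.h>
> 
>   enum table_free_mode {
>           FREE_BATCH_RCU,         /* batch table freeing: deferred via RCU */
>           FREE_SINGLE_IPI,        /* single table: IPI + synchronous free */
>           FREE_SINGLE_RCU,        /* single table: also deferred via RCU */
>   };
> 
>   /* Without PT_RECLAIM, a lone table falls back to IPI + synchronous
>    * free, which cannot be synchronized with rcu_read_lock() in
>    * pte_offset_map{_lock}().  With PT_RECLAIM, the single-table case is
>    * routed through RCU as well, like the batched case. */
>   static enum table_free_mode pick_mode(bool batched, bool pt_reclaim)
>   {
>           if (batched)
>                   return FREE_BATCH_RCU;
>           return pt_reclaim ? FREE_SINGLE_RCU : FREE_SINGLE_IPI;
>   }
> 
>   int main(void)
>   {
>           printf("single table, !PT_RECLAIM -> mode %d (IPI)\n",
>                  pick_mode(false, false));
>           printf("single table,  PT_RECLAIM -> mode %d (RCU)\n",
>                  pick_mode(false, true));
>           return 0;
>   }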
> 
> As a first step, we support this feature on x86_64 and select the newly
> introduced CONFIG_ARCH_SUPPORTS_PT_RECLAIM.
> 
> For other cases such as madvise(MADV_FREE), we will consider scanning and
> freeing empty PTE pages asynchronously in the future.
> 
> This series is based on next-20241112 (which contains the series [2]).
> 
> Note: issues related to TLB flushing are not new to this series and are tracked
>        in the separate RFC patch [3]. For more context, please refer to this
>        thread [4].
> 
> Comments and suggestions are welcome!
> 
> Thanks,
> Qi
> 
> [1]. https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
> [2]. https://lore.kernel.org/lkml/cover.1727332572.git.zhengqi.arch@bytedance.com/
> [3]. https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@bytedance.com/
> [4]. https://lore.kernel.org/lkml/6f38cb19-9847-4f70-bbe7-06881bb016be@bytedance.com/
> 
> Qi Zheng (11):
>    mm: khugepaged: recheck pmd state in retract_page_tables()
>    mm: userfaultfd: recheck dst_pmd entry in move_pages_pte()
>    mm: introduce zap_nonpresent_ptes()
>    mm: introduce do_zap_pte_range()
>    mm: skip over all consecutive none ptes in do_zap_pte_range()
>    mm: zap_install_uffd_wp_if_needed: return whether uffd-wp pte has been
>      re-installed
>    mm: do_zap_pte_range: return any_skipped information to the caller
>    mm: make zap_pte_range() handle full within-PMD range
>    mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)
>    x86: mm: free page table pages by RCU instead of semi RCU
>    x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64
> 
>   Documentation/mm/process_addrs.rst |   4 +
>   arch/x86/Kconfig                   |   1 +
>   arch/x86/include/asm/tlb.h         |  20 +++
>   arch/x86/kernel/paravirt.c         |   7 +
>   arch/x86/mm/pgtable.c              |  10 +-
>   include/linux/mm.h                 |   1 +
>   include/linux/mm_inline.h          |  11 +-
>   include/linux/mm_types.h           |   4 +-
>   mm/Kconfig                         |  15 ++
>   mm/Makefile                        |   1 +
>   mm/internal.h                      |  19 +++
>   mm/khugepaged.c                    |  45 +++--
>   mm/madvise.c                       |   7 +-
>   mm/memory.c                        | 253 ++++++++++++++++++-----------
>   mm/mmu_gather.c                    |   9 +-
>   mm/pt_reclaim.c                    |  71 ++++++++
>   mm/userfaultfd.c                   |  51 ++++--
>   17 files changed, 397 insertions(+), 132 deletions(-)
>   create mode 100644 mm/pt_reclaim.c
> 
