linux-kernel - Re: [PATCH v5 2/6] mm: remap unused subpages to shared zeropage when splitting isolated thp

Message-ID: <4cf41cd5-e93a-412b-b209-4180bd2d4015@linux.dev>
Date: Fri, 19 Sep 2025 16:14:11 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: David Hildenbrand <david@...hat.com>
Cc: Qun-wei Lin (林群崴) <Qun-wei.Lin@...iatek.com>,
 "catalin.marinas@....com" <catalin.marinas@....com>,
 "usamaarif642@...il.com" <usamaarif642@...il.com>,
 "linux-mm@...ck.org" <linux-mm@...ck.org>,
 "yuzhao@...gle.com" <yuzhao@...gle.com>,
 "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
 "corbet@....net" <corbet@....net>,
 Andrew Yang (楊智強) <Andrew.Yang@...iatek.com>,
 "npache@...hat.com" <npache@...hat.com>, "rppt@...nel.org"
 <rppt@...nel.org>, "willy@...radead.org" <willy@...radead.org>,
 "kernel-team@...a.com" <kernel-team@...a.com>,
 "roman.gushchin@...ux.dev" <roman.gushchin@...ux.dev>,
 "hannes@...xchg.org" <hannes@...xchg.org>,
 "cerasuolodomenico@...il.com" <cerasuolodomenico@...il.com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "ryncsn@...il.com" <ryncsn@...il.com>, "surenb@...gle.com"
 <surenb@...gle.com>, "riel@...riel.com" <riel@...riel.com>,
 "shakeel.butt@...ux.dev" <shakeel.butt@...ux.dev>,
 Chinwen Chang (張錦文)
 <chinwen.chang@...iatek.com>,
 "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
 Casper Li (李中榮) <casper.li@...iatek.com>,
 "ryan.roberts@....com" <ryan.roberts@....com>,
 "linux-mediatek@...ts.infradead.org" <linux-mediatek@...ts.infradead.org>,
 "baohua@...nel.org" <baohua@...nel.org>,
 "kaleshsingh@...gle.com" <kaleshsingh@...gle.com>,
 "zhais@...gle.com" <zhais@...gle.com>,
 "linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH v5 2/6] mm: remap unused subpages to shared zeropage when
 splitting isolated thp



On 2025/9/19 15:55, David Hildenbrand wrote:
>>> I think where possible we really only want to identify problematic
>>> (tagged) pages and skip them. And we should either look into fixing KSM
>>> as well or finding out why KSM is not affected.
>>
>> Yeah. Seems like we could introduce a new helper,
>> folio_test_mte_tagged(struct folio *folio). By default, it would return
>> false, and architectures like arm64 can override it.
> 
> If we add a new helper it should instead express the semantics that we 
> cannot deduplicate.

Agreed.

> 
> For THP, I recall that only some pages might be tagged. So likely we 
> want to check per page.

Yes, a per-page check would be simpler.

> 
>>
>> Looking at the code, the PG_mte_tagged flag is not set for regular THP.
> 
> I think it's supported for THP per page. Only for hugetlb we tag the 
> whole thing through the head page instead of individual pages.

Right. That's exactly what I meant.

> 
>> The MTE
>> status actually comes from the VM_MTE flag in the VMA that maps it.
>>
> 
> During the rmap walk we could check the VMA flag, but there would be no 
> way to just stop the THP shrinker scanning this page early.
> 
>> static inline bool folio_test_hugetlb_mte_tagged(struct folio *folio)
>> {
>>     bool ret = test_bit(PG_mte_tagged, &folio->flags.f);
>>
>>     VM_WARN_ON_ONCE(!folio_test_hugetlb(folio));
>>
>>     /*
>>      * If the folio is tagged, ensure ordering with a likely subsequent
>>      * read of the tags.
>>      */
>>     if (ret)
>>         smp_rmb();
>>     return ret;
>> }
>>
>> static inline bool page_mte_tagged(struct page *page)
>> {
>>     bool ret = test_bit(PG_mte_tagged, &page->flags.f);
>>
>>     VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>>
>>     /*
>>      * If the page is tagged, ensure ordering with a likely subsequent
>>      * read of the tags.
>>      */
>>     if (ret)
>>         smp_rmb();
>>     return ret;
>> }
>>
>> contpte_set_ptes()
>>     __set_ptes()
>>         __set_ptes_anysz()
>>             __sync_cache_and_tags()
>>                 mte_sync_tags()
>>                     set_page_mte_tagged()
>>
>> Then we could have the THP shrinker skip any folios that are identified
>> as MTE-tagged.
> 
> Likely we should just do something like (maybe we want better naming)
> 
> #ifndef page_is_mergable
> #define page_is_mergable(page) (true)
> #endif


Maybe something like page_is_optimizable()? Just a thought ;p

> 
> And for arm64 have it be
> 
> #define page_is_mergable(page) (!page_mte_tagged(page))
> 
> 
> And then do
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1f0813b956436..1cac9093918d6 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4251,7 +4251,8 @@ static bool thp_underused(struct folio *folio)
> 
>          for (i = 0; i < folio_nr_pages(folio); i++) {
>                  kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
> -               if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
> +               if (page_is_mergable(folio_page(folio, i)) &&
> +                   !memchr_inv(kaddr, 0, PAGE_SIZE)) {
>                          num_zero_pages++;
>                          if (num_zero_pages > khugepaged_max_ptes_none) {
>                                  kunmap_local(kaddr);
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 946253c398072..476a9a9091bd3 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -306,6 +306,8 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
> 
>          if (PageCompound(page))
>                  return false;
> +       if (!page_is_mergable(page))
> +               return false;
>          VM_BUG_ON_PAGE(!PageAnon(page), page);
>          VM_BUG_ON_PAGE(!PageLocked(page), page);
>          VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);

Looks good to me!
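
For anyone following along, here is a rough userspace sketch of what the
per-page check in the thp_underused() loop amounts to. It is not kernel
code: struct sketch_page, the sketch_* helpers and SKETCH_PAGE_SIZE are all
made up for illustration, the tag state is modeled as a plain bool instead
of PG_mte_tagged, and max_none stands in for khugepaged_max_ptes_none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for struct page: only what the sketch needs. */
struct sketch_page {
	bool mte_tagged;                        /* models PG_mte_tagged */
	unsigned char data[SKETCH_PAGE_SIZE];   /* models the page contents */
};

/* Models the proposed arm64 page_is_mergable(): tagged pages are excluded. */
static bool sketch_page_is_mergable(const struct sketch_page *page)
{
	return !page->mte_tagged;
}

/* Userspace replacement for !memchr_inv(kaddr, 0, PAGE_SIZE). */
static bool sketch_page_is_zero(const struct sketch_page *page)
{
	for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++)
		if (page->data[i])
			return false;
	return true;
}

/*
 * Returns true once more than max_none subpages are both mergeable and
 * zero-filled. Tagged subpages never count, even if their data is all zero.
 */
static bool sketch_thp_underused(const struct sketch_page *pages,
				 size_t nr_pages, size_t max_none)
{
	size_t num_zero_pages = 0;

	assert(pages != NULL || nr_pages == 0);

	for (size_t i = 0; i < nr_pages; i++) {
		if (sketch_page_is_mergable(&pages[i]) &&
		    sketch_page_is_zero(&pages[i])) {
			num_zero_pages++;
			if (num_zero_pages > max_none)
				return true;
		}
	}
	return false;
}
```

The mergeable test comes first so that a tagged subpage is skipped without
scanning its contents, which mirrors the ordering in the diff above.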

> 
> 
> For KSM, similarly just bail out early. But still wondering if this is
> already checked somehow for KSM.

+1. I'm looking for a machine to test it on.