Message-ID: <fd469dd5-8d4e-4169-ac7a-daeff9d2777f@linux.dev>
Date: Fri, 19 Sep 2025 21:24:35 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: David Hildenbrand <david@...hat.com>
Cc: Qun-wei Lin (林群崴) <Qun-wei.Lin@...iatek.com>,
 "catalin.marinas@....com" <catalin.marinas@....com>,
 "usamaarif642@...il.com" <usamaarif642@...il.com>,
 "linux-mm@...ck.org" <linux-mm@...ck.org>,
 "yuzhao@...gle.com" <yuzhao@...gle.com>,
 "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
 "corbet@....net" <corbet@....net>,
 Andrew Yang (楊智強) <Andrew.Yang@...iatek.com>,
 "npache@...hat.com" <npache@...hat.com>, "rppt@...nel.org"
 <rppt@...nel.org>, "willy@...radead.org" <willy@...radead.org>,
 "kernel-team@...a.com" <kernel-team@...a.com>,
 "roman.gushchin@...ux.dev" <roman.gushchin@...ux.dev>,
 "hannes@...xchg.org" <hannes@...xchg.org>,
 "cerasuolodomenico@...il.com" <cerasuolodomenico@...il.com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "ryncsn@...il.com" <ryncsn@...il.com>, "surenb@...gle.com"
 <surenb@...gle.com>, "riel@...riel.com" <riel@...riel.com>,
 "shakeel.butt@...ux.dev" <shakeel.butt@...ux.dev>,
 Chinwen Chang (張錦文)
 <chinwen.chang@...iatek.com>,
 "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
 Casper Li (李中榮) <casper.li@...iatek.com>,
 "ryan.roberts@....com" <ryan.roberts@....com>,
 "linux-mediatek@...ts.infradead.org" <linux-mediatek@...ts.infradead.org>,
 "baohua@...nel.org" <baohua@...nel.org>,
 "kaleshsingh@...gle.com" <kaleshsingh@...gle.com>,
 "zhais@...gle.com" <zhais@...gle.com>,
 "linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH v5 2/6] mm: remap unused subpages to shared zeropage when
 splitting isolated thp



On 2025/9/19 21:09, David Hildenbrand wrote:
> On 19.09.25 14:19, Lance Yang wrote:
>> Hey David,
>>
>> I believe I've found the exact reason why KSM skips MTE-tagged pages ;p
>>
>>>
>>>
>>> On 2025/9/19 16:14, Lance Yang wrote:
>>>>
>>>>
>>>> On 2025/9/19 15:55, David Hildenbrand wrote:
>>>>>>> I think where possible we really only want to identify problematic
>>>>>>> (tagged) pages and skip them. And we should either look into fixing
>>>>>>> KSM as well or finding out why KSM is not affected.
>>>>>>
>>>>>> Yeah. Seems like we could introduce a new helper,
>>>>>> folio_test_mte_tagged(struct folio *folio). By default, it would return
>>>>>> false, and architectures like arm64 can override it.
>>>>>
>>>>> If we add a new helper it should instead express the semantics that
>>>>> we cannot deduplicate.
>>>>
>>>> Agreed.
>>>>
>>>>>
>>>>> For THP, I recall that only some pages might be tagged. So likely we
>>>>> want to check per page.
>>>>
>>>> Yes, a per-page check would be simpler.
>>>>
>>>>>
>>>>>>
>>>>>> Looking at the code, the PG_mte_tagged flag is not set for regular THP.
>>>>>
>>>>> I think it's supported for THP per page. Only for hugetlb we tag the
>>>>> whole thing through the head page instead of individual pages.
>>>>
>>>> Right. That's exactly what I meant.
>>>>
>>>>>
>>>>>> The MTE
>>>>>> status actually comes from the VM_MTE flag in the VMA that maps it.
>>>>>>
>>>>>
>>>>> During the rmap walk we could check the VMA flag, but there would be
>>>>> no way to just stop the THP shrinker scanning this page early.
>>>>>
>>>>>> static inline bool folio_test_hugetlb_mte_tagged(struct folio *folio)
>>>>>> {
>>>>>>      bool ret = test_bit(PG_mte_tagged, &folio->flags.f);
>>>>>>
>>>>>>      VM_WARN_ON_ONCE(!folio_test_hugetlb(folio));
>>>>>>
>>>>>>      /*
>>>>>>       * If the folio is tagged, ensure ordering with a likely subsequent
>>>>>>       * read of the tags.
>>>>>>       */
>>>>>>      if (ret)
>>>>>>          smp_rmb();
>>>>>>      return ret;
>>>>>> }
>>>>>>
>>>>>> static inline bool page_mte_tagged(struct page *page)
>>>>>> {
>>>>>>      bool ret = test_bit(PG_mte_tagged, &page->flags.f);
>>>>>>
>>>>>>      VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>>>>>>
>>>>>>      /*
>>>>>>       * If the page is tagged, ensure ordering with a likely subsequent
>>>>>>       * read of the tags.
>>>>>>       */
>>>>>>      if (ret)
>>>>>>          smp_rmb();
>>>>>>      return ret;
>>>>>> }
>>>>>>
>>>>>> contpte_set_ptes()
>>>>>>      __set_ptes()
>>>>>>          __set_ptes_anysz()
>>>>>>              __sync_cache_and_tags()
>>>>>>                  mte_sync_tags()
>>>>>>                      set_page_mte_tagged()
>>>>>>
>>>>>> Then, having the THP shrinker skip any folios that are identified as
>>>>>> MTE-tagged.
>>>>>
>>>>> Likely we should just do something like (maybe we want better naming)
>>>>>
>>>>> #ifndef page_is_mergable
>>>>> #define page_is_mergable(page) (true)
>>>>> #endif
>>>>
>>>>
>>>> Maybe something like page_is_optimizable()? Just a thought ;p
>>>>
>>>>>
>>>>> And for arm64 have it be
>>>>>
>>>>> #define page_is_mergable(page) (!page_mte_tagged(page))
>>>>>
>>>>>
>>>>> And then do
>>>>>
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 1f0813b956436..1cac9093918d6 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -4251,7 +4251,8 @@ static bool thp_underused(struct folio *folio)
>>>>>
>>>>>           for (i = 0; i < folio_nr_pages(folio); i++) {
>>>>>                   kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>>>>> -               if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>>> +               if (page_is_mergable(folio_page(folio, i)) &&
>>>>> +                   !memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>>>                           num_zero_pages++;
>>>>>                           if (num_zero_pages > khugepaged_max_ptes_none) {
>>>>>                                   kunmap_local(kaddr);
>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>> index 946253c398072..476a9a9091bd3 100644
>>>>> --- a/mm/migrate.c
>>>>> +++ b/mm/migrate.c
>>>>> @@ -306,6 +306,8 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
>>>>>
>>>>>           if (PageCompound(page))
>>>>>                   return false;
>>>>> +       if (!page_is_mergable(page))
>>>>> +               return false;
>>>>>           VM_BUG_ON_PAGE(!PageAnon(page), page);
>>>>>           VM_BUG_ON_PAGE(!PageLocked(page), page);
>>>>>           VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);
>>>>
>>>> Looks good to me!
>>>>
>>>>>
>>>>>
>>>>> For KSM, similarly just bail out early. But still wondering if this is
>>>>> already checked somehow for KSM.
>>>>
>>>> +1 I'm looking for a machine to test it on.
>>>
>>> Interestingly, it seems KSM is already skipping MTE-tagged pages. My test,
>>> running on a v6.8.0 kernel inside QEMU (with MTE enabled), shows no merging
>>> activity for those pages ...
>>
>> KSM's call to pages_identical() ultimately leads to memcmp_pages(). The
>> arm64 implementation of memcmp_pages() in arch/arm64/kernel/mte.c contains
>> a specific check that prevents merging in this case.
>>
>> try_to_merge_one_page()
>>     -> pages_identical()
>>         -> !memcmp_pages() Fails!
>>         -> replace_page()
>>
>>
>> int memcmp_pages(struct page *page1, struct page *page2)
>> {
>>     char *addr1, *addr2;
>>     int ret;
>>
>>     addr1 = page_address(page1);
>>     addr2 = page_address(page2);
>>     ret = memcmp(addr1, addr2, PAGE_SIZE);
>>
>>     if (!system_supports_mte() || ret)
>>         return ret;
>>
>>     /*
>>      * If the page content is identical but at least one of the pages is
>>      * tagged, return non-zero to avoid KSM merging. If only one of the
>>      * pages is tagged, __set_ptes() may zero or change the tags of the
>>      * other page via mte_sync_tags().
>>      */
>>     if (page_mte_tagged(page1) || page_mte_tagged(page2))
>>         return addr1 != addr2;
>>
>>     return ret;
>> }
>>
>> IIUC, if either page is MTE-tagged, memcmp_pages() intentionally returns
>> a non-zero value, which in turn causes pages_identical() to return false.
> 
> Cool, so we should likely just use that then in the shrinker code. Can you
> send a fix?

Certainly! I'll get on that ;p

Cheers,
Lance

