Message-ID: <ca2106a3-4bb2-4457-81af-301fd99fbef4@redhat.com>
Date: Fri, 19 Sep 2025 15:09:13 +0200
From: David Hildenbrand <david@...hat.com>
To: Lance Yang <lance.yang@...ux.dev>
Cc: Qun-wei Lin (林群崴) <Qun-wei.Lin@...iatek.com>,
"catalin.marinas@....com" <catalin.marinas@....com>,
"usamaarif642@...il.com" <usamaarif642@...il.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"yuzhao@...gle.com" <yuzhao@...gle.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"corbet@....net" <corbet@....net>,
Andrew Yang (楊智強) <Andrew.Yang@...iatek.com>,
"npache@...hat.com" <npache@...hat.com>, "rppt@...nel.org"
<rppt@...nel.org>, "willy@...radead.org" <willy@...radead.org>,
"kernel-team@...a.com" <kernel-team@...a.com>,
"roman.gushchin@...ux.dev" <roman.gushchin@...ux.dev>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"cerasuolodomenico@...il.com" <cerasuolodomenico@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"ryncsn@...il.com" <ryncsn@...il.com>, "surenb@...gle.com"
<surenb@...gle.com>, "riel@...riel.com" <riel@...riel.com>,
"shakeel.butt@...ux.dev" <shakeel.butt@...ux.dev>,
Chinwen Chang (張錦文)
<chinwen.chang@...iatek.com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
Casper Li (李中榮) <casper.li@...iatek.com>,
"ryan.roberts@....com" <ryan.roberts@....com>,
"linux-mediatek@...ts.infradead.org" <linux-mediatek@...ts.infradead.org>,
"baohua@...nel.org" <baohua@...nel.org>,
"kaleshsingh@...gle.com" <kaleshsingh@...gle.com>,
"zhais@...gle.com" <zhais@...gle.com>,
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH v5 2/6] mm: remap unused subpages to shared zeropage when splitting isolated thp

On 19.09.25 14:19, Lance Yang wrote:
> Hey David,
>
> I believe I've found the exact reason why KSM skips MTE-tagged pages ;p
>
>>
>>
>> On 2025/9/19 16:14, Lance Yang wrote:
>>>
>>>
>>> On 2025/9/19 15:55, David Hildenbrand wrote:
>>>>>> I think where possible we really only want to identify problematic
>>>>>> (tagged) pages and skip them. And we should either look into fixing
>>>>>> KSM
>>>>>> as well or finding out why KSM is not affected.
>>>>>
>>>>> Yeah. Seems like we could introduce a new helper,
>>>>> folio_test_mte_tagged(struct
>>>>> folio *folio). By default, it would return false, and architectures
>>>>> like
>>>>> arm64
>>>>> can override it.
>>>>
>>>> If we add a new helper it should instead express the semantics that
>>>> we cannot deduplicate.
>>>
>>> Agreed.
>>>
>>>>
>>>> For THP, I recall that only some pages might be tagged. So likely we
>>>> want to check per page.
>>>
>>> Yes, a per-page check would be simpler.
>>>
>>>>
>>>>>
>>>>> Looking at the code, the PG_mte_tagged flag is not set for regular THP.
>>>>
>>>> I think it's supported for THP per page. Only for hugetlb we tag the
>>>> whole thing through the head page instead of individual pages.
>>>
>>> Right. That's exactly what I meant.
>>>
>>>>
>>>>> The MTE
>>>>> status actually comes from the VM_MTE flag in the VMA that maps it.
>>>>>
>>>>
>>>> During the rmap walk we could check the VMA flag, but there would be
>>>> no way to just stop the THP shrinker scanning this page early.
>>>>
>>>>> static inline bool folio_test_hugetlb_mte_tagged(struct folio *folio)
>>>>> {
>>>>>         bool ret = test_bit(PG_mte_tagged, &folio->flags.f);
>>>>>
>>>>>         VM_WARN_ON_ONCE(!folio_test_hugetlb(folio));
>>>>>
>>>>>         /*
>>>>>          * If the folio is tagged, ensure ordering with a likely subsequent
>>>>>          * read of the tags.
>>>>>          */
>>>>>         if (ret)
>>>>>                 smp_rmb();
>>>>>         return ret;
>>>>> }
>>>>>
>>>>> static inline bool page_mte_tagged(struct page *page)
>>>>> {
>>>>>         bool ret = test_bit(PG_mte_tagged, &page->flags.f);
>>>>>
>>>>>         VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>>>>>
>>>>>         /*
>>>>>          * If the page is tagged, ensure ordering with a likely subsequent
>>>>>          * read of the tags.
>>>>>          */
>>>>>         if (ret)
>>>>>                 smp_rmb();
>>>>>         return ret;
>>>>> }
>>>>>
>>>>> contpte_set_ptes()
>>>>> __set_ptes()
>>>>> __set_ptes_anysz()
>>>>> __sync_cache_and_tags()
>>>>> mte_sync_tags()
>>>>> set_page_mte_tagged()
>>>>>
>>>>> Then, having the THP shrinker skip any folios that are identified as
>>>>> MTE-tagged.
>>>>
>>>> Likely we should just do something like (maybe we want better naming)
>>>>
>>>> #ifndef page_is_mergable
>>>> #define page_is_mergable(page) (true)
>>>> #endif
>>>
>>>
>>> Maybe something like page_is_optimizable()? Just a thought ;p
>>>
>>>>
>>>> And for arm64 have it be
>>>>
>>>> #define page_is_mergable(page) (!page_mte_tagged(page))
>>>>
>>>>
>>>> And then do
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 1f0813b956436..1cac9093918d6 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -4251,7 +4251,8 @@ static bool thp_underused(struct folio *folio)
>>>>
>>>> for (i = 0; i < folio_nr_pages(folio); i++) {
>>>> kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>>>> - if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>> + if (page_is_mergable(folio_page(folio, i)) &&
>>>> + !memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>> num_zero_pages++;
>>>> if (num_zero_pages >
>>>> khugepaged_max_ptes_none) {
>>>> kunmap_local(kaddr);
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 946253c398072..476a9a9091bd3 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -306,6 +306,8 @@ static bool try_to_map_unused_to_zeropage(struct
>>>> page_vma_mapped_walk *pvmw,
>>>>
>>>> if (PageCompound(page))
>>>> return false;
>>>> + if (!page_is_mergable(page))
>>>> + return false;
>>>> VM_BUG_ON_PAGE(!PageAnon(page), page);
>>>> VM_BUG_ON_PAGE(!PageLocked(page), page);
>>>> VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);
>>>
>>> Looks good to me!
>>>
>>>>
>>>>
>>>> For KSM, similarly just bail out early. But still wondering if this
>>>> is already checked
>>>> somehow for KSM.
>>>
>>> +1 I'm looking for a machine to test it on.
>>
>> Interestingly, it seems KSM is already skipping MTE-tagged pages. My test,
>> running on a v6.8.0 kernel inside QEMU (with MTE enabled), shows no merging
>> activity for those pages ...
>
> KSM's call to pages_identical() ultimately leads to memcmp_pages(). The
> arm64 implementation of memcmp_pages() in arch/arm64/kernel/mte.c contains
> a specific check that prevents merging in this case.
>
> try_to_merge_one_page()
> -> pages_identical()
> -> !memcmp_pages() Fails!
> -> replace_page()
>
>
> int memcmp_pages(struct page *page1, struct page *page2)
> {
>         char *addr1, *addr2;
>         int ret;
>
>         addr1 = page_address(page1);
>         addr2 = page_address(page2);
>         ret = memcmp(addr1, addr2, PAGE_SIZE);
>
>         if (!system_supports_mte() || ret)
>                 return ret;
>
>         /*
>          * If the page content is identical but at least one of the pages is
>          * tagged, return non-zero to avoid KSM merging. If only one of the
>          * pages is tagged, __set_ptes() may zero or change the tags of the
>          * other page via mte_sync_tags().
>          */
>         if (page_mte_tagged(page1) || page_mte_tagged(page2))
>                 return addr1 != addr2;
>
>         return ret;
> }
>
> IIUC, if either page is MTE-tagged, memcmp_pages() intentionally returns
> a non-zero value, which in turn causes pages_identical() to return false.

Cool, so we should likely just use that then in the shrinker code. Can
you send a fix?
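
To illustrate the idea, here is a minimal user-space sketch, not kernel
code. struct model_page and all model_* functions are hypothetical
stand-ins: mte_tagged models PG_mte_tagged, model_memcmp_pages() mirrors
the arm64 memcmp_pages() quoted above (assuming MTE is always supported,
so the system_supports_mte() check is dropped), and comparing each subpage
against a zero page is only an assumption about how the shrinker might
reuse the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for struct page: contents plus a "tagged" bit. */
struct model_page {
        unsigned char data[MODEL_PAGE_SIZE];
        bool mte_tagged;        /* models PG_mte_tagged */
};

/*
 * Models the quoted arm64 memcmp_pages(): identical contents still
 * compare as "different" when either page is tagged, unless both
 * arguments are the very same page.
 */
static int model_memcmp_pages(const struct model_page *p1,
                              const struct model_page *p2)
{
        int ret = memcmp(p1->data, p2->data, MODEL_PAGE_SIZE);

        if (ret)
                return ret;
        if (p1->mte_tagged || p2->mte_tagged)
                return p1 != p2;
        return 0;
}

static bool model_pages_identical(const struct model_page *p1,
                                  const struct model_page *p2)
{
        return !model_memcmp_pages(p1, p2);
}

/*
 * Count the subpages that compare identical to the zero page. Tagged
 * subpages never match, so they are never counted as remappable.
 */
static size_t model_count_zero_subpages(const struct model_page *pages,
                                        size_t nr,
                                        const struct model_page *zero_page)
{
        size_t i, n = 0;

        assert(pages != NULL && zero_page != NULL);
        for (i = 0; i < nr; i++)
                if (model_pages_identical(&pages[i], zero_page))
                        n++;
        return n;
}
```

With a check like this, a zero-filled but tagged subpage is simply not
counted, with no separate arch helper needed.
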
--
Cheers
David / dhildenb