Message-ID: <6277f5a3-1d12-4d6e-9fad-9e720876a4ce@oracle.com>
Date: Tue, 25 Feb 2025 11:56:44 -0800
From: jane.chu@...cle.com
To: Miaohe Lin <linmiaohe@...wei.com>
Cc: willy@...radead.org, peterx@...hat.com, akpm@...ux-foundation.org,
kirill.shutemov@...ux.intel.com, hughd@...gle.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm: make page_mapped_in_vma() hugetlb walk aware
On 2/24/2025 10:49 PM, Miaohe Lin wrote:
> On 2025/2/25 5:14, Jane Chu wrote:
>> When a process consumes a UE in a page, the memory failure handler
>> attempts to collect information for a potential SIGBUS.
>> If the page is an anonymous page, page_mapped_in_vma(page, vma) is
>> invoked in order to
>> 1. retrieve the vaddr from the process' address space,
>> 2. verify that the vaddr is indeed mapped to the poisoned page,
>> where 'page' is the precise small page with UE.
>>
>> It's been observed that when injecting poison to a non-head subpage
>> of an anonymous hugetlb page, no SIGBUS shows up, while injecting to
>> the head page produces a SIGBUS. The cause is that, although hugetlb_walk()
>> returns a valid pmd entry (on x86), check_pte() detects a mismatch
>> between the head page per the pmd and the input subpage. Thus the vaddr
>> is considered not mapped to the subpage and the process is not collected
>> for SIGBUS purposes. This is the call stack:
>>     collect_procs_anon
>>       page_mapped_in_vma
>>         page_vma_mapped_walk
>>           hugetlb_walk
>>           huge_pte_lock
>>           check_pte
>>
>> check_pte() header says that it
>> "check if [pvmw->pfn, @pvmw->pfn + @pvmw->nr_pages) is mapped at the @pvmw->pte"
>> but practically works only if pvmw->pfn is the head page pfn at pvmw->pte.
>> In hindsight, some pvmw->pte can point to a hugepage of some sort,
>> so it makes sense to make check_pte() work for hugepages.
> Thanks for your patch. This patch looks good to me.
>
>> Signed-off-by: Jane Chu <jane.chu@...cle.com>
> Is a Fixes tag needed?
I don't have a clear answer, and here is why.
Since check_pte() was introduced by ace71a19cec5e ("mm: introduce
page_vma_mapped_walk()"), it has carried the assumption that pvmw->page
(later changed to pvmw->pfn) points to either the head of a huge page or a
small page, and it has been used that way, so it doesn't really
check whether a given subpage range falls within a huge leaf pte range.
When 376907f3a0b34 ("mm/memory-failure: pass the folio and the page to
collect_procs()") came along, it exposed this latent issue, which
hadn't caused problems before.
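As a rough illustration of that assumption, here is a standalone
userspace sketch. It is not the kernel code: the function and parameter
names below are made up for illustration, and the second function is
only one possible way to express a range-aware check.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the head-only assumption: the pfn found at the
 * leaf entry must itself lie in [target_pfn, target_pfn + target_nr).
 * For a hugetlb leaf entry the pfn found is the head pfn, so a search
 * for a single non-head subpage fails.
 */
bool pte_pfn_in_target(unsigned long pte_pfn,
		       unsigned long target_pfn,
		       unsigned long target_nr)
{
	/* Unsigned wraparound makes this false when pte_pfn < target_pfn. */
	return (pte_pfn - target_pfn) < target_nr;
}

/*
 * Hypothetical range-aware variant: treat the leaf entry as covering
 * pte_nr pages starting at pte_pfn and test whether that range overlaps
 * the searched range.
 */
bool pte_range_overlaps_target(unsigned long pte_pfn,
			       unsigned long pte_nr,
			       unsigned long target_pfn,
			       unsigned long target_nr)
{
	if (pte_pfn + pte_nr - 1 < target_pfn)
		return false;	/* entry ends before the target starts */
	if (pte_pfn > target_pfn + target_nr - 1)
		return false;	/* entry starts after the target ends */
	return true;
}
```

With a 512-page huge page whose head pfn is 0x1000, looking up subpage
0x1003 fails under the first check and succeeds under the second, while
a head-page lookup succeeds under both.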
Thanks!
-jane
>
> Thanks.
> .