lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b1245ac1-4dab-443d-97e8-cee583235547@oracle.com>
Date: Mon, 20 Jan 2025 21:20:19 -0800
From: jane.chu@...cle.com
To: Matthew Wilcox <willy@...radead.org>
Cc: akpm@...ux-foundation.org, linmiaohe@...wei.com,
        kirill.shutemov@...ux.intel.com, hughd@...gle.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: make page_mapped_in_vma() hugetlb walk aware

Thanks for the quick comment!

On 1/20/2025 9:00 PM, Matthew Wilcox wrote:
> On Mon, Jan 20, 2025 at 09:18:49PM -0700, Jane Chu wrote:
>> When a process consumes a UE in a page, the memory failure handler
>> attempts to collect information for a potential SIGBUS.
>> If the page is an anonymous page, page_mapped_in_vma(page, vma) is
>> invoked in order to
>>    1. retrieve the vaddr from the process' address space,
>>    2. verify that the vaddr is indeed mapped to the poisoned page,
>> where 'page' is the precise small page with UE.
>>
>> It's been observed that when injecting poison to a non-head subpage
>> of an anonymous hugetlb page, no SIGBUS show up; while injecting to
>> the head page produces a SIGBUS. The casue is that, though hugetlb_walk()
>> returns a valid pmd entry (on x86), but check_pte() detects mismatch
>> between the head page per the pmd and the input subpage. Thus the vaddr
>> is considered not mapped to the subpage and the process is not collected
>> for SIGBUS purpose.  This is the calling stack
>>        collect_procs_anon
>>          page_mapped_in_vma
>>            page_vma_mapped_walk
>>              hugetlb_walk
>>                huge_pte_lock
>>                  check_pte
>>
>> It seems that the most obvious place to fix the issue is by making
>> page_mapped_in_vma() hugetlb walk aware. The precise subpage in the
>> input is useful in providing PAGE_SIZE granularity vaddr.
> I don't like this solution because it adds yet another special case for
> hugetlb.  If we don't split a PMD-mapped THP, we'd have the same
> problem, right?
>
> check_pte() would succeed if we set pvmw->pfn to folio_pfn() and
> pvmw->nr_pages to folio_nr_pages(), right?  I just don't know what else
> might be affected by that.
>
> I like one of these two options:
>
> @@ -206,6 +206,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>                  pvmw->pte = hugetlb_walk(vma, pvmw->address, size);
>                  if (!pvmw->pte)
>                          return false;
> +               pvmw->pte += pvmw->address & (size - PAGE_SIZE);
>
>                  pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
>                  if (!check_pte(pvmw))
>
> (that needs a bit of tidying up; you can't just do that, but I think
> you get the basic idea -- correct the pte to point to the precise page
> instead of the hugetlb pfn)
That'll work, let me think about how to tidy it up.  More below.
>
>
> The option I really prefer is much more work but matches our preferred
> direction of getting rid of hugetlb specific code.  Something like this:
>
> @@ -192,27 +192,6 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>          if (pvmw->pmd && !pvmw->pte)
>                  return not_found(pvmw);
>
> -       if (unlikely(is_vm_hugetlb_page(vma))) {
> -               struct hstate *hstate = hstate_vma(vma);
> -               unsigned long size = huge_page_size(hstate);
> -               /* The only possible mapping was handled on last iteration */
> [...]
> -               pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
> -               if (!check_pte(pvmw))
> -                       return not_found(pvmw);
> -               return true;
> -       }
> -
>          end = vma_address_end(pvmw);
>          if (pvmw->pte)
>                  goto next_pte;
> @@ -229,7 +208,19 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw                        continue;
>                  }
>                  pud = pud_offset(p4d, pvmw->address);
> -               if (!pud_present(*pud)) {
> +               pude = *pud;
> +               if (pud_trans_huge(pude) ||
> +                   (pud_present(pude) && pud_devmap(pude))) {
> +                       pvmw->ptl = pud_lock(mm, pvmw->pud);
> +                       ...
> +                       if (likely(pud_trans_huge(pude) || pud_devmap(pude))) {
> +                               if (pvmw->flags & PVMW_MIGRATION)
> +                                       return not_found(pvmw);
> +                               if (!check_pud(pud_pfn(pude), pvmw))
> +                                       return not_found(pvmw);
> +                               return true;
> +                       }
> +               } else if (!pud_present(pude)) {
>                          step_forward(pvmw, PUD_SIZE);
>                          continue;
>                  }
>
> ie get rid of all the hugetlb-specific code, and add support for the
> PUD level to the common code.  You'd also need to write check_pud().
Good idea!  I'd like to give this more generic approach a try as well.
>
> I'll understand if you don't want to do all the extra work.  And
> thanks for tracking down this bug.

Thanks a lot!

-jane


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ