lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACw3F50auHR_dWnhVq3DbUp41-y7GKHR9LeKny_8ztAOR3W5rg@mail.gmail.com>
Date:   Tue, 30 May 2023 15:11:20 -0700
From:   Jiaqi Yan <jiaqiyan@...gle.com>
To:     HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>
Cc:     "mike.kravetz@...cle.com" <mike.kravetz@...cle.com>,
        "peterx@...hat.com" <peterx@...hat.com>,
        "songmuchun@...edance.com" <songmuchun@...edance.com>,
        "duenwen@...gle.com" <duenwen@...gle.com>,
        "axelrasmussen@...gle.com" <axelrasmussen@...gle.com>,
        "jthoughton@...gle.com" <jthoughton@...gle.com>,
        "rientjes@...gle.com" <rientjes@...gle.com>,
        "linmiaohe@...wei.com" <linmiaohe@...wei.com>,
        "shy828301@...il.com" <shy828301@...il.com>,
        "baolin.wang@...ux.alibaba.com" <baolin.wang@...ux.alibaba.com>,
        "wangkefeng.wang@...wei.com" <wangkefeng.wang@...wei.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH v1 4/7] mm/memory_failure: unmap raw HWPoison PTEs
 when possible

On Tue, May 30, 2023 at 2:31 PM Jiaqi Yan <jiaqiyan@...gle.com> wrote:
>
> On Mon, May 29, 2023 at 7:25 PM HORIGUCHI NAOYA(堀口 直也)
> <naoya.horiguchi@....com> wrote:
> >
> > On Fri, Apr 28, 2023 at 12:41:36AM +0000, Jiaqi Yan wrote:
> > > When a folio's VMA is HGM eligible, try_to_unmap_one now only unmaps
> > > the raw HWPOISON page (previously split and mapped at PTE size).
> > > If HGM failed to be enabled on eligible VMA or splitting failed,
> > > try_to_unmap_one fails.
> > >
> > > For VMS that is not HGM eligible, try_to_unmap_one still unmaps
> > > the whole P*D.
> > >
> > > When only the raw HWPOISON subpage is unmapped but others keep mapped,
> > > the old way in memory_failure to check if unmapping successful doesn't
> > > work. So introduce is_unmapping_successful() to cover both existing and
> > > new unmapping behavior.
> > >
> > > For the new unmapping behavior, store how many times a raw HWPOISON page
> > > is expected to be unmapped, and how many times it is actually unmapped
> > > in try_to_unmap_one(). A HWPOISON raw page is expected to be unmapped
> > > from a VMA if splitting succeeded in try_to_split_huge_mapping(), so
> > > unmap_success = (nr_expected_unamps == nr_actual_unmaps).
> > >
> > > Old folio_set_hugetlb_hwpoison returns -EHWPOISON if a folio has any
> > > raw HWPOISON subpage, and try_memory_failure_hugetlb won't attempt
> > > recovery actions again because recovery used to be done on the entire
> > > hugepage. With the new unmapping behavior, this doesn't hold. More
> > > subpages in the hugepage can become corrupted, and needs to be recovered
> > > (i.e. unmapped) individually. New folio_set_hugetlb_hwpoison returns
> > > 0 after adding a new raw subpage to raw_hwp_list.
> > >
> > > Unmapping raw HWPOISON page requires allocating raw_hwp_page
> > > successfully in folio_set_hugetlb_hwpoison, so try_memory_failure_hugetlb
> > > now may fail due to OOM.
> > >
> > > Signed-off-by: Jiaqi Yan <jiaqiyan@...gle.com>
> > > ---
> > ...
> >
> > > @@ -1827,6 +1879,31 @@ EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
> > >
> > >  #ifdef CONFIG_HUGETLB_PAGE
> > >
> > > +/*
> > > + * Given a HWPOISON @subpage as raw page, find its location in @folio's
> > > + * _hugetlb_hwpoison. Return NULL if @subpage is not in the list.
> > > + */
> > > +struct raw_hwp_page *find_in_raw_hwp_list(struct folio *folio,

BTW, per our discussion here[1], this routine will probably reuse what
comes out of the refactored routine.

It should be safe for try_to_unmap_one to hold a raw_hwp_page returned
from find_in_raw_hwp_list as long as raw_hwp_list is protected by
mf_mutex.

[1] https://lore.kernel.org/linux-mm/CACw3F53+Hg4CgFoPj3LLSiURzWfa2egWLO-=12GzfhsNC3XTvQ@mail.gmail.com/T/#m9966de1007b80eb8bd2c2ce0a9db13624cd2652e

> > > +                                       struct page *subpage)
> > > +{
> > > +     struct llist_node *t, *tnode;
> > > +     struct llist_head *raw_hwp_head = raw_hwp_list_head(folio);
> > > +     struct raw_hwp_page *hwp_page = NULL;
> > > +     struct raw_hwp_page *p;
> > > +
> > > +     VM_BUG_ON_PAGE(PageHWPoison(subpage), subpage);
> >
> > I'm testing the series (on top of v6.2-rc4 + HGM v2 patchset) and found the
> > following error triggered by this VM_BUG_ON_PAGE().  The testcase is just to
> > inject hwpoison on an anonymous page (it's not hugetlb-related one).
>
> Thanks for reporting this problem, Naoya!
>
> My mistake, this assertion meant to be "if !PageHWPoison(subpage)", to
> make sure the caller of find_in_raw_hwp_list is sure that subpage is
> hw corrupted.
>
> >
> >   [  790.610985] ===> testcase 'mm/hwpoison/base/backend-anonymous_error-hard-offline_access-avoid.auto3' start
> >   [  793.304927] page:000000006743177b refcount:1 mapcount:0 mapping:0000000000000000 index:0x700000000 pfn:0x14d739
> >   [  793.309322] memcg:ffff8a30c50b6000
> >   [  793.310934] anon flags: 0x57ffffe08a001d(locked|uptodate|dirty|lru|mappedtodisk|swapbacked|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> >   [  793.316665] raw: 0057ffffe08a001d ffffe93cc5353c88 ffffe93cc5685fc8 ffff8a30c91878f1
> >   [  793.320211] raw: 0000000700000000 0000000000000000 00000001ffffffff ffff8a30c50b6000
> >   [  793.323665] page dumped because: VM_BUG_ON_PAGE(PageHWPoison(subpage))
> >   [  793.326764] ------------[ cut here ]------------
> >   [  793.329080] kernel BUG at mm/memory-failure.c:1894!
> >   [  793.331895] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> >   [  793.334854] CPU: 4 PID: 2644 Comm: mceinj.sh Tainted: G            E    N 6.2.0-rc4-v6.2-rc2-230529-1404+ #63
> >   [  793.340710] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
> >   [  793.345875] RIP: 0010:hwpoison_user_mappings+0x654/0x780
> >   [  793.349066] Code: ef 89 de e8 6e bc f8 ff 48 8b 7c 24 20 48 83 c7 58 e8 10 bb d9 ff e9 5f fb ff ff 48 c7 c6 80 ce 4c b1 4c 89 ef e8 1c 38 f6 ff <0f> 0b 48 c7 c6 7b c8 4c b1 4c 89 ef e8 0b 38 f6 ff 0f 0b 8b 45 58
> >   [  793.359732] RSP: 0018:ffffa3ff85ed3d28 EFLAGS: 00010296
> >   [  793.362367] RAX: 000000000000003a RBX: 0000000000000018 RCX: 0000000000000000
> >   [  793.365763] RDX: 0000000000000001 RSI: ffffffffb14ac451 RDI: 00000000ffffffff
> >   [  793.368698] RBP: ffffe93cc535ce40 R08: 0000000000000000 R09: ffffa3ff85ed3ba0
> >   [  793.370837] R10: 0000000000000003 R11: ffffffffb1d3ed28 R12: 000000000014d739
> >   [  793.372903] R13: ffffe93cc535ce40 R14: ffffe93cc535ce40 R15: ffffe93cc535ce40
> >   [  793.374931] FS:  00007f6ccc42a740(0000) GS:ffff8a31bbc00000(0000) knlGS:0000000000000000
> >   [  793.377136] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >   [  793.378656] CR2: 0000561aad6474b2 CR3: 00000001492d4005 CR4: 0000000000170ee0
> >   [  793.380514] DR0: ffffffffb28ed7d0 DR1: ffffffffb28ed7d1 DR2: ffffffffb28ed7d2
> >   [  793.382296] DR3: ffffffffb28ed7d3 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> >   [  793.384028] Call Trace:
> >   [  793.384655]  <TASK>
> >   [  793.385210]  ? __lru_add_drain_all+0x164/0x1f0
> >   [  793.386316]  memory_failure+0x352/0xaa0
> >   [  793.387249]  ? __pfx_bpf_lsm_capable+0x10/0x10
> >   [  793.388323]  ? __pfx_security_capable+0x10/0x10
> >   [  793.389350]  hard_offline_page_store+0x46/0x80
> >   [  793.390397]  kernfs_fop_write_iter+0x11e/0x200
> >   [  793.391441]  vfs_write+0x1e4/0x3a0
> >   [  793.392221]  ksys_write+0x53/0xd0
> >   [  793.392976]  do_syscall_64+0x3a/0x90
> >   [  793.393790]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> >
> > I'm wondering how this code path is called, one possible path is like this:
> >
> >   hwpoison_user_mappings
> >     if PageHuge(hpage) && !PageAnon(hpage)
> >       try_to_split_huge_mapping()
> >         find_in_raw_hwp_list
> >           VM_BUG_ON_PAGE(PageHWPoison(subpage), subpage)
> >
> > but this looks unlikely because the precheck "PageHuge(hpage) && !PageAnon(hpage)" is
> > false for anonymous pages.
> >
> > Another possible code path is:
> >
> >   hwpoison_user_mappings
> >     if PageHuge(hpage) && !PageAnon(hpage)
> >       ...
> >     else
> >       try_to_unmap
> >         rmap_walk
> >           rmap_walk_anon
> >             try_to_unmap_one
> >               if folio_test_hugetlb
> >                 if hgm_eligible
> >                   find_in_raw_hwp_list
> >                     VM_BUG_ON_PAGE(PageHWPoison(subpage), subpage)
> >
> > but this looks also unlikely because of checking folio_test_hugetlb and hgm_eligible
> > (I think both are false in this testcase.)
> > Maybe I miss something (and I'll dig this more), but let me share the issue.
>
> I bet it is in "is_unmapping_successful". So another problem with this
> patch is, "is_unmapping_successful" should only calls
> find_in_raw_hwp_list after it handles non hugetlb and non shared
> mapping, i.e.:
>
> struct raw_hwp_page *hwp_page = NULL;
>
> if (!folio_test_hugetlb(folio) ||
>     folio_test_anon(folio) ||
>     !IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING)) {
>         ...
> }
>
> hwp_page = find_in_raw_hwp_list(folio, poisoned_page);
> VM_BUG_ON_PAGE(!hwp_page, poisoned_page);
>
> I will make sure these two issues get fixed up in follow-up revisions.
>
> >
> > Thanks,
> > Naoya Horiguchi

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ