[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210812042813.GA1576603@agluck-desk2.amr.corp.intel.com>
Date: Wed, 11 Aug 2021 21:28:13 -0700
From: "Luck, Tony" <tony.luck@...el.com>
To: Naoya Horiguchi <nao.horiguchi@...il.com>
Cc: Oscar Salvador <osalvador@...e.de>,
Muchun Song <songmuchun@...edance.com>,
Mike Kravetz <mike.kravetz@...cle.com>, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.com>,
Naoya Horiguchi <naoya.horiguchi@....com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 1/2] mm,hwpoison: fix race with hugetlb page allocation
On Fri, Jun 04, 2021 at 08:36:31AM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@....com>
>
> When hugetlb page fault (under overcommitting situation) and
> memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
>
> CPU0: CPU1:
>
> gather_surplus_pages()
> page = alloc_surplus_huge_page()
> memory_failure_hugetlb()
> get_hwpoison_page(page)
> __get_hwpoison_page(page)
> get_page_unless_zero(page)
> zero = put_page_testzero(page)
> VM_BUG_ON_PAGE(!zero, page)
> enqueue_huge_page(h, page)
> put_page(page)
>
> __get_hwpoison_page() only checks the page refcount before taking an
> additional one for memory error handling, which is not enough because
> there's a time window where compound pages have non-zero refcount during
> hugetlb page initialization.
>
> So make __get_hwpoison_page() check page status a bit more for hugetlb
> pages with get_hwpoison_huge_page(). Checking hugetlb-specific flags
> under hugetlb_lock makes sure that the hugetlb page is not transitive.
> It's notable that another new function, HWPoisonHandlable(), is helpful
> to prevent a race against other transitive page states (like a generic
> compound page just before PageHuge becomes true).
I'm seeing some strange results when doing a simple injection/recovery.
Current upstream often fails to offline the page with messages like:
"high-order kernel page"
or
"unknown page"
Things were working in v5.12. Broken in v5.13.
Bisect says that:
25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")
is the culprit (though it is possible that there is more than one
issue ... failure symptoms changed a bit during the bisection).
This commit doesn't revert automatically from upstream. But it
does revert from v5.13. Running with this reverted from v5.13
gives kernel that recovers normally[1] from hundreds of consecutive
error injections.
-Tony
[1] Almost normally. My test catches SIGBUS and prints the virtual
address from the siginfo_t structure. Sometimes the address is correct
other times it is NULL.
Powered by blists - more mailing lists