[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c40cfd0b-f045-4887-a955-fee7e0392cf1@oracle.com>
Date: Fri, 26 Apr 2024 11:53:01 -0700
From: Sidhartha Kumar <sidhartha.kumar@...cle.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
akpm@...ux-foundation.org, linmiaohe@...wei.com, jane.chu@...cle.com,
nao.horiguchi@...il.com, osalvador@...e.de
Subject: Re: [PATCH] mm/memory-failure: remove shake_page()
On 4/26/24 11:27 AM, Matthew Wilcox wrote:
> On Fri, Apr 26, 2024 at 10:57:31AM -0700, Sidhartha Kumar wrote:
>> On 4/26/24 10:34 AM, Matthew Wilcox wrote:
>>> On Fri, Apr 26, 2024 at 10:15:11AM -0700, Sidhartha Kumar wrote:
>>>> Use a folio in get_any_page() to save 5 calls to compound head and
>>>> convert the last user of shake_page() to shake_folio(). This allows us
>>>> to remove the shake_page() definition.
>>>
>>> So I didn't do this before because I wasn't convinced it was safe.
>>> We don't have a refcount on the folio, so the page might no longer
>>> be part of this folio by the time we get the refcount on the folio.
>>>
>>> I'd really like to see some argumentation for why this is safe.
>>
>> If I moved down the folio = page_folio() line to after we verify
>> __get_hwpoison_page() has returned 1, which indicates the reference count
>> was successfully incremented via foliO_try_get(), that means the folio
>> conversion would happen after we have a refcount. In the case we don't call
>> __get_hwpoison_page(), that means the MF_COUNT_INCREASED flag is set. This
>> means the page has existing users so that path would be safe as well. So I
>> think this is safe after moving page_folio() after __get_hwpoison_page().
>
> See if you can find a hole in this chain of reasoning ...
>
> memory_failure()
> p = pfn_to_online_page(pfn);
> res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
> (not a hugetlb)
> if (TestSetPageHWPoison(p)) {
> (not already poisoned)
> if (!(flags & MF_COUNT_INCREASED)) {
> res = get_hwpoison_page(p, flags);
>
> get_hwpoison_page()
> ret = get_any_page(p, flags);
>
> get_any_page()
> folio = page_folio(page)
That would be unsafe, the safe way would be if we moved page_folio() after the
call to __get_hw_poison() in get_any_page() and there would still be one
remaining user of shake_page() that we can't convert. A safe version of this
patch would result in a removal of one use of PageHuge() and two uses of
put_page(), would that be worth submitting?
get_any_page()
if(__get_hwpoison_page())
folio = page_folio() /* folio_try_get() returned 1, safe */
>
> Because we don't have a reference on the folio at this point (how could
> we?), the folio might be split, and now we have a pointer to a folio
> which no longer contains the page (assuming we had a hwerror in what
> was a tail page at this time).
Powered by blists - more mailing lists