Date: Fri, 26 Apr 2024 19:27:23 +0100
From: Matthew Wilcox <willy@...radead.org>
To: Sidhartha Kumar <sidhartha.kumar@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	akpm@...ux-foundation.org, linmiaohe@...wei.com,
	jane.chu@...cle.com, nao.horiguchi@...il.com, osalvador@...e.de
Subject: Re: [PATCH] mm/memory-failure: remove shake_page()

On Fri, Apr 26, 2024 at 10:57:31AM -0700, Sidhartha Kumar wrote:
> On 4/26/24 10:34 AM, Matthew Wilcox wrote:
> > On Fri, Apr 26, 2024 at 10:15:11AM -0700, Sidhartha Kumar wrote:
> > > Use a folio in get_any_page() to save 5 calls to compound head and
> > > convert the last user of shake_page() to shake_folio(). This allows us
> > > to remove the shake_page() definition.
> > 
> > So I didn't do this before because I wasn't convinced it was safe.
> > We don't have a refcount on the folio, so the page might no longer
> > be part of this folio by the time we get the refcount on the folio.
> > 
> > I'd really like to see some argumentation for why this is safe.
> 
> If I moved down the folio = page_folio() line to after we verify
> __get_hwpoison_page() has returned 1, which indicates the reference count
> was successfully incremented via folio_try_get(), that means the folio
> conversion would happen after we have a refcount. In the case we don't call
> __get_hwpoison_page(), that means the MF_COUNT_INCREASED flag is set. This
> means the page has existing users so that path would be safe as well. So I
> think this is safe after moving page_folio() after __get_hwpoison_page().

See if you can find a hole in this chain of reasoning ...

memory_failure()
        p = pfn_to_online_page(pfn);
        res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
(not a hugetlb)
        if (TestSetPageHWPoison(p)) {
(not already poisoned)
        if (!(flags & MF_COUNT_INCREASED)) {
                res = get_hwpoison_page(p, flags);

get_hwpoison_page()
                ret = get_any_page(p, flags);

get_any_page()
        folio = page_folio(page);

Because we don't have a reference on the folio at this point (how could
we?), the folio might be split, and now we have a pointer to a folio
which no longer contains the page (assuming we had a hwerror in what
was a tail page at this time).
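
The window described above can be sketched as a small userspace model. It is not kernel code: struct page and struct folio are toy stand-ins, page_folio(), folio_try_get() and folio_put() only mimic the behaviour of the kernel helpers of the same name, and split_target, maybe_split() and page_folio_get() are hypothetical names introduced for illustration. The loop shows the general "take the reference, then re-check page_folio()" pattern; whether that is the right fix for get_any_page() is a separate question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: a folio only carries a refcount here. */
struct folio { int refcount; };
/* Toy model: a page only records which folio currently contains it. */
struct page { struct folio *folio; };

/* Unreferenced lookup: the answer can be stale as soon as it is returned. */
static struct folio *page_folio(struct page *page)
{
	return page->folio;
}

/* Take a reference unless the folio has already been freed. */
static bool folio_try_get(struct folio *folio)
{
	if (folio->refcount == 0)
		return false;
	folio->refcount++;
	return true;
}

static void folio_put(struct folio *folio)
{
	folio->refcount--;
}

/* Hypothetical hook: if set, simulates one concurrent split of the folio. */
static struct folio *split_target;

static void maybe_split(struct page *page)
{
	if (split_target) {
		page->folio = split_target;	/* page now belongs elsewhere */
		split_target = NULL;
	}
}

/* Hypothetical helper: return a referenced folio that still contains page. */
static struct folio *page_folio_get(struct page *page)
{
	for (;;) {
		struct folio *folio = page_folio(page);

		maybe_split(page);	/* the race window: no reference held yet */

		if (!folio_try_get(folio))
			return NULL;
		if (folio == page_folio(page))
			return folio;	/* reference held and still the right folio */
		folio_put(folio);	/* stale folio: drop the reference and retry */
	}
}

/* Returns 1 if a lookup racing with one split ends up on the new folio. */
static int demo_split_race(void)
{
	struct folio large = { .refcount = 1 };
	struct folio small = { .refcount = 1 };
	struct page tail = { .folio = &large };

	split_target = &small;
	return page_folio_get(&tail) == &small && large.refcount == 1;
}
```

In this model, folio_try_get() succeeds on the old folio even though the page has already moved. A successful refcount increment alone therefore does not prove that the folio pointer obtained earlier is still the right one for the page.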
