Message-ID: <20190830104530.GA29647@linux>
Date:   Fri, 30 Aug 2019 12:45:35 +0200
From:   Oscar Salvador <osalvador@...e.de>
To:     Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Cc:     "mhocko@...nel.org" <mhocko@...nel.org>,
        "mike.kravetz@...cle.com" <mike.kravetz@...cle.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "vbabka@...e.cz" <vbabka@...e.cz>
Subject: Re: poisoned pages do not play well in the buddy allocator

On Tue, Aug 27, 2019 at 09:28:13AM +0200, Oscar Salvador wrote:
> On Tue, Aug 27, 2019 at 01:34:29AM +0000, Naoya Horiguchi wrote:
> > > @Naoya: I could give it a try if you are busy.
> > 
> > Thanks for raising your hand. That's really wonderful. I think that the series [1] is
> > not merged yet but not rejected either, so feel free to reuse/update/revamp it.
> 
> I will continue pursuing this then :-).

I have started implementing a fix for this.
So far I have only tested it on normal (non-hugetlb) pages.

I took the approach of:

- Free page: remove it from the buddy allocator and set it as PageReserved|PageHWPoison.
- Used page: migrate it and do not release it (skip the put_page() in unmap_and_move()
	     when the reason is MR_MEMORY_FAILURE). Set it as PageReserved|PageHWPoison.

The routine that handles this also sets the refcount of these pages to 1, so the unpoison
machinery will only have to check for PageHWPoison and do a put_page() to send
the page back to the buddy allocator.
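
To make the lifecycle above concrete, here is a minimal user-space sketch of it.
It is a toy model, not kernel code: struct toy_page, the TOY_PG_* flags and all
toy_* helpers are hypothetical names made up for illustration, migration is
elided, and clearing both flags on unpoison is an assumption of the sketch.

```c
/* Toy user-space model of the poison/unpoison lifecycle. NOT kernel code;
 * every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum { TOY_PG_RESERVED = 1, TOY_PG_HWPOISON = 2 };

struct toy_page {
	unsigned int flags;
	int refcount;	/* 0 once the page is back in the toy allocator */
	bool in_buddy;	/* true while the page sits on a free list */
};

/* Free page: take it out of the allocator and mark it. */
static void toy_poison_free_page(struct toy_page *p)
{
	assert(p->in_buddy && p->refcount == 0);
	p->in_buddy = false;
	p->flags |= TOY_PG_RESERVED | TOY_PG_HWPOISON;
	p->refcount = 1;	/* held until unpoison */
}

/* Used page: contents are migrated elsewhere (elided here) and the
 * final put is skipped, so the page never reaches the allocator. */
static void toy_poison_used_page(struct toy_page *p)
{
	assert(!p->in_buddy && p->refcount > 0);
	p->flags |= TOY_PG_RESERVED | TOY_PG_HWPOISON;
	p->refcount = 1;
}

/* Dropping the last reference hands the page back to the allocator. */
static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->in_buddy = true;
}

/* Unpoison: check the HWPoison bit, then a single put releases the page.
 * Clearing both flags here is an assumption of this sketch. */
static bool toy_unpoison_page(struct toy_page *p)
{
	if (!(p->flags & TOY_PG_HWPOISON))
		return false;
	p->flags &= ~(unsigned int)(TOY_PG_RESERVED | TOY_PG_HWPOISON);
	toy_put_page(p);
	return true;
}
```

The point of pinning the refcount at 1 is that unpoison needs no special
knowledge of how the page got poisoned: both paths end in the same state.
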

The Reserved bit is used because these pages will now __only__ be accessible through
pfn walkers, and pfn walkers should respect Reserved pages.
The PageHWPoison bit is used to remember that this page is poisoned, so the unpoison
machinery knows that it is valid to unpoison it.
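
As a rough illustration of what "pfn walkers should respect Reserved pages" means,
here is a toy walker that skips any page with the Reserved bit set. Again this is
a user-space sketch with hypothetical names (struct walk_page, WALK_PG_*,
toy_count_usable), not an actual kernel walker.

```c
/* Toy pfn walker that leaves Reserved pages alone. NOT kernel code;
 * every name here is hypothetical. */
#include <assert.h>
#include <stddef.h>

enum { WALK_PG_RESERVED = 1, WALK_PG_HWPOISON = 2 };

struct walk_page {
	unsigned int flags;
};

/* Count the pages the walker would be allowed to touch. */
static size_t toy_count_usable(const struct walk_page *map, size_t nr_pages)
{
	size_t usable = 0;

	for (size_t pfn = 0; pfn < nr_pages; pfn++) {
		if (map[pfn].flags & WALK_PG_RESERVED)
			continue;	/* poisoned pages are Reserved, so skipped */
		usable++;
	}
	return usable;
}
```

Because poisoned pages always carry the Reserved bit in this scheme, the walker
needs no separate HWPoison test to stay away from them.
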

It should also let us get rid of some if not all of the PageHWPoison() checks.

Overall, it seems to work as I no longer see the issue our customer and I faced.

My goal is to go further and publish that fix along with several
cleanups/refactors for the soft-offline machinery (hard-poison will come later),
as I strongly think we really need to rework it a bit to make it simpler.

Since it will take a bit to have everything ready, I just wanted to
let you know.

-- 
Oscar Salvador
SUSE L3
