[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190827013429.GA5125@hori.linux.bs1.fc.nec.co.jp>
Date: Tue, 27 Aug 2019 01:34:29 +0000
From: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To: Oscar Salvador <osalvador@...e.de>
CC: "mhocko@...nel.org" <mhocko@...nel.org>,
"mike.kravetz@...cle.com" <mike.kravetz@...cle.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"vbabka@...e.cz" <vbabka@...e.cz>
Subject: Re: poisoned pages do not play well in the buddy allocator
On Mon, Aug 26, 2019 at 12:41:50PM +0200, Oscar Salvador wrote:
> Hi,
>
> When analyzing a problem reported by one of our customers, I stumbbled upon an issue
> that origins from the fact that poisoned pages end up in the buddy allocator.
>
> Let me break down the stepts that lie to the problem:
>
> 1) We soft-offline a page
> 2) Page gets flagged as HWPoison and is being sent to the buddy allocator.
> This is done through set_hwpoison_free_buddy_page().
> 3) Kcompactd wakes up in order to perform some compaction.
> 4) compact_zone() will call migrate_pages()
> 5) migrate_pages() will try to get a new page from compaction_alloc() to migrate to
> 6) if cc->freelist is empty, compaction_alloc() will call isolate_free_pagesblock()
> 7) isolate_free_pagesblock only checks for PageBuddy() to assume that a page is OK
> to be used to migrate to. Since HWPoisoned page are also PageBuddy, we add
> the page to the list. (same problem exists in fast_isolate_freepages()).
>
> The outcome of that is that we end up happily handing poisoned pages in compaction_alloc,
> so if we ever got a fault on that page through *_fault, we will return VM_FAULT_HWPOISON,
> and the process will be killed.
>
> I first though that I could get away with it by checking PageHWPoison in
> {fast_isolate_freepages/isolate_free_pagesblock}, but not really.
> It might be that the page we are checking is an order > 0 page, so the first page
> might not be poisoned, but the one the follows might be, and we end up in the
> same situation.
Yes, this is a whole point of the current implementation.
>
> After some more thought, I really came to the conclusion that HWPoison pages should not
> really be in the buddy allocator, as this is only asking for problems.
> In this case it is only compaction code, but it could be happening somewhere else,
> and one would expect that the pages you got from the buddy allocator are __ready__ to use.
>
> I __think__ that we thought we were safe to put HWPoison pages in the buddy allocator as we
> perform healthy checks when getting a page from there, so we skip poisoned pages
>
> Of course, this is not the end of the story, now that someone got a page, if he frees it,
> there is a high chance that this page ends up in a pcplist (I saw that).
> Unless we are on CONFIG_VM_DEBUG, we do not check for the health of pages got from pcplist,
> as we do when getting a page from the buddy allocator.
>
> I checked [1], and it seems that [2] was going towards fixing this kind of issue.
>
> I think it is about time to revamp the whole thing.
>
> @Naoya: I could give it a try if you are busy.
Thanks for raising hand. That's really wonderful. I think that the series [1] is not
merge yet but not rejected yet, so feel free to reuse/update/revamp it.
>
> [1] https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> [2] https://lore.kernel.org/linux-mm/1541746035-13408-9-git-send-email-n-horiguchi@ah.jp.nec.com/
>
> --
> Oscar Salvador
> SUSE L3
>
Powered by blists - more mailing lists