[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20180713133545.658173ca953e7d2a8a4ee6bd@linux-foundation.org>
Date: Fri, 13 Jul 2018 13:35:45 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Cc: linux-mm@...ck.org, Michal Hocko <mhocko@...nel.org>,
xishi.qiuxishi@...baba-inc.com, zy.zhengyi@...baba-inc.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 1/2] mm: fix race on soft-offlining free huge pages
On Fri, 13 Jul 2018 12:26:05 +0900 Naoya Horiguchi <n-horiguchi@...jp.nec.com> wrote:
> There's a race condition between soft offline and hugetlb_fault which
> causes unexpected process killing and/or hugetlb allocation failure.
>
> The process killing is caused by the following flow:
>
> CPU 0 CPU 1 CPU 2
>
> soft offline
> get_any_page
> // find the hugetlb is free
> mmap a hugetlb file
> page fault
> ...
> hugetlb_fault
> hugetlb_no_page
> alloc_huge_page
> // succeed
> soft_offline_free_page
> // set hwpoison flag
> mmap the hugetlb file
> page fault
> ...
> hugetlb_fault
> hugetlb_no_page
> find_lock_page
> return VM_FAULT_HWPOISON
> mm_fault_error
> do_sigbus
> // kill the process
>
>
> The hugetlb allocation failure comes from the following flow:
>
> CPU 0 CPU 1
>
> mmap a hugetlb file
> // reserve all free page but don't fault-in
> soft offline
> get_any_page
> // find the hugetlb is free
> soft_offline_free_page
> // set hwpoison flag
> dissolve_free_huge_page
> // fail because all free hugepages are reserved
> page fault
> ...
> hugetlb_fault
> hugetlb_no_page
> alloc_huge_page
> ...
> dequeue_huge_page_node_exact
> // ignore hwpoisoned hugepage
> // and finally fail due to no-mem
>
> The root cause of this is that current soft-offline code is written
> based on an assumption that PageHWPoison flag should beset at first to
> avoid accessing the corrupted data. This makes sense for memory_failure()
> or hard offline, but does not for soft offline because soft offline is
> about corrected (not uncorrected) error and is safe from data lost.
> This patch changes soft offline semantics where it sets PageHWPoison flag
> only after containment of the error page completes successfully.
>
> ...
>
> --- v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c
> +++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c
> @@ -1598,8 +1598,18 @@ static int soft_offline_huge_page(struct page *page, int flags)
> if (ret > 0)
> ret = -EIO;
> } else {
> - if (PageHuge(page))
> - dissolve_free_huge_page(page);
> + /*
> + * We set PG_hwpoison only when the migration source hugepage
> + * was successfully dissolved, because otherwise hwpoisoned
> + * hugepage remains on free hugepage list, then userspace will
> + * find it as SIGBUS by allocation failure. That's not expected
> + * in soft-offlining.
> + */
This comment is unclear. What happens if there's a hwpoisoned page on
the freelist? The allocator just skips it and looks for another page?
Or does the allocator return the poisoned page, it gets mapped and
userspace gets a SIGBUS when accessing it? If the latter (or the
former!), why does the comment mention allocation failure?
Powered by blists - more mailing lists