Message-ID: <20171024004605.GA19663@hori1.linux.bs1.fc.nec.co.jp>
Date: Tue, 24 Oct 2017 00:46:05 +0000
From: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To: Mike Kravetz <mike.kravetz@...cle.com>
CC: "linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Michal Hocko <mhocko@...nel.org>,
"Aneesh Kumar" <aneesh.kumar@...ux.vnet.ibm.com>,
Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting
On Mon, Oct 23, 2017 at 11:20:02AM -0700, Mike Kravetz wrote:
> On 10/23/2017 12:32 AM, Naoya Horiguchi wrote:
> > On Fri, Oct 20, 2017 at 10:49:46AM -0700, Mike Kravetz wrote:
> >> On 10/19/2017 07:30 PM, Naoya Horiguchi wrote:
> >>> On Thu, Oct 19, 2017 at 04:00:07PM -0700, Mike Kravetz wrote:
> >>>
> >>> Thank you for addressing this. The patch itself looks good to me, but
> >>> the reported issue (negative reserve count) doesn't reproduce in my trial
> >>> with v4.14-rc5, so could you share the exact procedure for this issue?
> >>
> >> Sure, but first one question on your test scenario below.
> >>
> >>>
> >>> When the error handler runs over a huge page, the reserve count is incremented
> >>> so I'm not sure why the reserve count goes negative.
> >>
> >> I'm not sure I follow. What specific code is incrementing the reserve
> >> count?
> >
> > The call path is like below:
> >
> > hugetlbfs_error_remove_page
> >   hugetlb_fix_reserve_counts
> >     hugepage_subpool_get_pages(spool, 1)
> >     hugetlb_acct_memory(h, 1);
> >       gather_surplus_pages
> >         h->resv_huge_pages += delta;
> >
>
> Ah OK. This is a result of call to hugetlb_fix_reserve_counts which
> I believe is incorrect in most instances, and is unlikely to happen
> with my patch.
>
> >>
> >> Remove the file (rm /var/opt/oracle/hugepool/foo)
> >> -------------------------------------------------
> >> HugePages_Total: 1
> >> HugePages_Free: 0
> >> HugePages_Rsvd: 18446744073709551615
> >> HugePages_Surp: 0
> >> Hugepagesize: 2048 kB
> >>
> >> I am still confused about how your test maintains a reserve count after
> >> poisoning. It may be a good idea for you to test my patch with your
> >> test scenario as I can not recreate here.
> >
> > Interestingly, I found that this reproduces if all hugetlb pages are
> > reserved when poisoning.
> > Your testing meets the condition, and mine doesn't.
> >
> > In gather_surplus_pages() we determine whether we extend hugetlb pool
> > with surplus pages like below:
> >
> > needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
> > if (needed <= 0) {
> >         h->resv_huge_pages += delta;
> >         return 0;
> > }
> > ...
> >
> > needed is 1 if h->resv_huge_pages == h->free_huge_pages, and then
> > the reserve count gets inconsistent.
> > I confirmed that your patch fixes the issue, so I'm OK with it.
>
> Thanks. That now makes sense to me.
>
> hugetlb_fix_reserve_counts (which results in gather_surplus_pages being
> called) is only designed to be called in the extremely rare cases when
> we have freed a huge page but are unable to free the reservation entry.
>
> Just curious, when the hugetlb_fix_reserve_counts call was added to
> hugetlbfs_error_remove_page, was the intention to preserve the original
> reservation?
No, the intention was to remove the reservation of the error hugepage,
which is unmapped and isolated from the normal hugepage lifecycle.
The error hugepage is not freed back to the hugepage pool, but from the
perspective of the reserve count it should be handled the same way as a freed page.
When I was writing commit 78bb920344b8, I ran into some reserve count
mismatches and wrongly borrowed code from the truncation path.
> I remember thinking hard about that for the hole punch
> case and came to the conclusion that it was easier and less error prone
> to remove the reservation as well. That will also happen in the error
> case with the patch I provided.
Yes, hole punching seems similar to poisoning, except that the final destination
of the target page differs. So the same conclusion applies here.
Thanks,
Naoya Horiguchi