Message-ID: <5016e528-8ea9-7597-3420-086ae57f3d9d@oracle.com>
Date: Fri, 20 Oct 2017 10:49:46 -0700
From: Mike Kravetz <mike.kravetz@...cle.com>
To: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Cc: "linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Michal Hocko <mhocko@...nel.org>,
Aneesh Kumar <aneesh.kumar@...ux.vnet.ibm.com>,
Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting
On 10/19/2017 07:30 PM, Naoya Horiguchi wrote:
> On Thu, Oct 19, 2017 at 04:00:07PM -0700, Mike Kravetz wrote:
>
> Thank you for addressing this. The patch itself looks good to me, but
> the reported issue (negative reserve count) doesn't reproduce in my trial
> with v4.14-rc5, so could you share the exact procedure for this issue?
Sure, but first, one question about your test scenario below.
>
> When error handler runs over a huge page, the reserve count is incremented
> so I'm not sure why the reserve count goes negative.
I'm not sure I follow. What specific code is incrementing the reserve
count?
> My operation is like below:
>
> $ sysctl vm.nr_hugepages=10
> $ grep HugePages_ /proc/meminfo
> HugePages_Total: 10
> HugePages_Free: 10
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> $ ./test_alloc_generic -B hugetlb_file -N1 -L "mmap access memory_error_injection:error_type=madv_hard" // allocate a 2MB file on hugetlbfs, then madvise(MADV_HWPOISON) on it.
> $ grep HugePages_ /proc/meminfo
> HugePages_Total: 10
> HugePages_Free: 9
> HugePages_Rsvd: 1 // reserve count is incremented
> HugePages_Surp: 0
This is confusing to me. I cannot create a test where a reserve count
remains after poisoning a page.
I tried to recreate your test, running unmodified 4.14.0-rc5.
Before test
-----------
HugePages_Total: 1
HugePages_Free: 1
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
After open(creat) and mmap of 2MB hugetlbfs file
------------------------------------------------
HugePages_Total: 1
HugePages_Free: 1
HugePages_Rsvd: 1
HugePages_Surp: 0
Hugepagesize: 2048 kB
Reserve count is 1 as expected/normal
After madvise(MADV_HWPOISON) of the single huge page in mapping/file
--------------------------------------------------------------------
HugePages_Total: 1
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
In this case, the reserve and free counts were both decremented. Note
that before the poison operation the page was not associated with the
mapping/file. I did not look closely at the code, but I assume the
madvise may cause the page to be 'faulted in'.
The counts remain the same when the program exits
-------------------------------------------------
HugePages_Total: 1
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Remove the file (rm /var/opt/oracle/hugepool/foo)
-------------------------------------------------
HugePages_Total: 1
HugePages_Free: 0
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0
Hugepagesize: 2048 kB
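For reference, the HugePages_Rsvd value above is what a 64-bit unsigned
counter shows after being decremented once past zero, i.e. a logical -1.
A minimal sketch of that arithmetic, assuming a 64-bit unsigned counter
(the function names are made up for illustration and are not kernel code):

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a reserve counter held in a 64-bit
 * unsigned integer. Unsigned subtraction wraps modulo 2^64, so
 * decrementing 0 yields 18446744073709551615 rather than -1.
 */
uint64_t decrement_reserve(uint64_t resv)
{
    return resv - 1;
}

/*
 * Reinterpret the counter as signed to recover the logical value.
 * Assumes a two's complement target, which holds for the usual
 * Linux compilers.
 */
int64_t reserve_as_signed(uint64_t resv)
{
    return (int64_t)resv;
}
```

Passing 0 to decrement_reserve() gives the value printed in the
HugePages_Rsvd line above, and reserve_as_signed() of that result is -1.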
I am still confused about how your test maintains a reserve count after
poisoning. It may be a good idea for you to test my patch with your
test scenario, as I cannot recreate it here.
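The steps above can be sketched roughly as follows. This is not the exact
program I ran, only an illustration under some assumptions: hugetlbfs is
mounted, the file path is passed in by the caller, the huge page size is
2MB, and the kernel has CONFIG_MEMORY_FAILURE with the caller holding
CAP_SYS_ADMIN. The helper names are hypothetical. Note that
MADV_HWPOISON really takes the page out of service.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL * 1024 * 1024) /* assumes Hugepagesize: 2048 kB */

/* Find "key:" at the start of a line in meminfo-style text and parse
 * the number after it. Returns 0 on success, -1 if the key is absent. */
int meminfo_field(const char *text, const char *key, unsigned long long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            *out = strtoull(line + klen + 1, NULL, 10);
            return 0;
        }
        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the next line */
    }
    return -1;
}

/* Print the HugePages_* counters from /proc/meminfo under a label. */
void show_counts(const char *label)
{
    const char *keys[] = { "HugePages_Total", "HugePages_Free",
                           "HugePages_Rsvd", "HugePages_Surp" };
    char buf[8192];
    size_t n, i;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    printf("%s\n", label);
    for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++) {
        unsigned long long v;
        if (meminfo_field(buf, keys[i], &v) == 0)
            printf("%s: %llu\n", keys[i], v);
    }
}

/* Create and map one huge page of a hugetlbfs file, then poison it.
 * Returns 0 on success, -1 on failure. */
int poison_hugetlb_file(const char *path)
{
    int rc = 0;
    void *p;
    int fd = open(path, O_CREAT | O_RDWR, 0644);

    if (fd < 0) {
        perror("open");
        return -1;
    }
    p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return -1;
    }
    show_counts("After open(creat) and mmap");

    if (madvise(p, HPAGE_SIZE, MADV_HWPOISON) != 0) {
        perror("madvise(MADV_HWPOISON)");
        rc = -1;
    }
    show_counts("After madvise(MADV_HWPOISON)");

    munmap(p, HPAGE_SIZE);
    close(fd);
    return rc;
}
```

Calling poison_hugetlb_file("/var/opt/oracle/hugepool/foo") from main(),
then removing the file and checking /proc/meminfo again, corresponds to
the sequence shown above.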
--
Mike Kravetz