Message-ID: <87d1ai4riv.fsf@e105922-lin.cambridge.arm.com>
Date: Mon, 05 Jun 2017 12:12:40 +0100
From: Punit Agrawal <punit.agrawal@....com>
To: Will Deacon <will.deacon@....com>
Cc: Yury Norov <ynorov@...iumnetworks.com>,
<akpm@...ux-foundation.org>, <linux-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>, <steve.capper@....com>
Subject: Re: arm64: segfaults on next-20170602 with LTP tests

Punit Agrawal <punit.agrawal@....com> writes:

> Will Deacon <will.deacon@....com> writes:
>
>> On Fri, Jun 02, 2017 at 05:42:04PM +0300, Yury Norov wrote:
>>> On Fri, Jun 02, 2017 at 03:37:51PM +0300, Yury Norov wrote:
>>> > On Fri, Jun 02, 2017 at 01:19:18PM +0100, Will Deacon wrote:
>>> > > Hi Yury,
>>> > >
>>> > > [adding Steve and Punit]
>>> > >
>>> > > On Fri, Jun 02, 2017 at 02:11:51PM +0300, Yury Norov wrote:
>>> > > > I see that latest and yesterday's linux-next segfaults with tests pth_str01,
>>> > > > pth_str03, rwtest04. Rwtest04 hangs sometimes. Crashes are not always
>>> > > > reproducible. About week ago everything was fine. Kernel log and config file
>>> > > > are attached. The testing is performed on qemu.
>>> > >
>>> > > It's weird that these haven't cropped up in our nightly tests, especially
>>> > > given that defconfig is very similar to the one you're using. That said,
>>> > > I see huge pmds cropping up in the traces below and there have been some
>>> > > recent changes from Punit and Steve in that area, in particular things
>>> > > like 55f379263bcc ("mm, gup: ensure real head page is ref-counted when using
>>> > > hugepages").
>>> > >
>>> > > Are you in a position to bisect this, or is it too fiddly to reproduce?
>>>
>>> I have bisected the bug to exactly this patch. If I revert it, the
>>> pth_str01/03 are passed.
>>
>> Thanks for doing that: I had my suspicion ;) I can also reproduce the
>> failure locally on my Juno.
>>
>> Punit -- please can you investigate this? Otherwise I think we have to
>> revert this for now and bring it back after some better testing.
>
> Apologies for the breakage. It looks like anonymous hugepages are not
> happy with the change. Let me dig into this.
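
I've found the issue: in certain scenarios the ref-count was being taken
on the page following the one of interest. By the time the loop finishes,
'page' has been advanced past the last page in the range, so
compound_head(page) can resolve to the head of a neighbouring page rather
than the head of the huge page that is actually mapped. The fixup below
(on top of the patches in next) derives the head from the original entry
instead, and makes the failures with pth_str0[1|3] go away for me.

I'll send an updated patch for Andrew to pick up shortly.
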
Thanks,
Punit
---------------->8------------------------
commit ee56e197cd8f3f782e32219f828e04ef0145a1aa
Author: Punit Agrawal <punit.agrawal@....com>
Date:   Mon Jun 5 12:03:55 2017 +0100

    fixup! mm, gup: Ensure real head page is ref-counted when using hugepages

diff --git a/mm/gup.c b/mm/gup.c
index 7d730e4188ce..6bd39264d0e7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1362,7 +1362,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
-	head = compound_head(page);
+	head = compound_head(pmd_page(orig));
 	if (!page_cache_add_speculative(head, refs)) {
 		*nr -= refs;
 		return 0;
@@ -1400,7 +1400,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
-	head = compound_head(page);
+	head = compound_head(pud_page(orig));
 	if (!page_cache_add_speculative(head, refs)) {
 		*nr -= refs;
 		return 0;
@@ -1437,7 +1437,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
-	head = compound_head(page);
+	head = compound_head(pgd_page(orig));
 	if (!page_cache_add_speculative(head, refs)) {
 		*nr -= refs;
 		return 0;
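
For anyone who wants to see the off-by-one outside the kernel, here is a
rough userspace sketch of the same pattern. It is not kernel code:
struct fake_page, fake_compound_head(), init_two_huge_pages(),
walk_and_ref() and PAGES_PER_HUGE are all made-up names for illustration,
and the "huge pages" are just groups of four array entries. The only thing
it is meant to show is that a cursor advanced inside the loop is the wrong
thing to derive the head from.

```c
#include <assert.h>
#include <stddef.h>

#define PAGES_PER_HUGE 4	/* hypothetical: small pages per "huge page" */

/* Hypothetical stand-in for struct page. */
struct fake_page {
	size_t head;	/* index of the compound head this page belongs to */
	int refcount;	/* only meaningful on head pages in this toy model */
};

/* Lay out two adjacent "huge pages": mem[0..3] and mem[4..7]. */
static void init_two_huge_pages(struct fake_page *mem)
{
	for (size_t i = 0; i < 2 * PAGES_PER_HUGE; i++) {
		mem[i].head = (i / PAGES_PER_HUGE) * PAGES_PER_HUGE;
		mem[i].refcount = 0;
	}
}

/* Toy compound_head(): follow the head index stored in the page. */
static struct fake_page *fake_compound_head(struct fake_page *mem,
					    struct fake_page *page)
{
	return &mem[page->head];
}

/*
 * Walk npages starting at base + offset, then take refs on a head page.
 * 'base' plays the role of pmd_page(orig) in the fixup.
 *
 * use_fix == 0: head is derived from the cursor after the loop (the bug).
 * use_fix != 0: head is derived from 'base' (the fixup).
 *
 * Returns the head page that received the references.
 */
static struct fake_page *walk_and_ref(struct fake_page *mem,
				      struct fake_page *base, size_t offset,
				      size_t npages, int use_fix)
{
	struct fake_page *page = base + offset;
	struct fake_page *head;
	int refs = 0;
	size_t i = 0;

	assert(npages > 0);
	do {
		page++;		/* cursor ends one past the last page walked */
		refs++;
	} while (++i != npages);

	head = fake_compound_head(mem, use_fix ? base : page);
	head->refcount += refs;
	return head;
}
```

Walking the last two pages of the first group (base &mem[0], offset 2,
npages 2) leaves the cursor on mem[4], so without the fix the references
land on the second group's head; with the fix they land on mem[0].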