Message-ID: <alpine.LSU.2.11.1406201438590.8689@eggly.anvils>
Date:	Fri, 20 Jun 2014 14:42:25 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>
cc:	Hugh Dickins <hughd@...gle.com>, Christoph Lameter <cl@...two.org>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Naoya Horiguchi <nao.horiguchi@...il.com>
Subject: Re: kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1

On Fri, 20 Jun 2014, Naoya Horiguchi wrote:
> On Fri, Jun 20, 2014 at 01:03:58PM -0700, Hugh Dickins wrote:
> > On Fri, 20 Jun 2014, Naoya Horiguchi wrote:
> > > On Fri, Jun 20, 2014 at 09:24:36AM -0500, Christoph Lameter wrote:
> > > > On Thu, 19 Jun 2014, Naoya Horiguchi wrote:
> > > >
> > > > > I'm suspecting that mbind_range() do something wrong around vma handling,
> > > > > but I don't have enough luck yet. Anyone has an idea?
> > > >
> > > > Well memory policy data corrupted. This looks like you were trying to do
> > > > page migration via mbind()?
> > > 
> > > Right.
> > > 
> > > > Could we get some more details as to what is
> > > > going on here? Specifically the parameters passed to mbind would be
> > > > interesting.
> > > 
> > > My view about the kernel behavior was in another email a few hours ago.
> > > And as for what userspace did, I attach the reproducer below. It's simply
> > > doing mbind(mode=MPOL_BIND, flags=MPOL_MF_MOVE_ALL) on random address/length/node.
> > 
> > Thanks for the additional information earlier.  ext4, so no shmem
> > shared mempolicy involved: that cuts down the bugspace considerably.
> > 
> > I agree from what you said that it looked like corrupt vm_area_struct
> > and hence corrupt policy.
> > 
> > Here's an obvious patch to try, entirely untested - thanks for the
> > reproducer, but I'd rather leave the testing to you.  Sounds like
> > you have a useful fuzzer there: good catch.
> > 
> > 
> > [PATCH] mm: fix crashes from mbind() merging vmas
> > 
> > v2.6.34's 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") introduced
> > vma merging to mbind(), but it should have also changed the convention
> > of passing start vma from queue_pages_range() (formerly check_range())
> > to new_vma_page(): vma merging may have already freed that structure,
> > resulting in BUG at mm/mempolicy.c:1738 and probably worse crashes.
> > 
> > Fixes: 9d8cebd4bcd7 ("mm: fix mbind vma merge problem")
> > Reported-by: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
> > Signed-off-by: Hugh Dickins <hughd@...gle.com>
> > Cc: stable@...r.kernel.org # 2.6.34+
> 
> With your patch, the bug doesn't reproduce in one hour testing. I think
> it's long enough because it took only a few minutes until the reproducer
> triggered the bug without your patch.
> So I think the problem was gone, thank you very much!
> 
> Tested-by: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
& Acked-by: Christoph Lameter <cl@...ux.com>

Great, thank you both.  I was afraid you might hit something else
immediately.  I'll send the patch to Andrew later - unless it
magically appears in his tree before.

Hugh