Message-ID: <alpine.LSU.2.00.1205261317310.2488@eggly.anvils>
Date: Sat, 26 May 2012 13:26:48 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: Sasha Levin <levinsasha928@...il.com>
cc: Andrew Morton <akpm@...ux-foundation.org>,
viro <viro@...iv.linux.org.uk>, oleg@...hat.com,
"a.p.zijlstra" <a.p.zijlstra@...llo.nl>, mingo <mingo@...nel.org>,
Dave Jones <davej@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: mm: kernel BUG at mm/memory.c:1230
On Thu, 24 May 2012, Sasha Levin wrote:
> On Thu, May 24, 2012 at 9:07 PM, Andrew Morton
> <akpm@...ux-foundation.org> wrote:
> > On Thu, 24 May 2012 20:27:34 +0200
> > Sasha Levin <levinsasha928@...il.com> wrote:
> >
> >> Hi all,
> >>
> >> During fuzzing with trinity inside a KVM tools guest, using latest linux-next, I've stumbled on the following:
> >>
> >> [ 2043.098949] ------------[ cut here ]------------
> >> [ 2043.099014] kernel BUG at mm/memory.c:1230!
> >
> > That's
> >
> > VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
> >
> > in zap_pmd_range()?
>
> Yup.
>
> > The assertion was added in Jan 2011 by 14d1a55cd26f1860 ("thp: add
> > debug checks for mapcount related invariants"). AFAICT it's just wrong
> > on the exit path. Unclear why it's triggering now...

I've been round this loop before with that particular VM_BUG_ON.
At first I thought, like Andrew, that it's glaringly wrong on the exit
path; but then I changed my mind.

When munmapping, we certainly can arrive here with an unaligned addr
and next; but in that case rwsem_is_locked() is true.
Whereas on exit the rwsem is not locked, but we're going linearly upwards,
and whenever we walk into a pmd_trans_huge area, both addr and next should
be hpage aligned: the vma bounds are unsuited to THP if they're unaligned.

Other cases should not arise either: madvise MADV_DONTNEED should
have the rwsem locked; and truncation or hole-punching shouldn't be
possible on a pure-anonymous (!vma->vm_ops) area considered for THP.

But I cannot remember what brought me here before: a crash in testing
on one of my machines, which further investigation root-caused elsewhere?
A report from someone else? Something I noticed while auditing another
problem? I'm frustrated not to recall.
>
> I'm not sure if that's indeed the issue or not, but note that this is
> the first time I've managed to trigger that with the fuzzer, and it's
> not that easy to reproduce. Which is a bit odd for code that was there
> for 4 months...

I'm keeping off linux-next for the moment; I'll worry about this
more if it shows up when we try 3.5-rc1. Your fuzzing tells me that my
logic above is wrong, but maybe it's just a passing defect in next.

Hugh