linux-kernel - Re: mm: kernel BUG at mm/memory.c:1230

Message-ID: <alpine.LSU.2.00.1205261317310.2488@eggly.anvils>
Date:	Sat, 26 May 2012 13:26:48 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Sasha Levin <levinsasha928@...il.com>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	viro <viro@...iv.linux.org.uk>, oleg@...hat.com,
	"a.p.zijlstra" <a.p.zijlstra@...llo.nl>, mingo <mingo@...nel.org>,
	Dave Jones <davej@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: mm: kernel BUG at mm/memory.c:1230

On Thu, 24 May 2012, Sasha Levin wrote:
> On Thu, May 24, 2012 at 9:07 PM, Andrew Morton
> <akpm@...ux-foundation.org> wrote:
> > On Thu, 24 May 2012 20:27:34 +0200
> > Sasha Levin <levinsasha928@...il.com> wrote:
> >
> >> Hi all,
> >>
> >> During fuzzing with trinity inside a KVM tools guest, using latest linux-next, I've stumbled on the following:
> >>
> >> [ 2043.098949] ------------[ cut here ]------------
> >> [ 2043.099014] kernel BUG at mm/memory.c:1230!
> >
> > That's
> >
> >        VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
> >
> > in zap_pmd_range()?
> 
> Yup.
> 
> > The assertion was added in Jan 2011 by 14d1a55cd26f1860 ("thp: add
> > debug checks for mapcount related invariants").  AFAICT it's just wrong
> > on the exit path.  Unclear why it's triggering now...

I've been round this loop before with that particular VM_BUG_ON.

At first I thought like Andrew, that it's glaringly wrong on the exit
path; but then changed my mind.

When munmapping, we certainly can arrive here with an unaligned addr
and next; but in that case the rwsem is locked (rwsem_is_locked).

Whereas in exiting, rwsem is not locked, but we're going linearly upwards,
and whenever we walk into a pmd_trans_huge area, both addr and next should
be hpage aligned: the vma bounds are unsuited to THP if they're unaligned.

Other cases equally should not arise: madvise MADV_DONTNEED should
have rwsem_is_locked; and truncation or hole-punching shouldn't be
possible on a pure-anonymous (!vma->vm_ops) area considered for THP.
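
The argument above can be sketched as a small userspace model.  This is
only an illustration, not the kernel code: the enum, the helper names and
the 2 MiB HPAGE_PMD_SIZE (x86-64 with 4 KiB base pages) are assumptions
made for the example, and the real check in zap_pmd_range() operates on
the pmd and mm structures rather than on bare addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 2 MiB huge pages, as on x86-64 with 4 KiB base pages. */
#define HPAGE_PMD_SIZE ((uintptr_t)1 << 21)

/* Hypothetical labels for the callers discussed in the text. */
enum zap_caller { ZAP_MUNMAP, ZAP_MADV_DONTNEED, ZAP_EXIT };

/* True when [addr, next) covers exactly one aligned huge page. */
static bool covers_whole_hpage(uintptr_t addr, uintptr_t next)
{
	return (addr % HPAGE_PMD_SIZE) == 0 && next - addr == HPAGE_PMD_SIZE;
}

/* Per the reasoning above: munmap and madvise hold mmap_sem, exit does not. */
static bool mmap_sem_held(enum zap_caller caller)
{
	return caller != ZAP_EXIT;
}

/*
 * Models when the VM_BUG_ON would fire on a huge pmd: the range is only
 * part of the huge page (so it would need splitting) while mmap_sem is
 * not held.
 */
static bool assertion_would_fire(enum zap_caller caller,
				 uintptr_t addr, uintptr_t next)
{
	bool needs_split = !covers_whole_hpage(addr, next);

	return needs_split && !mmap_sem_held(caller);
}
```

In this model only the combination "exit path plus unaligned range" trips
the check, which is exactly the case the reasoning above says should not
arise.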

But I cannot remember what brought me here before: a crash in testing
on one of my machines, which further investigation root-caused elsewhere?
or a report from someone else? or noticed when auditing another problem?
I'm frustrated not to recall.

> 
> I'm not sure if that's indeed the issue or not, but note that this is
> the first time I've managed to trigger that with the fuzzer, and it's
> not that easy to reproduce. Which is a bit odd for code that was there
> for 4 months...

I'm keeping off linux-next for the moment; I'll worry about this
more if it shows up when we try 3.5-rc1.  Your fuzzing tells me that my
logic above is wrong, but maybe it's just a passing defect in next.

Hugh