linux-kernel - Re: v4.6 kernel BUG at mm/rmap.c:1101!

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160524145332.GF1789@lahna.fi.intel.com>
Date:	Tue, 24 May 2016 17:53:32 +0300
From:	Mika Westerberg <mika.westerberg@...ux.intel.com>
To:	Andrea Arcangeli <aarcange@...hat.com>
Cc:	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	linux-kernel@...r.kernel.org
Subject: Re: v4.6 kernel BUG at mm/rmap.c:1101!

On Tue, May 24, 2016 at 04:08:09PM +0200, Andrea Arcangeli wrote:
> On Tue, May 24, 2016 at 11:12:23AM +0300, Mika Westerberg wrote:
> > Hmm, the kernel shipped with Fedora 23 has that enabled:
> > 
> > lahna % grep CONFIG_DEBUG_VM /boot/config-4.4.9-300.fc23.x86_64 
> > CONFIG_DEBUG_VM=y
> > # CONFIG_DEBUG_VM_VMACACHE is not set
> > # CONFIG_DEBUG_VM_RB is not set
> 
> Yes, it would have been more accurate to say "enterprise", not just
> "production".

Fair enough.

> It's great to run Fedora with CONFIG_DEBUG_VM=y and I'd recommend to
> keep it that way, so it contributes to stronger runtime validation of
> the VM invariants.
>
> I keep CONFIG_DEBUG_VM=y on all my systems too of course.
> 
> Also note the RHEL debug kernel has CONFIG_DEBUG_VM=y also enabled,
> but only the debug kernel.
> 
> In general while testing new kernels with new VM modifications it's
> good idea to set CONFIG_DEBUG_VM=y, if you can afford the occasional
> false positive like in this case and it's not an enterprise production
> kernel, where clearly all testing should have already happened before
> that become "enterprise" ready in the first place, so we can save a
> few cycles.
> 
> Lately we got VM_WARN_ON too and I added to my tree recently:
> 
> +#define VM_WARN_ON_PAGE(cond, page) \
> +       do { \
> +               if (unlikely(cond)) { \
> +                       dump_page(page, "VM_WARN_ON_PAGE(" __stringify(cond)")");\
> +                       __WARN(); \
> +               } \
> +       } while (0)
> 
> So we could convert some... to reduce the pain of a false positive,
> but in cases like the one that triggered I'm not sure it'd be good
> idea to switch it to a WARN_ON as it may be a sign of memory
> corruption if the assert fails (after the patch) and keeping going
> after memory corruption can actually do more harm than good.
> 
> One thing to keep =n however is CONFIG_DEBUG_VM_RB=n, that one is
> expensive and that's why it has its own separate knob to be able to
> disable it while keeping CONFIG_DEBUG_VM=y. IIRC I kept originally
> under #if 0... so I wouldn't recommend to enable VM_RB on production
> (it's too much overhead), that's a nice validation but for development
> only.

Understood. Thanks for the thorough explanation :)