linux-kernel - Re: [PATCH v2] mm/slub: introduce SLAB_WARN_ON_ERROR

Message-ID: <1548748424.18511.34.camel@mtkswgap22>
Date:   Tue, 29 Jan 2019 15:53:44 +0800
From:   Miles Chen <miles.chen@...iatek.com>
To:     Christopher Lameter <cl@...ux.com>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        Pekka Enberg <penberg@...nel.org>,
        David Rientjes <rientjes@...gle.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Jonathan Corbet <corbet@....net>, <linux-mm@...ck.org>,
        <linux-kernel@...r.kernel.org>,
        <linux-mediatek@...ts.infradead.org>
Subject: Re: [PATCH v2] mm/slub: introduce SLAB_WARN_ON_ERROR

On Tue, 2019-01-29 at 05:46 +0000, Christopher Lameter wrote:
> On Mon, 28 Jan 2019, Andrew Morton wrote:
> 
> > > When debugging slab errors in slub.c, sometimes we have to trigger
> > > a panic in order to get the coredump file. Add a debug option
> > > SLAB_WARN_ON_ERROR to toggle WARN_ON() when the option is set.
> > >
> > > Changes since v1:
> > > 1. Add a special debug option SLAB_WARN_ON_ERROR and toggle WARN_ON()
> > > if it is set.
> > > 2. SLAB_WARN_ON_ERROR can be set by kernel parameter slub_debug.
> > >
> >
> > Hopefully the slab developers will have an opinion on this.
> 
> Debugging slab itself is usually done in kvm or some other virtualized
> environment. Then gdb can be used to set breakpoints. Otherwise one may
> add printks and stuff to the allocators to figure out more or use perf.
> 
> 
> What you are changing here is the debugging for data corruption within
> objects managed by slub or the metadata. Slub currently outputs extensive
> data about the metadata corruption (typically caused by a user of
> slab allocation) which should allow you to set a proper
> breakpoint not in the allocator but in the subsystem where the corruption
> occurs.
> 
Thanks for your comments. The real problems this change can help with are:

a) Classic slub issues, e.g., use-after-free or an overwritten redzone.
It's more efficient to report an issue as soon as slub detects it than
to monitor the log, set a breakpoint, and reproduce the issue. With the
coredump file, we can analyze the issue directly.

b) Memory corruption caused by hardware writes, e.g., memory overwritten
by a DMA engine. Such corruption may or may not be related to the slab
cache that reports the error; for example, kmalloc-256 and dentry may
report the same errors. If we can capture the coredump file before slub
does any restore/reset processing, we have more information about the
corruption.

c) Memory corruption caused by unstable hardware, e.g., bit flips due to
xxxx DRAM die or to applying new power settings. This kind of issue is
hard to reproduce, and it is much easier to identify in a coredump taken
without any restore/reset processing.

Users can set the option via slub_debug. If the option is not set, we
keep the original behavior (keep the system alive). We can turn on the
option when we need the coredump file (with panic_on_warn set, of
course).