linux-kernel - Re: [PATCH] x86: panic when a kernel stack overflow is detected


Message-ID: <20190729103458.GZ31381@hirez.programming.kicks-ass.net>
Date:   Mon, 29 Jul 2019 12:34:58 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Daniel Axtens <dja@...ens.net>,
        kasan-dev <kasan-dev@...glegroups.com>, X86 ML <x86@...nel.org>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        Alexander Potapenko <glider@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Marco Elver <elver@...gle.com>
Subject: Re: [PATCH] x86: panic when a kernel stack overflow is detected

On Sun, Jul 28, 2019 at 08:53:58PM -0700, Andy Lutomirski wrote:
> On Sun, Jul 28, 2019 at 6:59 PM Daniel Axtens <dja@...ens.net> wrote:
> >
> > Currently, when a kernel stack overflow is detected via VMAP_STACK,
> > the task is killed with die().
> >
> > This isn't safe, because we don't know how that process has affected
> > kernel state. In particular, we don't know what locks have been taken.
> > For example, we can hit a case with lkdtm where a thread takes a
> > stack overflow in printk() after taking the logbuf_lock. In that case,
> > we deadlock when the kernel next does a printk.
> >
> > Do not attempt to kill the process when a kernel stack overflow is
> > detected. The system state is unknown, the only safe thing to do is
> > panic(). (panic() also prints without taking locks so a useful debug
> > splat is printed even when logbuf_lock is held.)
> 
> The thing I don't like about this is that it reduces the chance that
> we successfully log anything to disk.
> 
> PeterZ, do you have any useful input here?  I wonder if we could do
> something like printk_oh_crap() that is just printk() except that it
> panics if it fails to return after a few seconds.

People are already hard at work rewriting printk. The current thing is
unfixable.  Then again, I don't know if there are any sane options aside
from early serial.

Still, mucking with printk won't help you at all if the task is holding
some other/filesystem lock required to do that writeback.