linux-kernel - Re: [PATCH v2 0/2] Introduce the pkill_on

Message-ID: <59534db5-b251-c0c8-791f-58aca5c00a2b@linux.com>
Date:   Tue, 16 Nov 2021 12:12:16 +0300
From:   Alexander Popov <alex.popov@...ux.com>
To:     Kees Cook <keescook@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Lukas Bulwahn <lukas.bulwahn@...il.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jonathan Corbet <corbet@....net>,
        Paul McKenney <paulmck@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Joerg Roedel <jroedel@...e.de>,
        Maciej Rozycki <macro@...am.me.uk>,
        Muchun Song <songmuchun@...edance.com>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Robin Murphy <robin.murphy@....com>,
        Randy Dunlap <rdunlap@...radead.org>,
        Lu Baolu <baolu.lu@...ux.intel.com>,
        Petr Mladek <pmladek@...e.com>,
        Luis Chamberlain <mcgrof@...nel.org>, Wei Liu <wl@....org>,
        John Ogness <john.ogness@...utronix.de>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Alexey Kardashevskiy <aik@...abs.ru>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        Jann Horn <jannh@...gle.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Mark Rutland <mark.rutland@....com>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Will Deacon <will@...nel.org>,
        Ard Biesheuvel <ardb@...nel.org>,
        Laura Abbott <labbott@...nel.org>,
        David S Miller <davem@...emloft.net>,
        Borislav Petkov <bp@...en8.de>, Arnd Bergmann <arnd@...db.de>,
        Andrew Scull <ascull@...gle.com>,
        Marc Zyngier <maz@...nel.org>, Jessica Yu <jeyu@...nel.org>,
        Iurii Zaikin <yzaikin@...gle.com>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        Wang Qing <wangqing@...o.com>, Mel Gorman <mgorman@...e.de>,
        Mauro Carvalho Chehab <mchehab+huawei@...nel.org>,
        Andrew Klychkov <andrew.a.klychkov@...il.com>,
        Mathieu Chouquet-Stringer <me@...hieu.digital>,
        Daniel Borkmann <daniel@...earbox.net>,
        Stephen Kitt <steve@....org>, Stephen Boyd <sboyd@...nel.org>,
        Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
        Mike Rapoport <rppt@...nel.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Kernel Hardening <kernel-hardening@...ts.openwall.com>,
        linux-hardening@...r.kernel.org,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        linux-arch <linux-arch@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>, notify@...nel.org,
        main@...ts.elisa.tech, safety-architecture@...ts.elisa.tech,
        devel@...ts.elisa.tech, Shuah Khan <shuah@...nel.org>
Subject: Re: [PATCH v2 0/2] Introduce the pkill_on_warn parameter

On 16.11.2021 01:06, Kees Cook wrote:
> Hmm, yes. What it originally boiled down to, which is why Linus first
> objected to BUG(), was that we don't know what other parts of the system
> have been disrupted. The best example is just that of locking: if we
> BUG() or do_exit() in the middle of holding a lock, we'll wreck whatever
> subsystem that was attached to. Without a deterministic system state
> unwinder, there really isn't a "safe" way to just stop a kernel thread.
> 
> With this pkill_on_warn, we avoid the BUG problem (since the thread of
> execution continues and stops at an 'expected' place: the signal
> handler).
> 
> However, now we have the newer objection from Linus, which is one of
> attribution: the WARN might be hit during an "unrelated" thread of
> execution and "current" gets blamed, etc. And beyond that, if we take
> down a portion of userspace, what in userspace may be destabilized? In
> theory, we get a case where any required daemons would be restarted by
> init, but that's not "known".
> 
> The safest version of this I can think of is for processes to opt into
> this mitigation. That would also cover the "special cases" we've seen
> exposed too. i.e. init and kthreads would not opt in.
> 
> However, that's a lot to implement when Marco's tracing suggestion might
> be sufficient and policy could be entirely implemented in userspace. It
> could be as simple as this (totally untested):

I don't think that this userspace warning handling can work the way pkill_on_warn does.

1. The kernel code execution continues after WARN_ON(); it will not wait for some 
userspace daemon that is polling trace events. That is no different from 
ignoring the warning and suffering all the negative effects after WARN_ON().

2. This userspace policy will miss WARN_ON_ONCE(), WARN_ONCE() and 
WARN_TAINT_ONCE() after the first hit.
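
To illustrate the two points above, here is a minimal userspace sketch (not
kernel code). It assumes a simplified model of WARN_ON_ONCE() with a single
call site and a static "already warned" flag; the names model_warn_on_once(),
buggy_path(), reports_emitted and times_continued are hypothetical and exist
only for this illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counters: reports a tracing daemon could observe, and
 * how many times the code after the warning actually ran. */
static int reports_emitted;
static int times_continued;

/* Simplified model of WARN_ON_ONCE() for one call site: the condition is
 * always evaluated and returned, but a report is emitted only once. */
static bool model_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		reports_emitted++;
		fprintf(stderr, "WARNING: unexpected state (reported once)\n");
	}
	return cond;
}

/* A code path that hits the warning every time it runs. */
static void buggy_path(void)
{
	model_warn_on_once(true);
	/* Execution continues here immediately; nothing waits for userspace. */
	times_continued++;
}
```

Calling buggy_path() three times in this model emits one report while the
code after the warning runs all three times, so a policy driven only by the
reports never sees the second and third hit.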


Oh, wait...
I got a crazy idea that may bring more consistency to the error handling mess.

What if the Linux kernel had an LSM module responsible for error handling policy?
That would require adding LSM hooks to BUG*(), WARN*(), KERN_EMERG, etc.
In such an LSM policy we could decide immediately how to react to a kernel error.
We could even decide depending on the subsystem and things like that.
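
As a rough userspace sketch of what such a policy decision could look like
(purely hypothetical: no such LSM hook exists, and the enum values, the rule
table and err_policy_decide() are invented for this illustration), a hook
called from the error path might map the kind of error and the subsystem to
an action:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical kinds of kernel error that would call into the policy. */
enum err_kind { ERR_WARN, ERR_BUG, ERR_EMERG };

/* Hypothetical reactions the policy could choose from. */
enum err_action { ACT_CONTINUE, ACT_KILL_TASK, ACT_PANIC };

struct err_rule {
	enum err_kind kind;
	const char *subsystem;	/* NULL matches any subsystem */
	enum err_action action;
};

/* Example rule table; the first matching rule wins. */
static const struct err_rule policy[] = {
	{ ERR_WARN, "net", ACT_KILL_TASK },
	{ ERR_WARN, NULL,  ACT_CONTINUE  },
	{ ERR_BUG,  NULL,  ACT_PANIC     },
};

/* Decide synchronously, at the point of the error, how to react. */
static enum err_action err_policy_decide(enum err_kind kind,
					 const char *subsystem)
{
	for (size_t i = 0; i < sizeof(policy) / sizeof(policy[0]); i++) {
		const struct err_rule *r = &policy[i];

		if (r->kind != kind)
			continue;
		if (!r->subsystem ||
		    (subsystem && strcmp(r->subsystem, subsystem) == 0))
			return r->action;
	}
	return ACT_CONTINUE;	/* no rule matched: keep current behavior */
}
```

With this example table, a WARN in "net" kills the current task, a WARN
elsewhere continues, and any BUG panics. The decision is made in the error
path itself rather than by a daemon reacting later.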

(idea for brainstorming)

Best regards,
Alexander