lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 16 Nov 2021 10:52:39 +0300
From:   Alexander Popov <>
To:     Gabriele Paoloni <>,
        Lukas Bulwahn <>,
        Robert Krutsch <>
Cc:     Linus Torvalds <>,
        Jonathan Corbet <>,
        Paul McKenney <>,
        Andrew Morton <>,
        Thomas Gleixner <>,
        Peter Zijlstra <>,
        Joerg Roedel <>,
        Maciej Rozycki <>,
        Muchun Song <>,
        Viresh Kumar <>,
        Robin Murphy <>,
        Randy Dunlap <>,
        Lu Baolu <>,
        Petr Mladek <>,
        Kees Cook <>,
        Luis Chamberlain <>, Wei Liu <>,
        John Ogness <>,
        Andy Shevchenko <>,
        Alexey Kardashevskiy <>,
        Christophe Leroy <>,
        Jann Horn <>,
        Greg Kroah-Hartman <>,
        Mark Rutland <>,
        Andy Lutomirski <>,
        Dave Hansen <>,
        Steven Rostedt <>,
        Will Deacon <>,
        Ard Biesheuvel <>,
        Laura Abbott <>,
        David S Miller <>,
        Borislav Petkov <>, Arnd Bergmann <>,
        Andrew Scull <>,
        Marc Zyngier <>, Jessica Yu <>,
        Iurii Zaikin <>,
        Rasmus Villemoes <>,
        Wang Qing <>, Mel Gorman <>,
        Mauro Carvalho Chehab <>,
        Andrew Klychkov <>,
        Mathieu Chouquet-Stringer <>,
        Daniel Borkmann <>,
        Stephen Kitt <>, Stephen Boyd <>,
        Thomas Bogendoerfer <>,
        Mike Rapoport <>,
        Bjorn Andersson <>,
        Kernel Hardening <>,,
        "open list:DOCUMENTATION" <>,
        linux-arch <>,
        Linux Kernel Mailing List <>,
        linux-fsdevel <>,,,,, Shuah Khan <>
Subject: Re: [ELISA Safety Architecture WG] [PATCH v2 0/2] Introduce the
 pkill_on_warn parameter

On 15.11.2021 18:51, Gabriele Paoloni wrote:
> On 15/11/2021 14:59, Lukas Bulwahn wrote:
>> On Sat, Nov 13, 2021 at 7:14 PM Alexander Popov <> wrote:
>>> On 13.11.2021 00:26, Linus Torvalds wrote:
>>>> On Fri, Nov 12, 2021 at 10:52 AM Alexander Popov <> wrote:
>>>>> Hello everyone!
>>>>> Friendly ping for your feedback.
>>>> I still haven't heard a compelling _reason_ for this all, and why
>>>> anybody should ever use this or care?
>>> Ok, to sum up:
>>> Killing the process that hit a kernel warning complies with the Fail-Fast
>>> principle [1]. pkill_on_warn sysctl allows the kernel to stop the process when
>>> the **first signs** of wrong behavior are detected.
>>> By default, the Linux kernel ignores a warning and proceeds the execution from
>>> the flawed state. That is opposite to the Fail-Fast principle.
>>> A kernel warning may be followed by memory corruption or other negative effects,
>>> like in CVE-2019-18683 exploit [2] or many other cases detected by the SyzScope
>>> project [3]. pkill_on_warn would prevent the system from the errors going after
>>> a warning in the process context.
>>> At the same time, pkill_on_warn does not kill the entire system like
>>> panic_on_warn. That is the middle way of handling kernel warnings.
>>> Linus, it's similar to your BUG_ON() policy [4]. The process hitting BUG_ON() is
>>> killed, and the system proceeds to work. pkill_on_warn just brings a similar
>>> policy to WARN_ON() handling.
>>> I believe that many Linux distros (which don't hit WARN_ON() here and there)
>>> will enable pkill_on_warn because it's reasonable from the safety and security
>>> points of view.
>>> And I'm sure that the ELISA project by the Linux Foundation (Enabling Linux In
>>> Safety Applications [5]) would support the pkill_on_warn sysctl.
>>> [Adding people from this project to CC]
>>> I hope that I managed to show the rationale.
>> Alex, officially and formally, I cannot talk for the ELISA project
>> (Enabling Linux In Safety Applications) by the Linux Foundation and I
>> do not think there is anyone that can confidently do so on such a
>> detailed technical aspect that you are raising here, and as the
>> various participants in the ELISA Project have not really agreed on
>> such a technical aspect being one way or the other and I would not see
>> that happening quickly. However, I have spent quite some years on the
>> topic on "what is the right and important topics for using Linux in
>> safety applications"; so here are my five cents:
>> One of the general assumptions about safety applications and safety
>> systems is that the malfunction of a function within a system is more
>> critical, i.e., more likely to cause harm to people, directly or
>> indirectly, than the unavailability of the system. So, before
>> "something potentially unexpected happens"---which can have arbitrary
>> effects and hence effects difficult to foresee and control---, it is
>> better to just shutdown/silence the system, i.e., design a fail-safe
>> or fail-silent system, as the effect of shutdown is pretty easily
>> foreseeable during the overall system design and you could think about
>> what the overall system does, when the kernel crashes the usual way.
>> So, that brings us to what a user would expect from the kernel in a
>> safety-critical system: Shutdown on any event that is unexpected.
>> Here, I currently see panic_on_warn as the closest existing feature to
>> indicate any event that is unexpected and to shutdown the system. That
>> requires two things for the kernel development:
>> 1. Allow a reasonably configured kernel to boot and run with
>> panic_on_warn set. Warnings should only be raised when something is
>> not configured as the developers expect it or the kernel is put into a
>> state that generally is _unexpected_ and has been exposed little to
>> the critical thought of the developer, to testing efforts and use in
>> other systems in the wild. Warnings should not be used for something
>> informative, which still allows the kernel to continue running in a
>> proper way in a generally expected environment. Up to my knowledge,
>> there are some kernels in production that run with panic_on_warn; so,
>> IMHO, this requirement is generally accepted (we might of course
>> discuss the one or other use of warn) and is not too much to ask for.
>> 2. Really ensure that the system shuts down when it hits warn and
>> panic. That requires that the execution path for warn() and panic() is
>> not overly complicated (stuffed with various bells and whistles).
>> Otherwise, warn() and panic() could fail in various complex ways and
>> potentially keep the system running, although it should be shut down.
>> Some people in the ELISA Project looked a bit into why they believe
>> panic() shuts down a system but I have not seen a good system analysis
>> and argument why any third person could be convinced that panic()
>> works under all circumstances where it is invoked or that at least,
>> the circumstances under which panic really works is properly
>> documented. That is a central aspect for using Linux in a
>> reasonably-designed safety-critical system. That is possibly also
>> relevant for security, as you might see an attacker obtain information
>> because it was possible to "block" the kernel shutting down after
>> invoking panic() and hence, the attacker could obtain certain
>> information that was only possible because 1. the system got into an
>> inconsistent state, 2. it was detected by some check leading to warn()
>> or panic(), and 3. the system's security engineers assumed that the
>> system must have been shutting down at that point, as panic() was
>> invoked, and hence, this would be disallowing a lot of further
>> operations or some specific operations that the attacker would need to
>> trigger in that inconsistent state to obtain information.
>> To your feature, Alex, I do not see the need to have any refined
>> handling of killing a specific process when the kernel warns; stopping
>> the whole system is the better and more predictable thing to do. I
>> would prefer if systems, which have those high-integrity requirements,
>> e.g., in a highly secure---where stopping any unintended information
>> flow matters more than availability---or in fail-silent environments
>> in safety systems, can use panic_on_warn. That should address your
>> concern above of handling certain CVEs as well.
>> In summary, I am not supporting pkill_on_warn. I would support the
>> other points I mentioned above, i.e., a good enforced policy for use
>> of warn() and any investigation to understand the complexity of
>> panic() and reducing its complexity if triggered by such an
>> investigation.
> Hi Alex
> I also agree with the summary that Lukas gave here. From my experience
> the safety system are always guarded by an external flow monitor (e.g. a
> watchdog) that triggers in case the safety relevant workloads slows down
> or block (for any reason); given this condition of use, a system that
> goes into the panic state is always safe, since the watchdog would
> trigger and drive the system automatically into safe state.
> So I also don't see a clear advantage of having pkill_on_warn();
> actually on the flip side it seems to me that such feature could
> introduce more risk, as it kills only the threads of the process that
> caused the kernel warning whereas the other processes are trusted to
> run on a weaker Kernel (does killing the threads of the process that
> caused the kernel warning always fix the Kernel condition that lead to
> the warning?)

Lukas, Gabriele, Robert,
Thanks for showing this from the safety point of view.

The part about believing in panic() functionality is amazing :)
Yes, safety critical systems depend on the robust ability to restart.

Best regards,

Powered by blists - more mailing lists