linux-kernel - Re: [PATCH RFC 1/2] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALu+AoThhou7z+JCyv44AxGWDLDt2b7h0W6wcKRsJyLvSR1iQA@mail.gmail.com>
Date:   Fri, 26 Aug 2022 09:43:54 +0800
From:   Dave Young <dyoung@...hat.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     John Hubbard <jhubbard@...dia.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, linux-doc@...r.kernel.org,
        kexec@...ts.infradead.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>,
        David Laight <David.Laight@...lab.com>,
        Jonathan Corbet <corbet@....net>,
        Andy Whitcroft <apw@...onical.com>,
        Joe Perches <joe@...ches.com>,
        Dwaipayan Ray <dwaipayanray1@...il.com>,
        Lukas Bulwahn <lukas.bulwahn@...il.com>,
        Baoquan He <bhe@...hat.com>, Vivek Goyal <vgoyal@...hat.com>,
        Stephen Johnston <sjohnsto@...hat.com>,
        Prarit Bhargava <prarit@...hat.com>
Subject: Re: [PATCH RFC 1/2] coding-style.rst: document BUG() and WARN() rules
 ("do not crash the kernel")

Hi David,

[Added more people in cc]

On Thu, 25 Aug 2022 at 20:13, David Hildenbrand <david@...hat.com> wrote:
>
> On 24.08.22 23:59, John Hubbard wrote:
> > On 8/24/22 09:30, David Hildenbrand wrote:
> >> diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> >> index 03eb53fd029a..a6d81ff578fe 100644
> >> --- a/Documentation/process/coding-style.rst
> >> +++ b/Documentation/process/coding-style.rst
> >> @@ -1186,6 +1186,33 @@ expression used.  For instance:
> >>      #endif /* CONFIG_SOMETHING */
> >>
> >
> > I like the idea of adding this documentation, and this is the right
> > place. Naturally, if one likes something, one must immediately change
> > it. :) Therefore, here is an alternative writeup that I think captures
> > what you and the email threads were saying.
> >
> > How's this sound?
>
> Much better, thanks! :)
>
> >
> > diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> > index 03eb53fd029a..32df0d503388 100644
> > --- a/Documentation/process/coding-style.rst
> > +++ b/Documentation/process/coding-style.rst
> > @@ -1185,6 +1185,53 @@ expression used.  For instance:
> >         ...
> >         #endif /* CONFIG_SOMETHING */
> >
> > +22) Do not crash the kernel
> > +---------------------------
> > +
> > +Use WARN() rather than BUG()
> > +****************************
> > +
> > +Do not add new code that uses any of the BUG() variants, such as BUG(),
> > +BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably
> > +WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not required
> > +if there is no reasonable way to at least partially recover.
>
> I'll tend to keep in this section:
>
> "Unavoidable data corruption / security issues might be a very rare
> exception to this rule and need good justification."
>
> Because there are rare exceptions, and I'd much rather document the
> clear exception to this rule.
>
> > +
> > +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
> > +**************************************************
> > +
> > +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it is
> > +common for a given warning condition, if it occurs at all, to occur multiple
> > +times. (For example, once per file, or once per struct page.) This can fill up
>
> I'll drop the "For example" part. I feel like this doesn't really need
> an example -- most probably we've all been there already when the kernel
> log was flooded :)
>
> > +and wrap the kernel log, and can even slow the system enough that the excessive
> > +logging turns into its own, additional problem.
> > +
> > +Do not WARN lightly
> > +*******************
> > +
> > +WARN*() is intended for unexpected, this-should-never-happen situations. WARN*()
> > +macros are not to be used for anything that is expected to happen during normal
> > +operation. These are not pre- or post-condition asserts, for example. Again:
> > +WARN*() must not be used for a condition that is expected to trigger easily, for
> > +example, by user space actions. pr_warn_once() is a possible alternative, if you
> > +need to notify the user of a problem.
> > +
> > +Do not worry about panic_on_warn users
> > +**************************************
> > +
> > +A few more words about panic_on_warn: Remember that ``panic_on_warn`` is an
> > +available kernel option, and that many users set this option. This is why there
> > +is a "Do not WARN lightly" writeup, above. However, the existence of
> > +panic_on_warn users is not a valid reason to avoid the judicious use WARN*().
> > +That is because, whoever enables panic_on_warn has explicitly asked the kernel
> > +to crash if a WARN*() fires, and such users must be prepared to deal with the
> > +consequences of a system that is somewhat more likely to crash.
>
> Side note: especially with kdump() I feel like we might see much more
> widespread use of panic_on_warn to be able to actually extract debug
> information in a controlled manner -- for example on enterprise distros.
> ... which would then make these systems more likely to crash, because
> there is no way to distinguish a rather harmless warning from a severe
> warning :/ . But let's see if some kdump() folks will share their
> opinion as reply to the cover letter.

I can understand the intention of this patch, and I totally agree that
BUG() should be used carefully, this is a good proposal if we can
clearly define the standard about when to use BUG().  But I do have
some worries,  I think this standard is different for different sub
components, it is not clear to me at least,  so this may introduce an
unstable running kernel and cause troubles (eg. data corruption) with
a WARN instead of a BUG. Probably it would be better to say "Do not
WARN lightly, and do not hesitate to use BUG if it is really needed"?

About "patch_on_warn", it will depend on the admin/end user to set it,
it is not a good idea for distribution to set it. It seems we are
leaving it to end users to take the risk of a kernel panic even with
all kernel WARN even if it is sometimes not necessary.


>
> --
> Thanks,
>
> David / dhildenb
>

Thanks
Dave