linux-kernel - Re: [PATCH] exit: Put an upper limit on how often we can oops

Message-ID: <20221109161904.GA10899@openwall.com>
Date:   Wed, 9 Nov 2022 17:19:04 +0100
From:   Solar Designer <solar@...nwall.com>
To:     Kees Cook <keescook@...omium.org>
Cc:     Jann Horn <jannh@...gle.com>, linux-hardening@...r.kernel.org,
        kernel-hardening@...ts.openwall.com,
        Greg KH <gregkh@...uxfoundation.org>,
        Linus Torvalds <torvalds@...uxfoundation.org>,
        Seth Jenkins <sethjenkins@...gle.com>,
        "Eric W . Biederman" <ebiederm@...ssion.com>,
        Andy Lutomirski <luto@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] exit: Put an upper limit on how often we can oops

On Tue, Nov 08, 2022 at 11:38:22AM -0800, Kees Cook wrote:
> On Tue, Nov 08, 2022 at 09:24:40AM -0800, Kees Cook wrote:
> > On Mon, Nov 07, 2022 at 10:48:20PM +0100, Jann Horn wrote:
> > > On Mon, Nov 7, 2022 at 10:15 PM Solar Designer <solar@...nwall.com> wrote:
> > > > On Mon, Nov 07, 2022 at 09:13:17PM +0100, Jann Horn wrote:
> > > > > +oops_limit
> > > > > +==========
> > > > > +
> > > > > +Number of kernel oopses after which the kernel should panic when
> > > > > +``panic_on_oops`` is not set.
> > > >
> > > > Rather than introduce this separate oops_limit, how about making
> > > > panic_on_oops (and maybe all panic_on_*) take the limit value(s) instead
> > > > of being Boolean?  I think this would preserve the current behavior at
> > > > panic_on_oops = 0 and panic_on_oops = 1, but would introduce your
> > > > desired behavior at panic_on_oops = 10000.  We can make 10000 the new
> > > > default.  If a distro overrides panic_on_oops, it probably sets it to 1
> > > > like RHEL does.
> > > >
> > > > Are there distros explicitly setting panic_on_oops to 0?  If so, that
> > > > could be a reason to introduce the separate oops_limit.
> > > >
> > > > I'm not advocating one way or the other - I just felt this should be
> > > > explicitly mentioned and decided on.
> > > 
> > > I think at least internally in the kernel, it probably works better to
> > > keep those two concepts separate? For example, sparc has a function
> > > die_nmi() that uses panic_on_oops to determine whether the system
> > > should panic when a watchdog detects a lockup.
> > 
> > Internally, yes, the kernel should keep "panic_on_oops" to mean "panic
> > _NOW_ on oops?" but I would agree with Solar -- this is a counter as far
> > as userspace is concerned. "Panic on Oops" after 1 oops, 2 oopses, etc.
> > I would like to see this for panic_on_warn too, actually.
> 
> Hm, in looking at this more closely, I think it does make sense as you
> already have it. The count is for the panic_on_oops=0 case, so even in
> userspace, trying to remap that doesn't make a bunch of sense. So, yes,
> let's keep this as-is.

I don't follow your logic there - maybe you got confused?  Yes, as
proposed the count applies when panic_on_oops=0, but that's just weird:
first you request no panic with panic_on_oops=0, then you override that
with oops_limit=10000.  I think it is more natural to request
panic_on_oops=10000 in one step.  I also think it is more natural to
preserve panic_on_oops=0's meaning of never panicking on an Oops.

To me, about the only reason to introduce the override is if we want to
literally override a distro's explicit default of panic_on_oops=0.
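
To make the difference between the two semantics concrete, here is a
rough sketch in C.  It is not code from the patch: the helper names are
hypothetical, oops_count is assumed to include the current Oops (so it is
1 on the first one), and treating oops_limit=0 as "no limit" is my
assumption, since the quoted documentation does not say.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not kernel code.
 * Semantics as proposed in the patch: panic_on_oops stays Boolean and
 * oops_limit only matters when panic_on_oops is 0.
 * Assumption: oops_limit == 0 means "no limit".
 */
static bool should_panic_separate(unsigned int panic_on_oops,
				  unsigned int oops_limit,
				  unsigned int oops_count)
{
	if (panic_on_oops)
		return true;	/* panic on the very first Oops */
	return oops_limit && oops_count >= oops_limit;
}

/*
 * Hypothetical helper, not kernel code.
 * Alternative semantics: panic_on_oops itself is the limit.
 * 0 = never panic, 1 = panic on the first Oops (as today),
 * N = panic once N Oopses have occurred.
 */
static bool should_panic_unified(unsigned int panic_on_oops,
				 unsigned int oops_count)
{
	return panic_on_oops && oops_count >= panic_on_oops;
}
```

With the unified variant, panic_on_oops=0 and panic_on_oops=1 behave as
they do now, and panic_on_oops=10000 gives the proposed limit.  With the
separate variant, panic_on_oops=0 combined with oops_limit=10000 still
panics at the 10000th Oops.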

Alexander