Message-ID: <20221109161904.GA10899@openwall.com>
Date: Wed, 9 Nov 2022 17:19:04 +0100
From: Solar Designer <solar@...nwall.com>
To: Kees Cook <keescook@...omium.org>
Cc: Jann Horn <jannh@...gle.com>, linux-hardening@...r.kernel.org,
kernel-hardening@...ts.openwall.com,
Greg KH <gregkh@...uxfoundation.org>,
Linus Torvalds <torvalds@...uxfoundation.org>,
Seth Jenkins <sethjenkins@...gle.com>,
"Eric W . Biederman" <ebiederm@...ssion.com>,
Andy Lutomirski <luto@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] exit: Put an upper limit on how often we can oops

On Tue, Nov 08, 2022 at 11:38:22AM -0800, Kees Cook wrote:
> On Tue, Nov 08, 2022 at 09:24:40AM -0800, Kees Cook wrote:
> > On Mon, Nov 07, 2022 at 10:48:20PM +0100, Jann Horn wrote:
> > > On Mon, Nov 7, 2022 at 10:15 PM Solar Designer <solar@...nwall.com> wrote:
> > > > On Mon, Nov 07, 2022 at 09:13:17PM +0100, Jann Horn wrote:
> > > > > +oops_limit
> > > > > +==========
> > > > > +
> > > > > +Number of kernel oopses after which the kernel should panic when
> > > > > +``panic_on_oops`` is not set.
> > > >
> > > > Rather than introduce this separate oops_limit, how about making
> > > > panic_on_oops (and maybe all panic_on_*) take the limit value(s) instead
> > > > of being Boolean? I think this would preserve the current behavior at
> > > > panic_on_oops = 0 and panic_on_oops = 1, but would introduce your
> > > > desired behavior at panic_on_oops = 10000. We can make 10000 the new
> > > > default. If a distro overrides panic_on_oops, it probably sets it to 1
> > > > like RHEL does.
> > > >
> > > > Are there distros explicitly setting panic_on_oops to 0? If so, that
> > > > could be a reason to introduce the separate oops_limit.
> > > >
> > > > I'm not advocating one way or the other - I just felt this should be
> > > > explicitly mentioned and decided on.
> > >
> > > I think at least internally in the kernel, it probably works better to
> > > keep those two concepts separate? For example, sparc has a function
> > > die_nmi() that uses panic_on_oops to determine whether the system
> > > should panic when a watchdog detects a lockup.
> >
> > Internally, yes, the kernel should keep "panic_on_oops" to mean "panic
> > _NOW_ on oops?" but I would agree with Solar -- this is a counter as far
> > as userspace is concerned. "Panic on Oops" after 1 oops, 2 oopses, etc.
> > I would like to see this for panic_on_warn too, actually.
>
> Hm, in looking at this more closely, I think it does make sense as you
> already have it. The count is for the panic_on_oops=0 case, so even in
> userspace, trying to remap that doesn't make a bunch of sense. So, yes,
> let's keep this as-is.

I don't follow your logic there - maybe you got confused? Yes, as
proposed the count is for panic_on_oops=0, but that's just weird: you
first effectively request no panic with panic_on_oops=0, then override
that with oops_limit=10000. I think it is more natural to request
panic_on_oops=10000 in one step. Also, I think it is more natural to
preserve panic_on_oops=0's meaning of no panic on Oops.
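
To make the comparison concrete, here is a minimal userspace C model of
the two decision rules. It is a sketch, not kernel code:
should_panic_two_knobs(), should_panic_one_knob() and check_designs()
are hypothetical names, oops_count is assumed to include the oops being
handled, and oops_limit=0 is assumed to mean "no limit".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code. oops_count is assumed to count
 * all oopses so far, including the one being handled now.
 */

/* As proposed: panic_on_oops stays Boolean, and a separate oops_limit
 * (0 assumed to mean "no limit") applies only when panic_on_oops is 0. */
static bool should_panic_two_knobs(bool panic_on_oops, unsigned int oops_limit,
                                   unsigned int oops_count)
{
    if (panic_on_oops)
        return true;
    return oops_limit != 0 && oops_count >= oops_limit;
}

/* Suggested alternative: panic_on_oops itself is the limit.
 * 0 = never panic, 1 = panic on the first oops (today's behavior),
 * N = panic on the Nth oops. */
static bool should_panic_one_knob(unsigned int panic_on_oops,
                                  unsigned int oops_count)
{
    return panic_on_oops != 0 && oops_count >= panic_on_oops;
}

/* Illustrative checks of the cases discussed in this thread. */
static void check_designs(void)
{
    /* Both keep today's panic_on_oops=1: panic on the first oops. */
    assert(should_panic_two_knobs(true, 10000, 1));
    assert(should_panic_one_knob(1, 1));

    /* Two knobs: "panic after 10000" needs panic_on_oops=0 plus oops_limit. */
    assert(!should_panic_two_knobs(false, 10000, 9999));
    assert(should_panic_two_knobs(false, 10000, 10000));

    /* One knob: the same request is the single setting panic_on_oops=10000. */
    assert(!should_panic_one_knob(10000, 9999));
    assert(should_panic_one_knob(10000, 10000));

    /* With oops_limit=10000 in place, panic_on_oops=0 no longer means
     * "never panic"; with one knob, 0 still does. */
    assert(should_panic_two_knobs(false, 10000, 20000));
    assert(!should_panic_one_knob(0, 20000));
}
```

Both rules agree for panic_on_oops=1; they differ in how many settings
"panic after 10000" takes, and in what panic_on_oops=0 means once a
non-zero oops_limit default is in place.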

To me, about the only reason to introduce the override is if we want to
literally override a distro's explicit default of panic_on_oops=0.

Alexander