Message-ID: <20250430084852.GN4198@noisy.programming.kicks-ass.net>
Date: Wed, 30 Apr 2025 10:48:52 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Carlos Bilbao <bilbao@...edu>
Cc: Andrew Morton <akpm@...ux-foundation.org>, carlos.bilbao@...nel.org,
tglx@...utronix.de, seanjc@...gle.com, jan.glauber@...il.com,
pmladek@...e.com, jani.nikula@...el.com,
linux-kernel@...r.kernel.org, gregkh@...uxfoundation.org,
takakura@...inux.co.jp, john.ogness@...utronix.de, x86@...nel.org
Subject: Re: [PATCH v3 0/2] Reduce CPU consumption after panic
On Tue, Apr 29, 2025 at 03:52:05PM -0500, Carlos Bilbao wrote:
> Hello,
>
> On 4/29/25 17:10, Peter Zijlstra wrote:
> > On Tue, Apr 29, 2025 at 03:32:56PM -0500, Carlos Bilbao wrote:
> >
> >> Yes, the machine is effectively dead, but as things stand today,
> >> it's still drawing resources unnecessarily.
> >>
> >> Who cares? An example, as mentioned in the cover letter, is Linux running
> >
> > Ah, see, I didn't have no cover letter, only akpm's reply.
> >
> >> in VMs. Imagine a scenario where customers are billed based on CPU usage --
> >> having panicked VMs spinning in useless loops wastes their money. In shared
> >> envs, those wasted cycles could be used by other processes/VMs. But this
> >> is as much about the cloud as it is for laptops/embedded/anywhere -- Linux
> >> should avoid wasting resources wherever possible.
> >
> > So I don't really buy the laptop and embedded case, people tend to look
> > at laptops when open, and get very impatient when they don't respond.
> > Embedded things really should have a watchdog.
> >
> > Also, should you not be using panic_timeout to auto reboot your machine
> > in all these cases?
> >
> > In any case, the VM nonsense, do they not have a virtual watchdog to
> > 'reap' crashed VMs or something?
>
> The key word here is "should." Should embedded systems have a watchdog?
> Maybe. Should I've auto reboot set? Maybe. Perhaps I don’t want to reboot
> until I’ve root-caused the crash.
Install a kdump kernel, or log your serial line :-)
> But my patch set isn’t about “shoulds.”
> What I’m discussing here is (1) the default Linux behavior,
Well, the default behaviour works for the 'your own physical machine'
thing just fine -- and that has always been the default use-case.
Nobody is going to be sitting there staring at a panic screen for ages.
All the other weirdo cases like embedded and VMs, they're just that,
weirdos and they can keep their pieces :-)
> and (2)
> providing people with the flexibility to do what THEY think they should do,
> not what you think they should do.
Well, there are a ton of options already. As I said, we have watchdogs,
reboots, crash kernels and all sorts. Why do we need more?

All that said... the default more or less does for(;;) { mdelay(100) };
on a modern chip that should not end up using much power at all. It
should end up in delay_halt_tpause() or delay_halt_mwaitx() (depending
on whether you are on Intel or AMD) and spend most of its time in deep
idle states.
Is something not working?
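
For illustration only, here is a userspace sketch of the point about
for(;;) { mdelay(100) }: the loop is the same either way, and what it
costs depends on how the delay primitive waits. Everything in it is
hypothetical (delay_fn_t, delay_spin(), delay_sleep() and
panic_loop_model() are made-up names, and the loop is bounded so it
terminates). nanosleep() is only a loose stand-in for a hardware wait
such as the one behind delay_halt_tpause() or delay_halt_mwaitx(); this
is not the kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* A delay backend waits for roughly 'ms' milliseconds. */
typedef void (*delay_fn_t)(unsigned int ms);

/* Busy-wait backend: polls the clock until the deadline, burning CPU. */
static void delay_spin(unsigned int ms)
{
	struct timespec start, now;
	double elapsed_ms;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
		elapsed_ms = (now.tv_sec - start.tv_sec) * 1e3 +
			     (now.tv_nsec - start.tv_nsec) / 1e6;
	} while (elapsed_ms < ms);
}

/*
 * Sleeping backend: gives the CPU up while waiting. Assumption: this is
 * only an analogy for a halt-style hardware wait, not an equivalent.
 */
static void delay_sleep(unsigned int ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (long)(ms % 1000) * 1000000L,
	};

	/* Restart with the remaining time if a signal interrupts the sleep. */
	while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
		;
}

/*
 * Bounded model of for(;;) { mdelay(100) }: run 'iterations' delay steps
 * and return the process CPU time consumed, in milliseconds.
 */
static double panic_loop_model(delay_fn_t delay, unsigned int iterations,
			       unsigned int step_ms)
{
	clock_t before, after;
	unsigned int i;

	assert(delay != NULL);
	before = clock();
	for (i = 0; i < iterations; i++)
		delay(step_ms);
	after = clock();
	return (double)(after - before) * 1e3 / CLOCKS_PER_SEC;
}
```

Calling panic_loop_model(delay_spin, 3, 50) and then
panic_loop_model(delay_sleep, 3, 50) should report noticeably more CPU
time for the spinning backend, even though both wait about as long in
wall-clock terms. Whether a real panicked machine behaves like the
second case depends on which delay backend the CPU ends up with.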