Message-ID: <20191018071707.GA2328@hirez.programming.kicks-ass.net>
Date: Fri, 18 Oct 2019 09:17:07 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Borislav Petkov <bp@...en8.de>
Cc: "Luck, Tony" <tony.luck@...el.com>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"hpa@...or.com" <hpa@...or.com>,
"bberg@...hat.com" <bberg@...hat.com>,
"x86@...nel.org" <x86@...nel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"hdegoede@...hat.com" <hdegoede@...hat.com>,
"ckellner@...hat.com" <ckellner@...hat.com>
Subject: Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal
throttle messages
On Thu, Oct 17, 2019 at 11:44:45PM +0200, Borislav Petkov wrote:
> On Thu, Oct 17, 2019 at 09:31:30PM +0000, Luck, Tony wrote:
> > That sounds like the right short term action.
> >
> > Depending on what we end up with from Srinivas ... we may want
> > to reconsider the severity. The basic premise of Srinivas' patch
> > is to avoid printing anything for short excursions above temperature
> > threshold. But the effect of that is that when we find the core/package
> > staying above temperature for an extended period of time, we are
> > in a serious situation where some action may be needed. E.g.
> > move the laptop off the soft surface that is blocking the air vents.
>
> I don't think having a critical severity message is nearly enough.
> There are cases where the users simply won't see that message, no shell
> opened, nothing scanning dmesg, nothing pops up on the desktop to show
> KERN_CRIT messages, etc.
>
> If we really wanna handle this case then we must be much more reliable:
>
> * we throttle the machine from within the kernel - whatever that may mean
> * if that doesn't help, we stop scheduling !root tasks
> * if that doesn't help, we halt
> * ...

We have forced idle injection; that should be able to reduce the system
to barely functional but non-cooker status.
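To make the idea concrete, here is a minimal sketch of a policy that uses forced idle injection as the escalation step instead of halting. Short excursions above the threshold get no injection. After that, the forced-idle share of each period ramps up and saturates below 100%, which leaves the machine barely functional. Everything here is a hypothetical illustration: the function names, the thresholds (`start_s`, `full_s`), the 90% cap, and the linear ramp are all assumptions. None of it models the kernel's actual idle-injection code (e.g. the intel_powerclamp driver).

```python
from typing import Tuple


def injected_idle_pct(secs_hot: float, start_s: float = 5.0,
                      full_s: float = 60.0, max_pct: float = 90.0) -> float:
    """Hypothetical forced-idle percentage after secs_hot seconds above threshold.

    Excursions of start_s seconds or less get no injection; afterwards the idle
    share ramps linearly and saturates at max_pct, so the system keeps running
    (slowly) rather than being halted.
    """
    if full_s <= start_s:
        raise ValueError("full_s must be greater than start_s")
    if secs_hot <= start_s:
        return 0.0
    if secs_hot >= full_s:
        return max_pct
    return max_pct * (secs_hot - start_s) / (full_s - start_s)


def split_period(period_ms: int, idle_pct: float) -> Tuple[int, int]:
    """Split one injection period into (run_ms, idle_ms)."""
    idle_ms = round(period_ms * idle_pct / 100)
    return period_ms - idle_ms, idle_ms
```

With the defaults, a package that has stayed hot for 32.5 s gets 45% forced idle. In a 100 ms period, that is 45 ms idle and 55 ms run. Capping below 100% is the "barely functional" part: the user can still react, e.g. by moving the laptop.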