lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <20191017214445.GG14441@zn.tnic>
Date:   Thu, 17 Oct 2019 23:44:45 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "bberg@...hat.com" <bberg@...hat.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "hdegoede@...hat.com" <hdegoede@...hat.com>,
        "ckellner@...hat.com" <ckellner@...hat.com>
Subject: Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal
 throttle messages

On Thu, Oct 17, 2019 at 09:31:30PM +0000, Luck, Tony wrote:
> That sounds like the right short term action.
> 
> Depending on what we end up with from Srinivas ... we may want
> to reconsider the severity. The basic premise of Srinivas' patch
> is to avoid printing anything for short excursions above the temperature
> threshold. But the effect of that is that when we find the core/package
> staying above temperature for an extended period of time, we are
> in a serious situation where some action may be needed. E.g.
> move the laptop off the soft surface that is blocking the air vents.
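
The premise described above (stay quiet for short excursions, report only sustained ones) can be illustrated with a simple time-based filter. This is a minimal sketch, not the code from Srinivas' patch: the struct, the function name throttle_should_log() and the 3000 ms hold time are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hold time: excursions shorter than this are never logged. */
#define HOLD_TIME_MS 3000u

/* Hypothetical per-CPU (or per-package) excursion tracking state. */
struct throttle_state {
	bool above;        /* currently above the threshold? */
	uint64_t start_ms; /* when the current excursion began */
	bool reported;     /* has this excursion already been logged? */
};

/*
 * Feed one sample: a timestamp plus whether the sensor reports
 * "above threshold". Returns true exactly once per excursion, at the
 * first sample where it has lasted at least HOLD_TIME_MS.
 */
static bool throttle_should_log(struct throttle_state *s, uint64_t now_ms,
				bool above)
{
	assert(s != NULL);

	if (!above) {
		/* Back below the threshold: a short excursion ends silently. */
		s->above = false;
		s->reported = false;
		return false;
	}

	if (!s->above) {
		/* Start of a new excursion. */
		s->above = true;
		s->start_ms = now_ms;
		s->reported = false;
	}

	if (!s->reported && now_ms - s->start_ms >= HOLD_TIME_MS) {
		s->reported = true;
		return true;
	}
	return false;
}
```

A zero-initialized struct throttle_state is a valid starting state. Anything this filter does report is, by construction, a sustained excursion, which is why the severity question comes up at all.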

I don't think having a critical severity message is nearly enough.
There are cases where users simply won't see that message: no shell
open, nothing scanning dmesg, nothing on the desktop that pops up
KERN_CRIT messages, etc.

If we really wanna handle this case, then we must be much more reliable:

* we throttle the machine from within the kernel - whatever that may mean
* if that doesn't help, we stop scheduling !root tasks
* if that doesn't help, we halt
* ...

These are purely hypothetical things to do, but I'm pointing them out as
an example that, in a high-temperature situation, we should be actively
doing something rather than waiting for the user to do it.
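The escalation ladder listed above can be sketched as a policy function that picks the next step based on how long the overheat has persisted. This is purely illustrative and assumes a hypothetical fixed grace period per step; the enum, struct, thermal_escalate() and STEP_GRACE_MS are invented names, not existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical escalation steps, in the order listed above. */
enum thermal_action {
	THERMAL_NONE,
	THERMAL_THROTTLE,     /* throttle the machine from within the kernel */
	THERMAL_STOP_NONROOT, /* stop scheduling !root tasks */
	THERMAL_HALT,         /* last resort */
};

/* Hypothetical: each step gets this long to help before escalating. */
#define STEP_GRACE_MS 10000u

struct thermal_state {
	bool overheated;   /* currently above the critical threshold? */
	uint64_t since_ms; /* when the current overheat began */
};

static enum thermal_action thermal_escalate(const struct thermal_state *s,
					    uint64_t now_ms)
{
	assert(s != NULL);
	assert(!s->overheated || now_ms >= s->since_ms);

	if (!s->overheated)
		return THERMAL_NONE;

	/* One step further up the ladder per elapsed grace period. */
	switch ((now_ms - s->since_ms) / STEP_GRACE_MS) {
	case 0:
		return THERMAL_THROTTLE;
	case 1:
		return THERMAL_STOP_NONROOT;
	default:
		return THERMAL_HALT;
	}
}
```

Keying the decision on elapsed time keeps the policy stateless apart from the overheat start time; a real implementation would more likely check whether each step actually lowered the temperature before moving on.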

Come to think of it, one can apply the same type of logic here: split
the temp severity into action-required and action-optional events and
then, depending on the type, do things.

Now, what those things are should be determined by the severity of the
events, which means we'd need to know how severe those events are.
And since that is left in the hands of the OEMs, good luck to us. ;-\
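The action-required/action-optional split can be sketched as a classifier over a thermal event. This is a hypothetical illustration: the margin below is a placeholder, since the real severity boundaries are exactly the OEM-defined information noted above, and the struct, enum and temp_classify() are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical severity classes for thermal events. */
enum temp_severity {
	TEMP_SEV_NONE,
	TEMP_SEV_ACTION_OPTIONAL, /* report it; acting on it is optional */
	TEMP_SEV_ACTION_REQUIRED, /* the kernel itself must act */
};

/* Placeholder margin in degrees C; real values would come from the OEM. */
#define REQUIRED_MARGIN_C 10

/* Hypothetical event as seen by the classifier. */
struct temp_event {
	int temp_c;      /* current temperature */
	int threshold_c; /* platform throttle threshold */
	bool sustained;  /* above threshold for an extended period? */
};

static enum temp_severity temp_classify(const struct temp_event *ev)
{
	assert(ev != NULL);

	int over = ev->temp_c - ev->threshold_c;

	if (over <= 0)
		return TEMP_SEV_NONE;

	/* Sustained or far-over-threshold events demand action. */
	if (ev->sustained || over >= REQUIRED_MARGIN_C)
		return TEMP_SEV_ACTION_REQUIRED;

	return TEMP_SEV_ACTION_OPTIONAL;
}
```

Treating "sustained" as action-required mirrors the point in the quoted text that a long stay above the threshold is the serious case; which concrete action each class triggers is left out on purpose.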

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
