lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f481b4ab6dfebbc0637c843e5f1cd4ddfd4bd60b.camel@linux.intel.com>
Date:   Tue, 15 Oct 2019 06:31:46 -0700
From:   Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     tony.luck@...el.com, bp@...en8.de, tglx@...utronix.de,
        mingo@...hat.com, hpa@...or.com, bberg@...hat.com, x86@...nel.org,
        linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
        hdegoede@...hat.com, ckellner@...hat.com
Subject: Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal
 throttle messages

On Tue, 2019-10-15 at 10:48 +0200, Peter Zijlstra wrote:
> On Mon, Oct 14, 2019 at 02:21:00PM -0700, Srinivas Pandruvada wrote:
> > Some modern systems have very tight thermal tolerances. Because of
> > this
> > they may cross thermal thresholds when running normal workloads
> > (even
> > during boot). The CPU hardware will react by limiting
> > power/frequency
> > and using duty cycles to bring the temperature back into normal
> > range.
> > 
> > Thus users may see a "critical" message about the "temperature
> > above
> > threshold" which is soon followed by "temperature/speed normal".
> > These
> > messages are rate limited, but still may repeat every few minutes.
> > 
> > The solution here is to set a timeout when the temperature first
> > exceeds
> > the threshold.
> 
> Why can we even reach critical thresholds when the fans are working?
> I
> always thought it was BAD to ever reach the critical temps and have
> the
> hardware throttle.
CPU temperature doesn't have to hit max(TjMax) to get these warnings.
OEMs has an ability to program a threshold where a thermal interrupt
can be generated. In some systems the offset is 20C+ (Read only value).

In recent systems, there is another offset on top of it which can be
programmed by OS, once some agent can adjust power limits dynamically.
By default this is set to low by the firmware, which I guess the prime
motivation of Benjamin to submit the patch.

Thanks,
Srinivas



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ