Message-ID: <4C78E8EF.1000009@cesarb.net>
Date:	Sat, 28 Aug 2010 07:46:07 -0300
From:	Cesar Eduardo Barros <cesarb@...arb.net>
To:	Joe Perches <joe@...ches.com>
CC:	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Matthew Garrett <mjg@...hat.com>,
	platform-driver-x86@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded"
 messages

On 27-08-2010 23:21, Joe Perches wrote:
> On Fri, 2010-08-27 at 20:12 -0300, Cesar Eduardo Barros wrote:
>> On 27-08-2010 04:39, Joe Perches wrote:
>>> On Thu, 2010-08-26 at 22:38 -0300, Cesar Eduardo Barros wrote:
>>>> - The first "MCP power limit exceeded" seems very bogus.
>>>> - What do you mean, core_power_limit is zero?
>>> I added a logging message whenever the turbo limits change
>>> and logging messages for power/temp on MCH for completeness.
>>> Maybe this will show something useful like when/how
>>> CPU power limit gets set to 0.
>
>> Running with it right now, did not help much:
>>
>> $ dmesg | fgrep 'intel ips'
>> intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value
>> (found 25, expected 35)
>> intel ips 0000:00:1f.6: PCI INT C ->  GSI 18 (level, low) ->  IRQ 18
>> intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 65535
>> intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:8058 +
>> mch:23392829
>> intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5675
>> intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6369
>
> I believe all these limits should always have non-zero values.
> So I still think you've hardware problems, but I suppose it
> could be the driver not reading the right registers or some
> such.  It seems odd that the driver never printed a logging
> message for either of the polling or irq methods to read the
> device cpu and thermal limits.

Come on, no blaming the BIOS? ;-)

If I read the code with your previous patch correctly, show_turbo_limits 
will never be called when poll_turbo_status is false and no interrupt 
happens. And we know no interrupt happened (at least not with nonzero 
register values), because the interrupt handler calls dev_info() twice 
right at the beginning and neither message showed up in the log. So the 
limits could still be the ones initially set at ips_probe().
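
To make that reading concrete, here is a toy userspace model of the control 
flow as I understand it. It is not the driver code: struct ips_model and 
the model_* functions are made up for illustration, and only the names 
poll_turbo_status and core_power_limit are borrowed from the discussion.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the driver state. */
struct ips_model {
	bool poll_turbo_status;		/* true: limits are re-read by polling */
	unsigned int core_power_limit;
	int limits_shown;		/* how often the limits were logged */
};

/* Stand-in for the logging helper; it only counts its calls. */
static void model_show_turbo_limits(struct ips_model *m)
{
	m->limits_shown++;
}

/* Probe time: the limit gets its initial value, nothing is logged. */
static void model_probe(struct ips_model *m, bool poll, unsigned int initial)
{
	m->poll_turbo_status = poll;
	m->core_power_limit = initial;
	m->limits_shown = 0;
}

/* Periodic monitor pass: refreshes the limit only in polling mode. */
static void model_monitor_tick(struct ips_model *m, unsigned int reg)
{
	if (!m->poll_turbo_status)
		return;
	m->core_power_limit = reg;
	model_show_turbo_limits(m);
}

/* Interrupt path: the only other place where the limit is refreshed. */
static void model_irq(struct ips_model *m, unsigned int reg)
{
	m->core_power_limit = reg;
	model_show_turbo_limits(m);
}
```

With poll_turbo_status false and no call to model_irq(), any number of 
monitor passes leaves core_power_limit at its probe-time value and 
limits_shown at zero, which would match the log above.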

I will try to enable dev_dbg() later and see what it prints.

>
> Jesse or any Intel folk, can you verify or suggest anything
> better?
>
> If cpu_power_limit, or any _limit, is not set perhaps changing
> the test style to verify limit and adding a printed_once alert
> for each 0 value limit.  At least that'd shut up the continuous
> logging but at least give a notification message.
>
> if (limit) {
> 	if (measured_val>  limit)
> 		dev_info(foo)
> } else
> 	dev_alert_once()

Wouldn't it make more sense to do the alert when the limit is set, 
instead of when it is used? Also, it should still treat it as limit 
exceeded (better safe than sorry). Something like:

if (measured_val > limit) {
     if (limit)
         dev_info(...);
     ret = true;
}
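
Spelled out a bit more, as a userspace sketch rather than a patch: 
struct limits, set_core_power_limit() and cpu_power_exceeded() are 
hypothetical names, and fprintf() stands in for dev_alert()/dev_info(). 
The zero-limit alert fires once, when the limit is set; the check itself 
stays quiet for a zero limit but still reports it as exceeded.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver state; only one limit is modeled. */
struct limits {
	unsigned int core_power_limit;
	bool warned_zero;	/* the zero-limit alert is printed only once */
};

/* Alert when the limit is set, not every time it is checked. */
static void set_core_power_limit(struct limits *l, unsigned int val)
{
	l->core_power_limit = val;
	if (!val && !l->warned_zero) {
		fprintf(stderr, "alert: CPU power limit is 0\n");
		l->warned_zero = true;
	}
}

/* A zero limit still counts as exceeded, but is not logged each time. */
static bool cpu_power_exceeded(const struct limits *l, unsigned int measured)
{
	bool ret = false;

	if (measured > l->core_power_limit) {
		if (l->core_power_limit)
			fprintf(stderr, "CPU power limit %u exceeded: %u\n",
				l->core_power_limit, measured);
		ret = true;
	}
	return ret;
}
```

This keeps the continuous logging out of dmesg while leaving the 
"exceeded" decision on the safe side.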

-- 
Cesar Eduardo Barros
cesarb@...arb.net
cesar.barros@...il.com