Message-ID: <5134F44C.7040700@free.fr>
Date: Mon, 04 Mar 2013 20:21:48 +0100
From: Martin Peres <martin.peres@...e.fr>
To: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
CC: airlied@...ux.ie, bskeggs@...hat.com, marcin.slusarz@...il.com,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: nouveau shuts the machine down with v3.9-rc1 (temperature (72
C) hit the 'shutdown' threshold).
Hi Konrad,
On 04/03/2013 19:40, Konrad Rzeszutek Wilk wrote:
> After git merge ab7826595e9ec51a51f622c5fc91e2f59440481a
> (Merge tag 'mfd-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6)
> the nouveau driver ends up shutting of the machine when booting.
>
>
> I hadn't done a git bisection yet and was wondering if there are some
> juice commits I ought to look at?
Sure, no need to bisect, it is a new (apparently-broken-for-you) feature.
The code is in /drivers/gpu/drm/nouveau/core/subdev/therm/
>
> Here is the serial console:
> [ 6.940628] nouveau [ PTHERM][0000:00:0d.0] Thermal management: disabled
> [ 6.957474] nouveau [ PTHERM][0000:00:0d.0] programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]
> [ 6.966594] nouveau 6.975100] nouveau [ PTHERM][0000:00:0d.0] Thermal management: automatic
> [ 6.982059] nouveau [ PTHERM][0000:00:0d.0] temperature (88 C) hit the 'downclock' threshold
> [ 6.990680] nouveau [ PTHERM][0000:00:0d.0] temperature (88 C) hit the 'critical' threshold
> [ 6.999194] nouveau [ PTHERM][0000:00:0d.0] temperature (90 C) hit the 'shutdown' threshold
See, this is strange. If I believe the "programmed thresholds" line, the
fanboost threshold is at 90°C, downclock is at 95°C, critical
temperature is at 145°C and shutdown is at 135°C.
So, from the BIOS side, things seem to be in fairly good shape (critical
should be lower than shutdown, but that's OK).
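To make the mismatch concrete, here is a small standalone sketch (not nouveau code) that compares a reading against the thresholds from the "programmed thresholds" line. The struct and function names are hypothetical, the hysteresis values in parentheses are ignored, and it assumes a threshold counts as reached when the reading is greater than or equal to it.

```c
#include <stddef.h>

/* Hypothetical representation of one programmed threshold. */
struct threshold {
	const char *name;
	int temp_c;
};

/* Values taken from the "programmed thresholds" log line. */
static const struct threshold programmed[] = {
	{ "fanboost",   90 },
	{ "downclock",  95 },
	{ "critical",  145 },
	{ "shutdown",  135 },
};

/* Count how many thresholds a reading (in degrees C) reaches.
 * Assumption: "reached" means reading >= threshold. */
static size_t thresholds_reached(int reading_c)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(programmed) / sizeof(programmed[0]); i++)
		if (reading_c >= programmed[i].temp_c)
			n++;
	return n;
}
```

Under these assumptions a steady 88 C reaches no threshold and 90 C reaches only fanboost, so the downclock, critical and shutdown messages in the log cannot come from stable readings at the reported temperatures.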
My theory is that your temperature sensor's readings are very variable,
which would set off the shutdown alarm. So, either the sensor needs more
settling time or its output is genuinely very variable.
In the first case, we could fix that by increasing the settling time (at
the expense of a longer boot period). We could also force a 10s wait at
boot time before reading the temperature.
In the latter case, the only solution is to average the temperature over
several samples. I would need statistics on the variability in order to
design a proper low-pass filter that wouldn't be too slow or too
RAM/wakeup-intensive.
I really hope the problem is the settling time!
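For illustration only, averaging over several samples could look like the sketch below. This is not the nouveau implementation: it is a generic exponential moving average in integer arithmetic, all names are hypothetical, and the filter weight (shift) is a placeholder that would have to be chosen from real variability statistics.

```c
#include <stdint.h>

/* Hypothetical low-pass filter state: one accumulator, no sample buffer. */
struct temp_filter {
	int32_t acc;    /* filtered temperature scaled by 2^shift */
	unsigned shift; /* each new sample is weighted 1 / 2^shift */
	int primed;     /* 0 until the first sample has been seen */
};

static void temp_filter_init(struct temp_filter *f, unsigned shift)
{
	f->acc = 0;
	f->shift = shift;
	f->primed = 0;
}

/* Feed one raw reading (degrees C) and return the filtered value. */
static int temp_filter_update(struct temp_filter *f, int sample)
{
	int32_t scale = (int32_t)1 << f->shift;

	if (!f->primed) {
		/* Start from the first sample instead of from zero. */
		f->acc = (int32_t)sample * scale;
		f->primed = 1;
	} else {
		/* acc = acc - acc / 2^shift + sample */
		f->acc += sample - f->acc / scale;
	}
	return (int)(f->acc / scale);
}
```

With shift = 2, a single spurious 136 C reading after 88 C and 90 C yields a filtered value of 100 C, which stays below the 135 C shutdown threshold. The trade-off is the one mentioned above: a larger shift rejects spikes better but reacts more slowly to real overheating.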
Here is what you can do to test the theory:
Change the mdelay at line 41 of
/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c
(http://cgit.freedesktop.org/nouveau/linux-2.6/tree/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c#n41)
from 10 to 1000.
Please also add an mdelay of 1000 between lines 44 and 45.
If it works with this patch, then try decreasing the delay to 20ms.
In any case, I'll send some thermal patches tonight to make the code more
robust against long settling times.
Thanks for reporting!
Martin (mupuf)