Message-ID: <5349D4D6.9060506@netscape.net>
Date: Sun, 13 Apr 2014 02:05:42 +0200
From: Manuel Krause <manuelkrause@...scape.net>
To: rui.zhang@...el.com
CC: "Rafael J. Wysocki" <rjw@...ysocki.net>,
Guenter Roeck <linux@...ck-us.net>,
linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
Jean Delvare <jdelvare@...e.de>, lm-sensors@...sensors.org
Subject: Re: 3.13.?: Strange / dangerous fan policy...
On 2014-04-11 00:51, Manuel Krause wrote:
> On 2014-04-07 13:45, Rafael J. Wysocki wrote:
>> On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
>>> On 2014-04-06 04:43, Guenter Roeck wrote:
>>>> On 04/05/2014 07:37 PM, Manuel Krause wrote:
>>>>> On 2014-04-01 01:47, Guenter Roeck wrote:
>>>>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>>>>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel
>>>>>>>>>>>>>>> Krause
>>>>>>>>>>>>>>> wrote:
>>>>>>>> [SNIP]
>>>>>>>>
>>>>>>>> Long time no reply from you... Have I overlooked an unwritten
>>>>>>>> convention? Or were my charts that unusable for your
>>>>>>>> analysis/work?
>>>>>>>>
>>>>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the
>>>>>>>> problem
>>>>>>>> persists. "Strange / dangerous fan policy..."
>>>>>>>>
>>>>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>>>>>> overheating problem by manually issuing a:
>>>>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>>>>>> _before_ obviously critical temperatures occur. Note: This
>>>>>>>> particular setting may only work for my system! ...and keeps
>>>>>>>> working for 3.14-rc.
>>>>>>>>
>>>>>>>> In the following I'd like to present you a modified output
>>>>>>>> of my
>>>>>>>> /sys/class/thermal, that I've written a script for (for my
>>>>>>>> system), that shows the results in the way of
>>>>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>>>>>> {I've uploaded the files to pastebin, to not swamp you and
>>>>>>>> the
>>>>>>>> lists with so many lines of logs.}
>>>>>>>>
>>>>>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>>>>> http://pastebin.com/HL1PNcda
>>>>>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>>>>> http://pastebin.com/98hgf1a9
>>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>>>>> http://pastebin.com/MuTwTnjD
>>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>>>>> *) command:
>>>>>>>> http://pastebin.com/2peda54z
>>>>>>>>
>>>>>>>> Please, have a look at them! And maybe, give me hints on
>>>>>>>> how I
>>>>>>>> can help you to further debug this issue, as my manual
>>>>>>>> method
>>>>>>>> works but it's annoying.
>>>>>>>>
>>>>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>>>>>> Email-thread to someone in charge.
>>>>>>>>
>>>>>>>> Thank you for your work && best regards,
>>>>>>>> Manuel Krause
>>>>>>>>
>>>>>>>
>>>>>>> This is still BUG 71711
>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>>>
>>>>>>> 3.12.15 works very well
>>>>>>> 3.13.7 fails
>>>>>>> 3.14.0-rc8 fails
>>>>>>>
>>>>>>
>>>>>> Best you can do would really be to bisect the problem.
>>>>>> Unfortunately only you (or someone else with an affected
>>>>>> system)
>>>>>> can do that. Once the culprit is known it would be much easier
>>>>>> to get it fixed.
>>>>>>
>>>>>> To answer your earlier question: I don't think you did
>>>>>> anything
>>>>>> wrong.
>>>>>> I guess everyone else is just as clueless as I am (if not,
>>>>>> speak up
>>>>>> and help ;-).
>>>>>>
>>>>>> Guenter
>>>>>>
>>>>>
>>>>> I've now bisected two times. From two different kernel origins,
>>>>> just to be sure, as I'm new to this stupid-and-lengthy method,
>>>>> and, to be sure, I haven't given a false positive in between due
>>>>> to boredom.
>>>>>
>>>>
>>>> Not really. Keep in mind that you were able to track down the
>>>> bad
>>>> commit
>>>> among more than 10,000 commits in a reasonably short period
>>>> of time.
>>>>
>>>>> In the end it says each time:
>>>>> # git bisect bad | tee -a /var/log/bisect.log
>>>>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad
>>>>> commit
>>>>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
>>>>> Author: Zhang Rui <rui.zhang@...el.com>
>>>>> Date: Wed Sep 25 20:39:45 2013 +0800
>>>>>
>>>>> ACPI / AC: convert ACPI ac driver to platform bus
>>>>>
>>>>> Signed-off-by: Zhang Rui <rui.zhang@...el.com>
>>>>> Signed-off-by: Rafael J. Wysocki
>>>>> <rafael.j.wysocki@...el.com>
>>>>>
>>>> Off to the two of you...
>>>>
>>>> Guenter
>>>>
>>>>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
>>>>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers
>>>>>
>>>>>
>>>>> Please help me, on how I can help debug this more, and please
>>>>> also read the newest from
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>
>>>>> Manuel Krause
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> Sorry that I forgot to add the following last night: After
>>> the first bisection round, I was so glad about a result that
>>> time, that I reverted this mentioned patch from the 3.13.8
>>> kernel, but this didn't fix it.
>>
>> This means that the commit in question didn't introduce the
>> problem
>> you're seeing.
>>
>> Please check out commit 7f2dc5c4bcbf (Merge tag
>> 'dm-3.13-changes' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
>>
>> build a kernel from that and see if you can reproduce the
>> problem with it.
>> If so, it can be used as your new "first known bad" kernel for
>> bisection.
>> Otherwise, you can use it as the "first good" one and commit
>> cc8ef52707341
>> as "first known bad".
>>
>> Thanks!
>>
>
> Sorry, for any inconvenience, but you should forget about what
> I've written, that reverting the patch in question from 3.13.x
> didn't fix it. Of course it didn't fix it, as the patch doesn't
> cleanly revert from release-kernels at all. My mistake!
>
> I've been guided by Guenter Roeck through two more bisecting
> sessions/ways on this, that always pointed to the commit in
> question.
>
> Some citation:
> Me:
>>>> O.k. I've now followed your latest directions:
>>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca
>>>> => result after rebuild was BAD =>
>>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca
>>>> => result after rebuild was GOOD
>>>>
> [ ...]
>>>> Reverting that commit in question from this very git tree
>>>> makes the
>>>> kernel work as expected.
> [ ... ]
> Guenter:
>>> Report the results you have above. That should show without
>>> question
>>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit,
>>> and it should be easy to reproduce.
>
> That seems to be all I can do for you for now. Please let me know
> of any preliminary patches to test!
> And I want to add special thanks to Guenter Roeck for his
> always-just-in-time assistance over so many days,
>
> Manuel Krause
>
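For anyone who wants to repeat the check quoted above, the sequence
was roughly the following. This is only a sketch: verify_commit and
run are hypothetical helper names, DRY_RUN is my own convention for
printing the commands instead of running them, and --no-edit is
added here merely to skip the editor (the original commands did not
use it). The kernel rebuild and reboot between the two steps are not
shown.

```shell
#!/bin/sh
# Sketch only: "checkout, rebuild, revert, rebuild" as quoted above.
# With DRY_RUN=1 (the default here) the commands are only printed,
# so nothing touches a real tree.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical wrapper that prints or executes a command.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# verify_commit: hypothetical helper name, not part of git.
verify_commit() {
    commit="$1"
    # Step 1: branch at the suspect commit, then rebuild and boot.
    # Result reported in this thread: BAD.
    run git checkout -b testing "$commit"
    # Step 2: revert only that commit, then rebuild and boot again.
    # Result reported in this thread: GOOD.
    run git revert --no-edit "$commit"
}

verify_commit cc8ef52707341e67a12067d6ead991d56ea017ca
```

Run it with DRY_RUN=0 inside a kernel git tree to actually execute
the two git commands.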
BTW: applying the patch in question to a 3.12.17 kernel, which
worked perfectly WITHOUT it, makes it FAIL as described for 3.13.x
kernels. (And yes, the patch applied cleanly, compiled fine, and
the kernel boots nicely.)
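
A minimal sketch of that test, for anyone who wants to repeat it on
another machine. The directory names (MAINLINE, STABLE), the patch
file name (PATCH) and the helper names (step, backport_check) are
assumptions for illustration, not what I necessarily used; with
DRY_RUN=1 (the default) the steps are only printed.

```shell
#!/bin/sh
# Sketch only: apply the suspect commit to a known-good 3.12.17 tree.
# MAINLINE, STABLE and PATCH are assumed paths; adjust to your setup.
# Paths containing single quotes are not handled by this sketch.
DRY_RUN="${DRY_RUN:-1}"
COMMIT=cc8ef52707341e67a12067d6ead991d56ea017ca
MAINLINE="${MAINLINE:-$HOME/linux}"
STABLE="${STABLE:-$HOME/linux-3.12.17}"
PATCH="${PATCH:-/tmp/acpi-ac-platform.patch}"

# step: hypothetical wrapper that prints or executes one shell line.
step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $1"
    else
        sh -c "$1"
    fi
}

backport_check() {
    # Export the single commit as a patch file from the mainline clone.
    step "git -C '$MAINLINE' format-patch -1 --stdout $COMMIT > '$PATCH'"
    # Confirm it applies cleanly before touching the known-good tree.
    step "cd '$STABLE' && patch -p1 --dry-run < '$PATCH'"
    # Apply for real; afterwards rebuild, boot, and watch the fan.
    step "cd '$STABLE' && patch -p1 < '$PATCH'"
}

backport_check
```

If the known-good kernel starts failing after this, the commit is
implicated independently of the bisection results.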
Manuel Krause