[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0891fecd-88e9-cd50-b2ac-8b50c7149b19@intel.com>
Date: Thu, 30 Mar 2023 21:08:16 +0200
From: "Wysocki, Rafael J" <rafael.j.wysocki@...el.com>
To: "Zhang, Rui" <rui.zhang@...el.com>,
"Torvalds, Linus" <torvalds@...ux-foundation.org>,
"rostedt@...dmis.org" <rostedt@...dmis.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [BUG v6.3-rc4+] WARNING: CPU: 0 PID: 1 at
drivers/thermal/thermal_sysfs.c:879 cooling_device_stats_setup+0xac/0xc0
On 3/30/2023 11:57 AM, Zhang, Rui wrote:
> Hi, Linus,
>
> On Wed, 2023-03-29 at 15:52 -0700, Linus Torvalds wrote:
>> On Wed, Mar 29, 2023 at 1:58 PM Steven Rostedt <rostedt@...dmis.org>
>> wrote:
>>> In preparation to adding my patch that checks for some kinds of
>>> bugs in
>>> trace events, I decided to run it on the Linus's latest branch, to
>>> see if
>>> there's any other trace events that may cause issues. But instead I
>>> hit
>>> this unrelated bug. Looks to be triggering an added
>>> lockdep_assert() on
>>> boot up.
>> So I think that lockdep assert is likely bogus.
It is, sorry about that.
I have a fix queued up,
https://patchwork.kernel.org/project/linux-pm/patch/2681615.mvXUDI8C0e@kreacher
>> It was added in commit 790930f44289 ("thermal: core: Introduce
>> thermal_cooling_device_update()") but the reason I say it's bogus is
>> that I don't think it has ever been tested:
>>
>>> static void cooling_device_stats_setup(struct
>>> thermal_cooling_device *cdev)
>>> {
>>> lockdep_assert_held(&cdev->lock); <<<---- line 879
>> Yeah, so cooling_device_stats_setup() is called from two places:
>>
>> - thermal_cooling_device_setup_sysfs()
>>
>> - thermal_cooling_device_stats_reinit()
>>
>> and that first place is when that cdev is created, before it's
>> registered anywhere. It's not locked in that case, and yes, the
>> lockdep_assert_held() will trigger.
>>
>> As far as I can tell it will always trigger, and this
>> lockdep_assert()
>> has thus never been tested with lockdep enabled.
>>
>> The "stats_reinit" case seems to also be called from only one place
>> (thermal_cooling_device_update()), and that path does indeed hold the
>> cdev->lock.
>>
>> That lockdep could be made happy by having
>> thermal_cooling_device_setup_sysfs() create that device with the cdev
>> lock held. I guess that's easy enough, although somewhat annoyingly
>> there is no "mutex_init_locked()", you have to actually do
>> "mutex_init()" followed by a "mutex_lock()". And obviously unlock it
>> after doing the setup_sysfs().
>>
>> But I question whether the lockdep test should be done at all. I find
>> it distasteful that it was added with absolutely zero testing.
>>
>>
> I just realized why I cannot reproduce this problem on my testbox.
>
> In order to test the original patch with ACPI passive cooling enabled,
> I rebuild the kernel with customized DSDT.
> This taints the kernel, and clears the debug_locks, thus I didn't get
> any lockdep warnings...
So I was happy when I got a Tested-by from Rui on this and I didn't
double check. Turns out this was a mistake.
Cheers,
Rafael
Powered by blists - more mailing lists