lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <685ef627-e377-bbf1-da11-7f7556ca2dd7@collabora.com>
Date:   Thu, 2 Jul 2020 19:19:32 +0200
From:   Andrzej Pietrasiewicz <andrzej.p@...labora.com>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>,
        linux-pm@...r.kernel.org, linux-acpi@...r.kernel.org,
        netdev@...r.kernel.org, linux-wireless@...r.kernel.org,
        platform-driver-x86@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org,
        linux-renesas-soc@...r.kernel.org,
        linux-rockchip@...ts.infradead.org
Cc:     "Rafael J . Wysocki" <rjw@...ysocki.net>,
        Len Brown <lenb@...nel.org>,
        Vishal Kulkarni <vishal@...lsio.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jiri Pirko <jiri@...lanox.com>,
        Ido Schimmel <idosch@...lanox.com>,
        Johannes Berg <johannes.berg@...el.com>,
        Emmanuel Grumbach <emmanuel.grumbach@...el.com>,
        Luca Coelho <luciano.coelho@...el.com>,
        Intel Linux Wireless <linuxwifi@...el.com>,
        Kalle Valo <kvalo@...eaurora.org>,
        Peter Kaestle <peter@...e.net>,
        Darren Hart <dvhart@...radead.org>,
        Andy Shevchenko <andy@...radead.org>,
        Sebastian Reichel <sre@...nel.org>,
        Miquel Raynal <miquel.raynal@...tlin.com>,
        Amit Kucheria <amit.kucheria@...durent.com>,
        Support Opensource <support.opensource@...semi.com>,
        Shawn Guo <shawnguo@...nel.org>,
        Sascha Hauer <s.hauer@...gutronix.de>,
        Pengutronix Kernel Team <kernel@...gutronix.de>,
        Fabio Estevam <festevam@...il.com>,
        NXP Linux Team <linux-imx@....com>,
        Niklas Söderlund <niklas.soderlund@...natech.se>,
        Heiko Stuebner <heiko@...ech.de>,
        Orson Zhai <orsonzhai@...il.com>,
        Baolin Wang <baolin.wang7@...il.com>,
        Chunyan Zhang <zhang.lyra@...il.com>,
        Zhang Rui <rui.zhang@...el.com>,
        Allison Randal <allison@...utok.net>,
        Enrico Weigelt <info@...ux.net>,
        Gayatri Kammela <gayatri.kammela@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
        kernel@...labora.com
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

Hi,

W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze:
> On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
>> Hi Daniel,
>>
>> <snip>
>>
>>>>>>>
>>>>>>> I did reproduce:
>>>>>>>
>>>>>>> v5.8-rc3 + series => imx6 hang at boot time
>>>>>>> v5.8-rc3 => imx6 boots correctly
>>>
>>> So finally I succeeded to reproduce it on my imx7 locally. The sensor
>>> was failing to initialize for another reason related to the legacy
>>> cooling device, this is why it is not appearing on the imx7.
>>>
>>> I can now git-bisect :)
>>>
>>
>> That would be very kind of you, thank you!
> 
> With the lock correctness option enabled:
> 
> [    4.179223] imx_thermal tempmon: Extended Commercial CPU temperature
> grade - max:105C critical:100C passive:95C
> [    4.189557]
> [    4.191060] ============================================
> [    4.196378] WARNING: possible recursive locking detected
> [    4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted
> [    4.207102] --------------------------------------------
> [    4.212421] kworker/0:3/54 is trying to acquire lock:
> [    4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> thermal_zone_device_is_enabled+0x18/0x34
> [    4.225777]
> [    4.225777] but task is already holding lock:
> [    4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> thermal_zone_get_temp+0x38/0x6c
> [    4.239121]
> [    4.239121] other info that might help us debug this:
> [    4.245655]  Possible unsafe locking scenario:
> [    4.245655]
> [    4.251579]        CPU0
> [    4.254031]        ----
> [    4.256481]   lock(&tz->lock);
> [    4.259544]   lock(&tz->lock);
> [    4.262608]
> [    4.262608]  *** DEADLOCK ***
> [    4.262608]
> [    4.268533]  May be due to missing lock nesting notation
> [    4.268533]
> [    4.275329] 4 locks held by kworker/0:3/54:
> [    4.279517]  #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, at:
> process_one_work+0x224/0x808
> [    4.288241]  #1: ca075f10 (deferred_probe_work){+.+.}-{0:0}, at:
> process_one_work+0x224/0x808
> [    4.296787]  #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at:
> __device_attach+0x30/0x140
> [    4.304468]  #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> thermal_zone_get_temp+0x38/0x6c
> [    4.312408]
> [    4.312408] stack backtrace:
> [    4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted
> 5.8.0-rc3-00011-gf5e50bf4d3ef #42
> [    4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree)
> [    4.330809] Workqueue: events deferred_probe_work_func
> [    4.335973] [<c0312384>] (unwind_backtrace) from [<c030c580>]
> (show_stack+0x10/0x14)
> [    4.343734] [<c030c580>] (show_stack) from [<c079d7d8>]
> (dump_stack+0xe8/0x114)
> [    4.351062] [<c079d7d8>] (dump_stack) from [<c03abf78>]
> (__lock_acquire+0xbfc/0x2cb4)
> [    4.358909] [<c03abf78>] (__lock_acquire) from [<c03ae9c4>]
> (lock_acquire+0xf4/0x4e4)
> [    4.366758] [<c03ae9c4>] (lock_acquire) from [<c10630fc>]
> (__mutex_lock+0xb0/0xaa8)
> [    4.374431] [<c10630fc>] (__mutex_lock) from [<c1063b10>]
> (mutex_lock_nested+0x1c/0x24)
> [    4.382452] [<c1063b10>] (mutex_lock_nested) from [<c0d932c0>]
> (thermal_zone_device_is_enabled+0x18/0x34)
> [    4.392036] [<c0d932c0>] (thermal_zone_device_is_enabled) from
> [<c0d9da90>] (imx_get_temp+0x30/0x208)
> [    4.401271] [<c0d9da90>] (imx_get_temp) from [<c0d97484>]
> (thermal_zone_get_temp+0x4c/0x6c)
> [    4.409640] [<c0d97484>] (thermal_zone_get_temp) from [<c0d93df0>]
> (thermal_zone_device_update+0x8c/0x258)
> [    4.419310] [<c0d93df0>] (thermal_zone_device_update) from
> [<c0d9401c>] (thermal_zone_device_set_mode+0x60/0x88)
> [    4.429500] [<c0d9401c>] (thermal_zone_device_set_mode) from
> [<c0d9e1d4>] (imx_thermal_probe+0x3e4/0x578)
> [    4.439082] [<c0d9e1d4>] (imx_thermal_probe) from [<c0a78388>]
> (platform_drv_probe+0x48/0x98)
> [    4.447622] [<c0a78388>] (platform_drv_probe) from [<c0a7603c>]
> (really_probe+0x218/0x348)
> [    4.455903] [<c0a7603c>] (really_probe) from [<c0a76278>]
> (driver_probe_device+0x5c/0xb4)
> [    4.464098] [<c0a76278>] (driver_probe_device) from [<c0a743bc>]
> (bus_for_each_drv+0x58/0xb8)
> [    4.472638] [<c0a743bc>] (bus_for_each_drv) from [<c0a75db0>]
> (__device_attach+0xd4/0x140)
> [    4.480919] [<c0a75db0>] (__device_attach) from [<c0a750b0>]
> (bus_probe_device+0x88/0x90)
> [    4.489112] [<c0a750b0>] (bus_probe_device) from [<c0a75564>]
> (deferred_probe_work_func+0x68/0x98)
> [    4.498088] [<c0a75564>] (deferred_probe_work_func) from [<c0369988>]
> (process_one_work+0x2e0/0x808)
> [    4.507237] [<c0369988>] (process_one_work) from [<c036a150>]
> (worker_thread+0x2a0/0x59c)
> [    4.515432] [<c036a150>] (worker_thread) from [<c0372208>]
> (kthread+0x16c/0x178)
> [    4.522843] [<c0372208>] (kthread) from [<c0300174>]
> (ret_from_fork+0x14/0x20)
> [    4.530074] Exception stack(0xca075fb0 to 0xca075ff8)
> [    4.535138] 5fa0:                                     00000000
> 00000000 00000000 00000000
> [    4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000
> [    4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> 
> 
> 

Thanks!

That confirms your suspicions.

So the reason is that ->get_temp() is called while the mutex is held and
thermal_zone_device_is_enabled() wants to take the same mutex.

Is adding a comment to thermal_zone_device_is_enabled() to never call
it while the mutex is held and adding another version of it which does
not take the mutex ok?

Andrzej

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ