Message-ID: <fe6c90ea-7b19-36d9-2568-f484c54eafff@linaro.org>
Date: Tue, 4 Oct 2022 13:49:36 +0200
From: Daniel Lezcano <daniel.lezcano@...aro.org>
To: Guenter Roeck <linux@...ck-us.net>, linux-pm@...r.kernel.org
Cc: "Rafael J . Wysocki" <rafael@...nel.org>,
Amit Kucheria <amitk@...nel.org>,
Zhang Rui <rui.zhang@...el.com>,
Lukasz Luba <lukasz.luba@....com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC/RFT PATCH resend] thermal: Protect thermal device operations against thermal device removal
On 04/10/2022 05:39, Guenter Roeck wrote:
> A call to thermal_zone_device_unregister() results in thermal device
> removal. While the thermal device itself is reference counted and
> protected against removal of its associated data structures, the
> thermal device operations are owned by the calling code and unprotected.
> This may result in crashes such as
>
> BUG: unable to handle page fault for address: ffffffffc04ef420
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 5d60e067 P4D 5d60e067 PUD 5d610067 PMD 110197067 PTE 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 1 PID: 3209 Comm: cat Tainted: G W 5.10.136-19389-g615abc6eb807 #1 02df41ac0b12f3a64f4b34245188d8875bb3bce1
> Hardware name: Google Coral/Coral, BIOS Google_Coral.10068.92.0 11/27/2018
> RIP: 0010:thermal_zone_get_temp+0x26/0x73
> Code: 89 c3 eb d3 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 53 48 85 ff 74 50 48 89 fb 48 81 ff 00 f0 ff ff 77 44 48 8b 83 98 03 00 00 <48> 83 78 10 00 74 36 49 89 f6 4c 8d bb d8 03 00 00 4c 89 ff e8 9f
> RSP: 0018:ffffb3758138fd38 EFLAGS: 00010287
> RAX: ffffffffc04ef410 RBX: ffff98f14d7fb000 RCX: 0000000000000000
> RDX: ffff98f17cf90000 RSI: ffffb3758138fd64 RDI: ffff98f14d7fb000
> RBP: ffffb3758138fd50 R08: 0000000000001000 R09: ffff98f17cf90000
> R10: 0000000000000000 R11: ffffffff8dacad28 R12: 0000000000001000
> R13: ffff98f1793a7d80 R14: ffff98f143231708 R15: ffff98f14d7fb018
> FS: 00007ec166097800(0000) GS:ffff98f1bbd00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffc04ef420 CR3: 000000010ee9a000 CR4: 00000000003506e0
> Call Trace:
> temp_show+0x31/0x68
> dev_attr_show+0x1d/0x4f
> sysfs_kf_seq_show+0x92/0x107
> seq_read_iter+0xf5/0x3f2
> vfs_read+0x205/0x379
> __x64_sys_read+0x7c/0xe2
> do_syscall_64+0x43/0x55
> entry_SYSCALL_64_after_hwframe+0x61/0xc6
>
> if a thermal device is removed while accesses to its device attributes
> are ongoing.
>
> Use the thermal device mutex to protect device operations. Clear the
> device operations pointer in thermal_zone_device_unregister() under
> protection of this mutex, and only access it while the mutex is held.
> Flatten and simplify device mutex operations to only acquire the mutex
> once and hold it instead of acquiring and releasing it several times
> during thermal operations. Only validate parameters once at module entry
> points after acquiring the mutex. Execute governor operations under mutex
> instead of expecting governors to acquire and release it.
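
For readers following the thread, the locking scheme described in the quoted text can be sketched roughly as below. This is a userspace illustration under stated assumptions, not code from the patch: pthread_mutex_t stands in for the thermal device mutex, and struct zone, struct zone_ops, zone_init(), zone_get_temp(), zone_unregister() and fake_get_temp() are hypothetical names made up for the example. It only shows the idea of clearing the ops pointer under the mutex on removal and dereferencing it only while the mutex is held.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; names are illustrative. */
struct zone_ops {
	int (*get_temp)(void *drv_data, int *temp);
};

struct zone {
	pthread_mutex_t lock;        /* plays the role of the thermal device mutex */
	const struct zone_ops *ops;  /* owned by the driver; NULL once unregistered */
	void *drv_data;
};

void zone_init(struct zone *z, const struct zone_ops *ops, void *drv_data)
{
	pthread_mutex_init(&z->lock, NULL);
	z->ops = ops;
	z->drv_data = drv_data;
}

/* Reader path (think of a sysfs show handler): ops is only read under the lock. */
int zone_get_temp(struct zone *z, int *temp)
{
	int ret = -ENODEV;

	pthread_mutex_lock(&z->lock);
	if (z->ops && z->ops->get_temp)
		ret = z->ops->get_temp(z->drv_data, temp);
	pthread_mutex_unlock(&z->lock);

	return ret;
}

/* Removal path: clear the ops pointer under the same lock. */
void zone_unregister(struct zone *z)
{
	pthread_mutex_lock(&z->lock);
	z->ops = NULL;
	pthread_mutex_unlock(&z->lock);
}

/* Example driver callback: reports the value stored in drv_data. */
int fake_get_temp(void *drv_data, int *temp)
{
	*temp = *(int *)drv_data;
	return 0;
}

const struct zone_ops fake_ops = { .get_temp = fake_get_temp };
```

In this sketch a reader that arrives after zone_unregister() gets -ENODEV instead of following a pointer into driver memory that may already be gone, and a reader that is already inside the callback holds the lock, so zone_unregister() waits for it to finish before clearing the pointer.
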
Does the following series:
https://lore.kernel.org/lkml/20220805153834.2510142-1-daniel.lezcano@linaro.org/
go in the same direction as your proposal?
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs