lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <c3258cb2-9a56-d048-5738-1132331a157d@linaro.org> Date: Mon, 3 Oct 2022 23:18:07 +0200 From: Daniel Lezcano <daniel.lezcano@...aro.org> To: Marek Szyprowski <m.szyprowski@...sung.com>, rafael@...nel.org Cc: linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org, rui.zhang@...el.com, Broadcom Kernel Team <bcm-kernel-feedback-list@...adcom.com>, Support Opensource <support.opensource@...semi.com>, Pengutronix Kernel Team <kernel@...gutronix.de>, NXP Linux Team <linux-imx@....com>, Andy Gross <agross@...nel.org>, Bartlomiej Zolnierkiewicz <bzolnier@...il.com>, Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>, Alim Akhtar <alim.akhtar@...sung.com>, netdev@...r.kernel.org, platform-driver-x86@...r.kernel.org, linux-rpi-kernel@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org, linux-arm-msm@...r.kernel.org, linux-renesas-soc@...r.kernel.org, linux-samsung-soc@...r.kernel.org, linux-tegra@...r.kernel.org, linux-omap@...r.kernel.org Subject: Re: [PATCH v8 00/29] Rework the trip points creation On 03/10/2022 16:10, Marek Szyprowski wrote: > Hi Daniel, > > On 03.10.2022 11:25, Daniel Lezcano wrote: >> This work is the pre-requisite of handling correctly when the trip >> point are crossed. For that we need to rework how the trip points are >> declared and assigned to a thermal zone. >> >> Even if it appears to be a common sense to have the trip points being >> ordered, this no guarantee neither documentation telling that is the >> case. >> >> One solution could have been to create an ordered array of trips built >> when registering the thermal zone by calling the different get_trip* >> ops. However those ops receive a thermal zone pointer which is not >> known as it is in the process of creating it. >> >> This cyclic dependency shows we have to rework how we manage the trip >> points. >> >> Actually, all the trip points definition can be common to the backend >> sensor drivers and we can factor out the thermal trip structure in all >> of them. >> >> Then, as we register the thermal trips array, they will be available >> in the thermal zone structure and a core function can return the trip >> given its id. >> >> The get_trip_* ops won't be needed anymore and could be removed. The >> resulting code will be another step forward to a self encapsulated >> generic thermal framework. >> >> Most of the drivers can be converted more or less easily. This series >> does a first round with most of the drivers. Some remain and will be >> converted but with a smaller set of changes as the conversion is a bit >> more complex. >> >> Changelog: >> v8: >> - Pretty oneline change and parenthesis removal (Rafael) >> - Collected tags >> v7: >> - Added missing return 0 in the x86_pkg_temp driver >> v6: >> - Improved the code for the get_crit_temp() function as suggested by >> Rafael >> - Removed inner parenthesis in the set_trip_temp() function and invert the >> conditions. Check the type of the trip point is unchanged >> - Folded patch 4 with 1 >> - Add per thermal zone info message in the bang-bang governor >> - Folded the fix for an uninitialized variable in >> int340x_thermal_zone_add() >> v5: >> - Fixed a deadlock when calling thermal_zone_get_trip() while >> handling the thermal zone lock >> - Remove an extra line in the sysfs change >> - Collected tags >> v4: >> - Remove extra lines on exynos changes as reported by Krzysztof Kozlowski >> - Collected tags >> v3: >> - Reorg the series to be git-bisect safe >> - Added the set_trip generic function >> - Added the get_crit_temp generic function >> - Removed more dead code in the thermal-of >> - Fixed the exynos changelog >> - Fixed the error check for the exynos drivers >> - Collected tags >> v2: >> - Added missing EXPORT_SYMBOL_GPL() for thermal_zone_get_trip() >> - Removed tab whitespace in the acerhdf driver >> - Collected tags >> >> Cc: Raju Rangoju <rajur@...lsio.com> >> Cc: "David S. Miller" <davem@...emloft.net> >> Cc: Eric Dumazet <edumazet@...gle.com> >> Cc: Jakub Kicinski <kuba@...nel.org> >> Cc: Paolo Abeni <pabeni@...hat.com> >> Cc: Peter Kaestle <peter@...e.net> >> Cc: Hans de Goede <hdegoede@...hat.com> >> Cc: Mark Gross <markgross@...nel.org> >> Cc: Miquel Raynal <miquel.raynal@...tlin.com> >> Cc: "Rafael J. Wysocki" <rafael@...nel.org> >> Cc: Daniel Lezcano <daniel.lezcano@...aro.org> >> Cc: Amit Kucheria <amitk@...nel.org> >> Cc: Zhang Rui <rui.zhang@...el.com> >> Cc: Nicolas Saenz Julienne <nsaenz@...nel.org> >> Cc: Broadcom Kernel Team <bcm-kernel-feedback-list@...adcom.com> >> Cc: Florian Fainelli <f.fainelli@...il.com> >> Cc: Ray Jui <rjui@...adcom.com> >> Cc: Scott Branden <sbranden@...adcom.com> >> Cc: Support Opensource <support.opensource@...semi.com> >> Cc: Lukasz Luba <lukasz.luba@....com> >> Cc: Shawn Guo <shawnguo@...nel.org> >> Cc: Sascha Hauer <s.hauer@...gutronix.de> >> Cc: Pengutronix Kernel Team <kernel@...gutronix.de> >> Cc: Fabio Estevam <festevam@...il.com> >> Cc: NXP Linux Team <linux-imx@....com> >> Cc: Thara Gopinath <thara.gopinath@...aro.org> >> Cc: Andy Gross <agross@...nel.org> >> Cc: Bjorn Andersson <bjorn.andersson@...aro.org> >> Cc: "Niklas Söderlund" <niklas.soderlund@...natech.se> >> Cc: Bartlomiej Zolnierkiewicz <bzolnier@...il.com> >> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org> >> Cc: Alim Akhtar <alim.akhtar@...sung.com> >> Cc: Thierry Reding <thierry.reding@...il.com> >> Cc: Jonathan Hunter <jonathanh@...dia.com> >> Cc: Eduardo Valentin <edubezval@...il.com> >> Cc: Keerthy <j-keerthy@...com> >> Cc: Kunihiko Hayashi <hayashi.kunihiko@...ionext.com> >> Cc: Masami Hiramatsu <mhiramat@...nel.org> >> Cc: Antoine Tenart <atenart@...nel.org> >> Cc: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com> >> Cc: Dmitry Osipenko <digetx@...il.com> >> Cc: netdev@...r.kernel.org >> Cc: linux-kernel@...r.kernel.org >> Cc: platform-driver-x86@...r.kernel.org >> Cc: linux-pm@...r.kernel.org >> Cc: linux-rpi-kernel@...ts.infradead.org >> Cc: linux-arm-kernel@...ts.infradead.org >> Cc: linux-arm-msm@...r.kernel.org >> Cc: linux-renesas-soc@...r.kernel.org >> Cc: linux-samsung-soc@...r.kernel.org >> Cc: linux-tegra@...r.kernel.org >> Cc: linux-omap@...r.kernel.org >> >> Daniel Lezcano (29): >> thermal/core: Add a generic thermal_zone_get_trip() function >> thermal/sysfs: Always expose hysteresis attributes >> thermal/core: Add a generic thermal_zone_set_trip() function >> thermal/core/governors: Use thermal_zone_get_trip() instead of ops >> functions >> thermal/of: Use generic thermal_zone_get_trip() function >> thermal/of: Remove unused functions >> thermal/drivers/exynos: Use generic thermal_zone_get_trip() function >> thermal/drivers/exynos: of_thermal_get_ntrips() >> thermal/drivers/exynos: Replace of_thermal_is_trip_valid() by >> thermal_zone_get_trip() >> thermal/drivers/tegra: Use generic thermal_zone_get_trip() function >> thermal/drivers/uniphier: Use generic thermal_zone_get_trip() function >> thermal/drivers/hisi: Use generic thermal_zone_get_trip() function >> thermal/drivers/qcom: Use generic thermal_zone_get_trip() function >> thermal/drivers/armada: Use generic thermal_zone_get_trip() function >> thermal/drivers/rcar_gen3: Use the generic function to get the number >> of trips >> thermal/of: Remove of_thermal_get_ntrips() >> thermal/of: Remove of_thermal_is_trip_valid() >> thermal/of: Remove of_thermal_set_trip_hyst() >> thermal/of: Remove of_thermal_get_crit_temp() >> thermal/drivers/st: Use generic trip points >> thermal/drivers/imx: Use generic thermal_zone_get_trip() function >> thermal/drivers/rcar: Use generic thermal_zone_get_trip() function >> thermal/drivers/broadcom: Use generic thermal_zone_get_trip() function >> thermal/drivers/da9062: Use generic thermal_zone_get_trip() function >> thermal/drivers/ti: Remove unused macros ti_thermal_get_trip_value() / >> ti_thermal_trip_is_valid() >> thermal/drivers/acerhdf: Use generic thermal_zone_get_trip() function >> thermal/drivers/cxgb4: Use generic thermal_zone_get_trip() function >> thermal/intel/int340x: Replace parameter to simplify >> thermal/drivers/intel: Use generic thermal_zone_get_trip() function > > I've tested this v8 patchset after fixing the issue with Exynos TMU with > https://lore.kernel.org/all/20221003132943.1383065-1-daniel.lezcano@linaro.org/ > patch and I got the following lockdep warning on all Exynos-based boards: > > > ====================================================== > WARNING: possible circular locking dependency detected > 6.0.0-rc1-00083-ge5c9d117223e #12945 Not tainted > ------------------------------------------------------ > swapper/0/1 is trying to acquire lock: > c1ce66b0 (&data->lock#2){+.+.}-{3:3}, at: exynos_get_temp+0x3c/0xc8 > > but task is already holding lock: > c2979b94 (&tz->lock){+.+.}-{3:3}, at: > thermal_zone_device_update.part.0+0x3c/0x528 > > which lock already depends on the new lock. I'm wondering if the problem is not already there and related to data->lock ... Doesn't the thermal zone lock already prevent racy access to the data structure? Another question: if the sensor clock is disabled after reading it, how does the hardware update the temperature and detect the programed threshold is crossed? > the existing dependency chain (in reverse order) is: > > -> #1 (&tz->lock){+.+.}-{3:3}: > mutex_lock_nested+0x1c/0x24 > thermal_zone_get_trip+0x20/0x44 > exynos_tmu_initialize+0x144/0x1e0 > exynos_tmu_probe+0x2b0/0x728 > platform_probe+0x5c/0xb8 > really_probe+0xe0/0x414 > __driver_probe_device+0xa0/0x208 > driver_probe_device+0x30/0xc0 > __driver_attach+0xf0/0x1f0 > bus_for_each_dev+0x70/0xb0 > bus_add_driver+0x174/0x218 > driver_register+0x88/0x11c > do_one_initcall+0x64/0x380 > kernel_init_freeable+0x1c0/0x224 > kernel_init+0x18/0x12c > ret_from_fork+0x14/0x2c > 0x0 > > -> #0 (&data->lock#2){+.+.}-{3:3}: > lock_acquire+0x124/0x3e4 > __mutex_lock+0x90/0x948 > mutex_lock_nested+0x1c/0x24 > exynos_get_temp+0x3c/0xc8 > __thermal_zone_get_temp+0x5c/0x12c > thermal_zone_device_update.part.0+0x78/0x528 > __thermal_cooling_device_register.part.0+0x298/0x354 > __cpufreq_cooling_register.constprop.0+0x138/0x218 > of_cpufreq_cooling_register+0x48/0x8c > cpufreq_online+0x8d0/0xb2c > cpufreq_add_dev+0xb0/0xec > subsys_interface_register+0x108/0x118 > cpufreq_register_driver+0x15c/0x380 > dt_cpufreq_probe+0x2e4/0x434 > platform_probe+0x5c/0xb8 > really_probe+0xe0/0x414 > __driver_probe_device+0xa0/0x208 > driver_probe_device+0x30/0xc0 > __driver_attach+0xf0/0x1f0 > bus_for_each_dev+0x70/0xb0 > bus_add_driver+0x174/0x218 > driver_register+0x88/0x11c > do_one_initcall+0x64/0x380 > kernel_init_freeable+0x1c0/0x224 > kernel_init+0x18/0x12c > ret_from_fork+0x14/0x2c > 0x0 > > other info that might help us debug this: > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(&tz->lock); > lock(&data->lock#2); > lock(&tz->lock); > lock(&data->lock#2); > > *** DEADLOCK *** > > 5 locks held by swapper/0/1: > #0: c1c8648c (&dev->mutex){....}-{3:3}, at: __driver_attach+0xe4/0x1f0 > #1: c1210434 (cpu_hotplug_lock){++++}-{0:0}, at: > cpufreq_register_driver+0xc4/0x380 > #2: c1ed8298 (subsys mutex#8){+.+.}-{3:3}, at: > subsys_interface_register+0x4c/0x118 > #3: c131f944 (thermal_list_lock){+.+.}-{3:3}, at: > __thermal_cooling_device_register.part.0+0x238/0x354 > #4: c2979b94 (&tz->lock){+.+.}-{3:3}, at: > thermal_zone_device_update.part.0+0x3c/0x528 > > stack backtrace: > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc1-00083-ge5c9d117223e > #12945 > Hardware name: Samsung Exynos (Flattened Device Tree) > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x58/0x70 > dump_stack_lvl from check_noncircular+0xf0/0x158 > check_noncircular from __lock_acquire+0x15e8/0x2a7c > __lock_acquire from lock_acquire+0x124/0x3e4 > lock_acquire from __mutex_lock+0x90/0x948 > __mutex_lock from mutex_lock_nested+0x1c/0x24 > mutex_lock_nested from exynos_get_temp+0x3c/0xc8 > exynos_get_temp from __thermal_zone_get_temp+0x5c/0x12c > __thermal_zone_get_temp from thermal_zone_device_update.part.0+0x78/0x528 > thermal_zone_device_update.part.0 from > __thermal_cooling_device_register.part.0+0x298/0x354 > __thermal_cooling_device_register.part.0 from > __cpufreq_cooling_register.constprop.0+0x138/0x218 > __cpufreq_cooling_register.constprop.0 from > of_cpufreq_cooling_register+0x48/0x8c > of_cpufreq_cooling_register from cpufreq_online+0x8d0/0xb2c > cpufreq_online from cpufreq_add_dev+0xb0/0xec > cpufreq_add_dev from subsys_interface_register+0x108/0x118 > subsys_interface_register from cpufreq_register_driver+0x15c/0x380 > cpufreq_register_driver from dt_cpufreq_probe+0x2e4/0x434 > dt_cpufreq_probe from platform_probe+0x5c/0xb8 > platform_probe from really_probe+0xe0/0x414 > really_probe from __driver_probe_device+0xa0/0x208 > __driver_probe_device from driver_probe_device+0x30/0xc0 > driver_probe_device from __driver_attach+0xf0/0x1f0 > __driver_attach from bus_for_each_dev+0x70/0xb0 > bus_for_each_dev from bus_add_driver+0x174/0x218 > bus_add_driver from driver_register+0x88/0x11c > driver_register from do_one_initcall+0x64/0x380 > do_one_initcall from kernel_init_freeable+0x1c0/0x224 > kernel_init_freeable from kernel_init+0x18/0x12c > kernel_init from ret_from_fork+0x14/0x2c > Exception stack(0xf082dfb0 to 0xf082dff8) > ... > > Let me know if You need anything more to test. > > >> drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 2 - >> .../ethernet/chelsio/cxgb4/cxgb4_thermal.c | 41 +---- >> drivers/platform/x86/acerhdf.c | 73 +++----- >> drivers/thermal/armada_thermal.c | 39 ++--- >> drivers/thermal/broadcom/bcm2835_thermal.c | 8 +- >> drivers/thermal/da9062-thermal.c | 52 +----- >> drivers/thermal/gov_bang_bang.c | 39 +++-- >> drivers/thermal/gov_fair_share.c | 18 +- >> drivers/thermal/gov_power_allocator.c | 51 +++--- >> drivers/thermal/gov_step_wise.c | 22 ++- >> drivers/thermal/hisi_thermal.c | 11 +- >> drivers/thermal/imx_thermal.c | 72 +++----- >> .../int340x_thermal/int340x_thermal_zone.c | 33 ++-- >> .../int340x_thermal/int340x_thermal_zone.h | 4 +- >> .../processor_thermal_device.c | 10 +- >> drivers/thermal/intel/x86_pkg_temp_thermal.c | 120 +++++++------ >> drivers/thermal/qcom/qcom-spmi-temp-alarm.c | 39 ++--- >> drivers/thermal/rcar_gen3_thermal.c | 2 +- >> drivers/thermal/rcar_thermal.c | 53 +----- >> drivers/thermal/samsung/exynos_tmu.c | 57 +++---- >> drivers/thermal/st/st_thermal.c | 47 +---- >> drivers/thermal/tegra/soctherm.c | 33 ++-- >> drivers/thermal/tegra/tegra30-tsensor.c | 17 +- >> drivers/thermal/thermal_core.c | 160 +++++++++++++++--- >> drivers/thermal/thermal_core.h | 24 +-- >> drivers/thermal/thermal_helpers.c | 28 +-- >> drivers/thermal/thermal_netlink.c | 21 +-- >> drivers/thermal/thermal_of.c | 116 ------------- >> drivers/thermal/thermal_sysfs.c | 133 +++++---------- >> drivers/thermal/ti-soc-thermal/ti-thermal.h | 15 -- >> drivers/thermal/uniphier_thermal.c | 27 ++- >> include/linux/thermal.h | 10 ++ >> 32 files changed, 559 insertions(+), 818 deletions(-) >> > Best regards > -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog
Powered by blists - more mailing lists