Message-ID: <CAJZ5v0jv0BYE1pgCEJDsadfzH0ZnZYfwJuScPMQcpFYSJPYL6w@mail.gmail.com>
Date:   Thu, 30 Mar 2023 12:07:12 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
Cc:     rafael.j.wysocki@...el.com, rui.zhang@...el.com,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        rafael@...nel.org, daniel.lezcano@...aro.org,
        linux-pm@...r.kernel.org
Subject: Re: [bug/6.3-rc4/bisected] WARNING at cooling_device_stats_setup+0xac
 caused by commit 790930f44289c8209c57461b2db499fcc702e0b3

On Thu, Mar 30, 2023 at 9:52 AM Mikhail Gavrilov
<mikhail.v.gavrilov@...il.com> wrote:
>
> Hi,
> The 6.3-rc4 release brings new warning messages to the log:

Thanks for the report, please see this patch:

https://patchwork.kernel.org/project/linux-pm/patch/2681615.mvXUDI8C0e@kreacher/

> [    4.590775] ------------[ cut here ]------------
> [    4.590783] WARNING: CPU: 2 PID: 1 at
> drivers/thermal/thermal_sysfs.c:879
> cooling_device_stats_setup+0xac/0xc0
> [    4.590799] Modules linked in:
> [    4.590806] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
> 6.3.0-rc3-08-790930f44289c8209c57461b2db499fcc702e0b3+ #87
> [    4.590819] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
> G513QY_G513QY/G513QY, BIOS G513QY.320 09/07/2022
> [    4.590832] RIP: 0010:cooling_device_stats_setup+0xac/0xc0
> [    4.590841] Code: ff 48 89 1d 9e 27 9f 01 5b 5d 41 5c c3 cc cc cc
> cc 48 8d bf 60 05 00 00 be ff ff ff ff e8 5c 16 3b 00 85 c0 0f 85 72
> ff ff ff <0f> 0b e9 6b ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90
> 90 90
> [    4.590863] RSP: 0018:ffffa5a080107c60 EFLAGS: 00010246
> [    4.590871] RAX: 0000000000000000 RBX: ffff96fc51f6d800 RCX: 0000000000000001
> [    4.590880] RDX: 0000000000000000 RSI: ffffffffb9a7f591 RDI: ffffffffb9b325ce
> [    4.590889] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001
> [    4.590898] R10: 0000000000000001 R11: 0000000000000001 R12: ffff96fc51f6d800
> [    4.590907] R13: ffff96fc51f6d818 R14: ffff96fc4b450000 R15: 0000000000000000
> [    4.590916] FS:  0000000000000000(0000) GS:ffff970b16a00000(0000)
> knlGS:0000000000000000
> [    4.590927] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    4.590934] CR2: 0000000000000000 CR3: 000000034643c000 CR4: 0000000000750ee0
> [    4.590944] PKRU: 55555554
> [    4.590948] Call Trace:
> [    4.590953]  <TASK>
> [    4.590958]  thermal_cooling_device_setup_sysfs+0xe/0x20
> [    4.590967]  __thermal_cooling_device_register.part.0+0x13c/0x3d0
> [    4.590977]  acpi_processor_thermal_init+0x22/0x100
> [    4.590987]  __acpi_processor_start+0x7f/0xf0
> [    4.590995]  acpi_processor_start+0x2c/0x50
> [    4.591002]  really_probe+0x19e/0x3e0
> [    4.591010]  ? __pfx___driver_attach+0x10/0x10
> [    4.591017]  __driver_probe_device+0x78/0x160
> [    4.591025]  driver_probe_device+0x1f/0x90
> [    4.591032]  __driver_attach+0xd2/0x1c0
> [    4.591039]  bus_for_each_dev+0x8b/0xe0
> [    4.591047]  bus_add_driver+0x115/0x210
> [    4.591055]  driver_register+0x55/0x100
> [    4.591062]  ? __pfx_acpi_processor_driver_init+0x10/0x10
> [    4.591072]  acpi_processor_driver_init+0x3b/0xc0
> [    4.591080]  ? __pfx_acpi_processor_driver_init+0x10/0x10
> [    4.591088]  do_one_initcall+0x70/0x290
> [    4.591101]  kernel_init_freeable+0x3c5/0x580
> [    4.591112]  ? __pfx_kernel_init+0x10/0x10
> [    4.591122]  kernel_init+0x16/0x1c0
> [    4.591128]  ret_from_fork+0x2c/0x50
> [    4.591139]  </TASK>
>
> This message appears after each boot.
>
> Bisection blames this commit:
>
> commit 790930f44289c8209c57461b2db499fcc702e0b3
> Author: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> Date:   Fri Mar 17 18:01:26 2023 +0100
>
>     thermal: core: Introduce thermal_cooling_device_update()
>
>     Introduce a core thermal API function, thermal_cooling_device_update(),
>     for updating the max_state value for a cooling device and rearranging
>     its statistics in sysfs after a possible change of its ->get_max_state()
>     callback return value.
>
>     That callback is now invoked only once, during cooling device
>     registration, to populate the max_state field in the cooling device
>     object, so if its return value changes, it needs to be invoked again
>     and the new return value needs to be stored as max_state.  Moreover,
>     the statistics presented in sysfs need to be rearranged in general,
>     because there may not be enough room in them to store data for all
>     of the possible states (in the case when max_state grows).
>
>     The new function takes care of that (and some other minor things
>     related to it), but some extra locking and lockdep annotations are
>     added in several places too to protect against crashes in the cases
>     when the statistics are not present or when a stale max_state value
>     might be used by sysfs attributes.
>
>     Note that the actual user of the new function will be added separately.
>
>     Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
>     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>     Tested-by: Zhang Rui <rui.zhang@...el.com>
>     Reviewed-by: Zhang Rui <rui.zhang@...el.com>
>
>  drivers/thermal/thermal_core.c  | 83 ++++++++++++++++++++++++++++++++++++++++-
>  drivers/thermal/thermal_core.h  |  2 +
>  drivers/thermal/thermal_sysfs.c | 74 +++++++++++++++++++++++++++++++-----
>  include/linux/thermal.h         |  1 +
>  4 files changed, 150 insertions(+), 10 deletions(-)
>
> All my PCs turned out to be affected by this issue:
> - CPU: Ryzen 3950X / MB: ROG Strix X570-I
> - CPU: Ryzen 7950X / MB: MPG B650I EDGE WIFI
> - Laptop: ASUS ROG Strix G15 G513QY-HF001 (CPU: 5900HX)
>
> Unfortunately, I couldn't test a revert of this commit, because the
> kernel does not build after reverting it.
>
> drivers/acpi/processor_thermal.c: In function ‘acpi_thermal_cpufreq_init’:
> drivers/acpi/processor_thermal.c:149:17: error: implicit declaration
> of function ‘thermal_cooling_device_update’; did you mean
> ‘thermal_zone_device_update’? [-Werror=implicit-function-declaration]
>   149 |                 thermal_cooling_device_update(pr->cdev);
>       |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>       |                 thermal_zone_device_update
>
>
> Anyone who wants to see the full kernel log can find it in the attached archive (for the laptop).
>
> --
> Best Regards,
> Mike Gavrilov.
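
For readers following along, the quoted commit message says that
->get_max_state() is invoked only once, at registration, that its result
is cached as max_state, and that the statistics need to be rearranged
when that value changes. Below is a rough userspace sketch of that idea
only. It is not the kernel code: struct cdev_model, the cdev_model_*()
functions, demo_limit and demo_get_max_state() are all hypothetical names
made up for illustration, and the locking and lockdep annotations the
commit message mentions are left out entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a cooling device; NOT the kernel's structure. */
struct cdev_model {
	unsigned long max_state;              /* cached callback result */
	unsigned long (*get_max_state)(void); /* driver-provided callback */
	unsigned long *time_in_state;         /* one slot per state 0..max_state */
};

/* Hypothetical callback whose return value may change at run time. */
unsigned long demo_limit = 3;
unsigned long demo_get_max_state(void) { return demo_limit; }

/* Allocate a zeroed statistics table sized for the cached max_state. */
static int stats_setup(struct cdev_model *cdev)
{
	cdev->time_in_state = calloc(cdev->max_state + 1,
				     sizeof(*cdev->time_in_state));
	return cdev->time_in_state ? 0 : -1;
}

/* Drop the statistics table; safe to call when none is present. */
static void stats_destroy(struct cdev_model *cdev)
{
	free(cdev->time_in_state);
	cdev->time_in_state = NULL;
}

/* Registration: invoke the callback once and cache its result. */
int cdev_model_register(struct cdev_model *cdev, unsigned long (*cb)(void))
{
	cdev->get_max_state = cb;
	cdev->max_state = cb();
	return stats_setup(cdev);
}

/*
 * Update: invoke the callback again, store the new value and rebuild the
 * statistics so there is room for every possible state. Old statistics
 * are simply discarded in this sketch.
 */
int cdev_model_update(struct cdev_model *cdev)
{
	stats_destroy(cdev);
	cdev->max_state = cdev->get_max_state();
	return stats_setup(cdev);
}
```

In this model, changing demo_limit after cdev_model_register() has no
effect on the cached max_state until cdev_model_update() is called, which
is the situation the commit message describes.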
