linux-kernel - Re: Regression in PMC code in 6.12-rc1

Message-ID: <1cc25b5f-edd9-4520-b89b-9b7f833fbb4c@redhat.com>
Date: Sat, 12 Oct 2024 20:38:55 +0200
From: Hans de Goede <hdegoede@...hat.com>
To: Marek Maślanka <mmaslanka@...gle.com>
Cc: linux-kernel@...r.kernel.org, daniel.lezcano@...aro.org,
 Luca Coelho <luca@...lho.fi>
Subject: Re: Regression in PMC code in 6.12-rc1

Hi Marek,

On 12-Oct-24 8:30 PM, Marek Maślanka wrote:
> Hi Hans,
> 
> On Thu, Oct 10, 2024 at 4:12 PM Hans de Goede <hdegoede@...hat.com> wrote:
>>
>> Hi Marek,
>>
>> On 10-Oct-24 4:09 PM, Marek Maślanka wrote:
>>> Hi Franz,
>>
>> Franz? I guess you are trying to address me (Hans) ?
> 
> Yes! Forgive me for this mistake!

No problem / no worries.

>>> I need to redesign this patch. The pmcdev->lock taken in
>>> pmc_core_acpi_pm_timer_suspend_resume might already be held by
>>> pmc_core_mphy_pg_show or pmc_core_pll_show if userspace gets frozen
>>> while one of these functions is executing, which would cause a hang.
>>>
>>> Can you tell me how to revert this patch? Or could you just do it?
>>
>> Please submit a revert based on top of:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/log/?h=fixes
>>
>> with a commit message explaining why this needs to be reverted for now
>> and then I will merge the revert into the fixes branch and include
>> it in the next fixes pull-request to Torvalds.
> 
> Done.

Thank you. I'll apply this and send it on its way to Linus some
time next week.

Regards,

Hans





>>> On Mon, Oct 7, 2024 at 12:57 PM Marek Maślanka <mmaslanka@...gle.com> wrote:
>>>
>>>     Hi Luca,
>>>
>>>     Thanks for the report.
>>>
>>>     It seems that the tick_freeze function in kernel/time/tick-common.c
>>>     is holding a spinlock, so pmc_core_acpi_pm_timer_suspend_resume
>>>     shouldn't try to take the mutex. I'll look for a solution.
>>>
>>>     Marek
>>>
>>>
>>>     On Mon, Oct 7, 2024 at 11:17 AM Luca Coelho <luca@...lho.fi> wrote:
>>>     >
>>>     > Hi Marek et al,
>>>     >
>>>     > We have been facing some errors when running some of our Display CI
>>>     > tests that seem to have been introduced by the following commit:
>>>     >
>>>     > e86c8186d03a ("platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended")
>>>     >
>>>     > The errors we are getting look like this:
>>>     >
>>>     > <4> [222.857770] =============================
>>>     > <4> [222.857771] [ BUG: Invalid wait context ]
>>>     > <4> [222.857772] 6.12.0-rc1-xe #1 Not tainted
>>>     > <4> [222.857773] -----------------------------
>>>     > <4> [222.857774] swapper/4/0 is trying to lock:
>>>     > <4> [222.857775] ffff8881174c88c8 (&pmcdev->lock){+.+.}-{3:3}, at: pmc_core_acpi_pm_timer_suspend_resume+0x50/0xe0 [intel_pmc_core]
>>>     > <4> [222.857782] other info that might help us debug this:
>>>     > <4> [222.857783] context-{4:4}
>>>     > <4> [222.857784] 1 lock held by swapper/4/0:
>>>     > <4> [222.857785]  #0: ffffffff83452258 (tick_freeze_lock){....}-{2:2}, at: tick_freeze+0x16/0x110
>>>     > <4> [222.857791] stack backtrace:
>>>     > <4> [222.857793] CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Not tainted 6.12.0-rc1-xe #1
>>>     > <4> [222.857794] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR5 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
>>>     > <4> [222.857796] Call Trace:
>>>     > <4> [222.857797]  <TASK>
>>>     > <4> [222.857798]  dump_stack_lvl+0x80/0xc0
>>>     > <4> [222.857802]  dump_stack+0x10/0x20
>>>     > <4> [222.857805]  __lock_acquire+0x943/0x2800
>>>     > <4> [222.857808]  ? stack_trace_save+0x4b/0x70
>>>     > <4> [222.857812]  lock_acquire+0xc5/0x2f0
>>>     > <4> [222.857814]  ? pmc_core_acpi_pm_timer_suspend_resume+0x50/0xe0 [intel_pmc_core]
>>>     > <4> [222.857817]  __mutex_lock+0xbe/0xc70
>>>     > <4> [222.857819]  ? pmc_core_acpi_pm_timer_suspend_resume+0x50/0xe0 [intel_pmc_core]
>>>     > <4> [222.857822]  ? pmc_core_acpi_pm_timer_suspend_resume+0x50/0xe0 [intel_pmc_core]
>>>     > <4> [222.857825]  mutex_lock_nested+0x1b/0x30
>>>     > <4> [222.857827]  ? mutex_lock_nested+0x1b/0x30
>>>     > <4> [222.857828]  pmc_core_acpi_pm_timer_suspend_resume+0x50/0xe0 [intel_pmc_core]
>>>     > <4> [222.857831]  acpi_pm_suspend+0x23/0x40
>>>     > <4> [222.857834]  clocksource_suspend+0x2b/0x50
>>>     > <4> [222.857836]  timekeeping_suspend+0x22a/0x360
>>>     > <4> [222.857839]  tick_freeze+0x89/0x110
>>>     > <4> [222.857840]  enter_s2idle_proper+0x34/0x1d0
>>>     > <4> [222.857843]  cpuidle_enter_s2idle+0xaa/0x120
>>>     > <4> [222.857845]  ? tsc_verify_tsc_adjust+0x42/0x100
>>>     > <4> [222.857849]  do_idle+0x221/0x250
>>>     > <4> [222.857852]  cpu_startup_entry+0x29/0x30
>>>     > <4> [222.857854]  start_secondary+0x12e/0x160
>>>     > <4> [222.857856]  common_startup_64+0x13e/0x141
>>>     > <4> [222.857859]  </TASK>
>>>     >
>>>     > And the full logs can be found, for example, here:
>>>     >
>>>     > https://intel-gfx-ci.01.org/tree/intel-xe/xe-2016-92d12099cc768f36cf676ee1b014442a5c5ba965/shard-adlp-3/igt@kms_flip@flip-vs-suspend-interruptible.html
>>>     >
>>>     >
>>>     > Reverting this commit seems to prevent the problem.  Do you have any
>>>     > idea what could be causing this and, more importantly, how to fix it?
>>>     > :)
>>>     >
>>>     > Thanks!
>>>     >
>>>     > --
>>>     > Cheers,
>>>     > Luca.
>>>
>>
>
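
The "Invalid wait context" report quoted above can be read as a rank
comparison: a sleeping lock (the mutex pmcdev->lock, shown as {3:3}) is
requested while a spinning lock (tick_freeze_lock, shown as {2:2}) is
already held. What follows is only a minimal userspace sketch of that
rule, not the kernel's lockdep code. The enum values are assumptions
chosen to match the numbers printed in the log, and the names wait_type,
held_lock and wait_context_ok are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified wait-type ranks. The values are assumptions
 * picked to match the {2:2} and {3:3} annotations in the log above.
 */
enum wait_type {
	WAIT_SPIN  = 2,	/* spinning lock: the holder must not sleep */
	WAIT_SLEEP = 3,	/* sleeping lock such as a mutex */
	WAIT_MAX   = 4,	/* no restriction: plain task context */
};

/* One entry per lock currently held by the caller (illustrative only). */
struct held_lock {
	const char *name;
	enum wait_type inner;	/* most permissive wait type allowed inside */
};

/*
 * Return true if a lock with wait type next_outer may be acquired while
 * the n locks in held[] are held. The most restrictive held lock sets
 * the limit, so a sleeping lock under a spinning lock is rejected.
 */
static bool wait_context_ok(const struct held_lock *held, size_t n,
			    enum wait_type next_outer)
{
	enum wait_type curr_inner = WAIT_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		if (held[i].inner < curr_inner)
			curr_inner = held[i].inner;
	}

	return next_outer <= curr_inner;
}
```

With a single held entry { "tick_freeze_lock", WAIT_SPIN }, a request for
WAIT_SLEEP is rejected, which mirrors the situation in the backtrace
(tick_freeze -> ... -> mutex_lock_nested), while a request for WAIT_SPIN
is accepted.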