Message-ID: <03a18f37-9b22-42bd-a3ca-86c8d95b4b1d@gmail.com>
Date: Sun, 4 May 2025 22:02:44 +0200
From: Jacek Łuczak <difrost.kernel@...il.com>
To: "Prasad, Prasad" <venkataprasad.potturu@....com>,
"Limonciello, Mario" <Mario.Limonciello@....com>,
Mark Brown <broonie@...nel.org>, "Mukunda, Vijendar"
<Vijendar.Mukunda@....com>
Cc: "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>,
"linux-sound@...r.kernel.org" <linux-sound@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [REGRESSION] Resume from suspend broken in 6.15-rc due to ACP changes.
On 4/24/25 6:53 PM, Prasad, Prasad wrote:
> On 4/24/2025 12:57 AM, Mario Limonciello wrote:
>> On 4/23/2025 2:12 PM, Mario Limonciello wrote:
>>> On 4/23/2025 10:18 AM, Mario Limonciello wrote:
>>>> On 4/23/2025 10:06 AM, Mark Brown wrote:
>>>>> On Wed, Apr 16, 2025 at 01:20:33PM +0200, Jacek Luczak wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On my ASUS Vivobook S16 (and on similar ASUS HW - see [1]) on resume
>>>>>> from suspend system dies (no logs available) soon after GPU completes
>>>>>> resume - I can see the login screen, only power cycle left.
>>>>> Are there any updates on this from the AMD side? As things stand my
>>>>> inclination is to revert the bulk of the changes to the driver from
>>>>> the
>>>>> past merge window, I don't really know anything about this hardware
>>>>> specifically and "dies without logs" is obviously giving few hints.
>>>>> None of the skipped commits looks immediately suspect, there's
>>>>> doubtless
>>>>> some unintended change in there.
>>>> This is the first I'm hearing of it; I expect we can dig in and find
>>>> a solution so we don't need to revert that whole series.
>>>>
>>>> Let me add Vijendar to check if this jumps out to him what went wrong.
>>>>
>>>> * Can we please see the full dmesg up to the failure?
>>>> * journalctl -k -b-1 can fetch everything from the last boot up
>>>> until the freeze.
>>>> * Any crash in /var/lib/systemd/pstore by chance?
>>>>
>>>>> Adding Mario and leaving the context for his benefit.
>>>> To double check - can you blacklist the ACP driver and
>>>> suspend/resume and everything is OK?
>>>>
>>>> If possible can you please capture a report with
>>>> https://web.git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/tree/amd_s2idle.py
>>>> both in the case of ACP driver blacklisted and not blacklisted? I
>>>> would like to compare.
>>>>
>>>> Also; can you put all these artifacts I'm asking for into somewhere
>>>> non ephemeral like a kernel bugzilla? You can loop me and Vijendar
>>>> into it.
>>> FYI - We managed to track an S16 down and can reproduce the issue.
>>> It's a NULL pointer deref happening on the resume path.
>>>
>>> <1>[ 74.046372] BUG: kernel NULL pointer dereference, address:
>>> 0000000000000010
>>> <1>[ 74.046375] #PF: supervisor read access in kernel mode
>>> <1>[ 74.046377] #PF: error_code(0x0000) - not-present page
>>> <6>[ 74.046380] PGD 0 P4D 0
>>> <4>[ 74.046384] Oops: Oops: 0000 [#1] SMP NOPTI
>>> <4>[ 74.046389] CPU: 4 UID: 0 PID: 2563 Comm: rtcwake Not tainted
>>> 6.15.0-061500rc3-generic #202504202138 PREEMPT(voluntary)
>>> Oops#1 Part4
>>> <4>[ 74.046394] Hardware name: ASUSTeK COMPUTER INC. ASUS Vivobook
>>> S 16 M5606KA_M5606KA/M5606KA, BIOS M5606KA.304 01/24/2025
>>> <4>[ 74.046396] RIP: 0010:acp70_pcm_resume+0x4f/0xe0 [snd_acp70]
>>> <4>[ 74.046405] Code: 48 89 45 d0 e8 c2 da 98 fc 49 8b 5d 50 49 39
>>> de 75 18 eb 7b 48 89 da 4c 89 ee 4c 89 ff e8 29 cc f6 ff 48 8b 1b 4c
>>> 39 f3 74 65 <4c> 8b 7b 10 4d 85 ff 74 ef 49 8b 97 c0 00 00 00 48 85
>>> d2 74 e3 8b
>>> <4>[ 74.046407] RSP: 0018:ffffd12644d13880 EFLAGS: 00010286
>>> <4>[ 74.046410] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
>>> 0000000000000000
>>> <4>[ 74.046412] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>>> 0000000000000000
>>> <4>[ 74.046413] RBP: ffffd12644d138b0 R08: 0000000000000000 R09:
>>> 0000000000000000
>>> <4>[ 74.046415] R10: 0000000000000000 R11: 0000000000000000 R12:
>>> ffffffffbd774fd0
>>> <4>[ 74.046416] R13: ffff8a9f13051e00 R14: ffff8a9f13051e50 R15:
>>> 0000000000000010
>>> <4>[ 74.046418] FS: 0000799af9db9740(0000)
>>> GS:ffff8aa486e9d000(0000) knlGS:0000000000000000
>>> <4>[ 74.046420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> <4>[ 74.046421] CR2: 0000000000000010 CR3: 000000016dfaa000 CR4:
>>> 0000000000f50ef0
>>> <4>[ 74.046423] PKRU: 55555554
>>> <4>[ 74.046425] Call Trace:
>>> <4>[ 74.046427] <TASK>
>>> <4>[ 74.046432] ? __pfx_platform_pm_resume+0x10/0x10
>>> <4>[ 74.046439] platform_pm_resume+0x28/0x60
>>> <4>[ 74.046443] dpm_run_callback+0x63/0x160
>>> <4>[ 74.046447] device_resume+0x15c/0x260
>>> <4>[ 74.046450] dpm_resume+0x15d/0x230
>>> <4>[ 74.046453] dpm_resume_end+0x11/0x30
>>> <4>[ 74.046456] suspend_devices_and_enter+0x1ea/0x2c0
>>> <4>[ 74.046460] enter_state+0x223/0x560
>>> Oops#1 Part3
>>> <4>[ 74.046463] pm_suspend+0x4e/0x80
>>>
>>> We'll need some more time to dig into it, but I wanted to share the
>>> trace in case it makes it jump out to anyone what's going on.
>>>
>>> Just looking at git blame from that function is this perhaps
>>> 8fd0e127d8da856e34391399df40b33af2b307e0?
>> Reverting a95a1dbbd3d64adf392fed13c8eef4f72b4e5b90 seems to help the
>> issue on S16 here.
>>
>> Jacek - can you reproduce with that reverted?
> Hi Mark Brown,
>
> We will send a fix patch to resolve this issue.
Hi Folks,
I've just built and tested the fixes that are heading to rc5 and the
issue is fixed with
https://lore.kernel.org/linux-sound/20250425060144.1773265-1-venkataprasad.potturu@amd.com/
Thanks,
-Jacek