Message-ID: <691c35de-f054-40a1-98bb-2b602e256011@amd.com>
Date: Wed, 23 Apr 2025 14:27:11 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Mark Brown <broonie@...nel.org>, Jacek Luczak <difrost.kernel@...il.com>,
 "Mukunda, Vijendar" <Vijendar.Mukunda@....com>
Cc: regressions@...ts.linux.dev, linux-sound@...r.kernel.org,
 venkataprasad.potturu@....com, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [REGRESSION] Resume from suspend broken in 6.15-rc due to ACP
 changes.

On 4/23/2025 2:12 PM, Mario Limonciello wrote:
> On 4/23/2025 10:18 AM, Mario Limonciello wrote:
>> On 4/23/2025 10:06 AM, Mark Brown wrote:
>>> On Wed, Apr 16, 2025 at 01:20:33PM +0200, Jacek Luczak wrote:
>>>> Hi,
>>>>
>>>> On my ASUS Vivobook S16 (and on similar ASUS HW - see [1]) on resume
>>>> from suspend system dies (no logs available) soon after GPU completes
>>>> resume - I can see the login screen, only power cycle left.
>>>
>>> Are there any updates on this from the AMD side?  As things stand my
>>> inclination is to revert the bulk of the changes to the driver from the
>>> past merge window, I don't really know anything about this hardware
>>> specifically and "dies without logs" is obviously giving few hints.
>>> None of the skipped commits looks immediately suspect, there's doubtless
>>> some unintended change in there.
>>
>> This is the first I'm hearing of it; I expect we can dig in and find a 
>> solution so we don't need to revert that whole series.
>>
>> Let me add Vijendar to check if this jumps out to him what went wrong.
>>
>> * Can we please see the full dmesg up to the failure?
>> * journalctl -k -b-1 can fetch everything from the last boot up until 
>> the freeze.
>> * Any crash in /var/lib/systemd/pstore by chance?
>>
>>>
>>> Adding Mario and leaving the context for his benefit.
>>
>> To double check - can you blacklist the ACP driver and suspend/resume 
>> and everything is OK?
>>
>> If possible can you please capture a report with
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/tree/amd_s2idle.py
>> both in the case of ACP driver blacklisted and not blacklisted?  I
>> would like to compare.
>>
>> Also; can you put all these artifacts I'm asking for into somewhere 
>> non ephemeral like a kernel bugzilla?  You can loop me and Vijendar 
>> into it.
> 
> FYI - We managed to track an S16 down and can reproduce the issue.
> It's a NULL pointer deref happening on the resume path.
> 
> <1>[   74.046372] BUG: kernel NULL pointer dereference, address: 0000000000000010
> <1>[   74.046375] #PF: supervisor read access in kernel mode
> <1>[   74.046377] #PF: error_code(0x0000) - not-present page
> <6>[   74.046380] PGD 0 P4D 0
> <4>[   74.046384] Oops: Oops: 0000 [#1] SMP NOPTI
> <4>[   74.046389] CPU: 4 UID: 0 PID: 2563 Comm: rtcwake Not tainted 6.15.0-061500rc3-generic #202504202138 PREEMPT(voluntary)
> Oops#1 Part4
> <4>[   74.046394] Hardware name: ASUSTeK COMPUTER INC. ASUS Vivobook S 16 M5606KA_M5606KA/M5606KA, BIOS M5606KA.304 01/24/2025
> <4>[   74.046396] RIP: 0010:acp70_pcm_resume+0x4f/0xe0 [snd_acp70]
> <4>[   74.046405] Code: 48 89 45 d0 e8 c2 da 98 fc 49 8b 5d 50 49 39 de 75 18 eb 7b 48 89 da 4c 89 ee 4c 89 ff e8 29 cc f6 ff 48 8b 1b 4c 39 f3 74 65 <4c> 8b 7b 10 4d 85 ff 74 ef 49 8b 97 c0 00 00 00 48 85 d2 74 e3 8b
> <4>[   74.046407] RSP: 0018:ffffd12644d13880 EFLAGS: 00010286
> <4>[   74.046410] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> <4>[   74.046412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> <4>[   74.046413] RBP: ffffd12644d138b0 R08: 0000000000000000 R09: 0000000000000000
> <4>[   74.046415] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffbd774fd0
> <4>[   74.046416] R13: ffff8a9f13051e00 R14: ffff8a9f13051e50 R15: 0000000000000010
> <4>[   74.046418] FS:  0000799af9db9740(0000) GS:ffff8aa486e9d000(0000) knlGS:0000000000000000
> <4>[   74.046420] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[   74.046421] CR2: 0000000000000010 CR3: 000000016dfaa000 CR4: 0000000000f50ef0
> <4>[   74.046423] PKRU: 55555554
> <4>[   74.046425] Call Trace:
> <4>[   74.046427]  <TASK>
> <4>[   74.046432]  ? __pfx_platform_pm_resume+0x10/0x10
> <4>[   74.046439]  platform_pm_resume+0x28/0x60
> <4>[   74.046443]  dpm_run_callback+0x63/0x160
> <4>[   74.046447]  device_resume+0x15c/0x260
> <4>[   74.046450]  dpm_resume+0x15d/0x230
> <4>[   74.046453]  dpm_resume_end+0x11/0x30
> <4>[   74.046456]  suspend_devices_and_enter+0x1ea/0x2c0
> <4>[   74.046460]  enter_state+0x223/0x560
> Oops#1 Part3
> <4>[   74.046463]  pm_suspend+0x4e/0x80
> 
> We'll need some more time to dig into it, but I wanted to share the 
> trace in case it makes it jump out to anyone what's going on.
> 
> Just looking at git blame from that function is this perhaps 
> 8fd0e127d8da856e34391399df40b33af2b307e0?

Reverting a95a1dbbd3d64adf392fed13c8eef4f72b4e5b90 seems to help the 
issue on S16 here.

Jacek - can you reproduce with that reverted?
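
One possible reading of the oops quoted above, offered as a sketch and not
as a confirmed root cause: the faulting bytes "<4c> 8b 7b 10" decode to a
64-bit load from RBX+0x10, RBX is 0 and CR2 is 0x10, while R14 equals
R13+0x50. That is what walking a list whose head was never initialized
would look like, which would fit the subject line of a95a1dbbd3d6
(spin_lock and list initialization). The user-space C below mimics that
situation. struct stream, struct dev_data, init_list_head() and
walk_streams() are hypothetical names made up for illustration; none of
this is the driver's actual code or layout.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same effect as the kernel's INIT_LIST_HEAD(): an empty list points at itself. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Hypothetical stream entry: with 8-byte pointers, substream sits at offset 0x10. */
struct stream {
	struct list_head list;
	void *substream;
};

/* Hypothetical per-device data holding the list head. */
struct dev_data {
	struct list_head stream_list;
};

/*
 * Walk the list the way a list_for_each_entry() style loop would.
 * Returns the number of entries, or -1 at the point where the kernel
 * loop would dereference a NULL node (the read at node + 0x10).
 */
static int walk_streams(const struct dev_data *d)
{
	const struct list_head *pos;
	int count = 0;

	assert(d != NULL);
	for (pos = d->stream_list.next; pos != &d->stream_list; pos = pos->next) {
		if (pos == NULL)
			return -1;
		count++;
	}
	return count;
}
```

With a zero-filled dev_data, walk_streams() returns -1 at the spot where
the kernel loop would fault; once init_list_head() has run on stream_list,
the same walk returns 0 for the empty list.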

> 
>>
>>>
>>>> I've managed to bisect this as close as possible to following commits:
>>>> - [f8b4f3f525e82d78079a6ebbde68e4a0d79fd1c0] ASoC: amd: acp: Refactor
>>>> acp70 platform resource structure
>>>> - [c8b5f251f0e53edab220ac4edf444120815fed3c] ASoC: amd: acp: Remove 
>>>> white line
>>>> - [a95a1dbbd3d64adf392fed13c8eef4f72b4e5b90] ASoC: amd: acp: Move
>>>> spin_lock and list initialization to acp-pci driver
>>>> - [e3933683b25e2cc94485da4909e3338e1a177b39] ASoC: amd: acp: Remove
>>>> redundant acp_dev_data structure
>>>> - [aaf7a668bb3814f084f9f6f673567f6aa316632f] ASoC: amd: acp: Add new
>>>> interrupt handle callbacks in acp_common_hw_ops
>>>>
>>>> Attached lspci and bisection log.
>>>>
>>>> Regards,
>>>> -jacek
>>>>
>>>> [1] https://bbs.archlinux.org/viewtopic.php?id=304816
>>>
>>>> git bisect start
>>>> # status: waiting for both good and bad commits
>>>> # good: [ed92bc5264c4357d4fca292c769ea9967cd3d3b6] ASoC: codecs: wm0010: Fix error handling path in wm0010_spi_probe()
>>>> git bisect good ed92bc5264c4357d4fca292c769ea9967cd3d3b6
>>>> # status: waiting for bad commit, 1 good commit known
>>>> # bad: [47c4f9b1722fd883c9745d7877cb212e41dd2715] Tidy up ASoC control get and put handlers
>>>> git bisect bad 47c4f9b1722fd883c9745d7877cb212e41dd2715
>>>> # good: [74da545ec6a8b41de96b4c350bb59dfe45c0d822] ASoC: codec: madera: use inclusive language for SND_SOC_DAIFMT_CBx_CFx
>>>> git bisect good 74da545ec6a8b41de96b4c350bb59dfe45c0d822
>>>> # bad: [a935b3f981809272d2649ad9c27a751685137846] ASoC: SOF: ipc4-topology: Allocate ref_params on stack
>>>> git bisect bad a935b3f981809272d2649ad9c27a751685137846
>>>> # good: [24056de9976dfc33801d2574c1672d91f840277a] ASoC: codecs: Update device_id tables for Realtek
>>>> git bisect good 24056de9976dfc33801d2574c1672d91f840277a
>>>> # good: [a1462fb8b5dd1018e3477a6861822d75c6a59449] ASoC: Intel: boards: updates for 6.15
>>>> git bisect good a1462fb8b5dd1018e3477a6861822d75c6a59449
>>>> # skip: [8a7e7a03e3c53cd9abbbf233899cc2e05b2c6ec0] ASoC: SOF: Intel: Add support for ACE3+ mic privacy
>>>> git bisect skip 8a7e7a03e3c53cd9abbbf233899cc2e05b2c6ec0
>>>> # skip: [aaf7a668bb3814f084f9f6f673567f6aa316632f] ASoC: amd: acp: Add new interrupt handle callbacks in acp_common_hw_ops
>>>> git bisect skip aaf7a668bb3814f084f9f6f673567f6aa316632f
>>>> # good: [c6141ba0110f98266106699aca071fed025c3d64] ASoC: Merge up fixes
>>>> git bisect good c6141ba0110f98266106699aca071fed025c3d64
>>>> # skip: [ad5a0970f86d82e39ebd06d45a1f7aa48a1316f8] ASoC: cs35l41: check the return value from spi_setup()
>>>> git bisect skip ad5a0970f86d82e39ebd06d45a1f7aa48a1316f8
>>>> # good: [269b844239149a9bbaba66518db99ebb06554a15] ASoC: dapm: Fix changes to DECLARE_ADAU17X1_DSP_MUX_CTRL
>>>> git bisect good 269b844239149a9bbaba66518db99ebb06554a15
>>>> # skip: [89be3c15a58b2ccf31e969223c8ac93ca8932d81] ASoC: qcom: sm8250: explicitly set format in sm8250_be_hw_params_fixup()
>>>> git bisect skip 89be3c15a58b2ccf31e969223c8ac93ca8932d81
>>>> # bad: [02e1cf7a352a3ba5f768849f2b4fcaaaa19f89e3] ASoC: amd: acp: Fix for enabling DMIC on acp platforms via _DSD entry
>>>> git bisect bad 02e1cf7a352a3ba5f768849f2b4fcaaaa19f89e3
>>>> # good: [7a2ff0510c51462c0a979f5006d375a2b23d46e9] ASoC: soc-pcm: reuse dpcm_state_string()
>>>> git bisect good 7a2ff0510c51462c0a979f5006d375a2b23d46e9
>>>> # good: [a8fed0bddf8fa239fc71dc5c035d2e078c597369] ASoC: dt-bindings: add regulator support to dmic codec
>>>> git bisect good a8fed0bddf8fa239fc71dc5c035d2e078c597369
>>>> # bad: [ee7ab0fd540877fceb3d51f87016e6531d86406f] ASoC: amd: acp: Refactor rembrant platform resource structure
>>>> git bisect bad ee7ab0fd540877fceb3d51f87016e6531d86406f
>>>> # good: [0d2d276f53ea3ba1686619cde503d9748f58a834] ASoC: SOF: Intel: lnl/ptl: Only set dsp_ops which differs from MTL
>>>> git bisect good 0d2d276f53ea3ba1686619cde503d9748f58a834
>>>> # good: [8aeb7d2c3fc315e629d252cd601598a5af74bbb0] ASoC: SOF: Intel: Create ptl.c as placeholder for Panther Lake features
>>>> git bisect good 8aeb7d2c3fc315e629d252cd601598a5af74bbb0
>>>> # skip: [ac5b4a24f16f2f56b5cc5092969930b867274edc] ASoC: Intel: soc-acpi-intel-ptl-match: Add cs42l43 support
>>>> git bisect skip ac5b4a24f16f2f56b5cc5092969930b867274edc
>>>> # skip: [8ae746fe51041484e52eba99bed15a444c7d4372] ASoC: amd: acp: Implement acp_common_hw_ops support for acp platforms
>>>> git bisect skip 8ae746fe51041484e52eba99bed15a444c7d4372
>>>> # good: [0978e8207b61ac6d51280e5d28ccfff75d653363] ASoC: SOF: Intel: hda-mlink: Add support for mic privacy in VS SHIM registers
>>>> git bisect good 0978e8207b61ac6d51280e5d28ccfff75d653363
>>>> # good: [4a43c3241ec3465a501825ecaf051e5a1d85a60b] ASoC: SOF: Intel: ptl: Add support for mic privacy
>>>> git bisect good 4a43c3241ec3465a501825ecaf051e5a1d85a60b
>>>> # skip: [1ec3f1dc215d4b3d3679ecdc4a549d4e82b3a609] ASoC: dmic: add regulator support
>>>> git bisect skip 1ec3f1dc215d4b3d3679ecdc4a549d4e82b3a609
>>>> # good: [e2cda461765692757cd5c3b1fc80bd260ffe1394] ASoC: amd: acp: Refactor dmic-codec platform device creation
>>>> git bisect good e2cda461765692757cd5c3b1fc80bd260ffe1394
>>>> # skip: [a95a1dbbd3d64adf392fed13c8eef4f72b4e5b90] ASoC: amd: acp: Move spin_lock and list initialization to acp-pci driver
>>>> git bisect skip a95a1dbbd3d64adf392fed13c8eef4f72b4e5b90
>>>> # bad: [f8b4f3f525e82d78079a6ebbde68e4a0d79fd1c0] ASoC: amd: acp: Refactor acp70 platform resource structure
>>>> git bisect bad f8b4f3f525e82d78079a6ebbde68e4a0d79fd1c0
>>>> # good: [6e60db74b69c29b528c8d10d86108f78f2995dcb] ASoC: amd: acp: Refactor acp machine select
>>>> git bisect good 6e60db74b69c29b528c8d10d86108f78f2995dcb
>>>> # skip: [e3933683b25e2cc94485da4909e3338e1a177b39] ASoC: amd: acp: Remove redundant acp_dev_data structure
>>>> git bisect skip e3933683b25e2cc94485da4909e3338e1a177b39
>>>> # skip: [c8b5f251f0e53edab220ac4edf444120815fed3c] ASoC: amd: acp: Remove white line
>>>> git bisect skip c8b5f251f0e53edab220ac4edf444120815fed3c
>>>> # only skipped commits left to test
>>>> # possible first bad commit: [f8b4f3f525e82d78079a6ebbde68e4a0d79fd1c0] ASoC: amd: acp: Refactor acp70 platform resource structure
>>>> # possible first bad commit: [c8b5f251f0e53edab220ac4edf444120815fed3c] ASoC: amd: acp: Remove white line
>>>> # possible first bad commit: [a95a1dbbd3d64adf392fed13c8eef4f72b4e5b90] ASoC: amd: acp: Move spin_lock and list initialization to acp-pci driver
>>>> # possible first bad commit: [e3933683b25e2cc94485da4909e3338e1a177b39] ASoC: amd: acp: Remove redundant acp_dev_data structure
>>>> # possible first bad commit: [aaf7a668bb3814f084f9f6f673567f6aa316632f] ASoC: amd: acp: Add new interrupt handle callbacks in acp_common_hw_ops
>>>
>>>> 00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Root Complex [1022:1507]
>>>> 00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo IOMMU [1022:1508]
>>>> 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Dummy Host Bridge [1022:1509]
>>>> 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo PCIe USB4 Bridge [1022:150a]
>>>> 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Dummy Host Bridge [1022:1509]
>>>> 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo GPP Bridge [1022:150b]
>>>> 00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo GPP Bridge [1022:150b]
>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Dummy Host Bridge [1022:1509]
>>>> 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Dummy Host Bridge [1022:1509]
>>>> 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Internal GPP Bridge to Bus [C:A] [1022:150c]
>>>> 00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Internal GPP Bridge to Bus [C:A] [1022:150c]
>>>> 00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Internal GPP Bridge to Bus [C:A] [1022:150c]
>>>> 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 71)
>>>> 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
>>>> 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 0 [1022:16f8]
>>>> 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 1 [1022:16f9]
>>>> 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 2 [1022:16fa]
>>>> 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 3 [1022:16fb]
>>>> 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 4 [1022:16fc]
>>>> 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 5 [1022:16fd]
>>>> 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 6 [1022:16fe]
>>>> 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 7 [1022:16ff]
>>>> 61:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc 2400 NVMe SSD (DRAM-less) [1344:5413] (rev 03)
>>>> 62:00.0 Network controller [0280]: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter [14c3:0616]
>>>> 63:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Strix [Radeon 880M / 890M] [1002:150e] (rev c1)
>>>> 63:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640]
>>>> 63:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Strix/Krackan/Strix Halo CCP/ASP [1022:17e0]
>>>> 63:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:151e]
>>>> 63:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 70)
>>>> 63:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller [1022:15e3]
>>>> 64:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo PCIe Dummy Function [1022:150d]
>>>> 64:00.1 Signal processing controller [1180]: Advanced Micro Devices, Inc. [AMD] Strix/Krackan/Strix Halo Neural Processing Unit [1022:17f0] (rev 10)
>>>> 65:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:151f]
>>>> 65:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:151a]
>>>> 65:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:151b]
>>>> 65:00.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:151c]
>>
> 

