[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANiDSCvFzrEMwbpbA-gpUr2xSNU1J1q2nx6Mvpev-jtbvMPrcA@mail.gmail.com>
Date: Mon, 28 Nov 2022 14:34:21 +0100
From: Ricardo Ribalda <ribalda@...omium.org>
To: Takashi Iwai <tiwai@...e.de>
Cc: Takashi Iwai <tiwai@...e.com>, Len Brown <len.brown@...el.com>,
Pierre-Louis Bossart <pierre-louis.bossart@...ux.intel.com>,
Kai Vehmanen <kai.vehmanen@...ux.intel.com>,
Ranjani Sridharan <ranjani.sridharan@...ux.intel.com>,
Mark Brown <broonie@...nel.org>,
Jaroslav Kysela <perex@...ex.cz>, Pavel Machek <pavel@....cz>,
"Rafael J. Wysocki" <rafael@...nel.org>,
alsa-devel@...a-project.org,
"Joel Fernandes (Google)" <joel@...lfernandes.org>,
linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: [PATCH v3 0/2] ALSA: core: Fix deadlock when shutdown a frozen userspace
Hi Takashi
Thanks for your response
On Mon, 28 Nov 2022 at 10:53, Takashi Iwai <tiwai@...e.de> wrote:
>
> On Mon, 28 Nov 2022 10:26:36 +0100,
> Ricardo Ribalda wrote:
> >
> > Hi Takashi
> >
> > Thanks for your prompt reply
> >
> > On Mon, 28 Nov 2022 at 10:24, Takashi Iwai <tiwai@...e.de> wrote:
> > >
> > > On Mon, 28 Nov 2022 10:10:12 +0100,
> > > Ricardo Ribalda wrote:
> > > >
> > > > Since 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown")
> > > > we wait for userspace to close its fds.
> > >
> > > IMO, the fix above brought more problem. If you'd need to want to
> > > avoid later accesses during shutdown, the driver should rather just
> > > disconnect devices without waiting for the user-space completion.
> > > And, for that, a simple call of snd_card_disconnect() should suffice.
> > >
> > > > But that will never occur with a frozen userspace (like during kexec()).
> > > >
> > > > Lets detect the frozen userpace and act accordingly.
> > >
> > > ... and skipping the user-space sync at snd_card_disconnect_sync() as
> > > of this patch set is a dangerous move, I'm afraid. The user-space
> > > gets frozen also at the normal suspend/resume, and it implies that the
> > > sync will be lost even for the normal PM, too (although it must be a
> > > very corner case).
> > >
> >
> > And what about checking kexec_in_progress instead?
>
> I still think that the call of snd_card_disconnect_sync() itself at
> shutdown is somehow wrong. If this only comes from the SOF code path
> above, we should address in that code path instead.
>
> OTOH, you showed two code paths: one is
>
> [ 84.943749] Freezing user space processes ... (elapsed 0.111 seconds) done.
> [ 246.784446] INFO: task kexec-lite:5123 blocked for more than 122 seconds.
> [ 246.819035] Call Trace:
> [ 246.821782] <TASK>
> [ 246.824186] __schedule+0x5f9/0x1263
> [ 246.828231] schedule+0x87/0xc5
> [ 246.831779] snd_card_disconnect_sync+0xb5/0x127
> ...
> [ 246.889249] snd_sof_device_shutdown+0xb4/0x150
> [ 246.899317] pci_device_shutdown+0x37/0x61
> [ 246.903990] device_shutdown+0x14c/0x1d6
> [ 246.908391] kernel_kexec+0x45/0xb9
>
> and another is
>
> [ 246.893222] INFO: task kexec-lite:4891 blocked for more than 122 seconds.
> [ 246.927709] Call Trace:
> [ 246.930461] <TASK>
> [ 246.932819] __schedule+0x5f9/0x1263
> [ 246.936855] ? fsnotify_grab_connector+0x5c/0x70
> [ 246.942045] schedule+0x87/0xc5
> [ 246.945567] schedule_timeout+0x49/0xf3
> [ 246.949877] wait_for_completion+0x86/0xe8
> [ 246.954463] snd_card_free+0x68/0x89
> ...
> [ 247.001080] platform_device_unregister+0x12/0x35
>
> The former is likely the SOF code path by the commit you mentioned
> (but it's not 100% clear because you trimmed the stack trace), and
> this should be reconsidered.
Let me add both traces completely:
[ 247.080494] INFO: task kexec-lite:5441 blocked for more than 122 seconds.
[ 247.088100] Tainted: G U 5.15.79-14932-g6eecac25030c #5
[ 247.095984] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 247.104745] task:kexec-lite state:D stack: 0 pid: 5441
ppid: 5390 flags:0x00004002
[ 247.114083] Call Trace:
[ 247.116833] <TASK>
[ 247.119185] __schedule+0x5f9/0x1263
[ 247.123193] schedule+0x8e/0xdb
[ 247.126698] snd_card_disconnect_sync+0xb5/0x127
[ 247.131877] ? init_wait_entry+0x31/0x31
[ 247.136270] soc_cleanup_card_resources+0x27/0x1d3
[ 247.141636] snd_soc_unbind_card+0xa6/0xfc
[ 247.146218] snd_soc_unregister_card+0x26/0x34
[ 247.151192] release_nodes+0x43/0x62
[ 247.155200] devres_release_all+0x8b/0xc4
[ 247.159695] device_release_driver_internal+0x11a/0x1be
[ 247.165546] bus_remove_device+0xd9/0x103
[ 247.170038] device_del+0x1f7/0x355
[ 247.173946] platform_device_del+0x28/0x8e
[ 247.178539] platform_device_unregister+0x12/0x35
[ 247.183807] snd_sof_device_shutdown+0x57/0x94 [snd_sof
7d15170beb0e48032711b3230201744d5e4e590d]
[ 247.193742] pci_device_shutdown+0x37/0x61
[ 247.198335] device_shutdown+0x14c/0x1d6
[ 247.202735] kernel_kexec+0x45/0xb9
[ 247.206646] __se_sys_reboot+0x173/0x1f6
[ 247.211041] ? syscall_enter_from_user_mode+0x1a6/0x1ab
[ 247.216885] do_syscall_64+0x55/0x9d
[ 247.220879] ? exit_to_user_mode_prepare+0x3c/0x8b
[ 247.226245] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 247.231889] RIP: 0033:0x79b6a8563693
[ 247.235897] RSP: 002b:00007fff9ab771b8 EFLAGS: 00000202 ORIG_RAX:
00000000000000a9
[ 247.244375] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000079b6a8563693
[ 247.252363] RDX: 0000000045584543 RSI: 0000000028121969 RDI: 00000000fee1dead
[ 247.260337] RBP: 00007fff9ab77200 R08: 0000000000000004 R09: 00007fff9ab7930b
[ 247.268323] R10: 00007fff9ab7936a R11: 0000000000000202 R12: 00000000ffffffff
[ 247.276297] R13: 0000000000000004 R14: 00007fff9ab7936a R15: 0000000000000003
[ 247.284272] </TASK>
[ 247.286729] task:init state:D stack: 0 pid: 1
ppid: 0 flags:0x00004006
[ 247.296079] Call Trace:
[ 247.298810] <TASK>
[ 247.301155] __schedule+0x5f9/0x1263
[ 247.305163] ? core_sys_select+0x278/0x343
[ 247.309754] schedule+0x8e/0xdb
[ 247.313261] __refrigerator+0x5e/0x97
[ 247.317364] get_signal+0x5e4/0x5e9
[ 247.321280] arch_do_signal_or_restart+0x4a/0x27a
[ 247.326550] exit_to_user_mode_loop+0x76/0xde
[ 247.331433] exit_to_user_mode_prepare+0x61/0x8b
[ 247.336606] syscall_exit_to_user_mode+0x26/0x168
[ 247.341877] do_syscall_64+0x63/0x9d
[ 247.345871] ? exit_to_user_mode_prepare+0x3c/0x8b
[ 247.351227] entry_SYSCALL_64_after_hwframe+0x61/0xcb
Then, after adding the (pm_freezing hack) in card_disconnect_sync
[ 247.081334] INFO: task kexec-lite:5568 blocked for more than 122 seconds.
[ 247.088940] Tainted: G U
5.15.79-14932-g6eecac25030c-dirty #6
[ 247.097428] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 247.106198] task:kexec-lite state:D stack: 0 pid: 5568
ppid: 5389 flags:0x00004002
[ 247.115552] Call Trace:
[ 247.118297] <TASK>
[ 247.120651] __schedule+0x5f9/0x1263
[ 247.124655] ? fsnotify_grab_connector+0x5c/0x70
[ 247.129829] schedule+0x8e/0xdb
[ 247.133354] schedule_timeout+0x49/0xf3
[ 247.137655] wait_for_completion+0x86/0xe8
[ 247.142249] snd_card_free+0x68/0x89
[ 247.146258] soc_cleanup_card_resources+0x1b5/0x1d3
[ 247.151725] snd_soc_unbind_card+0xa6/0xfc
[ 247.156314] snd_soc_unregister_card+0x26/0x34
[ 247.161293] release_nodes+0x43/0x62
[ 247.165303] devres_release_all+0x8b/0xc4
[ 247.169797] device_release_driver_internal+0x11a/0x1be
[ 247.175654] bus_remove_device+0xd9/0x103
[ 247.180147] device_del+0x1f7/0x355
[ 247.184058] platform_device_del+0x28/0x8e
[ 247.188649] platform_device_unregister+0x12/0x35
[ 247.193923] snd_sof_device_shutdown+0x57/0x94 [snd_sof
e88f872f6695f09a149f1a182cec1fe8e07d5eea]
[ 247.203858] pci_device_shutdown+0x37/0x61
[ 247.208455] device_shutdown+0x14c/0x1d6
[ 247.212858] kernel_kexec+0x45/0xb9
[ 247.216759] __se_sys_reboot+0x173/0x1f6
[ 247.221157] ? syscall_enter_from_user_mode+0x1a6/0x1ab
[ 247.227016] do_syscall_64+0x55/0x9d
[ 247.231020] ? exit_to_user_mode_prepare+0x3c/0x8b
[ 247.236386] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 247.242048] RIP: 0033:0x7d2d2634a693
[ 247.246056] RSP: 002b:00007ffdceeb6338 EFLAGS: 00000202 ORIG_RAX:
00000000000000a9
[ 247.254532] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007d2d2634a693
[ 247.262520] RDX: 0000000045584543 RSI: 0000000028121969 RDI: 00000000fee1dead
[ 247.270500] RBP: 00007ffdceeb6380 R08: 0000000000000004 R09: 00007ffdceeb8305
[ 247.278489] R10: 00007ffdceeb836a R11: 0000000000000202 R12: 00000000ffffffff
[ 247.286468] R13: 0000000000000004 R14: 00007ffdceeb836a R15: 0000000000000003
[ 247.294458] </TASK>
[ 247.296909] task:init state:D stack: 0 pid: 1
ppid: 0 flags:0x00004006
[ 247.306253] Call Trace:
[ 247.308994] <TASK>
[ 247.311337] __schedule+0x5f9/0x1263
[ 247.315330] ? core_sys_select+0x278/0x343
[ 247.319924] schedule+0x8e/0xdb
[ 247.323445] __refrigerator+0x5e/0x97
[ 247.327537] get_signal+0x5e4/0x5e9
[ 247.331449] arch_do_signal_or_restart+0x4a/0x27a
[ 247.336721] exit_to_user_mode_loop+0x76/0xde
[ 247.341602] exit_to_user_mode_prepare+0x61/0x8b
[ 247.346763] syscall_exit_to_user_mode+0x26/0x168
[ 247.352035] do_syscall_64+0x63/0x9d
[ 247.356043] ? exit_to_user_mode_prepare+0x3c/0x8b
[ 247.361410] entry_SYSCALL_64_after_hwframe+0x61/0xcb
Simply commenting out snd_sof_machine_unregister(sdev, pdata); does
the trick though... will send a new version with just that.
>
> But, the latter seems to be independent from that. If that's the code
> path where the unbind is triggered before kexec, your fix might not
> work, either; it could be already at the wait_event*() when kexec
> starts.
>
> Maybe a simpler workaround would be to replace it with
> wait_event_killable*() stuff. But whether we can discontinue the sync
> there is still another thing to consider...
>
>
> Takashi
>
> >
> > Thanks!
> >
> > >
> > > thanks,
> > >
> > > Takashi
> > >
> > > >
> > > > To: Jaroslav Kysela <perex@...ex.cz>
> > > > To: Takashi Iwai <tiwai@...e.com>
> > > > To: "Rafael J. Wysocki" <rafael@...nel.org>
> > > > To: Pavel Machek <pavel@....cz>
> > > > To: Len Brown <len.brown@...el.com>
> > > > To: Kai Vehmanen <kai.vehmanen@...ux.intel.com>
> > > > To: Ranjani Sridharan <ranjani.sridharan@...ux.intel.com>
> > > > To: Pierre-Louis Bossart <pierre-louis.bossart@...ux.intel.com>
> > > > To: Mark Brown <broonie@...nel.org>
> > > > Cc: alsa-devel@...a-project.org
> > > > Cc: linux-kernel@...r.kernel.org
> > > > Cc: "Joel Fernandes (Google)" <joel@...lfernandes.org>
> > > > Cc: linux-pm@...r.kernel.org
> > > > Signed-off-by: Ricardo Ribalda <ribalda@...omium.org>
> > > > ---
> > > > Changes in v3:
> > > > - Wrap pm_freezing in a function
> > > > - Link to v2: https://lore.kernel.org/r/20221127-snd-freeze-v2-0-d8a425ea9663@chromium.org
> > > >
> > > > Changes in v2:
> > > > - Only use pm_freezing if CONFIG_FREEZER
> > > > - Link to v1: https://lore.kernel.org/r/20221127-snd-freeze-v1-0-57461a366ec2@chromium.org
> > > >
> > > > ---
> > > > Ricardo Ribalda (2):
> > > > freezer: Add processes_frozen()
> > > > ALSA: core: Fix deadlock when shutdown a frozen userspace
> > > >
> > > > include/linux/freezer.h | 2 ++
> > > > kernel/freezer.c | 11 +++++++++++
> > > > sound/core/init.c | 13 +++++++++++++
> > > > 3 files changed, 26 insertions(+)
> > > > ---
> > > > base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> > > > change-id: 20221127-snd-freeze-1ee143228326
> > > >
> > > > Best regards,
> > > > --
> > > > Ricardo Ribalda <ribalda@...omium.org>
> > > >
> >
> >
> >
> > --
> > Ricardo Ribalda
> >
--
Ricardo Ribalda
Powered by blists - more mailing lists