[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20240724004446.GA21424@workstation.local>
Date: Wed, 24 Jul 2024 09:44:46 +0900
From: Takashi Sakamoto <o-takashi@...amocchi.jp>
To: Edmund Raile <edmund.raile@...ton.me>
Cc: clemens@...isch.de, tiwai@...e.com, alsa-devel@...a-project.org,
linux-sound@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ALSA: firewire-lib: restore process context workqueue to
prevent deadlock
Hi,
Thank you for your posting the patch, however I have some nitpickings
(and some requests to you).
On Thu, Jul 18, 2024 at 11:56:54AM +0000, Edmund Raile wrote:
> Commit b5b519965c4c ("ALSA: firewire-lib: operate for period elapse event
> in process context") removed the process context workqueue from
> amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
> its overhead.
>
> With RME Fireface 800, this lead to a regression since Kernels 5.14.0,
> causing a deadlock with eventual system freeze under ALSA operation:
>
> ? tasklet_unlock_spin_wait
> </NMI>
> <TASK>
> ohci_flush_iso_completions firewire_ohci
> amdtp_domain_stream_pcm_pointer snd_firewire_lib
> snd_pcm_update_hw_ptr0 snd_pcm
> snd_pcm_status64 snd_pcm
>
> ? native_queued_spin_lock_slowpath
> </NMI>
> <IRQ>
> _raw_spin_lock_irqsave
> snd_pcm_period_elapsed snd_pcm
> process_rx_packets snd_firewire_lib
> irq_target_callback snd_firewire_lib
> handle_it_packet firewire_ohci
> context_tasklet firewire_ohci
>
> Restore the work queue to prevent deadlock between ALSA substream
> process context spin_lock of snd_pcm_stream_lock_irq() in snd_pcm_status64()
> and OHCI 1394 IT softIRQ context spin_lock of snd_pcm_stream_lock_irqsave()
> in snd_pcm_period_elapsed().
>
> to reproduce the issue:
> direct ALSA playback to the device:
> mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
> Time to occurrence: 2s to 30m
> Likelihood increased by:
> - high CPU load
> stress --cpu $(nproc)
> - switching between applications via workspaces
> tested with i915 in Xfce
> PulsaAudio / PipeWire conceal the issue as they run PCM substream
> without period wakeup mode, issuing less hardIRQs.
>
> Closes: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
> Fixes: 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context")
> Signed-off-by: Edmund Raile <edmund.raile@...ton.me>
> ---
> This is the follow-up patch to the 5.14.0 regression I reported:
> https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
> ("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with Fireface 800")
>
> Takashi Sakamoto explained the issue in his response to the regression:
> A. In the process context
> * (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in
> snd_pcm_status64()
> * (lock B) Then attempt to enter tasklet
>
> B. In the softIRQ context
> * (lock B) Enter tasklet
> * (lock A) Attempt to acquire spin_lock by snd_pcm_stream_lock_irqsave() in
> snd_pcm_period_elapsed()
>
> This leads me to believe this isn't just an issue limited to the RME Fireface
We are in the merge window to Linux kernel 6.11[1]. I prefer reviewing a
small and trivial patches in the weeks, thus I would like to postpone
applying this kind of patches after releasing -rc1 even if they look
good.
In Linux kernel development, we have the process to apply patches into
released kernels to fix bugs and regressions. As of today, the developers
pay some of their efforts to maintain the following kernels[2]:
* 6.10
* 6.9.10
* 6.6.41
* 6.1.100
* 5.15.163
* 5.10.222
* 5.4.280
* 4.19.318
It is cleared that the issued changes were applied to v5.14 kernel, thus we
should lead the developers to find the posted patches and apply them to
future releases of each version. The process[3] have come into existence
enough before the regression report procedure[4]. I would like you to read
some documentation about the process and add more care for stable kernel
maintainers.
Well, the issued commits are (older at first):
* 7ba5ca32fe6
* b5b519965c4
As long as I can see, these commits can be reverted per each, with a
slight handy-changes. In the case, it is preferable to make a patchset
including these two revert patches. I would like to request it to you,
instead of the all-in-one patch, so that developers easily find the issued
commits (and work to apply these patches into kernels maintained
publicly/locally).
At last, I prefer that the whole patch comment is written by the
posters, instead of referring to comments by the others. I know that the
description about AB/BA deadlock is a bit hard to write, but it is enough
and satisfied for you to write what you understand about the issue. I'd
accept it.
[1] https://lore.kernel.org/lkml/CAHk-=wjV_O2g_K19McjGKrxFxMFDqex+fyGcKc3uac1ft_O2gg@mail.gmail.com/
[2] https://kernel.org/
[3] https://docs.kernel.org/process/stable-kernel-rules.html
[4] https://docs.kernel.org/process/handling-regressions.html
Thanks
Takashi Sakamoto
Powered by blists - more mailing lists