linux-kernel - Re: [PATCH v3] soundwire: stream: Prepare ports in parallel to reduce stream start latency

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <028ae5fc-5ce1-4fbb-a09c-88700f2d77b1@opensource.cirrus.com>
Date: Tue, 23 Dec 2025 10:47:29 +0000
From: Richard Fitzgerald <rf@...nsource.cirrus.com>
To: Pierre-Louis Bossart <pierre-louis.bossart@...ux.dev>, vkoul@...nel.org,
        yung-chuan.liao@...ux.intel.com
Cc: linux-sound@...r.kernel.org, linux-kernel@...r.kernel.org,
        patches@...nsource.cirrus.com
Subject: Re: [PATCH v3] soundwire: stream: Prepare ports in parallel to reduce
 stream start latency

On 23/12/25 10:29, Pierre-Louis Bossart wrote:
<SNIP>

>>> I can see how preparing all ports in parallel would reduce the total time compared to a serial approach, even with a timeout, but if we end-up always timing out even in the case where there is a single device it'd be quite odd.
>>
>> Yes, but that's a separate problem that needs fixing. The current code
>> in the kernel does the timeout for every amp. So if you have 4 amps with
>> a timeout of 10ms it takes >40ms to do the port prepare.
> 
> so is this fair to say that by not testing the return value of wait_for_completion_timeout(), we never detected the 'always-timeout' behavior since the initial contribution circa 2018, even with a single device per link?
> 
> Wow, that'd be a big 7-year old conceptual miss...

It's just  a design flaw in the code. It wasn't found because it was
tested against Realtek codecs, which use the simplified CP_SM so the
code can skip the Prepare stage.

The original code checked for timeout and then always failed, because it
always timed out. So none of the Cirrus amps and codecs worked. I sent
the "temporary" workaround of reading the status register after the
wait_for_completion() and ignoring the timeout if the device is now
reporting prepared, until I could fix it properly. I've only just got
around to looking at this again.

>>> In other words does this patch reduce the start latency only in the case of multiple devices, but adds a 'tax' for all other cases?
>>
>> This only affects peripherals that use the full CP_SM (not simplified).
>>
>> If you have only 1 peripheral there is either no change or a possibility
>> of skipping the wait. Before this change it would always wait the full
>> timeout before noticing that it is ready.
>>
>> After this change, you might be able to skip the wait. If it prepared
>> immediately, so that it is already reporting prepared when the first
>> read tests for that, it will skip the wait.
>> (BTW I see that happening with CS35L56. It prepares so fast that it
>> is already done, without waiting for any of the timeouts.)
>>
>> If it isn't ready that early, you get the timeout as before.
>>
>> It isn't adding any penalty to single devices, unless you count the
>> trivial time it takes to do the register read, which is negligible
>> compared to getting deadlocked in the wait_for_completion_timeout().
> 
> ok, now I get the point that the use of sdw_acquire_bus_lock() in sdw_prepare_stream() prevents the use of device-initiated interrupts.
> 
> One possible change is a departure from the use of the PORT_READY alert. This was meant to be a worst-case scenario for cases where the prepare time is in hundreds of ms - IIRC we thought devices would need to adjust their internal clock to that of the bus, which might change during a prepare operation. But as you mentioned it in practice devices are prepared much faster. So we could have a typical loop with multiple read/sleep instead. See for example cdns_clear_bit() as an example of what I have in mind, classic kernel code really.
> 
> That might speed-up the prepare time even for the single device case - and then that may be enough to avoid the deferred check on PORT_READY and make the code simpler for multiple devices as well.
>

I've got an experimental patch to replace the
wait_for_completion_timeout() with a polling loop. It's a brute-force
approach but reduces the overhead without the scary rewrite of the bus
locking.

> At any rate you're onto something, thanks for submitting this and bearing with my questions.

I need to send a new version of this patch anyway because it will
need rebasing onto 
https://lore.kernel.org/linux-sound/20251219173830.407-1-niranjan.hy@ti.com/T/#u
That one is fixing a problem so makes sense to take it first.

I'll improve the commenting in V2 to explain better what the code
is doing.