linux-kernel - Re: [PATCH v3] soundwire: stream: Prepare ports in parallel to reduce stream start latency

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <319caf4d-a96d-4c25-b912-e00c76ad8038@linux.dev>
Date: Tue, 23 Dec 2025 11:29:29 +0100
From: Pierre-Louis Bossart <pierre-louis.bossart@...ux.dev>
To: Richard Fitzgerald <rf@...nsource.cirrus.com>, vkoul@...nel.org,
 yung-chuan.liao@...ux.intel.com
Cc: linux-sound@...r.kernel.org, linux-kernel@...r.kernel.org,
 patches@...nsource.cirrus.com
Subject: Re: [PATCH v3] soundwire: stream: Prepare ports in parallel to reduce
 stream start latency


>>>>>>> +    val = sdw_read_no_pm(s_rt->slave, SDW_DPN_PREPARESTATUS(p_rt->num));
>>>>>>> +    if (val < 0) {
>>>>>>> +        ret = val;
>>>>>>> +        goto err;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (val & p_rt->ch_mask) {
>>>>>>
>>>>>> Can you explain why we don't use the ch_mask in the already-prepared case? I am missing something.
>>>>>>
>>>>> I'm not sure what you mean here. The if() immediately above your comment
>>>>> uses ch_mask to check the already-prepared state.
>>>>
>>>> I was referring to the 1. above, you read the prepare status without checking for ch_mask first.
>>>>
>>>
>>> What would be the purpose of checking ch_mask before the read?
>>
>> I don't know - why do we need to read it in the second case and not the first is all I am asking.
>>
> 
> What the code is doing is:
> 
> 1. Read the current prepare status
> 2. Check if any channels we prepared are still reporting NOT_PREPARED
> 3. If there are any NOT_PREPARED channels we need to do the wait, else
>    we can skip it

ok, sorry for being thick. The code is just fine, I misread it with the test split in two.

>>>>>>> +        /* Wait for completion on port ready */
>>>>>>> +        port_ready = &s_rt->slave->port_ready[p_rt->num];
>>>>>>> +        wait_for_completion_timeout(port_ready, msecs_to_jiffies(ch_prep_timeout));
>>>>>>
>>>>>> I understand the code is the same as before but would there be any merit in checking the timeout before starting a read? If the device is already in the weeds, doing another read adds even more time before reporting an error.
>>>>>>
>>>>> Do you mean save the system time when the DPN_PREPARE was written to
>>>>> that peripheral and then check here whether the timeout period has
>>>>> already elapsed?
>>>>
>>>> I meant testing the return value of wait_for_completion_timeout(). If you already timed out at this point with a return value of zero, there's no point in checking the status any more, the system is in the weeds.
>>>>
>>>
>>> Wait completion will _always_ timeout because this code is holding the
>>> bus lock, which blocks the ALERT handler from running and signalling
>>> the completion. The wait_for_completion_timeout() is effectively
>>> msleep(msecs_to_jiffies(ch_prep_timeout));
>>> So we have to read the register afterwards to see whether the peripheral
>>> actually prepared.
>>>
>>> I've left the useless wait_for_completion_timeout() in the code so this
>>> commit is only changing what it says it is changing, and nothing else.
>>>
>>> What to do about the deadlocked wait_for_completion_timeout() is a
>>> separate problem.
>>
>> Humm, what happens if you have a single peripheral, does this result in an increase of the prepare time all the way to the timeout value?
>> I can see how preparing all ports in parallel would reduce the total time compared to a serial approach, even with a timeout, but if we end-up always timing out even in the case where there is a single device it'd be quite odd.
> 
> Yes, but that's a separate problem that needs fixing. The current code
> in the kernel does the timeout for every amp. So if you have 4 amps with
> a timeout of 10ms it takes >40ms to do the port prepare.

so is this fair to say that by not testing the return value of wait_for_completion_timeout(), we never detected the 'always-timeout' behavior since the initial contribution circa 2018, even with a single device per link? 

Wow, that'd be a big 7-year old conceptual miss...
 
>> In other words does this patch reduce the start latency only in the case of multiple devices, but adds a 'tax' for all other cases?
> 
> This only affects peripherals that use the full CP_SM (not simplified).
> 
> If you have only 1 peripheral there is either no change or a possibility
> of skipping the wait. Before this change it would always wait the full
> timeout before noticing that it is ready.
> 
> After this change, you might be able to skip the wait. If it prepared
> immediately, so that it is already reporting prepared when the first
> read tests for that, it will skip the wait.
> (BTW I see that happening with CS35L56. It prepares so fast that it
> is already done, without waiting for any of the timeouts.)
> 
> If it isn't ready that early, you get the timeout as before.
> 
> It isn't adding any penalty to single devices, unless you count the
> trivial time it takes to do the register read, which is negligible
> compared to getting deadlocked in the wait_for_completion_timeout().

ok, now I get the point that the use of sdw_acquire_bus_lock() in sdw_prepare_stream() prevents the use of device-initiated interrupts.

One possible change is a departure from the use of the PORT_READY alert. This was meant to be a worst-case scenario for cases where the prepare time is in hundreds of ms - IIRC we thought devices would need to adjust their internal clock to that of the bus, which might change during a prepare operation. But as you mentioned it in practice devices are prepared much faster. So we could have a typical loop with multiple read/sleep instead. See for example cdns_clear_bit() as an example of what I have in mind, classic kernel code really.

That might speed-up the prepare time even for the single device case - and then that may be enough to avoid the deferred check on PORT_READY and make the code simpler for multiple devices as well.

At any rate you're onto something, thanks for submitting this and bearing with my questions.