[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c68e5f78-51cc-6b16-dc7d-4540b78e4d2f@linux.intel.com>
Date: Mon, 12 Sep 2022 19:09:00 +0200
From: Pierre-Louis Bossart <pierre-louis.bossart@...ux.intel.com>
To: Richard Fitzgerald <rf@...nsource.cirrus.com>, vkoul@...nel.org,
yung-chuan.liao@...ux.intel.com, sanyog.r.kale@...el.com
Cc: patches@...nsource.cirrus.com, alsa-devel@...a-project.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs
were programmed
On 9/12/22 14:25, Richard Fitzgerald wrote:
> On 12/09/2022 12:43, Pierre-Louis Bossart wrote:
>>
>>
>> On 9/7/22 10:52, Richard Fitzgerald wrote:
>>> Only exit sdw_handle_slave_status() right after calling
>>> sdw_program_device_num() if it actually programmed an ID into at
>>> least one device.
>>>
>>> sdw_handle_slave_status() should protect itself against phantom
>>> device #0 ATTACHED indications. In that case there is no actual
>>> device still on #0. The early exit relies on there being a status
>>> change to ATTACHED on the reprogrammed device to trigger another
>>> call to sdw_handle_slave_status() which will then handle the status
>>> of all peripherals. If no device was actually programmed with an
>>> ID there won't be a new ATTACHED indication. This can lead to the
>>> status of other peripherals not being handled.
>>>
>>> The status passed to sdw_handle_slave_status() is obviously always
>>> from a point of time in the past, and may indicate accumulated
>>> unhandled events (depending how the bus manager operates). It's
>>> possible that a device ID is reprogrammed but the last PING status
>>> captured state just before that, when it was still reporting on
>>> ID #0. Then sdw_handle_slave_status() is called with this PING info,
>>> just before a new PING status is available showing it now on its new
>>> ID. So sdw_handle_slave_status() will receive a phantom report of a
>>> device on #0, but it will not find one.
>>>
>>> Signed-off-by: Richard Fitzgerald <rf@...nsource.cirrus.com>
>>> ---
>>> drivers/soundwire/bus.c | 27 +++++++++++++++------------
>>> 1 file changed, 15 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
>>> index 6e569a875a9b..0bcc2d161eb9 100644
>>> --- a/drivers/soundwire/bus.c
>>> +++ b/drivers/soundwire/bus.c
>>> @@ -736,20 +736,19 @@ static int sdw_program_device_num(struct
>>> sdw_bus *bus)
>>> struct sdw_slave_id id;
>>> struct sdw_msg msg;
>>> bool found;
>>> - int count = 0, ret;
>>> + int count = 0, num_programmed = 0, ret;
>>> u64 addr;
>>> /* No Slave, so use raw xfer api */
>>> ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
>>> SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ, buf);
>>> if (ret < 0)
>>> - return ret;
>>> + return 0;
>>
>> this doesn't seem quite right to me, there are multiple -EINVAL cases
>> handled in sdw_fill_msg().
>>
>> I didn't check if all these error cases are irrelevant in that specific
>> enumeration case, if that was the case maybe we need to break that
>> function in two helpers so that all the checks can be skipped.
>>
>
> I don't think that there's anything useful that
> sdw_modify_slave_status() could do to recover from an error.
>
> If any device IDs were programmed then, according to the statement in
> sdw_modify_slave_status()
>
> * programming a device number will have side effects,
> * so we deal with other devices at a later time
>
> if this is true, then we need to exit to deal with what _was_
> programmed, even if one of them failed.
>
> If nothing was programmed, and there was an error, we can't bail out of
> sdw_modify_slave_status(). We have status for other devices which
> we can't simply ignore.
>
> Ultimately I can't see how pushing the error code up is useful.
> sdw_modify_slave_status() can't really do any effective recovery action,
> and the original behavior of giving up and returning means that
> an error in programming dev ID potentially causes collateral damage to
> the status of other peripherals.
I was suggesting something like
void sdw_fill_msg_data(...)
{
copy data in the msg structure
}
int sdw_fill_msg(...)
{
sdw_fill_msg_data();
handle_error_cases
}
and in sdw sdw_program_device_num() we call directly sdw_fill_msg_data()
So no change in functionality beyond explicit skip of error checks that
are not relevant and cannot be handled even if they were.
Powered by blists - more mailing lists