linux-kernel - Re: [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs were programmed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <11e57078-78c7-6f99-8633-e5e945330550@opensource.cirrus.com>
Date:   Tue, 13 Sep 2022 16:30:07 +0100
From:   Richard Fitzgerald <rf@...nsource.cirrus.com>
To:     Pierre-Louis Bossart <pierre-louis.bossart@...ux.intel.com>,
        <vkoul@...nel.org>, <yung-chuan.liao@...ux.intel.com>,
        <sanyog.r.kale@...el.com>
CC:     <patches@...nsource.cirrus.com>, <alsa-devel@...a-project.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs
 were programmed

On 12/09/2022 18:09, Pierre-Louis Bossart wrote:
> 
> 
> On 9/12/22 14:25, Richard Fitzgerald wrote:
>> On 12/09/2022 12:43, Pierre-Louis Bossart wrote:
>>>
>>>
>>> On 9/7/22 10:52, Richard Fitzgerald wrote:
>>>> Only exit sdw_handle_slave_status() right after calling
>>>> sdw_program_device_num() if it actually programmed an ID into at
>>>> least one device.
>>>>
>>>> sdw_handle_slave_status() should protect itself against phantom
>>>> device #0 ATTACHED indications. In that case there is no actual
>>>> device still on #0. The early exit relies on there being a status
>>>> change to ATTACHED on the reprogrammed device to trigger another
>>>> call to sdw_handle_slave_status() which will then handle the status
>>>> of all peripherals. If no device was actually programmed with an
>>>> ID there won't be a new ATTACHED indication. This can lead to the
>>>> status of other peripherals not being handled.
>>>>
>>>> The status passed to sdw_handle_slave_status() is obviously always
>>>> from a point of time in the past, and may indicate accumulated
>>>> unhandled events (depending how the bus manager operates). It's
>>>> possible that a device ID is reprogrammed but the last PING status
>>>> captured state just before that, when it was still reporting on
>>>> ID #0. Then sdw_handle_slave_status() is called with this PING info,
>>>> just before a new PING status is available showing it now on its new
>>>> ID. So sdw_handle_slave_status() will receive a phantom report of a
>>>> device on #0, but it will not find one.
>>>>
>>>> Signed-off-by: Richard Fitzgerald <rf@...nsource.cirrus.com>
>>>> ---
>>>>    drivers/soundwire/bus.c | 27 +++++++++++++++------------
>>>>    1 file changed, 15 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
>>>> index 6e569a875a9b..0bcc2d161eb9 100644
>>>> --- a/drivers/soundwire/bus.c
>>>> +++ b/drivers/soundwire/bus.c
>>>> @@ -736,20 +736,19 @@ static int sdw_program_device_num(struct
>>>> sdw_bus *bus)
>>>>        struct sdw_slave_id id;
>>>>        struct sdw_msg msg;
>>>>        bool found;
>>>> -    int count = 0, ret;
>>>> +    int count = 0, num_programmed = 0, ret;
>>>>        u64 addr;
>>>>          /* No Slave, so use raw xfer api */
>>>>        ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
>>>>                   SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ, buf);
>>>>        if (ret < 0)
>>>> -        return ret;
>>>> +        return 0;
>>>
>>> this doesn't seem quite right to me, there are multiple -EINVAL cases
>>> handled in sdw_fill_msg().
>>>
>>> I didn't check if all these error cases are irrelevant in that specific
>>> enumeration case, if that was the case maybe we need to break that
>>> function in two helpers so that all the checks can be skipped.
>>>
>>
>> I don't think that there's anything useful that
>> sdw_modify_slave_status() could do to recover from an error.
>>
>> If any device IDs were programmed then, according to the statement in
>> sdw_modify_slave_status()
>>
>>      * programming a device number will have side effects,
>>      * so we deal with other devices at a later time
>>
>> if this is true, then we need to exit to deal with what _was_
>> programmed, even if one of them failed.
>>
>> If nothing was programmed, and there was an error, we can't bail out of
>> sdw_modify_slave_status(). We have status for other devices which
>> we can't simply ignore.
>>
>> Ultimately I can't see how pushing the error code up is useful.
>> sdw_modify_slave_status() can't really do any effective recovery action,
>> and the original behavior of giving up and returning means that
>> an error in programming dev ID potentially causes collateral damage to
>> the status of other peripherals.
> 
> I was suggesting something like
> 
> 
> void sdw_fill_msg_data(...)
> {
>    copy data in the msg structure
> }
> 
> int sdw_fill_msg(...)
> {
>      sdw_fill_msg_data();
>      handle_error_cases
> }
> 
> and in sdw sdw_program_device_num() we call directly sdw_fill_msg_data()
> 
> So no change in functionality beyond explicit skip of error checks that
> are not relevant and cannot be handled even if they were.
> 

sdw_fill_msg() will never report an error during
sdw_program_device_num() because the first check is to return if
the address doesn't need paging, and sdw_program_device_num() only
accesses SCP registers.

I don't want to mix coding improvements with bugfixes. Splitting
sdw_fill_msg() isn't needed to fix this bug.

> 
>