lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8c055a1d-e8f5-6d23-18c4-cb87d95bbc5a@lucaceresoli.net>
Date:   Wed, 19 Aug 2020 18:32:24 +0200
From:   Luca Ceresoli <luca@...aceresoli.net>
To:     Tom Rix <trix@...hat.com>, linux-fpga@...r.kernel.org
Cc:     Moritz Fischer <mdf@...nel.org>,
        Michal Simek <michal.simek@...inx.com>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Anatolij Gustschin <agust@...x.de>, linux-gpio@...r.kernel.org,
        Linus Walleij <linus.walleij@...aro.org>,
        Bartosz Golaszewski <bgolaszewski@...libre.com>
Subject: Re: [PATCH 2/3] fpga manager: xilinx-spi: provide better diagnostics
 on programming failure

On 18/08/20 16:21, Tom Rix wrote:
> 
> On 8/18/20 3:20 AM, Luca Ceresoli wrote:
>> [a question for GPIO maintainers below]
>>
>> Hi Tom,
>>
>> thanks for your review!
>>
>> On 17/08/20 20:15, Tom Rix wrote:
>>> The other two patches are fine.
>>>
>>> On 8/17/20 9:59 AM, Luca Ceresoli wrote:
>>>> When the DONE pin does not go high after programming to confirm programming
>>>> success, the INIT_B pin provides some info on the reason. Use it if
>>>> available to provide a more explanatory error message.
>>>>
>>>> Signed-off-by: Luca Ceresoli <luca@...aceresoli.net>
>>>> ---
>>>>  drivers/fpga/xilinx-spi.c | 11 ++++++++++-
>>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/fpga/xilinx-spi.c b/drivers/fpga/xilinx-spi.c
>>>> index 502fae0d1d85..2aa942bb1114 100644
>>>> --- a/drivers/fpga/xilinx-spi.c
>>>> +++ b/drivers/fpga/xilinx-spi.c
>>>> @@ -169,7 +169,16 @@ static int xilinx_spi_write_complete(struct fpga_manager *mgr,
>>>>  			return xilinx_spi_apply_cclk_cycles(conf);
>>>>  	}
>>>>  
>>>> -	dev_err(&mgr->dev, "Timeout after config data transfer.\n");
>>>> +	if (conf->init_b) {
>>>> +		int init_b_asserted = gpiod_get_value(conf->init_b);
>>> gpiod_get_value can fail. So maybe need split the first statement.
>>>
>>> init_b_asserted < 0 ? "invalid device"
>>>
>>> As the if-else statement is getting complicated, embedding the ? : makes this hard to read.  'if,else if, else' would be better.
>> Thanks for the heads up. However I'm not sure which is the best thing to
>> do here.
>>
>> First, I've been reading the libgpiod code after your email and yes, the
>> libgpiod code _could_ return runtime errors received from the gpiochip
>> driver, even though the docs state:
>>
>>> The get/set calls do not return errors because “invalid GPIO”> should have been reported earlier from gpiod_direction_*().
>> (https://www.kernel.org/doc/html/latest/driver-api/gpio/consumer.html)
>>
>> On the other hand there are plenty of calls to gpiod_get/set_value in
>> the kernel that don't check for error values. I guess this is because
>> failures getting/setting a GPIO are very uncommon (perhaps impossible
>> with platform GPIO).
>>
>> When still a GPIO get/set operation fails I'm not sure adding thousands
>> of error-checking code lines in hundreds of drivers is the best way to
>> go. I feel like we should have a unique, noisy dev_err() in the error
>> path in libgpio but I was surprised in not finding any [1].
>>
>> Linus, Bartosz, what's your opinion? Should all drivers check for errors
>> after every gpiod_[sg]et_value*() call?
> 
> My opinion is that you know the driver / hw is in a bad state and you
> 
> are trying to convey useful information.  So you should
> 
> be as careful as possible and not assume gpio did not fail.

This patch aims at providing better diagnostics after programming has
already gone bad. Neglecting an error might lead to a misleading error
message, but this doesn't lead programming to fail -- it has failed already.

On the other hand a gpiod_get/set_value() call might fail earlier, along
the normal execution path, and lead to real failures without an error
message emitted after the gpiod call that failed.

Which doesn't mean I'm against your proposal of adding error checking
code. Rather, if we want error checking, we want it mainly in other
places: at the very least at the first usage of each of the GPIOs, maybe
at each usage. Have a look at the beginning of
xilinx_spi_write_complete() [0] for example: if gpiod_get_value() fails
there the driver would think programming has been successfully completed
(DONE asserted). To me this is worse than just printing the wrong error
message.

[0]
https://elixir.bootlin.com/linux/v5.8.2/source/drivers/fpga/xilinx-spi.c#L114

-- 
Luca

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ