linux-kernel - Re: [PATCH v3] platform/chrome: Use proper protocol transfer function

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fb73cca6-7760-3360-6061-411dd522839e@nvidia.com>
Date:   Wed, 8 Nov 2017 10:20:30 +0000
From:   Jon Hunter <jonathanh@...dia.com>
To:     Doug Anderson <dianders@...omium.org>
CC:     Shawn N <shawnn@...omium.org>, Olof Johansson <olof@...om.net>,
        "Benson Leung" <bleung@...omium.org>,
        Lee Jones <lee.jones@...aro.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Brian Norris <computersforpeace@...il.com>,
        Brian Norris <briannorris@...omium.org>,
        Gwendal Grignou <gwendal@...omium.org>,
        Enric Balletbo <enric.balletbo@...labora.co.uk>,
        Tomeu Vizoso <tomeu.vizoso@...labora.com>,
        "linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>,
        Alexandru M Stan <amstan@...omium.org>
Subject: Re: [PATCH v3] platform/chrome: Use proper protocol transfer function

Hi Doug,

On 07/11/17 17:22, Doug Anderson wrote:
> On Tue, Nov 7, 2017 at 3:28 AM, Jon Hunter <jonathanh@...dia.com> wrote:
>> On 10/10/17 17:52, Doug Anderson wrote:
>>
>> ...
>>
>>>> I'm still not clear on why we see an error only on the first
>>>> transaction after boot. In this case, the embedded controller
>>>> previously handled host commands from firmware just fine, and the
>>>> handoff between firmware and the kernel shouldn't be significant, from
>>>> the EC point of view. If host command timing is consistent from the
>>>> master, I would expect to see some constant error rate, eg. some
>>>> chance any host command will fail, rather than the first host command
>>>> always failing.
>>>
>>> The AP itself is often quite busy at boot and so the timings for
>>> everything change.  That could easily explain the problems.
>>
>> Sorry for the delay, but I have finally had some time to look at this a
>> bit closer. I have been able to track down where the additional delay is
>> really needed and seems to explain what is going on.
>>
>> For starters, the SPI chip-select is under h/w control and so the
>> software delay has no impact on the timing between the chip-select
>> going active and the transaction starting as I had first thought.
>>
>> I found that a delay is needed between completing the probe the Tegra
>> SPI device and the first SPI transaction issued to the EC. In the Tegra
>> SPI probe the SPI controller is programmed for master mode, but at the
>> same time it clears the chip-select inactive polarity bit meaning that
>> initially the SPI chip-select default to active-high (rather that low
>> which seems odd). I believe that this then drives the chip-select low
>> (active for the EC) and until it is then configured when spi_setup() is
>> called which configures it as active-low for the EC.
>> To get the first transaction to work for the EC there needs to be a
>> delay after we program the chip-select polarity when spi_setup() is
>> called. For example ...
>>
>> diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
>> index 584367f3a0ed..c1075c3c60c8 100644
>> --- a/drivers/mfd/cros_ec_spi.c
>> +++ b/drivers/mfd/cros_ec_spi.c
>> @@ -648,6 +648,8 @@ static int cros_ec_spi_probe(struct spi_device *spi)
>>         if (err < 0)
>>                 return err;
>>
>> +       udelay(100);
>> +
> 
> This isn't totally crazy, but actually you could probably do this:
> 
> ec_spi->last_transfer_ns = ktime_get_ns();
> 
> ...that will leverage already existing code and constants and also
> will avoid doing a delay if it wasn't needed.  You could also then get
> rid of some "if (ec_spi->last_transfer_ns)" tests in the code.  I'd
> support landing that.
> 
> 
>>         ec_spi = devm_kzalloc(dev, sizeof(*ec_spi), GFP_KERNEL);
>>         if (ec_spi == NULL)
>>                 return -ENOMEM;
>>


OK, yes and that does work well too. I will send a patch with the
following for review.

diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
index 584367f3a0ed..477f8e81dc34 100644
--- a/drivers/mfd/cros_ec_spi.c
+++ b/drivers/mfd/cros_ec_spi.c
@@ -671,6 +671,7 @@ static int cros_ec_spi_probe(struct spi_device *spi)
                           sizeof(struct ec_response_get_protocol_info);
        ec_dev->dout_size = sizeof(struct ec_host_request);

+       ec_spi->last_transfer_ns = ktime_get_ns();

        err = cros_ec_register(ec_dev);
        if (err) {

>> You may say why not put a delay in the tegra_spi_setup() itself, but we
>> have some other SPI devices such as flash devices which are also use an
>> active-low chip-select which don't have any issues with this to date.
>> Furthermore, this delay is also probably device specific.
>>
>> From an EC perspective, if the chip-select is asserted is there a
>> turnaround time before it can be asserted again? Or in this case maybe
>> the issue is that the chip-select is asserted but no transaction occurs
>> before it is de-asserted and so is causing a problem. Please note that a
>> delay of around ~50us above still fails but 100us seems to be ok.
> 
> Really nice detective work!
> 
> 
>> Finally, this also works-around the problem by avoiding the chip-select
>> from being pulsed low by defaulting all chip-selects to active-low but
>> maybe this just masks the problem.
>>
>> diff --git a/drivers/spi/spi-tegra114.c b/drivers/spi/spi-tegra114.c
>> index a76acedd7e2f..7c18204e61d9 100644
>> --- a/drivers/spi/spi-tegra114.c
>> +++ b/drivers/spi/spi-tegra114.c
>> @@ -1117,7 +1117,7 @@ static int tegra_spi_probe(struct platform_device
>>                 goto exit_pm_disable;
>>         }
>> -       tspi->def_command1_reg  = SPI_M_S;
>> +       tspi->def_command1_reg = SPI_M_S | SPI_CS_POL_INACTIVE_MASK;
> 
> Note that even though I'd support adding some sort of delay to the
> cros_ec driver, I'd also say that it _might_ make sense to mess with
> the SPI driver too, just to avoid glitching the lines at bootup.  As
> you said, you shouldn't just willy nilly change the default but it
> seems like it could make sense to define an initial (board level)
> pinctrl state that's in effect until the first call to setup_bus().

I am thinking about making that above change as well, because the reset
value of the chip-select polarity bits default to active-low. For
active-high devices I don't think that you can ever avoid the
chip-select asserting for a period after reset as that is the default
setting, but I don't see why we are clearing these bits in probe.

I will do a bit more testing with this to avoid any regressions, but
both changes seem worthwhile IMO.

Cheers
Jon

-- 
nvpublic