linux-kernel - Re: [PATCH v3] platform/chrome: Use proper protocol transfer function

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Tue, 7 Nov 2017 11:28:24 +0000
From:   Jon Hunter <jonathanh@...dia.com>
To:     Doug Anderson <dianders@...omium.org>,
        Shawn N <shawnn@...omium.org>
CC:     Olof Johansson <olof@...om.net>,
        Benson Leung <bleung@...omium.org>,
        "Lee Jones" <lee.jones@...aro.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Brian Norris <computersforpeace@...il.com>,
        Brian Norris <briannorris@...omium.org>,
        Gwendal Grignou <gwendal@...omium.org>,
        Enric Balletbo <enric.balletbo@...labora.co.uk>,
        Tomeu Vizoso <tomeu.vizoso@...labora.com>,
        "linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>,
        Alexandru M Stan <amstan@...omium.org>
Subject: Re: [PATCH v3] platform/chrome: Use proper protocol transfer function

On 10/10/17 17:52, Doug Anderson wrote:

...

>> I'm still not clear on why we see an error only on the first
>> transaction after boot. In this case, the embedded controller
>> previously handled host commands from firmware just fine, and the
>> handoff between firmware and the kernel shouldn't be significant, from
>> the EC point of view. If host command timing is consistent from the
>> master, I would expect to see some constant error rate, eg. some
>> chance any host command will fail, rather than the first host command
>> always failing.
> 
> The AP itself is often quite busy at boot and so the timings for
> everything change.  That could easily explain the problems.

Sorry for the delay, but I have finally had some time to look at this a
bit closer. I have been able to track down where the additional delay is
really needed and seems to explain what is going on.

For starters, the SPI chip-select is under h/w control and so the
software delay has no impact on the timing between the chip-select
going active and the transaction starting as I had first thought.

I found that a delay is needed between completing the probe the Tegra
SPI device and the first SPI transaction issued to the EC. In the Tegra
SPI probe the SPI controller is programmed for master mode, but at the
same time it clears the chip-select inactive polarity bit meaning that
initially the SPI chip-select default to active-high (rather that low
which seems odd). I believe that this then drives the chip-select low
(active for the EC) and until it is then configured when spi_setup() is
called which configures it as active-low for the EC.
To get the first transaction to work for the EC there needs to be a
delay after we program the chip-select polarity when spi_setup() is
called. For example ...

diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
index 584367f3a0ed..c1075c3c60c8 100644
--- a/drivers/mfd/cros_ec_spi.c
+++ b/drivers/mfd/cros_ec_spi.c
@@ -648,6 +648,8 @@ static int cros_ec_spi_probe(struct spi_device *spi)
        if (err < 0)
                return err;

+       udelay(100);
+
        ec_spi = devm_kzalloc(dev, sizeof(*ec_spi), GFP_KERNEL);
        if (ec_spi == NULL)
                return -ENOMEM;

You may say why not put a delay in the tegra_spi_setup() itself, but we
have some other SPI devices such as flash devices which are also use an
active-low chip-select which don't have any issues with this to date.
Furthermore, this delay is also probably device specific.

>From an EC perspective, if the chip-select is asserted is there a
turnaround time before it can be asserted again? Or in this case maybe
the issue is that the chip-select is asserted but no transaction occurs
before it is de-asserted and so is causing a problem. Please note that a
delay of around ~50us above still fails but 100us seems to be ok.

Finally, this also works-around the problem by avoiding the chip-select
from being pulsed low by defaulting all chip-selects to active-low but
maybe this just masks the problem.

diff --git a/drivers/spi/spi-tegra114.c b/drivers/spi/spi-tegra114.c
index a76acedd7e2f..7c18204e61d9 100644
--- a/drivers/spi/spi-tegra114.c
+++ b/drivers/spi/spi-tegra114.c
@@ -1117,7 +1117,7 @@ static int tegra_spi_probe(struct platform_device
                goto exit_pm_disable;
        }
-       tspi->def_command1_reg  = SPI_M_S;
+       tspi->def_command1_reg = SPI_M_S | SPI_CS_POL_INACTIVE_MASK;
        tegra_spi_writel(tspi, tspi->def_command1_reg, SPI_COMMAND1);
        pm_runtime_put(&pdev->dev);

Cheers
Jon

--
nvpublic