linux-kernel - Re: [PATCH 3/8] mfd: ocelot: rework SPI (re-)initialization after chip reset

Message-ID: <87a599dsy7.fsf@prevas.dk>
Date: Tue, 25 Mar 2025 16:35:12 +0100
From: Rasmus Villemoes <ravi@...vas.dk>
To: Colin Foster <colin.foster@...advantage.com>
Cc: Lee Jones <lee@...nel.org>,  linux-kernel@...r.kernel.org,
  devicetree@...r.kernel.org,  Felix Blix Everberg <felix.blix@...vas.dk>
Subject: Re: [PATCH 3/8] mfd: ocelot: rework SPI (re-)initialization after
 chip reset

On Sat, Mar 22 2025, Colin Foster <colin.foster@...advantage.com> wrote:

> On Thu, Mar 20, 2025 at 12:17:37PM +0100, Rasmus Villemoes wrote:
>> Hi Colin
>> 
>> On Wed, Mar 19 2025, Colin Foster <colin.foster@...advantage.com> wrote:
>> 
>> > On Wed, Mar 19, 2025 at 01:30:53PM +0100, Rasmus Villemoes wrote:
>> >> As the comments in ocelot-spi.c explain, after a chip reset, the
>> >> CFGSTAT register must be written again setting the appropriate number
>> >> of padding bytes; otherwise reads are not reliable.
>> >> 
>> >> However, the way the code is currently structured violates that: After
>> >> the BIT_SOFT_CHIP_RST is written, ocelot_chip_reset() immediately
>> >> enters a readx_poll_timeout().
>> >
>> > I ran this new version and everything worked - and I've not seen an
>> > issue in previous versions. I'm looking for guidance as to whether this
>> > should include a Fixes tag and be backported.
>> 
>> Thanks a lot for testing and reviewing! As for backporting, IDK, I think
>> we'd at least first have to know that it really fixes a bug for somebody.
>> 
>> > Great find, by the way! Is there any information you would like from my
>> > setup?
>> 
>> Certainly I'd like to know if you do in fact use a SPI clock > 500 kHz?
>
> Yep, looks like 2.5MHz
>
> &spi0 {
>         #address-cells = <1>;
>         #size-cells = <0>;
>         status = "okay";
>
>         soc@0 {
>                 compatible = "mscc,vsc7512";
>                 spi-max-frequency = <2500000>;
>
>> 
>> And if so, could you try inserting a read and printk of e.g. CHIP_REGS.CHIP_ID
>> immediately after the fsleep(), but before the re-initialization, just
>> so we can see if my theory that the values are off-by-8-bits plus 8 bits
>> of MISO "garbage" is correct? Because that register should have a fairly
>> easily recognizable value.
>
> diff --git a/drivers/mfd/ocelot-core.c b/drivers/mfd/ocelot-core.c
> index c00d30dbfca8..5a2762b6ecac 100644
> --- a/drivers/mfd/ocelot-core.c
> +++ b/drivers/mfd/ocelot-core.c
> @@ -115,6 +115,8 @@ static int ocelot_chip_reset(struct device *dev)
>
>         if (ddata->init_bus) {
>                 fsleep(VSC7512_GCB_RST_SLEEP_US);
> +               regmap_read(ddata->gcb_regmap, 0, &val);
> +               printk("7512 Chip ID after sleep: 0x%08x\n", val);
>                 ret = ddata->init_bus(dev);
>                 if (ret)
>                         return dev_err_probe(dev, ret,
>
>
> Prints out this:
>
> [    3.360986] 7512 Chip ID after sleep: 0xf0e94051
>
> That doesn't seem right. I added a print after init and it makes more sense.
>
> [    3.351656] 7512 Chip ID after sleep: 0xf0e94051
> [    3.356828] 7512 Chip ID after init: 0x175140e9

Thanks for testing. I hadn't realized that the SPI bus init also sets
the endianness, but this clearly shows both the off-by-one-byte shift
and that the bytes arrive in the wrong order.
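
To make that concrete, here is a rough sketch of how the two effects
would combine. It is only a model of the numbers in this thread, not
driver code: the helper name is hypothetical, and it assumes that the
first byte of the value is lost, one garbage byte is clocked in at the
end, and the four bytes are then assembled in reversed order.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the ocelot driver.
 *
 * Models a 32-bit read done before the bus is re-initialized:
 *  1. the value is shifted by one byte, so its first byte is lost and
 *     a garbage byte is appended at the end (assumed mechanism);
 *  2. the resulting four bytes are interpreted in reversed order.
 */
static uint32_t simulate_unconfigured_read(uint32_t correct, uint8_t garbage)
{
	/* Step 1: drop the most significant byte, append the garbage byte. */
	uint32_t shifted = (correct << 8) | garbage;

	/* Step 2: reverse the byte order of the shifted word. */
	return ((shifted & 0x000000ffu) << 24) |
	       ((shifted & 0x0000ff00u) << 8)  |
	       ((shifted & 0x00ff0000u) >> 8)  |
	       ((shifted & 0xff000000u) >> 24);
}
```

Under those assumptions, simulate_unconfigured_read(0x175140e9, 0xf0)
gives 0xf0e94051, matching the value printed after the sleep.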

It's hard to know where that 0xf0 garbage byte comes from; I'd have
expected either all-1s or all-0s once MISO is no longer actively
driven. A wild guess is leftover capacitance (the last actually-driven
bit is 1). That could also explain why you haven't had a problem when
reading the reset register and expecting all zeroes: in that case the
device only sends 0s, so the garbage byte also ends up being 0x00.

So yes, it does seem like this warrants a backport. I'll add a Fixes tag
for the next iteration, plus a link to this thread which demonstrates
the problem. I suppose this goes back to f3e89362.

Thanks,
Rasmus