Message-ID: <eb0cd539-9d76-489a-b5f4-ecef2a6d32dd@csgroup.eu>
Date: Sat, 8 Nov 2025 11:05:04 +0100
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: "Sverdlin, Alexander" <alexander.sverdlin@...mens.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>
Cc: "hui.wang@...onical.com" <hui.wang@...onical.com>,
"mwalle@...nel.org" <mwalle@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"florent.trinh-thai@...soprasteria.com"
<florent.trinh-thai@...soprasteria.com>, "arnd@...db.de" <arnd@...db.de>
Subject: Re: [PATCH] eeprom: at25: convert to spi-mem API
Hi Alexander,
On 07/11/2025 at 12:49, Sverdlin, Alexander wrote:
> Hi Greg, Christophe,
>
> On Wed, 2025-11-05 at 08:20 +0100, Alexander Sverdlin wrote:
>>>>> Replace the RAW SPI accesses with spi-mem API. The latter will fall back to
>>>>> RAW SPI accesses if spi-mem callbacks are not implemented by a controller
>>>>> driver.
>>>>
>>>> With this patch (kernel v6.17.1) our powerpc boards are totally unstable, we
>>>> get multiple random Oops due to bad memory accesses.
>>>>
>>>> With this commit reverted the board is stable again.
>>>>
>>>> The SPI driver is:
>>>>
>>>> CONFIG_SPI=y
>>>> CONFIG_SPI_MASTER=y
>>>> CONFIG_SPI_MEM=y
>>>> CONFIG_SPI_FSL_LIB=y
>>>> CONFIG_SPI_FSL_CPM=y
>>>> CONFIG_SPI_FSL_SPI=y
>>>>
>>>> How can we further investigate the issue ?
>>>
>>> We can revert it until it comes back working properly. Can you send a
>>> revert so that I can apply it?
>>
>> what is known for sure is that it triggers a bug in SPI_FSL_CPM [1],
>> which probably justifies a revert in -stable. But there are no indications the
>> patch in question misbehaves itself as of now. I'm going to KASAN it on all the
>> HW I can get my hands on this week.
>>
>> [1] https://lore.kernel.org/all/764858d5-5633-4aeb-aabe-52f9eb1eb530@csgroup.eu/
>
> just letting you know that I stress-tested the at25 driver with KASAN on two ARM64
> platforms, TI AM62 and i.MX8QXP, in the latter case it's even spi-nxp-fspi driver
> only providing spi-mem API, while the TI SoC goes over normal SPI. Up to now it
> went smoothly.
>
> Christophe, while I'm trying to get my hands on a PPC32 HW similar to yours, would
> you be able to test with CONFIG_DMA_API_DEBUG=y? The fact the KASAN doesn't detect
> anything after the fix to spi-fsl-cpm you've mentioned makes me think the corruption
> is external to CPU core? Will you post your fix to fsl-cpm code?
>
I'm now back from travelling and have been able to dig more into the
problem.

It seems the (only) problem is the buffer overrun that occurs when t->len
is over 256 and odd. It only happens on CPM1 (mpc8xx); you won't see it
on CPM2 (mpc82xx).

Once this overrun is properly fixed, the board is stable again. The
colleague of mine who tried to fix the KASAN report used AI, which led to
a stupid fix: it just truncated the transfer to the next lower even
length, which silenced the KASAN report but created more problems.

The problem was introduced by commit fc96ec826bce ("spi: fsl-cpm: Use 16
bit mode for large transfers with even size"), which fails to check that
the length is even before switching to 16 bits per word. The right fix is:

diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
index 2f2082652a1a2..81d5c3f4eb6fe 100644
--- a/drivers/spi/spi-fsl-spi.c
+++ b/drivers/spi/spi-fsl-spi.c
@@ -335,7 +335,7 @@ static int fsl_spi_prepare_message(struct spi_controller *ctlr,
 			if (t->bits_per_word == 16 || t->bits_per_word == 32)
 				t->bits_per_word = 8; /* pretend its 8 bits */
 			if (t->bits_per_word == 8 && t->len >= 256 &&
-			    (mpc8xxx_spi->flags & SPI_CPM1))
+			    (t->len & 1) == 0 && (mpc8xxx_spi->flags & SPI_CPM1))
 				t->bits_per_word = 16;
 		}
 	}
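
To illustrate why an odd length matters here, below is a minimal standalone
sketch, not driver code. It assumes a simplified model in which a transfer is
moved in whole words, so the byte count is rounded up to a multiple of the
word size. The helper names bytes_touched() and can_use_16bit() are
hypothetical and do not exist in spi-fsl-spi.c.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (assumed model, not taken from the driver):
 * number of bytes touched when a len-byte buffer is moved in whole
 * words of bits_per_word bits each.
 */
static size_t bytes_touched(size_t len, unsigned int bits_per_word)
{
	size_t word_bytes = bits_per_word / 8;
	size_t words = (len + word_bytes - 1) / word_bytes; /* round up */

	return words * word_bytes;
}

/*
 * Mirrors the length part of the condition in the fix: switch to
 * 16 bits per word only for large transfers with an even length.
 */
static bool can_use_16bit(size_t len)
{
	return len >= 256 && (len & 1) == 0;
}
```

Under that model, a 257-byte transfer in 16-bit mode touches 258 bytes, one
past the end of the buffer, while the same transfer in 8-bit mode touches
exactly 257. With the even-length check, 257 bytes stays in 8-bit mode and
only lengths such as 256 or 258 are switched to 16 bits per word.
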

Now I'm trying to understand why the problem surfaced with commit
8ad6249c51d0 ("eeprom: at25: convert to spi-mem API").

Christophe