[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67e2aee8-1ef9-4256-94ca-dbf42f62cbf7@cjdns.fr>
Date: Thu, 30 Oct 2025 00:06:16 +0100
From: Caleb James DeLisle <cjd@...ns.fr>
To: Jonas Gorski <jonas.gorski@...il.com>
Cc: nbd@....name, lorenzo@...nel.org, ryder.lee@...iatek.com,
 shayne.chen@...iatek.com, sean.wang@...iatek.com, matthias.bgg@...il.com,
 angelogioacchino.delregno@...labora.com, linux-wireless@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-mediatek@...ts.infradead.org,
 Daniel Golle <daniel@...rotopia.org>
Subject: Re: [PATCH] wifi: mt76: mmio_(read|write)_copy byte swap when on Big
 Endian
On 29/10/2025 21:40, Caleb James DeLisle wrote:
>
> On 29/10/2025 21:12, Jonas Gorski wrote:
>> On Wed, Oct 29, 2025 at 4:24 PM Caleb James DeLisle <cjd@...ns.fr> 
>> wrote:
>>>
>>> On 29/10/2025 10:15, Jonas Gorski wrote:
>>>> On Tue, Oct 28, 2025 at 10:42 PM Caleb James DeLisle <cjd@...ns.fr> 
>>>> wrote:
>>>>> On 28/10/2025 21:19, Jonas Gorski wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Mon, Oct 27, 2025 at 6:19 PM Caleb James DeLisle 
>>>>>> <cjd@...ns.fr> wrote:
>>>>>>> When on a Big Endian machine, PCI swaps words to/from LE when
>>>>>>> reading/writing them. This presents a problem when we're trying
>>>>>>> to copy an opaque byte array such as firmware or encryption key.
>>>>>>>
>>>>>>> Byte-swapping during copy results in two swaps, but solves the
>>>>>>> problem.
>>>>>>>
>>>>>>> Fixes:
>>>>>>> mt76x2e 0000:02:00.0: ROM patch build: 20141115060606a
>>>>>>> mt76x2e 0000:02:00.0: Firmware Version: 0.0.00
>>>>>>> mt76x2e 0000:02:00.0: Build: 1
>>>>>>> mt76x2e 0000:02:00.0: Build Time: 201607111443____
>>>>>>> mt76x2e 0000:02:00.0: Firmware failed to start
>>>>>>> mt76x2e 0000:02:00.0: probe with driver mt76x2e failed with 
>>>>>>> error -145
>>>>>>>
>>>>>>> Signed-off-by: Caleb James DeLisle <cjd@...ns.fr>
>>>>>>> ---
>>>>>>>     drivers/net/wireless/mediatek/mt76/mmio.c | 34 
>>>>>>> +++++++++++++++++++++++
>>>>>>>     1 file changed, 34 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/net/wireless/mediatek/mt76/mmio.c 
>>>>>>> b/drivers/net/wireless/mediatek/mt76/mmio.c
>>>>>>> index cd2e9737c3bf..776dbaacc8a3 100644
>>>>>>> --- a/drivers/net/wireless/mediatek/mt76/mmio.c
>>>>>>> +++ b/drivers/net/wireless/mediatek/mt76/mmio.c
>>>>>>> @@ -30,15 +30,49 @@ static u32 mt76_mmio_rmw(struct mt76_dev 
>>>>>>> *dev, u32 offset, u32 mask, u32 val)
>>>>>>>            return val;
>>>>>>>     }
>>>>>>>
>>>>>>> +static void mt76_mmio_write_copy_portable(void __iomem *dst,
>>>>>>> +                                         const u8 *src, int len)
>>>>>>> +{
>>>>>>> +       __le32 val;
>>>>>>> +       int i = 0;
>>>>>>> +
>>>>>>> +       for (i = 0; i < ALIGN(len, 4); i += 4) {
>>>>>>> +               memcpy(&val, src + i, sizeof(val));
>>>>>>> +               writel(cpu_to_le32(val), dst + i);
>>>>>>> +       }
>>>>>>> +}
>>>>>>> +
>>>>>>>     static void mt76_mmio_write_copy(struct mt76_dev *dev, u32 
>>>>>>> offset,
>>>>>>>                                     const void *data, int len)
>>>>>>>     {
>>>>>>> +       if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) {
>>>>>>> + mt76_mmio_write_copy_portable(dev->mmio.regs + offset, data,
>>>>>>> +                                             len);
>>>>>>> +               return;
>>>>>>> +       }
>>>>>>>            __iowrite32_copy(dev->mmio.regs + offset, data, 
>>>>>>> DIV_ROUND_UP(len, 4));
>>>>>> Maybe just replace this with memcpy_toio() which does no swapping at
>>>>>> all instead of double swapping on BE?
>>>>> I'm not that informed about how PCI works so I had to test to confirm
>>>>> my understanding, but I can confirm that memcpy_toio() does not solve
>>>>> the problem.
>>>> Ah, right, I misread _iowrite32_copy() to do conversion to LE, but 
>>>> it doesn't.
>>>>
>>>> What architecture is this you have? PowerPC? ARM? MIPS? 32 bit? 64 
>>>> bit?
>>>
>>> MIPS32 (EcoNet EN751221 34Kc)
>>>
>>>
>>>> So the differences I see are:
>>>>
>>>> 1. __iowrite32_copy() uses __raw_writel(), which has different memory
>>>> semantics than writel()
>>>> 2. __iowrite32_copy() assumed src is aligned to 32 bit, while you
>>>> explicitly align it
>>>> 3. memcpy_toio() will handle unaligned src properly, but does 64 bit
>>>> accesses on 64 bit systems (and uses __raw* again).
>>>>
>>>> Is src aligned? If not, then the issue might be 2. And if your system
>>>> is 64 bit, it would explain why 3 didn't help.
>>>
>>> I'm not a regular developer of mt76 so I wasn't sure if that was
>>> guaranteed and I just wanted to code for safety.
>>>
>>> After reviewing the code, I see that there are a few places where
>>> mt76_mmio_write_copy is being called with stack-allocated u8 arrays
>>> so it's pretty clear to me that this is being treated as a memcpy-like
>>> function and we should be handling unaligned inputs.
>>>
>>>
>>>> As a first step you could try to replace the writel(cpu_to_le32(val)
>>>> with a iowrite32be(val, ...) which should do the same except avoiding
>>>> the doubled byte swapping. If that works, you can try to replace it
>>> This works.
>>>
>>> These symbols are a bit of a nightmare to trace, so I ended up making
>>> an .i file so I could confirm what's happening.
>>>
>>> iowrite32be() uses the version in iomap.c so I understand that's using
>>> writel(swab32(val),port), so a writel with an unconditional byte swap.
>>> writel() is more complicated, it's an inline function that is generated
>>> in a rat's nest of preprocessor macros in mips/include/asm/io.h
>>>
>>> The preprocessed is this:
>>>
>>> __mem = (void *)((unsigned long)(mem)); __val = (val); if (sizeof(u32)
>>> != sizeof(u64) || sizeof(u64) == sizeof(long)) { *__mem = __val;
>>>
>>> The source is this:
>>>
>>>       __mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem));    \
>>>       __val = pfx##ioswab##bwlq(__mem, val);                \
>>>       if (sizeof(type) != sizeof(u64) || sizeof(u64) == sizeof(long)) \
>>>           *__mem = __val;                        \
>>>
>>> The line "pfx##ioswab##bwlq(__mem, val);" is ioswabl() and the source
>>> of that explains the issue:
>>>
>>>    * Sane hardware offers swapping of PCI/ISA I/O space accesses in 
>>> hardware;
>>>    * less sane hardware forces software to fiddle with this...
>>>
>>> So this confirms my initial understanding, the PCI hardware is doing 
>>> the
>>> byte swapping unconditionally.
>>>
>>>
>>>> with __raw_writel(), which then would make this the same as
>>>> __iowrite32_copy, except making sure that src is aligned.
>>>
>>> This fails.
>>>
>>> Since I'm the maintainer of this SoC and it's still fairly new, I wrote
>>> a trivial kmod to verify that unaligned access is not just silently
>>> returning trash, it works as though it were aligned so alignment is
>>> not the issue.
>>>
>>>
>>>> Also you could replace your memcpy() with get_unaligned((u32 *)(src +
>>>> i)); Should do the same but inline.
>>> Good idea, I will do this.
>>>>> The issue as I understand it is that rather than making every driver
>>>>> carefully call cpu_to_le*() every MMIO write, someone decided to make
>>>>> the PCI host bridge itself transparently byte-swap all MMIO on the
>>>>> wire. Since most MMIO is hitting registers and most buffers are
>>>>> transferred by DMA, for the most part everything works and nobody
>>>>> notices.
>>>>>
>>>>> But in the rare case that we need to write a blob to MMIO, it gets
>>>>> transparently swapped in hardware so you need to use cpu_to_le in 
>>>>> that
>>>>> case. Doing a search of ./drivers for write.*cpu_to_le I can see this
>>>>> issue comes up a bit.
>>>> Every (PCI) driver does conversion to LE implicitly by using
>>>> writel/readl (or iowrite32/ioread32) etc do the conversion to/from LE.
>>>> So writel(foo, dst )is a __raw_writel(cpu_to_le32(foo), dst) etc. PCI
>>>> memory is assumed to be in LE. If you are on a little endian system,
>>>> then no byte swapping happens, and on BE it will do byte swapping
>>>> before writing the value.
>>> Okay so it seems that in the case of MIPS, that's not always how it
>>> works.
>>>
>>> https://github.com/torvalds/linux/blob/e53642b87a4f4b03a8d7e5f8507fc3cd0c595ea6/arch/mips/include/asm/mach-generic/mangle-port.h#L17 
>>>
>>>
>>> Since we don't know if the swap will happen in hardware or software
>>> and (AFAIK) this is not a hot enough path that double-swapping will
>>> have a notable performance penalty, I think the most sane thing to
>>> do is use writel(cpu_to_le32()) and not care if it's swapped back
>>> in the kernel or hardware.
>> Oh, I think I see what it happening here. ECONET is a Big Endian MIPS
>> platform, but does not select SWAP_IO_SPACE (most other BE platforms
>> do).
>>
>> Does that mean the PCI space is swapped in hardware there?
>>
>> I guess that means that anything that uses __raw accessors to PCI
>> space directly or indirectly is broken, as the raw data is now
>> actually in the wrong order and needs to be swab'd.
>>
>> I don't know if it is a good idea to change this in __iowrite32_copy()
>> / __ioread32_copy() (and other helpers), or if there are drivers that
>> use it on non-PCI spaces and would be broken by that.
>>
>> If there is a way, I would suggest disabling hardware conversion and
>> selecting SWAP_IO_SPACE, but that will affect a lot of your code that
>> assumes that writel() etc don't convert to/from little endian.
>
>
> I can look around in the hardware registers and see if I can shut it
> off for EcoNet, but if you're saying MT76 should not support BE unless
> they disable hardware swapping and use SWAP_IO_SPACE, that means the
> majority of BE hardware on OpenWrt is not going to be supported. If
> that's the decision then it at least warrants clear documentation.
Update: I tested an MT7921 USB and it works fine with the patched
driver, I checked the codepath and it is never used for USB anyway,
it's only for PCI and mt7628-wmac, a direct-wired wlan on the MT7628
which is Little Endian.
Also there's no way I can switch this to use SWAP_IO_SPACE because it
uses mtk-xhci for USB and that expects writel() to not swap bytes.
I don't see why this patch should be controversial - it fixes a bug
that breaks 6 OpenWrt platforms and there's no evidence that it breaks
anything. Not only that but even if these platforms are considered low
priority because Big Endian is a historical artifact, that's all the
more reason why nobody should mind the addition of a cpu_to_le32()
call which on Little Endian is a no-op.
In any case, I would appreciate if you could look at my v2 because
I switched to using (get|put)_unaligned_le32() which is much nicer.
Thank you for the advice on that point.
Thanks,
Caleb
>
> Thanks,
> Caleb
>
>
> user@...-dev:~/en7526/openwrt$ find ./ -name 'config-6.12' | while 
> read x; do grep -q 'CPU_BIG_ENDIAN=y' "$x" && ( grep -q 
> 'SWAP_IO_SPACE=y' "$x" || echo "$x
>  does not use SWAP_IO_SPACE" ) ; done
> ./target/linux/apm821xx/config-6.12 does not use SWAP_IO_SPACE
> ./target/linux/realtek/rtl931x/config-6.12 does not use SWAP_IO_SPACE
> ./target/linux/realtek/rtl930x_nand/config-6.12 does not use 
> SWAP_IO_SPACE
> ./target/linux/realtek/rtl839x/config-6.12 does not use SWAP_IO_SPACE
> ./target/linux/realtek/rtl931x_nand/config-6.12 does not use 
> SWAP_IO_SPACE
> ./target/linux/realtek/rtl930x/config-6.12 does not use SWAP_IO_SPACE
> ./target/linux/realtek/rtl838x/config-6.12 does not use SWAP_IO_SPACE
> ./target/linux/ath79/config-6.12 does not use SWAP_IO_SPACE
> ./target/linux/octeon/config-6.12 does not use SWAP_IO_SPACE
> ./target/linux/ixp4xx/config-6.12 does not use SWAP_IO_SPACE
> ./target/linux/econet/en751221/config-6.12 does not use SWAP_IO_SPACE
> user@...-dev:~/en7526/openwrt$ find ./ -name 'config-6.12' | while 
> read x; do grep -q 'CPU_BIG_ENDIAN=y' "$x" && ( grep -q 
> 'SWAP_IO_SPACE=y' "$x" && echo "$x
>  does use SWAP_IO_SPACE" ) ; done
> ./target/linux/bmips/bcm6358/config-6.12 does use SWAP_IO_SPACE
> ./target/linux/bmips/bcm6328/config-6.12 does use SWAP_IO_SPACE
> ./target/linux/bmips/bcm6318/config-6.12 does use SWAP_IO_SPACE
> ./target/linux/bmips/bcm6368/config-6.12 does use SWAP_IO_SPACE
> ./target/linux/bmips/bcm6362/config-6.12 does use SWAP_IO_SPACE
> ./target/linux/bmips/bcm63268/config-6.12 does use SWAP_IO_SPACE
> ./target/linux/lantiq/config-6.12 does use SWAP_IO_SPACE
> user@...-dev:~/en7526/openwrt$
>
>
>>
>> Best regards,
>> Jonas
>>
Powered by blists - more mailing lists
 
