lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOiHx=kfAwLzuoP2Y-AnGz4GysmszXq-f_et0rgd1j0thYv4Ew@mail.gmail.com>
Date: Wed, 29 Oct 2025 21:12:10 +0100
From: Jonas Gorski <jonas.gorski@...il.com>
To: Caleb James DeLisle <cjd@...ns.fr>
Cc: nbd@....name, lorenzo@...nel.org, ryder.lee@...iatek.com, 
	shayne.chen@...iatek.com, sean.wang@...iatek.com, matthias.bgg@...il.com, 
	angelogioacchino.delregno@...labora.com, linux-wireless@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-mediatek@...ts.infradead.org, 
	Daniel Golle <daniel@...rotopia.org>
Subject: Re: [PATCH] wifi: mt76: mmio_(read|write)_copy byte swap when on Big Endian

On Wed, Oct 29, 2025 at 4:24 PM Caleb James DeLisle <cjd@...ns.fr> wrote:
>
>
> On 29/10/2025 10:15, Jonas Gorski wrote:
> > On Tue, Oct 28, 2025 at 10:42 PM Caleb James DeLisle <cjd@...ns.fr> wrote:
> >>
> >> On 28/10/2025 21:19, Jonas Gorski wrote:
> >>> Hi,
> >>>
> >>> On Mon, Oct 27, 2025 at 6:19 PM Caleb James DeLisle <cjd@...ns.fr> wrote:
> >>>> When on a Big Endian machine, PCI swaps words to/from LE when
> >>>> reading/writing them. This presents a problem when we're trying
> >>>> to copy an opaque byte array such as firmware or encryption key.
> >>>>
> >>>> Byte-swapping during copy results in two swaps, but solves the
> >>>> problem.
> >>>>
> >>>> Fixes:
> >>>> mt76x2e 0000:02:00.0: ROM patch build: 20141115060606a
> >>>> mt76x2e 0000:02:00.0: Firmware Version: 0.0.00
> >>>> mt76x2e 0000:02:00.0: Build: 1
> >>>> mt76x2e 0000:02:00.0: Build Time: 201607111443____
> >>>> mt76x2e 0000:02:00.0: Firmware failed to start
> >>>> mt76x2e 0000:02:00.0: probe with driver mt76x2e failed with error -145
> >>>>
> >>>> Signed-off-by: Caleb James DeLisle <cjd@...ns.fr>
> >>>> ---
> >>>>    drivers/net/wireless/mediatek/mt76/mmio.c | 34 +++++++++++++++++++++++
> >>>>    1 file changed, 34 insertions(+)
> >>>>
> >>>> diff --git a/drivers/net/wireless/mediatek/mt76/mmio.c b/drivers/net/wireless/mediatek/mt76/mmio.c
> >>>> index cd2e9737c3bf..776dbaacc8a3 100644
> >>>> --- a/drivers/net/wireless/mediatek/mt76/mmio.c
> >>>> +++ b/drivers/net/wireless/mediatek/mt76/mmio.c
> >>>> @@ -30,15 +30,49 @@ static u32 mt76_mmio_rmw(struct mt76_dev *dev, u32 offset, u32 mask, u32 val)
> >>>>           return val;
> >>>>    }
> >>>>
> >>>> +static void mt76_mmio_write_copy_portable(void __iomem *dst,
> >>>> +                                         const u8 *src, int len)
> >>>> +{
> >>>> +       __le32 val;
> >>>> +       int i = 0;
> >>>> +
> >>>> +       for (i = 0; i < ALIGN(len, 4); i += 4) {
> >>>> +               memcpy(&val, src + i, sizeof(val));
> >>>> +               writel(cpu_to_le32(val), dst + i);
> >>>> +       }
> >>>> +}
> >>>> +
> >>>>    static void mt76_mmio_write_copy(struct mt76_dev *dev, u32 offset,
> >>>>                                    const void *data, int len)
> >>>>    {
> >>>> +       if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) {
> >>>> +               mt76_mmio_write_copy_portable(dev->mmio.regs + offset, data,
> >>>> +                                             len);
> >>>> +               return;
> >>>> +       }
> >>>>           __iowrite32_copy(dev->mmio.regs + offset, data, DIV_ROUND_UP(len, 4));
> >>> Maybe just replace this with memcpy_toio() which does no swapping at
> >>> all instead of double swapping on BE?
> >>
> >> I'm not that informed about how PCI works so I had to test to confirm
> >> my understanding, but I can confirm that memcpy_toio() does not solve
> >> the problem.
> > Ah, right, I misread _iowrite32_copy() to do conversion to LE, but it doesn't.
> >
> > What architecture is this you have? PowerPC? ARM? MIPS? 32 bit? 64 bit?
>
>
> MIPS32 (EcoNet EN751221 34Kc)
>
>
> >
> > So the differences I see are:
> >
> > 1. __iowrite32_copy() uses __raw_writel(), which has different memory
> > semantics than writel()
> > 2. __iowrite32_copy() assumed src is aligned to 32 bit, while you
> > explicitly align it
> > 3. memcpy_toio() will handle unaligned src properly, but does 64 bit
> > accesses on 64 bit systems (and uses __raw* again).
> >
> > Is src aligned? If not, then the issue might be 2. And if your system
> > is 64 bit, it would explain why 3 didn't help.
>
>
> I'm not a regular developer of mt76 so I wasn't sure if that was
> guaranteed and I just wanted to code for safety.
>
> After reviewing the code, I see that there are a few places where
> mt76_mmio_write_copy is being called with stack-allocated u8 arrays
> so it's pretty clear to me that this is being treated as a memcpy-like
> function and we should be handling unaligned inputs.
>
>
> >
> > As a first step you could try to replace the writel(cpu_to_le32(val)
> > with a iowrite32be(val, ...) which should do the same except avoiding
> > the doubled byte swapping. If that works, you can try to replace it
>
> This works.
>
> These symbols are a bit of a nightmare to trace, so I ended up making
> an .i file so I could confirm what's happening.
>
> iowrite32be() uses the version in iomap.c so I understand that's using
> writel(swab32(val),port), so a writel with an unconditional byte swap.

>
> writel() is more complicated, it's an inline function that is generated
> in a rat's nest of preprocessor macros in mips/include/asm/io.h
>
> The preprocessed is this:
>
> __mem = (void *)((unsigned long)(mem)); __val = (val); if (sizeof(u32)
> != sizeof(u64) || sizeof(u64) == sizeof(long)) { *__mem = __val;
>
> The source is this:
>
>      __mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem));    \
>      __val = pfx##ioswab##bwlq(__mem, val);                \
>      if (sizeof(type) != sizeof(u64) || sizeof(u64) == sizeof(long)) \
>          *__mem = __val;                        \
>
> The line "pfx##ioswab##bwlq(__mem, val);" is ioswabl() and the source
> of that explains the issue:
>
>   * Sane hardware offers swapping of PCI/ISA I/O space accesses in hardware;
>   * less sane hardware forces software to fiddle with this...
>
> So this confirms my initial understanding, the PCI hardware is doing the
> byte swapping unconditionally.
>
>
> > with __raw_writel(), which then would make this the same as
> > __iowrite32_copy, except making sure that src is aligned.
>
>
> This fails.
>
> Since I'm the maintainer of this SoC and it's still fairly new, I wrote
> a trivial kmod to verify that unaligned access is not just silently
> returning trash, it works as though it were aligned so alignment is
> not the issue.
>
>
> >
> > Also you could replace your memcpy() with get_unaligned((u32 *)(src +
> > i)); Should do the same but inline.
> Good idea, I will do this.
> >> The issue as I understand it is that rather than making every driver
> >> carefully call cpu_to_le*() every MMIO write, someone decided to make
> >> the PCI host bridge itself transparently byte-swap all MMIO on the
> >> wire. Since most MMIO is hitting registers and most buffers are
> >> transferred by DMA, for the most part everything works and nobody
> >> notices.
> >>
> >> But in the rare case that we need to write a blob to MMIO, it gets
> >> transparently swapped in hardware so you need to use cpu_to_le in that
> >> case. Doing a search of ./drivers for write.*cpu_to_le I can see this
> >> issue comes up a bit.
> > Every (PCI) driver does conversion to LE implicitly by using
> > writel/readl (or iowrite32/ioread32) etc do the conversion to/from LE.
> > So writel(foo, dst )is a __raw_writel(cpu_to_le32(foo), dst) etc. PCI
> > memory is assumed to be in LE. If you are on a little endian system,
> > then no byte swapping happens, and on BE it will do byte swapping
> > before writing the value.
>
> Okay so it seems that in the case of MIPS, that's not always how it
> works.
>
> https://github.com/torvalds/linux/blob/e53642b87a4f4b03a8d7e5f8507fc3cd0c595ea6/arch/mips/include/asm/mach-generic/mangle-port.h#L17
>
> Since we don't know if the swap will happen in hardware or software
> and (AFAIK) this is not a hot enough path that double-swapping will
> have a notable performance penalty, I think the most sane thing to
> do is use writel(cpu_to_le32()) and not care if it's swapped back
> in the kernel or hardware.

Oh, I think I see what it happening here. ECONET is a Big Endian MIPS
platform, but does not select SWAP_IO_SPACE (most other BE platforms
do).

Does that mean the PCI space is swapped in hardware there?

I guess that means that anything that uses __raw accessors to PCI
space directly or indirectly is broken, as the raw data is now
actually in the wrong order and needs to be swab'd.

I don't know if it is a good idea to change this in __iowrite32_copy()
/ __ioread32_copy() (and other helpers), or if there are drivers that
use it on non-PCI spaces and would be broken by that.

If there is a way, I would suggest disabling hardware conversion and
selecting SWAP_IO_SPACE, but that will affect a lot of your code that
assumes that writel() etc don't convert to/from little endian.

Best regards,
Jonas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ