Message-ID: <20260109125337.62466956@pumpkin>
Date: Fri, 9 Jan 2026 12:53:37 +0000
From: David Laight <david.laight.linux@...il.com>
To: "Maciej W. Rozycki" <macro@...am.me.uk>
Cc: Ziming Du <duziming2@...wei.com>, Bjorn Helgaas <bhelgaas@...gle.com>,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
liuyongqiang13@...wei.com
Subject: Re: [PATCH v3 3/3] PCI/sysfs: Prohibit unaligned access to I/O port on non-x86
On Fri, 9 Jan 2026 00:38:05 +0000 (GMT)
"Maciej W. Rozycki" <macro@...am.me.uk> wrote:
> On Thu, 8 Jan 2026, David Laight wrote:
>
> > I'm not sure it makes any real sense for x86 either.
>
> FWIW I agree.
The interface could have allowed arbitrary transfers and split them into
aligned bus cycles.
That would let you hexdump an io bar (useful for diagnostics, but some
reads end up being destructive - caveat emptor).
But it doesn't....
It is also of limited use because (IIRC) you can't access the device this
way once a driver has mapped the bar.
The driver can, of course, support applications using mmap() to directly
access device memory over PCIe.
(The difficulty is mmap() of kernel memory allocated with dma_alloc_coherent().)
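As a rough illustration of "split them into aligned bus cycles" above, here
is a minimal sketch in plain C. It is not the kernel's sysfs code; the names
io_chunk and split_aligned() are made up for this example, and it assumes
accesses of 1, 2 or 4 bytes only.

```c
#include <stddef.h>
#include <stdint.h>

/* One naturally aligned bus access (hypothetical type for this sketch). */
struct io_chunk {
	uint32_t off;	/* byte offset into the BAR */
	uint8_t width;	/* access size in bytes: 1, 2 or 4 */
};

/*
 * Split the range [off, off + len) into naturally aligned 1-, 2- and
 * 4-byte accesses. At most 'max' chunks are stored in 'out'; the return
 * value is the number of chunks the range needs.
 */
static size_t split_aligned(uint32_t off, size_t len,
			    struct io_chunk *out, size_t max)
{
	size_t n = 0;

	while (len > 0) {
		uint8_t width = 4;

		/* Shrink until the access is aligned and fits in what is left. */
		while ((off & (width - 1)) || width > len)
			width >>= 1;

		if (n < max) {
			out[n].off = off;
			out[n].width = width;
		}
		n++;
		off += width;
		len -= width;
	}
	return n;
}
```

With this, a 6-byte request at offset 1 becomes four accesses: one byte at 1,
two bytes at 2, two bytes at 4 and one byte at 6.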
> > IIRC io space is just like memory space, so a 16bit io access looks the
> > same as two 8bit accesses to an 8bit device (some put the 'data fifo' on
> > addresses 0 and 1 so the code could use 16bit io accesses to speed things up).
>
> Huh? A 16-bit port I/O access will have the byte enables set accordingly
> on PCI and the target device's data lines are driven accordingly in the
> data cycle. Just as with MMIO; it's just a different bus command (or TLP
> type for PCIe).
I was going back to the historic implementations - like 8086 and 8088.
For PCIe I know it is all different: the CPU just generates read/write TLPs
with the required byte enables and the TLP type marked 'io'.
(I can't remember whether IO writes end up 'not posted' - failed to find it in
a 5cm thick book on my shelf.)
> There's no data FIFO or anything as exotic in normal hardware to drive or
> collect data for port I/O accesses wider than 8 bits. Some peripheral
> hardware may ignore byte enables though to simplify logic and e.g. assume
> that all port I/O or MMIO accesses are of a certain width, such as 16-bit
> or 32-bit.
Not to mention an FPGA fabric that converted the TLP for an (aligned) 32bit
access into a pair of 32bit accesses - one of which had no byte enables set!
That confused the hardware engineers, who had ignored the byte enables...
> > The same will have applied to misaligned accesses.
>
> Misaligned accesses may or may not have to be split depending on whether
> they span the data bus width boundary or not. E.g. a 16-bit access to
> port I/O location 1 won't be split on 32-bit PCI as it fits on the bus:
> byte enables #1 and #2 will be driven active and byte enables #0 and #3
> will be left inactive. Conversely such an access to location 3 needs to
> be split into two cycles, with byte enables #3 and #0 only driven active
> respectively in the first and the second cycle.
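The byte-enable patterns described in the quoted paragraph can be worked out
with a few lines of C. This is only a sketch for a 32-bit bus:
pci32_byte_enables() is a made-up helper, and a set bit here means "lane
active" (the real C/BE# signals are active low).

```c
#include <stdint.h>

/*
 * Byte enables for an access of 'width' bytes (1..4) at 'addr' on a
 * 32-bit data bus. Bit n of be[0] / be[1] is set when byte lane n is
 * active in the first / second bus cycle. Returns the number of cycles.
 */
static int pci32_byte_enables(uint32_t addr, unsigned int width,
			      uint8_t be[2])
{
	unsigned int lane = addr & 3;			/* first byte lane used */
	unsigned int mask = ((1u << width) - 1) << lane;	/* up to 7 bits */

	be[0] = mask & 0xf;		/* lanes within the first dword */
	be[1] = (mask >> 4) & 0xf;	/* lanes spilling into the next dword */

	return be[1] ? 2 : 1;
}
```

A 16-bit access at location 1 gives one cycle with lanes 1 and 2 active
(mask 0x6). The same access at location 3 gives two cycles, lane 3 (0x8) in
the first and lane 0 (0x1) in the second.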
And on PCIe (which is 64bit) a misaligned transfer that crosses a 64bit
boundary generates a single 16 byte TLP (not sure about page boundaries).
> The x86 BIU will do the split automatically for port I/O instructions as
> will some other CPU architectures that use memory access instructions to
> reach the PCI port I/O decoding window in their memory address space (this
> is a simplified view, as the split may have to be done in the chipset when
> passing the boundary between data buses of a different width each).
Yes, and I don't understand the HAS_IOPORT option.
Pretty much only x86 has separate instructions, but a lot of others will
have PCI/PCIe interface logic that can convert cpu memory accesses into
'io' accesses - so the pci_map_bar() should be able to transparently map
an io bar into kernel address space.
So x86 should be the outlier because it can't do that!
Even the strongarm system I used years ago has an address window that
generated 'io' cycles on a pcmcia bus.
I think a host PCI/PCIe interface could do io accesses for the bottom
64k of its memory window - but I don't know any that work that way.
>
> With other architectures such as MIPS designated instructions need to be
> used to drive the byte enables by hand for individual partial accesses in
> a split access, and the remaining architectures cannot drive some of the
> byte-enable patterns needed for such split accesses at all (and do masking
> in software instead for unaligned accesses to regular memory).
>
> > But, in reality, all device registers are aligned.
>
> True, sometimes beyond their width too.
Which means you don't want the multiple cycles that happen when someone
marks a structure as 'packed' even though it is completely aligned.
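To show why 'packed' matters even for a fully aligned layout, here is a small
sketch using a hypothetical two-register block and the GCC/Clang packed
attribute. The struct and function names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block; every field is already naturally aligned. */
struct regs_natural {
	uint32_t ctrl;
	uint32_t status;
};

/* Identical layout, but 'packed' drops the assumed alignment to 1. */
struct regs_packed {
	uint32_t ctrl;
	uint32_t status;
} __attribute__((packed));

/* Alignment the compiler may assume when accessing each layout. */
static size_t natural_align(void)
{
	return _Alignof(struct regs_natural);
}

static size_t packed_align(void)
{
	return _Alignof(struct regs_packed);
}
```

Size and field offsets are the same for both, but the packed version has
alignment 1, so on a target without unaligned loads and stores the compiler
may have to split each 32-bit field access into several narrower ones.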
David
>
> Maciej
>