lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <83b60736-3856-4804-b25d-d0c8cbf9b185@app.fastmail.com>
Date: Fri, 31 Jan 2025 10:37:55 +0100
From: "Arnd Bergmann" <arnd@...db.de>
To: "Segher Boessenkool" <segher@...nel.crashing.org>,
 "Julian Vetter" <julian@...er-limits.org>
Cc: "Madhavan Srinivasan" <maddy@...ux.ibm.com>,
 "Michael Ellerman" <mpe@...erman.id.au>,
 "Nicholas Piggin" <npiggin@...il.com>,
 "Christophe Leroy" <christophe.leroy@...roup.eu>,
 "Naveen N Rao" <naveen@...nel.org>, linuxppc-dev@...ts.ozlabs.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] powerpc: Remove eieio() in PowerPC IO functions

On Wed, Jan 29, 2025, at 12:14, Segher Boessenkool wrote:
> On Wed, Jan 29, 2025 at 10:45:10AM +0100, Julian Vetter wrote:
>> Remove the eieio() calls in IO functions for PowerPC. While other
>> architectures permit prefetching, combining, and reordering, the eieio()
>> calls on PowerPC prevent such optimizations.
>
> Yes, and it is crucial to prevent combining, it is part of the semantics
> of these functions.  This is a much bigger problem on PowerPC than on
> architectures which optimise memory accesses much less.  So most other
> archs can get away with it much easier (but it is still completely wrong
> there).

Unfortunately it's not well documented what the actual behavior is
supposed to be across architectures, so part of the work that Julian
is doing is to make them behave consistently. My impression is that
we actually do want combining (and reordering) here in memcpy_fromio,
unless there are specific drivers that depend on the non-combining
behavior. My earlier observations towards this are:

- memcpy_fromio() is almost always used on RAM behind a bus, not MMIO
  registers.

- there has been a push towards using instruction sequences on arm64
  make sure we get the most (write) combining for the I/O string functions
  to get better performance

- There is only an eieio in powerpc memcpy_fromio() but no barrier
  inside the memcpy_toio() or memset_io() loops. If it was necessary
  to prevent combining, this would likely be for both load and store
  loops here.

- I tried to trace the history of memcpy_fromio() and found that
  it was originally just a loop around readl(), which at the time
  was eieio() and a pointer dereference and byteswap. The readl()
  definition has changed many times after that, but memcpy_fromio()
  never changed again, which makes it most likely that this was
  just forgotten in the conversion rather than intentional.

If you can think of callers of memcpy_fromio() that depend on the
eieio(), we probably need to fix other architectures as well and
go back to Julian's original idea of adding a new barrier into the
common code and have an architecture specific definition for that
barrier.

For the insb()/insw()/insl() case, I think you are correct that
the eieio is required on powerpc and likely others as well, since
the CPU combining multiple reads on a single address into a single
one would clearly break the concept.
On Arm and other architectures that prevent combining of MMIO
reads based on page table flags, we don't need a barrier here,
but there is a good chance that these barriers are in fact
missing on some alpha and some mips implementations of
readsl() etc: The alpha ioread32_rep() and insl() have barriers
inside, but it also uses the generic readsl() which does not.

> You are keeping the trap;isync things, which a) have a way bigger
> performance impact, and b) are merely a debugging aid (if some i/o
> access kills the system, it will be clear where that came from).  And
> that isn't even the biggest thing of course, there is a heavyweight
> sync in there as well.

The barriers around memcpy_toio()/memcpy_fromio() are a separate
question that we should address as well. I somehow thought that
other architectures had the same barriers as readl/writel around
the string functions, but it does seem like it's only powerpc
at this point.

Intuitively I would have expected that a memcpy_toio() etc
has the same barriers as a writel() around it for consistency, on
the other hand it's only powerpc that has these, and if the
functions are indeed only meant to work on RAM behind MMIO,
they would not not have to serialize against DMA the same way
that writel does because there are no side-effects.

I'm still looking for more insights on that, but if we can
agree that memcpy_{from,to}io don't need the outer barriers,
it seems best to address this at the same time as the internal
eieio() in memcpy_fromio(), provided we agree on removing
those as well.

      Arnd

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ