Message-ID: <20250715115200.GJ2067380@nvidia.com>
Date: Tue, 15 Jul 2025 08:52:00 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Will Deacon <will@...nel.org>
Cc: Catalin Marinas <catalin.marinas@....com>,
	Alexander Gordeev <agordeev@...ux.ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christian Borntraeger <borntraeger@...ux.ibm.com>,
	Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
	Vasily Gorbik <gor@...ux.ibm.com>,
	Heiko Carstens <hca@...ux.ibm.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Justin Stitt <justinstitt@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Leon Romanovsky <leon@...nel.org>,
	linux-rdma@...r.kernel.org, linux-s390@...r.kernel.org,
	llvm@...ts.linux.dev, Ingo Molnar <mingo@...hat.com>,
	Bill Wendling <morbo@...gle.com>,
	Nathan Chancellor <nathan@...nel.org>,
	Nick Desaulniers <ndesaulniers@...gle.com>, netdev@...r.kernel.org,
	Paolo Abeni <pabeni@...hat.com>,
	Salil Mehta <salil.mehta@...wei.com>,
	Sven Schnelle <svens@...ux.ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
	Yisen Zhuang <yisen.zhuang@...wei.com>,
	Arnd Bergmann <arnd@...db.de>,
	Leon Romanovsky <leonro@...lanox.com>, linux-arch@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org,
	Mark Rutland <mark.rutland@....com>,
	Michael Guralnik <michaelgur@...lanox.com>, patches@...ts.linux.dev,
	Niklas Schnelle <schnelle@...ux.ibm.com>,
	Jijie Shao <shaojijie@...wei.com>
Subject: Re: [PATCH v3 6/6] IB/mlx5: Use __iowrite64_copy() for write
 combining stores

On Tue, Jul 15, 2025 at 11:15:25AM +0100, Will Deacon wrote:
> > Since STP was already rejected we've only tested the Neon version. It
> > does make a huge improvement, but it still occasionally fails to
> > combine. The CPU is really bad at this :(
> 
> I think the thread was from last year so I've forgotten most of the
> details, but wasn't STP rejected because it wasn't virtualisable? 

Yes, that was the claim.

> In which case, doesn't NEON suffer from exactly the same (or possibly
> worse) problem?

In general yes, in specific no.

mlx5 (and other RDMA devices) have long used Neon for MMIO in
userspace, so any VMM assigning mlx5 devices simply must make this
work - it is already not optional. So we know that all VMs out there
with mlx5 support Neon for mlx5, and it is safe for mlx5 to use.

Typically this is trivially done in a VMM by never emulating mlx5's
MMIO space. If the VMM takes a fault on an MMIO page it fixes the fault
and restarts the Neon instruction.

The general concern was that there could be other devices in a VM
that are fully emulated, and using these challenging instructions
would break the simple emulation. This is why the general purpose
__iowrite64_copy() didn't use STP.
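
As an illustration of the distinction, here is a minimal userspace sketch
of a copy loop that issues one plain 64-bit store per element, with no
paired (STP) or vector (Neon) stores. The name sketch_iowrite64_copy() and
the use of an ordinary volatile pointer in place of a real MMIO mapping are
assumptions made for this example; it is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a generic 64-bit MMIO copy. Each element is
 * written with a separate 64-bit store through a volatile pointer, so the
 * compiler may not drop or reorder the individual stores.
 * Illustrative only: a real driver would use the kernel's I/O accessors.
 */
static void sketch_iowrite64_copy(volatile uint64_t *to,
                                  const uint64_t *from, size_t count)
{
    assert(to != NULL && from != NULL);

    for (size_t i = 0; i < count; i++)
        to[i] = from[i];
}
```

A loop like this leaves any combining entirely up to the CPU, which is the
trade-off being discussed: it is simple for an emulating VMM to handle, but
gives the hardware less help in forming a single 64-byte write.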

> Also, have you managed to investigate why the CPU tends not to get this
> right? 

I have asked; our CPU architects said it is too complex to analyze,
though they admit it doesn't work entirely well :(

The belief is that some micro-architectural condition is breaking it,
as we see even Neon instructions failing during every test.

They say it is fully fixed with ST64B in the future.

> Do we e.g. end up taking interrupts/exceptions while the self
> test is running or something like that?

I doubt it; the test runs in kernel mode during boot for hundreds of
iterations. An interrupt on every iteration is not likely. Any single
successful combine is a pass for the test.
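
The pass criterion described above can be sketched as a loop that stops at
the first success. This is a minimal illustration, not the mlx5 self test:
wc_self_test(), the wc_attempt_fn callback and the toy probe are
hypothetical names, and how a real attempt detects a combined write is
left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns true if one attempt was seen to combine. */
typedef bool (*wc_attempt_fn)(void *ctx);

/*
 * Run up to 'iterations' attempts. Any single successful combine is a
 * pass, so return as soon as one attempt succeeds.
 */
static bool wc_self_test(wc_attempt_fn attempt, void *ctx, size_t iterations)
{
    assert(attempt != NULL);

    for (size_t i = 0; i < iterations; i++) {
        if (attempt(ctx))
            return true;
    }
    return false;
}

/* Toy probe for illustration: succeeds only on call number 'succeed_on'. */
struct toy_probe {
    size_t calls;
    size_t succeed_on;
};

static bool toy_attempt(void *ctx)
{
    struct toy_probe *p = ctx;

    p->calls++;
    return p->calls == p->succeed_on;
}
```

With this criterion an interrupt would have to disturb every one of the
hundreds of attempts to turn a working setup into a failure, which is why
interrupts are an unlikely explanation.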

Even an interrupt shouldn't disrupt a single-instruction Neon store,
yet we can still measure a low rate of Neon failures.

> Sorry for the wall of questions!

No worries! It's weird and definitely complicated.

Jason
