netdev - RE: [PATCH 4/6] arm64/io: Provide a WC friendly __iowriteXX

Message-ID: <18248cc6f411441c8a68a55f68416150@AcuMS.aculab.com>
Date: Fri, 23 Feb 2024 13:52:37 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Jason Gunthorpe' <jgg@...dia.com>
CC: 'Niklas Schnelle' <schnelle@...ux.ibm.com>, Alexander Gordeev
	<agordeev@...ux.ibm.com>, Andrew Morton <akpm@...ux-foundation.org>,
	Christian Borntraeger <borntraeger@...ux.ibm.com>, Borislav Petkov
	<bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, "David S. Miller"
	<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Gerald Schaefer
	<gerald.schaefer@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>, "Heiko
 Carstens" <hca@...ux.ibm.com>, "H. Peter Anvin" <hpa@...or.com>, Justin Stitt
	<justinstitt@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Leon Romanovsky
	<leon@...nel.org>, "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
	"linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
	"llvm@...ts.linux.dev" <llvm@...ts.linux.dev>, Ingo Molnar
	<mingo@...hat.com>, Bill Wendling <morbo@...gle.com>, Nathan Chancellor
	<nathan@...nel.org>, Nick Desaulniers <ndesaulniers@...gle.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, Paolo Abeni
	<pabeni@...hat.com>, Salil Mehta <salil.mehta@...wei.com>, Jijie Shao
	<shaojijie@...wei.com>, Sven Schnelle <svens@...ux.ibm.com>, Thomas Gleixner
	<tglx@...utronix.de>, "x86@...nel.org" <x86@...nel.org>, Yisen Zhuang
	<yisen.zhuang@...wei.com>, Arnd Bergmann <arnd@...db.de>, Catalin Marinas
	<catalin.marinas@....com>, Leon Romanovsky <leonro@...lanox.com>,
	"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, Mark Rutland <mark.rutland@....com>,
	Michael Guralnik <michaelgur@...lanox.com>, "patches@...ts.linux.dev"
	<patches@...ts.linux.dev>, Will Deacon <will@...nel.org>
Subject: RE: [PATCH 4/6] arm64/io: Provide a WC friendly __iowriteXX_copy()

From: Jason Gunthorpe
> Sent: 23 February 2024 13:03
> 
> On Fri, Feb 23, 2024 at 12:19:24PM +0000, David Laight wrote:
> 
> > Since writes get 'posted' all over the place.
> > How many writes do you need to do before write-combining makes a
> > difference?
> 
> The issue is that the HW can optimize if the entire transaction is
> presented in one TLP, if it has to reassemble the transaction it takes
> a big slow path hit.

Ah, so you aren't optimising to reduce the number of TLPs for
(effectively) a write to a memory buffer, but have a PCIe slave
that really wants to see (for example) the writes for a ring buffer
entry in a single TLP?

So you really want something that (should) generate a 16 (or 32)
byte TLP, rather than abusing a function that is expected to
generate multiple 8 byte TLPs in order to get a larger TLP?

I'm guessing that on arm64 the ldp/stp instructions will generate
a single 16 byte TLP regardless of write combining?
They would definitely help memcpy_fromio().
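
For illustration, a minimal user-space sketch of the kind of 16-byte
store meant here. store16() is a made-up name, not something from the
patch series, and whether the stp actually reaches the device as one
16-byte TLP depends on the CPU and interconnect, which is exactly the
open question. On non-arm64 builds it falls back to two 8-byte stores
so the example still compiles.

```c
#include <stdint.h>

/*
 * Hypothetical helper (an assumption, not part of the patch series):
 * write two 64-bit values to a 16-byte aligned destination.
 */
static inline void store16(volatile void *dst, uint64_t lo, uint64_t hi)
{
#if defined(__aarch64__)
	/* A single stp instruction stores both registers to [dst]. */
	__asm__ volatile("stp %0, %1, [%2]"
			 :
			 : "r"(lo), "r"(hi), "r"(dst)
			 : "memory");
#else
	/* Fallback for other architectures: two separate 8-byte stores. */
	volatile uint64_t *p = dst;

	p[0] = lo;
	p[1] = hi;
#endif
}
```

On real MMIO, dst would be the mapped device address rather than a
plain memory buffer.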

Are they enough for arm64?
Getting big TLPs on x86 is probably harder.
(Unless you use AVX512 registers and aligned accesses.)
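
A rough user-space sketch of the AVX512 variant: one aligned 64-byte
store from a ZMM register. It assumes a compiler that defines
__AVX512F__ when AVX-512 is enabled and falls back to memcpy()
otherwise. store64_aligned() is a hypothetical name, and nothing here
guarantees the store leaves the CPU as a single TLP. In the kernel it
would also need kernel_fpu_begin()/kernel_fpu_end() around it.

```c
#include <stdint.h>
#include <string.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/*
 * Hypothetical helper (an assumption for illustration only):
 * copy 64 bytes from src to a 64-byte aligned dst.
 */
static inline void store64_aligned(void *dst, const void *src)
{
#if defined(__AVX512F__)
	/* Unaligned load from src, then one aligned 64-byte store. */
	__m512i v = _mm512_loadu_si512(src);

	_mm512_store_si512(dst, v);	/* dst must be 64-byte aligned */
#else
	/* Without AVX-512 the compiler picks the store sizes itself. */
	memcpy(dst, src, 64);
#endif
}
```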

It is rather a shame that there isn't an efficient way to get
access to a couple of large SIMD registers.
(eg save them on the stack and have the fpu code know where they
are for a lazy fpu switch.)
There is quite a bit of code that would benefit, but kernel_fpu_begin()
is just too expensive.

	David
