Date:   Sat, 12 Jun 2021 12:17:22 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Palmer Dabbelt' <palmer@...belt.com>,
        "akira.tsukamoto@...il.com" <akira.tsukamoto@...il.com>
CC:     "akira.tsukamoto@...il.com" <akira.tsukamoto@...il.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        "aou@...s.berkeley.edu" <aou@...s.berkeley.edu>,
        "gary@...yguo.net" <gary@...yguo.net>,
        "nickhu@...estech.com" <nickhu@...estech.com>,
        "nylon7@...estech.com" <nylon7@...estech.com>,
        "linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH 1/1] riscv: prevent pipeline stall in
 __asm_to/copy_from_user

From: Palmer Dabbelt
> Sent: 12 June 2021 05:05
...
> > I don't know the architecture, but unless there is a stunning
> > pipeline delay for memory reads a simple interleaved copy
> > may be fast enough.
> > So something like:
> > 	a = src[0];
> > 	do {
> > 		b = src[1];
> > 		src += 2;
> > 		dst[0] = a;
> > 		dst += 2;
> > 		a = src[0];
> > 		dst[-1] = b;
> > 	} while (src != src_end);
> > 	dst[0] = a;
> >
> > It is probably worth doing benchmarks of the copy loop
> > in userspace.
> 
> I also don't know this microarchitecture, but this seems like a pretty
> wacky load-use delay.

It is quite sane really.

While many cpus can use the result of the ALU in the next clock
(there is typically special logic to bypass the write to the
register file) this isn't always true for memory (cache) reads.
It may even be that the read itself takes more than one cycle
(probably pipelined so they can happen every cycle).

So a simple '*dest = *src' copy loop suffers the 'memory read'
penalty every iteration.
An out-of-order execution unit that uses register renaming
(like most x86) will just defer the writes until the data
is available - so it isn't impacted.
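
Just to spell it out (an illustrative C sketch, not the real
copy code - the function name and the 'unsigned long' access
width are made up here):

	/*
	 * Naive word-at-a-time copy (illustration only): the store
	 * consumes the value loaded by the immediately preceding
	 * instruction, so an in-order core pays the load-use
	 * latency on every iteration.
	 */
	void copy_words(unsigned long *dst, const unsigned long *src,
			const unsigned long *src_end)
	{
		while (src != src_end)
			*dst++ = *src++;
	}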

Interleaving the reads and writes means you issue a read
while waiting for the value from the previous read to
get to the register file - and be available for the
write instruction.

Moving the 'src/dst += 2' into the loop gives a reasonable
chance that they are executed in parallel with a memory
access (on an in-order superscalar cpu) rather than bunching
them up at the end where they start adding clocks.
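
As straight C that interleaved loop (the same as the sketch
quoted above, with a made-up name; alignment and tail handling
are ignored, and 'src_end' here points at the last word, so the
word count is odd) looks like:

	/*
	 * Two-way software-pipelined copy (sketch): each store uses
	 * a value whose load was issued several instructions - or a
	 * whole iteration - earlier, and the pointer increments sit
	 * where an in-order superscalar core can pair them with the
	 * memory accesses.
	 */
	void copy_words_2way(unsigned long *dst, const unsigned long *src,
			     const unsigned long *src_end)
	{
		unsigned long a = src[0], b;

		do {
			b = src[1];
			src += 2;
			dst[0] = a;
			dst += 2;
			a = src[0];
			dst[-1] = b;
		} while (src != src_end);
		dst[0] = a;
	}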

If your cpu can only do one memory read or one memory write
per clock then you only need it to execute two instructions
per clock for the loop above to run at maximum speed.
Even with a 'read latency' of two clocks.
(Especially since riscv has 'mips like' 'compare and branch'
instructions that probably execute in 1 clock when predicted
taken.)
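
Rough count for the 2-way loop above, per iteration (two words
moved), assuming one memory access per clock and dual issue:

	2 loads + 2 stores     -> at least 4 clocks on one port
	2 pointer increments
	1 compare-and-branch
	----------------------
	7 instructions in those 4 clocks, i.e. under 2 per clock

and a schedule like ld/add, st/add, ld, st/branch keeps a
memory access in every clock while each load's value isn't
needed until 3 clocks after it issues.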

If the cpu can do a read and a write in one clock then the
loop may still run at the maximum speed.
For this to happen you do need the read data to be available
next clock and to run load, store, add and compare instructions
in a single clock.
Without that much parallelism it might be necessary to add
an extra read/write interleave (and maybe a 4th to avoid a
divide by three).
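
A 4-way version would be the same pattern again, e.g. (still
just a sketch with a made-up name, 'src_end' pointing at the
last word, a word count of 4n+1 and no tail handling):

	/*
	 * Four-way interleave (sketch): every store uses a value
	 * that was loaded several instructions - or a whole
	 * iteration - earlier, giving each read more time to
	 * reach the register file.
	 */
	void copy_words_4way(unsigned long *dst, const unsigned long *src,
			     const unsigned long *src_end)
	{
		unsigned long a = src[0], b, c, d;

		do {
			b = src[1];
			c = src[2];
			d = src[3];
			src += 4;
			dst[0] = a;
			dst[1] = b;
			dst[2] = c;
			dst += 4;
			a = src[0];
			dst[-1] = d;
		} while (src != src_end);
		dst[0] = a;
	}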

The 'elephant in the room' is a potential additional stall
on reads if the previous cycle is a write to the same cache area.
For instance the nios2 (a soft cpu for altera fpga) can do
back to back reads or back to back writes, but since the reads
are done speculatively (regardless of the opcode!) they have to
be deferred when a write is using the memory block.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
