lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <44e4e70491164ef5b777d06f48b6684f@AcuMS.aculab.com>
Date:   Tue, 15 Jun 2021 14:08:27 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Bin Meng' <bmeng.cn@...il.com>, Gary Guo <gary@...yguo.net>
CC:     Palmer Dabbelt <palmer@...belt.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        "aou@...s.berkeley.edu" <aou@...s.berkeley.edu>,
        "nickhu@...estech.com" <nickhu@...estech.com>,
        "nylon7@...estech.com" <nylon7@...estech.com>,
        "linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] riscv: fix memmove and optimise memcpy when misalign

From: Bin Meng
> Sent: 15 June 2021 14:40
...
> > I prefer C versions as well, and actually before commit 04091d6 we are
> > indeed using the generic C version. The issue is that 04091d6
> > introduces an assembly version that's very broken. It does not offer
> > and performance improvement to the C version, and breaks all processors
> > without hardware misalignment support

There may need to be a few C implementations for different cpu
instruction sets.
While the compiler might manage to DTRT (or the wrong thing given
the right source) using a loop that matches the instruction set
is a good idea.

For instance, x86 can do *(reg_1 + reg_2 * (1|2|4|8) + constant)
so you can increment reg_2 and use it for both buffers while
still unrolling enough to hit memory bandwidth.

With only *(reg_1 + constant) you need to increment both the
source and destination addresses.

OTOH you can save an instruction on x86 by adding to 'reg_2'
until it becomes zero (so you don't need add, cmp and jmp).

But a mips-like instruction set (includes riscv and nios2)
has 'compare and branch' so you only ever need one instruction
at the end of the loop.

Having to handle misaligned copies is another distinct issue.
For some 32bit cpu byte copies may be as fast as any shift
and mask code.

> > (yes, firmware is expected to
> > trap and handle these, but they are painfully slow).

Yes, to the point where the system should just panic and
force you to fix the code.

When I were a lad we forced everyone to fix there code
so it would run on sparc.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ