lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <88d0b61ca01d4c1a87a17d3c34baa2a6@AcuMS.aculab.com>
Date:   Sun, 25 Jun 2023 21:42:07 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Evan Green' <evan@...osinc.com>,
        Palmer Dabbelt <palmer@...osinc.com>
CC:     Simon Hosie <shosie@...osinc.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Alexandre Ghiti <alexghiti@...osinc.com>,
        Andrew Jones <ajones@...tanamicro.com>,
        Andy Chiu <andy.chiu@...ive.com>,
        Anup Patel <apatel@...tanamicro.com>,
        Conor Dooley <conor.dooley@...rochip.com>,
        Greentime Hu <greentime.hu@...ive.com>,
        Guo Ren <guoren@...nel.org>,
        "Heiko Stuebner" <heiko.stuebner@...ll.eu>,
        Jisheng Zhang <jszhang@...nel.org>,
        Jonathan Corbet <corbet@....net>,
        Ley Foon Tan <leyfoon.tan@...rfivetech.com>,
        Li Zhengyu <lizhengyu3@...wei.com>,
        "Masahiro Yamada" <masahiroy@...nel.org>,
        Palmer Dabbelt <palmer@...belt.com>,
        "Paul Walmsley" <paul.walmsley@...ive.com>,
        Sia Jee Heng <jeeheng.sia@...rfivetech.com>,
        Sunil V L <sunilvl@...tanamicro.com>,
        Xianting Tian <xianting.tian@...ux.alibaba.com>,
        Yangyu Chen <cyy@...self.name>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>
Subject: RE: [PATCH 1/2] RISC-V: Probe for unaligned access speed

From: Evan Green
> Sent: 23 June 2023 23:20
> 
> Rather than deferring misaligned access speed determinations to a vendor
> function, let's probe them and find out how fast they are. If we
> determine that a misaligned word access is faster than N byte accesses,
> mark the hardware's misaligned access as "fast".
> 
> Fix the documentation as well to reflect this bar. Previously the only
> SoC that returned "fast" was the THead C906. The change to the
> documentation is more a clarification, since the C906 is fast in the
> sense of the corrected documentation.
> 
> Signed-off-by: Evan Green <evan@...osinc.com>
> ---
...
> diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> index 19165ebd82ba..710325751766 100644
> --- a/Documentation/riscv/hwprobe.rst
> +++ b/Documentation/riscv/hwprobe.rst
> @@ -88,12 +88,12 @@ The following keys are defined:
>      always extremely slow.
> 
>    * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned accesses are supported
> -    in hardware, but are slower than the cooresponding aligned accesses
> -    sequences.
> +    in hardware, but are slower than N byte accesses, where N is the native
> +    word size.
> 
>    * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned accesses are supported
> -    in hardware and are faster than the cooresponding aligned accesses
> -    sequences.
> +    in hardware and are faster than N byte accesses, where N is the native
> +    word size.

I think I'd just say 'faster/slower than using byte accesses' (ie no N).

There are two obvious FAST cases:
1) the misaligned access takes an extra clock - worth aligning copies.
2) the misaligned access is pretty much as fast as an aligned one.

Even if you find it hard to distinguish them you should probably
allow for them both.

x86 (on Intel (non-atom) cpu) is definitely in the latter camp.
Misaligned copies are measurable slower - but not enough to
ever worry about.
I think that misaligned transfers get spilt into 8 byte accesses
(pretty irrelevant in the kernel) and then accesses that cross
cache line boundaries are split on the boundary.
With pipelined writes and two reads/clock it doesn't often
make a measurable difference.
That is definitely what I see for uncached accesses to PCIe space.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ