linux-kernel - RE: [PATCH 1/2] RISC-V: Probe for unaligned access speed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <88d0b61ca01d4c1a87a17d3c34baa2a6@AcuMS.aculab.com>
Date:   Sun, 25 Jun 2023 21:42:07 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Evan Green' <evan@...osinc.com>,
        Palmer Dabbelt <palmer@...osinc.com>
CC:     Simon Hosie <shosie@...osinc.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Alexandre Ghiti <alexghiti@...osinc.com>,
        Andrew Jones <ajones@...tanamicro.com>,
        Andy Chiu <andy.chiu@...ive.com>,
        Anup Patel <apatel@...tanamicro.com>,
        Conor Dooley <conor.dooley@...rochip.com>,
        Greentime Hu <greentime.hu@...ive.com>,
        Guo Ren <guoren@...nel.org>,
        "Heiko Stuebner" <heiko.stuebner@...ll.eu>,
        Jisheng Zhang <jszhang@...nel.org>,
        Jonathan Corbet <corbet@....net>,
        Ley Foon Tan <leyfoon.tan@...rfivetech.com>,
        Li Zhengyu <lizhengyu3@...wei.com>,
        "Masahiro Yamada" <masahiroy@...nel.org>,
        Palmer Dabbelt <palmer@...belt.com>,
        "Paul Walmsley" <paul.walmsley@...ive.com>,
        Sia Jee Heng <jeeheng.sia@...rfivetech.com>,
        Sunil V L <sunilvl@...tanamicro.com>,
        Xianting Tian <xianting.tian@...ux.alibaba.com>,
        Yangyu Chen <cyy@...self.name>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>
Subject: RE: [PATCH 1/2] RISC-V: Probe for unaligned access speed

From: Evan Green
> Sent: 23 June 2023 23:20
> 
> Rather than deferring misaligned access speed determinations to a vendor
> function, let's probe them and find out how fast they are. If we
> determine that a misaligned word access is faster than N byte accesses,
> mark the hardware's misaligned access as "fast".
> 
> Fix the documentation as well to reflect this bar. Previously the only
> SoC that returned "fast" was the THead C906. The change to the
> documentation is more a clarification, since the C906 is fast in the
> sense of the corrected documentation.
> 
> Signed-off-by: Evan Green <evan@...osinc.com>
> ---
...
> diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> index 19165ebd82ba..710325751766 100644
> --- a/Documentation/riscv/hwprobe.rst
> +++ b/Documentation/riscv/hwprobe.rst
> @@ -88,12 +88,12 @@ The following keys are defined:
>      always extremely slow.
> 
>    * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned accesses are supported
> -    in hardware, but are slower than the cooresponding aligned accesses
> -    sequences.
> +    in hardware, but are slower than N byte accesses, where N is the native
> +    word size.
> 
>    * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned accesses are supported
> -    in hardware and are faster than the cooresponding aligned accesses
> -    sequences.
> +    in hardware and are faster than N byte accesses, where N is the native
> +    word size.

I think I'd just say 'faster/slower than using byte accesses' (ie no N).

There are two obvious FAST cases:
1) the misaligned access takes an extra clock - worth aligning copies.
2) the misaligned access is pretty much as fast as an aligned one.

Even if you find it hard to distinguish them you should probably
allow for them both.

x86 (on Intel (non-atom) cpu) is definitely in the latter camp.
Misaligned copies are measurable slower - but not enough to
ever worry about.
I think that misaligned transfers get spilt into 8 byte accesses
(pretty irrelevant in the kernel) and then accesses that cross
cache line boundaries are split on the boundary.
With pipelined writes and two reads/clock it doesn't often
make a measurable difference.
That is definitely what I see for uncached accesses to PCIe space.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)