Message-ID: <CALs-Hsvu7BsK8P0+xeuLmKEqg-q=kQANbf8FkiPGPhwhnSXpmA@mail.gmail.com>
Date:   Wed, 13 Sep 2023 10:46:13 -0700
From:   Evan Green <evan@...osinc.com>
To:     Geert Uytterhoeven <geert@...ux-m68k.org>
Cc:     Palmer Dabbelt <palmer@...osinc.com>,
        Heiko Stuebner <heiko@...ech.de>, linux-doc@...r.kernel.org,
        Björn Töpel <bjorn@...osinc.com>,
        Conor Dooley <conor.dooley@...rochip.com>,
        Guo Ren <guoren@...nel.org>,
        Jisheng Zhang <jszhang@...nel.org>,
        linux-riscv@...ts.infradead.org, Jonathan Corbet <corbet@....net>,
        Sia Jee Heng <jeeheng.sia@...rfivetech.com>,
        Marc Zyngier <maz@...nel.org>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Greentime Hu <greentime.hu@...ive.com>,
        Simon Hosie <shosie@...osinc.com>,
        Andrew Jones <ajones@...tanamicro.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Alexandre Ghiti <alexghiti@...osinc.com>,
        Ley Foon Tan <leyfoon.tan@...rfivetech.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Anup Patel <apatel@...tanamicro.com>,
        linux-kernel@...r.kernel.org,
        Xianting Tian <xianting.tian@...ux.alibaba.com>,
        David Laight <David.Laight@...lab.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Andy Chiu <andy.chiu@...ive.com>
Subject: Re: [PATCH v4 1/2] RISC-V: Probe for unaligned access speed

On Wed, Sep 13, 2023 at 5:36 AM Geert Uytterhoeven <geert@...ux-m68k.org> wrote:
>
> Hi Evan,
>
> On Fri, Aug 18, 2023 at 9:44 PM Evan Green <evan@...osinc.com> wrote:
> > Rather than deferring unaligned access speed determinations to a vendor
> > function, let's probe them and find out how fast they are. If we
> > determine that an unaligned word access is faster than N byte accesses,
> > mark the hardware's unaligned access as "fast". Otherwise, we mark
> > accesses as slow.
> >
> > The algorithm itself runs for a fixed amount of jiffies. Within each
> > iteration it attempts to time a single loop, and then keeps only the best
> > (fastest) loop it saw. This algorithm was found to have lower variance from
> > run to run than my first attempt, which counted the total number of
> > iterations that could be done in that fixed amount of jiffies. By taking
> > only the best iteration in the loop, assuming at least one loop wasn't
> > perturbed by an interrupt, we eliminate the effects of interrupts and
> > other "warm up" factors like branch prediction. The only downside is it
> > depends on having an rdtime granular and accurate enough to measure a
> > single copy. If we ever manage to complete a loop in 0 rdtime ticks, we
> > leave the unaligned setting at UNKNOWN.
> >
> > There is a slight change in user-visible behavior here. Previously, all
> > boards except the THead C906 reported misaligned access speed of
> > UNKNOWN. C906 reported FAST. With this change, since we're now measuring
> > misaligned access speed on each hart, all RISC-V systems will have this
> > key set as either FAST or SLOW.
> >
> > Currently, we don't have a way to confidently measure the difference between
> > SLOW and EMULATED, so we label anything not fast as SLOW. This will
> > mislabel some systems that are actually EMULATED as SLOW. When we get
> > support for delegating misaligned access traps to the kernel (as opposed
> > to the firmware quietly handling it), we can explicitly test in Linux to
> > see if unaligned accesses trap. Those systems will start to report
> > EMULATED, though older (today's) systems without that new SBI mechanism
> > will continue to report SLOW.
> >
> > I've updated the documentation for those hwprobe values to reflect
> > this, specifically: SLOW may or may not be emulated by software, and FAST
> > means being faster than equivalent byte accesses. The change
> > in documentation is accurate with respect to both the former and current
> > behavior.
> >
> > Signed-off-by: Evan Green <evan@...osinc.com>
> > Acked-by: Conor Dooley <conor.dooley@...rochip.com>
>
> Thanks for your patch, which is now commit 584ea6564bcaead2 ("RISC-V:
> Probe for unaligned access speed") in v6.6-rc1.
>
> On the boards I have, I get:
>
>     rzfive:
>         cpu0: Ratio of byte access time to unaligned word access is 1.05, unaligned accesses are fast

Hrm, I'm a little surprised to be seeing this number come out so close
to 1. If you reboot a few times, what kind of variance do you get on
this?
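
For readers following the variance question, here is a minimal user-space sketch of the "keep only the best iteration" timing described in the quoted commit message. It is not the kernel's code: `now_ns`, `copy_words`, `copy_bytes`, `best_time_ns` and the 10 ms `RUN_NS` window are hypothetical names and values, `clock_gettime` stands in for rdtime, and the compiler decides how to lower the 8-byte `memcpy`, so this illustrates the measurement loop rather than the kernel's assembly copy routines.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical 10 ms measurement window, standing in for a few jiffies. */
#define RUN_NS 10000000ULL

typedef void (*copy_fn)(unsigned char *dst, const unsigned char *src, size_t n);

/* Monotonic clock in nanoseconds (user-space stand-in for rdtime). */
uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Copy 8 bytes per step; misaligned when dst/src are offset by the caller. */
void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t w;

        memcpy(&w, src + i, sizeof(w));
        memcpy(dst + i, &w, sizeof(w));
    }
}

/* Copy one byte per step; volatile keeps the loop from being merged. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/*
 * Time single copies repeatedly for RUN_NS and return the fastest one.
 * A result of 0 means the clock was too coarse to time one copy; a caller
 * would treat that as "unknown" rather than "fast".
 */
uint64_t best_time_ns(copy_fn fn, unsigned char *dst,
                      const unsigned char *src, size_t n)
{
    uint64_t best = UINT64_MAX;
    uint64_t deadline = now_ns() + RUN_NS;

    do {
        uint64_t start = now_ns();

        fn(dst, src, n);

        uint64_t elapsed = now_ns() - start;

        if (elapsed < best)
            best = elapsed;
    } while (now_ns() < deadline);

    return best;
}
```

Calling `best_time_ns` once with `copy_words` and once with `copy_bytes`, on buffers offset by one byte, and comparing the two results mirrors the `word_cycles < byte_cycles` check. Taking the minimum discards iterations that were disturbed by an interrupt or a cold cache, which is why run-to-run variance should be small if at least one iteration ran undisturbed.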

>
>     icicle:
>
>         cpu1: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
>         cpu2: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
>         cpu3: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
>
>         cpu0: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
>
>     k210:
>
>         cpu1: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
>         cpu0: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
>
>     starlight:
>
>         cpu1: Ratio of byte access time to unaligned word access is 0.01, unaligned accesses are slow
>         cpu0: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
>
>     vexriscv/orangecrab:
>
>         cpu0: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
>
> I am a bit surprised by the near-zero values.  Are these expected?
> Thanks!

This could be expected, if firmware is trapping the unaligned accesses
and coming out >100x slower than a native access. If you're interested
in getting a little more resolution, you could try to print a few more
decimal places with something like (sorry gmail mangles the whitespace
on this):

diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 1cfbba65d11a..2c094037658a 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -632,11 +632,11 @@ void check_unaligned_access(int cpu)
        if (word_cycles < byte_cycles)
                speed = RISCV_HWPROBE_MISALIGNED_FAST;

-       ratio = div_u64((byte_cycles * 100), word_cycles);
-       pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
+       ratio = div_u64((byte_cycles * 100000), word_cycles);
+       pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%05d, unaligned accesses are %s\n",
                cpu,
-               ratio / 100,
-               ratio % 100,
+               ratio / 100000,
+               ratio % 100000,
                (speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");

        per_cpu(misaligned_access_speed, cpu) = speed;

If you did, I'd be interested to see the results.
-Evan
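
To illustrate why a ratio can print as 0.00 and what the extra decimal places recover, here is a small user-space sketch of the same fixed-point arithmetic. The helper `format_ratio` and the cycle counts used with it are hypothetical, chosen only to show the truncation; the kernel code uses `div_u64` and `pr_info` as in the diff above.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format byte_cycles / word_cycles as a decimal with `places` fractional
 * digits, using integer math only (the same scale-then-divide approach as
 * the diff above). Returns -1 if the inputs cannot be formatted.
 */
int format_ratio(char *buf, size_t len, uint64_t byte_cycles,
                 uint64_t word_cycles, unsigned int places)
{
    uint64_t scale = 1;
    uint64_t ratio;

    /* Guard against division by zero and against overflowing the scale. */
    if (word_cycles == 0 || places > 9)
        return -1;

    for (unsigned int i = 0; i < places; i++)
        scale *= 10;

    /* Integer division truncates: anything below 1/scale becomes 0. */
    ratio = byte_cycles * scale / word_cycles;

    return snprintf(buf, len, "%llu.%0*llu",
                    (unsigned long long)(ratio / scale),
                    (int)places,
                    (unsigned long long)(ratio % scale));
}
```

With made-up counts of 40 cycles for the byte copy and 6000 for the unaligned word copy (150x slower), two places give "0.00" because 40 * 100 / 6000 truncates to 0, while five places give "0.00666". Counts of 105 and 100 give "1.05" with two places.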
