lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALs-Hstcz3OAxUi80nm+U0R56VBUUPQT=+XMOLpVJsn2ZOcM1A@mail.gmail.com>
Date:   Thu, 14 Sep 2023 08:01:01 -0700
From:   Evan Green <evan@...osinc.com>
To:     David Laight <David.Laight@...lab.com>
Cc:     Geert Uytterhoeven <geert@...ux-m68k.org>,
        Palmer Dabbelt <palmer@...osinc.com>,
        Heiko Stuebner <heiko@...ech.de>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        Björn Töpel <bjorn@...osinc.com>,
        Conor Dooley <conor.dooley@...rochip.com>,
        Guo Ren <guoren@...nel.org>,
        Jisheng Zhang <jszhang@...nel.org>,
        "linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
        Jonathan Corbet <corbet@....net>,
        Sia Jee Heng <jeeheng.sia@...rfivetech.com>,
        Marc Zyngier <maz@...nel.org>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Greentime Hu <greentime.hu@...ive.com>,
        Simon Hosie <shosie@...osinc.com>,
        Andrew Jones <ajones@...tanamicro.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Alexandre Ghiti <alexghiti@...osinc.com>,
        Ley Foon Tan <leyfoon.tan@...rfivetech.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Anup Patel <apatel@...tanamicro.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Xianting Tian <xianting.tian@...ux.alibaba.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Andy Chiu <andy.chiu@...ive.com>
Subject: Re: [PATCH v4 1/2] RISC-V: Probe for unaligned access speed

On Thu, Sep 14, 2023 at 1:47 AM David Laight <David.Laight@...lab.com> wrote:
>
> From: Geert Uytterhoeven
> > Sent: 14 September 2023 08:33
> ...
> > > >     rzfive:
> > > >         cpu0: Ratio of byte access time to unaligned word access is
> > > > 1.05, unaligned accesses are fast
> > >
> > > Hrm, I'm a little surprised to be seeing this number come out so close
> > > to 1. If you reboot a few times, what kind of variance do you get on
> > > this?
> >
> > Rock-solid at 1.05 (even with increased resolution: 1.05853 on 3 tries)
>
> Would that match zero overhead unless the access crosses a
> cache line boundary?
> (I can't remember whether the test is using increasing addresses.)

Yes, the test does use increasing addresses, it copies across 4 pages.
We start with a warmup, so caching effects beyond L1 are largely not
taken into account.

>
> ...
> > > >     vexriscv/orangecrab:
> > > >
> > > >         cpu0: Ratio of byte access time to unaligned word access is
> > > > 0.00, unaligned accesses are slow
> >
> > cpu0: Ratio of byte access time to unaligned word access is 0.00417,
> > unaligned accesses are slow
> >
> > > > I am a bit surprised by the near-zero values.  Are these expected?
> > >
> > > This could be expected, if firmware is trapping the unaligned accesses
> > > and coming out >100x slower than a native access. If you're interested
> > > in getting a little more resolution, you could try to print a few more
> > > decimal places with something like (sorry gmail mangles the whitespace
> > > on this):
>
> I'd expect one of three possible values:
> - 1.0x: Basically zero cost except for cache line/page boundaries.
> - ~2: Hardware does two reads and merges the values.
> - >100: Trap fixed up in software.
>
> I'd think the '2' case could be considered fast.
> You only need to time one access to see if it was a fault.

We're comparing misaligned word accesses with byte accesses of the
same total size. So 1.0 means a misaligned load is basically no
different from 8 byte loads. The goal was to help people that are
forced to do odd loads and stores decide whether they are better off
moving by bytes or by misaligned words. (In contrast, the answer to
"should I do a misaligned word load or an aligned word load" is
generally always "do the aligned one if you can", so comparing those
two things didn't seem as useful).

We opted for 1.0 as a cutoff, since even at 1.05, you get a boost from
doing misaligned word loads over byte copies. I asked about the
variance because I don't want to see machines that change their mind
from boot to boot. I originally considered trying to create a "gray
zone" where the answer goes back to UNKNOWN, but in the end that just
moves the fiddly point rather than really eliminating it.

You're right that in theory we just need one perfect access to test,
but testing only once makes it susceptible to hiccups. We went with
doing it many times in a fixed period and taking the minimum to
hopefully remove noise like NMI-like things, branch prediction misses,
or cache eviction.

Geert,
Thanks for providing the numbers. Yes, we could add another digit to
the print. Though if you already know you're at least 100x slower,
maybe knowing exactly how much slower isn't super meaningful, just
very much avoid unaligned accesses on these systems :). Hopefully over
time the number of systems like this will dwindle.

-Evan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ