Message-ID: <abdde70ac5b947508c8c71d72ec4f294@AcuMS.aculab.com>
Date: Fri, 15 Sep 2023 07:57:22 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Evan Green' <evan@...osinc.com>
CC: Geert Uytterhoeven <geert@...ux-m68k.org>,
Palmer Dabbelt <palmer@...osinc.com>,
Heiko Stuebner <heiko@...ech.de>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
Björn Töpel <bjorn@...osinc.com>,
Conor Dooley <conor.dooley@...rochip.com>,
Guo Ren <guoren@...nel.org>,
Jisheng Zhang <jszhang@...nel.org>,
"linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
Jonathan Corbet <corbet@....net>,
"Sia Jee Heng" <jeeheng.sia@...rfivetech.com>,
Marc Zyngier <maz@...nel.org>,
"Masahiro Yamada" <masahiroy@...nel.org>,
Greentime Hu <greentime.hu@...ive.com>,
"Simon Hosie" <shosie@...osinc.com>,
Andrew Jones <ajones@...tanamicro.com>,
"Albert Ou" <aou@...s.berkeley.edu>,
Alexandre Ghiti <alexghiti@...osinc.com>,
"Ley Foon Tan" <leyfoon.tan@...rfivetech.com>,
Paul Walmsley <paul.walmsley@...ive.com>,
Anup Patel <apatel@...tanamicro.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Xianting Tian <xianting.tian@...ux.alibaba.com>,
Palmer Dabbelt <palmer@...belt.com>,
"Andy Chiu" <andy.chiu@...ive.com>
Subject: RE: [PATCH v4 1/2] RISC-V: Probe for unaligned access speed
From: Evan Green
> Sent: 14 September 2023 17:37
>
> On Thu, Sep 14, 2023 at 8:55 AM David Laight <David.Laight@...lab.com> wrote:
> >
> > From: Evan Green
> > > Sent: 14 September 2023 16:01
> > >
> > > On Thu, Sep 14, 2023 at 1:47 AM David Laight <David.Laight@...lab.com> wrote:
> > > >
> > > > From: Geert Uytterhoeven
> > > > > Sent: 14 September 2023 08:33
> > > > ...
> > > > > > > rzfive:
> > > > > > > cpu0: Ratio of byte access time to unaligned word access is
> > > > > > > 1.05, unaligned accesses are fast
> > > > > >
> > > > > > Hrm, I'm a little surprised to be seeing this number come out so close
> > > > > > to 1. If you reboot a few times, what kind of variance do you get on
> > > > > > this?
> > > > >
> > > > > Rock-solid at 1.05 (even with increased resolution: 1.05853 on 3 tries)
> > > >
> > > > Would that match zero overhead unless the access crosses a
> > > > cache line boundary?
> > > > (I can't remember whether the test is using increasing addresses.)
> > >
> > > Yes, the test does use increasing addresses, it copies across 4 pages.
> > > We start with a warmup, so caching effects beyond L1 are largely not
> > > taken into account.
> >
> > That seems entirely excessive.
> > If you want to avoid data cache issues (which you probably do)
> > then just repeating a single access would almost certainly
> > suffice.
> > Repeatedly using a short buffer (say 256 bytes) won't add
> > much loop overhead.
> > Although you may want to do a test that avoids transfers
> > that cross cache line and especially page boundaries.
> > Either of those could easily be much slower than a read
> > that is entirely within a cache line.
>
> We won't be faulting on any of these pages, and they should remain in
> the TLB, so I don't expect many page boundary specific effects. If
> there is a steep penalty for misaligned loads across a cache line,
> such that it's worse than doing byte accesses, I want the test results
> to be dinged for that.
That is an entirely different issue.
Are you absolutely certain that the reason eight single-byte loads take
as long as one misaligned 64-bit load isn't that the entire
test is limited by L1 cache fills?
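
One way to check would be to repeat the measurement over a short buffer that stays in L1, with every misaligned word kept inside a single cache line. The following is only a rough user-space sketch, not the code from the patch: the 64-byte line size, the 256-byte buffer, and all of the function names are assumptions made for illustration, and a real probe would issue the loads from assembly so the compiler cannot rewrite them.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE  256   /* short buffer, so it stays resident in L1 */
#define LINE_SIZE 64    /* assumed cache line size; hardware-dependent */

/* Nonzero if an access of `size` bytes at byte offset `off` spans two lines. */
static int crosses_line(size_t off, size_t size)
{
    return (off / LINE_SIZE) != ((off + size - 1) / LINE_SIZE);
}

/* One misaligned 64-bit load per cache line, at offset 1 within the line. */
static uint64_t word_pass(const unsigned char *buf)
{
    uint64_t sum = 0, w;

    for (size_t off = 1; off + sizeof(w) <= BUF_SIZE; off += LINE_SIZE) {
        /* memcpy avoids undefined behaviour; the compiler may still
         * lower it to byte loads on strict-alignment targets. */
        memcpy(&w, buf + off, sizeof(w));
        sum += w;
    }
    return sum;
}

/* The same bytes as word_pass(), read with eight single-byte loads each. */
static uint64_t byte_pass(const unsigned char *buf)
{
    const volatile unsigned char *p = buf;  /* keep the loads separate */
    uint64_t sum = 0;

    for (size_t off = 1; off + 8 <= BUF_SIZE; off += LINE_SIZE)
        for (size_t i = 0; i < 8; i++)
            sum += p[off + i];
    return sum;
}

/* Wall-clock nanoseconds for `reps` passes over the same short buffer. */
static double time_ns(uint64_t (*pass)(const unsigned char *),
                      const unsigned char *buf, long reps)
{
    struct timespec t0, t1;
    volatile uint64_t sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < reps; i++) {
        sink += pass(buf);
        /* GCC/Clang compiler barrier (assumed toolchain) so the pass
         * is not hoisted out of the loop. */
        __asm__ __volatile__("" ::: "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
}
```

Comparing time_ns(byte_pass, buf, reps) with time_ns(word_pass, buf, reps) for a large reps gives a ratio that does not depend on cache fills. Changing the starting offset so that crosses_line() is true for each word would measure the line-crossing case separately.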
David