Message-ID: <53dc6959cc8849d6b66676ad48c1376a@AcuMS.aculab.com>
Date: Thu, 29 Jun 2023 12:05:14 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Evan Green' <evan@...osinc.com>,
Jessica Clarke <jrtc27@...c27.com>
CC: Palmer Dabbelt <palmer@...osinc.com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
Yangyu Chen <cyy@...self.name>,
Conor Dooley <conor.dooley@...rochip.com>,
Guo Ren <guoren@...nel.org>,
Jisheng Zhang <jszhang@...nel.org>,
linux-riscv <linux-riscv@...ts.infradead.org>,
"Jonathan Corbet" <corbet@....net>,
Xianting Tian <xianting.tian@...ux.alibaba.com>,
Masahiro Yamada <masahiroy@...nel.org>,
Greentime Hu <greentime.hu@...ive.com>,
Simon Hosie <shosie@...osinc.com>,
Li Zhengyu <lizhengyu3@...wei.com>,
Andrew Jones <ajones@...tanamicro.com>,
Albert Ou <aou@...s.berkeley.edu>,
Alexandre Ghiti <alexghiti@...osinc.com>,
"Ley Foon Tan" <leyfoon.tan@...rfivetech.com>,
Paul Walmsley <paul.walmsley@...ive.com>,
Heiko Stuebner <heiko.stuebner@...ll.eu>,
Anup Patel <apatel@...tanamicro.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Sia Jee Heng <jeeheng.sia@...rfivetech.com>,
Palmer Dabbelt <palmer@...belt.com>,
Andy Chiu <andy.chiu@...ive.com>
Subject: RE: [PATCH 1/2] RISC-V: Probe for unaligned access speed
From: Evan Green
> Sent: 27 June 2023 20:12
>
> On Mon, Jun 26, 2023 at 2:42 PM Jessica Clarke <jrtc27@...c27.com> wrote:
> >
> > On 23 Jun 2023, at 23:20, Evan Green <evan@...osinc.com> wrote:
> > >
> > > Rather than deferring misaligned access speed determinations to a vendor
> > > function, let's probe them and find out how fast they are. If we
> > > determine that a misaligned word access is faster than N byte accesses,
> > > mark the hardware's misaligned access as "fast".
> >
> > How sure are you that your measurements can be extrapolated and aren’t
> > an artefact of the testing process? For example, off the top of my head:
> >
> > * The first run will potentially be penalised by data cache misses,
> > untrained prefetchers, TLB misses, branch predictors, etc. compared
> > with later runs. You have one warmup, but who knows how many
> > iterations it will take to converge?
>
> I'd expect the cache penalties to be reasonably covered by a single
> warmup. You're right about branch prediction, which is why I tried to
> use a large-ish buffer size, minimize the ratio of conditionals to
> loads/stores, and do the test for a decent number of iterations (on my
> THead, about 1800 and 400 for words and bytes).
>
> When I ran the test a handful of times, I did see variation on the
> order of ~5%. But the comparison of the two numbers doesn't seem to be
> anywhere near that margin (THead C906 was ~4x faster doing misaligned
> word accesses, others with slow misaligned accesses also reporting
> numbers not anywhere close to each other).
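For context, the comparison being probed boils down to roughly the
following (an illustrative sketch only, not the actual patch code:
copy_words_unaligned()/copy_bytes() and rdcycle() are stand-ins for the
patch's asm copy helpers and whatever cycle-counter read is used):

	unsigned long t_words, t_bytes;

	/* Time a misaligned word-at-a-time copy of the test buffer. */
	t_words = rdcycle();
	copy_words_unaligned(dst + 1, src + 1, LEN);
	t_words = rdcycle() - t_words;

	/* Time a byte-at-a-time copy of the same buffer. */
	t_bytes = rdcycle();
	copy_bytes(dst, src, LEN);
	t_bytes = rdcycle() - t_bytes;

	/* Report "fast" if misaligned word copies beat byte copies. */
	if (t_words < t_bytes)
		misaligned_speed = RISCV_HWPROBE_MISALIGNED_FAST;
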
Isn't the EMULATED case so much slower than anything else that
it is pretty obvious even from a single access?
(Possibly the 2nd access, to avoid a cold cache.)
One of the things that can perturb measurements is hardware
interrupts. That can be mitigated by counting clocks for a few
(10 is plenty) iterations of a short request and taking the
fastest value.
For short hot-cache code sequences you can even compare the
measured clock counts against theoretical minimum values.
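Something along these lines (a rough, untested sketch; rdcycle() again
stands in for reading the cycle CSR, e.g. csr_read(CSR_CYCLE)):

	/*
	 * Time a short, fixed-size request a handful of times and keep
	 * the fastest run: an interrupt landing in one iteration only
	 * inflates that iteration, not the minimum.
	 */
	static unsigned long best_cycles(void (*short_request)(void))
	{
		unsigned long best = ~0UL;
		int i;

		for (i = 0; i < 10; i++) {
			unsigned long start, delta;

			start = rdcycle();
			short_request();	/* e.g. a short misaligned copy */
			delta = rdcycle() - start;
			if (delta < best)
				best = delta;
		}
		return best;
	}
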
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)