Message-ID: <b9a23f8947014fdbb625d67134ed796d@AcuMS.aculab.com>
Date: Fri, 30 Jun 2023 08:29:42 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Evan Green' <evan@...osinc.com>
CC: Jessica Clarke <jrtc27@...c27.com>,
Palmer Dabbelt <palmer@...osinc.com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
Yangyu Chen <cyy@...self.name>,
Conor Dooley <conor.dooley@...rochip.com>,
Guo Ren <guoren@...nel.org>,
Jisheng Zhang <jszhang@...nel.org>,
linux-riscv <linux-riscv@...ts.infradead.org>,
Jonathan Corbet <corbet@....net>,
"Xianting Tian" <xianting.tian@...ux.alibaba.com>,
Masahiro Yamada <masahiroy@...nel.org>,
Greentime Hu <greentime.hu@...ive.com>,
Simon Hosie <shosie@...osinc.com>,
Li Zhengyu <lizhengyu3@...wei.com>,
Andrew Jones <ajones@...tanamicro.com>,
Albert Ou <aou@...s.berkeley.edu>,
Alexandre Ghiti <alexghiti@...osinc.com>,
Ley Foon Tan <leyfoon.tan@...rfivetech.com>,
"Paul Walmsley" <paul.walmsley@...ive.com>,
Heiko Stuebner <heiko.stuebner@...ll.eu>,
Anup Patel <apatel@...tanamicro.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Sia Jee Heng <jeeheng.sia@...rfivetech.com>,
Palmer Dabbelt <palmer@...belt.com>,
"Andy Chiu" <andy.chiu@...ive.com>
Subject: RE: [PATCH 1/2] RISC-V: Probe for unaligned access speed
...
> Yeah, one thing I could do is disable interrupts, measure the cycle
> count of doing an individual iteration, do this N times, and take the
> minimum value as the time to compare. In the end I'll then have two
> numbers to compare, like I do in this patch. In theory the variance on
> that should be really tight. N will have to depend on the overall
> amount of time I'm taking so as not to shut interrupts off for very
> long. Let me experiment with this and see how the results look.
> -Evan
I doubt you'll need many iterations or a long test.
You can do tests in userspace without disabling pre-emption
or interrupts - the large/silly values they generate are
easily ignored.
I suspect you'll get enough info from something like:
	unsigned long x[2];
	volatile unsigned long *p = (void *)((unsigned char *)x + 1);

	full_cpu_barrier();
	start = rdtsc();
	full_cpu_barrier();
	*p; *p; *p; *p; *p; *p; *p; *p;
	*p; *p; *p; *p; *p; *p; *p; *p;
	full_cpu_barrier();
	elapsed = rdtsc() - start;
Once the i-cache is loaded it should be pretty constant.
For aligned addresses I'd expect each extra '*p' to cost
one more clock.
With hardware support for misaligned transfers it should be at most
2 clocks each (test on x86 and it will be 1 clock).
The emulated version will take 100s or 1000s of clocks.
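Those expectations could be turned into a rough check along these lines. This is only a sketch, not code from the patch: the name looks_emulated(), the separate 'overhead' measurement (timer plus barriers with no loads in between) and the 100-clock threshold are all assumptions, picked to sit between "at most 2" and "100s or 1000s".

```c
/*
 * Hypothetical helper, not part of the patch.
 *
 * elapsed:  timer ticks measured around n_loads back-to-back loads
 * overhead: timer ticks measured around an empty region
 * n_loads:  number of '*p' reads in the timed region
 *
 * Returns 1 if the per-load cost looks like trap-and-emulate,
 * 0 if it looks like the hardware handled the access itself.
 */
#define EMULATED_THRESHOLD 100UL	/* assumed cut-off, in ticks per load */

static int looks_emulated(unsigned long elapsed, unsigned long overhead,
			  unsigned int n_loads)
{
	unsigned long per_load;

	/* Nothing measurable: treat as "not emulated" rather than guess. */
	if (n_loads == 0 || elapsed <= overhead)
		return 0;

	per_load = (elapsed - overhead) / n_loads;
	return per_load >= EMULATED_THRESHOLD;
}
```

With sixteen loads, an elapsed count only a few tens of ticks above the overhead would be reported as hardware-handled, while a count thousands of ticks above it would be reported as emulated.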
I'm not sure how strong a cpu barrier you need.
It definitely needs to wait for all memory accesses
and for the rdtsc() to complete.
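
For trying this from userspace, a portable variant of the snippet above might look like the following. It is a sketch under several assumptions: clock_gettime(CLOCK_MONOTONIC) stands in for rdtsc(), so the result is in nanoseconds and not clocks; the GCC/Clang __atomic_thread_fence() builtin stands in for full_cpu_barrier() and may well be weaker than what is really needed; and the names and the sample count are made up. It keeps the minimum over several runs, which is one way of ignoring the large/silly values.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

#define N_SAMPLES 64	/* assumed; anything that keeps the test short */

/* Stand-in for rdtsc(): a monotonic clock, in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Stand-in for full_cpu_barrier(): a compiler builtin full memory fence. */
static inline void full_barrier(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}

/*
 * Time sixteen reads of *p, N_SAMPLES times, and return the smallest
 * sample. Runs disturbed by interrupts or pre-emption only produce
 * larger values, so taking the minimum discards them.
 */
static uint64_t min_load_time_ns(const volatile unsigned long *p)
{
	uint64_t best = UINT64_MAX;

	for (int i = 0; i < N_SAMPLES; i++) {
		uint64_t start, elapsed;

		full_barrier();
		start = now_ns();
		full_barrier();
		(void)*p; (void)*p; (void)*p; (void)*p;
		(void)*p; (void)*p; (void)*p; (void)*p;
		(void)*p; (void)*p; (void)*p; (void)*p;
		(void)*p; (void)*p; (void)*p; (void)*p;
		full_barrier();
		elapsed = now_ns() - start;
		if (elapsed < best)
			best = elapsed;
	}
	return best;
}
```

Calling it once with a pointer to x and once with the pointer offset by one byte gives the two numbers to compare. Note that the offset pointer is deliberately misaligned, which ISO C does not define; the sketch relies on the compiler emitting a plain load for the volatile access.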
David