Message-ID: <176714835.9396.1530219040151.JavaMail.zimbra@efficios.com>
Date: Thu, 28 Jun 2018 16:50:40 -0400 (EDT)
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Will Deacon <will.deacon@....com>
Cc: linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Arnd Bergmann <arnd@...db.de>,
Peter Zijlstra <peterz@...radead.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Boqun Feng <boqun.feng@...il.com>,
Catalin Marinas <catalin.marinas@....com>,
peter maydell <peter.maydell@...aro.org>,
Mark Rutland <mark.rutland@....com>
Subject: Re: [PATCH 3/3] rseq/selftests: Add support for arm64
----- On Jun 28, 2018, at 12:47 PM, Will Deacon will.deacon@....com wrote:
> Hi Mathieu,
>
> On Tue, Jun 26, 2018 at 12:11:52PM -0400, Mathieu Desnoyers wrote:
>> ----- On Jun 26, 2018, at 11:14 AM, Will Deacon will.deacon@....com wrote:
>> > On Mon, Jun 25, 2018 at 02:10:10PM -0400, Mathieu Desnoyers wrote:
>> >> I notice you are using the instructions
>> >>
>> >> adrp
>> >> add
>> >> str
>> >>
>> >> to implement RSEQ_ASM_STORE_RSEQ_CS(). Did you compare
>> >> performance-wise with an approach using a literal pool
>> >> near the instruction pointer like I did on arm32?
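For readers less familiar with the arm64 sequence: adrp materializes the 4 KiB page address of a symbol relative to the PC (a signed 21-bit page offset, so roughly +/-4 GiB of reach), the following add supplies the low 12 bits, and str then presumably writes the resulting descriptor address into the rseq area. The C sketch below only models that address arithmetic. The helper names are hypothetical and nothing here is taken from the rseq selftests. A PC-relative literal load (LDR literal) would instead fetch a precomputed address with one load, at the cost of a much shorter reach (about +/-1 MiB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask selecting the 4 KiB page an address belongs to. */
#define PAGE_4K_MASK (~UINT64_C(0xfff))

/* Hypothetical helper: page delta an assembler/linker would place in
 * ADRP's signed 21-bit immediate (models the encoding only). */
static int64_t adrp_page_delta(uint64_t pc, uint64_t target)
{
	int64_t diff = (int64_t)(target & PAGE_4K_MASK) -
		       (int64_t)(pc & PAGE_4K_MASK);
	return diff / 4096; /* exact: both pages are 4 KiB aligned */
}

/* A signed 21-bit page delta covers [-4 GiB, +4 GiB). */
static bool adrp_delta_fits(int64_t delta)
{
	return delta >= -(INT64_C(1) << 20) && delta < (INT64_C(1) << 20);
}

/* ADRP Xd, sym: page of the PC plus delta pages (unsigned wrap is
 * intentional so negative deltas subtract). */
static uint64_t model_adrp(uint64_t pc, int64_t delta)
{
	return (pc & PAGE_4K_MASK) + (uint64_t)delta * 4096;
}

/* ADD Xd, Xd, #:lo12:sym: add the low 12 bits of the target. */
static uint64_t model_add_lo12(uint64_t page_addr, uint64_t target)
{
	return page_addr + (target & 0xfff);
}

/* Model of the adrp/add pair: returns false when the target is out of
 * ADRP's reach, otherwise writes the materialized address to *out. */
static bool materialize_address(uint64_t pc, uint64_t target, uint64_t *out)
{
	int64_t delta = adrp_page_delta(pc, target);

	if (!adrp_delta_fits(delta))
		return false;
	*out = model_add_lo12(model_adrp(pc, delta), target);
	assert(*out == target); /* the pair reconstructs the exact address */
	return true;
}
```

Two ALU instructions and no data load is the usual reason to prefer adrp/add, which is why the literal-pool comparison is worth measuring rather than assuming.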
>> >
>> > I didn't, no. Do you have a benchmark to hand so I can give this a go?
>>
>> see tools/testing/selftests/rseq/param_test_benchmark --help
>>
>> It's a stripped-down version of param_test, without all the code for
>> delay loops and testing checks.
>>
>> Example use for counter increment with 4 threads, doing 5G counter
>> increments per thread:
>>
>> time ./param_test_benchmark -T i -t 4 -r 5000000000
>
> Thanks. I ran that on a few arm64 systems I have access to, with three
> configurations of the selftest:
>
> 1. As I posted
> 2. With the abort signature and branch in-lined, so as to avoid the CBNZ
> address limitations in large codebases
> 3. With both the abort handler and the table inlined (i.e. the same thing
> as 32-bit).
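The CBNZ limitation mentioned in (2) is an encoding-range issue: CBZ/CBNZ encode a signed 19-bit offset counted in 4-byte instructions (about +/-1 MiB), while an unconditional B encodes 26 bits (about +/-128 MiB). If the abort handler ends up in a separate section placed far from the critical section in a large binary, a direct CBNZ to it may not be encodable, which is presumably why inlining the signature and a B to the handler helps. A hypothetical range check, as a sketch only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Immediate widths of arm64 PC-relative branches, in 4-byte words. */
#define CBNZ_IMM_BITS 19 /* CBZ/CBNZ: about +/-1 MiB */
#define B_IMM_BITS    26 /* B/BL: about +/-128 MiB */

/* Hypothetical helper: can a branch at `from` reach `to` with a signed
 * word offset of `imm_bits` bits? */
static bool branch_fits(uint64_t from, uint64_t to, unsigned imm_bits)
{
	assert(imm_bits > 0 && imm_bits < 63);

	int64_t off = (int64_t)(to - from); /* two's-complement byte offset */
	if (off % 4 != 0)
		return false; /* arm64 instructions are 4-byte aligned */

	int64_t words = off / 4;
	int64_t limit = INT64_C(1) << (imm_bits - 1);
	return words >= -limit && words < limit;
}
```

For example, a target exactly 1 MiB ahead is out of range for CBNZ but trivially reachable with B.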
>
> There isn't a reliably measurable difference between (1) and (2), but I take
> a 12% to 27% hit going from (2) to (3).
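The "table" in (3) is, as far as I understand the selftests, the per-critical-section struct rseq_cs descriptor whose address RSEQ_ASM_STORE_RSEQ_CS publishes; inlining it presumably places this 32-byte descriptor in the instruction stream next to the code instead of in a separate section. The sketch below redeclares the layout seen on a 64-bit build of include/uapi/linux/rseq.h under a hypothetical name, so it builds without kernel headers, and models the unsigned range check the kernel uses to decide whether an interrupted IP lies inside [start_ip, start_ip + post_commit_offset).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as struct rseq_cs on a 64-bit build, redeclared
 * under a hypothetical name so no kernel headers are needed. */
struct rseq_cs_desc {
	uint32_t version;
	uint32_t flags;
	uint64_t start_ip;           /* first instruction of the critical section */
	uint64_t post_commit_offset; /* length from start_ip to the commit point */
	uint64_t abort_ip;           /* where execution resumes on abort */
} __attribute__((aligned(4 * sizeof(uint64_t))));

_Static_assert(sizeof(struct rseq_cs_desc) == 32, "descriptor is 32 bytes");
_Static_assert(offsetof(struct rseq_cs_desc, abort_ip) == 24, "abort_ip at 24");

/* Is ip inside [start_ip, start_ip + post_commit_offset)? Unsigned
 * wraparound makes an ip below start_ip compare as out of range. */
static int ip_in_critical_section(const struct rseq_cs_desc *cs, uint64_t ip)
{
	assert(cs != NULL);
	return ip - cs->start_ip < cs->post_commit_offset;
}
```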
Those results puzzle me. Do you have the actual code snippets of each
implementation nearby?
Thanks,
Mathieu
>
> So I'll post a v2 based on (2).
>
> Will
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com