[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <BB2CD8D5-E7FF-4D25-8C83-F64960253248@amacapital.net>
Date: Thu, 27 Sep 2018 09:26:59 -0700
From: Andy Lutomirski <luto@...capital.net>
To: "Jason A. Donenfeld" <Jason@...c4.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
LKML <linux-kernel@...r.kernel.org>,
Netdev <netdev@...r.kernel.org>,
Linux Crypto Mailing List <linux-crypto@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Samuel Neves <sneves@....uc.pt>,
Andrew Lutomirski <luto@...nel.org>,
Jean-Philippe Aumasson <jeanphilippe.aumasson@...il.com>,
Russell King - ARM Linux <linux@...linux.org.uk>,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH net-next v6 07/23] zinc: ChaCha20 ARM and ARM64 implementations
> On Sep 27, 2018, at 8:19 AM, Jason A. Donenfeld <Jason@...c4.com> wrote:
>
> Hey again Thomas,
>
>> On Thu, Sep 27, 2018 at 3:26 PM Jason A. Donenfeld <Jason@...c4.com> wrote:
>>
>> Hi Thomas,
>>
>> I'm trying to optimize this for crypto performance while still taking
>> into account preemption concerns. I'm having a bit of trouble figuring
>> out a way to determine numerically what the upper bounds for this
>> stuff looks like. I'm sure I could pick a pretty sane number that's
>> arguably okay -- and way under the limit -- but I still am interested
>> in determining what that limit actually is. I was hoping there'd be a
>> debugging option called, "warn if preemption is disabled for too
>> long", or something, but I couldn't find anything like that. I'm also
>> not quite sure what the latency limits are, to just compute this with
>> a formula. Essentially what I'm trying to determine is:
>>
>> preempt_disable();
>> asm volatile(".fill N, 1, 0x90;");
>> preempt_enable();
>>
>> What is the maximum value of N for which the above is okay? What
>> technique would you generally use in measuring this?
>>
>> Thanks,
>> Jason
>
> From talking to Peter (now CC'd) on IRC, it sounds like what you're
> mostly interested in is clocktime latency on reasonable hardware, with
> a goal of around ~20µs as a maximum upper bound? I don't expect to get
> anywhere near this value at all, but if you can confirm that's a
> decent ballpark, it would make for some interesting calculations.
>
>
I would add another consideration: if you can get better latency with negligible overhead (0.1%? 0.05%), then that might make sense too. For example, it seems plausible that checking need_resched() every few blocks adds basically no overhead, and the SIMD helpers could do this themselves or perhaps only ever do a block at a time.
need_resched() costs a cacheline access, but it’s usually a hot cacheline, and the actual check is just whether a certain bit in memory is set.
Powered by blists - more mailing lists