linux-kernel - Re: [PATCH 0/4] Really lazy fpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4C189927.1010402@redhat.com>
Date:	Wed, 16 Jun 2010 12:28:07 +0300
From:	Avi Kivity <avi@...hat.com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Arjan van de Ven <arjan@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Fr??d??ric Weisbecker <fweisbec@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Nick Piggin <npiggin@...e.de>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Mike Galbraith <efault@....de>,
	"H. Peter Anvin" <hpa@...or.com>, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/4] Really lazy fpu

On 06/16/2010 11:39 AM, Ingo Molnar wrote:
> (Cc:-ed various performance/optimization folks)
>
> * Avi Kivity<avi@...hat.com>  wrote:
>
>    
>> On 06/16/2010 10:32 AM, H. Peter Anvin wrote:
>>      
>>> On 06/16/2010 12:24 AM, Avi Kivity wrote:
>>>        
>>>> Ingo, Peter, any feedback on this?
>>>>          
>>> Conceptually, this makes sense to me.  However, I have a concern what
>>> happens when a task is scheduled on another CPU, while its FPU state is
>>> still in registers in the original CPU.  That would seem to require
>>> expensive IPIs to spill the state in order for the rescheduling to
>>> proceed, and this could really damage performance.
>>>        
>> Right, this optimization isn't free.
>>
>> I think the tradeoff is favourable since task migrations are much
>> less frequent than context switches within the same cpu, can the
>> scheduler experts comment?
>>      
> This cannot be stated categorically without precise measurements of
> known-good, known-bad, average FPU usage and average CPU usage scenarios. All
> these workloads have different characteristics.
>
> I can imagine bad effects across all sorts of workloads: tcpbench, AIM7,
> various lmbench components, X benchmarks, tiobench - you name it. Combined
> with the fact that most micro-benchmarks wont be using the FPU, while in the
> long run most processes will be using the FPU due to SIMM instructions. So
> even a positive result might be skewed in practice. Has to be measured
> carefully IMO - and i havent seen a _single_ performance measurement in the
> submission mail. This is really essential.
>    

I have really no idea what to measure.  Which would you most like to see?

> So this does not look like a patch-set we could apply without gathering a
> _ton_ of hard data about advantages and disadvantages.
>    

I agree (not to mention that I'm not really close to having an applyable 
patchset).

Note some of the advantages will not be in throughput but in latency 
(making kernel_fpu_begin() preemptible, and reducing context switch time 
for event threads).

>> We can also mitigate some of the IPIs if we know that we're migrating on the
>> cpu we're migrating from (i.e. we're pushing tasks to another cpu, not
>> pulling them from their cpu).  Is that a common case, and if so, where can I
>> hook a call to unlazy_fpu() (or its new equivalent)?
>>      
> When the system goes from idle to less idle then most of the 'fast' migrations
> happen on a 'push' model - on a busy CPU we wake up a new task and push it out
> to a known-idle CPU. At that point we can indeed unlazy the FPU with probably
> little cost.
>    

Can you point me to the code which does this?

> But on busy servers where most wakeups are IRQ based the chance of being on
> the right CPU is 1/nr_cpus - i.e. decreasing with every new generation of
> CPUs.
>    

But don't we usually avoid pulls due to NUMA and cache considerations?

> If there's some sucky corner case in theory we could approach it statistically
> and measure the ratio of fast vs. slow migration vs. local context switches -
> but that looks a bit complex.
>
>    

I certainly wouldn't want to start with it.

> Dunno.
>    

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/