Message-ID: <CAObL_7GmMPBKa4QwcnUeehowzvTUdqXN7yswAfXsy8q71ZJzmg@mail.gmail.com>
Date: Mon, 15 Aug 2011 11:36:42 -0400
From: Andrew Lutomirski <luto@....edu>
To: Borislav Petkov <bp@...en8.de>
Cc: melwyn lobo <linux.melwyn@...il.com>,
Denys Vlasenko <vda.linux@...glemail.com>,
Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
"H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
borislav.petkov@....com
Subject: Re: x86 memcpy performance

On Mon, Aug 15, 2011 at 11:29 AM, Borislav Petkov <bp@...en8.de> wrote:
> On Mon, 15 August, 2011 4:59 pm, Andy Lutomirski wrote:
>>>> So what is the reason we cannot use sse_memcpy in interrupt context?
>>>> (FPU registers not saved?)
>>>
>>> Because, AFAICT, when we handle an #NM exception while running
>>> sse_memcpy in an IRQ handler, we might need to allocate an FPU save state
>>> area, which, in turn, can sleep. Then, we might get another IRQ while
>>> sleeping and we would be deadlocked.
>>>
>>> But let me stress the "AFAICT" above; someone who actually knows the
>>> FPU code should correct me if I'm missing something.
>>
>> I don't think you ever get #NM as a result of kernel_fpu_begin, but you
>> can certainly have problems when kernel_fpu_begin nests by accident.
>> There's irq_fpu_usable() for this.
>>
>> (irq_fpu_usable() reads cr0 sometimes and I suspect it can be slow.)
>
> Oh I didn't know about irq_fpu_usable(), thanks.
>
> But irq_fpu_usable() still checks !in_interrupt(), which means
> that we don't want to run SSE instructions in IRQ context. OTOH, we
> are still fine when running with CR0.TS set. So what happens when we get an
> #NM as a result of executing an FPU instruction in an IRQ handler? We
> will have to do init_fpu() on the current task if the latter hasn't used
> math yet and do the slab allocation of the FPU context area (I'm looking
> at math_state_restore, btw).

IIRC kernel_fpu_begin does clts, so #NM won't happen. But if we're in
an interrupt and TS=1, then we know that we're not in a
kernel_fpu_begin section, so it's safe to start one (and do clts).
IMO this code is not very good, and I plan to fix it sooner or later.
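
The TS reasoning above can be sketched as a small userspace model. This is not kernel code: the struct and the model_* functions are hypothetical names, and the predicate only mirrors the conditions discussed in this thread (outside interrupts always OK; inside one, OK if user mode was interrupted or CR0.TS is set). The real check also has to cope with details such as get_irq_regs() returning NULL, which the model ignores.

```c
#include <stdbool.h>

/* Hypothetical model of the state the check looks at; NOT kernel code. */
struct fpu_model {
    bool in_interrupt;     /* running in hard/soft IRQ context */
    bool interrupted_user; /* the IRQ interrupted user mode, not the kernel */
    bool cr0_ts;           /* CR0.TS: set means FPU registers are not live */
};

/* Rough model of irq_fpu_usable(). kernel_fpu_begin() always leaves TS
 * clear, so TS=1 implies the interrupted kernel code was not inside a
 * kernel_fpu_begin section. TS=0 does not prove the opposite (a task's own
 * FPU state may simply be live), so the check is conservative. */
static bool model_irq_fpu_usable(const struct fpu_model *m)
{
    return !m->in_interrupt || m->interrupted_user || m->cr0_ts;
}

/* Model of kernel_fpu_begin(): the real one may also save the task's FPU
 * state; here we only track that TS ends up clear (clts). */
static void model_kernel_fpu_begin(struct fpu_model *m)
{
    m->cr0_ts = false;
}

/* Model of kernel_fpu_end(): sets TS again (stts). */
static void model_kernel_fpu_end(struct fpu_model *m)
{
    m->cr0_ts = true;
}
```

Walking through the scenario: an IRQ that lands in the middle of a kernel_fpu_begin section sees TS=0 and backs off, while one that arrives after kernel_fpu_end sees TS=1 and may start its own section.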

I want kernel_fpu_begin (or its equivalent*) to be very fast and
usable from any context whatsoever. Mucking with TS is slower than a
complete save and restore of YMM state.

(*) kernel_fpu_begin is a bad name. It's only safe to use integer
instructions inside a kernel_fpu_begin section because MXCSR (and the
387 equivalent) could contain garbage.
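
The footnote's point can be illustrated from userspace with an x86-only sketch using the SSE intrinsics from <xmmintrin.h> (the helper name cvt_with_mxcsr is hypothetical). Floating-point SSE conversions such as cvtss2si round according to MXCSR's rounding-control bits, so whatever value the interrupted context left in MXCSR changes the result. Integer SSE instructions do not consult MXCSR at all.

```c
#include <xmmintrin.h> /* _mm_getcsr, _mm_setcsr, _mm_set_ss, _mm_cvtss_si32 (x86 only) */

#define MXCSR_DEFAULT     0x1F80u /* reset value: all exceptions masked, round to nearest */
#define MXCSR_RC_TOWARD_0 0x6000u /* rounding-control bits 13-14 = 11b: round toward zero */

/* Hypothetical helper: convert x to int with cvtss2si while MXCSR holds
 * `mxcsr`, then restore the caller's MXCSR. cvtss2si rounds according to
 * MXCSR.RC, so the result depends on what MXCSR contains. */
static int cvt_with_mxcsr(float x, unsigned int mxcsr)
{
    unsigned int saved = _mm_getcsr();
    volatile float in = x; /* volatile: stop the compiler folding the conversion */
    volatile int out;      /* volatile store keeps the conversion before the restore */

    _mm_setcsr(mxcsr);
    out = _mm_cvtss_si32(_mm_set_ss(in));
    _mm_setcsr(saved);
    return out;
}
```

The same 1.7f converts to 2 under the default MXCSR and to 1 with round-toward-zero, a silent difference of the kind the footnote warns about. Floating-point code inside such a section would presumably need to load a known MXCSR first.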
--Andy