linux-kernel - Re: x86 memcpy performance

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAObL_7GmMPBKa4QwcnUeehowzvTUdqXN7yswAfXsy8q71ZJzmg@mail.gmail.com>
Date:	Mon, 15 Aug 2011 11:36:42 -0400
From:	Andrew Lutomirski <luto@....edu>
To:	Borislav Petkov <bp@...en8.de>
Cc:	melwyn lobo <linux.melwyn@...il.com>,
	Denys Vlasenko <vda.linux@...glemail.com>,
	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	borislav.petkov@....com
Subject: Re: x86 memcpy performance

On Mon, Aug 15, 2011 at 11:29 AM, Borislav Petkov <bp@...en8.de> wrote:
> On Mon, 15 August, 2011 4:59 pm, Andy Lutomirski wrote:
>>>> So what is the reason we cannot use sse_memcpy in interrupt context.
>>>> (fpu registers not saved ? )
>>>
>>> Because, AFAICT, when we handle an #NM exception while running
>>> sse_memcpy in an IRQ handler, we might need to allocate FPU save state
>>> area, which in turn, can sleep. Then, we might get another IRQ while
>>> sleeping and we should be deadlocked.
>>>
>>> But let me stress on the "AFAICT" above, someone who actually knows the
>>> FPU code should correct me if I'm missing something.
>>
>> I don't think you ever get #NM as a result of kernel_fpu_begin, but you
>> can certainly have problems when kernel_fpu_begin nests by accident.
>> There's irq_fpu_usable() for this.
>>
>> (irq_fpu_usable() reads cr0 sometimes and I suspect it can be slow.)
>
> Oh I didn't know about irq_fpu_usable(), thanks.
>
> But still, irq_fpu_usable() still checks !in_interrupt() which means
> that we don't want to run SSE instructions in IRQ context. OTOH, we
> still are fine when running with CR0.TS. So what happens when we get an
> #NM as a result of executing an FPU instruction in an IRQ handler? We
> will have to do init_fpu() on the current task if the last hasn't used
> math yet and do the slab allocation of the FPU context area (I'm looking
> at math_state_restore, btw).

IIRC kernel_fpu_begin does clts, so #NM won't happen.  But if we're in
an interrupt and TS=1, when we know that we're not in a
kernel_fpu_begin section, so it's safe to start one (and do clts).

IMO this code is not very good, and I plan to fix it sooner or later.
I want kernel_fpu_begin (or its equivalent*) to be very fast and
usable from any context whatsoever.  Mucking with TS is slower than a
complete save and restore of YMM state.

(*)  kernel_fpu_begin is a bad name.  It's only safe to use integer
instructions inside a kernel_fpu_begin section because MXCSR (and the
387 equivalent) could contain garbage.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/