linux-kernel - Re: x86 memcpy performance

Message-ID: <3175fc05afa1c2b0defd935d0fec22d5.squirrel@www.skyhub.de>
Date:	Mon, 15 Aug 2011 20:49:56 +0200 (CEST)
From:	"Borislav Petkov" <bp@...en8.de>
To:	"Andrew Lutomirski" <luto@....edu>
Cc:	"Borislav Petkov" <bp@...en8.de>,
	"melwyn lobo" <linux.melwyn@...il.com>,
	"Denys Vlasenko" <vda.linux@...glemail.com>,
	"Ingo Molnar" <mingo@...e.hu>, linux-kernel@...r.kernel.org,
	"H. Peter Anvin" <hpa@...or.com>,
	"Thomas Gleixner" <tglx@...utronix.de>,
	"Linus Torvalds" <torvalds@...ux-foundation.org>,
	"Peter Zijlstra" <a.p.zijlstra@...llo.nl>, borislav.petkov@....com
Subject: Re: x86 memcpy performance

On Mon, 15 August, 2011 7:04 pm, Andrew Lutomirski wrote:
>> Well, I had a SSE memcpy which saved/restored the XMM regs on the stack.
>> This would obviate the need to muck with contexts but that could get
>> expensive wrt stack operations. The advantage is that I'm not dealing
>> with the whole FPU state but only with 16 XMM regs. I should probably
>> dust off that version again and retest.
>
> I bet it won't be a significant win.  On Sandy Bridge, clts/stts takes
> 80 ns and a full state save+restore is only ~60 ns.
> Without infrastructure changes, I don't think you can avoid the clts
> and stts.

Yeah, probably.

> You might be able to get away with turning off IRQs, reading CR0 to
> check TS, pushing XMM regs, and being very certain that you don't
> accidentally generate any VEX-coded instructions.

That's ok - I'm using movaps/movups. The problem is that I still need to
save the FPU state if the task I'm interrupting has been using FPU
instructions. So I can't get away without saving the context, in which
case I don't need to save the XMM regs separately anyway.
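
For reference, the kind of copy loop under discussion can be sketched in
user space roughly as follows. This is only an illustration, not the
patch from this thread: the name sse_memcpy_sketch is made up, it uses
compiler intrinsics rather than hand-written asm, and it does none of
the FPU state handling (kernel_fpu_begin/clts/stts) that the discussion
is actually about. On builds without SSE it just falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/*
 * Hypothetical user-space sketch of an SSE copy loop.
 * Copies 16 bytes per iteration through one XMM register using
 * unaligned loads/stores, then lets memcpy handle the tail.
 */
static void *sse_memcpy_sketch(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	assert(n == 0 || (dst != NULL && src != NULL));

#if defined(__SSE__)
	/*
	 * _mm_loadu_ps/_mm_storeu_ps are the unaligned move intrinsics
	 * (movups class). Note that building with AVX enabled makes the
	 * compiler emit the VEX-coded forms instead.
	 */
	while (n >= 16) {
		__m128 v = _mm_loadu_ps((const float *)s);

		_mm_storeu_ps((float *)d, v);
		s += 16;
		d += 16;
		n -= 16;
	}
#endif
	/* Remaining 0..15 bytes (or everything, without SSE). */
	memcpy(d, s, n);
	return dst;
}
```

Like memcpy, the sketch assumes non-overlapping buffers.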

>> Or, if we want to use SSE stuff in the kernel, we might think of
>> allocating its own FPU context(s) and handle those...
>
> I'm thinking of having a stack of FPU states to parallel irq stacks
> and IST stacks.

... I'm guessing with the same nesting as hardirqs? That would make FPU
instructions usable in irq contexts too.

> It gets a little hairy when code inside kernel_fpu_begin traps for a
> non-irq non-IST reason, though.

How does that happen? You're in the kernel with preemption disabled and
TS cleared, so what would cause the #NM? I think that if you need to
switch context, you simply "push" the current FPU context, allocate a
new one and clts as part of the FPU context switching, no?

> Fortunately, those are rare and all of the EX_TABLE users could mark
> xmm regs as clobbered (except for copy_from_user...).

Well, copy_from_user... does a bunch of rep; movsq - if the SSE version
shows reasonable speedup there, we might need to make those work too.

> Keeping kernel_fpu_begin non-preemptable makes it less bad because the
> extra FPU state can be per-cpu and not per-task.

Yep.

> This is extra fun on 32 bit, which IIRC doesn't have IST stacks.
>
> The major speedup will come from saving state in kernel_fpu_begin but
> not restoring it until the code in entry_??.S restores registers.

But you'd need to save each kernel FPU state when nesting, no?
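
To make the nesting question concrete, here is a rough user-space model
of a per-CPU stack of saved FPU states. Everything in it is an
assumption made for illustration: the struct and function names, the
nesting limit of 4, and the plain byte array standing in for the real
XMM registers. It also restores eagerly in the "end" call, which is the
simple variant, not the deferred restore in entry_??.S proposed above.

```c
#include <assert.h>
#include <string.h>

#define XMM_REGS  16	/* XMM registers on x86-64 */
#define XMM_BYTES 16	/* 128 bits each */
#define MAX_NEST  4	/* assumed nesting limit, purely illustrative */

/* Stand-in for the register contents that would have to be saved. */
struct fpu_state_model {
	unsigned char xmm[XMM_REGS][XMM_BYTES];
};

/* One of these per CPU: the "live" registers plus a stack of saves. */
struct cpu_model {
	struct fpu_state_model live;
	struct fpu_state_model saved[MAX_NEST];
	int depth;
};

/* Push the current register contents before a new user clobbers them. */
static void model_fpu_begin(struct cpu_model *cpu)
{
	assert(cpu->depth < MAX_NEST);
	cpu->saved[cpu->depth++] = cpu->live;
}

/* Pop the most recent save, restoring the interrupted user's state. */
static void model_fpu_end(struct cpu_model *cpu)
{
	assert(cpu->depth > 0);
	cpu->live = cpu->saved[--cpu->depth];
}
```

In this model every nested begin costs one full save, so each nesting
level needs its own slot.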

>>> (*) kernel_fpu_begin is a bad name. It's only safe to use integer
>>> instructions inside a kernel_fpu_begin section because MXCSR (and the
>>> 387 equivalent) could contain garbage.
>>
>> Well, do we want to use floating point instructions in the kernel?
>
> The only use I could find is in staging.

Exactly my point - I think we should do it only when it's really worth
the trouble.

-- 
Regards/Gruss,
Boris.
