netdev - Re: [PATCH 2/3] x86_64: Define 128-bit memory-mapped I/O operations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+55aFy=ohFoZ1yZGdj1DCcNXU9gsndgH9Qod4q8+s=sbGKUzQ@mail.gmail.com>
Date:	Tue, 21 Aug 2012 20:52:50 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	"H. Peter Anvin" <hpa@...or.com>
Cc:	David Miller <davem@...emloft.net>, bhutchings@...arflare.com,
	tglx@...utronix.de, mingo@...hat.com, netdev@...r.kernel.org,
	linux-net-drivers@...arflare.com, x86@...nel.org
Subject: Re: [PATCH 2/3] x86_64: Define 128-bit memory-mapped I/O operations

On Tue, Aug 21, 2012 at 8:24 PM, H. Peter Anvin <hpa@...or.com> wrote:
>
>> I continually see more and more code that has to check this
>> irq_fpu_usable() thing, and have ugly fallback code, and therefore is
>> a sign that this really needs to be fixed properly.
>>
>
> [Cc: Linus, since he has had very strong opinions on this in the past.]

I have strong opinions, because I don't think you *can* do it any
other way on x86.

Saving FPU state on kernel entry IS NOT GOING TO HAPPEN. End of story.
It's too f*cking slow.

And if we don't save it, we cannot use the FP state in kernel mode
without explicitly checking, because the FPU state may not be there,
and touching FP registers can cause exceptions etc. In fact, under
many loads the exception is going to be the common case.

Keeping the FP state live all the time (and then perhaps just saving a
single MMX register for kernel use) and switching it synchronously on
every task switch sounds like a really really bad idea too. It
apparently works better on modern CPU's, but it sucked horribly on
older ones. The FPU state save is horribly expensive, to the point
that we saw it very clearly on benchmarks for signal handling and task
switching.

So the normal operation is going to be "touching the FPU causes a fault".

That said, we might *try* to make the kernel FPU use be cheaper. We
could do something like:

 - if we need to restore FPU state for user mode, don't do it
synchronously in kernel_fpu_end(), just set a work-flag, and do it
just once before returning to user mode

 - that makes kernel_fpu_end() a no-op, and while we can't get rid of
kernel_fpu_begin(), we could at least make it cheap to do many times
consecutively (ie we'd have to check "who owns FPU state" and perhaps
save it, but the save would have to be done only the first time).

I haven't seen the patch being discussed, or the rationale for it. But
I doubt it makes sense to do 128-bit MMIO and expect any kind of
atomicity things anyway, and I very much doubt that using SSE would
make all that much sense. What's the background, and why would you
want to do this crap?

              Linus
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html