lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f2b55d220702241152o640abaf3jc10afefaae8476cb@mail.gmail.com>
Date:	Sat, 24 Feb 2007 11:52:41 -0800
From:	"Michael K. Edwards" <medwards.linux@...il.com>
To:	"Ingo Molnar" <mingo@...e.hu>
Cc:	"Evgeniy Polyakov" <johnpol@....mipt.ru>,
	"Ulrich Drepper" <drepper@...hat.com>,
	linux-kernel@...r.kernel.org,
	"Linus Torvalds" <torvalds@...ux-foundation.org>,
	"Arjan van de Ven" <arjan@...radead.org>,
	"Christoph Hellwig" <hch@...radead.org>,
	"Andrew Morton" <akpm@....com.au>,
	"Alan Cox" <alan@...rguk.ukuu.org.uk>,
	"Zach Brown" <zach.brown@...cle.com>,
	"David S. Miller" <davem@...emloft.net>,
	"Suparna Bhattacharya" <suparna@...ibm.com>,
	"Davide Libenzi" <davidel@...ilserver.org>,
	"Jens Axboe" <jens.axboe@...cle.com>,
	"Thomas Gleixner" <tglx@...utronix.de>
Subject: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

On 2/23/07, Ingo Molnar <mingo@...e.hu> wrote:
> > This is a fundamental misconception. [...]
>
> > The scheduler, on the other hand, has to blow and reload all of the
> > hidden state associated with force-loading the PC and wherever your
> > architecture keeps its TLS (maybe not the whole TLB, but not nothing,
> > either). [...]
>
> please read up a bit more about how the Linux scheduler works. Maybe
> even read the code if in doubt? In any case, please direct kernel newbie
> questions to http://kernelnewbies.org/, not linux-kernel@...r.kernel.org.

This is not the first kernel I've swum around in, and I've been
mucking with the Linux kernel since early 2.2 and coding assembly for
heavily pipelined processors on and off since 1990.  So I may be a
newbie to your lingo, and I may even be a loud-mouthed idiot, but I'm
not a wet-behind-the-ears undergrad, OK?

Now, I've addressed the non-free-ness of a TLS swap elsewhere; what
about function pointers in state machines (with or without flipping
"supervisor mode" bits)?  Just because loading the PC from a data
register is one opcode in the instruction stream does not mean that it
is not quite expensive in terms of blown pipeline state and I-cache
stalls.  Really fast state machines exploit PC-relative branches that
really smart CPUs can speculatively execute past (after a few
traversals) because there are a small number of branch targets
actually hit.  The instruction prefetch / scheduler unit actually
keeps a table of PC-relative jump instructions found in I-cache, with
a little histogram of destinations eventually branched to, and
speculatively executes down the top branch or two.  (Intel Pentiums
have a fairly primitive but effective variant of this; see
http://www.x86.org/articles/branch/branchprediction.htm.)

More general mechanisms are called "branch target buffers" and US
Patent 6609194 is a good hook into the literature.  A sufficiently
smart CPU designer may have figured out how to do something similar
with computed jumps (add pc, pc, foo), but odds are high that it cuts
out when you throw function pointers around.  Syscall dispatch is a
special and heavily optimized case, though -- so it's quite
conceivable that a well designed userland switch/case state machine
that makes syscalls will outperform an in-kernel state machine data
structure traversal.  If this doesn't happen to be true on today's
desktop, it may be on tomorrow's desktop or today's NUMA monstrosity
or embedded mega-multi-MIPS.

There can also be other reasons why tabulated PC-relative jumps and
immediate PC loads are faster than PC loads from data registers.
Take, for instance, the Transmeta Crusoe, which (AIUI) used a trick
similar to the FX!32 x86 emulation on Alpha/NT.  If you're going to
"translate" CISC to RISC on the fly, you're going to recognize
switch/case idioms (including tabulated PC-relative branches), and fix
up the translated branch table to contain offsets to the
RISC-translated branch targets.  So the state transitions are just as
cheap as if they had been compiled to RISC in the first place.  Do it
with function pointers, and the the execution machine is going to have
to stall while it looks up the text location to see if it has it
translated in I-cache somewhere.  Guess what:  the PIV works the same
way (http://www.karbosguide.com/books/pcarchitecture/chapter12.htm).

Are you starting to get the picture that syslets -- clever as they
might have been on a VAX -- defeat many of the mechanisms that CPU and
compiler architects have negotiated over decades for accelerating real
code?  Especially now that we have hyper-threaded CPUs (parallel
instruction decode/issue units sharing almost all of their cache
hierarchy), you can almost treat the kernel as if it were microcode
for a syscall coprocessor.  If you try to migrate application code
across the syscall boundary, you may perform well on micro-benchmarks
but you're storing up trouble for the future.

If you don't think this kind of fallout is real, talk to whoever had
the bright idea of hijacking FPU registers to implement memcpy in
1996.  The PIII designers rolled over and added XMM so
micro-optimizers would get their dirty mitts off the FPU, which it
appears that Doug Ledford and Jim Blandy duly acted on in 1999.  Yes,
you still need to use FXSAVE/FXRSTOR when you want to mess with the
XMM stuff, but the CPU is smart enough to keep a shadow copy of all
the microstate that the flag states represent.  So if all you do
between FXSAVE and FXRSTOR is shlep bytes around with MOVAPS, the
FXRSTOR costs you little or nothing.  What hurts is an FXRSTOR from a
location that isn't the last location you FXSAVEd to, or an FXRSTOR
after actual FP arithmetic instructions have altered status flags.

The preceding may contain errors in detail -- I am neither a CPU
architect nor an x86 compiler writer nor even a serious kernel hacker.
 But hopefully it's at least food for thought.  If not, you know where
the "ignore this prolix nitwit" key is to be found on your keyboard.

Cheers,
- Michael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ