lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 26 Nov 2007 18:14:40 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	"H. Peter Anvin" <hpa@...or.com>
cc:	Ulrich Drepper <drepper@...hat.com>,
	David Miller <davem@...emloft.net>,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	mingo@...e.hu, tglx@...utronix.de
Subject: Re: [PATCHv4 5/6] Allow setting O_NONBLOCK flag for new sockets



On Mon, 26 Nov 2007, H. Peter Anvin wrote:
> 
> I'm presuming you're not talking about some sort of syslets/fibrils/threadlets
> here (executing an interpreted thread of execution in kernel space).  That's a
> whole separate ball of wax.

Indeed. 

I'm hoping that just dies. It's too complex. But the "do this single 
system call asynchronously" isn't, and has lots of historical 
implementations, ranging from VMS to the braindead POSIX "aio" setup.

I do think that more complex threadlets could be useful in theory, I just 
doubt they'd be used in practice..

> > So the choice is basically one of:
> > 
> >  - come up with a totally new interface to system calls, and effectively
> > duplicating the whole system call table.
> > 
> >    I'd hate to do this. We already have duplicated system call tables due
> > to compat stuff, it's painful.
> 
> This would be the right thing to do if we were to redesign the system call
> interface from the ground up, which it doesn't exactly sound like we are
> intending.

Yeah. I'm also not sure it's the right thing even if we did redesign from 
scratch.

The current system call interface may look less than regular, but it has 
some very solid foundation: it's fast. Passing arguments in registers is 
by definition a lot faster *and*safer* than passing them any other way. 
There are no subtle security issues with people playing games with the 
argument base pointer (ie usually the stack pointer) and trying to fool 
the kernel into accessing kernel memory etc.

Immediately when you do anything but registers, it is much *much* more 
costly. The "get_user()" and "copy_from_user()" stuff is not exactly slow, 
but it's quite noticeable overhead for simple system calls. It gets worse 
if this all is described by some indirect table setup.

In the system call path, right now, for some system calls, the biggest two 
overheads are

 - the CPU system call overhead itself. We can't do much about this, but 
   the CPU designers do seem to be slowly getting it fixed (ie it's slower 
   than it should need to be, but it's a hell of a lot faster than a P4 
   used to be ;)

 - the cost of just the single indirect - and unpredictable - call.

(The second cost is actually often totally hidden in the trivial system 
call benchmarks people run: if the benchmark just does "getppid()" a 
million times in a tight loop, the indirect call on the system call number 
seems really quite fast, but outside of benchmarks it is generally totally 
unpredictable indeed, and a real cost for real-life system call usage).

Everything else in the system call path is generally as fast as we can 
make it. Doing more indirection and conditionals would be really quite 
nasty.

Of course, for *most* of system calls, the work the kernel actually does 
ends up being so big that it doesn't much matter, but I was literally 
chasing down why a page fault had slowed down by ~70 cycles two weeks ago. 
And it doesn't take more than a couple of unpredictable jumps to do things 
like that!

> The 6-word limit is a red herring.  There is at least two ways to deal with it
> (and this doesn't mean wiping the legacy stuff we already have):
> 
> - Let each architecture pick a calling convention and redefine the
> architecture-independent bits to take an arbitrary number of arguments.  This
> is a one-time panarchitectural change.

Not applicable on x86-32.

The six-word limit is effectively a hardware limit there. Once it goes 
past that limit, one of the words needs to be a pointer to extended 
information that is fundamentally slower to access. Happily, only very 
rare system calls do that (and none of them are of the simple variety 
where we see a few cycles easily).

On other architectures, we could more easily just use more registers. But 
x86-32 is still a big part (bulk) of what matters for most people.

			Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ