linux-kernel - Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.64.0702051223390.8424@woody.linux-foundation.org>
Date:	Mon, 5 Feb 2007 12:39:14 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Davide Libenzi <davidel@...ilserver.org>
cc:	Zach Brown <zach.brown@...cle.com>, Ingo Molnar <mingo@...e.hu>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-aio@...ck.org, Suparna Bhattacharya <suparna@...ibm.com>,
	Benjamin LaHaise <bcrl@...ck.org>
Subject: Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

On Mon, 5 Feb 2007, Davide Libenzi wrote:
>
> No, that's *really* it ;)
>
> The cookie you pass, and the return code of the syscall.
> If there other data transfered? Sure, but that data transfered during the 
> syscall processing, and handled by the syscall (filling up a sys_read 
> buffer just for example).

Indeed. One word is *exactly* what a normal system call returns too.

That said, normally we have a user-space library layer to turn that into 
the "errno + return value" thing, and in the case of async() calls we 
very basically wouldn't have that. So either:

 - we'd need to do it in the kernel (which is actually nasty, since 
   different system calls have slightly different semantics - some don't 
   return any error value at all, and negative numbers are real numbers)

 - we'd have to teach user space about the "negative errno" mechanism, in 
   which case one word really is alwats enough.

Quite frankly, I much prefer the second alternative. The "negative errno" 
thing has not only worked really really well inside the kernel, it's so 
obviously 100% superior to the standard UNIX "-1 + errno" approach that 
it's not even funny. 

To see why "negative errno" is better, just look at any threaded program, 
or look at any program that does multiple calls and needs to save the 
errno not from the last one, but from some earlier one (eg, it does a 
"close()" in between returning *its* error, and the real operation that 
we care about).

> Did I miss something? The async() syscall will allow (with few 
> restrictions) to execute whatever syscall in an async fashion. An syscall 
> returns a result code (long). Plus, you need to pass back the 
> userspace-provided cookie of course.

HOWEVER, they get returned differently. The cookie gets returned 
immediately, the system call result gets returned in-memory only after the 
async thing has actually completed.

I would actually argue that it's not the kernel that should generate any 
cookie, but that user-space should *pass*in* the cookie it wants to, and 
the kernel should consider it a pointer to a 64-bit entity which is the 
return code.

In other words, the only "cookie" we need is literally the pointer to the 
results. And that's obviously something that the user space has to set up 
anyway.

So how about making the async interface be:

	// returns negative for error
	// zero for "synchronous"
	// positive kernel "wait for me" cookie for success
	long sys_async_submit(
		unsigned long flags,
		long *user_result_ptr,
		long syscall,
		unsigned long *args);

and the "args" thing would literally just fill up the registers.

The real downside here is that it's very architecture-specific this way, 
and that means that x86-64 (and other 64-bit ones) would need to have 
emulation layers for the 32-bit ones, but they likely need to do that 
*anyway*, so it's probably not a huge downside. The alternative is to:

 - make a new architecture-independent system call enumeration for the 
   async interface

 - make everything use 64-bit values.

Now, making an architecture-independent system call enumeration may 
actually make sense regardless, because it would allow sys_async() to have 
its own system call table and put the limitations and rules for those 
system calls there, instead of depending on the per-architecture system 
call table that tends to have some really architecture-specific details.

Hmm?

		Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/