linux-kernel - Re: [PATCH RFC v3 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <54E0B857.30902@gmail.com>
Date:	Sun, 15 Feb 2015 16:16:39 +0100
From:	"Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>
To:	Omar Sandoval <osandov@...ndov.com>, Fam Zheng <famz@...hat.com>
CC:	mtk.manpages@...il.com, linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Kees Cook <keescook@...omium.org>,
	Andy Lutomirski <luto@...capital.net>,
	David Herrmann <dh.herrmann@...il.com>,
	Alexei Starovoitov <ast@...mgrid.com>,
	Miklos Szeredi <mszeredi@...e.cz>,
	David Drysdale <drysdale@...gle.com>,
	Oleg Nesterov <oleg@...hat.com>,
	"David S. Miller" <davem@...emloft.net>,
	Vivek Goyal <vgoyal@...hat.com>,
	Mike Frysinger <vapier@...too.org>,
	Theodore Ts'o <tytso@....edu>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Rasmus Villemoes <linux@...musvillemoes.dk>,
	Rashika Kheria <rashika.kheria@...il.com>,
	Hugh Dickins <hughd@...gle.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-fsdevel@...r.kernel.org, linux-api@...r.kernel.org,
	Josh Triplett <josh@...htriplett.org>,
	Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH RFC v3 0/7] epoll: Introduce new syscalls, epoll_ctl_batch
 and epoll_pwait1

On 02/13/2015 10:53 AM, Omar Sandoval wrote:
> On Fri, Feb 13, 2015 at 05:03:56PM +0800, Fam Zheng wrote:
>> Hi all,
>>
>> This is the updated series for the new epoll system calls, with the cover
>> letter rewritten which includes some more explanation. Comments are very
>> welcome!
>>
>> Original Motivation
>> ===================
>>
>> QEMU, and probably many more select/poll based applications, will consider
>> epoll as an alternative, when its event loop needs to handle a big number of
>> fds. However, there are currently two concerns with epoll which prevents the
>> switching from ppoll to epoll. 
>>
>> The major one is the timeout precision.
>>
>> For example in QEMU, the main loop takes care of calling callbacks at a
>> specific timeout - the QEMU timer API. The timeout value in ppoll depends on
>> the next firing timer. epoll_pwait's millisecond timeout is so coarse that
>> rounding up the timeout will hurt performance badly.
>>
>> The minor one is the number of system call to update fd set.
>>
>> While epoll can handle a large number of fds quickly, it still requires one
>> epoll_ctl per fd update, compared to the one-shot call to select/poll with an
>> fd array. This may as well make epoll inferior to ppoll in the cases where a
>> small, but frequently changing set of fds are polled by the event loop.
>>
>> This series introduces two new epoll APIs to address them respectively. The
>> idea of epoll_ctl_batch is suggested by Andy Lutomirski in [1], who also
>> suggested clockid as a parameter in epoll_pwait1.
>>
>> Discussion
>> ==========
>>
>> [Note: This is the question part regarding the interface contract of
>> epoll_ctl_batch.  If you don't have the context of what is epoll_ctl_batch yet,
>> please skip this part and probably start with the man page style documentation.
>> You can resume to this section later.]
>>
>> [Thanks to Omar Sandoval <osandov@...ndov.com>, who pointed out this in
>> reviewing v1]
>>
>> We try to report status for each command in epoll_ctl_batch, by writting to
>> user provided command array (pointed to cmds). The tricky thing in the
>> implementation is that, copying the results back to userspace comes last, after
>> the commands are executed. At this point, if the copy_to_user fails, the
>> effects are done and no return - or if we add some mechanism to revert it, the
>> code will be too complicated and slow.
>>
>> In above corner case, the return value of epoll_ctl_batch is smaller than
>> ncmds, which assures our caller that last N commands failed, where N = ncmds -
>> ret.  But they'll also find that cmd.result is not changed to error code.
>>
>> I suppose this is not a big deal, because 1) it should happen very rarely. 2)
>> user does know the actual number of commands that succeed.
>>
>> So, do we leave it as is? Or is there any way to improve?
>>
>> One tiny improvement (not a complete fix) in my mind is a testing copy_to_user
>> before we execute the commands. If it succeeds, it's even less likely the last
>> copy_to_user could fail, so that we can even probably assert it won't. The
>> testing 'copy_to_user(cmds, &kcmds, ...)' will not hurt functionality if do it
>> right after 'copy_from_user(&kcmds, cmds, ...)'. But I'm not sure about the
>> performance impact, especially when @ncmds is big.
>>
> I don't think this is the right thing to do, since, for example, another
> thread could unmap the memory region holding buffer between the "check"
> copy_to_user and the actual one.
> 
> The two alternatives that I see are:
> 
> 1. If copy_to_user fails, ignore it and return the number of commands
> that succeeded (i.e., what the code in your patch does).
> 2. If copy_to_user fails, return -EFAULT, regardless of how many
> commands succeeded.
> 
> The problem with 1 is that it could potentially mask bugs in a user
> program. You could imagine a buggy program that passes a read-only
> buffer to epoll_ctl_batch and never finds out about it because they
> don't get the error. (Then, when there's a real error in one of the
> epoll_ctl_cmds, but .result is 0, someone will be very confused.)
> 
> So I think 2 is the better option. Sure, the user will have no idea how
> many commands were executed, but when would EFAULT on this system call
> be part of normal operation, anyways? You'll find the memory bug, fix
> it, and rest easy knowing that nothing is silently failing behind your
> back.

What Omar says makes sense to me too. Best to have the user get a clear 
error indication for this case.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/