lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 2 Aug 2012 18:37:06 -0700
From:	"Paton J. Lewis" <palewis@...be.com>
To:	Christof Meerwald <cmeerw@...erw.org>
CC:	Alexander Viro <viro@...iv.linux.org.uk>,
	Jason Baron <jbaron@...hat.com>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Paul Holland <pholland@...be.com>,
	Davide Libenzi <davidel@...ilserver.org>
Subject: Re: [PATCH] epoll: Improved support for multi-threaded clients

Christof,

I notice that Windows (via I/O Completion Ports) and both BSD and 
OS/X (via kqueue) all appear to have support for both of the concepts 
we have been discussing: 1) the ability to disable epoll items, and 
2) the ability to send custom events. This suggests that either 
solution may be acceptable to a broader community.

I implemented and tested the method of using a new custom epoll event 
as a way to signal that an epoll item should be deleted. Although 
this alternative technique works, the approach has several drawbacks 
that I feel make it less desirable than the original proposal.

My first concern is about code clarity. Using a custom event to 
delete an event type (either EPOLLIN or EPOLLOUT) from an epoll item 
requires that functionality to be split across two areas of code: the 
code that requests the deletion (via the call to epoll_ctl), and the 
code that responds to it (via epoll_wait).

However, my main concern is about performance. Handling a custom 
event means that each return from epoll_wait requires the responding 
thread to check for possible custom events, which in the case of 
deletion is going to be relatively rare. Thus code which was once 
purely concerned with responding to I/O events must now spend a 
fraction of its time testing for exceptional conditions. In addition, 
handling deletion in this manner now requires a thread or context switch.

Given the drawbacks listed above, and the kernel design philosophy of 
only implementing what is actually needed, I would argue for sticking 
with the original EPOLL_CTL_DISABLE proposal for now.

Pat

At 6/29/2012 02:43 PM, Paton J. Lewis wrote:
>At 6/19/2012 11:17 AM, Christof Meerwald wrote:
>>Hi Paton,
>>
>>On Mon, Jun 18, 2012 at 04:24:35PM -0700, Paton J. Lewis wrote:
>> > We believe that EPOLLONESHOT is required in order to make any
>> > sensible use of calling epoll_wait on a single epoll set
>> > concurrently in multiple threads.
>>
>>I guess we have to disagree here - though it might be more difficult.
>>
>>
>> > >On Mon, 11 Jun 2012 15:34:49 -0700, Paton Lewis wrote:
>> > >> This patch introduces the new epoll_ctl command 
>> EPOLL_CTL_DISABLE, which
>> > >> disables the associated epoll item and returns -EBUSY if the
>> > >epoll item is not
>> > >> currently in the epoll ready queue. This allows multiple 
>> threads to use a
>> > >> mutex to determine when it is safe to delete an epoll item and
>> > >its associated
>> > >> resources. This allows epoll items to be deleted and closed 
>> efficiently and
>> > >> without error.
>>
>>Maybe I am missing something here (as I am not really familiar with
>>the kernel internals), but I don't really understand the logic behind
>>your patch. Isn't the "expected" case that the item is not on the
>>ready list and no I/O is being processed for that item?
>
>Consider the case where we want to have a set of threads waiting for 
>'write' events on a set of pipes or sockets. We have no control over 
>when code on the other side of a pipe or socket might write into it, 
>and so have no control over when one of the threads calling 
>epoll_wait will receive events relative to the timing of the thread 
>that is attempting to cancel the pending read operation.
>
>I believe there is no "expected" case, because the probability for 
>an item to be on the ready list is a complex function of the number 
>of file descriptors being monitored, the frequency at which those 
>descriptors receive events, the number of threads calling 
>epoll_wait, and the complexity of the code responding to events.
>
>Therefore for some clients of epoll it will be the case that items 
>will often be on the ready list, and for others it will not.
>
>>So I think instead of checking for the item being on the ready list,
>>checking for the event mask would make more sense for me, e.g.
>>
>>   if (!(epi->event.events & ~EP_PRIVATE_BITS))
>
>I think that would be fine. However, the inlined function 
>ep_is_linked boils down to a call to the inline function list_empty, 
>which is just a single comparison. I haven't compared the 
>disassembly, but I would expect the two methods to be roughly 
>equivalent in terms of performance. Given that the operation of 
>deleting an epoll item is likely to be an exceptional circumstance 
>and therefore not performance-critical, would it be better to test 
>against ep_is_linked for clarity's sake in the code?
>
>However, I believe these discussions are rendered moot by your 
>suggestion below:
>
>
>>But, taking one step back - wouldn't an alternative approach be to add
>>some mechanism to allow a thread to post a user-event for an fd? So in
>>delete_epoll_item you would post a user event (e.g. EPOLLUSER) for the
>>fd which you can then handle in your epoll_wait processing thread -
>>with no additional synchronisation necessary.
>
>I think this is an excellent suggestion, and in fact your proposal 
>is more similar to what Windows provides when solving this problem. 
>I'll test this idea out with our code and get back to you. Is there 
>an existing kernel technique that you would recommend for posting a 
>user event for an fd, or should I explore using epoll_ctl with EPOLL_CTL_MOD?
>
>I apologize in advance for any delay in responding to you; our 
>office is closed next week.
>
>Thank you,
>Pat
>
>>However, this would still require EPOLLONESHOT to be useful for memory
>>management.
>>
>>
>>Christof
>>
>>--
>>
>>http://cmeerw.org                             sip:cmeerw at cmeerw.org
>>mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ