Message-ID: <56E711C3.8020008@akamai.com>
Date:	Mon, 14 Mar 2016 15:32:19 -0400
From:	Jason Baron <jbaron@...mai.com>
To:	"Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Cc:	mingo@...nel.org, peterz@...radead.org, viro@....linux.org.uk,
	normalperson@...t.net, m@...odev.com, corbet@....net,
	luto@...capital.net, torvalds@...ux-foundation.org, hagen@...u.net,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	linux-api@...r.kernel.org
Subject: Re: [PATCH] epoll: add exclusive wakeups flag



On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
> [Restoring CC, which I see I accidentally dropped, one iteration back.]
> 
> Hi Jason,
> 
> Thanks for the review. I've tweaked one piece to respond to your
> feedback. But I also have another new question below.
> 
> On 03/15/2016 03:55 AM, Jason Baron wrote:
>> On 03/11/2016 06:25 PM, Michael Kerrisk (man-pages) wrote:
>>> On 03/11/2016 09:51 PM, Jason Baron wrote:
>>>> On 03/11/2016 03:30 PM, Michael Kerrisk (man-pages) wrote:
> 
> [...]
> 
>> Hi Michael,
>>
>> Looks good. One comment below.
>>
>> Thanks,
>>
>>>        EPOLLEXCLUSIVE (since Linux 4.5)
>>>               Sets  an  exclusive  wakeup  mode  for  the  epoll  file
>>>               descriptor  that  is  being  attached to the target file
>>>               descriptor, fd.  When a wakeup event occurs and multiple
>>>               epoll  file  descriptors are attached to the same target
>>>               file using EPOLLEXCLUSIVE, one or more of the epoll file
>>>               descriptors  will  receive  an event with epoll_wait(2).
>>>               The default in this scenario (when EPOLLEXCLUSIVE is not
>>>               set)  is  for  all  epoll file descriptors to receive an
>>>               event.  EPOLLEXCLUSIVE is thus useful for avoiding thun‐
>>>               dering herd problems in certain scenarios.
>>>
>>>               If  the  same  file  descriptor  is  in  multiple  epoll
>>>               instances, some with the EPOLLEXCLUSIVE flag, and others
>>>               without,  then  events  will  be  provided  to all epoll
>>>               instances that did not specify  EPOLLEXCLUSIVE,  and  at
>>>               least  one  of  the  epoll  instances  that  did specify
>>>               EPOLLEXCLUSIVE.
>>>
>>>               The following values may  be  specified  in  conjunction
>>>               with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
>>>               EPOLLET.  EPOLLHUP and EPOLLERR can also  be  specified,
>>>               but  are  ignored (as usual).  Attempts to specify other
>>
>> I'm not sure 'ignored' is the right wording here. 'EPOLLHUP' and
>> 'EPOLLERR' are always included in the set of events when something is
>> added as EPOLLEXCLUSIVE. This is consistent with the non-EPOLLEXCLUSIVE
>> add case. 
> 
> Yes.
> 
>> So 'EPOLLHUP' and 'EPOLLERR' may be specified but will be
>> included in the set of events on an add, whether they are specified or not.
> 
> Yes. I understand your discomfort with the word "ignored", but the 
> problem was that, because it made special mention of EPOLLHUP and EPOLLERR,
> your proposed text made it sound as though EPOLLEXCLUSIVE somehow was
> special with respect to these two flags. I wanted to clarify that it is not.
> How about this:
> 
>               The following values may  be  specified  in  conjunction
>               with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
>               EPOLLET.  EPOLLHUP and EPOLLERR can also  be  specified,
>               but  this  is  not  required: as usual, these events are
>               always reported if they  occur,  regardless  of  whether
>               they are specified in events.
> ?

Yes, nothing special here with respect to EPOLLHUP and EPOLLERR. So this
looks fine to me.
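
For anyone who wants to check the EPOLLHUP point from userspace, here is
a minimal sketch. It assumes Linux 4.5 or later and uses an anonymous
pipe as the target; the helper name hup_events_without_asking() is made
up for illustration. Only EPOLLIN | EPOLLEXCLUSIVE is requested, and the
write end is then closed so that a hangup occurs on the read end.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns the event mask reported for the read end
 * of a pipe once its write end is closed, or 0 on any failure. */
uint32_t hup_events_without_asking(void)
{
	int p[2];
	struct epoll_event ev = { 0 }, out = { 0 };
	uint32_t got = 0;
	int epfd;

	if (pipe(p) == -1)
		return 0;
	epfd = epoll_create1(0);
	if (epfd == -1) {
		close(p[0]);
		close(p[1]);
		return 0;
	}

	/* Only EPOLLIN is requested; EPOLLHUP and EPOLLERR are not listed. */
	ev.events = EPOLLIN | EPOLLEXCLUSIVE;
	ev.data.fd = p[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0) {
		/* Closing the last writer causes a hangup on the read end. */
		close(p[1]);
		p[1] = -1;
		if (epoll_wait(epfd, &out, 1, 1000) == 1)
			got = out.events;
	}

	if (p[1] != -1)
		close(p[1]);
	close(p[0]);
	close(epfd);
	return got;
}
```

If the description above holds, the returned mask should have EPOLLHUP
set even though it was never passed in events.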

> 
>>>               values in events yield an error.  EPOLLEXCLUSIVE may  be
>>>               used  only  in  an  EPOLL_CTL_ADD operation; attempts to
>>>               employ  it  with  EPOLL_CTL_MOD  yield  an  error.    If
>>>               EPOLLEXCLUSIVE has been set using epoll_ctl(2), then a
>>>               subsequent EPOLL_CTL_MOD on the same epfd, fd pair yields
>>>               an error.  An epoll_ctl(2) that specifies EPOLLEXCLUSIVE in
>>>               events and specifies the target file descriptor fd as an
>>>               epoll  instance will likewise fail.  The error in all of
>>>               these cases is EINVAL.
>>>
>>>    ERRORS
>>>        EINVAL An invalid event type was specified along with  EPOLLEX‐
>>>               CLUSIVE in events.
>>>
>>>        EINVAL op was EPOLL_CTL_MOD and events included EPOLLEXCLUSIVE.
>>>
>>>        EINVAL op  was  EPOLL_CTL_MOD  and  the EPOLLEXCLUSIVE flag has
>>>               previously been applied to this epfd, fd pair.
>>>
>>>        EINVAL EPOLLEXCLUSIVE was specified in event and fd  refers  to
>>>               an epoll instance.
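
The four EINVAL cases listed above can be exercised with something like
the following. This is only a sketch, assuming Linux 4.5 or later; the
helpers ctl_errno() and exclusive_error_cases() are hypothetical names,
and EPOLLONESHOT is used as one example of an event type that is not in
the permitted set.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: issue one epoll_ctl() call and return the errno
 * it failed with, or 0 if it succeeded. */
static int ctl_errno(int epfd, int op, int fd, uint32_t events)
{
	struct epoll_event ev = { .events = events };

	ev.data.fd = fd;
	return epoll_ctl(epfd, op, fd, &ev) == -1 ? errno : 0;
}

/* Hypothetical helper: fills res[0..4] with the outcome of each case.
 * Returns 0 if the setup worked, -1 otherwise. */
int exclusive_error_cases(int res[5])
{
	int p[2] = { -1, -1 }, q[2] = { -1, -1 };
	int epfd = epoll_create1(0);
	int inner = epoll_create1(0);
	int ok = epfd != -1 && inner != -1 && pipe(p) == 0 && pipe(q) == 0;

	if (ok) {
		/* A plain exclusive add is expected to succeed. */
		res[0] = ctl_errno(epfd, EPOLL_CTL_ADD, p[0],
				   EPOLLIN | EPOLLEXCLUSIVE);
		/* EPOLL_CTL_MOD on a pair that was added with EPOLLEXCLUSIVE. */
		res[1] = ctl_errno(epfd, EPOLL_CTL_MOD, p[0], EPOLLIN);
		/* EPOLL_CTL_MOD that tries to turn EPOLLEXCLUSIVE on. */
		ok = ctl_errno(epfd, EPOLL_CTL_ADD, q[0], EPOLLIN) == 0;
		res[2] = ctl_errno(epfd, EPOLL_CTL_MOD, q[0],
				   EPOLLIN | EPOLLEXCLUSIVE);
		/* An event type outside the permitted set (EPOLLONESHOT here). */
		res[3] = ctl_errno(epfd, EPOLL_CTL_ADD, q[1],
				   EPOLLOUT | EPOLLONESHOT | EPOLLEXCLUSIVE);
		/* The target fd is itself an epoll instance. */
		res[4] = ctl_errno(epfd, EPOLL_CTL_ADD, inner,
				   EPOLLIN | EPOLLEXCLUSIVE);
	}

	/* close(-1) just fails with EBADF, so unconditional cleanup is safe. */
	close(p[0]);
	close(p[1]);
	close(q[0]);
	close(q[1]);
	close(inner);
	close(epfd);
	return ok ? 0 : -1;
}
```

Per the text above, res[0] should come back as 0 and res[1] through
res[4] as EINVAL.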
> 
> Returning to the second sentence in this description:
> 
>               When a wakeup event occurs and multiple epoll file descrip‐
>               tors are attached to the same target file using EPOLLEXCLU‐
>               SIVE, one or  more  of  the  epoll  file  descriptors  will
>               receive  an  event with epoll_wait(2).
> 
> There is a point that is unclear to me: what does "target file" refer to?
> Is it an open file description (aka open file table entry) or an inode?
> I suspect the former, but it was not clear in your original text.
>

So from epoll's perspective, the wakeups are associated with a 'wait
queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
file->poll()) results in adding to the same 'wait queue' then we will
get 'exclusive' wakeup behavior.

So in general, I think the answer here is that it's associated with the
inode (I couldn't say with 100% certainty without really looking at all
file->poll() implementations). Certainly, with the 'FIFO' example below,
the two scenarios will have the same behavior with respect to
EPOLLEXCLUSIVE.

Also, the 'non-exclusive' mode would be subject to the same question of
which wait queue the epfd is associated with...
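
A rough way to observe this from userspace, for the case where the
descriptor is opened once and inherited across fork(), is sketched
below. It is an illustration under assumptions, not a definitive test:
it uses an anonymous pipe as a stand-in for the FIFO, the helper name
count_exclusive_wakeups() and the NCHILD constant are made up, and the
100 ms delay is only a crude way to let the children block in
epoll_wait() first. Each child creates its own epoll instance, adds the
shared read end with EPOLLEXCLUSIVE, and reports via its exit status
whether it saw an event.

```c
#include <poll.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NCHILD 3

/* Hypothetical helper: returns how many of NCHILD children saw an event
 * after one byte was written to the shared pipe, or -1 on failure. */
int count_exclusive_wakeups(void)
{
	int data[2], ready[2];
	pid_t pids[NCHILD];
	int i, n = 0, status, woken = 0, ok = 1;
	char c = 'x';

	if (pipe(data) == -1)
		return -1;
	if (pipe(ready) == -1) {
		close(data[0]);
		close(data[1]);
		return -1;
	}

	for (n = 0; n < NCHILD; n++) {
		pids[n] = fork();
		if (pids[n] == -1)
			break;
		if (pids[n] == 0) {
			struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
			struct epoll_event out;
			int epfd = epoll_create1(0);

			if (epfd == -1 ||
			    epoll_ctl(epfd, EPOLL_CTL_ADD, data[0], &ev) == -1)
				_exit(2);
			/* Tell the parent that this child is set up. */
			if (write(ready[1], &c, 1) != 1)
				_exit(2);
			/* Exit status 1: got an event; 0: timed out. */
			_exit(epoll_wait(epfd, &out, 1, 500) == 1 ? 1 : 0);
		}
	}
	close(ready[1]);

	for (i = 0; i < n; i++)
		if (read(ready[0], &c, 1) != 1)
			ok = 0;
	poll(NULL, 0, 100);	/* crude 100 ms sleep, see lead-in */
	if (ok && write(data[1], &c, 1) != 1)
		ok = 0;

	for (i = 0; i < n; i++) {
		if (waitpid(pids[i], &status, 0) == -1 || !WIFEXITED(status))
			ok = 0;
		else if (WEXITSTATUS(status) == 1)
			woken++;
		else if (WEXITSTATUS(status) != 0)
			ok = 0;
	}

	close(ready[0]);
	close(data[0]);
	close(data[1]);
	return (ok && n == NCHILD) ? woken : -1;
}
```

The documented guarantee is only "one or more", so the count can
legitimately be anywhere from 1 to NCHILD; dropping EPOLLEXCLUSIVE from
the child's events gives the default behavior for comparison.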

Thanks,

-Jason

> To make this point even clearer, here are two scenarios I'm thinking of.
> In each case, we're talking of monitoring the read end of a FIFO.
> 
> ===
> 
> Scenario 1:
> 
> We have three processes each of which
> 1. Creates an epoll instance
> 2. Opens the read end of the FIFO
> 3. Adds the read end of the FIFO to the epoll instance, specifying
>    EPOLLEXCLUSIVE
> 
> When input becomes available on the FIFO, how many processes
> get a wakeup?
> 
> ===
> 
> Scenario 2:
> 
> A parent process opens the read end of a FIFO and then calls
> fork() three times to create three children. Each child then:
> 
> 1. Creates an epoll instance
> 2. Adds the read end of the FIFO to the epoll instance, specifying
>    EPOLLEXCLUSIVE
> 
> When input becomes available on the FIFO, how many processes
> get a wakeup?
> 
> ===
> 
> Cheers,
> 
> Michael
> 
