lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5091C5C2.8060009@linux.vnet.ibm.com>
Date:	Thu, 01 Nov 2012 08:43:46 +0800
From:	Michael Wang <wangyun@...ux.vnet.ibm.com>
To:	"Paton J. Lewis" <palewis@...be.com>
CC:	Alexander Viro <viro@...iv.linux.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jason Baron <jbaron@...hat.com>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Paul Holland <pholland@...be.com>,
	Davide Libenzi <davidel@...ilserver.org>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	"libc-alpha@...rceware.org" <libc-alpha@...rceware.org>,
	"linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
	"paulmck@...ux.vnet.ibm.com" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH v3] epoll: Support for disabling items, and a self-test
 app.

On 11/01/2012 02:57 AM, Paton J. Lewis wrote:
> On 10/30/12 11:32 PM, Michael Wang wrote:
>> On 10/26/2012 08:08 AM, Paton J. Lewis wrote:
>>> From: "Paton J. Lewis" <palewis@...be.com>
>>>
>>> It is not currently possible to reliably delete epoll items when
>>> using the
>>> same epoll set from multiple threads. After calling epoll_ctl with
>>> EPOLL_CTL_DEL, another thread might still be executing code related
>>> to an
>>> event for that epoll item (in response to epoll_wait). Therefore the
>>> deleting
>>> thread does not know when it is safe to delete resources pertaining
>>> to the
>>> associated epoll item because another thread might be using those
>>> resources.
>>>
>>> The deleting thread could wait an arbitrary amount of time after calling
>>> epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
>>> inefficient and could result in the destruction of resources before
>>> another
>>> thread is done handling an event returned by epoll_wait.
>>>
>>> This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which
>>> disables an
>>> epoll item. If epoll_ctl returns -EBUSY in this case, then another
>>> thread may
>>> handling a return from epoll_wait for this item. Otherwise if epoll_ctl
>>> returns 0, then it is safe to delete the epoll item. This allows
>>> multiple
>>> threads to use a mutex to determine when it is safe to delete an
>>> epoll item
>>> and its associated resources, which allows epoll items to be deleted
>>> both
>>> efficiently and without error in a multi-threaded environment. Note that
>>> EPOLL_CTL_DISABLE is only useful in conjunction with EPOLLONESHOT,
>>> and using
>>> EPOLL_CTL_DISABLE on an epoll item without EPOLLONESHOT returns -EINVAL.
>>>
>>> This patch also adds a new test_epoll self-test program to both
>>> demonstrate
>>> the need for this feature and test it.
>>
>> Hi, Paton
>>
>> I'm just think about may be we could use this way.
>>
>> Seems like currently we are depending on the epoll_ctl() to indicate the
>> start point of safe section and epoll_wait() for the end point, like:
>>
>>          while () {
>>                  epoll_wait()                    --------------
>>
>>                  fd event arrived                safe section
>>
>>                  clear fd epi->event.events
>>                                                  --------------
>>                  if (fd need stop)
>>                          continue;
>>                                                  --------------
>>                  ...fd data process...
>>
>>                  epoll_ctl(MOD)                  danger section
>>
>>                  set fd epi->event.events        --------------
>>
>>                  continue;
>>          }
>>
>> So we got a safe section and do delete work in this section won't cause
>> trouble since we have a stop check directly after it.
>>
>> Actually what we want is to make sure no one will touch the fd any more
>> after we DISABLE it.
>>
>> Then what about we add a ref count and a stop flag in epi, maintain it
>> like:
>>
>>          epoll_wait()
>>
>>          check user events and
>>          dec the ref count of fd         ---------------------------
>>
>>          ...
>>
>>          fd event arrived                safe sec if ref count is 0
>>
>>          if epi stop flag set
>>                  do nothing
>>          else
>>                  inc epi ref count       ---------------------------
> 
> The pseudecode you provide below (for "DISABLE") seems to indicate that
> this "epi ref count" must be maintained by the kernel. Therefore any
> userspace modification of a ref count associated with an epoll item will
> require a new or changed kernel API.
> 
>>                  send event
>>
>> And what DISABLE do is:
>>
>>          set epi stop flag
>>
>>          if epi ref count is not 0
>>                  wait until ref count be 0
> 
> Perhaps I don't fully understand what you're proposing, but I don't
> think it's reasonable for a kernel API (epoll_ctl in this case) to block
> while waiting for a userspace action (decrementing the ref count) that
> might never occur.
> 
> Andrew Morton also proposed using ref counting in response to my initial
> patch submission; my reply to his proposal might also be applicable to
> your proposal. A link to that discussion thread:
> http://thread.gmane.org/gmane.linux.kernel/1311457/focus=1315096
> 
> Sorry if I am misunderstanding your proposal, but I don't see how it
> solves the original problem.

I just try to find out whether we could using DISABLE with out ONESHOT :)

My currently understanding is:

1. we actually want to determine the part between each epoll_wait() in a
while().

2. we can't count on epoll_wait() itself, since no info pass to kernel
to indicate whether it was invoked after another epoll_wait() in the
same while().

3. so we need epoll_ctl(MOD) to tell kernel: user finished process data
after epoll_wait(), and those data belong to which epi.

4. since 3 we need ONESHOT to be enabled.


Is this the reason we rely on ONESHOT to be enabled?


If it is, then if we could do some change to make epoll_wait() have the
ability:

1. indicate whether it is invoked after another epoll_wait() in same while()

2. indicate which epi has been fired by last epoll_wait()

3. no changes on the api for user

Then we could mark the safe section inside epoll_wait(), and we don't
need MOD or ONESHOT any more, is that correct?


Regards,
Michael Wang

> 
> Pat
> 
>> So after DISABLE return, we can safely delete any thing related to
>> that epi.
>>
>> One thing is that the user should not change the events info returned by
>> epoll_wait().
>>
>> It's just a propose, but if it works, there will be no limit on ONESHOT
>> any more ;-)
>>
>> Regards,
>> Michael Wang
>>
>>>
>>> Signed-off-by: Paton J. Lewis <palewis@...be.com>
>>> ---
>>>   fs/eventpoll.c                             |   40 ++-
>>>   include/linux/eventpoll.h                  |    1 +
>>>   tools/testing/selftests/Makefile           |    2 +-
>>>   tools/testing/selftests/epoll/Makefile     |   11 +
>>>   tools/testing/selftests/epoll/test_epoll.c |  364
>>> ++++++++++++++++++++++++++++
>>>   5 files changed, 414 insertions(+), 4 deletions(-)
>>>   create mode 100644 tools/testing/selftests/epoll/Makefile
>>>   create mode 100644 tools/testing/selftests/epoll/test_epoll.c
>>>
>>> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>>> index 739b098..c718afd 100644
>>> --- a/fs/eventpoll.c
>>> +++ b/fs/eventpoll.c
>>> @@ -339,7 +339,7 @@ static inline struct epitem
>>> *ep_item_from_epqueue(poll_table *p)
>>>   /* Tells if the epoll_ctl(2) operation needs an event copy from
>>> userspace */
>>>   static inline int ep_op_has_event(int op)
>>>   {
>>> -     return op != EPOLL_CTL_DEL;
>>> +     return op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD;
>>>   }
>>>
>>>   /* Initialize the poll safe wake up structure */
>>> @@ -664,6 +664,36 @@ static int ep_remove(struct eventpoll *ep,
>>> struct epitem *epi)
>>>        return 0;
>>>   }
>>>
>>> +/*
>>> + * Disables a "struct epitem" in the eventpoll set. Returns -EBUSY
>>> if the item
>>> + * had no event flags set, indicating that another thread may be
>>> currently
>>> + * handling that item's events (in the case that EPOLLONESHOT was being
>>> + * used). Otherwise a zero result indicates that the item has been
>>> disabled
>>> + * from receiving events. A disabled item may be re-enabled via
>>> + * EPOLL_CTL_MOD. Must be called with "mtx" held.
>>> + */
>>> +static int ep_disable(struct eventpoll *ep, struct epitem *epi)
>>> +{
>>> +     int result = 0;
>>> +     unsigned long flags;
>>> +
>>> +     spin_lock_irqsave(&ep->lock, flags);
>>> +     if (epi->event.events & EPOLLONESHOT) {
>>> +             if (epi->event.events & ~EP_PRIVATE_BITS) {
>>> +                     if (ep_is_linked(&epi->rdllink))
>>> +                             list_del_init(&epi->rdllink);
>>> +                     /* Ensure ep_poll_callback will not add epi
>>> back onto
>>> +                        ready list: */
>>> +                     epi->event.events &= EP_PRIVATE_BITS;
>>> +             } else
>>> +                     result = -EBUSY;
>>> +     } else
>>> +             result = -EINVAL;
>>> +     spin_unlock_irqrestore(&ep->lock, flags);
>>> +
>>> +     return result;
>>> +}
>>> +
>>>   static void ep_free(struct eventpoll *ep)
>>>   {
>>>        struct rb_node *rbp;
>>> @@ -996,8 +1026,6 @@ static void ep_rbtree_insert(struct eventpoll
>>> *ep, struct epitem *epi)
>>>        rb_insert_color(&epi->rbn, &ep->rbr);
>>>   }
>>>
>>> -
>>> -
>>>   #define PATH_ARR_SIZE 5
>>>   /*
>>>    * These are the number paths of length 1 to 5, that we are
>>> allowing to emanate
>>> @@ -1701,6 +1729,12 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op,
>>> int, fd,
>>>                } else
>>>                        error = -ENOENT;
>>>                break;
>>> +     case EPOLL_CTL_DISABLE:
>>> +             if (epi)
>>> +                     error = ep_disable(ep, epi);
>>> +             else
>>> +                     error = -ENOENT;
>>> +             break;
>>>        }
>>>        mutex_unlock(&ep->mtx);
>>>
>>> diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
>>> index 657ab55..e91f7e3 100644
>>> --- a/include/linux/eventpoll.h
>>> +++ b/include/linux/eventpoll.h
>>> @@ -25,6 +25,7 @@
>>>   #define EPOLL_CTL_ADD 1
>>>   #define EPOLL_CTL_DEL 2
>>>   #define EPOLL_CTL_MOD 3
>>> +#define EPOLL_CTL_DISABLE 4
>>>
>>>   /* Set the One Shot behaviour for the target file descriptor */
>>>   #define EPOLLONESHOT (1 << 30)
>>> diff --git a/tools/testing/selftests/Makefile
>>> b/tools/testing/selftests/Makefile
>>> index 28bc57e..4cf477f 100644
>>> --- a/tools/testing/selftests/Makefile
>>> +++ b/tools/testing/selftests/Makefile
>>> @@ -1,4 +1,4 @@
>>> -TARGETS = breakpoints vm
>>> +TARGETS = breakpoints epoll vm
>>>
>>>   all:
>>>        for TARGET in $(TARGETS); do \
>>> diff --git a/tools/testing/selftests/epoll/Makefile
>>> b/tools/testing/selftests/epoll/Makefile
>>> new file mode 100644
>>> index 0000000..19806ed
>>> --- /dev/null
>>> +++ b/tools/testing/selftests/epoll/Makefile
>>> @@ -0,0 +1,11 @@
>>> +# Makefile for epoll selftests
>>> +
>>> +all: test_epoll
>>> +%: %.c
>>> +     gcc -pthread -g -o $@ $^
>>> +
>>> +run_tests: all
>>> +     ./test_epoll
>>> +
>>> +clean:
>>> +     $(RM) test_epoll
>>> diff --git a/tools/testing/selftests/epoll/test_epoll.c
>>> b/tools/testing/selftests/epoll/test_epoll.c
>>> new file mode 100644
>>> index 0000000..54284eb
>>> --- /dev/null
>>> +++ b/tools/testing/selftests/epoll/test_epoll.c
>>> @@ -0,0 +1,364 @@
>>> +/*
>>> + *  tools/testing/selftests/epoll/test_epoll.c
>>> + *
>>> + *  Copyright 2012 Adobe Systems Incorporated
>>> + *
>>> + *  This program is free software; you can redistribute it and/or
>>> modify
>>> + *  it under the terms of the GNU General Public License as
>>> published by
>>> + *  the Free Software Foundation; either version 2 of the License, or
>>> + *  (at your option) any later version.
>>> + *
>>> + *  Paton J. Lewis <palewis@...be.com>
>>> + *
>>> + */
>>> +
>>> +#include <errno.h>
>>> +#include <fcntl.h>
>>> +#include <pthread.h>
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +#include <unistd.h>
>>> +#include <sys/epoll.h>
>>> +#include <sys/socket.h>
>>> +
>>> +/*
>>> + * A pointer to an epoll_item_private structure will be stored in
>>> the epoll
>>> + * item's event structure so that we can get access to the
>>> epoll_item_private
>>> + * data after calling epoll_wait:
>>> + */
>>> +struct epoll_item_private {
>>> +     int index;  /* Position of this struct within the epoll_items
>>> array. */
>>> +     int fd;
>>> +     uint32_t events;
>>> +     pthread_mutex_t mutex;  /* Guards the following variables... */
>>> +     int stop;
>>> +     int status;  /* Stores any error encountered while handling
>>> item. */
>>> +     /* The following variable allows us to test whether we have
>>> encountered
>>> +        a problem while attempting to cancel and delete the associated
>>> +        event. When the test program exits, 'deleted' should be exactly
>>> +        one. If it is greater than one, then the failed test
>>> reflects a real
>>> +        world situation where we would have tried to access the
>>> epoll item's
>>> +        private data after deleting it: */
>>> +     int deleted;
>>> +};
>>> +
>>> +struct epoll_item_private *epoll_items;
>>> +
>>> +/*
>>> + * Delete the specified item from the epoll set. In a real-world
>>> secneario this
>>> + * is where we would free the associated data structure, but in this
>>> testing
>>> + * environment we retain the structure so that we can test for
>>> double-deletion:
>>> + */
>>> +void delete_item(int index)
>>> +{
>>> +     __sync_fetch_and_add(&epoll_items[index].deleted, 1);
>>> +}
>>> +
>>> +/*
>>> + * A pointer to a read_thread_data structure will be passed as the
>>> argument to
>>> + * each read thread:
>>> + */
>>> +struct read_thread_data {
>>> +     int stop;
>>> +     int status;  /* Indicates any error encountered by the read
>>> thread. */
>>> +     int epoll_set;
>>> +};
>>> +
>>> +/*
>>> + * The function executed by the read threads:
>>> + */
>>> +void *read_thread_function(void *function_data)
>>> +{
>>> +     struct read_thread_data *thread_data =
>>> +             (struct read_thread_data *)function_data;
>>> +     struct epoll_event event_data;
>>> +     struct epoll_item_private *item_data;
>>> +     char socket_data;
>>> +
>>> +     /* Handle events until we encounter an error or this thread's
>>> 'stop'
>>> +        condition is set: */
>>> +     while (1) {
>>> +             int result = epoll_wait(thread_data->epoll_set,
>>> +                                     &event_data,
>>> +                                     1,      /* Number of desired
>>> events */
>>> +                                     1000);  /* Timeout in ms */
>>> +             if (result < 0) {
>>> +                     /* Breakpoints signal all threads. Ignore that
>>> while
>>> +                        debugging: */
>>> +                     if (errno == EINTR)
>>> +                             continue;
>>> +                     thread_data->status = errno;
>>> +                     return 0;
>>> +             } else if (thread_data->stop)
>>> +                     return 0;
>>> +             else if (result == 0)  /* Timeout */
>>> +                     continue;
>>> +
>>> +             /* We need the mutex here because checking for the stop
>>> +                condition and re-enabling the epoll item need to be
>>> done
>>> +                together as one atomic operation when
>>> EPOLL_CTL_DISABLE is
>>> +                available: */
>>> +             item_data = (struct epoll_item_private
>>> *)event_data.data.ptr;
>>> +             pthread_mutex_lock(&item_data->mutex);
>>> +
>>> +             /* Remove the item from the epoll set if we want to stop
>>> +                handling that event: */
>>> +             if (item_data->stop)
>>> +                     delete_item(item_data->index);
>>> +             else {
>>> +                     /* Clear the data that was written to the other
>>> end of
>>> +                        our non-blocking socket: */
>>> +                     do {
>>> +                             if (read(item_data->fd, &socket_data,
>>> 1) < 1) {
>>> +                                     if ((errno == EAGAIN) ||
>>> +                                         (errno == EWOULDBLOCK))
>>> +                                             break;
>>> +                                     else
>>> +                                             goto error_unlock;
>>> +                             }
>>> +                     } while (item_data->events & EPOLLET);
>>> +
>>> +                     /* The item was one-shot, so re-enable it: */
>>> +                     event_data.events = item_data->events;
>>> +                     if (epoll_ctl(thread_data->epoll_set,
>>> +                                               EPOLL_CTL_MOD,
>>> +                                               item_data->fd,
>>> +                                               &event_data) < 0)
>>> +                             goto error_unlock;
>>> +             }
>>> +
>>> +             pthread_mutex_unlock(&item_data->mutex);
>>> +     }
>>> +
>>> +error_unlock:
>>> +     thread_data->status = item_data->status = errno;
>>> +     pthread_mutex_unlock(&item_data->mutex);
>>> +     return 0;
>>> +}
>>> +
>>> +/*
>>> + * A pointer to a write_thread_data structure will be passed as the
>>> argument to
>>> + * the write thread:
>>> + */
>>> +struct write_thread_data {
>>> +     int stop;
>>> +     int status;  /* Indicates any error encountered by the write
>>> thread. */
>>> +     int n_fds;
>>> +     int *fds;
>>> +};
>>> +
>>> +/*
>>> + * The function executed by the write thread. It writes a single
>>> byte to each
>>> + * socket in turn until the stop condition for this thread is set.
>>> If writing to
>>> + * a socket would block (i.e. errno was EAGAIN), we leave that
>>> socket alone for
>>> + * the moment and just move on to the next socket in the list. We
>>> don't care
>>> + * about the order in which we deliver events to the epoll set. In
>>> fact we don't
>>> + * care about the data we're writing to the pipes at all; we just
>>> want to
>>> + * trigger epoll events:
>>> + */
>>> +void *write_thread_function(void *function_data)
>>> +{
>>> +     const char data = 'X';
>>> +     int index;
>>> +     struct write_thread_data *thread_data =
>>> +             (struct write_thread_data *)function_data;
>>> +     while (!thread_data->stop)
>>> +             for (index = 0;
>>> +                  !thread_data->stop && (index < thread_data->n_fds);
>>> +                  ++index)
>>> +                     if ((write(thread_data->fds[index], &data, 1) <
>>> 1) &&
>>> +                             (errno != EAGAIN) &&
>>> +                             (errno != EWOULDBLOCK)) {
>>> +                             thread_data->status = errno;
>>> +                             return;
>>> +                     }
>>> +}
>>> +
>>> +/*
>>> + * Arguments are currently ignored:
>>> + */
>>> +int main(int argc, char **argv)
>>> +{
>>> +     const int n_read_threads = 100;
>>> +     const int n_epoll_items = 500;
>>> +     int index;
>>> +     int epoll_set = epoll_create1(0);
>>> +     struct write_thread_data write_thread_data = {
>>> +             0, 0, n_epoll_items, malloc(n_epoll_items * sizeof(int))
>>> +     };
>>> +     struct read_thread_data *read_thread_data =
>>> +             malloc(n_read_threads * sizeof(struct read_thread_data));
>>> +     pthread_t *read_threads = malloc(n_read_threads *
>>> sizeof(pthread_t));
>>> +     pthread_t write_thread;
>>> +     int socket_pair[2];
>>> +     struct epoll_event event_data;
>>> +
>>> +     printf("-----------------\n");
>>> +     printf("Runing test_epoll\n");
>>> +     printf("-----------------\n");
>>> +
>>> +     epoll_items = malloc(n_epoll_items * sizeof(struct
>>> epoll_item_private));
>>> +
>>> +     if (epoll_set < 0 || epoll_items == 0 || write_thread_data.fds
>>> == 0 ||
>>> +             read_thread_data == 0 || read_threads == 0)
>>> +             goto error;
>>> +
>>> +     if (sysconf(_SC_NPROCESSORS_ONLN) < 2) {
>>> +             printf("Error: please run this test on a multi-core
>>> system.\n");
>>> +             goto error;
>>> +     }
>>> +
>>> +     /* Create the socket pairs and epoll items: */
>>> +     for (index = 0; index < n_epoll_items; ++index) {
>>> +             if (socketpair(AF_UNIX,
>>> +                            SOCK_STREAM | SOCK_NONBLOCK,
>>> +                            0,
>>> +                            socket_pair) < 0)
>>> +                     goto error;
>>> +             write_thread_data.fds[index] = socket_pair[0];
>>> +             epoll_items[index].index = index;
>>> +             epoll_items[index].fd = socket_pair[1];
>>> +             if (pthread_mutex_init(&epoll_items[index].mutex, NULL)
>>> != 0)
>>> +                     goto error;
>>> +             /* We always use EPOLLONESHOT because this test is
>>> currently
>>> +                structured to demonstrate the need for
>>> EPOLL_CTL_DISABLE,
>>> +                which only produces useful information in the
>>> EPOLLONESHOT
>>> +                case (without EPOLLONESHOT, calling epoll_ctl with
>>> +                EPOLL_CTL_DISABLE will never return EBUSY). If
>>> support for
>>> +                testing events without EPOLLONESHOT is desired, it
>>> should
>>> +                probably be implemented in a separate unit test. */
>>> +             epoll_items[index].events = EPOLLIN | EPOLLONESHOT;
>>> +             if (index < n_epoll_items / 2)
>>> +                     epoll_items[index].events |= EPOLLET;
>>> +             epoll_items[index].stop = 0;
>>> +             epoll_items[index].status = 0;
>>> +             epoll_items[index].deleted = 0;
>>> +             event_data.events = epoll_items[index].events;
>>> +             event_data.data.ptr = &epoll_items[index];
>>> +             if (epoll_ctl(epoll_set,
>>> +                           EPOLL_CTL_ADD,
>>> +                           epoll_items[index].fd,
>>> +                           &event_data) < 0)
>>> +                     goto error;
>>> +     }
>>> +
>>> +#ifdef EPOLL_CTL_DISABLE
>>> +     /* Test to make sure that using EPOLL_CTL_DISABLE without
>>> EPOLLONESHOT
>>> +        returns a clear error: */
>>> +     if (socketpair(AF_UNIX,
>>> +                    SOCK_STREAM | SOCK_NONBLOCK,
>>> +                    0,
>>> +                    socket_pair) < 0)
>>> +             goto error;
>>> +     event_data.events = EPOLLIN;
>>> +     event_data.data.ptr = NULL;
>>> +     if (epoll_ctl(epoll_set, EPOLL_CTL_ADD,
>>> +                   socket_pair[1], &event_data) < 0)
>>> +             goto error;
>>> +     if ((epoll_ctl(epoll_set, EPOLL_CTL_DISABLE,
>>> +                    socket_pair[1], NULL) == 0) || (errno != EINVAL))
>>> +             goto error;
>>> +     if (epoll_ctl(epoll_set, EPOLL_CTL_DEL, socket_pair[1], NULL)
>>> != 0)
>>> +             goto error;
>>> +#endif
>>> +
>>> +     /* Create and start the read threads: */
>>> +     for (index = 0; index < n_read_threads; ++index) {
>>> +             read_thread_data[index].stop = 0;
>>> +             read_thread_data[index].status = 0;
>>> +             read_thread_data[index].epoll_set = epoll_set;
>>> +             if (pthread_create(&read_threads[index],
>>> +                                NULL,
>>> +                                read_thread_function,
>>> +                                &read_thread_data[index]) != 0)
>>> +                     goto error;
>>> +     }
>>> +
>>> +     if (pthread_create(&write_thread,
>>> +                        NULL,
>>> +                        write_thread_function,
>>> +                        &write_thread_data) != 0)
>>> +             goto error;
>>> +
>>> +     /* Cancel all event pollers: */
>>> +#ifdef EPOLL_CTL_DISABLE
>>> +     for (index = 0; index < n_epoll_items; ++index) {
>>> +             pthread_mutex_lock(&epoll_items[index].mutex);
>>> +             ++epoll_items[index].stop;
>>> +             if (epoll_ctl(epoll_set,
>>> +                           EPOLL_CTL_DISABLE,
>>> +                           epoll_items[index].fd,
>>> +                           NULL) == 0)
>>> +                     delete_item(index);
>>> +             else if (errno != EBUSY) {
>>> +                     pthread_mutex_unlock(&epoll_items[index].mutex);
>>> +                     goto error;
>>> +             }
>>> +             /* EBUSY means events were being handled; allow the
>>> other thread
>>> +                to delete the item. */
>>> +             pthread_mutex_unlock(&epoll_items[index].mutex);
>>> +     }
>>> +#else
>>> +     for (index = 0; index < n_epoll_items; ++index) {
>>> +             pthread_mutex_lock(&epoll_items[index].mutex);
>>> +             ++epoll_items[index].stop;
>>> +             pthread_mutex_unlock(&epoll_items[index].mutex);
>>> +             /* Wait in case a thread running read_thread_function is
>>> +                currently executing code between epoll_wait and
>>> +                pthread_mutex_lock with this item. Note that a
>>> longer delay
>>> +                would make double-deletion less likely (at the
>>> expense of
>>> +                performance), but there is no guarantee that any
>>> delay would
>>> +                ever be sufficient. Note also that we delete all event
>>> +                pollers at once for testing purposes, but in a
>>> real-world
>>> +                environment we are likely to want to be able to
>>> cancel event
>>> +                pollers at arbitrary times. Therefore we can't
>>> improve this
>>> +                situation by just splitting this loop into two loops
>>> +                (i.e. signal 'stop' for all items, sleep, and then
>>> delete all
>>> +                items). We also can't fix the problem via EPOLL_CTL_DEL
>>> +                because that command can't prevent the case where
>>> some other
>>> +                thread is executing read_thread_function within the
>>> region
>>> +                mentioned above: */
>>> +             usleep(1);
>>> +             pthread_mutex_lock(&epoll_items[index].mutex);
>>> +             if (!epoll_items[index].deleted)
>>> +                     delete_item(index);
>>> +             pthread_mutex_unlock(&epoll_items[index].mutex);
>>> +     }
>>> +#endif
>>> +
>>> +     /* Shut down the read threads: */
>>> +     for (index = 0; index < n_read_threads; ++index)
>>> +             __sync_fetch_and_add(&read_thread_data[index].stop, 1);
>>> +     for (index = 0; index < n_read_threads; ++index) {
>>> +             if (pthread_join(read_threads[index], NULL) != 0)
>>> +                     goto error;
>>> +             if (read_thread_data[index].status)
>>> +                     goto error;
>>> +     }
>>> +
>>> +     /* Shut down the write thread: */
>>> +     __sync_fetch_and_add(&write_thread_data.stop, 1);
>>> +     if ((pthread_join(write_thread, NULL) != 0) ||
>>> write_thread_data.status)
>>> +             goto error;
>>> +
>>> +     /* Check for final error conditions: */
>>> +     for (index = 0; index < n_epoll_items; ++index) {
>>> +             if (epoll_items[index].status != 0)
>>> +                     goto error;
>>> +             if (pthread_mutex_destroy(&epoll_items[index].mutex) < 0)
>>> +                     goto error;
>>> +     }
>>> +     for (index = 0; index < n_epoll_items; ++index)
>>> +             if (epoll_items[index].deleted != 1) {
>>> +                     printf("Error: item data deleted %1d times.\n",
>>> +                                epoll_items[index].deleted);
>>> +                     goto error;
>>> +             }
>>> +
>>> +     printf("[PASS]\n");
>>> +     return 0;
>>> +
>>> + error:
>>> +     printf("[FAIL]\n");
>>> +     return errno;
>>> +}
>>>
>>
>> .
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ