[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5092C3B9.2070909@adobe.com>
Date: Thu, 1 Nov 2012 11:47:21 -0700
From: "Paton J. Lewis" <palewis@...be.com>
To: Michael Wang <wangyun@...ux.vnet.ibm.com>
CC: Alexander Viro <viro@...iv.linux.org.uk>,
Andrew Morton <akpm@...ux-foundation.org>,
Jason Baron <jbaron@...hat.com>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Paul Holland <pholland@...be.com>,
Davide Libenzi <davidel@...ilserver.org>,
Michael Kerrisk <mtk.manpages@...il.com>,
"libc-alpha@...rceware.org" <libc-alpha@...rceware.org>,
"linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
"paulmck@...ux.vnet.ibm.com" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH v3] epoll: Support for disabling items, and a self-test
app.
On 10/31/12 5:43 PM, Michael Wang wrote:
> On 11/01/2012 02:57 AM, Paton J. Lewis wrote:
>> On 10/30/12 11:32 PM, Michael Wang wrote:
>>> On 10/26/2012 08:08 AM, Paton J. Lewis wrote:
>>>> From: "Paton J. Lewis" <palewis@...be.com>
>>>>
>>>> It is not currently possible to reliably delete epoll items when
>>>> using the
>>>> same epoll set from multiple threads. After calling epoll_ctl with
>>>> EPOLL_CTL_DEL, another thread might still be executing code related
>>>> to an
>>>> event for that epoll item (in response to epoll_wait). Therefore the
>>>> deleting
>>>> thread does not know when it is safe to delete resources pertaining
>>>> to the
>>>> associated epoll item because another thread might be using those
>>>> resources.
>>>>
>>>> The deleting thread could wait an arbitrary amount of time after calling
>>>> epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
>>>> inefficient and could result in the destruction of resources before
>>>> another
>>>> thread is done handling an event returned by epoll_wait.
>>>>
>>>> This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which
>>>> disables an
>>>> epoll item. If epoll_ctl returns -EBUSY in this case, then another
>>>> thread may
>>>> handling a return from epoll_wait for this item. Otherwise if epoll_ctl
>>>> returns 0, then it is safe to delete the epoll item. This allows
>>>> multiple
>>>> threads to use a mutex to determine when it is safe to delete an
>>>> epoll item
>>>> and its associated resources, which allows epoll items to be deleted
>>>> both
>>>> efficiently and without error in a multi-threaded environment. Note that
>>>> EPOLL_CTL_DISABLE is only useful in conjunction with EPOLLONESHOT,
>>>> and using
>>>> EPOLL_CTL_DISABLE on an epoll item without EPOLLONESHOT returns -EINVAL.
>>>>
>>>> This patch also adds a new test_epoll self-test program to both
>>>> demonstrate
>>>> the need for this feature and test it.
>>>
>>> Hi, Paton
>>>
>>> I'm just think about may be we could use this way.
>>>
>>> Seems like currently we are depending on the epoll_ctl() to indicate the
>>> start point of safe section and epoll_wait() for the end point, like:
>>>
>>> while () {
>>> epoll_wait() --------------
>>>
>>> fd event arrived safe section
>>>
>>> clear fd epi->event.events
>>> --------------
>>> if (fd need stop)
>>> continue;
>>> --------------
>>> ...fd data process...
>>>
>>> epoll_ctl(MOD) danger section
>>>
>>> set fd epi->event.events --------------
>>>
>>> continue;
>>> }
>>>
>>> So we got a safe section and do delete work in this section won't cause
>>> trouble since we have a stop check directly after it.
>>>
>>> Actually what we want is to make sure no one will touch the fd any more
>>> after we DISABLE it.
>>>
>>> Then what about we add a ref count and a stop flag in epi, maintain it
>>> like:
>>>
>>> epoll_wait()
>>>
>>> check user events and
>>> dec the ref count of fd ---------------------------
>>>
>>> ...
>>>
>>> fd event arrived safe sec if ref count is 0
>>>
>>> if epi stop flag set
>>> do nothing
>>> else
>>> inc epi ref count ---------------------------
>>
>> The pseudecode you provide below (for "DISABLE") seems to indicate that
>> this "epi ref count" must be maintained by the kernel. Therefore any
>> userspace modification of a ref count associated with an epoll item will
>> require a new or changed kernel API.
>>
>>> send event
>>>
>>> And what DISABLE do is:
>>>
>>> set epi stop flag
>>>
>>> if epi ref count is not 0
>>> wait until ref count be 0
>>
>> Perhaps I don't fully understand what you're proposing, but I don't
>> think it's reasonable for a kernel API (epoll_ctl in this case) to block
>> while waiting for a userspace action (decrementing the ref count) that
>> might never occur.
>>
>> Andrew Morton also proposed using ref counting in response to my initial
>> patch submission; my reply to his proposal might also be applicable to
>> your proposal. A link to that discussion thread:
>> http://thread.gmane.org/gmane.linux.kernel/1311457/focus=1315096
>>
>> Sorry if I am misunderstanding your proposal, but I don't see how it
>> solves the original problem.
>
> I just try to find out whether we could using DISABLE with out ONESHOT :)
>
> My currently understanding is:
>
> 1. we actually want to determine the part between each epoll_wait() in a
> while().
>
> 2. we can't count on epoll_wait() itself, since no info pass to kernel
> to indicate whether it was invoked after another epoll_wait() in the
> same while().
>
> 3. so we need epoll_ctl(MOD) to tell kernel: user finished process data
> after epoll_wait(), and those data belong to which epi.
>
> 4. since 3 we need ONESHOT to be enabled.
>
>
> Is this the reason we rely on ONESHOT to be enabled?
No, we need to use EPOLLONESHOT to ensure that only one worker thread at
a time can ever be handling private data associated with a given fd.
This constraint allows a deletion thread to coordinate with that worker
thread via a mutex to cleanly and quickly delete the associated private
data (provided that the deletion thread knows whether or not such a
worker thread exists in that state, which the question that
EPOLL_CTL_DISABLE answers).
Pat
> If it is, then if we could do some change to make epoll_wait() have the
> ability:
>
> 1. indicate whether it is invoked after another epoll_wait() in same while()
>
> 2. indicate which epi has been fired by last epoll_wait()
>
> 3. no changes on the api for user
>
> Then we could mark the safe section inside epoll_wait(), and we don't
> need MOD or ONESHOT any more, is that correct?
>
>
> Regards,
> Michael Wang
>
>>
>> Pat
>>
>>> So after DISABLE return, we can safely delete any thing related to
>>> that epi.
>>>
>>> One thing is that the user should not change the events info returned by
>>> epoll_wait().
>>>
>>> It's just a propose, but if it works, there will be no limit on ONESHOT
>>> any more ;-)
>>>
>>> Regards,
>>> Michael Wang
>>>
>>>>
>>>> Signed-off-by: Paton J. Lewis <palewis@...be.com>
>>>> ---
>>>> fs/eventpoll.c | 40 ++-
>>>> include/linux/eventpoll.h | 1 +
>>>> tools/testing/selftests/Makefile | 2 +-
>>>> tools/testing/selftests/epoll/Makefile | 11 +
>>>> tools/testing/selftests/epoll/test_epoll.c | 364
>>>> ++++++++++++++++++++++++++++
>>>> 5 files changed, 414 insertions(+), 4 deletions(-)
>>>> create mode 100644 tools/testing/selftests/epoll/Makefile
>>>> create mode 100644 tools/testing/selftests/epoll/test_epoll.c
>>>>
>>>> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>>>> index 739b098..c718afd 100644
>>>> --- a/fs/eventpoll.c
>>>> +++ b/fs/eventpoll.c
>>>> @@ -339,7 +339,7 @@ static inline struct epitem
>>>> *ep_item_from_epqueue(poll_table *p)
>>>> /* Tells if the epoll_ctl(2) operation needs an event copy from
>>>> userspace */
>>>> static inline int ep_op_has_event(int op)
>>>> {
>>>> - return op != EPOLL_CTL_DEL;
>>>> + return op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD;
>>>> }
>>>>
>>>> /* Initialize the poll safe wake up structure */
>>>> @@ -664,6 +664,36 @@ static int ep_remove(struct eventpoll *ep,
>>>> struct epitem *epi)
>>>> return 0;
>>>> }
>>>>
>>>> +/*
>>>> + * Disables a "struct epitem" in the eventpoll set. Returns -EBUSY
>>>> if the item
>>>> + * had no event flags set, indicating that another thread may be
>>>> currently
>>>> + * handling that item's events (in the case that EPOLLONESHOT was being
>>>> + * used). Otherwise a zero result indicates that the item has been
>>>> disabled
>>>> + * from receiving events. A disabled item may be re-enabled via
>>>> + * EPOLL_CTL_MOD. Must be called with "mtx" held.
>>>> + */
>>>> +static int ep_disable(struct eventpoll *ep, struct epitem *epi)
>>>> +{
>>>> + int result = 0;
>>>> + unsigned long flags;
>>>> +
>>>> + spin_lock_irqsave(&ep->lock, flags);
>>>> + if (epi->event.events & EPOLLONESHOT) {
>>>> + if (epi->event.events & ~EP_PRIVATE_BITS) {
>>>> + if (ep_is_linked(&epi->rdllink))
>>>> + list_del_init(&epi->rdllink);
>>>> + /* Ensure ep_poll_callback will not add epi
>>>> back onto
>>>> + ready list: */
>>>> + epi->event.events &= EP_PRIVATE_BITS;
>>>> + } else
>>>> + result = -EBUSY;
>>>> + } else
>>>> + result = -EINVAL;
>>>> + spin_unlock_irqrestore(&ep->lock, flags);
>>>> +
>>>> + return result;
>>>> +}
>>>> +
>>>> static void ep_free(struct eventpoll *ep)
>>>> {
>>>> struct rb_node *rbp;
>>>> @@ -996,8 +1026,6 @@ static void ep_rbtree_insert(struct eventpoll
>>>> *ep, struct epitem *epi)
>>>> rb_insert_color(&epi->rbn, &ep->rbr);
>>>> }
>>>>
>>>> -
>>>> -
>>>> #define PATH_ARR_SIZE 5
>>>> /*
>>>> * These are the number paths of length 1 to 5, that we are
>>>> allowing to emanate
>>>> @@ -1701,6 +1729,12 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op,
>>>> int, fd,
>>>> } else
>>>> error = -ENOENT;
>>>> break;
>>>> + case EPOLL_CTL_DISABLE:
>>>> + if (epi)
>>>> + error = ep_disable(ep, epi);
>>>> + else
>>>> + error = -ENOENT;
>>>> + break;
>>>> }
>>>> mutex_unlock(&ep->mtx);
>>>>
>>>> diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
>>>> index 657ab55..e91f7e3 100644
>>>> --- a/include/linux/eventpoll.h
>>>> +++ b/include/linux/eventpoll.h
>>>> @@ -25,6 +25,7 @@
>>>> #define EPOLL_CTL_ADD 1
>>>> #define EPOLL_CTL_DEL 2
>>>> #define EPOLL_CTL_MOD 3
>>>> +#define EPOLL_CTL_DISABLE 4
>>>>
>>>> /* Set the One Shot behaviour for the target file descriptor */
>>>> #define EPOLLONESHOT (1 << 30)
>>>> diff --git a/tools/testing/selftests/Makefile
>>>> b/tools/testing/selftests/Makefile
>>>> index 28bc57e..4cf477f 100644
>>>> --- a/tools/testing/selftests/Makefile
>>>> +++ b/tools/testing/selftests/Makefile
>>>> @@ -1,4 +1,4 @@
>>>> -TARGETS = breakpoints vm
>>>> +TARGETS = breakpoints epoll vm
>>>>
>>>> all:
>>>> for TARGET in $(TARGETS); do \
>>>> diff --git a/tools/testing/selftests/epoll/Makefile
>>>> b/tools/testing/selftests/epoll/Makefile
>>>> new file mode 100644
>>>> index 0000000..19806ed
>>>> --- /dev/null
>>>> +++ b/tools/testing/selftests/epoll/Makefile
>>>> @@ -0,0 +1,11 @@
>>>> +# Makefile for epoll selftests
>>>> +
>>>> +all: test_epoll
>>>> +%: %.c
>>>> + gcc -pthread -g -o $@ $^
>>>> +
>>>> +run_tests: all
>>>> + ./test_epoll
>>>> +
>>>> +clean:
>>>> + $(RM) test_epoll
>>>> diff --git a/tools/testing/selftests/epoll/test_epoll.c
>>>> b/tools/testing/selftests/epoll/test_epoll.c
>>>> new file mode 100644
>>>> index 0000000..54284eb
>>>> --- /dev/null
>>>> +++ b/tools/testing/selftests/epoll/test_epoll.c
>>>> @@ -0,0 +1,364 @@
>>>> +/*
>>>> + * tools/testing/selftests/epoll/test_epoll.c
>>>> + *
>>>> + * Copyright 2012 Adobe Systems Incorporated
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or
>>>> modify
>>>> + * it under the terms of the GNU General Public License as
>>>> published by
>>>> + * the Free Software Foundation; either version 2 of the License, or
>>>> + * (at your option) any later version.
>>>> + *
>>>> + * Paton J. Lewis <palewis@...be.com>
>>>> + *
>>>> + */
>>>> +
>>>> +#include <errno.h>
>>>> +#include <fcntl.h>
>>>> +#include <pthread.h>
>>>> +#include <stdio.h>
>>>> +#include <stdlib.h>
>>>> +#include <unistd.h>
>>>> +#include <sys/epoll.h>
>>>> +#include <sys/socket.h>
>>>> +
>>>> +/*
>>>> + * A pointer to an epoll_item_private structure will be stored in
>>>> the epoll
>>>> + * item's event structure so that we can get access to the
>>>> epoll_item_private
>>>> + * data after calling epoll_wait:
>>>> + */
>>>> +struct epoll_item_private {
>>>> + int index; /* Position of this struct within the epoll_items
>>>> array. */
>>>> + int fd;
>>>> + uint32_t events;
>>>> + pthread_mutex_t mutex; /* Guards the following variables... */
>>>> + int stop;
>>>> + int status; /* Stores any error encountered while handling
>>>> item. */
>>>> + /* The following variable allows us to test whether we have
>>>> encountered
>>>> + a problem while attempting to cancel and delete the associated
>>>> + event. When the test program exits, 'deleted' should be exactly
>>>> + one. If it is greater than one, then the failed test
>>>> reflects a real
>>>> + world situation where we would have tried to access the
>>>> epoll item's
>>>> + private data after deleting it: */
>>>> + int deleted;
>>>> +};
>>>> +
>>>> +struct epoll_item_private *epoll_items;
>>>> +
>>>> +/*
>>>> + * Delete the specified item from the epoll set. In a real-world
>>>> secneario this
>>>> + * is where we would free the associated data structure, but in this
>>>> testing
>>>> + * environment we retain the structure so that we can test for
>>>> double-deletion:
>>>> + */
>>>> +void delete_item(int index)
>>>> +{
>>>> + __sync_fetch_and_add(&epoll_items[index].deleted, 1);
>>>> +}
>>>> +
>>>> +/*
>>>> + * A pointer to a read_thread_data structure will be passed as the
>>>> argument to
>>>> + * each read thread:
>>>> + */
>>>> +struct read_thread_data {
>>>> + int stop;
>>>> + int status; /* Indicates any error encountered by the read
>>>> thread. */
>>>> + int epoll_set;
>>>> +};
>>>> +
>>>> +/*
>>>> + * The function executed by the read threads:
>>>> + */
>>>> +void *read_thread_function(void *function_data)
>>>> +{
>>>> + struct read_thread_data *thread_data =
>>>> + (struct read_thread_data *)function_data;
>>>> + struct epoll_event event_data;
>>>> + struct epoll_item_private *item_data;
>>>> + char socket_data;
>>>> +
>>>> + /* Handle events until we encounter an error or this thread's
>>>> 'stop'
>>>> + condition is set: */
>>>> + while (1) {
>>>> + int result = epoll_wait(thread_data->epoll_set,
>>>> + &event_data,
>>>> + 1, /* Number of desired
>>>> events */
>>>> + 1000); /* Timeout in ms */
>>>> + if (result < 0) {
>>>> + /* Breakpoints signal all threads. Ignore that
>>>> while
>>>> + debugging: */
>>>> + if (errno == EINTR)
>>>> + continue;
>>>> + thread_data->status = errno;
>>>> + return 0;
>>>> + } else if (thread_data->stop)
>>>> + return 0;
>>>> + else if (result == 0) /* Timeout */
>>>> + continue;
>>>> +
>>>> + /* We need the mutex here because checking for the stop
>>>> + condition and re-enabling the epoll item need to be
>>>> done
>>>> + together as one atomic operation when
>>>> EPOLL_CTL_DISABLE is
>>>> + available: */
>>>> + item_data = (struct epoll_item_private
>>>> *)event_data.data.ptr;
>>>> + pthread_mutex_lock(&item_data->mutex);
>>>> +
>>>> + /* Remove the item from the epoll set if we want to stop
>>>> + handling that event: */
>>>> + if (item_data->stop)
>>>> + delete_item(item_data->index);
>>>> + else {
>>>> + /* Clear the data that was written to the other
>>>> end of
>>>> + our non-blocking socket: */
>>>> + do {
>>>> + if (read(item_data->fd, &socket_data,
>>>> 1) < 1) {
>>>> + if ((errno == EAGAIN) ||
>>>> + (errno == EWOULDBLOCK))
>>>> + break;
>>>> + else
>>>> + goto error_unlock;
>>>> + }
>>>> + } while (item_data->events & EPOLLET);
>>>> +
>>>> + /* The item was one-shot, so re-enable it: */
>>>> + event_data.events = item_data->events;
>>>> + if (epoll_ctl(thread_data->epoll_set,
>>>> + EPOLL_CTL_MOD,
>>>> + item_data->fd,
>>>> + &event_data) < 0)
>>>> + goto error_unlock;
>>>> + }
>>>> +
>>>> + pthread_mutex_unlock(&item_data->mutex);
>>>> + }
>>>> +
>>>> +error_unlock:
>>>> + thread_data->status = item_data->status = errno;
>>>> + pthread_mutex_unlock(&item_data->mutex);
>>>> + return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * A pointer to a write_thread_data structure will be passed as the
>>>> argument to
>>>> + * the write thread:
>>>> + */
>>>> +struct write_thread_data {
>>>> + int stop;
>>>> + int status; /* Indicates any error encountered by the write
>>>> thread. */
>>>> + int n_fds;
>>>> + int *fds;
>>>> +};
>>>> +
>>>> +/*
>>>> + * The function executed by the write thread. It writes a single
>>>> byte to each
>>>> + * socket in turn until the stop condition for this thread is set.
>>>> If writing to
>>>> + * a socket would block (i.e. errno was EAGAIN), we leave that
>>>> socket alone for
>>>> + * the moment and just move on to the next socket in the list. We
>>>> don't care
>>>> + * about the order in which we deliver events to the epoll set. In
>>>> fact we don't
>>>> + * care about the data we're writing to the pipes at all; we just
>>>> want to
>>>> + * trigger epoll events:
>>>> + */
>>>> +void *write_thread_function(void *function_data)
>>>> +{
>>>> + const char data = 'X';
>>>> + int index;
>>>> + struct write_thread_data *thread_data =
>>>> + (struct write_thread_data *)function_data;
>>>> + while (!thread_data->stop)
>>>> + for (index = 0;
>>>> + !thread_data->stop && (index < thread_data->n_fds);
>>>> + ++index)
>>>> + if ((write(thread_data->fds[index], &data, 1) <
>>>> 1) &&
>>>> + (errno != EAGAIN) &&
>>>> + (errno != EWOULDBLOCK)) {
>>>> + thread_data->status = errno;
>>>> + return;
>>>> + }
>>>> +}
>>>> +
>>>> +/*
>>>> + * Arguments are currently ignored:
>>>> + */
>>>> +int main(int argc, char **argv)
>>>> +{
>>>> + const int n_read_threads = 100;
>>>> + const int n_epoll_items = 500;
>>>> + int index;
>>>> + int epoll_set = epoll_create1(0);
>>>> + struct write_thread_data write_thread_data = {
>>>> + 0, 0, n_epoll_items, malloc(n_epoll_items * sizeof(int))
>>>> + };
>>>> + struct read_thread_data *read_thread_data =
>>>> + malloc(n_read_threads * sizeof(struct read_thread_data));
>>>> + pthread_t *read_threads = malloc(n_read_threads *
>>>> sizeof(pthread_t));
>>>> + pthread_t write_thread;
>>>> + int socket_pair[2];
>>>> + struct epoll_event event_data;
>>>> +
>>>> + printf("-----------------\n");
>>>> + printf("Runing test_epoll\n");
>>>> + printf("-----------------\n");
>>>> +
>>>> + epoll_items = malloc(n_epoll_items * sizeof(struct
>>>> epoll_item_private));
>>>> +
>>>> + if (epoll_set < 0 || epoll_items == 0 || write_thread_data.fds
>>>> == 0 ||
>>>> + read_thread_data == 0 || read_threads == 0)
>>>> + goto error;
>>>> +
>>>> + if (sysconf(_SC_NPROCESSORS_ONLN) < 2) {
>>>> + printf("Error: please run this test on a multi-core
>>>> system.\n");
>>>> + goto error;
>>>> + }
>>>> +
>>>> + /* Create the socket pairs and epoll items: */
>>>> + for (index = 0; index < n_epoll_items; ++index) {
>>>> + if (socketpair(AF_UNIX,
>>>> + SOCK_STREAM | SOCK_NONBLOCK,
>>>> + 0,
>>>> + socket_pair) < 0)
>>>> + goto error;
>>>> + write_thread_data.fds[index] = socket_pair[0];
>>>> + epoll_items[index].index = index;
>>>> + epoll_items[index].fd = socket_pair[1];
>>>> + if (pthread_mutex_init(&epoll_items[index].mutex, NULL)
>>>> != 0)
>>>> + goto error;
>>>> + /* We always use EPOLLONESHOT because this test is
>>>> currently
>>>> + structured to demonstrate the need for
>>>> EPOLL_CTL_DISABLE,
>>>> + which only produces useful information in the
>>>> EPOLLONESHOT
>>>> + case (without EPOLLONESHOT, calling epoll_ctl with
>>>> + EPOLL_CTL_DISABLE will never return EBUSY). If
>>>> support for
>>>> + testing events without EPOLLONESHOT is desired, it
>>>> should
>>>> + probably be implemented in a separate unit test. */
>>>> + epoll_items[index].events = EPOLLIN | EPOLLONESHOT;
>>>> + if (index < n_epoll_items / 2)
>>>> + epoll_items[index].events |= EPOLLET;
>>>> + epoll_items[index].stop = 0;
>>>> + epoll_items[index].status = 0;
>>>> + epoll_items[index].deleted = 0;
>>>> + event_data.events = epoll_items[index].events;
>>>> + event_data.data.ptr = &epoll_items[index];
>>>> + if (epoll_ctl(epoll_set,
>>>> + EPOLL_CTL_ADD,
>>>> + epoll_items[index].fd,
>>>> + &event_data) < 0)
>>>> + goto error;
>>>> + }
>>>> +
>>>> +#ifdef EPOLL_CTL_DISABLE
>>>> + /* Test to make sure that using EPOLL_CTL_DISABLE without
>>>> EPOLLONESHOT
>>>> + returns a clear error: */
>>>> + if (socketpair(AF_UNIX,
>>>> + SOCK_STREAM | SOCK_NONBLOCK,
>>>> + 0,
>>>> + socket_pair) < 0)
>>>> + goto error;
>>>> + event_data.events = EPOLLIN;
>>>> + event_data.data.ptr = NULL;
>>>> + if (epoll_ctl(epoll_set, EPOLL_CTL_ADD,
>>>> + socket_pair[1], &event_data) < 0)
>>>> + goto error;
>>>> + if ((epoll_ctl(epoll_set, EPOLL_CTL_DISABLE,
>>>> + socket_pair[1], NULL) == 0) || (errno != EINVAL))
>>>> + goto error;
>>>> + if (epoll_ctl(epoll_set, EPOLL_CTL_DEL, socket_pair[1], NULL)
>>>> != 0)
>>>> + goto error;
>>>> +#endif
>>>> +
>>>> + /* Create and start the read threads: */
>>>> + for (index = 0; index < n_read_threads; ++index) {
>>>> + read_thread_data[index].stop = 0;
>>>> + read_thread_data[index].status = 0;
>>>> + read_thread_data[index].epoll_set = epoll_set;
>>>> + if (pthread_create(&read_threads[index],
>>>> + NULL,
>>>> + read_thread_function,
>>>> + &read_thread_data[index]) != 0)
>>>> + goto error;
>>>> + }
>>>> +
>>>> + if (pthread_create(&write_thread,
>>>> + NULL,
>>>> + write_thread_function,
>>>> + &write_thread_data) != 0)
>>>> + goto error;
>>>> +
>>>> + /* Cancel all event pollers: */
>>>> +#ifdef EPOLL_CTL_DISABLE
>>>> + for (index = 0; index < n_epoll_items; ++index) {
>>>> + pthread_mutex_lock(&epoll_items[index].mutex);
>>>> + ++epoll_items[index].stop;
>>>> + if (epoll_ctl(epoll_set,
>>>> + EPOLL_CTL_DISABLE,
>>>> + epoll_items[index].fd,
>>>> + NULL) == 0)
>>>> + delete_item(index);
>>>> + else if (errno != EBUSY) {
>>>> + pthread_mutex_unlock(&epoll_items[index].mutex);
>>>> + goto error;
>>>> + }
>>>> + /* EBUSY means events were being handled; allow the
>>>> other thread
>>>> + to delete the item. */
>>>> + pthread_mutex_unlock(&epoll_items[index].mutex);
>>>> + }
>>>> +#else
>>>> + for (index = 0; index < n_epoll_items; ++index) {
>>>> + pthread_mutex_lock(&epoll_items[index].mutex);
>>>> + ++epoll_items[index].stop;
>>>> + pthread_mutex_unlock(&epoll_items[index].mutex);
>>>> + /* Wait in case a thread running read_thread_function is
>>>> + currently executing code between epoll_wait and
>>>> + pthread_mutex_lock with this item. Note that a
>>>> longer delay
>>>> + would make double-deletion less likely (at the
>>>> expense of
>>>> + performance), but there is no guarantee that any
>>>> delay would
>>>> + ever be sufficient. Note also that we delete all event
>>>> + pollers at once for testing purposes, but in a
>>>> real-world
>>>> + environment we are likely to want to be able to
>>>> cancel event
>>>> + pollers at arbitrary times. Therefore we can't
>>>> improve this
>>>> + situation by just splitting this loop into two loops
>>>> + (i.e. signal 'stop' for all items, sleep, and then
>>>> delete all
>>>> + items). We also can't fix the problem via EPOLL_CTL_DEL
>>>> + because that command can't prevent the case where
>>>> some other
>>>> + thread is executing read_thread_function within the
>>>> region
>>>> + mentioned above: */
>>>> + usleep(1);
>>>> + pthread_mutex_lock(&epoll_items[index].mutex);
>>>> + if (!epoll_items[index].deleted)
>>>> + delete_item(index);
>>>> + pthread_mutex_unlock(&epoll_items[index].mutex);
>>>> + }
>>>> +#endif
>>>> +
>>>> + /* Shut down the read threads: */
>>>> + for (index = 0; index < n_read_threads; ++index)
>>>> + __sync_fetch_and_add(&read_thread_data[index].stop, 1);
>>>> + for (index = 0; index < n_read_threads; ++index) {
>>>> + if (pthread_join(read_threads[index], NULL) != 0)
>>>> + goto error;
>>>> + if (read_thread_data[index].status)
>>>> + goto error;
>>>> + }
>>>> +
>>>> + /* Shut down the write thread: */
>>>> + __sync_fetch_and_add(&write_thread_data.stop, 1);
>>>> + if ((pthread_join(write_thread, NULL) != 0) ||
>>>> write_thread_data.status)
>>>> + goto error;
>>>> +
>>>> + /* Check for final error conditions: */
>>>> + for (index = 0; index < n_epoll_items; ++index) {
>>>> + if (epoll_items[index].status != 0)
>>>> + goto error;
>>>> + if (pthread_mutex_destroy(&epoll_items[index].mutex) < 0)
>>>> + goto error;
>>>> + }
>>>> + for (index = 0; index < n_epoll_items; ++index)
>>>> + if (epoll_items[index].deleted != 1) {
>>>> + printf("Error: item data deleted %1d times.\n",
>>>> + epoll_items[index].deleted);
>>>> + goto error;
>>>> + }
>>>> +
>>>> + printf("[PASS]\n");
>>>> + return 0;
>>>> +
>>>> + error:
>>>> + printf("[FAIL]\n");
>>>> + return errno;
>>>> +}
>>>>
>>>
>>> .
>>>
>>
>
> .
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists