Message-ID: <565DFF0A.4060901@akamai.com>
Date: Tue, 1 Dec 2015 15:11:54 -0500
From: Jason Baron <jbaron@...mai.com>
To: Madars Vitolins <m@...odev.com>
Cc: Eric Wong <normalperson@...t.net>, linux-kernel@...r.kernel.org
Subject: Re: epoll and multiple processes - eliminate unneeded process
wake-ups
Hi Madars,
On 11/30/2015 04:28 PM, Madars Vitolins wrote:
> Hi Jason,
>
> Today I searched the mail archive and found the patch you posted back in February; it basically does the same thing (a flag for add_wait_queue_exclusive() plus balancing).
>
> So I plan to run some tests with your patch, with the flag on and off, and will provide the results. I guess if I bring up 250 or 500 processes (which would be realistic for a production environment) waiting on one queue, there could be a notable difference in performance with EPOLLEXCLUSIVE set or not.
>
Sounds good. Below is an updated patch if you want to try it - it only adds the 'EPOLLEXCLUSIVE' flag.
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1e009ca..265fa7b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -92,7 +92,7 @@
  */
 
 /* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
 
 /* Maximum number of nesting allowed inside epoll sets */
 #define EP_MAX_NESTS 4
@@ -1002,6 +1002,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	unsigned long flags;
 	struct epitem *epi = ep_item_from_wait(wait);
 	struct eventpoll *ep = epi->ep;
+	int ewake = 0;
 
 	if ((unsigned long)key & POLLFREE) {
 		ep_pwq_from_wait(wait)->whead = NULL;
@@ -1066,8 +1067,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	 * Wake up ( if active ) both the eventpoll wait list and the ->poll()
 	 * wait list.
 	 */
-	if (waitqueue_active(&ep->wq))
+	if (waitqueue_active(&ep->wq)) {
+		ewake = 1;
 		wake_up_locked(&ep->wq);
+	}
 	if (waitqueue_active(&ep->poll_wait))
 		pwake++;
 
@@ -1078,6 +1081,9 @@ out_unlock:
 	if (pwake)
 		ep_poll_safewake(&ep->poll_wait);
 
+	if (epi->event.events & EPOLLEXCLUSIVE)
+		return ewake;
+
 	return 1;
 }
 
@@ -1095,7 +1101,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
 		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
 		pwq->whead = whead;
 		pwq->base = epi;
-		add_wait_queue(whead, &pwq->wait);
+		if (epi->event.events & EPOLLEXCLUSIVE)
+			add_wait_queue_exclusive(whead, &pwq->wait);
+		else
+			add_wait_queue(whead, &pwq->wait);
 		list_add_tail(&pwq->llink, &epi->pwqlist);
 		epi->nwait++;
 	} else {
@@ -1861,6 +1870,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	if (f.file == tf.file || !is_file_epoll(f.file))
 		goto error_tgt_fput;
 
+	if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
+		(op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
+		goto error_tgt_fput;
+
 	/*
 	 * At this point it is safe to assume that the "private_data" contains
 	 * our own data structure.
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index bc81fb2..925bbfb 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -26,6 +26,9 @@
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
 
+/* Add exclusively */
+#define EPOLLEXCLUSIVE (1 << 28)
+
 /*
  * Request the handling of system wakeup events so as to prevent system suspends
  * from happening while those events are being processed.
> During kernel hacking with debug prints, with 10 processes waiting on one event source, I saw a lot of unneeded processing inside eventpoll.c on the original kernel: ep_poll_callback() (and related work) ran 10 times for a single event, which resulted in several processes being woken up in user space (the count probably varies with the concurrency at the time).
>
>
> Meanwhile, we are not the only ones talking about this patch; others are asking about it too, see: http://stackoverflow.com/questions/33226842/epollexclusive-and-epollroundrobin-flags-in-mainstream-kernel
>
> So what is the current status of your patch? What is blocking it from getting into mainline?
>
If we can show some good test results here I will re-submit it.
Thanks,
-Jason