Message-ID: <9664870cea1bbe5938ac40ff2c161be6@silodev.com>
Date: Sat, 05 Dec 2015 13:47:10 +0200
From: Madars Vitolins <m@...odev.com>
To: Jason Baron <jbaron@...mai.com>
Cc: Eric Wong <normalperson@...t.net>, linux-kernel@...r.kernel.org
Subject: Re: epoll and multiple processes - eliminate unneeded process wake-ups
Hi Jason,
I did the testing and wrote a blog article about it:
https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/
In summary:
Test case:
- One multi-threaded binary with 10 threads making a total of 1'000'000
calls to 250 single-threaded processes doing epoll() on a POSIX queue
- Each 'call' sends a message to the shared queue (serviced by those 250
load-balanced processes), and the handling process sends the reply back
to the client thread's private queue
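To make the setup concrete, each worker process does roughly the following
(a minimal sketch with illustrative queue names, not the actual Enduro/X
code; since EPOLLEXCLUSIVE is not in mainline headers yet, its value is
taken from the patch quoted below):

#include <mqueue.h>
#include <sys/epoll.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)	/* value from the patch below */
#endif

int main(void)
{
	/* Shared request queue; opened non-blocking, because another
	 * worker may consume the message first even with exclusive
	 * wakeups. */
	mqd_t qd = mq_open("/req_q", O_RDONLY | O_NONBLOCK);
	if (qd == (mqd_t)-1) { perror("mq_open"); return 1; }

	struct mq_attr attr;
	mq_getattr(qd, &attr);
	char *buf = malloc(attr.mq_msgsize);

	int epfd = epoll_create1(0);
	if (epfd == -1) { perror("epoll_create1"); return 1; }

	/* On Linux an mqd_t is a file descriptor, so it can be polled. */
	struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
	ev.data.fd = qd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, qd, &ev) == -1) {
		perror("epoll_ctl"); return 1;
	}

	for (;;) {
		struct epoll_event out;
		if (epoll_wait(epfd, &out, 1, -1) <= 0)
			continue;
		ssize_t len = mq_receive(qd, buf, attr.mq_msgsize, NULL);
		if (len == -1 && errno == EAGAIN)
			continue;	/* lost the race, wait again */
		/* ... process the request and mq_send() the reply to the
		 * client thread's private queue ... */
	}
}

Build with 'gcc worker.c -lrt'. On an unpatched kernel bit 28 is simply not
interpreted by epoll, so the same binary still runs, only with the usual
thundering-herd wakeups.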
Tests were done on the following system:
- Host system: Linux Mint Mate 17.2 64bit, kernel: 3.13.0-24-generic
- CPU: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz (two cores)
- RAM: 16 GB
- Virtualization platform: Oracle VirtualBox 4.3.28
- Guest OS: Gentoo Linux 2015.03, kernel 4.3.0-gentoo, 64 bit.
- CPU for guest: Two cores
- RAM for guest: 5GB (no swap usage, free about 4GB)
- Enduro/X version: 2.3.2
Results with the original kernel (no EPOLLEXCLUSIVE):
$ time ./bankcl
...
real 14m20.561s
user 0m21.823s
sys 10m49.821s
Patched kernel with the EPOLLEXCLUSIVE flag in use:
$ time ./bankcl
...
real 0m24.953s
user 0m17.497s
sys 0m4.445s
Thus 14 minutes vs. 24 seconds (860.5s / 24.95s = ~34.5): the
EPOLLEXCLUSIVE flag makes the application run *35 times faster*! This
matches the mechanism: without the flag every message fires
ep_poll_callback() for all 250 waiting processes, with it only one
process is woken up.
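For reference, each of the 10 client threads does roughly the following
per call (again just a sketch with illustrative names and message format,
not the actual Enduro/X code):

#include <mqueue.h>
#include <stdio.h>
#include <string.h>

/* One call: send a request to the shared queue, then block on this
 * thread's private reply queue.  The worker learns the reply queue
 * name from the message itself. */
static void do_call(mqd_t req_q, mqd_t reply_q, const char *reply_q_name)
{
	char msg[256];
	snprintf(msg, sizeof(msg), "REQ %s", reply_q_name);
	if (mq_send(req_q, msg, strlen(msg) + 1, 0) == -1) {
		perror("mq_send");
		return;
	}

	char reply[8192];	/* must be >= mq_msgsize of the reply queue */
	if (mq_receive(reply_q, reply, sizeof(reply), NULL) == -1)
		perror("mq_receive");
}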
Guys, this is a MUST HAVE patch!
Thanks,
Madars
Jason Baron @ 2015-12-01 22:11 wrote:
> Hi Madars,
>
> On 11/30/2015 04:28 PM, Madars Vitolins wrote:
>> Hi Jason,
>>
>> Today I searched the mail archive and checked the patch you offered in
>> February; it basically does the same (a flag for
>> add_wait_queue_exclusive() + balance).
>>
>> So I plan to run some tests with your patch, flag on/off, and will
>> provide results. I guess if I bring up 250 or 500 processes (which
>> could be realistic for a production environment) waiting on one queue,
>> then there could be a notable difference in performance with
>> EPOLLEXCLUSIVE set or not.
>>
>
> Sounds good. Below is an updated patch if you want to try it - it only
> adds the 'EPOLLEXCLUSIVE' flag.
>
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 1e009ca..265fa7b 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -92,7 +92,7 @@
>   */
>
>  /* Epoll private bits inside the event mask */
> -#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)
> +#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
>
>  /* Maximum number of nesting allowed inside epoll sets */
>  #define EP_MAX_NESTS 4
> @@ -1002,6 +1002,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
>  	unsigned long flags;
>  	struct epitem *epi = ep_item_from_wait(wait);
>  	struct eventpoll *ep = epi->ep;
> +	int ewake = 0;
>
>  	if ((unsigned long)key & POLLFREE) {
>  		ep_pwq_from_wait(wait)->whead = NULL;
> @@ -1066,8 +1067,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
>  	 * Wake up ( if active ) both the eventpoll wait list and the ->poll()
>  	 * wait list.
>  	 */
> -	if (waitqueue_active(&ep->wq))
> +	if (waitqueue_active(&ep->wq)) {
> +		ewake = 1;
>  		wake_up_locked(&ep->wq);
> +	}
>  	if (waitqueue_active(&ep->poll_wait))
>  		pwake++;
>
> @@ -1078,6 +1081,9 @@ out_unlock:
>  	if (pwake)
>  		ep_poll_safewake(&ep->poll_wait);
>
> +	if (epi->event.events & EPOLLEXCLUSIVE)
> +		return ewake;
> +
>  	return 1;
>  }
>
> @@ -1095,7 +1101,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
>  		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
>  		pwq->whead = whead;
>  		pwq->base = epi;
> -		add_wait_queue(whead, &pwq->wait);
> +		if (epi->event.events & EPOLLEXCLUSIVE)
> +			add_wait_queue_exclusive(whead, &pwq->wait);
> +		else
> +			add_wait_queue(whead, &pwq->wait);
>  		list_add_tail(&pwq->llink, &epi->pwqlist);
>  		epi->nwait++;
>  	} else {
> @@ -1861,6 +1870,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
>  	if (f.file == tf.file || !is_file_epoll(f.file))
>  		goto error_tgt_fput;
>
> +	if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
> +		(op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
> +		goto error_tgt_fput;
> +
>  	/*
>  	 * At this point it is safe to assume that the "private_data" contains
>  	 * our own data structure.
> diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
> index bc81fb2..925bbfb 100644
> --- a/include/uapi/linux/eventpoll.h
> +++ b/include/uapi/linux/eventpoll.h
> @@ -26,6 +26,9 @@
>  #define EPOLL_CTL_DEL 2
>  #define EPOLL_CTL_MOD 3
>
> +/* Add exclusively */
> +#define EPOLLEXCLUSIVE (1 << 28)
> +
>  /*
>   * Request the handling of system wakeup events so as to prevent system suspends
>   * from happening while those events are being processed.
>
>
>> While hacking on the kernel with debug prints, with 10 processes
>> waiting on one event source, on the original kernel I saw a lot of
>> unneeded processing inside eventpoll.c: a single event produced 10
>> calls to ep_poll_callback() and other work, which resulted in several
>> processes being woken up in user space (the count probably varies
>> randomly depending on concurrency).
>>
>>
>> Meanwhile, we are not the only ones talking about this patch; see here:
>> http://stackoverflow.com/questions/33226842/epollexclusive-and-epollroundrobin-flags-in-mainstream-kernel
>> Others are asking too.
>>
>> So what is the current situation with your patch? What is blocking it
>> from getting into mainline?
>>
>
> If we can show some good test results here I will re-submit it.
>
> Thanks,
>
> -Jason