Date:	Sat, 05 Dec 2015 13:47:10 +0200
From:	Madars Vitolins <m@...odev.com>
To:	Jason Baron <jbaron@...mai.com>
Cc:	Eric Wong <normalperson@...t.net>, linux-kernel@...r.kernel.org
Subject: Re: epoll and multiple processes - eliminate unneeded process wake-ups

Hi Jason,

I ran the tests and wrote a blog article about them:
https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/

In summary:

Test case:
- One multi-threaded binary with 10 threads makes a total of 1'000'000
calls to 250 single-threaded processes that wait with epoll() on a POSIX
queue.
- Each 'call' sends a message to a shared queue (served by those 250
load-balanced processes), and the serving process sends the reply back
to the client thread's private queue.
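A worker in this test case could be set up roughly as follows. This is a minimal sketch under stated assumptions, not Enduro/X code: the helper names worker_epoll_setup() and worker_wait() are hypothetical, and the sketch relies on the fact that on Linux a POSIX message queue descriptor (mqd_t) is a file descriptor, so it can be added to an epoll set directly. Each worker process owns a private epoll instance. The EPOLLEXCLUSIVE fallback value is the one from the patch quoted below; mainline Linux (4.5 and later) accepts the flag only with EPOLL_CTL_ADD.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Older C library headers may lack the flag; value taken from the patch. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/*
 * Hypothetical helper: create a private epoll instance for one worker
 * process and register the shared queue descriptor with it. On Linux an
 * mqd_t is a file descriptor, so it can be passed as queue_fd.
 * Returns the epoll fd, or -1 with errno set.
 */
int worker_epoll_setup(int queue_fd, int exclusive)
{
    struct epoll_event ev = {0};
    int epfd = epoll_create1(EPOLL_CLOEXEC);

    if (epfd < 0)
        return -1;

    ev.events = EPOLLIN;
    if (exclusive)
        ev.events |= EPOLLEXCLUSIVE; /* accepted only with EPOLL_CTL_ADD */
    ev.data.fd = queue_fd;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, queue_fd, &ev) < 0) {
        int saved = errno;
        close(epfd);
        errno = saved;
        return -1;
    }
    return epfd;
}

/* Wait until the queue is readable; returns the number of ready fds (0 on timeout). */
int worker_wait(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n;

    do {
        n = epoll_wait(epfd, &ev, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

A worker would call worker_epoll_setup() once with its queue descriptor and then loop on worker_wait() followed by mq_receive(). Passing exclusive = 0 gives the old behaviour, where every worker sleeping on the queue is woken for each message.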

Tests done on following system:
- Host system: Linux Mint Mate 17.2 64bit, kernel: 3.13.0-24-generic
- CPU: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz (two cores)
- RAM: 16 GB
- Virtualization platform: Oracle VirtualBox 4.3.28
- Guest OS: Gentoo Linux 2015.03, kernel 4.3.0-gentoo, 64 bit.
- CPU for guest: Two cores
- RAM for guest: 5GB (no swap usage, free about 4GB)
- Enduro/X version: 2.3.2


Results with the original kernel (no EPOLLEXCLUSIVE):

$ time ./bankcl
...

real 14m20.561s
user 0m21.823s
sys 10m49.821s


Results with the patched kernel (EPOLLEXCLUSIVE flag in use):
$ time ./bankcl
...
real 0m24.953s
user 0m17.497s
sys 0m4.445s

Thus roughly 14 minutes vs. 25 seconds of wall-clock time! So the
EPOLLEXCLUSIVE flag makes the application run *about 35 times faster*!
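The factor of roughly 35 follows from the 'real' times above. The same arithmetic on the 'sys' times shows that most of the saving is kernel time. A small sketch (the helper names are hypothetical):

```c
/* Convert a 'time' field such as "14m20.561s" (minutes + seconds) to seconds. */
double time_to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* How many times faster the second run is than the first. */
double speedup(double before_s, double after_s)
{
    return before_s / after_s;
}
```

With these helpers, speedup(time_to_seconds(14, 20.561), time_to_seconds(0, 24.953)) is about 34.5 for wall-clock time. The kernel (sys) time drops by a factor of about 146 (649.821 s vs. 4.445 s).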

Guys, this is a MUST-HAVE patch!

Thanks,
Madars



Jason Baron wrote on 2015-12-01 22:11:
> Hi Madars,
> 
> On 11/30/2015 04:28 PM, Madars Vitolins wrote:
>> Hi Jason,
>> 
>> Today I searched the mail archive and checked the patch you offered in
>> February; it basically does the same (a flag for
>> add_wait_queue_exclusive() + balancing).
>> 
>> So I plan to run some tests with your patch, with the flag on and off,
>> and will provide the results. I guess if I start 250 or 500 processes
>> (which could be realistic for a production environment) waiting on one
>> queue, there could be a notable difference in performance with
>> EPOLLEXCLUSIVE set versus not set.
>> 
> 
> Sounds good. Below is an updated patch if you want to try it - it only
> adds the 'EPOLLEXCLUSIVE' flag.
> 
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 1e009ca..265fa7b 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -92,7 +92,7 @@
>   */
> 
>  /* Epoll private bits inside the event mask */
> -#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)
> +#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
> 
>  /* Maximum number of nesting allowed inside epoll sets */
>  #define EP_MAX_NESTS 4
> @@ -1002,6 +1002,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
>  	unsigned long flags;
>  	struct epitem *epi = ep_item_from_wait(wait);
>  	struct eventpoll *ep = epi->ep;
> +	int ewake = 0;
> 
>  	if ((unsigned long)key & POLLFREE) {
>  		ep_pwq_from_wait(wait)->whead = NULL;
> @@ -1066,8 +1067,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
>  	 * Wake up ( if active ) both the eventpoll wait list and the ->poll()
>  	 * wait list.
>  	 */
> -	if (waitqueue_active(&ep->wq))
> +	if (waitqueue_active(&ep->wq)) {
> +		ewake = 1;
>  		wake_up_locked(&ep->wq);
> +	}
>  	if (waitqueue_active(&ep->poll_wait))
>  		pwake++;
> 
> @@ -1078,6 +1081,9 @@ out_unlock:
>  	if (pwake)
>  		ep_poll_safewake(&ep->poll_wait);
> 
> +	if (epi->event.events & EPOLLEXCLUSIVE)
> +		return ewake;
> +
>  	return 1;
>  }
> 
> @@ -1095,7 +1101,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
>  		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
>  		pwq->whead = whead;
>  		pwq->base = epi;
> -		add_wait_queue(whead, &pwq->wait);
> +		if (epi->event.events & EPOLLEXCLUSIVE)
> +			add_wait_queue_exclusive(whead, &pwq->wait);
> +		else
> +			add_wait_queue(whead, &pwq->wait);
>  		list_add_tail(&pwq->llink, &epi->pwqlist);
>  		epi->nwait++;
>  	} else {
> @@ -1861,6 +1870,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
>  	if (f.file == tf.file || !is_file_epoll(f.file))
>  		goto error_tgt_fput;
> 
> +	if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
> +		(op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
> +		goto error_tgt_fput;
> +
>  	/*
>  	 * At this point it is safe to assume that the "private_data" contains
>  	 * our own data structure.
> diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
> index bc81fb2..925bbfb 100644
> --- a/include/uapi/linux/eventpoll.h
> +++ b/include/uapi/linux/eventpoll.h
> @@ -26,6 +26,9 @@
>  #define EPOLL_CTL_DEL 2
>  #define EPOLL_CTL_MOD 3
> 
> +/* Add exclusively */
> +#define EPOLLEXCLUSIVE (1 << 28)
> +
>  /*
>   * Request the handling of system wakeup events so as to prevent system suspends
>   * from happening while those events are being processed.
> 
> 
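The patch depends on how a wake-up walks a wait queue. Every non-exclusive entry's callback is called, but wake_up() stops after the first exclusive entry whose callback returns non-zero. That is why the patched ep_poll_callback() returns ewake for EPOLLEXCLUSIVE items instead of always returning 1: an epoll instance with no task sleeping in epoll_wait() should not use up the single exclusive wake-up. Below is a simplified user-space model of that rule, not kernel code. The struct and function names are hypothetical, and the model ignores the ready-list bookkeeping. add_wait_queue() puts entries at the head of the queue and add_wait_queue_exclusive() at the tail, so callers should list non-exclusive entries first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one entry on a file's wait queue (not kernel code). */
struct model_waiter {
    bool exclusive;   /* added with add_wait_queue_exclusive()? */
    bool has_sleeper; /* does this epoll instance have a task in epoll_wait()? */
    bool woken;       /* set by the model when that task is woken */
};

/*
 * Model of the patched ep_poll_callback(): wake the sleeper if there is
 * one; exclusive entries report whether anyone was actually woken (ewake),
 * non-exclusive entries always return 1.
 */
static int model_callback(struct model_waiter *w)
{
    int ewake = 0;

    if (w->has_sleeper) {
        w->woken = true;
        ewake = 1;
    }
    return w->exclusive ? ewake : 1;
}

/*
 * Model of the walk done by wake_up() (nr_exclusive = 1): stop after the
 * first exclusive entry whose callback returns non-zero.
 * Returns the number of sleeping tasks that were woken.
 */
size_t model_wake_up(struct model_waiter *q, size_t n)
{
    int nr_exclusive = 1;
    size_t woken = 0;

    for (size_t i = 0; i < n; i++) {
        int ret = model_callback(&q[i]);

        if (q[i].woken)
            woken++;
        if (ret && q[i].exclusive && --nr_exclusive == 0)
            break;
    }
    return woken;
}
```

With ten non-exclusive entries that all have sleepers, the model wakes all ten. With ten exclusive entries it wakes one. If the first exclusive instance has no sleeper, the walk moves on to the next one instead of stopping.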
>> While hacking the kernel with debug prints, with 10 processes waiting
>> on one event source, I saw a lot of unneeded processing inside
>> eventpoll.c on the original kernel: a single event caused 10 calls to
>> ep_poll_callback() plus related work, which resulted in several
>> processes being woken up in user space (the count probably varies
>> randomly depending on concurrency).
>> 
>> 
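The effect described above can be observed from user space with an experiment along these lines. This is a hypothetical sketch. The function name, the pipe standing in for the shared queue, the process count and the timeouts are all assumptions. Each child process gets its own epoll instance on the same pipe. The parent writes a single byte that nobody reads, and each child reports whether its epoll_wait() returned the event before the timeout.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28) /* value from the patch */
#endif

/*
 * Hypothetical experiment: fork nproc children that each watch the read end
 * of one shared pipe through a private epoll instance, write one byte that
 * nobody reads, and count how many children saw the event within timeout_ms.
 * Returns the count, or -1 if setup failed.
 */
int count_wakeups(int nproc, int exclusive, int timeout_ms)
{
    int fds[2], started, woken = 0;
    ssize_t wr;

    if (pipe(fds) < 0)
        return -1;

    for (started = 0; started < nproc; started++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            /* Child: its own epoll instance, hence its own entry on the pipe's wait queue. */
            struct epoll_event ev = { .events = EPOLLIN };
            int epfd = epoll_create1(0);
            if (exclusive)
                ev.events |= EPOLLEXCLUSIVE;
            if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) < 0)
                _exit(2);
            /* Exit status 1 = woken with the event, 0 = timed out. */
            _exit(epoll_wait(epfd, &ev, 1, timeout_ms) == 1 ? 1 : 0);
        }
    }

    usleep(200 * 1000); /* give the children time to block in epoll_wait() */
    wr = write(fds[1], "x", 1);

    /* Reap every child that was started, counting the ones that saw the event. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 1)
            woken++;
    }

    close(fds[0]);
    close(fds[1]);
    return (started == nproc && wr == 1) ? woken : -1;
}
```

Without the flag, every child should report the event. With EPOLLEXCLUSIVE the count is expected to be smaller, typically one once all children are asleep. The kernel only promises that one or more waiters receive the event, so the exact number can vary with timing.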
>> Meanwhile, we are not the only ones talking about this patch; others
>> are asking about it too, see:
>> http://stackoverflow.com/questions/33226842/epollexclusive-and-epollroundrobin-flags-in-mainstream-kernel
>> 
>> So what is the current status of your patch? What is blocking it from
>> getting into mainline?
>> 
> 
> If we can show some good test results here I will re-submit it.
> 
> Thanks,
> 
> -Jason