Message-Id: <cover.1424805740.git.jbaron@akamai.com>
Date:	Tue, 24 Feb 2015 21:25:39 +0000 (GMT)
From:	Jason Baron <jbaron@...mai.com>
To:	peterz@...radead.org, mingo@...hat.com, viro@...iv.linux.org.uk
Cc:	akpm@...ux-foundation.org, normalperson@...t.net,
	davidel@...ilserver.org, mtk.manpages@...il.com,
	luto@...capital.net, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, linux-api@...r.kernel.org
Subject: [PATCH v3 0/3] epoll: introduce round robin wakeup mode

Hi,

When we are sharing a wakeup source among multiple epoll fds, we end up with
thundering herd wakeups, since there is currently no way to add to the
wakeup source exclusively. This series introduces a new EPOLL_ROTATE flag
to allow for round robin exclusive wakeups.

I believe this patch series addresses the two main concerns that were raised in
prior postings. Namely, that it affected code (and potentially performance)
of the core kernel wakeup functions, even in cases where it was not strictly
needed, and that it could lead to wakeup starvation (since we are no
longer waking up all waiters). It does so by adding an extra layer of
indirection, whereby waiters are attached to a 'pseudo' epoll fd, which in turn
is attached directly to the wakeup source.
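
To make the layering concrete, here is a minimal userspace sketch of the
topology described above: a pipe acts as the wakeup source, a 'source' epoll fd
is attached to it, and a waiter's own epoll fd is attached to the source epoll
fd. The struct and helper names (topo, topo_init, topo_poll_waiter) are made up
for illustration and are not part of the series. Passing 0 as src_flags builds
the same nesting on a stock kernel; a kernel carrying this series would
presumably take EPOLL_ROTATE there instead.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical container for the fds in the nested topology:
 * pipe (wakeup source) -> src_ep ('pseudo' epoll fd) -> waiter_ep. */
struct topo {
	int pipe_rd, pipe_wr;
	int src_ep;
	int waiter_ep;
};

/* Build the nesting. src_flags is handed to epoll_create1() for the
 * source epoll fd; 0 is the only value assumed to work everywhere.
 * Error paths do not close fds, to keep the sketch short. */
int topo_init(struct topo *t, int src_flags)
{
	int p[2];
	struct epoll_event ev = { .events = EPOLLIN };

	assert(t != NULL);

	if (pipe(p) < 0)
		return -1;
	t->pipe_rd = p[0];
	t->pipe_wr = p[1];

	/* The source epoll fd is the only thing attached to the pipe. */
	t->src_ep = epoll_create1(src_flags);
	if (t->src_ep < 0)
		return -1;
	ev.data.fd = t->pipe_rd;
	if (epoll_ctl(t->src_ep, EPOLL_CTL_ADD, t->pipe_rd, &ev) < 0)
		return -1;

	/* The waiter never touches the pipe; it watches the source epoll fd. */
	t->waiter_ep = epoll_create1(0);
	if (t->waiter_ep < 0)
		return -1;
	ev.data.fd = t->src_ep;
	if (epoll_ctl(t->waiter_ep, EPOLL_CTL_ADD, t->src_ep, &ev) < 0)
		return -1;

	return 0;
}

/* Number of ready events the waiter sees within timeout_ms (0 or 1). */
int topo_poll_waiter(const struct topo *t, int timeout_ms)
{
	struct epoll_event out;

	return epoll_wait(t->waiter_ep, &out, 1, timeout_ms);
}
```

With several waiter epoll fds attached to the same src_ep, a stock kernel
wakes all of them on each pipe write; that is the herd the new flag is meant
to avoid.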

Patch 1 introduces the required wakeup hooks. This could be restricted to just
the epoll code, but I added them to the generic code in case other ppl might
find them useful.

Patch 2 adds an optimization to the epoll wakeup code that allows EPOLL_ROTATE
to work optimally; however, it could stand alone as its own patch.

Finally, patch 3 adds the EPOLL_ROTATE flag and documents the API usage.

I'm also inlining test code making use of this interface, which shows roughly
a 50% speedup, similar to my previous results: http://lwn.net/Articles/632590/.

Sample epoll_create1 manpage text:

EPOLL_ROTATE
	Set the 'exclusive rotation' flag on the new file descriptor.
	This new file descriptor can be added via epoll_ctl() to at most one
	non-epoll file descriptor. Any epoll fds added directly to the
	new file descriptor via epoll_ctl() will be woken up in a round robin
	exclusive manner.
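
As a rough illustration of what 'round robin exclusive' means for the attached
epoll fds, here is a toy userspace model: each event wakes exactly one waiter,
and the next event goes to the next waiter in line. This is not the kernel
implementation from the series; the rr_queue type and rr_* helpers are
hypothetical, and the model assumes all waiters are added before the first
wakeup.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Toy model of a wait queue with rotating exclusive wakeups. */
struct rr_queue {
	int ids[MAX_WAITERS];	/* waiter identifiers, in attach order */
	size_t n;		/* number of attached waiters */
	size_t head;		/* index of the waiter to wake next */
};

void rr_init(struct rr_queue *q)
{
	q->n = 0;
	q->head = 0;
}

/* Attach a waiter; returns -1 if the queue is full. */
int rr_add(struct rr_queue *q, int id)
{
	if (q->n == MAX_WAITERS)
		return -1;
	q->ids[q->n++] = id;
	return 0;
}

/* Deliver one event: wake only the waiter at the head, then rotate so
 * the following event goes to the next waiter. */
int rr_wake_one(struct rr_queue *q)
{
	int id;

	assert(q->n > 0);
	id = q->ids[q->head];
	q->head = (q->head + 1) % q->n;
	return id;
}
```

With three waiters attached, four events in a row go to waiters 1, 2, 3 and
then 1 again, so no waiter is starved and no event wakes more than one waiter.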

Thanks,

-Jason

v3:
-restrict epoll exclusive rotate wakeups to within the epoll code
-Add epoll optimization for overflow list

Jason Baron (3):
  sched/wait: add __wake_up_rotate()
  epoll: limit wakeups to the overflow list
  epoll: Add EPOLL_ROTATE mode

 fs/eventpoll.c                 | 52 +++++++++++++++++++++++++++++++++++-------
 include/linux/wait.h           |  1 +
 include/uapi/linux/eventpoll.h |  4 ++++
 kernel/sched/wait.c            | 27 ++++++++++++++++++++++
 4 files changed, 76 insertions(+), 8 deletions(-)

-- 
1.8.2.rc2



#include <unistd.h>
#include <sys/epoll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>	/* strcmp() */
#include <pthread.h>

#define NUM_THREADS 100
#define NUM_EVENTS 20000
#define EPOLLEXCLUSIVE (1 << 28)
#define EPOLLBALANCED (1 << 27)

int optimize, exclusive;
int p[2];
int ep_src_fd;
pthread_t threads[NUM_THREADS];
int event_count[NUM_THREADS];

struct epoll_event evt = {
	.events = EPOLLIN 
};

void die(const char *msg) {
    perror(msg);
    exit(-1);
}

void *run_func(void *ptr)
{
	int ret;
	int epfd;
	char buf[4];
	int id = *(int *)ptr;
	/* Per-thread result buffer; the global evt is only read for EPOLL_CTL_ADD. */
	struct epoll_event out;

	if ((epfd = epoll_create(1)) < 0)
		die("create");

	ret = epoll_ctl(epfd, EPOLL_CTL_ADD, ep_src_fd, &evt);
	if (ret)
		perror("epoll_ctl add error");

	while (1) {
		/* Ask for at most one event, matching the size of the buffer. */
		ret = epoll_wait(epfd, &out, 1, -1);
		ret = read(p[0], buf, sizeof(int));
		if (ret == 4)
			event_count[id]++;
	}
}

#define EPOLL_ROTATE 1

int main(int argc, char *argv[])
{
	int ret, i, j;
	int id[NUM_THREADS];
	int total = 0;
	int nohit = 0;
	int extra_wakeups = 0;

	if (argc == 2) {
		if (strcmp(argv[1], "-o") == 0)
			optimize = 1;
		if (strcmp(argv[1], "-e") == 0)
			exclusive = 1;
	}

	if (pipe(p) < 0)
		die("pipe");
	if (optimize) {
		if ((ep_src_fd = epoll_create1(EPOLL_ROTATE)) < 0)
			die("create");
	} else {
		if ((ep_src_fd = epoll_create1(0)) < 0)
			die("create");
	}
			
	ret = epoll_ctl(ep_src_fd, EPOLL_CTL_ADD, p[0], &evt);
	if (ret)
		perror("epoll_ctl add core error");

	for (i = 0; i < NUM_THREADS; i++) {
		id[i] = i;
		pthread_create(&threads[i], NULL, run_func, &id[i]);
	} 

	for (j = 0; j < NUM_EVENTS; j++) {
		write(p[1], p, sizeof(int));
		usleep(100);
	}

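	for (i = 0; i < NUM_THREADS; i++) {
		pthread_cancel(threads[i]);
		/* Wait for the cancelled thread to exit before reading its counter. */
		pthread_join(threads[i], NULL);
		printf("joined: %d\n", i);
		printf("event count: %d\n", event_count[i]);
		total += event_count[i];
		if (!event_count[i])
			nohit++;
	}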

	printf("total events is: %d\n", total);
	printf("nohit is: %d\n", nohit);
	return 0;
}