Message-ID: <20150803234842.GA21995@dcvr.yhbt.net>
Date: Mon, 3 Aug 2015 23:48:42 +0000
From: Eric Wong <normalperson@...t.net>
To: Madars Vitolins <m@...odev.com>
Cc: linux-kernel@...r.kernel.org, Jason Baron <jbaron@...mai.com>
Subject: Re: epoll and multiple processes - eliminate unneeded process wake-ups

Madars Vitolins <m@...odev.com> wrote:
> Hi Folks,
>
> I am developing a kind of open-systems application which uses
> multiple processes/executables, where each of them monitors some set
> of resources (in this case POSIX queues) via the epoll interface. For
> example, when 10 processes are in epoll_wait() on the same queue and
> one message arrives, all 10 processes get woken up and all of them
> try to read the message from the queue. One succeeds, the others get
> EAGAIN. The problem is with those others, which generate extra
> context switches - useless CPU usage. With more processes the
> inefficiency gets higher.
>
> I tried to use EPOLLONESHOT, but it did not help. It seems this is
> suitable for a multi-threaded application and not for a multi-process
> application.

Correct. Most FDs are not shared across processes.
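
To make the symptom concrete, here is a minimal sketch of the pattern
described above (names and the 8192-byte buffer are illustrative, and
error handling is elided): each worker process puts the same
O_NONBLOCK mqueue descriptor into its own epoll set, so a single
message wakes every waiting process and all but one of them get EAGAIN:

#include <mqueue.h>
#include <sys/epoll.h>
#include <errno.h>

/* q was opened with mq_open(..., O_RDONLY | O_NONBLOCK) in each process;
 * buf must be at least mq_msgsize bytes for mq_receive() to succeed.
 */
static void worker_loop(mqd_t q)
{
	char buf[8192];
	struct epoll_event ev = { .events = EPOLLIN }, out;
	int ep = epoll_create1(0);

	epoll_ctl(ep, EPOLL_CTL_ADD, (int)q, &ev);
	for (;;) {
		if (epoll_wait(ep, &out, 1, -1) <= 0)
			continue;
		/* every waiting process reaches this point for one message... */
		if (mq_receive(q, buf, sizeof(buf), NULL) < 0 &&
		    errno == EAGAIN)
			continue;	/* ...but only one of them wins it */
		/* handle the message in buf */
	}
}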

> The ideal mechanism for this would be:
> 1. If multiple epoll sets in the kernel match the same event and one
> or more processes are in epoll_wait(), then send the event to only
> one waiter.
> 2. If none of the processes are in the wait state, then send the
> event to all epoll sets (as it is currently), and the first free
> process will grab the event.

Jason Baron was working on this (search the LKML archives for
EPOLLEXCLUSIVE, EPOLLROUNDROBIN, EPOLL_ROTATE).
However, I was unconvinced about modifying epoll.

Perhaps I may be more easily convinced by your mqueue case than by his
case for listen sockets, though [*].

Typical applications have only a few (probably only one) listen
sockets or POSIX mqueues, so I would rather use dedicated threads to
issue blocking syscalls (accept4 or mq_timedreceive).

Making blocking syscalls allows exclusive wakeups, which avoid
thundering herds.

> What do you think, would it be feasible to implement this? What
> about concurrency?
> Can you please give me some hints on where in the code to start
> implementing these changes?

For now, I suggest dedicating a thread in each process to do
mq_timedreceive/mq_receive, assuming you only have a small number
of queues in your system.
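
A minimal sketch of that suggestion (dispatch() and the buffer size
are placeholders): the queue is opened without O_NONBLOCK, so
mq_timedreceive() blocks and an arriving message wakes only one
blocked process.

#include <mqueue.h>
#include <sys/types.h>
#include <time.h>
#include <errno.h>

/* dispatch() is hypothetical: hand the message to the rest of the process. */
extern void dispatch(const char *msg, ssize_t len);

static void *mq_worker(void *arg)
{
	mqd_t q = *(mqd_t *)arg;	/* opened without O_NONBLOCK */
	char buf[8192];			/* must be >= the queue's mq_msgsize */

	for (;;) {
		struct timespec ts;
		ssize_t n;

		clock_gettime(CLOCK_REALTIME, &ts);
		ts.tv_sec += 5;		/* wake periodically for shutdown checks */
		n = mq_timedreceive(q, buf, sizeof(buf), NULL, &ts);
		if (n < 0) {
			if (errno == ETIMEDOUT || errno == EINTR)
				continue;
			break;		/* unexpected error */
		}
		dispatch(buf, n);
	}
	return NULL;
}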

[*] mq_timedreceive may copy a largish buffer, which benefits from
staying on the same CPU as much as possible.
In contrast, accept4 only creates a client socket. With a C10K+
socket server (e.g. http/memcached/DB), a typical new client
socket spends a fair amount of time idle, so I don't believe
memory locality inside the kernel is much of a concern when there
are thousands of accepted client sockets.