lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <1509195507-29037-1-git-send-email-houtao1@huawei.com>
Date:   Sat, 28 Oct 2017 20:58:19 +0800
From:   Hou Tao <houtao1@...wei.com>
To:     <linux-fsdevel@...r.kernel.org>
CC:     <linux-kernel@...r.kernel.org>, <viro@...iv.linux.org.uk>,
        <jbaron@...mai.com>, <oleg@...hat.com>, <dave@...olabs.net>,
        <koct9i@...il.com>
Subject: [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case

Hi,

We are optimizing the Request-Per-Second of nginx http server, and we found
that acquiring epmutex in eventpoll_release_file() will become a bottleneck
under the one-request-per-connection scenario. The following are some details
of the scenario:

* HTTP server (nginx):
	* under ARM64 with 64 cores
	* 64 worker processes, each worker is binded to a specific CPU
	* keepalive_requests = 1 in nginx.conf: nginx will close the
	  connection fd after a reply is send
* HTTP benchmark tool (wrk):
	* under x86-64 with 48 cores
	* 16 threads, 64 connections per-thread

Before the patch, the RPS measured by wrk is ~220K, after applying
the patch the RPS is ~240K. We also measure the overhead of
eventpoll_release_file() and its children by perf: 29% before and
2% after.

In the following section I will explain the purposes of epmutex, and
the way of replacing it by using locks with a smaller granularity.

epmutex serves four purposes:
(1) serialize ep_loop_check() and ep_free()/eventpoll_release_file()
	(a) ensure the validity of ep when clearing visited_list
	The acquisition of epmutex in ep_free() prevent the freeing of ep.

	It's fixed in patch 2: when freeing ep, remove it from visited_list.
	When there is no nested-epoll cast, ep will not been added to
	visited_list, so we check the condition first. If it has already been
	added to visited_list, we need to wait for the release of epmutex.

(2) serialize reverse_path_check() and ep_free()/eventpoll_release_file()
	(a) ensure the validity of file in tfile_check_list
	epi->ffd.file was added to tfile_check_list under ep->mtx, but
	was accessed without ep->mtx. The acquisition of epmutex in
	eventpoll_release_file() prevent the freeing of file.

	It's fixed in patch 3: when releasing file, remove it from
	tfile_check_list. If it has been already added, we need to
	wait for the release of epmutex.

	(b) ensure the validity of epi->ep and epi->ep->file
	The epmutex will prevent the freeing of ep and its related file,
	so it's OK to access epi->ep under rcu read critical region.

	The change is done in patch 4: we free ep by rcu, so it's OK
	to access epi->ep->file under rcu read critical region. The file
	has already been freed by rcu, so it's also OK to access its fields.

(3) serialize the concurrent invocations of epoll_ctl(EPOLL_CTL_ADD)
    for the nested-epoll-fd case
	(a) protect tfile_check_list and visited_list

	There is nothing to do.

(4) serialize ep_free() and eventpoll_release_file()
	(a) protect file->f_ep_links
	eventpoll_release_file() will read the list through
	file->f_ep_links, and modify it through epi->fllink.
	ep_free() will modify it through epi->fllink.

	It's fixed in patch 5: using rcu and list_first_or_null_rcu() to
	iterate file->f_ep_links instead of epmutex.

	(b) ensure the validity of epi->ep
	When eventpoll_release_file() gets epi from file->f_ep_links,
	epi->ep should still be valid.

	It's fixed in patch 4 and 6: add an ref-counter to eventpoll and
	free eventpoll by rcu.

	(c) protect the removal of epi
	Both ep_free() and eventpoll_release_file() will try to remove
	the same epi, if one function has removed the epi, the other
	function should not remove it again.

	It's fixed in patch 7: check whether or not ep_free() has already
	removed the epi before the invocation of ep_remove() in
	eventpoll_release_file().

	(d) ensure the validity of epi->ffd.file
	When ep_remove() is invoked by ep_free(), epi->ffd.file should
	still be valid.

	Do not need to do anything: when ep_free() is invoking ep_remove()
	and access epi->ffd.file, if the file is freeing, the freeing will
	be blocked on ep->mtx, so it's OK to access the file in ep_remove().

Patch 1 just removes epmutex from ep_free() and eventpoll_release_file(),
and patch 8 enlarge the protected region of ep->mtx to protect against
the iteration of ep->rbr.

The patch set has passed the epoll related test cases in LTP, and we are
planing to run some torture or performance test cases for nested-epoll
cases.

Comments and questions are welcome.

Regards,

Tao
---
Hou Tao (8):
  epoll: remove epmutex from ep_free() & eventpoll_release_file()
  epoll: remove ep from visited_list when freeing ep
  epoll: remove file from tfile_check_list when releasing file
  epoll: free eventpoll by rcu to provide existence guarantee
  epoll: iterate epi in file->f_ep_links by using list_first_or_null_rcu
  epoll: ensure the validity of ep when removing epi in
    eventpoll_release_file()
  epoll: prevent the double-free of epi in eventpoll_release_file()
  epoll: protect the iteration of ep->rbr by ep->mtx in ep_free()

 fs/eventpoll.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 88 insertions(+), 14 deletions(-)

-- 
2.7.5

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ