Date:	Wed, 26 Aug 2015 15:53:26 -0700
From:	Hideaki Kimura <hideaki.kimura@....com>
To:	Jason Low <jason.low2@...com>, Oleg Nesterov <oleg@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>
CC:	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org,
	Frederic Weisbecker <fweisbec@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Rik van Riel <riel@...hat.com>,
	Scott J Norton <scott.norton@...com>
Subject: Re: [PATCH 0/3] timer: Improve itimers scalability

Sure, let me elaborate.

Executive summary:
  Yes, enabling a process-wide timer on such a large machine is not 
wise, but sometimes users/applications cannot avoid it.


The issue was actually observed not in the database itself but in a 
common library it links to: gperftools.

The database itself is optimized for many cores/sockets, so it of course 
avoids a process-wide timer and other unscalable things. It just links 
to libprofiler for an optional feature that profiles performance 
bottlenecks only when the user turns it on. We of course leave the 
feature off except while we debug/tune the database.

However, libprofiler sets the timer even when the client program doesn't 
invoke any of its functions: it does so when the shared library is 
loaded. We asked the developer of libprofiler to change this behavior, 
but it seems there is a reason to keep it:
   https://code.google.com/p/gperftools/issues/detail?id=133
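
To illustrate the mechanism, here is a minimal sketch (not libprofiler's 
actual code) of a process-wide profiling timer armed as a side effect, 
plus a check for whether one is armed. It assumes a Unix system and uses 
Python's standard signal module; the function names and the 10 ms 
interval are made up for this illustration.

```python
import signal


def arm_profiling_timer(interval_s=0.01):
    """Mimic a library that arms ITIMER_PROF as a load-time side effect.

    Hypothetical helper: the name and the default interval are
    assumptions for this sketch, not taken from gperftools.
    """
    # Ignore SIGPROF so the default action (terminate) does not kill us.
    signal.signal(signal.SIGPROF, signal.SIG_IGN)
    # First expiry after interval_s of process CPU time, then periodic.
    signal.setitimer(signal.ITIMER_PROF, interval_s, interval_s)


def profiling_timer_armed():
    """Return True if a process-wide ITIMER_PROF is currently armed."""
    value, interval = signal.getitimer(signal.ITIMER_PROF)
    return value > 0.0 or interval > 0.0


def disarm_profiling_timer():
    """Disarm ITIMER_PROF; a zero value stops the timer."""
    signal.setitimer(signal.ITIMER_PROF, 0.0)
```

ITIMER_PROF counts down against the CPU time consumed by all threads in 
the process, so once it is armed every thread's CPU time is accounted 
against it, whether or not the application ever uses the profiler.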

Based on this, I think there are two reasons why we should ameliorate 
this issue in the kernel layer.


1. In this particular case, it's hard to prevent or even detect the 
issue in user space.

We (a team of low-level database and kernel experts) in fact spent a 
huge amount of time just figuring out what the bottleneck was, because 
nothing measurable happens in user space. I pulled out countless hairs.

Also, the user has to de-link the library from the application to 
prevent the itimer installation. Imagine a case where the software is 
proprietary. It won't fly.


2. This is just one example. There could be many other such 
binaries/libraries that do similar things somewhere in a complex 
software stack.

So far we haven't heard of many such cases, but people will start 
hitting it once hundreds to thousands of cores become common.


After applying this patchset, we observed that the performance hit 
almost completely went away, at least for 240 cores. So, it's quite 
beneficial in the real world.

-- 
Hideaki Kimura