lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 18 Jun 2014 04:14:25 -0700
From:	"Luis R. Rodriguez" <mcgrof@...not-panic.com>
To:	hpa@...ux.intel.com, akpm@...ux-foundation.org
Cc:	linux-kernel@...r.kernel.org,
	"Luis R. Rodriguez" <mcgrof@...e.com>,
	Andrew Lunn <andrew@...n.ch>,
	Stephen Warren <swarren@...dotorg.org>,
	Michal Hocko <mhocko@...e.cz>, Petr Mladek <pmladek@...e.cz>,
	Joe Perches <joe@...ches.com>,
	Arun KS <arunks.linux@...il.com>,
	Kees Cook <keescook@...omium.org>,
	Davidlohr Bueso <davidlohr@...com>,
	Chris Metcalf <cmetcalf@...era.com>
Subject: [RFT 2/2] printk: allow increasing the ring buffer depending on the number of CPUs

From: "Luis R. Rodriguez" <mcgrof@...e.com>

The default size of the ring buffer is too small for machines
with a large amount of CPUs under heavy load. What ends up
happening when debugging is the ring buffer overlaps and chews
up old messages making debugging impossible unless the size is
passed as a kernel parameter. An idle system upon boot up will
on average spew out only about one or two extra lines but where
this really matters is on heavy load and that will vary widely
depending on the system and environment.

There are mechanisms to help increase the kernel ring buffer
for tracing through debugfs, and those interfaces even allow growing
the kernel ring buffer per CPU. We also have a static value which
can be passed upon boot. Relying on debugfs however is not ideal
for production, and relying on the value passed upon bootup is
can only used *after* an issue has creeped up. Instead of being
reactive this adds a proactive measure which lets you scale the
amount of contributions you'd expect to the kernel ring buffer
under load by each CPU in the worst case scenario.

We use num_possible_cpus() to avoid complexities which could be
introduced by dynamically changing the ring buffer size at run
time, num_possible_cpus() lets us use the upper limit on possible
number of CPUs therefore avoiding having to deal with hotplugging
CPUs on and off. This introduces the kernel configuration option
LOG_CPU_MIN_BUF_SHIFT which is used to specify the maximum amount
of contributions to the kernel ring buffer in the worst case before
the kernel ring buffer flips over, the size is specified as a power
of 2. The total amount of contributions made by each CPU must be
greater than half of the default kernel ring buffer size
(1 << LOG_BUF_SHIFT bytes) in order to trigger an increase upon
bootup. The kernel ring buffer is increased to the next power of
two that would fit the required minimum kernel ring buffer size
plus the additional CPU contribution. For example if LOG_BUF_SHIFT
is 18 (256 KB) you'd require at least 128 KB contributions by
other CPUs in order to trigger an increase of the kernel ring buffer.
With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least
anything over > 64 possible CPUs to trigger an increase. If you
had 128 possible CPUs the amount of minimum required kernel ring
buffer bumps to:

   ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB

Since we require the ring buffer to be a power of two the new
required size would be 1024 KB.

This CPU contributions are ignored when the "log_buf_len" kernel parameter
is used as it forces the exact size of the ring buffer to an expected power
of two value.

Cc: Andrew Lunn <andrew@...n.ch>
Cc: Stephen Warren <swarren@...dotorg.org>
Cc: Michal Hocko <mhocko@...e.cz>
Cc: Petr Mladek <pmladek@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: Joe Perches <joe@...ches.com>
Cc: Arun KS <arunks.linux@...il.com>
Cc: Kees Cook <keescook@...omium.org>
Cc: Davidlohr Bueso <davidlohr@...com>
Cc: Chris Metcalf <cmetcalf@...era.com>
Cc: linux-kernel@...r.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@...e.com>
---
 Documentation/kernel-parameters.txt |  8 ++++--
 init/Kconfig                        | 53 ++++++++++++++++++++++++++++++++++++-
 kernel/printk/printk.c              | 32 ++++++++++++++++++++++
 3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6eaa9cd..229d031 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1685,8 +1685,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			7 (KERN_DEBUG)		debug-level messages
 
 	log_buf_len=n[KMG]	Sets the size of the printk ring buffer,
-			in bytes.  n must be a power of two.  The default
-			size is set in the kernel config file.
+			in bytes.  n must be a power of two and greater
+			than the minimal size. The minimal size is defined
+			by LOG_BUF_SHIFT kernel config parameter. There is
+			also CONFIG_LOG_CPU_MIN_BUF_SHIFT config parameter
+			that allows to increase the default size depending on
+			the number of CPUs. See init/Kconfig for more details.
 
 	logo.nologo	[FB] Disables display of the built-in Linux logo.
 			This may be used to provide more screen space for
diff --git a/init/Kconfig b/init/Kconfig
index 9d76b99..69bdbcf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -807,7 +807,11 @@ config LOG_BUF_SHIFT
 	range 12 21
 	default 17
 	help
-	  Select kernel log buffer size as a power of 2.
+	  Select the minimal kernel log buffer size as a power of 2.
+	  The final size is affected by LOG_CPU_MIN_BUF_SHIFT config
+	  parameter, see below. Any higher size also might be forced
+	  by "log_buf_len" boot parameter.
+
 	  Examples:
 	  	     17 => 128 KB
 		     16 => 64 KB
@@ -816,6 +820,53 @@ config LOG_BUF_SHIFT
 		     13 =>  8 KB
 		     12 =>  4 KB
 
+config LOG_CPU_MIN_BUF_SHIFT
+	int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
+	range 0 21
+	default 12
+	depends on SMP
+	depends on !BASE_SMALL
+	help
+	  The kernel ring buffer will get additional data logged onto it
+	  when multiple CPUs are supported. Typically the contributions are
+	  only a few lines when idle however under under load this can vary
+	  and in the worst case it can mean losing logging information. You
+	  can use this to set the maximum expected mount of amount of logging
+	  contribution under load by each CPU in the worst case scenario, as
+	  a power of 2. The total amount of contributions made by each CPU
+	  must be greater than half of the default kernel ring buffer size
+	  ((1 << LOG_BUF_SHIFT / 2 bytes)) in order to trigger an increase upon
+	  bootup. If an increase is required the ring buffer is increated to
+	  the next power of 2 that can fit both the minimum kernel ring buffer
+	  (LOG_BUF_SHIFT) plus the additional worst case CPU contributions.
+	  For example if LOG_BUF_SHIFT is 18 (256 KB) you'd require at laest
+	  128 KB contributions by other CPUs in order to trigger an increase.
+	  With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least anything
+	  over > 64 possible CPUs to trigger an increase. If you had 128
+	  possible CPUs the new minimum required kernel ring buffer size
+	  would be:
+
+	     ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
+
+	  Since we only allow powers of two for the kernel ring buffer size the
+	  new kernel ring buffer size would be 1024 KB.
+
+	  CPU contributions are ignored when "log_buf_len" kernel parameter is
+	  used as it forces an exact (power of two) size of the ring buffer to
+	  an expected value.
+
+	  The number of possible CPUs is used for this computation ignoring
+	  hotplugging making the compuation optimal for the the worst case
+	  scenerio while allowing a simple algorithm to be used from bootup.
+
+	  Examples shift values and their meaning:
+		     17 => 128 KB for each CPU
+		     16 =>  64 KB for each CPU
+		     15 =>  32 KB for each CPU
+		     14 =>  16 KB for each CPU
+		     13 =>   8 KB for each CPU
+		     12 =>   4 KB for each CPU
+
 #
 # Architectures with an unreliable sched_clock() should select this:
 #
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index af164a7..7c7b599 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -266,6 +266,7 @@ static u32 clear_idx;
 #define LOG_ALIGN __alignof__(struct printk_log)
 #endif
 #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
 static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
@@ -848,12 +849,43 @@ static int __init log_buf_len_setup(char *str)
 }
 early_param("log_buf_len", log_buf_len_setup);
 
+static void __init log_buf_add_cpu(void)
+{
+	int cpu_extra;
+
+	/*
+	 * archs should set up cpu_possible_bits properly with
+	 * set_cpu_possible() after setup_arch() but just in
+	 * case lets ensure this is valid. During an early
+	 * call before setup_arch()() this will be 1.
+	 */
+	if (num_possible_cpus() <= 1)
+		return;
+
+	cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
+
+	/* by default this will only continue through for large > 64 CPUs */
+	if (cpu_extra <= __LOG_BUF_LEN / 2)
+		return;
+
+	pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
+	pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
+
+	log_buf_len_update(cpu_extra + __LOG_BUF_LEN);
+}
+
 void __init setup_log_buf(int early)
 {
 	unsigned long flags;
 	char *new_log_buf;
 	int free;
 
+	if (log_buf != __log_buf)
+		return;
+
+	if (!early && !new_log_buf_len)
+		log_buf_add_cpu();
+
 	if (!new_log_buf_len)
 		return;
 
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ