lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20190826193638.6638-6-echron@arista.com>
Date:   Mon, 26 Aug 2019 12:36:33 -0700
From:   Edward Chron <echron@...sta.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     Michal Hocko <mhocko@...e.com>, Roman Gushchin <guro@...com>,
        Johannes Weiner <hannes@...xchg.org>,
        David Rientjes <rientjes@...gle.com>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        Shakeel Butt <shakeelb@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, colona@...sta.com,
        Edward Chron <echron@...sta.com>
Subject: [PATCH 05/10] mm/oom_debug: Add Select Slabs Print

Add OOM Debug code to allow select slab entries to be printed at the
time of an OOM event. Linux has added printing slab entries on an
OOM event, if the amount of memory used by slabs exceeds the amount
of memory used by user processes. This OOM Debug option allows
slab entries of a specified minimum entry size to be printed,
limiting the amount of print output an OOM event generates for slab
entries.

Configuring this OOM Debug Option (DEBUG_OOM_SLAB_SELECT_PRINT)
---------------------------------------------------------------
To configure this OOM debug option it needs to be configured
in the OOM Debugging configure menu. The kernel configuration entry
can be found in the config at: Kernel hacking, Memory Debugging,
OOM Debugging the DEBUG_OOM_SLAB_SELECT_PRINT config entry.

Two dynamic OOM debug settings for this option: enable, tenthpercent
--------------------------------------------------------------------
The oom debugfs base directory is found at: /sys/kernel/debug/oom.
The oom debugfs for this option is: slab_select_print_
and for select options there are two files, the enable file and
the tenthpercent file are the debugfs files.

Dynamic disable or re-enable this OOM Debug option
--------------------------------------------------
This option may be disabled or re-enabled using the debugfs entry for
this OOM debug option. The debugfs file to enable this entry is found
at: /sys/kernel/debug/oom/slab_select_print_enabled where the enabled
file's value determines whether the facility is enabled or disabled.
A value of 1 is enabled (default) and a value of 0 is disabled. The
default if configured is enabled.

Specifying the minimum entry size (0-1000) in the tenthpercent file
-------------------------------------------------------------------
Also for DEBUG_OOM_SLAB_SELECT_PRINT the number of slab entries printed is
adjustable. By default if the DEBUG_OOM_SLAB_SELECT_PRINT config option
is enabled entries that use 1% or more of memory are printed. This can be
adjusted to be entries as small as 0% of memory or as large as 100% of
memory in which case only a summary line is printed, as no slab entry
could possibly use 100% of memory. Adjustments are made in the debugfs
file found at: /sys/kernel/debug/oom/slab_select_print_tenthpercent
Entry values that are valid are 0 through 1000 which represent memory
usage of 0% of memory to 100% of memory. A value of of 0 prints all
slabs that have at least one slab in use, unused slabs are not printed.

Content of Slab Summary Records Output
--------------------------------------
Additional output consists of summary information that is printed
at the end of the output. This summary information includes:
  - # entries examined
  - # entries selected and printed
  - minimum entry size for selection
  - Slabs total size (kB)
  - Slabs reclaimable size (kB)
  - Slabs unreclaimable size (kB)

Sample Output
-------------
Output produced consists of the standard output currently produced
by Linux for slab entries plus two lines of summary information.
(The standard output provides a section header and entry per slab)

Summary output (minsize = 0kB, all entries with > 0 slabs in use printed):

Jul 23 23:26:34 yoursystem kernel: Summary: Slab entries examined: 123
 printed: 83 minsize: 0kB

Jul 23 23:26:34 yoursystem kernel: Slabs Total: 151212kB Reclaim: 50632kB
 Unreclaim: 100580kB


Signed-off-by: Edward Chron <echron@...sta.com>
---
 mm/Kconfig.debug    | 30 +++++++++++++++++++++
 mm/oom_kill.c       | 11 +++++++-
 mm/oom_kill_debug.c | 42 +++++++++++++++++++++++++++++
 mm/oom_kill_debug.h |  4 +++
 mm/slab.h           |  4 +++
 mm/slab_common.c    | 65 +++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 155 insertions(+), 1 deletion(-)

diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index fe4bb5ce0a6d..c7d53ca95d32 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -189,3 +189,33 @@ config DEBUG_OOM_ND_TBL
 	  A value of 1 is enabled (default) and a value of 0 is disabled.
 
 	  If unsure, say N.
+
+config DEBUG_OOM_SLAB_SELECT_PRINT
+	bool "Debug OOM Select Slabs Print"
+	depends on DEBUG_OOM
+	help
+	  When enabled, allows the number of unreclaimable slab entries
+	  to be print rate limited based on the amount of memory the
+	  slab entry is consuming. By default all slab entries with more
+	  than one object in use are printed if the trigger condition is
+	  met to dump slab entries.
+
+	  If the option is configured it is enabled/disabled by setting
+	  the value of the file entry in the debugfs OOM interface at:
+	  /sys/kernel/debug/oom/slab_select_print_enabled
+	  A value of 1 is enabled (default) and a value of 0 is disabled.
+
+	  When enabled entries are print limited by the amount of memory
+	  they consume. The setting value defines the minimum memory
+	  size consumed and are represented in tenths of a percent.
+	  Values supported are 0 to 1000 where 0 allows all entries to be
+	  printed, 1 would allow entries using 0.1% or more to be printed,
+	  10 would allow entries using 1% or more of memory to be printed.
+
+	  If configured and enabled the rate limiting memory percentage
+	  is specified by setting a value in the debugfs OOM interface at:
+	  /sys/kernel/debug/oom/slab_select_print_tenthpercent
+	  If configured the default settings are set to enabled and
+	  print limit value of 10 or 1% of memory.
+
+	  If unsure, say N.
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index c10d61fe944f..9022297fa2ba 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -438,6 +438,15 @@ static void dump_tasks(struct oom_control *oc)
 	}
 }
 
+static void oom_kill_unreclaimable_slabs_print(void)
+{
+#ifdef CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT
+	if (oom_kill_debug_unreclaimable_slabs_print())
+		return;
+#endif
+	dump_unreclaimable_slab();
+}
+
 static void dump_oom_summary(struct oom_control *oc, struct task_struct *victim)
 {
 	/* one line summary of the oom killer context. */
@@ -464,7 +473,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
 	else {
 		show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
 		if (is_dump_unreclaim_slabs())
-			dump_unreclaimable_slab();
+			oom_kill_unreclaimable_slabs_print();
 	}
 #ifdef CONFIG_DEBUG_OOM
 	oom_kill_debug_oom_event_is();
diff --git a/mm/oom_kill_debug.c b/mm/oom_kill_debug.c
index c4a9117633fd..2b5245e1134d 100644
--- a/mm/oom_kill_debug.c
+++ b/mm/oom_kill_debug.c
@@ -165,6 +165,9 @@
 #if defined(CONFIG_DEBUG_OOM_ARP_TBL) || defined(CONFIG_DEBUG_OOM_ND_TBL)
 #include <net/neighbour.h>
 #endif
+#ifdef CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT
+#include "slab.h"
+#endif
 
 #define OOMD_MAX_FNAME 48
 #define OOMD_MAX_OPTNAME 32
@@ -214,6 +217,12 @@ static struct oom_debug_option oom_debug_options_table[] = {
 		.option_name	= "nd_table_summary_",
 		.support_tpercent = false,
 	},
+#endif
+#ifdef CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT
+	{
+		.option_name	= "slab_select_print_",
+		.support_tpercent = true,
+	},
 #endif
 	{}
 };
@@ -231,6 +240,9 @@ enum oom_debug_options_index {
 #endif
 #ifdef CONFIG_DEBUG_OOM_ND_TBL
 	ND_STATE,
+#endif
+#ifdef CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT
+	SELECT_SLABS_STATE,
 #endif
 	OUT_OF_BOUNDS
 };
@@ -361,6 +373,36 @@ static void oom_kill_debug_system_summary_prt(void)
 }
 #endif /* CONFIG_DEBUG_OOM_SYSTEM_STATE */
 
+#ifdef CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT
+static inline u16 oom_kill_debug_slabs_tenthpercent(void)
+{
+	return oom_kill_debug_tenthpercent(SELECT_SLABS_STATE);
+}
+
+static void oom_kill_debug_slabs_and_summary_print(void)
+{
+	u16 pcttenth = oom_kill_debug_slabs_tenthpercent();
+	unsigned long minkb = (K(totalram_pages()) * pcttenth) / 1000;
+
+	slab_common_oom_debug_dump_unreclaimable(minkb);
+
+	pr_info("Slabs Total: %lukB Reclaim: %lukB Unreclaim: %lukB\n",
+		K((global_node_page_state(NR_SLAB_RECLAIMABLE) +
+		   global_node_page_state(NR_SLAB_UNRECLAIMABLE))),
+		K(global_node_page_state(NR_SLAB_RECLAIMABLE)),
+		K(global_node_page_state(NR_SLAB_UNRECLAIMABLE)));
+}
+
+bool oom_kill_debug_unreclaimable_slabs_print(void)
+{
+	if (oom_kill_debug_enabled(SELECT_SLABS_STATE)) {
+		oom_kill_debug_slabs_and_summary_print();
+		return true;
+	}
+	return false;
+}
+#endif /* CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT */
+
 #ifdef CONFIG_DEBUG_OOM_TASKS_SUMMARY
 static void oom_kill_debug_tasks_summary_print(void)
 {
diff --git a/mm/oom_kill_debug.h b/mm/oom_kill_debug.h
index 7288969db9ce..549b8da179d0 100644
--- a/mm/oom_kill_debug.h
+++ b/mm/oom_kill_debug.h
@@ -9,6 +9,10 @@
 #ifndef __MM_OOM_KILL_DEBUG_H__
 #define __MM_OOM_KILL_DEBUG_H__
 
+#ifdef CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT
+extern bool oom_kill_debug_unreclaimable_slabs_print(void);
+#endif
+
 extern u32 oom_kill_debug_oom_event_is(void);
 extern u32 oom_kill_debug_event(void);
 extern bool oom_kill_debug_enabled(u16 index);
diff --git a/mm/slab.h b/mm/slab.h
index 9057b8056b07..703e914efedc 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -586,10 +586,14 @@ int memcg_slab_show(struct seq_file *m, void *p);
 
 #if defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG)
 void dump_unreclaimable_slab(void);
+void slab_common_oom_debug_dump_unreclaimable(unsigned long minkb);
 #else
 static inline void dump_unreclaimable_slab(void)
 {
 }
+static inline void slab_common_oom_debug_dump_unreclaimable(unsigned long minkb)
+{
+}
 #endif
 
 void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 807490fe217a..9ddc95040b60 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1450,6 +1450,71 @@ void dump_unreclaimable_slab(void)
 	mutex_unlock(&slab_mutex);
 }
 
+#ifdef CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT
+static void oom_debug_slab_header_print(void)
+{
+	pr_info("Unreclaimable slab info:\n");
+	pr_info("Name                      Used          Total\n");
+}
+
+static void oom_debug_slab_print(struct slabinfo *psi, struct kmem_cache *pkc)
+{
+	pr_info("%-17s %10luKB %10luKB\n", cache_name(pkc),
+		(psi->active_objs * pkc->size) / 1024,
+		(psi->num_objs * pkc->size) / 1024);
+}
+
+static bool oom_debug_slab_check(struct slabinfo *psi, struct kmem_cache *pkc,
+				 unsigned long min_kb)
+{
+	if (psi->num_objs > 0) {
+		if (((psi->active_objs * pkc->size) / 1024) >= min_kb) {
+			oom_debug_slab_print(psi, pkc);
+			return true;
+		}
+	}
+	return false;
+}
+
+void slab_common_oom_debug_dump_unreclaimable(unsigned long minkb)
+{
+	struct kmem_cache *s, *s2;
+	struct slabinfo sinfo;
+	u32 slabs_examined = 0;
+	u32 slabs_printed = 0;
+
+	/*
+	 * Here acquiring slab_mutex is risky since we don't prefer to get
+	 * sleep in oom path. But, without mutex hold, it may introduce a
+	 * risk of crash.
+	 * Use mutex_trylock to protect the list traverse, dump nothing
+	 * without acquiring the mutex.
+	 */
+	if (!mutex_trylock(&slab_mutex)) {
+		pr_warn("excessive unreclaimable slab but cannot dump stats\n");
+		return;
+	}
+
+	oom_debug_slab_header_print();
+
+	list_for_each_entry_safe(s, s2, &slab_caches, list) {
+		if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
+			continue;
+
+		get_slabinfo(s, &sinfo);
+
+		++slabs_examined;
+
+		if (oom_debug_slab_check(&sinfo, s, minkb))
+			++slabs_printed;
+	}
+	mutex_unlock(&slab_mutex);
+
+	pr_info("Summary: Slab entries examined:%u printed:%u minsize:%lukB\n",
+		slabs_examined, slabs_printed, minkb);
+}
+#endif  /* CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT */
+
 #if defined(CONFIG_MEMCG)
 void *memcg_slab_start(struct seq_file *m, loff_t *pos)
 {
-- 
2.20.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ