Date:   Fri, 06 Jan 2017 13:20:33 +0100
From:   Mike Galbraith <umgwanakikbuti@...il.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-rt-users <linux-rt-users@...r.kernel.org>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [rfc patch-rt] radix-tree: Partially disable memcg accounting
 in radix_tree_node_alloc()

On Fri, 2017-01-06 at 11:52 +0100, Mike Galbraith wrote:
> On Fri, 2017-01-06 at 09:55 +0100, Michal Hocko wrote:
> > On Fri 06-01-17 09:13:23, Mike Galbraith wrote:
> > > radix-tree: Partially disable memcg accounting in radix_tree_node_alloc()
> > > 
> > > Having no preload, which turns accounting off for non-rt kernels, trying to
> > > allocate coming from shmem_fault() when memcg is full sends us scurrying off
> > > to pagefault_out_of_memory(), with dramatic (usually terminal) consequences.
> > > LTP's madvise06 testcase triggers this quite well, and per gitk, the below
> > > was the beginning of RT memcg woes.
> > > 
> > > 58e698af4c63 radix-tree: account radix_tree_node to memory cgroup
> > > 
> > > Turn memcg accounting off for RT in the problematic path.
> > 
> > I am really wondering why this is RT specific and the non RT kernels
> > doesn't have any problem.
> 
> For all I know, there may be a scenario for non-RT to explode, but the
> madvise06 testcase that thoroughly nails RT ain't it.

Unless you twiddle/apply the RT tree radix-tree patch.  So (as rashly
presumed), memcg woes are RT specific because RT disabled the preload
business.  madvise06 isn't as deadly to the twiddled PREEMPT kernel as
it is to PREEMPT_RT_FULL, but a very few runs attracted the oom beast.

('course there still may be a non-RT danger path lurking.. dunno)

[   81.376673] madvise06 invoked oom-killer: gfp_mask=0x0(), nodemask=0, order=0, oom_score_adj=-1000
[   81.376676] madvise06 cpuset=/ mems_allowed=0
[   81.376680] CPU: 5 PID: 4018 Comm: madvise06 Tainted: G            E   4.10.0-preempt #31
[   81.376681] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[   81.376682] Call Trace:
[   81.376687]  ? dump_stack+0x5c/0x7e
[   81.376690]  ? dump_header+0x7f/0x241
[   81.376692]  ? __do_fault+0x1d/0x70
[   81.376693]  ? handle_mm_fault+0x3f5/0xfe0
[   81.376696]  ? oom_kill_process+0x225/0x3f0
[   81.376697]  ? oom_badness+0x70/0x180
[   81.376699]  ? out_of_memory+0x103/0x4a0
[   81.376700]  ? pagefault_out_of_memory+0x43/0x60
[   81.376703]  ? do_page_fault+0x2b/0x70
[   81.376705]  ? page_fault+0x28/0x30

From: Thomas Gleixner <tglx@...utronix.de>
Date: Sun, 17 Jul 2011 21:33:18 +0200
Subject: radix-tree: Make RT aware

Disable radix_tree_preload() on -RT. This function returns with
preemption disabled, which may cause high latencies and breaks if the
user tries to grab any locks after invoking it.

Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
---
 include/linux/radix-tree.h |   18 +++++++++++++++++-
 lib/radix-tree.c           |    5 ++++-
 2 files changed, 21 insertions(+), 2 deletions(-)

--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -318,9 +318,24 @@ unsigned int radix_tree_gang_lookup(stru
 unsigned int radix_tree_gang_lookup_slot(struct radix_tree_root *root,
 			void ***results, unsigned long *indices,
 			unsigned long first_index, unsigned int max_items);
+#ifdef CONFIG_PREEMPT
+static inline int radix_tree_preload(gfp_t gm) { return 0; }
+static inline int radix_tree_maybe_preload(gfp_t gfp_mask) { return 0; }
+static inline int radix_tree_maybe_preload_order(gfp_t gfp_mask, int order)
+{
+	return 0;
+}
+
+static inline int radix_tree_split_preload(unsigned old_order, unsigned new_order, gfp_t gfp_mask)
+{
+	return 0;
+}
+#else
 int radix_tree_preload(gfp_t gfp_mask);
 int radix_tree_maybe_preload(gfp_t gfp_mask);
 int radix_tree_maybe_preload_order(gfp_t gfp_mask, int order);
+int radix_tree_split_preload(unsigned old_order, unsigned new_order, gfp_t gfp_mask);
+#endif
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
 			unsigned long index, unsigned int tag);
@@ -342,10 +357,11 @@ int radix_tree_tagged(struct radix_tree_
 
 static inline void radix_tree_preload_end(void)
 {
+#ifndef CONFIG_PREEMPT
 	preempt_enable();
+#endif
 }
 
-int radix_tree_split_preload(unsigned old_order, unsigned new_order, gfp_t);
 int radix_tree_split(struct radix_tree_root *, unsigned long index,
 			unsigned new_order);
 int radix_tree_join(struct radix_tree_root *, unsigned long index,
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -318,13 +318,14 @@ radix_tree_node_alloc(struct radix_tree_
 		 * succeed in getting a node here (and never reach
 		 * kmem_cache_alloc)
 		 */
-		rtp = this_cpu_ptr(&radix_tree_preloads);
+		rtp = &get_cpu_var(radix_tree_preloads);
 		if (rtp->nr) {
 			ret = rtp->nodes;
 			rtp->nodes = ret->private_data;
 			ret->private_data = NULL;
 			rtp->nr--;
 		}
+		put_cpu_var(radix_tree_preloads);
 		/*
 		 * Update the allocation stack trace as this is more useful
 		 * for debugging.
@@ -368,6 +369,7 @@ radix_tree_node_free(struct radix_tree_n
 	call_rcu(&node->rcu_head, radix_tree_node_rcu_free);
 }
 
+#ifndef CONFIG_PREEMPT
 /*
  * Load up this CPU's radix_tree_node buffer with sufficient objects to
  * ensure that the addition of a single element in the tree cannot fail.  On
@@ -509,6 +511,7 @@ int radix_tree_maybe_preload_order(gfp_t
 
 	return __radix_tree_preload(gfp_mask, nr_nodes);
 }
+#endif
 
 static unsigned radix_tree_load_root(struct radix_tree_root *root,
 		struct radix_tree_node **nodep, unsigned long *maxindex)
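
For readers who want the failure mode in miniature: the following is a toy
userspace model, not kernel code, of why stubbing out the preload exposes
the accounted allocation path. Every name in it (memcg, preload_pool,
alloc_accounted, preload, node_alloc, rt_stub, POOL_MAX) is hypothetical,
and the ordering (accounted attempt first, pool as fallback) is only a
loose reading of the thread and the patch above.

```c
/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MAX 4

struct node { struct node *next; };

/* Stand-in for a memory cgroup: a charge counter and a hard limit. */
struct memcg { size_t charged, limit; };

/* Stand-in for the per-CPU preload buffer. */
struct preload_pool { struct node *nodes; unsigned nr; };

/* Accounted allocation: fails once the group has reached its limit. */
static struct node *alloc_accounted(struct memcg *cg)
{
	struct node *n;

	if (cg->charged >= cg->limit)
		return NULL;
	n = calloc(1, sizeof(*n));
	if (n)
		cg->charged++;
	return n;
}

/*
 * Fill the pool with unaccounted nodes. With rt_stub set this mimics
 * the stubs in the patch: report success without preloading anything.
 */
static int preload(struct preload_pool *pool, bool rt_stub)
{
	if (rt_stub)
		return 0;
	while (pool->nr < POOL_MAX) {
		struct node *n = calloc(1, sizeof(*n));

		if (!n)
			return -1;
		n->next = pool->nodes;
		pool->nodes = n;
		pool->nr++;
	}
	return 0;
}

/* Try the accounted path first, then fall back to the preload pool. */
static struct node *node_alloc(struct preload_pool *pool, struct memcg *cg)
{
	struct node *n = alloc_accounted(cg);

	if (n)
		return n;
	if (pool->nr) {
		n = pool->nodes;
		pool->nodes = n->next;
		n->next = NULL;
		pool->nr--;
	}
	return n;
}
```

With a group already at its limit, preload(&pool, false) followed by
node_alloc() still hands back a node from the pool, while the rt_stub
variant leaves the pool empty and node_alloc() returns NULL. In this
model that NULL stands in for the failed allocation that, per the trace
above, ends up in pagefault_out_of_memory().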
