linux-kernel - Re: OOPSes in mem_cgroup

Date:   Tue, 12 Jun 2018 21:33:12 -0700
From:   Roman Gushchin <guro@...com>
To:     John Stultz <john.stultz@...aro.org>
CC:     Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...e.com>,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: OOPSes in mem_cgroup_protected

On Tue, Jun 12, 2018 at 09:08:27PM -0700, John Stultz wrote:
> On Tue, Jun 12, 2018 at 6:02 PM, John Stultz <john.stultz@...aro.org> wrote:
> > Hey Tejun,
> >   With the current linus/master, I'm able to fairly regularly trip
> > OOPSes (two examples below) in mem_cgroup_protected(), which seems to
> > be new.  I haven't managed to trigger this sort of thing with v4.17.
> >
> > I've not had much time to dig in or bisect it - I only know that
> > enabling most of the memory debugging config options didn't seem to
> > trip anything prior to the issue. So I wanted to send you a heads up
> > to see if this was already known, or if there was anything you might
> > suggest to help chase this down.
> 
> 
> So the line where we're crashing seems to be in mem_cgroup_protected():
>   parent_emin = READ_ONCE(parent->memory.emin);
> 
> where I'm guessing the parent->memory value is null, and emin is at
> the 0x120 offset in the structure.
> 
> Reverting the following commits seems to avoid the issue.
> bf8d5d52ffe8 ("memcg: introduce memory.min")
> 5f93ad67436b ("mm: treat memory.low value inclusive")
> 230671533d64 ("mm: memory.low hierarchical behavior")
> 
> I'm guessing I'm tripping over some path where the memory value never
> gets initialized?
> 
> Any ideas or suggestions?

Hi, John!

The patch below should fix the problem.
It's in the mm tree right now, and hopefully will be merged upstream asap.
Sorry for the inconvenience.

Thanks!

--

From 276e916d62887b85c35a9d053543bb52b00a81bf Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro@...com>
Date: Wed, 13 Jun 2018 01:01:43 +0000
Subject: [PATCH] mm: fix null pointer dereference in mem_cgroup_protected

Shakeel reported a crash in mem_cgroup_protected(), which can be triggered
by memcg reclaim if the legacy cgroup v1 use_hierarchy=0 mode is used:

[  226.060572] BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
[  226.068310] PGD 8000001ff55da067 P4D 8000001ff55da067 PUD 1fdc7df067 PMD 0
[  226.075191] Oops: 0000 [#4] SMP PTI
[  226.078637] CPU: 0 PID: 15581 Comm: bash Tainted: G      D 4.17.0-smp-clean #5
[  226.086635] Hardware name: ...
[  226.094546] RIP: 0010:mem_cgroup_protected+0x54/0x130
[  226.099533] Code: 4c 8b 8e 00 01 00 00 4c 8b 86 08 01 00 00 48 8d 8a 08 ff ff ff 48 85 d2 ba 00 00 00 00 48 0f 44 ca 48 39 c8 0f 84 cf 00 00 00 <48> 8b 81 20 01 00 00 4d 89 ca 4c 39 c8 4c 0f 46 d0 4d 85 d2 74 05
[  226.118194] RSP: 0000:ffffabe64dfafa58 EFLAGS: 00010286
[  226.123358] RAX: ffff9fb6ff03d000 RBX: ffff9fb6f5b1b000 RCX: 0000000000000000
[  226.130406] RDX: 0000000000000000 RSI: ffff9fb6f5b1b000 RDI: ffff9fb6f5b1b000
[  226.137454] RBP: ffffabe64dfafb08 R08: 0000000000000000 R09: 0000000000000000
[  226.144503] R10: 0000000000000000 R11: 000000000000c800 R12: ffffabe64dfafb88
[  226.151551] R13: ffff9fb6f5b1b000 R14: ffffabe64dfafb88 R15: ffff9fb77fffe000
[  226.158602] FS:  00007fed1f8ac700(0000) GS:ffff9fb6ff400000(0000) knlGS:0000000000000000
[  226.166594] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  226.172270] CR2: 0000000000000120 CR3: 0000001fdcf86003 CR4: 00000000001606f0
[  226.179317] Call Trace:
[  226.181732]  ? shrink_node+0x194/0x510
[  226.185435]  do_try_to_free_pages+0xfd/0x390
[  226.189653]  try_to_free_mem_cgroup_pages+0x123/0x210
[  226.194643]  try_charge+0x19e/0x700
[  226.198088]  mem_cgroup_try_charge+0x10b/0x1a0
[  226.202478]  wp_page_copy+0x134/0x5b0
[  226.206094]  do_wp_page+0x90/0x460
[  226.209453]  __handle_mm_fault+0x8e3/0xf30
[  226.213498]  handle_mm_fault+0xfe/0x220
[  226.217285]  __do_page_fault+0x262/0x500
[  226.221158]  do_page_fault+0x28/0xd0
[  226.224689]  ? page_fault+0x8/0x30
[  226.228048]  page_fault+0x1e/0x30
[  226.231323] RIP: 0033:0x485b72

The problem happens because parent_mem_cgroup() returns a NULL pointer,
which is dereferenced later without a check.

As cgroup v1 has no memory guarantee support, let's make
mem_cgroup_protected() immediately return MEMCG_PROT_NONE, if the given
cgroup has no parent (non-hierarchical mode is used).

Link: http://lkml.kernel.org/r/20180611175418.7007-2-guro@fb.com
Fixes: bf8d5d52ffe8 ("memcg: introduce memory.min")
Signed-off-by: Roman Gushchin <guro@...com>
Reported-by: Shakeel Butt <shakeelb@...gle.com>
Tested-by: Shakeel Butt <shakeelb@...gle.com>
Acked-by: Johannes Weiner <hannes@...xchg.org>
Acked-by: Michal Hocko <mhocko@...nel.org>
Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
---
 mm/memcontrol.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c1e64d60ed02..5a3873e9d657 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5480,6 +5480,10 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 	elow = memcg->memory.low;
 
 	parent = parent_mem_cgroup(memcg);
+	/* No parent means a non-hierarchical mode on v1 memcg */
+	if (!parent)
+		return MEMCG_PROT_NONE;
+
 	if (parent == root)
 		goto exit;
 
-- 
2.14.4