Message-ID: <20200701184552.GA61684@carbon.DHCP.thefacebook.com>
Date: Wed, 1 Jul 2020 11:45:52 -0700
From: Roman Gushchin <guro@...com>
To: Michal Hocko <mhocko@...nel.org>
CC: Naresh Kamboju <naresh.kamboju@...aro.org>,
Shakeel Butt <shakeelb@...gle.com>,
Johannes Weiner <hannes@...xchg.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-mm <linux-mm@...ck.org>,
open list <linux-kernel@...r.kernel.org>,
<lkft-triage@...ts.linaro.org>, Chris Down <chris@...isdown.name>
Subject: Re: BUG: Bad page state in process - page dumped because: page still
charged to cgroup
On Wed, Jul 01, 2020 at 10:29:04AM +0200, Michal Hocko wrote:
> Smells like a different observable problem with the same/similar culprit
> as http://lkml.kernel.org/r/CA+G9fYtrgF_EZHi0vi+HyWiXT5LGggDhVXtNspc=OzzFhL=xRQ@mail.gmail.com
>
> On Wed 01-07-20 13:48:57, Naresh Kamboju wrote:
> > While running LTP mm test suite on x86_64 device the BUG: Bad page
> > state in process
> > noticed on linux-next 20200630 tag.
> >
> > Steps to reproduce:
> > - boot linux-next 20200630 kernel on x86_64 device
> > - cd /opt/ltp
> > - ./runltp -f mm
> >
> > metadata:
> > git branch: master
> > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> > git commit: f2b92b14533e646e434523abdbafddb727c23898
> > git describe: next-20200630
> > kernel-config:
> > https://builds.tuxbuild.com/j60yrp7CUpq3LCmqMB8Wdg/kernel.config
> >
> > Test crash dump:
> > [ 803.905169] Node 0 Normal: 2608*4kB (UMEH) 1380*8kB (UMEH) 64*16kB
> > (MEH) 28*32kB (MEH) 13*64kB (UMEH) 164*128kB (UMEH) 39*256kB (UE)
> > 1*512kB (M) 1*1024kB (M) 1*2048kB (M) 1*4096kB (M) = 62880kB
> > [ 803.922375] Node 0 hugepages_total=0 hugepages_free=0
> > hugepages_surp=0 hugepages_size=2048kB
> > [ 803.930806] 2418 total pagecache pages
> > [ 803.934559] 0 pages in swap cache
> > [ 803.937878] Swap cache stats: add 0, delete 0, find 0/0
> > [ 803.943108] Free swap = 0kB
> > [ 803.945997] Total swap = 0kB
> > [ 803.948885] 4181245 pages RAM
> > [ 803.951857] 0 pages HighMem/MovableOnly
> > [ 803.955695] 626062 pages reserved
> > [ 803.959016] Tasks state (memory values in pages):
> > [ 803.963722] [ pid ] uid tgid total_vm rss pgtables_bytes
> > swapents oom_score_adj name
> > [ 803.972336] [ 332] 0 332 8529 507 106496
> > 0 0 systemd-journal
> > [ 803.981387] [ 349] 0 349 10730 508 118784
> > 0 -1000 systemd-udevd
> > [ 803.990262] [ 371] 993 371 8666 108 118784
> > 0 0 systemd-network
> > [ 803.999306] [ 379] 992 379 9529 99 110592
> > 0 0 systemd-resolve
> > [ 804.008347] [ 388] 0 388 2112 19 61440
> > 0 0 syslogd
> > [ 804.016709] [ 389] 995 389 9308 108 122880
> > 0 0 avahi-daemon
> > [ 804.025517] [ 391] 0 391 1075 21 57344
> > 0 0 acpid
> > [ 804.033695] [ 394] 995 394 9277 68 114688
> > 0 0 avahi-daemon
> > [ 804.042476] [ 396] 996 396 7241 154 102400
> > 0 -900 dbus-daemon
> > [ 804.051170] [ 397] 0 397 2313 72 65536
> > 0 0 crond
> > [ 804.059349] [ 399] 0 399 34025 161 167936
> > 0 0 thermald
> > [ 804.067783] [ 400] 0 400 8615 115 110592
> > 0 0 systemd-logind
> > [ 804.076734] [ 401] 0 401 2112 32 57344
> > 0 0 klogd
> > [ 804.084907] [ 449] 65534 449 3245 39 69632
> > 0 0 dnsmasq
> > [ 804.093254] [ 450] 0 450 3187 33 73728
> > 0 0 agetty
> > [ 804.101541] [ 452] 0 452 3187 33 73728
> > 0 0 agetty
> > [ 804.109826] [ 453] 0 453 14707 107 159744
> > 0 0 login
> > [ 804.118007] [ 463] 0 463 9532 163 122880
> > 0 0 systemd
> > [ 804.126362] [ 464] 0 464 16132 424 172032
> > 0 0 (sd-pam)
> > [ 804.134803] [ 468] 0 468 4538 105 81920
> > 0 0 sh
> > [ 804.142741] [ 472] 0 472 11102 83 131072
> > 0 0 su
> > [ 804.150680] [ 473] 0 473 4538 99 81920
> > 0 0 sh
> > [ 804.158637] [ 519] 0 519 2396 57 61440
> > 0 0 lava-test-runne
> > [ 804.167700] [ 1220] 0 1220 2396 52 61440
> > 0 0 lava-test-shell
> > [ 804.176738] [ 1221] 0 1221 2396 55 61440
> > 0 0 sh
> > [ 804.184680] [ 1223] 0 1223 2462 135 61440
> > 0 0 ltp.sh
> > [ 804.192946] [ 1242] 0 1242 2462 134 61440
> > 0 0 ltp.sh
> > [ 804.201207] [ 1243] 0 1243 2462 134 61440
> > 0 0 ltp.sh
> > [ 804.209475] [ 1244] 0 1244 2462 134 61440
> > 0 0 ltp.sh
> > [ 804.217742] [ 1245] 0 1245 2561 229 65536
> > 0 0 runltp
> > [ 804.226010] [ 1246] 0 1246 1072 15 53248
> > 0 0 tee
> > [ 804.234012] [ 1313] 0 1313 1070 29 53248
> > 0 0 ltp-pan
> > [ 804.242374] [ 3216] 0 3216 1613 20 53248
> > 0 0 oom01
> > [ 804.250554] [ 3217] 0 3217 1646 31 57344
> > 0 0 oom01
> > [ 804.258728] [ 3245] 0 3245 81271 469 266240
> > 0 0 NetworkManager
> > [ 804.267688] [ 3249] 0 3249 6422 54 98304
> > 0 0 systemd-hostnam
> > [ 804.276734] [ 3250] 0 3250 52976 178 172032
> > 0 0 nm-dispatcher
> > [ 804.285603] [ 3254] 998 3254 131113 828 245760
> > 0 0 polkitd
> > [ 804.293956] [ 3261] 0 3261 4726385 3349389 26939392
> > 0 0 oom01
> > [ 804.302129] [ 3265] 0 3265 3187 33 73728
> > 0 0 agetty
> > [ 804.310397] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=oom01,pid=3261,uid=0
> > [ 804.322751] Out of memory: Killed process 3261 (oom01)
> > total-vm:18905540kB, anon-rss:13397556kB, file-rss:0kB, shmem-rss:0kB,
> > UID:0 pgtables:26308kB oom_score_adj:0
> > [ 806.652952] oom_reaper: reaped process 3261 (oom01), now
> > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 807.579373] BUG: Bad page state in process kworker/u8:12 pfn:374308
> > [ 807.579521] BUG: Bad page state in process kworker/u8:13 pfn:4182a4
> > [ 807.585734] page:ffffea000dd0c200 refcount:0 mapcount:0
> > mapping:0000000000000000 index:0xffff88837430e000
> > head:ffffea000dd0c200 order:3 compound_mapcount:0 compound_pincount:0
> > [ 807.585736] flags: 0x200000000010000(head)
> > [ 807.585740] raw: 0200000000010000 ffffea000dce6e00 0000000200000002
> > 0000000000000000
> > [ 807.592099] page:ffffea001060a900 refcount:0 mapcount:0
> > mapping:0000000000000000 index:0xffff8884182a5e00
> > head:ffffea001060a900 order:1 compound_mapcount:0
> > [ 807.607719] raw: ffff88837430e000 0000000000040000 00000000ffffffff
> > ffff8883bda6cac1
> > [ 807.607720] page dumped because: page still charged to cgroup
> > [ 807.607720] page->mem_cgroup:ffff8883bda6cac1
> > [ 807.607721] Modules linked in: x86_pkg_temp_thermal
> > [ 807.607725] CPU: 0 PID: 3242 Comm: kworker/u8:12 Not tainted
> > 5.8.0-rc3-next-20200630 #1
> > [ 807.607727] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> > 2.0b 07/27/2017
> > [ 807.607731] Workqueue: rpciod rpc_async_schedule
> > [ 807.611836] flags: 0x200000000010000(head)
> > [ 807.619563] Call Trace:
> > [ 807.619567] dump_stack+0x84/0xba
> > [ 807.619569] bad_page.cold+0x7b/0xac
> > [ 807.619573] __free_pages_ok+0x95b/0xab0
> > [ 807.633461] raw: 0200000000010000 dead000000000100 dead000000000122
> > 0000000000000000
> > [ 807.641189] __free_pages+0x42/0x50
> > [ 807.641191] __free_slab+0xcd/0x1f0
Hm, interesting: this means that page->obj_cgroups is still set.
But __free_slab() always calls uncharge_slab_page() before
__free_pages(), and that sets page->obj_cgroups to NULL unless
memcg_kmem_enabled() is false. So it makes me think that
memcg_kmem_enabled() somehow became false after having been true,
which can cause refcounting problems as well.

Naresh, could you please check whether the following patch solves the
problem? And thank you for reporting it!
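
To make the suspected sequence concrete, here is a minimal userspace
sketch in C. It is not kernel code: the static key is modelled as a
plain counter, and fake_page, charge_page(), uncharge_page(),
cgroup_online(), cgroup_free() and page_leaks_charge() are hypothetical
names invented for illustration. It only assumes the failure mode
guessed at above, namely that the release path is guarded by the same
"enabled" check as the charge path and that the check can flip back to
false in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the static key: a plain counter, not a jump label. */
static int kmem_key_count;

static bool kmem_enabled(void)
{
	return kmem_key_count > 0;
}

/* Hypothetical miniature of a slab page carrying accounting metadata. */
struct fake_page {
	void *obj_cgroups;
};

static int dummy_objcg_vec;

/* Charge path: attach metadata only while accounting is enabled. */
static void charge_page(struct fake_page *page)
{
	if (kmem_enabled())
		page->obj_cgroups = &dummy_objcg_vec;
}

/* Release path: guarded by the same check as the charge path. */
static void uncharge_page(struct fake_page *page)
{
	if (!kmem_enabled())
		return;
	page->obj_cgroups = NULL;
}

/* Counting scheme: always increment. Latched scheme: only the first time. */
static void cgroup_online(bool irreversible)
{
	if (!irreversible || !kmem_enabled())
		kmem_key_count++;
}

/* Counting scheme: decrement on free. Latched scheme: never decrement. */
static void cgroup_free(bool irreversible)
{
	if (!irreversible)
		kmem_key_count--;
}

/* Returns true if the page is still "charged" when it would be freed. */
static bool page_leaks_charge(bool irreversible)
{
	struct fake_page page = { .obj_cgroups = NULL };

	kmem_key_count = 0;
	cgroup_online(irreversible);
	charge_page(&page);
	assert(page.obj_cgroups != NULL);

	/* The last non-root cgroup goes away before the page is released. */
	cgroup_free(irreversible);
	uncharge_page(&page);

	return page.obj_cgroups != NULL;
}
```

Under these assumptions, page_leaks_charge(false) models the counting
scheme and reports a page that is still charged at free time, while
page_leaks_charge(true) models a one-way latch and does not.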
--
From c97afecd32c0db5e024be9ba72f43d22974f5bcd Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro@...com>
Date: Wed, 1 Jul 2020 11:05:32 -0700
Subject: [PATCH] mm: kmem: make memcg_kmem_enabled() irreversible
Historically, kernel memory accounting was an opt-in feature that
could be enabled for individual cgroups. That is no longer true: it is
on by default on both cgroup v1 and cgroup v2, and as long as a user
has at least one non-root memory cgroup, kernel memory accounting is
on. So in most setups it is either always on (if memory cgroups are in
use and kmem accounting is not disabled) or always off (otherwise).

memcg_kmem_enabled() is used in many places to guard the kernel memory
accounting code. If memcg_kmem_enabled() can flip from returning true
to returning false (as it can now), we can't rely on it on release
paths and have to check whether it was on before.

If we make memcg_kmem_enabled() irreversible (always returning true
once it has returned true for the first time), the general logic
becomes simpler and more robust. It also allows guarding some checks
which would otherwise stay unguarded.

Signed-off-by: Roman Gushchin <guro@...com>
---
mm/memcontrol.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 50ae77f3985e..2d018a51c941 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3582,7 +3582,8 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
 	objcg->memcg = memcg;
 	rcu_assign_pointer(memcg->objcg, objcg);
 
-	static_branch_inc(&memcg_kmem_enabled_key);
+	if (!memcg_kmem_enabled())
+		static_branch_inc(&memcg_kmem_enabled_key);
 	/*
 	 * A memory cgroup is considered kmem-online as soon as it gets
 	 * kmemcg_id. Setting the id after enabling static branching will
@@ -3643,9 +3644,6 @@ static void memcg_free_kmem(struct mem_cgroup *memcg)
 	/* css_alloc() failed, offlining didn't happen */
 	if (unlikely(memcg->kmem_state == KMEM_ONLINE))
 		memcg_offline_kmem(memcg);
-
-	if (memcg->kmem_state == KMEM_ALLOCATED)
-		static_branch_dec(&memcg_kmem_enabled_key);
 }
 #else
 static int memcg_online_kmem(struct mem_cgroup *memcg)
--
2.26.2