lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+G9fYvC_+Gmab=b5pJW+5m1cR=JnLKLuMfxh3p-j8XKBzseSA@mail.gmail.com>
Date:   Thu, 2 Jul 2020 12:22:03 +0530
From:   Naresh Kamboju <naresh.kamboju@...aro.org>
To:     Roman Gushchin <guro@...com>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-mm <linux-mm@...ck.org>,
        open list <linux-kernel@...r.kernel.org>,
        lkft-triage@...ts.linaro.org, Chris Down <chris@...isdown.name>
Subject: Re: BUG: Bad page state in process - page dumped because: page still
 charged to cgroup

On Thu, 2 Jul 2020 at 00:16, Roman Gushchin <guro@...com> wrote:
>
> On Wed, Jul 01, 2020 at 10:29:04AM +0200, Michal Hocko wrote:
> > Smells like a different observable problem with the same/similar culprit
> > as http://lkml.kernel.org/r/CA+G9fYtrgF_EZHi0vi+HyWiXT5LGggDhVXtNspc=OzzFhL=xRQ@mail.gmail.com
> >
> > On Wed 01-07-20 13:48:57, Naresh Kamboju wrote:
> > > While running LTP mm test suite on x86_64 device the BUG: Bad page
> > > state in process
> > > noticed on linux-next 20200630 tag.
> > >
> > > Steps to reproduce:
> > > - boot linux-next 20200630 kernel on x86_64 device
> > > - cd /opt/ltp
> > > - ./runltp -f mm
> > >
> > > metadata:
> > >   git branch: master
> > >   git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> > >   git commit: f2b92b14533e646e434523abdbafddb727c23898
> > >   git describe: next-20200630
> > >   kernel-config:
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__builds.tuxbuild.com_j60yrp7CUpq3LCmqMB8Wdg_kernel.config&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=h_KJ0e7abuh0BK2eDlDmWnAxqHPccpqchPgBS-oJcVE&s=qofg2XRToTeHvi8vSdOvDPtKpJsUqf3IWfqwieZqITg&e=
> > >
> > > Test crash dump:
> > > [  803.905169] Node 0 Normal: 2608*4kB (UMEH) 1380*8kB (UMEH) 64*16kB
> > > (MEH) 28*32kB (MEH) 13*64kB (UMEH) 164*128kB (UMEH) 39*256kB (UE)
> > > 1*512kB (M) 1*1024kB (M) 1*2048kB (M) 1*4096kB (M) = 62880kB
> > > [  803.922375] Node 0 hugepages_total=0 hugepages_free=0
> > > hugepages_surp=0 hugepages_size=2048kB
> > > [  803.930806] 2418 total pagecache pages
> > > [  803.934559] 0 pages in swap cache
> > > [  803.937878] Swap cache stats: add 0, delete 0, find 0/0
> > > [  803.943108] Free swap  = 0kB
> > > [  803.945997] Total swap = 0kB
> > > [  803.948885] 4181245 pages RAM
> > > [  803.951857] 0 pages HighMem/MovableOnly
> > > [  803.955695] 626062 pages reserved
> > > [  803.959016] Tasks state (memory values in pages):
> > > [  803.963722] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes
> > > swapents oom_score_adj name
> > > [  803.972336] [    332]     0   332     8529      507   106496
> > > 0             0 systemd-journal
> > > [  803.981387] [    349]     0   349    10730      508   118784
> > > 0         -1000 systemd-udevd
> > > [  803.990262] [    371]   993   371     8666      108   118784
> > > 0             0 systemd-network
> > > [  803.999306] [    379]   992   379     9529       99   110592
> > > 0             0 systemd-resolve
> > > [  804.008347] [    388]     0   388     2112       19    61440
> > > 0             0 syslogd
> > > [  804.016709] [    389]   995   389     9308      108   122880
> > > 0             0 avahi-daemon
> > > [  804.025517] [    391]     0   391     1075       21    57344
> > > 0             0 acpid
> > > [  804.033695] [    394]   995   394     9277       68   114688
> > > 0             0 avahi-daemon
> > > [  804.042476] [    396]   996   396     7241      154   102400
> > > 0          -900 dbus-daemon
> > > [  804.051170] [    397]     0   397     2313       72    65536
> > > 0             0 crond
> > > [  804.059349] [    399]     0   399    34025      161   167936
> > > 0             0 thermald
> > > [  804.067783] [    400]     0   400     8615      115   110592
> > > 0             0 systemd-logind
> > > [  804.076734] [    401]     0   401     2112       32    57344
> > > 0             0 klogd
> > > [  804.084907] [    449] 65534   449     3245       39    69632
> > > 0             0 dnsmasq
> > > [  804.093254] [    450]     0   450     3187       33    73728
> > > 0             0 agetty
> > > [  804.101541] [    452]     0   452     3187       33    73728
> > > 0             0 agetty
> > > [  804.109826] [    453]     0   453    14707      107   159744
> > > 0             0 login
> > > [  804.118007] [    463]     0   463     9532      163   122880
> > > 0             0 systemd
> > > [  804.126362] [    464]     0   464    16132      424   172032
> > > 0             0 (sd-pam)
> > > [  804.134803] [    468]     0   468     4538      105    81920
> > > 0             0 sh
> > > [  804.142741] [    472]     0   472    11102       83   131072
> > > 0             0 su
> > > [  804.150680] [    473]     0   473     4538       99    81920
> > > 0             0 sh
> > > [  804.158637] [    519]     0   519     2396       57    61440
> > > 0             0 lava-test-runne
> > > [  804.167700] [   1220]     0  1220     2396       52    61440
> > > 0             0 lava-test-shell
> > > [  804.176738] [   1221]     0  1221     2396       55    61440
> > > 0             0 sh
> > > [  804.184680] [   1223]     0  1223     2462      135    61440
> > > 0             0 ltp.sh
> > > [  804.192946] [   1242]     0  1242     2462      134    61440
> > > 0             0 ltp.sh
> > > [  804.201207] [   1243]     0  1243     2462      134    61440
> > > 0             0 ltp.sh
> > > [  804.209475] [   1244]     0  1244     2462      134    61440
> > > 0             0 ltp.sh
> > > [  804.217742] [   1245]     0  1245     2561      229    65536
> > > 0             0 runltp
> > > [  804.226010] [   1246]     0  1246     1072       15    53248
> > > 0             0 tee
> > > [  804.234012] [   1313]     0  1313     1070       29    53248
> > > 0             0 ltp-pan
> > > [  804.242374] [   3216]     0  3216     1613       20    53248
> > > 0             0 oom01
> > > [  804.250554] [   3217]     0  3217     1646       31    57344
> > > 0             0 oom01
> > > [  804.258728] [   3245]     0  3245    81271      469   266240
> > > 0             0 NetworkManager
> > > [  804.267688] [   3249]     0  3249     6422       54    98304
> > > 0             0 systemd-hostnam
> > > [  804.276734] [   3250]     0  3250    52976      178   172032
> > > 0             0 nm-dispatcher
> > > [  804.285603] [   3254]   998  3254   131113      828   245760
> > > 0             0 polkitd
> > > [  804.293956] [   3261]     0  3261  4726385  3349389 26939392
> > > 0             0 oom01
> > > [  804.302129] [   3265]     0  3265     3187       33    73728
> > > 0             0 agetty
> > > [  804.310397] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=oom01,pid=3261,uid=0
> > > [  804.322751] Out of memory: Killed process 3261 (oom01)
> > > total-vm:18905540kB, anon-rss:13397556kB, file-rss:0kB, shmem-rss:0kB,
> > > UID:0 pgtables:26308kB oom_score_adj:0
> > > [  806.652952] oom_reaper: reaped process 3261 (oom01), now
> > > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > > [  807.579373] BUG: Bad page state in process kworker/u8:12  pfn:374308
> > > [  807.579521] BUG: Bad page state in process kworker/u8:13  pfn:4182a4
> > > [  807.585734] page:ffffea000dd0c200 refcount:0 mapcount:0
> > > mapping:0000000000000000 index:0xffff88837430e000
> > > head:ffffea000dd0c200 order:3 compound_mapcount:0 compound_pincount:0
> > > [  807.585736] flags: 0x200000000010000(head)
> > > [  807.585740] raw: 0200000000010000 ffffea000dce6e00 0000000200000002
> > > 0000000000000000
> > > [  807.592099] page:ffffea001060a900 refcount:0 mapcount:0
> > > mapping:0000000000000000 index:0xffff8884182a5e00
> > > head:ffffea001060a900 order:1 compound_mapcount:0
> > > [  807.607719] raw: ffff88837430e000 0000000000040000 00000000ffffffff
> > > ffff8883bda6cac1
> > > [  807.607720] page dumped because: page still charged to cgroup
> > > [  807.607720] page->mem_cgroup:ffff8883bda6cac1
> > > [  807.607721] Modules linked in: x86_pkg_temp_thermal
> > > [  807.607725] CPU: 0 PID: 3242 Comm: kworker/u8:12 Not tainted
> > > 5.8.0-rc3-next-20200630 #1
> > > [  807.607727] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> > > 2.0b 07/27/2017
> > > [  807.607731] Workqueue: rpciod rpc_async_schedule
> > > [  807.611836] flags: 0x200000000010000(head)
> > > [  807.619563] Call Trace:
> > > [  807.619567]  dump_stack+0x84/0xba
> > > [  807.619569]  bad_page.cold+0x7b/0xac
> > > [  807.619573]  __free_pages_ok+0x95b/0xab0
> > > [  807.633461] raw: 0200000000010000 dead000000000100 dead000000000122
> > > 0000000000000000
> > > [  807.641189]  __free_pages+0x42/0x50
> > > [  807.641191]  __free_slab+0xcd/0x1f0
>
> Hm, interesting, it means that page->obj_cgroups is still set.
> But before __free_pages() __free_slab() always calls uncharge_slab_page(),
> which sets page->obj_cgroups to NULL except when !memcg_kmem_enabled().
>
> So it makes me think that somehow memcg_kmem_enabled() became false
> after being true, which can cause refcounting problems as well.
>
> Naresh, can you, please, check if the following patch solves problems?
> And thank you for reporting the problem!
>

Thanks for providing a quick fix patch.

>
> From c97afecd32c0db5e024be9ba72f43d22974f5bcd Mon Sep 17 00:00:00 2001
> From: Roman Gushchin <guro@...com>
> Date: Wed, 1 Jul 2020 11:05:32 -0700
> Subject: [PATCH] mm: kmem: make memcg_kmem_enabled() irreversible
>
> Historically the kernel memory accounting was an opt-in feature, which
> could be enabled for individual cgroups. But now it's not true, and
> it's on by default both on cgroup v1 and cgroup v2.  And as long as a
> user has at least one non-root memory cgroup, the kernel memory
> accounting is on. So in most setups it's either always on (if memory
> cgroups are in use and kmem accounting is not disabled), either always
> off (otherwise).
>
> memcg_kmem_enabled() is used in many places to guard the kernel memory
> accounting code. If memcg_kmem_enabled() can reverse from returning
> true to returning false (as now), we can't rely on it on release paths
> and have to check if it was on before.
>
> If we'll make memcg_kmem_enabled() irreversible (always returning true
> after returning it for the first time), it'll make the general logic
> more simple and robust. It also will allow to guard some checks which
> otherwise would stay unguarded.
>
> Signed-off-by: Roman Gushchin <guro@...com>
Reported-by: Naresh Kamboju <naresh.kamboju@...aro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@...aro.org>

The below patch applied on top of linux-next 20200701and tested and
the reported bug got fixed on arm64, arm, x86_64 and i386.

> ---
>  mm/memcontrol.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 50ae77f3985e..2d018a51c941 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3582,7 +3582,8 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
>         objcg->memcg = memcg;
>         rcu_assign_pointer(memcg->objcg, objcg);
>
> -       static_branch_inc(&memcg_kmem_enabled_key);
> +       if (!memcg_kmem_enabled())
> +               static_branch_inc(&memcg_kmem_enabled_key);
>         /*
>          * A memory cgroup is considered kmem-online as soon as it gets
>          * kmemcg_id. Setting the id after enabling static branching will
> @@ -3643,9 +3644,6 @@ static void memcg_free_kmem(struct mem_cgroup *memcg)
>         /* css_alloc() failed, offlining didn't happen */
>         if (unlikely(memcg->kmem_state == KMEM_ONLINE))
>                 memcg_offline_kmem(memcg);
> -
> -       if (memcg->kmem_state == KMEM_ALLOCATED)
> -               static_branch_dec(&memcg_kmem_enabled_key);
>  }
>  #else
>  static int memcg_online_kmem(struct mem_cgroup *memcg)

- Naresh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ