lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 29 Oct 2018 17:35:38 +0100
From:   Mike Galbraith <efault@....de>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>, linux-mm <linux-mm@...ck.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Roman Gushchin <guro@...com>
Subject: Re: memcg oops:
 memcg_kmem_charge_memcg()->try_charge()->page_counter_try_charge()->BOOM

On Mon, 2018-10-29 at 14:20 +0100, Michal Hocko wrote:
> 
> > [    4.420976] Code: f3 c3 0f 1f 00 0f 1f 44 00 00 48 85 ff 0f 84 a8 00 00 00 41 56 48 89 f8 41 55 49 89 fe 41 54 49 89 d5 55 49 89 f4 53 48 89 f3 <f0> 48 0f c1 1f 48 01 f3 48 39 5f 18 48 89 fd 73 17 eb 41 48 89 e8
> > [    4.424162] RSP: 0018:ffffb27840c57cb0 EFLAGS: 00010202
> > [    4.425236] RAX: 00000000000000f8 RBX: 0000000000000020 RCX: 0000000000000200
> > [    4.426467] RDX: ffffb27840c57d08 RSI: 0000000000000020 RDI: 00000000000000f8
> > [    4.427652] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffb278410bc000
> > [    4.428883] R10: ffffb27840c57ed0 R11: 0000000000000040 R12: 0000000000000020
> > [    4.430168] R13: ffffb27840c57d08 R14: 00000000000000f8 R15: 00000000006000c0
> > [    4.431411] FS:  00007f79081a3940(0000) GS:ffff92a4b7bc0000(0000) knlGS:0000000000000000
> > [    4.432748] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    4.433836] CR2: 00000000000000f8 CR3: 00000002310ac002 CR4: 00000000001606e0
> > [    4.435500] Call Trace:
> > [    4.436319]  try_charge+0x92/0x7b0
> > [    4.437284]  ? unlazy_walk+0x4c/0xb0
> > [    4.438676]  ? terminate_walk+0x91/0x100
> > [    4.439984]  memcg_kmem_charge_memcg+0x28/0x80
> > [    4.441059]  memcg_kmem_charge+0x88/0x1d0
> > [    4.442105]  copy_process.part.37+0x23a/0x2070
> 
> Could you faddr2line this please?

homer:/usr/local/src/kernel/linux-master # ./scripts/faddr2line vmlinux copy_process.part.37+0x23a
copy_process.part.37+0x23a/0x2070:
memcg_charge_kernel_stack at kernel/fork.c:401
(inlined by) dup_task_struct at kernel/fork.c:850
(inlined by) copy_process at kernel/fork.c:1750

I bisected it this afternoon, and confirmed the result via revert.

9b6f7e163cd0f468d1b9696b785659d3c27c8667 is the first bad commit
commit 9b6f7e163cd0f468d1b9696b785659d3c27c8667
Author: Roman Gushchin <guro@...com>
Date:   Fri Oct 26 15:03:19 2018 -0700

    mm: rework memcg kernel stack accounting
    
    If CONFIG_VMAP_STACK is set, kernel stacks are allocated using
    __vmalloc_node_range() with __GFP_ACCOUNT.  So kernel stack pages are
    charged against corresponding memory cgroups on allocation and uncharged
    on releasing them.
    
    The problem is that we do cache kernel stacks in small per-cpu caches and
    do reuse them for new tasks, which can belong to different memory cgroups.
    
    Each stack page still holds a reference to the original cgroup, so the
    cgroup can't be released until the vmap area is released.
    
    To make this happen we need more than two subsequent exits without forks
    in between on the current cpu, which makes it very unlikely to happen.  As
    a result, I saw a significant number of dying cgroups (in theory, up to 2
    * number_of_cpu + number_of_tasks), which can't be released even by
    significant memory pressure.
    
    As a cgroup structure can take a significant amount of memory (first of
    all, per-cpu data like memcg statistics), it leads to a noticeable waste
    of memory.
    
    Link: http://lkml.kernel.org/r/20180827162621.30187-1-guro@fb.com
    Fixes: ac496bf48d97 ("fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y")
    Signed-off-by: Roman Gushchin <guro@...com>
    Reviewed-by: Shakeel Butt <shakeelb@...gle.com>
    Acked-by: Michal Hocko <mhocko@...nel.org>
    Cc: Johannes Weiner <hannes@...xchg.org>
    Cc: Andy Lutomirski <luto@...nel.org>
    Cc: Konstantin Khlebnikov <koct9i@...il.com>
    Cc: Tejun Heo <tj@...nel.org>
    Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>

:040000 040000 19a916f067fb987c6b15ce04f0e656c590db39dd edde98ce70d28e03f623f86f54887720516fcd91 M      include
:040000 040000 04213da714a8a10580baccd0b0977a6744fa2374 9204198e8eb4043b059f2a4eeaa4e19679fd3ddb M      kernel

git bisect start
# good: [e5f6d9afa3415104e402cd69288bb03f7165eeba] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
git bisect good e5f6d9afa3415104e402cd69288bb03f7165eeba
# bad: [345671ea0f9258f410eb057b9ced9cefbbe5dc78] Merge branch 'akpm' (patches from Andrew)
git bisect bad 345671ea0f9258f410eb057b9ced9cefbbe5dc78
# bad: [ae2b01f37044c10e975d22116755df56252b09d8] mm: remove vm_insert_pfn()
git bisect bad ae2b01f37044c10e975d22116755df56252b09d8
# good: [9703fc8caf36ac65dca1538b23dd137de0b53233] Merge tag 'usb-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect good 9703fc8caf36ac65dca1538b23dd137de0b53233
# good: [bf58e8820c48805394ec9e76339f0c4646050432] nvmem: change the signature of nvmem_unregister()
git bisect good bf58e8820c48805394ec9e76339f0c4646050432
# good: [cccb3b19e762edc8ef0481be506967555cb9e317] nvmem: fix nvmem_cell_get_from_lookup()
git bisect good cccb3b19e762edc8ef0481be506967555cb9e317
# good: [18d0eae30e6a4f8644d589243d7ac1d70d29203d] Merge tag 'char-misc-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect good 18d0eae30e6a4f8644d589243d7ac1d70d29203d
# bad: [9b6f7e163cd0f468d1b9696b785659d3c27c8667] mm: rework memcg kernel stack accounting
git bisect bad 9b6f7e163cd0f468d1b9696b785659d3c27c8667
# good: [2de24cb742d4f0c41358aa078bed7f089c827ac7] ocfs2: remove unused pointer 'eb'
git bisect good 2de24cb742d4f0c41358aa078bed7f089c827ac7
# good: [5780a02fd1e87641ad6a8dd6891a1e890cf45c5d] fs/iomap.c: change return type to vm_fault_t
git bisect good 5780a02fd1e87641ad6a8dd6891a1e890cf45c5d
# good: [0684e6526edfb4debf0a0a884834bb1a104085dc] mm/slub.c: switch to bitmap_zalloc()
git bisect good 0684e6526edfb4debf0a0a884834bb1a104085dc
# good: [c5fd3ca06b4699e251b4a1fb808c2d5124494101] slub: extend slub debug to handle multiple slabs
git bisect good c5fd3ca06b4699e251b4a1fb808c2d5124494101
# first bad commit: [9b6f7e163cd0f468d1b9696b785659d3c27c8667] mm: rework memcg kernel stack accounting

	-Mike

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ