linux-kernel - [PATCH memcg 0/1] false global OOM triggered by memcg-limited task

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <9d10df01-0127-fb40-81c3-cc53c9733c3e@virtuozzo.com>
Date:   Mon, 18 Oct 2021 11:13:52 +0300
From:   Vasily Averin <vvs@...tuozzo.com>
To:     Michal Hocko <mhocko@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     Roman Gushchin <guro@...com>, Uladzislau Rezki <urezki@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Shakeel Butt <shakeelb@...gle.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        cgroups@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, kernel@...nvz.org
Subject: [PATCH memcg 0/1] false global OOM triggered by memcg-limited task

While checking the patches fixed broken memcg accounting in vmalloc I found
another issue: a false global OOM triggered by memcg-limited user space task.

I executed vmalloc-eater inside a memcg limited LXC container in a loop, checked
that it does not consume host memory beyond the assigned limit, triggers memcg OOM
and generates "Memory cgroup out of memory" messages.  Everything was as expected.

However I was surprised to find quite rare global OOM messages too.
I set sysctl vm.panic_on_oom to 1, repeated the test and successfully
crashed the node.

Dmesg showed that global OOM was detected on 16 GB node with ~10 GB of free memory.

 syz-executor invoked oom-killer: gfp_mask=0x0(), order=0, oom_score_adj=1000
 CPU: 2 PID: 15307 Comm: syz-executor Kdump: loaded Not tainted 5.15.0-rc4+ #55
 Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.4 04/01/2014
 Call Trace:
  dump_stack_lvl+0x57/0x72
  dump_header+0x4a/0x2c1
  out_of_memory.cold+0xa/0x7e
  pagefault_out_of_memory+0x46/0x60
  exc_page_fault+0x79/0x2b0
  asm_exc_page_fault+0x1e/0x30
...
 Mem-Info:
 Node 0 DMA: 0*4kB 0*8kB <...> = 13296kB
 Node 0 DMA32: 705*4kB (UM) <...> = 2586964kB
 Node 0 Normal: 2743*4kB (UME) <...> = 6904828kB
...
 4095866 pages RAM
...
 Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled

Full dmesg can be found in attached file.

How could this happen?

User-space task inside the memcg-limited container generated a page fault,
its handler do_user_addr_fault() called handle_mm_fault which could not
allocate the page due to exceeding the memcg limit and returned VM_FAULT_OOM.
Then do_user_addr_fault() called pagefault_out_of_memory() which executed
out_of_memory() without set of memcg.

Partially this problem depends on one of my recent patches, disabled unlimited
memory allocation for dying tasks. However I think the problem can happen
on non-killed tasks too, for example because of kmem limit.

At present do_user_addr_fault() does not know why page allocation was failed,
i.e. was it global or memcg OOM. I propose to save this information in new flag
on task_struct. It can be set in case of memcg restrictons in
obj_cgroup_charge_pages() (for memory controller) and in try_charge_memcg() 
(for kmem controller).  Then it can be used in mem_cgroup_oom_synchronize()
called inside pagefault_out_of_memory():
in case of memcg-related restrictions it will not trigger fake global OOM and
returns to user space which will retry the fault or kill the process if it got
a fatal signal.

Thank you,
	Vasily Averin

Vasily Averin (1):
  memcg: prevent false global OOM trigggerd by memcg limited task.

 include/linux/sched.h |  1 +
 mm/memcontrol.c       | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

-- 
2.32.0

View attachment "dmesg-oom.txt" of type "text/plain" (25650 bytes)