Message-ID: <9d10df01-0127-fb40-81c3-cc53c9733c3e@virtuozzo.com>
Date: Mon, 18 Oct 2021 11:13:52 +0300
From: Vasily Averin <vvs@...tuozzo.com>
To: Michal Hocko <mhocko@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: Roman Gushchin <guro@...com>, Uladzislau Rezki <urezki@...il.com>,
Vlastimil Babka <vbabka@...e.cz>,
Shakeel Butt <shakeelb@...gle.com>,
Mel Gorman <mgorman@...hsingularity.net>,
cgroups@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, kernel@...nvz.org
Subject: [PATCH memcg 0/1] false global OOM triggered by memcg-limited task
While checking the patches that fixed broken memcg accounting in vmalloc, I found
another issue: a false global OOM triggered by a memcg-limited user-space task.
I ran vmalloc-eater in a loop inside a memcg-limited LXC container and checked
that it does not consume host memory beyond the assigned limit, triggers memcg OOM
and generates "Memory cgroup out of memory" messages. Everything was as expected.
However, I was surprised to also find occasional global OOM messages.
I set sysctl vm.panic_on_oom to 1, repeated the test and successfully
crashed the node.
dmesg showed that a global OOM was detected on a 16 GB node with ~10 GB of free memory.
syz-executor invoked oom-killer: gfp_mask=0x0(), order=0, oom_score_adj=1000
CPU: 2 PID: 15307 Comm: syz-executor Kdump: loaded Not tainted 5.15.0-rc4+ #55
Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.4 04/01/2014
Call Trace:
dump_stack_lvl+0x57/0x72
dump_header+0x4a/0x2c1
out_of_memory.cold+0xa/0x7e
pagefault_out_of_memory+0x46/0x60
exc_page_fault+0x79/0x2b0
asm_exc_page_fault+0x1e/0x30
...
Mem-Info:
Node 0 DMA: 0*4kB 0*8kB <...> = 13296kB
Node 0 DMA32: 705*4kB (UM) <...> = 2586964kB
Node 0 Normal: 2743*4kB (UME) <...> = 6904828kB
...
4095866 pages RAM
...
Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
The full dmesg can be found in the attached file.
How could this happen?
A user-space task inside the memcg-limited container generated a page fault.
Its handler, do_user_addr_fault(), called handle_mm_fault(), which could not
allocate the page because the memcg limit was exceeded and returned VM_FAULT_OOM.
do_user_addr_fault() then called pagefault_out_of_memory(), which executed
out_of_memory() without a memcg set, i.e. as a global OOM.
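The call chain described above can be modelled in a few lines of user-space C.
This is only a toy sketch, not kernel code: every toy_* name is hypothetical,
and the real functions take different arguments and do much more. It assumes
that an OOM invocation with no memcg is treated as system-wide, and shows how
the reason for the failure is lost between the fault handler and the OOM call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup; not the kernel's struct mem_cgroup. */
struct toy_memcg {
	long usage_pages;
	long limit_pages;
};

enum toy_fault_result { TOY_FAULT_OK, TOY_FAULT_OOM };
enum toy_oom_scope { TOY_OOM_NONE, TOY_OOM_MEMCG, TOY_OOM_GLOBAL };

/* Models handle_mm_fault(): charging one page fails at the memcg limit. */
static enum toy_fault_result toy_handle_mm_fault(struct toy_memcg *cg)
{
	if (cg != NULL && cg->usage_pages + 1 > cg->limit_pages)
		return TOY_FAULT_OOM;	/* the reason (memcg limit) is lost here */
	if (cg != NULL)
		cg->usage_pages++;
	return TOY_FAULT_OK;
}

/* Models out_of_memory(): assumed to treat "no memcg" as a global OOM. */
static enum toy_oom_scope toy_out_of_memory(const struct toy_memcg *cg)
{
	return cg != NULL ? TOY_OOM_MEMCG : TOY_OOM_GLOBAL;
}

/* Models pagefault_out_of_memory() as described above: no memcg is passed. */
static enum toy_oom_scope toy_pagefault_out_of_memory(void)
{
	return toy_out_of_memory(NULL);
}

/* Models do_user_addr_fault(): it only sees "OOM", not which limit was hit. */
enum toy_oom_scope toy_do_user_addr_fault(struct toy_memcg *cg)
{
	if (toy_handle_mm_fault(cg) == TOY_FAULT_OOM)
		return toy_pagefault_out_of_memory();
	return TOY_OOM_NONE;
}
```

In this model, a fault taken by a task whose toy_memcg is already at its limit
yields TOY_OOM_GLOBAL even though nothing was checked about system-wide memory.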
This problem partially depends on one of my recent patches, which disabled
unlimited memory allocation for dying tasks. However, I think the problem can
happen for non-killed tasks too, for example because of the kmem limit.
At present do_user_addr_fault() does not know why the page allocation failed,
i.e. whether it was a global or a memcg OOM. I propose to save this information
in a new flag in task_struct. It can be set in case of memcg restrictions in
obj_cgroup_charge_pages() (for memory controller) and in try_charge_memcg()
(for kmem controller). Then it can be used in mem_cgroup_oom_synchronize(),
called inside pagefault_out_of_memory():
in case of memcg-related restrictions it will not trigger a fake global OOM and
will return to user space, which will retry the fault or kill the process if it
got a fatal signal.
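The idea can be illustrated with a small user-space model. It is a sketch under
assumptions, not the patch itself: the flag name memcg_charge_failed and all
sketch_* functions are hypothetical, the charge path is reduced to a single
limit check, and the synchronize step is assumed to clear the flag once it has
been consumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy task; memcg_charge_failed is a hypothetical name for the proposed flag. */
struct sketch_task {
	bool memcg_charge_failed;
};

struct sketch_memcg {
	long usage_pages;
	long limit_pages;
};

enum sketch_outcome { SKETCH_FAULT_DONE, SKETCH_RETRY_FAULT, SKETCH_GLOBAL_OOM };

/* Models the charge path: remember on the task that a memcg limit was hit. */
static bool sketch_try_charge(struct sketch_task *t, struct sketch_memcg *cg,
			      long nr_pages)
{
	if (cg != NULL && cg->usage_pages + nr_pages > cg->limit_pages) {
		t->memcg_charge_failed = true;
		return false;
	}
	if (cg != NULL)
		cg->usage_pages += nr_pages;
	return true;
}

/* Models the proposed check: consume the flag and report whether it was set. */
static bool sketch_oom_synchronize(struct sketch_task *t)
{
	bool was_memcg = t->memcg_charge_failed;

	t->memcg_charge_failed = false;
	return was_memcg;
}

/* Models pagefault_out_of_memory(): skip the global OOM for memcg failures. */
static enum sketch_outcome sketch_pagefault_out_of_memory(struct sketch_task *t)
{
	if (sketch_oom_synchronize(t))
		return SKETCH_RETRY_FAULT;	/* back to user space, fault is retried */
	return SKETCH_GLOBAL_OOM;
}

/* Models one page fault; system_has_free_pages stands in for global state. */
enum sketch_outcome sketch_page_fault(struct sketch_task *t,
				      struct sketch_memcg *cg,
				      bool system_has_free_pages)
{
	if (!system_has_free_pages)
		return sketch_pagefault_out_of_memory(t);
	if (!sketch_try_charge(t, cg, 1))
		return sketch_pagefault_out_of_memory(t);
	return SKETCH_FAULT_DONE;
}
```

With a memcg at its limit and free pages on the host, sketch_page_fault()
returns SKETCH_RETRY_FAULT rather than SKETCH_GLOBAL_OOM; a real shortage of
host memory still reaches the global path because the flag is never set there.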
Thank you,
Vasily Averin
Vasily Averin (1):
memcg: prevent false global OOM triggered by memcg limited task.
include/linux/sched.h | 1 +
mm/memcontrol.c | 12 +++++++++---
2 files changed, 10 insertions(+), 3 deletions(-)
--
2.32.0
View attachment "dmesg-oom.txt" of type "text/plain" (25650 bytes)