linux-kernel - Re: [PATCH] fs: fix for core dumping of a process getting oom-killed

Message-ID: <YUm7LLqwrXygzKll@dhcp22.suse.cz>
Date:   Tue, 21 Sep 2021 12:59:56 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Vishnu Rangayyan <vishnu.rangayyan@...le.com>
Cc:     Al Viro <viro@...iv.linux.org.uk>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fs: fix for core dumping of a process getting oom-killed

On Mon 20-09-21 23:38:40, Vishnu Rangayyan wrote:
> 
> Processes inside a memcg that get core dumped when there is less memory
> available in the memcg can have the core dumping interrupted by the
> oom-killer.
> We saw this with qemu processes inside a memcg, as in this trace below.
> The memcg was not out of memory when the core dump was triggered.

Why is it important to mention that the memcg was not OOM when the
dump was triggered?

> [201169.028782] qemu-kata-syste invoked oom-killer: gfp_mask=0x101c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE|__GFP_WRITE), order=0, oom_score_adj=-100
[...]
> [201169.028863] memory: usage 12218368kB, limit 12218368kB, failcnt 1728013

It obviously is OOM for the particular allocation from the core dumping
code: the usage reported above equals the limit.

> [201169.028864] memory+swap: usage 12218368kB, limit 9007199254740988kB, failcnt 0
> [201169.028864] kmem: usage 154424kB, limit 9007199254740988kB, failcnt 0
> [201169.028880] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=podacfa3d53-2068-4b61-a754-fa21968b4201,mems_allowed=0-1,oom_memcg=/kubepods/burstable/podacfa3d53-2068-4b61-a754-fa21968b4201,task_memcg=/kubepods/burstable/podacfa3d53-2068-4b61-a754-fa21968b4201,task=qemu-kata-syste,pid=1887079,uid=0
> [201169.028888] Memory cgroup out of memory: Killed process 1887079 (qemu-kata-syste) total-vm:13598556kB, anon-rss:39836kB, file-rss:8712kB, shmem-rss:12017992kB, UID:0 pgtables:24204kB oom_score_adj:-100
> [201169.045201] oom_reaper: reaped process 1887079 (qemu-kata-syste), now anon-rss:0kB, file-rss:28kB, shmem-rss:12018016kB
> 
> This change adds an fsync only for regular file core dumps based on a
> configurable limit core_sync_bytes placed alongside other core dump params
> and defaults the limit to (an arbitrary value) of 128KB.
> Setting core_sync_bytes to zero disables the sync.

This explains neither the problem nor the solution. Why is fsync
helping at all? Why do we need a new sysctl to address the problem, and
how does it help to prevent the memcg OOM? Also, why is this a problem
in the first place?

Have a look at the OOM report. It says that only 8MB of the 11GB limit
is consumed by file-backed memory. The absolute majority (98%) is
sitting in shmem, and fsync will not help a wee bit there.
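
For reference, the proportions can be rechecked directly from the numbers
in the quoted report. The following is only a rough sketch: the kB values
are copied from the log lines above, while the variable and function names
are illustrative and not part of the report or the patch.

```python
# Values copied from the quoted OOM report, all in kB.
limit_kb = 12218368      # "memory: usage 12218368kB, limit 12218368kB"
usage_kb = 12218368
anon_rss_kb = 39836      # "anon-rss:39836kB"
file_rss_kb = 8712       # "file-rss:8712kB"
shmem_rss_kb = 12017992  # "shmem-rss:12017992kB"


def percent_of_limit(part_kb: int) -> float:
    """Return part_kb as a percentage of the memcg limit."""
    return 100.0 * part_kb / limit_kb


shmem_pct = percent_of_limit(shmem_rss_kb)
file_pct = percent_of_limit(file_rss_kb)
at_limit = usage_kb == limit_kb  # the memcg is exactly at its limit

print(f"limit:     {limit_kb / 1024 / 1024:.2f} GiB, at limit: {at_limit}")
print(f"file-rss:  {file_rss_kb / 1024:.1f} MiB ({file_pct:.2f}% of limit)")
print(f"shmem-rss: {shmem_rss_kb / 1024 / 1024:.2f} GiB ({shmem_pct:.1f}% of limit)")
```

The file-backed share comes out well below 1% of the limit, so even if an
fsync allowed every file-backed page to be reclaimed, the charge against
the memcg would barely move.
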
-- 
Michal Hocko
SUSE Labs