linux-kernel - Re: [PATCH] fs: fix for core dumping of a process getting oom-killed


Message-id: <216745b1-2d4c-8707-2403-07117e6b3bca@apple.com>
Date:   Tue, 21 Sep 2021 20:12:08 -0500
From:   Vishnu Rangayyan <vishnu.rangayyan@...le.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Al Viro <viro@...iv.linux.org.uk>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fs: fix for core dumping of a process getting oom-killed



On 9/21/21 5:59 AM, Michal Hocko wrote:
> On Mon 20-09-21 23:38:40, Vishnu Rangayyan wrote:
>>
>> Processes inside a memcg that get core dumped when there is less memory
>> available in the memcg can have the core dumping interrupted by the
>> oom-killer.
>> We saw this with qemu processes inside a memcg, as in this trace below.
>> The memcg was not out of memory when the core dump was triggered.
> 
> Why is it important to mention that the memcg was not oom when the
> dump was triggered?
> 
>> [201169.028782] qemu-kata-syste invoked oom-killer: gfp_mask=0x101c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE|__GFP_WRITE),
>> order=0, oom_score_adj=-100
> [...]
>> [201169.028863] memory: usage 12218368kB, limit 12218368kB, failcnt 1728013
> 
> it obviously is for the particular allocation from the core dumping
> code.
> 
>> [201169.028864] memory+swap: usage 12218368kB, limit 9007199254740988kB, failcnt 0
>> [201169.028864] kmem: usage 154424kB, limit 9007199254740988kB, failcnt 0
>> [201169.028880] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=podacfa3d53-2068-4b61-a754-fa21968b4201,mems_allowed=0-1,oom_memcg=/kubepods/burstable/podacfa3d53-2068-4b61-a754-fa21968b4201,task_memcg=/kubepods/burstable/podacfa3d53-2068-4b61-a754-fa21968b4201,task=qemu-kata-syste,pid=1887079,uid=0
>> [201169.028888] Memory cgroup out of memory: Killed process 1887079
>> (qemu-kata-syste) total-vm:13598556kB, anon-rss:39836kB, file-rss:8712kB, shmem-rss:12017992kB, UID:0 pgtables:24204kB oom_score_adj:-100
>> [201169.045201] oom_reaper: reaped process 1887079 (qemu-kata-syste), now anon-rss:0kB, file-rss:28kB, shmem-rss:12018016kB
>>
>> This change adds an fsync only for regular file core dumps based on a
>> configurable limit core_sync_bytes placed alongside other core dump params
>> and defaults the limit to (an arbitrary value) of 128KB.
>> Setting core_sync_bytes to zero disables the sync.
> 
> This doesn't really explain either the problem or the solution.
My apologies for not explaining better.
> Why is fsync helping at all? Why do we need a new sysctl to address the
> problem and how does it help to prevent the memcg OOM? Also why is this
> a problem in the first place?
The intent is simply to let the core dump succeed in low-memory
situations, so that dump_emit does not push memory usage over the limit
and trigger the oom-killer. This change avoids only that particular issue.
Agreed, it's not the actual problem at all. But if the core dump fails,
that sometimes prevents or delays looking into the actual issue.
The sysctl was meant to allow disabling this behavior, or fine-tuning it
for special cases such as limited-memory environments.
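
To illustrate the idea only, here is a rough userspace sketch in Python, not the kernel patch itself. It writes data in chunks and calls fsync once a configurable number of bytes has been written since the last sync, so dirty file pages do not pile up unboundedly; a limit of zero disables the sync, as described for core_sync_bytes. The function name write_with_periodic_sync and its parameters are made up for this example.

```python
import os
import tempfile


def write_with_periodic_sync(fd, chunks, sync_bytes):
    """Write each chunk to fd; fsync once at least sync_bytes have been
    written since the last sync. sync_bytes == 0 disables syncing.
    Returns the number of fsync calls made (hypothetical helper)."""
    unsynced = 0
    syncs = 0
    for chunk in chunks:
        view = memoryview(chunk)
        while len(view) > 0:
            written = os.write(fd, view)  # may be a short write
            view = view[written:]
            unsynced += written
        if sync_bytes and unsynced >= sync_bytes:
            os.fsync(fd)  # push dirty pages for this file to storage
            unsynced = 0
            syncs += 1
    return syncs


# Four 64 KiB chunks with a 128 KiB limit (the default proposed in the patch).
with tempfile.TemporaryFile() as f:
    demo_syncs = write_with_periodic_sync(
        f.fileno(), [b"\0" * 65536] * 4, 128 * 1024
    )
    demo_size = os.fstat(f.fileno()).st_size
```

This only bounds the amount of dirty file-backed data the writer produces; it does nothing for memory that is not file-backed.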
> 
> Have a look at the oom report. It says that only 8MB of the 11GB limit
> is consumed by the file backed memory. The absolute majority (98%) is
> sitting in the shmem and fsync will not help a wee bit there.
Agree.
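
For reference, the proportions can be checked from the figures in the oom report quoted above. This is just arithmetic on the reported values (the variable names are mine):

```python
# Values copied from the oom report (all in kB).
limit_kb = 12218368   # memory: limit
anon_kb = 39836       # anon-rss of the killed process
file_kb = 8712        # file-rss of the killed process
shmem_kb = 12017992   # shmem-rss of the killed process


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


shmem_pct = percent(shmem_kb, limit_kb)
file_pct = percent(file_kb, limit_kb)

print(f"limit:  {limit_kb / 1024 / 1024:.2f} GiB")
print(f"shmem:  {shmem_pct:.2f}% of limit")
print(f"file:   {file_kb / 1024:.1f} MiB, {file_pct:.2f}% of limit")
```

So shmem accounts for roughly 98% of the limit, while file-backed memory is well under 1%.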
>