Message-ID: <YX+kMpr/fvmMW7hy@dhcp22.suse.cz>
Date:   Mon, 1 Nov 2021 09:24:18 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Yongqiang Liu <liuyongqiang13@...wei.com>
Cc:     rientjes@...gle.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, penguin-kernel@...ove.sakura.ne.jp,
        "Wangkefeng (OS Kernel Lab)" <wangkefeng.wang@...wei.com>
Subject: Re: [QUESTION] oom killed the key system process triggered by a bad
 process alloc memory with MAP_LOCKED

Hi,

On Mon 01-11-21 16:05:50, Yongqiang Liu wrote:
[...]
> And we found that when the oom_reaper is done, the memory usage is still high:
> 
> [   45.115685] Out of memory: Killed process 2553 (oom) total-vm:953404kB,
> anon-rss:947748kB, file-rss:388kB, shmem-rss:0kB, UID:0 pgtables:1896kB
> oom_score_adj:1000
> [   45.115739] oom_reaper: reaped process 2553 (oom), now anon-rss:947708kB,
> file-rss:0kB, shmem-rss:0kB
> 
> This is because the bad process which received SIGKILL is still
> unlocking its memory on the way to exit, which takes more time. And the
> next OOM is triggered and kills another system process.

Yes, this is a known limitation of the oom_reaper based OOM killing.
__oom_reap_task_mm has to skip over mlocked memory areas because
munlocking requires some locking (or at least that was the case when the
oom reaper was introduced), and the primary purpose of the oom_reaper is
to guarantee forward progress.
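
For illustration, a minimal sketch of that skip, assuming a simplified
VMA check (the helper names below are made up for this sketch; the real
logic lives in mm/oom_kill.c and differs in detail):

	/* Sketch only, not the upstream code: the reaper walks the victim's
	 * VMAs and leaves alone anything it cannot tear down without taking
	 * additional locks, which includes mlocked (VM_LOCKED) areas. */
	static bool can_reap_vma_sketch(struct vm_area_struct *vma)
	{
		/* mlocked, hugetlb and PFN-mapped areas are left to exit_mmap() */
		return !(vma->vm_flags & (VM_LOCKED | VM_HUGETLB | VM_PFNMAP));
	}

	static void reap_task_mm_sketch(struct mm_struct *mm)
	{
		struct vm_area_struct *vma;

		for (vma = mm->mmap; vma; vma = vma->vm_next) {
			if (!can_reap_vma_sketch(vma))
				continue;	/* skipping: munlock would need locking */
			/* ... unmap and free the pages backing this vma ... */
		}
	}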

Addressing that limitation would require the munlock operation to not
depend on any locking. I am not sure how much work that would be with
the current code. Until now this has not been a high priority because
processes with a high mlock limit should really be trusted with their
memory consumption, so they shouldn't really be the primary oom killer
target.

Are you seeing this problem with a real workload, or is it only
triggered by artificial tests? E.g. the LTP oom tests are known to
trigger this situation but they do not represent any real workload.
-- 
Michal Hocko
SUSE Labs
