lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YQeR8FTlzrojIbSo@dhcp22.suse.cz>
Date:   Mon, 2 Aug 2021 08:34:24 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Aaron Tomlin <atomlin@...hat.com>
Cc:     linux-mm@...ck.org, akpm@...ux-foundation.org,
        penguin-kernel@...ove.sakura.ne.jp, rientjes@...gle.com,
        llong@...hat.com, neelx@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] mm/oom_kill: show oom eligibility when displaying the
 current memory state of all tasks

On Fri 30-07-21 17:20:02, Aaron Tomlin wrote:
> Changes since v2:
>  - Use single character (e.g. 'R' for MMF_OOM_SKIP) as suggested
>    by Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
>  - Add new header to oom_dump_tasks documentation
>  - Provide further justification
> 
> 
> The output generated by dump_tasks() can be helpful to determine why
> there was an OOM condition and which rogue task potentially caused it.
> Please note that this is only provided when sysctl oom_dump_tasks is
> enabled.
> 
> At the present time, when showing potential OOM victims, we do not
> exclude any task that are not OOM eligible e.g.

Well, this is not precise. We do exclude ineligible. Consider tasks that
are outside of the OOM domain for example. You are right that we are not
excluding all of them though.

> those that have
> MMF_OOM_SKIP set; it is possible that the last OOM killable victim was
> already OOM killed, yet the OOM reaper failed to reclaim memory and set
> MMF_OOM_SKIP. This can be confusing (or perhaps even be misleading) to the
> viewer. Now, we already unconditionally display a task's oom_score_adj_min
> value that can be set to OOM_SCORE_ADJ_MIN which is indicative of an
> "unkillable" task.
> 
> This patch provides a clear indication with regard to the OOM ineligibility
> (and why) of each displayed task with the addition of a new column namely
> "oom_skipped". An example is provided below:
> 
>     [ 5084.524970] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj oom_skipped name
>     [ 5084.526397] [660417]     0 660417    35869      683   167936        0         -1000 M conmon
>     [ 5084.526400] [660452]     0 660452   175834      472    86016        0          -998  pod
>     [ 5084.527460] [752415]     0 752415    35869      650   172032        0         -1000 M conmon
>     [ 5084.527462] [752575] 1001050000 752575   184205    11158   700416        0           999  npm
>     [ 5084.527467] [753606] 1001050000 753606   183380    46843  2134016        0           999  node
>     [ 5084.527581] Memory cgroup out of memory: Killed process 753606 (node) total-vm:733520kB, anon-rss:161228kB, file-rss:26144kB, shmem-rss:0kB, UID:1001050000
> 
> So, a single character 'M' is for OOM_SCORE_ADJ_MIN, 'R' MMF_OOM_SKIP and
> 'V' for in_vfork().

I have to say I dislike this for two reasons. First and foremost it
makes parsing unnecessarily more complex. Now you have a potentially
an empty column to special case. Secondly MMF_OOM_SKIP is an internal
state that shouldn't really leak to userspace IMO. in_vfork is a racy
check and M is already expressed via oom_score_adj so it duplicates an
existing information.

If you really want/need to make any change here then I would propose to
either add E(eligible)/I(ligible) column without any specifics or
consistently skip over all tasks which are not eligible.
> 2.31.1

-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ