lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20210730162002.279678-1-atomlin@redhat.com>
Date:   Fri, 30 Jul 2021 17:20:02 +0100
From:   Aaron Tomlin <atomlin@...hat.com>
To:     linux-mm@...ck.org
Cc:     akpm@...ux-foundation.org, mhocko@...e.com,
        penguin-kernel@...ove.sakura.ne.jp, rientjes@...gle.com,
        llong@...hat.com, neelx@...hat.com, linux-kernel@...r.kernel.org
Subject: [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks

Changes since v2:
 - Use single character (e.g. 'R' for MMF_OOM_SKIP) as suggested
   by Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
 - Add new header to oom_dump_tasks documentation
 - Provide further justification


The output generated by dump_tasks() can be helpful to determine why
there was an OOM condition and which rogue task potentially caused it.
Please note that this is only provided when sysctl oom_dump_tasks is
enabled.

At the present time, when showing potential OOM victims, we do not
exclude any task that are not OOM eligible e.g. those that have
MMF_OOM_SKIP set; it is possible that the last OOM killable victim was
already OOM killed, yet the OOM reaper failed to reclaim memory and set
MMF_OOM_SKIP. This can be confusing (or perhaps even be misleading) to the
viewer. Now, we already unconditionally display a task's oom_score_adj_min
value that can be set to OOM_SCORE_ADJ_MIN which is indicative of an
"unkillable" task.

This patch provides a clear indication with regard to the OOM ineligibility
(and why) of each displayed task with the addition of a new column namely
"oom_skipped". An example is provided below:

    [ 5084.524970] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj oom_skipped name
    [ 5084.526397] [660417]     0 660417    35869      683   167936        0         -1000 M conmon
    [ 5084.526400] [660452]     0 660452   175834      472    86016        0          -998  pod
    [ 5084.527460] [752415]     0 752415    35869      650   172032        0         -1000 M conmon
    [ 5084.527462] [752575] 1001050000 752575   184205    11158   700416        0           999  npm
    [ 5084.527467] [753606] 1001050000 753606   183380    46843  2134016        0           999  node
    [ 5084.527581] Memory cgroup out of memory: Killed process 753606 (node) total-vm:733520kB, anon-rss:161228kB, file-rss:26144kB, shmem-rss:0kB, UID:1001050000

So, a single character 'M' is for OOM_SCORE_ADJ_MIN, 'R' MMF_OOM_SKIP and
'V' for in_vfork().


Signed-off-by: Aaron Tomlin <atomlin@...hat.com>
---
 Documentation/admin-guide/sysctl/vm.rst |  5 ++--
 mm/oom_kill.c                           | 31 +++++++++++++++++++++----
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 003d5cc3751b..4c79fa00ddb3 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -650,8 +650,9 @@ oom_dump_tasks
 Enables a system-wide task dump (excluding kernel threads) to be produced
 when the kernel performs an OOM-killing and includes such information as
 pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj
-score, and name.  This is helpful to determine why the OOM killer was
-invoked, to identify the rogue task that caused it, and to determine why
+score, oom eligibility status and name.  This is helpful to determine why
+the OOM killer was invoked, to identify the rogue task that caused it, and
+to determine why
 the OOM killer chose the task it did to kill.
 
 If this is set to zero, this information is suppressed.  On very
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index c729a4c4a1ac..36daa6917b62 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -160,6 +160,27 @@ static inline bool is_sysrq_oom(struct oom_control *oc)
 	return oc->order == -1;
 }
 
+/**
+ * is_task_eligible_oom - determine if and why a task cannot be OOM killed
+ * @tsk: task to check
+ *
+ * Needs to be called with task_lock().
+ */
+static const char * const is_task_oom_eligible(struct task_struct *p)
+{
+	long adj;
+
+	adj = (long)p->signal->oom_score_adj;
+	if (adj == OOM_SCORE_ADJ_MIN)
+		return "M";
+	else if (test_bit(MMF_OOM_SKIP, &p->mm->flags)
+		return "R";
+	else if (in_vfork(p))
+		return "V";
+	else
+		return "";
+}
+
 /* return true if the task is not adequate as candidate victim task. */
 static bool oom_unkillable_task(struct task_struct *p)
 {
@@ -401,12 +422,13 @@ static int dump_task(struct task_struct *p, void *arg)
 		return 0;
 	}
 
-	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu         %5hd %s\n",
+	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu         %5hd %1s %s\n",
 		task->pid, from_kuid(&init_user_ns, task_uid(task)),
 		task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
 		mm_pgtables_bytes(task->mm),
 		get_mm_counter(task->mm, MM_SWAPENTS),
-		task->signal->oom_score_adj, task->comm);
+		task->signal->oom_score_adj, is_task_oom_eligible(task),
+		task->comm);
 	task_unlock(task);
 
 	return 0;
@@ -420,12 +442,13 @@ static int dump_task(struct task_struct *p, void *arg)
  * memcg, not in the same cpuset, or bound to a disjoint set of mempolicy nodes
  * are not shown.
  * State information includes task's pid, uid, tgid, vm size, rss,
- * pgtables_bytes, swapents, oom_score_adj value, and name.
+ * pgtables_bytes, swapents, oom_score_adj value, oom eligibility status
+ * and name.
  */
 static void dump_tasks(struct oom_control *oc)
 {
 	pr_info("Tasks state (memory values in pages):\n");
-	pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
+	pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj oom_skipped name\n");
 
 	if (is_memcg_oom(oc))
 		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
-- 
2.31.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ