Message-Id: <20241204182953.10854-1-oxana@cloudflare.com>
Date: Wed,  4 Dec 2024 18:29:53 +0000
From: Oxana Kharitonova <oxana@...udflare.com>
To: peterz@...radead.org,
	mingo@...hat.com,
	juri.lelli@...hat.com,
	vincent.guittot@...aro.org,
	dietmar.eggemann@....com,
	rostedt@...dmis.org,
	bsegall@...gle.com,
	mgorman@...e.de,
	vschneid@...hat.com,
	viro@...iv.linux.org.uk,
	brauner@...nel.org,
	jack@...e.cz
Cc: linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org,
	oxana@...udflare.com,
	kernel-team@...udflare.com
Subject: [PATCH] hung_task: add task->flags, blocked by coredump to log

When a process terminates abnormally, the kernel can write a coredump,
if enabled. While the coredump is being written, the process and all its
threads are put into the D state
(TASK_UNINTERRUPTIBLE | TASK_FREEZABLE).

Meanwhile, the kernel thread khungtaskd monitors processes in the D
state. If a task stays in the D state for longer than
kernel.hung_task_timeout_secs, a hung_task alert appears in the kernel
log.

The higher the memory usage of a process, the longer it takes to write
the coredump, and the longer its tasks stay in the D state. We see
hung_task alerts for processes with memory usage above 10 GB. Note that
our kernel.hung_task_timeout_secs is 10 sec, while the default is
120 sec.

Adding a note to the log that the task is blocked by a coredump will
help with monitoring. Another approach might be to filter out alerts
for such tasks completely, but in that case we would lose transparency
about what is putting pressure on system resources, e.g. we saw an
increase in I/O when a coredump occurs because it is written to disk.

Additionally, it would be helpful to have task_struct->flags in the log
from sched_show_task(). Currently it prints only
task_struct->thread_info->flags, which seems misleading as the line
starts with "task:xxxx".

Signed-off-by: Oxana Kharitonova <oxana@...udflare.com>
---
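For illustration only, not part of the patch: a minimal userspace sketch
of the two changes, assuming nothing beyond what the commit message
describes. struct fake_task and both helper names are hypothetical
stand-ins, and the PF_POSTCOREDUMP value is copied from
include/linux/sched.h as an assumption. The sketch models the task flags
as an unsigned int and therefore prints them with %08x.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed value, copied from include/linux/sched.h. */
#define PF_POSTCOREDUMP 0x00000008u

/* Hypothetical stand-in for the task_struct fields used by the log line. */
struct fake_task {
	unsigned int  flags;        /* models task_struct->flags (PF_*) */
	unsigned long thread_flags; /* models read_task_thread_flags() */
	unsigned long stack_free;   /* models the "stack:" value */
	int pid, tgid, ppid;
};

/* Models the check added to check_hung_task(). */
static int blocked_by_coredump(const struct fake_task *t)
{
	return (t->flags & PF_POSTCOREDUMP) != 0;
}

/* Models the tail of the sched_show_task() line with both flag words. */
static int format_task_line(char *buf, size_t len, const struct fake_task *t)
{
	return snprintf(buf, len,
		" stack:%-5lu pid:%-5d tgid:%-5d ppid:%-6d"
		" task_flags:0x%08x flags:0x%08lx\n",
		t->stack_free, t->pid, t->tgid, t->ppid,
		t->flags, t->thread_flags);
}
```

A monitoring script could match on "task_flags:" and test the
PF_POSTCOREDUMP bit the same way blocked_by_coredump() does, instead of
relying only on the "Blocked by coredump." line.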
 kernel/hung_task.c  | 2 ++
 kernel/sched/core.c | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index c18717189..953169893 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -147,6 +147,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
                        print_tainted(), init_utsname()->release,
                        (int)strcspn(init_utsname()->version, " "),
                        init_utsname()->version);
+               if (t->flags & PF_POSTCOREDUMP)
+                       pr_err("      Blocked by coredump.\n");
                pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
                        " disables this message.\n");
                sched_show_task(t);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 95e40895a..7f3dd4528 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7701,9 +7701,9 @@ void sched_show_task(struct task_struct *p)
        if (pid_alive(p))
                ppid = task_pid_nr(rcu_dereference(p->real_parent));
        rcu_read_unlock();
-       pr_cont(" stack:%-5lu pid:%-5d tgid:%-5d ppid:%-6d flags:0x%08lx\n",
+       pr_cont(" stack:%-5lu pid:%-5d tgid:%-5d ppid:%-6d task_flags:0x%08x flags:0x%08lx\n",
                free, task_pid_nr(p), task_tgid_nr(p),
-               ppid, read_task_thread_flags(p));
+               ppid, p->flags, read_task_thread_flags(p));

        print_worker_info(KERN_INFO, p);
        print_stop_info(KERN_INFO, p);
-- 
2.39.5

