linux-kernel - [RESEND 2] [PATCH] rlimits: Print more information when limits are exceeded

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20170218083750.12305-1-arun@arunraghavan.net>
Date:   Sat, 18 Feb 2017 14:07:50 +0530
From:   Arun Raghavan <arun@...nraghavan.net>
To:     linux-kernel@...r.kernel.org
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Arun Raghavan <arun@...nraghavan.net>
Subject: [RESEND 2] [PATCH] rlimits: Print more information when limits are exceeded

This dumps some information in logs when a process exceeds its CPU or RT
limits (soft and hard). Makes debugging easier when userspace triggers
these limits.

Signed-off-by: Arun Raghavan <arun@...nraghavan.net>
---
 kernel/time/posix-cpu-timers.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Hello,
This has come up a couple of times in the past, but we haven't been able to
resolve whatever issues were pointed out.

In the mean time, we have frustrated users who don't know where they're getting
a SIGKILL from, and I'd really like to have a way for people to not have to go
through this.

The issues that came up the last time were:

 1. SIGXCPU messages shouldn't be needed since they can be caught: it's still
    useful to have the log because it isn't always possible to pin down the
    thread causing the problem in userspace.

 2. SIGKILL logging should be centralised: there seem to be multiple paths that
    trigger a SIGKILL -- and it seemed a bit ugly to try to add a reason
    parameter on all of them for the KILL case. Any other suggestions on how to
    deal with this?

I'm happy to fix this up to actually make it this time, but if there aren't
none, just pushing this out will make our lives a little less painful.

Thanks,
Arun

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index e9e8c10..6dbcf84 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -860,6 +860,9 @@ static void check_thread_timers(struct task_struct *tsk,
 			 * At the hard limit, we just die.
 			 * No need to calculate anything else now.
 			 */
+			printk(KERN_INFO
+				"CPU Watchdog Timeout (hard): %s[%d]\n",
+				tsk->comm, task_pid_nr(tsk));
 			__group_send_sig_info(SIGKILL, SEND_SIG_PRIV, tsk);
 			return;
 		}
@@ -872,7 +875,7 @@ static void check_thread_timers(struct task_struct *tsk,
 				sig->rlim[RLIMIT_RTTIME].rlim_cur = soft;
 			}
 			printk(KERN_INFO
-				"RT Watchdog Timeout: %s[%d]\n",
+				"RT Watchdog Timeout (soft): %s[%d]\n",
 				tsk->comm, task_pid_nr(tsk));
 			__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
 		}
@@ -980,6 +983,9 @@ static void check_process_timers(struct task_struct *tsk,
 			 * At the hard limit, we just die.
 			 * No need to calculate anything else now.
 			 */
+			printk(KERN_INFO
+				"RT Watchdog Timeout (hard): %s[%d]\n",
+				tsk->comm, task_pid_nr(tsk));
 			__group_send_sig_info(SIGKILL, SEND_SIG_PRIV, tsk);
 			return;
 		}
@@ -987,6 +993,9 @@ static void check_process_timers(struct task_struct *tsk,
 			/*
 			 * At the soft limit, send a SIGXCPU every second.
 			 */
+			printk(KERN_INFO
+				"CPU Watchdog Timeout (soft): %s[%d]\n",
+				tsk->comm, task_pid_nr(tsk));
 			__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
 			if (soft < hard) {
 				soft++;
-- 
2.9.3