lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <1420647358-8901-1-git-send-email-daniel.thompson@linaro.org>
Date:	Wed,  7 Jan 2015 16:15:58 +0000
From:	Daniel Thompson <daniel.thompson@...aro.org>
To:	Jason Wessel <jason.wessel@...driver.com>
Cc:	Daniel Thompson <daniel.thompson@...aro.org>,
	linux-kernel@...r.kernel.org, patches@...aro.org,
	linaro-kernel@...ts.linaro.org,
	John Stultz <john.stultz@...aro.org>,
	Sumit Semwal <sumit.semwal@...aro.org>,
	Mike Travis <travis@....com>,
	Randy Dunlap <rdunlap@...radead.org>,
	Dimitri Sivanich <sivanich@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Borislav Petkov <bp@...e.de>,
	kgdb-bugreport@...ts.sourceforge.net,
	Ingo Molnar <mingo@...nel.org>
Subject: [RESEND PATCH v3 3.19-rc2] kgdb: Timeout if secondary CPUs ignore the roundup

Currently if an active CPU fails to respond to a roundup request the
CPU that requested the roundup will become stuck. This needlessly
reduces the robustness of the debugger.

This patch introduces a timeout allowing the system state to be examined
even when the system contains unresponsive processors. It also modifies
kdb's cpu command to make it censor attempts to switch to unresponsive
processors and to report their state as (D)ead.

Signed-off-by: Daniel Thompson <daniel.thompson@...aro.org>
Cc: Jason Wessel <jason.wessel@...driver.com>
Cc: Mike Travis <travis@....com>
Cc: Randy Dunlap <rdunlap@...radead.org>
Cc: Dimitri Sivanich <sivanich@....com>
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: Borislav Petkov <bp@...e.de>
Cc: kgdb-bugreport@...ts.sourceforge.net
Cc: Ingo Molnar <mingo@...nel.org>
---

Notes:
    Jason:
      v2 of this patch is already integrated into kgdb-next. It's probably
      best to nuke v2 and replace it with this patch. However I can easily
      provide a diff of v2 versus v3 if you prefer. Just ask...
    
    v3:
    
    * Fix an out-by-one error in kdb_cpu().
    
    * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we
      really want a static limit (Jason Wessel).
    
    * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c
      (kgdb-next contains a patch which introduced pr_fmt() to this file
      to the tag will now be applied automatically).
    
    v2:
    
    * Set CATASTROPHIC if the system contains unresponsive processors
      (Jason Wessel)
    

 kernel/debug/debug_core.c       | 9 +++++++--
 kernel/debug/kdb/kdb_debugger.c | 4 ++++
 kernel/debug/kdb/kdb_main.c     | 4 +++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 1adf62b39b96..f21580b347cc 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -471,6 +471,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 	int cpu;
 	int trace_on = 0;
 	int online_cpus = num_online_cpus();
+	u64 time_left;

 	kgdb_info[ks->cpu].enter_kgdb++;
 	kgdb_info[ks->cpu].exception_state |= exception_state;
@@ -595,9 +596,13 @@ return_normal:
 	/*
 	 * Wait for the other CPUs to be notified and be waiting for us:
 	 */
-	while (kgdb_do_roundup && (atomic_read(&masters_in_kgdb) +
-				atomic_read(&slaves_in_kgdb)) != online_cpus)
+	time_left = loops_per_jiffy * HZ;
+	while (kgdb_do_roundup && --time_left &&
+	       (atomic_read(&masters_in_kgdb) + atomic_read(&slaves_in_kgdb)) !=
+		   online_cpus)
 		cpu_relax();
+	if (!time_left)
+		pr_crit("Timed out waiting for secondary CPUs.\n");

 	/*
 	 * At this point the primary processor is completely
diff --git a/kernel/debug/kdb/kdb_debugger.c b/kernel/debug/kdb/kdb_debugger.c
index 8859ca34dcfe..15e1a7af5dd0 100644
--- a/kernel/debug/kdb/kdb_debugger.c
+++ b/kernel/debug/kdb/kdb_debugger.c
@@ -129,6 +129,10 @@ int kdb_stub(struct kgdb_state *ks)
 		ks->pass_exception = 1;
 		KDB_FLAG_SET(CATASTROPHIC);
 	}
+	/* set CATASTROPHIC if the system contains unresponsive processors */
+	for_each_online_cpu(i)
+		if (!kgdb_info[i].enter_kgdb)
+			KDB_FLAG_SET(CATASTROPHIC);
 	if (KDB_STATE(SSBPT) && reason == KDB_REASON_SSTEP) {
 		KDB_STATE_CLEAR(SSBPT);
 		KDB_STATE_CLEAR(DOING_SS);
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 379650b984f8..0c1dc7fa2e58 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2157,6 +2157,8 @@ static void kdb_cpu_status(void)
 	for (start_cpu = -1, i = 0; i < NR_CPUS; i++) {
 		if (!cpu_online(i)) {
 			state = 'F';	/* cpu is offline */
+		} else if (!kgdb_info[i].enter_kgdb) {
+			state = 'D';	/* cpu is online but unresponsive */
 		} else {
 			state = ' ';	/* cpu is responding to kdb */
 			if (kdb_task_state_char(KDB_TSK(i)) == 'I')
@@ -2210,7 +2212,7 @@ static int kdb_cpu(int argc, const char **argv)
 	/*
 	 * Validate cpunum
 	 */
-	if ((cpunum > NR_CPUS) || !cpu_online(cpunum))
+	if ((cpunum >= CONFIG_NR_CPUS) || !kgdb_info[cpunum].enter_kgdb)
 		return KDB_BADCPUNUM;

 	dbg_switch_cpu = cpunum;
--
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ