Message-Id: <20210409055316.1709-1-dongli.zhang@oracle.com>
Date:   Thu,  8 Apr 2021 22:53:16 -0700
From:   Dongli Zhang <dongli.zhang@...cle.com>
To:     linux-kernel@...r.kernel.org
Cc:     tglx@...utronix.de, peterz@...radead.org, qais.yousef@....com,
        mpe@...erman.id.au, paulmck@...nel.org, npiggin@...il.com,
        frederic@...nel.org, ethp@...com, joe.jin@...cle.com,
        dongli.zhang@...cle.com
Subject: [PATCH v2 1/1] kernel/cpu: to track which CPUHP callback is failed

During bootup or CPU hotplug, cpuhp_up_callbacks() or
cpuhp_down_callbacks() invokes many CPUHP callbacks (e.g., perf, mm,
workqueue, RCU, kvmclock and more) to bring each CPU online or offline. If
any callback fails, the CPU may be rolled back to its previous state. The
user then cannot tell which callback failed; usually the only symptom is
that the CPU online/offline operation fails.
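
The failure path described above can be illustrated with a minimal
user-space sketch. This is not the kernel code: struct demo_step, the
demo_steps[] table and demo_up_callbacks() are hypothetical stand-ins for
the real step table and cpuhp_up_callbacks(), and the rollback is
simplified to restoring the saved state without running any teardown
callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one entry of the kernel's hotplug step table. */
struct demo_step {
	const char *name;
	int (*startup)(unsigned int cpu);
};

static int demo_ok(unsigned int cpu) { (void)cpu; return 0; }
static int demo_nomem(unsigned int cpu) { (void)cpu; return -ENOMEM; }

/* Illustrative table; index 0 is the starting state and has no callback. */
static const struct demo_step demo_steps[] = {
	{ "offline", NULL },
	{ "perf:prepare", demo_ok },
	{ "workqueue:prepare", demo_ok },
	{ "kvmclock:setup_percpu", demo_nomem },
	{ "rcutree:prepare", demo_ok },
};
#define DEMO_NR_STEPS ((int)(sizeof(demo_steps) / sizeof(demo_steps[0])))

/*
 * Walk the states (*state, target] in order and stop at the first failing
 * callback. On failure, log which step failed, record it in *failed and
 * restore *state to where it started (simplified rollback).
 */
static int demo_up_callbacks(unsigned int cpu, int *state, int target,
			     int *failed)
{
	int prev_state = *state;
	int ret = 0;

	assert(*state >= 0 && target < DEMO_NR_STEPS);
	*failed = -1;

	while (*state < target) {
		(*state)++;
		ret = demo_steps[*state].startup(cpu);
		if (ret) {
			fprintf(stderr,
				"CPUHP up callback failure (%d) for cpu %u at %s (%d)\n",
				ret, cpu, demo_steps[*state].name, *state);
			*failed = *state;
			*state = prev_state;
			break;
		}
	}
	return ret;
}
```

Without the fprintf() call, a caller of demo_up_callbacks() would only see
the error code and the restored state, which mirrors the situation the
patch addresses.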

This patch prints an additional debug message to help the user narrow down
the root cause.

The example below shows how the patch helps narrow down the root cause of
the issue fixed by commit d7eb79c6290c ("KVM: kvmclock: Fix vCPUs > 64
can't be online/hotpluged").

With dyndbg="file kernel/cpu.c +p" added to the kernel command line, the
following dynamic debug message is printed when the issue is reproduced:

"CPUHP up callback failure (-12) for cpu 64 at kvmclock:setup_percpu (66)"
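
For readers who want to post-process such messages, the fields can be
pulled out of the line with a small user-space helper. This is only a
sketch: struct cpuhp_failure and parse_cpuhp_failure() are hypothetical
names, and it assumes the step name contains no whitespace and that any
log prefix (such as a timestamp) precedes the "CPUHP" token.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one failure message. */
struct cpuhp_failure {
	char dir[8];		/* "up" or "down" */
	int err;		/* negative errno returned by the callback */
	unsigned int cpu;	/* CPU being brought online/offline */
	char step[64];		/* name of the failing hotplug step */
	int state;		/* numeric hotplug state of that step */
};

/*
 * Parse a line in the format printed by the patch. Returns 0 on success
 * and -1 if the line does not contain a complete failure message.
 */
static int parse_cpuhp_failure(const char *line, struct cpuhp_failure *f)
{
	const char *p;

	assert(line != NULL && f != NULL);

	/* Skip any prefix in front of the message itself. */
	p = strstr(line, "CPUHP ");
	if (!p)
		return -1;

	if (sscanf(p, "CPUHP %7s callback failure (%d) for cpu %u at %63s (%d)",
		   f->dir, &f->err, &f->cpu, f->step, &f->state) != 5)
		return -1;

	return 0;
}
```

For the message above, the parsed error is -12, which is -ENOMEM on Linux,
and the failing step is kvmclock:setup_percpu at state 66.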

Cc: Joe Jin <joe.jin@...cle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@...cle.com>
---
Changed since v1 RFC:
  - use pr_debug() instead of pr_err_once() (suggested by Qais Yousef)
  - print log for cpuhp_down_callbacks() as well (suggested by Qais Yousef)

 kernel/cpu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 1b6302ecbabe..bcd4dd7de9c3 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -621,6 +621,10 @@ static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 		st->state++;
 		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
 		if (ret) {
+			pr_debug("CPUHP up callback failure (%d) for cpu %u at %s (%d)\n",
+				 ret, cpu, cpuhp_get_step(st->state)->name,
+				 st->state);
+
 			if (can_rollback_cpu(st)) {
 				st->target = prev_state;
 				undo_cpu_up(cpu, st);
@@ -990,6 +994,10 @@ static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 	for (; st->state > target; st->state--) {
 		ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
 		if (ret) {
+			pr_debug("CPUHP down callback failure (%d) for cpu %u at %s (%d)\n",
+				 ret, cpu, cpuhp_get_step(st->state)->name,
+				 st->state);
+
 			st->target = prev_state;
 			if (st->state < prev_state)
 				undo_cpu_down(cpu, st);
-- 
2.17.1
