Message-Id: <1539242268-63036-1-git-send-email-feng.tang@intel.com>
Date:   Thu, 11 Oct 2018 15:17:48 +0800
From:   Feng Tang <feng.tang@...el.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        H Peter Anvin <hpa@...ux.intel.com>,
        Borislav Petkov <bp@...en8.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org
Cc:     Feng Tang <feng.tang@...el.com>
Subject: [PATCH RFC] panic: Avoid extra noisy messages due to stopped cpus

When debugging a kernel panic, we sometimes see many extra noisy error
messages after the expected end marker:

[   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
[   35.749975] ------------[ cut here ]------------

These messages may overflow the screen (framebuffer) and make debugging
much more difficult.

This hack patch just quickly suppresses these noisy messages, and I
would really like to get some comments and suggestions.

I have also tried other approaches, such as adding a panic notifier
block inside the tick/sched code to cancel the tick_sched timer in the
panic case, which also works.

The extra messages are of two kinds:
a)
	 WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
	 Call Trace:
	  <IRQ>
	  try_to_wake_up+0x157/0x430
	  default_wake_function+0xd/0x10
	  autoremove_wake_function+0x11/0x60
	  __wake_up_common+0x8a/0x160
	  __wake_up_common_lock+0x6c/0x90
	  __wake_up+0xe/0x10
	  wake_up_klogd_work_func+0x3b/0x60
	  irq_work_run_list+0x4e/0x80
	  irq_work_tick+0x40/0x50
	  update_process_times+0x3d/0x50
	  tick_sched_timer+0x38/0x80
	  __hrtimer_run_queues+0xce/0x200
	  hrtimer_interrupt+0xac/0x1f0
	  smp_apic_timer_interrupt+0x6e/0x140
	  apic_timer_interrupt+0x8e/0xa0

b)
	sched: Unexpected reschedule of offline CPU#0!
	 ------------[ cut here ]------------
	 WARNING: CPU: 1 PID: 300 at arch/x86/kernel/smp.c:141 native_smp_send_reschedule+0x3d/0x50
	  trigger_load_balance+0x125/0x230
	  scheduler_tick+0xa2/0xd0
	  update_process_times+0x42/0x50
	  tick_sched_handle.isra.5+0x21/0x60
	  tick_sched_timer+0x38/0x80
	  __hrtimer_run_queues+0xce/0x200
	  hrtimer_interrupt+0xac/0x1f0
	  smp_apic_timer_interrupt+0x6e/0x140
	  apic_timer_interrupt+0x8e/0xa0
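
To illustrate case b), here is a small user-space model (not kernel
code) of the find_new_ilb() check. All names prefixed with model_, the
two bitmask variables, and NR_CPU_IDS are hypothetical stand-ins for
cpu_online_mask, nohz.idle_cpus_mask and nr_cpu_ids. It assumes that a
stopped CPU is cleared from the online mask but stays in the nohz idle
mask, which is what the "Unexpected reschedule of offline CPU" warning
suggests.

```c
#include <stdbool.h>

#define NR_CPU_IDS 4	/* hypothetical CPU count for this model */

/* Bitmask stand-ins for cpu_online_mask and nohz.idle_cpus_mask. */
static unsigned int online_mask;
static unsigned int nohz_idle_mask;

static bool model_cpu_online(int cpu)
{
	return (online_mask & (1u << cpu)) != 0;
}

/* In this model a CPU is idle exactly when it is in the nohz idle mask. */
static bool model_idle_cpu(int cpu)
{
	return (nohz_idle_mask & (1u << cpu)) != 0;
}

/* Lowest set bit in mask, or NR_CPU_IDS if the mask is empty. */
static int model_cpumask_first(unsigned int mask)
{
	for (int cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/* Check as it stands before the patch. */
static int find_new_ilb_old(void)
{
	int ilb = model_cpumask_first(nohz_idle_mask);

	if (ilb < NR_CPU_IDS && model_idle_cpu(ilb))
		return ilb;

	return NR_CPU_IDS;
}

/* Check with the extra online test added by the patch. */
static int find_new_ilb_new(void)
{
	int ilb = model_cpumask_first(nohz_idle_mask);

	if (ilb < NR_CPU_IDS && model_idle_cpu(ilb) && model_cpu_online(ilb))
		return ilb;

	return NR_CPU_IDS;
}

/* Assumption: stopping a CPU clears only its online bit. */
static void model_stop_cpu(int cpu)
{
	online_mask &= ~(1u << cpu);
}
```

In this model, once CPU 0 has been stopped while still marked idle,
find_new_ilb_old() keeps returning 0, so the caller would kick an
offline CPU, while find_new_ilb_new() returns NR_CPU_IDS and no kick is
sent. Note that the new check only looks at the first CPU in the mask;
it does not go on to search for another idle CPU that is still online.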

Signed-off-by: Feng Tang <feng.tang@...el.com>
---
 arch/x86/kernel/process.c | 1 +
 kernel/sched/fair.c       | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c93fcfd..b703862 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -520,6 +520,7 @@ void stop_this_cpu(void *dummy)
 	 * Remove this CPU:
 	 */
 	set_cpu_online(smp_processor_id(), false);
+	set_cpu_active(smp_processor_id(), false);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7fc4a37..cf41b7b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9034,7 +9034,7 @@ static inline int find_new_ilb(void)
 {
 	int ilb = cpumask_first(nohz.idle_cpus_mask);
 
-	if (ilb < nr_cpu_ids && idle_cpu(ilb))
+	if (ilb < nr_cpu_ids && idle_cpu(ilb) && cpu_online(ilb))
 		return ilb;
 
 	return nr_cpu_ids;
-- 
2.7.4
