lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240806074131.36007-1-dongml2@chinatelecom.cn>
Date: Tue,  6 Aug 2024 15:41:31 +0800
From: Menglong Dong <menglong8.dong@...il.com>
To: peterz@...radead.org
Cc: mingo@...hat.com,
	juri.lelli@...hat.com,
	vincent.guittot@...aro.org,
	dietmar.eggemann@....com,
	rostedt@...dmis.org,
	bsegall@...gle.com,
	mgorman@...e.de,
	vschneid@...hat.com,
	linux-kernel@...r.kernel.org,
	Menglong Dong <dongml2@...natelecom.cn>,
	Bin Lai <laib2@...natelecom.cn>
Subject: [PATCH next] sched: make printk safe when rq lock is held

The dead lock can happen if we try to use printk(), such as a call of
SCHED_WARN_ON(), during the rq->__lock is held. The printk() will try to
print the message to the console, and the console driver can call
queue_work_on(), which will try to obtain rq->__lock again.

This means that any WARN during the kernel function that hold the
rq->__lock, such as schedule(), sched_ttwu_pending(), etc, can cause dead
lock.

Following is the call trace of the deadlock case that I encounter:

  PID: 0      TASK: ff36bfda010c8000  CPU: 156  COMMAND: "swapper/156"
   #0 crash_nmi_callback+30
   #1 nmi_handle+85
   #2 default_do_nmi+66
   #3 exc_nmi+291
   #4 end_repeat_nmi+22
      [exception RIP: native_queued_spin_lock_slowpath+96]
   #5 native_queued_spin_lock_slowpath+96
   #6 _raw_spin_lock+30
   #7 ttwu_queue+111
   #8 try_to_wake_up+375
   #9 __queue_work+462
  #10 queue_work_on+32
  #11 soft_cursor+420
  #12 bit_cursor+898
  #13 hide_cursor+39
  #14 vt_console_print+995
  #15 call_console_drivers.constprop.0+204
  #16 console_unlock+374
  #17 vprintk_emit+280
  #18 printk+88
  #19 __warn_printk+71
  #20 enqueue_task_fair+1779
  #21 activate_task+102
  #22 ttwu_do_activate+155
  #23 sched_ttwu_pending+177
  #24 flush_smp_call_function_from_idle+42
  #25 do_idle+161
  #26 cpu_startup_entry+25
  #27 secondary_startup_64_no_verify+194

Fix this by using __printk_safe_enter()/__printk_safe_exit() in
rq_pin_lock()/rq_unpin_lock(). Then, printk will defer to print out the
buffers to the console.

Signed-off-by: Menglong Dong <dongml2@...natelecom.cn>
Signed-off-by: Bin Lai <laib2@...natelecom.cn>
---
 kernel/sched/sched.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4c36cc680361..ba06dfdaea55 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1632,6 +1632,7 @@ static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
 {
 	rf->cookie = lockdep_pin_lock(__rq_lockp(rq));
 
+	__printk_safe_enter();
 #ifdef CONFIG_SCHED_DEBUG
 	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
 	rf->clock_update_flags = 0;
@@ -1648,6 +1649,7 @@ static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
 		rf->clock_update_flags = RQCF_UPDATED;
 #endif
 
+	__printk_safe_exit();
 	lockdep_unpin_lock(__rq_lockp(rq), rf->cookie);
 }
 
-- 
2.39.2


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ