lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c9ddbc3f-dbb8-4ccc-8eb6-43554327984b@paulmck-laptop>
Date:   Tue, 17 Oct 2023 07:10:30 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        John Stultz <jstultz@...gle.com>,
        Stephen Boyd <sboyd@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        x86@...nel.org, joel@...lfernandes.org
Subject: Re: [PATCH] clocksource: disable irq when holding watchdog_lock.

On Tue, Oct 17, 2023 at 07:37:50PM +0900, Tetsuo Handa wrote:
> On 2023/10/17 6:47, Thomas Gleixner wrote:
> > Something like the uncompiled/untested below should cure it for real. It
> > really does not matter whether the TSC unstable event happens a bit
> > later. The system is unhappy no matter what.
> 
> Thank you. That patch fixes the lockdep warning.
> 
> But I'm hitting something different (but might be timer/scheduler related) problem.
> What config option would cause taking more than 2 minutes to bring up only 8 CPUs?
> (This environment is Oracle VM VirtualBox on Windows 11 host.)
> 
> Before:
> ----------------------------------------
> [    0.871340] Callback from call_rcu_tasks_trace() invoked.
> [    0.872337] rcu: 	Max phase no-delay instances is 400.
> [    0.875815] NMI watchdog: Perf NMI watchdog permanently disabled
> [    0.877552] smp: Bringing up secondary CPUs ...
> [    0.879599] smpboot: x86: Booting SMP configuration:
> [    0.881333] .... node  #0, CPUs:      #1 #2 #3 #4 #5 #6 #7
> [    0.892319] TSC synchronization [CPU#0 -> CPU#1]:
> [    0.892319] Measured 66852 cycles TSC warp between CPUs, turning off TSC clock.
> [    0.892319] tsc: Marking TSC unstable due to check_tsc_sync_source failed
> [    1.108337] 
> [    1.109319] ================================
> [    1.109319] WARNING: inconsistent lock state
> [    1.109319] 6.6.0-rc5+ #7 Not tainted
> [    1.109319] --------------------------------
> [    1.109319] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [    1.109319] rcu_tasks_kthre/13 [HC0[0]:SC1[1]:HE1:SE0] takes:
> [    1.109319] ffffffff86a85818 (watchdog_lock){?.-.}-{2:2}, at: clocksource_watchdog+0x26/0x620
> [    1.109319] {IN-HARDIRQ-W} state was registered at:
> [    1.109319]   lock_acquire+0xca/0x2d0
> [    1.109319]   _raw_spin_lock_irqsave+0x4f/0x70
> [    1.109319]   clocksource_mark_unstable+0x20/0xa0
> [    1.109319]   mark_tsc_unstable+0x4a/0x70
> [    1.109319]   check_tsc_sync_source+0x169/0x1d0
> [    1.109319]   __flush_smp_call_function_queue+0x16d/0x590
> [    1.109319]   generic_smp_call_function_single_interrupt+0x17/0x20
> [    1.109319]   __sysvec_call_function_single+0x3a/0x110
> [    1.109319]   sysvec_call_function_single+0x7f/0x90
> [    1.109319]   asm_sysvec_call_function_single+0x1f/0x30
> [    1.109319]   pv_native_safe_halt+0xf/0x20
> [    1.109319]   arch_cpu_idle+0xd/0x20
> [    1.109319]   default_idle_call+0x83/0x180
> [    1.109319]   do_idle+0x21a/0x260
> [    1.109319]   cpu_startup_entry+0x31/0x40
> [    1.109319]   rest_init+0x10f/0x200
> [    1.109319]   arch_call_rest_init+0x12/0x50
> [    1.109319]   start_kernel+0x80b/0xa30
> [    1.109319]   x86_64_start_reservations+0x1c/0x30
> [    1.109319]   x86_64_start_kernel+0xbf/0x110
> [    1.109319]   secondary_startup_64_no_verify+0x17d/0x18b
> [    1.109319] irq event stamp: 232
> [    1.109319] hardirqs last  enabled at (232): [<ffffffff850c93ac>] _raw_spin_unlock_irq+0x2c/0x60
> [    1.109319] hardirqs last disabled at (231): [<ffffffff850c9081>] _raw_spin_lock_irq+0x61/0x70
> [    1.109319] softirqs last  enabled at (88): [<ffffffff841c200f>] rcu_tasks_invoke_cbs+0xff/0x2f0
> [    1.109319] softirqs last disabled at (173): [<ffffffff840ece40>] __irq_exit_rcu+0x80/0xb0
> [    1.109319] 
> [    1.109319] other info that might help us debug this:
> [    1.109319]  Possible unsafe locking scenario:
> [    1.109319] 
> [    1.109319]        CPU0
> [    1.109319]        ----
> [    1.109319]   lock(watchdog_lock);
> [    1.109319]   <Interrupt>
> [    1.109319]     lock(watchdog_lock);
> [    1.109319] 
> [    1.109319]  *** DEADLOCK ***
> [    1.109319] 
> [    1.109319] 3 locks held by rcu_tasks_kthre/13:
> [    1.109319]  #0: ffffffff869d6730 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x361/0x510
> [    1.109319]  #1: ffffffff86a7ceb8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x43c/0x510
> [    1.109319]  #2: ffffb67680003e68 ((&watchdog_timer)){+.-.}-{0:0}, at: call_timer_fn+0x7b/0x240
> [    1.109319] 
> [    1.109319] stack backtrace:
> [    1.109319] CPU: 0 PID: 13 Comm: rcu_tasks_kthre Not tainted 6.6.0-rc5+ #7
> [    1.109319] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> [    1.109319] Call Trace:
> [    1.109319]  <IRQ>
> [    1.109319]  dump_stack_lvl+0x60/0xa0
> [    1.109319]  dump_stack+0x14/0x20
> [    1.109319]  print_usage_bug.part.0+0x25c/0x350
> [    1.109319]  mark_lock.part.0+0x72f/0xa00
> [    1.109319]  __lock_acquire+0x390/0x24f0
> [    1.109319]  ? lockdep_unlock+0x56/0xc0
> [    1.109319]  lock_acquire+0xca/0x2d0
> [    1.109319]  ? clocksource_watchdog+0x26/0x620
> [    1.109319]  ? __pfx_clocksource_watchdog+0x10/0x10
> [    1.109319]  _raw_spin_lock+0x34/0x50
> [    1.109319]  ? clocksource_watchdog+0x26/0x620
> [    1.109319]  clocksource_watchdog+0x26/0x620
> [    1.109319]  ? lock_acquire+0xca/0x2d0
> [    1.109319]  ? __pfx_clocksource_watchdog+0x10/0x10
> [    1.109319]  call_timer_fn+0xa4/0x240
> [    1.109319]  ? __pfx_clocksource_watchdog+0x10/0x10
> [    1.109319]  __run_timers.part.0+0x207/0x2d0
> [    1.109319]  run_timer_softirq+0x2e/0x60
> [    1.109319]  __do_softirq+0xc6/0x446
> [    1.109319]  __irq_exit_rcu+0x80/0xb0
> [    1.109319]  irq_exit_rcu+0x12/0x20
> [    1.109319]  sysvec_apic_timer_interrupt+0x84/0x90
> [    1.109319]  </IRQ>
> [    1.109319]  <TASK>
> [    1.109319]  asm_sysvec_apic_timer_interrupt+0x1f/0x30
> [    1.109319] RIP: 0010:_flat_send_IPI_mask+0x5d/0x70
> [    1.109319] Code: f8 c9 c3 cc cc cc cc 89 75 f4 e8 ae 24 20 00 8b 75 f4 ba 00 08 00 00 44 89 e7 e8 ae 63 ff ff e8 c9 23 20 00 fb 0f 1f 44 00 00 <4c> 8b 65 f8 c9 c3 cc cc cc cc 66 0f 1f 84 00 00 00 00 00 90 90 90
> [    1.109319] RSP: 0000:ffffb67680073ac0 EFLAGS: 00010202
> [    1.109319] RAX: 0000000000000073 RBX: 0000000000000000 RCX: 0000000000000040
> [    1.109319] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff840a4357
> [    1.109319] RBP: ffffb67680073ad0 R08: 0000000000000001 R09: 0000000000000001
> [    1.109319] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000020
> [    1.109319] R13: 00000000000000fb R14: 000000000000001f R15: ffffffff869da348
> [    1.109319]  ? _flat_send_IPI_mask+0x57/0x70
> [    1.109319]  flat_send_IPI_mask+0x15/0x20
> [    1.109319]  default_send_IPI_single+0x44/0x60
> [    1.109319]  native_send_call_func_single_ipi+0x17/0x20
> [    1.109319]  __smp_call_single_queue+0xd1/0x1c0
> [    1.109319]  generic_exec_single+0x3a/0x1e0
> [    1.109319]  smp_call_function_single+0xc2/0x140
> [    1.109319]  ? _raw_spin_unlock_irqrestore+0x35/0x60
> [    1.109319]  __sync_rcu_exp_select_node_cpus+0x284/0x500
> [    1.109319]  sync_rcu_exp_select_cpus+0x232/0x550
> [    1.109319]  synchronize_rcu_expedited+0x44e/0x510
> [    1.109319]  ? lock_is_held_type+0xaf/0x120
> [    1.109319]  synchronize_rcu+0x1ad/0x1e0
> [    1.109319]  ? __mutex_lock+0x9b/0xd30
> [    1.109319]  ? __mutex_lock+0xa86/0xd30
> [    1.109319]  ? lock_acquire+0xca/0x2d0
> [    1.109319]  ? find_held_lock+0x31/0x90
> [    1.109319]  rcu_tasks_pregp_step+0x12/0x20
> [    1.109319]  rcu_tasks_wait_gp+0x5c/0x430
> [    1.109319]  rcu_tasks_one_gp+0x49f/0x510
> [    1.109319]  ? rcu_tasks_one_gp+0x75/0x510
> [    1.109319]  ? __pfx_rcu_tasks_kthread+0x10/0x10
> [    1.109319]  rcu_tasks_kthread+0xb4/0xe0
> [    1.109319]  kthread+0x109/0x140
> [    1.109319]  ? __pfx_kthread+0x10/0x10
> [    1.109319]  ret_from_fork+0x3d/0x60
> [    1.109319]  ? __pfx_kthread+0x10/0x10
> [    1.109319]  ret_from_fork_asm+0x1b/0x30
> [    1.109319]  </TASK>
> [   40.748350] smp: Brought up 1 node, 8 CPUs
> [   48.744332] smpboot: Max logical packages: 1
> [   57.896327] tasks_rcu_exit_srcu_stall: rcu_tasks grace period number 5 (since boot) gp_state: RTGS_POST_SCAN_TASKLIST is 11129 jiffies old.
> [   69.324322] Please check any exiting tasks stuck between calls to exit_tasks_rcu_start() and exit_tasks_rcu_finish()
> [   83.516363] smpboot: Total of 8 processors activated (41779.32 BogoMIPS)
> [  103.157341] Callback from call_rcu_tasks_rude() invoked.
> [  105.961362] rcu_tasks_wait_gp: rcu_tasks grace period number 5 (since boot) is 104910 jiffies old.
> [  137.781409] rcu_tasks_wait_gp: rcu_tasks grace period number 5 (since boot) is 136730 jiffies old.
> [  142.589347] Callback from call_rcu_tasks() invoked.

The timing is about the same in both cases.  Does this happen in kernels
built with CONFIG_PROVE_LOCKING=n?

Either way, what mainline or -stable version is this?

							Thanx, Paul

> ----------------------------------------
> 
> After:
> ----------------------------------------
> [    0.000000] Linux version 6.6.0-rc5+ (root@...ntu) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #8 SMP PREEMPT_DYNAMIC Tue Oct 17 09:56:49 UTC 2023
> [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.6.0-rc5+ root=UUID=a2500e32-bbc8-4f32-bf86-94a6ae160b4b ro console=ttyS0,115200n8 console=tty0
> (...snipped...)
> [    0.340002] Dynamic Preempt: full
> [    0.340209] Running RCU self tests
> [    0.340210] Running RCU synchronous self tests
> [    0.340214] rcu: Preemptible hierarchical RCU implementation.
> [    0.340215] rcu: 	RCU lockdep checking is enabled.
> [    0.340215] rcu: 	RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=8.
> [    0.340217] 	Trampoline variant of Tasks RCU enabled.
> [    0.340217] 	Rude variant of Tasks RCU enabled.
> [    0.340218] 	Tracing variant of Tasks RCU enabled.
> [    0.340219] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
> [    0.340219] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
> [    0.340275] Running RCU synchronous self tests
> [    0.343652] NR_IRQS: 524544, nr_irqs: 488, preallocated irqs: 16
> [    0.344029] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [    0.376872] Console: colour VGA+ 80x25
> [    0.376893] printk: console [tty0] enabled
> [    0.585051] printk: console [ttyS0] enabled
> [    0.807871] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> [    0.812562] ... MAX_LOCKDEP_SUBCLASSES:  8
> [    0.814986] ... MAX_LOCK_DEPTH:          48
> [    0.817368] ... MAX_LOCKDEP_KEYS:        8192
> [    0.819835] ... CLASSHASH_SIZE:          4096
> [    0.822452] ... MAX_LOCKDEP_ENTRIES:     32768
> [    0.825290] ... MAX_LOCKDEP_CHAINS:      65536
> [    0.827689] ... CHAINHASH_SIZE:          32768
> [    0.830278]  memory used by lock dependency info: 6493 kB
> [    0.833130]  memory used for stack traces: 4224 kB
> [    0.836051]  per task-struct memory footprint: 1920 bytes
> [    0.838946] ACPI: Core revision 20230628
> [    0.841933] APIC: Switch to symmetric I/O mode setup
> [    0.855141] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [    0.858422] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x25a39815946, max_idle_ns: 440795282683 ns
> [    0.863903] Calibrating delay loop (skipped) preset value.. 5222.41 BogoMIPS (lpj=2611208)
> [    0.864890] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
> [    0.864890] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
> [    0.864890] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
> [    0.864890] Spectre V2 : WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible via Spectre v2 BHB attacks!
> [    0.864890] Spectre V2 : Mitigation: Enhanced / Automatic IBRS
> [    0.864890] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
> [    0.864903] Spectre V2 : Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT
> [    0.866894] RETBleed: Mitigation: Enhanced IBRS
> [    0.868893] Speculative Store Bypass: Vulnerable
> [    0.870902] x86/fpu: x87 FPU will use FXSAVE
> [    0.896596] Freeing SMP alternatives memory: 40K
> [    0.897898] pid_max: default: 32768 minimum: 301
> [    0.900001] LSM: initializing lsm=capability,bpf
> [    0.901913] LSM support for eBPF active
> [    0.904221] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
> [    0.907152] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
> [    0.910834] Running RCU synchronous self tests
> [    0.911894] Running RCU synchronous self tests
> [    1.029893] APIC calibration not consistent with PM-Timer: 148ms instead of 100ms
> [    1.030890] APIC delta adjusted to PM-Timer: 6288407 (9311079)
> [    1.031135] smpboot: CPU0: 12th Gen Intel(R) Core(TM) i7-1255U (family: 0x6, model: 0x9a, stepping: 0x4)
> [    1.031890] RCU Tasks: Setting shift to 3 and lim to 1 rcu_task_cb_adjust=1.
> [    1.031890] RCU Tasks Rude: Setting shift to 3 and lim to 1 rcu_task_cb_adjust=1.
> [    1.031890] RCU Tasks Trace: Setting shift to 3 and lim to 1 rcu_task_cb_adjust=1.
> [    1.031890] Running RCU-tasks wait API self tests
> [    1.136395] Performance Events: unsupported p6 CPU model 154 no PMU driver, software events only.
> [    1.138894] signal: max sigframe size: 1440
> [    1.141093] rcu: Hierarchical SRCU implementation.
> [    1.142904] Callback from call_rcu_tasks_trace() invoked.
> [    1.144897] rcu: 	Max phase no-delay instances is 400.
> [    1.148535] NMI watchdog: Perf NMI watchdog permanently disabled
> [    1.150496] smp: Bringing up secondary CPUs ...
> [    1.153270] smpboot: x86: Booting SMP configuration:
> [    1.154900] .... node  #0, CPUs:      #1 #2 #3 #4 #5 #6 #7
> [    1.163890] TSC synchronization [CPU#0 -> CPU#1]:
> [    1.163890] Measured 66840 cycles TSC warp between CPUs, turning off TSC clock.
> [    1.163890] tsc: Marking TSC unstable due to check_tsc_sync_source failed
> [   26.569928] rcu_tasks_wait_gp: rcu_tasks grace period number 5 (since boot) is 25133 jiffies old.
> [   78.872032] Callback from call_rcu_tasks() invoked.
> [   91.327971] smp: Brought up 1 node, 8 CPUs
> [   96.402957] smpboot: Max logical packages: 1
> [  119.253936] Callback from call_rcu_tasks_rude() invoked.
> [  127.412964] smpboot: Total of 8 processors activated (41779.32 BogoMIPS)
> ----------------------------------------
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ