lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c99232fa-e861-48ec-8438-028e61d3b744@kernel.dk>
Date: Tue, 20 May 2025 14:18:28 -0600
From: Jens Axboe <axboe@...nel.dk>
To: John Ogness <john.ogness@...utronix.de>,
 LKML <linux-kernel@...r.kernel.org>
Cc: Petr Mladek <pmladek@...e.com>, Steven Rostedt <rostedt@...dmis.org>,
 "senozhatsky@...omium.org" <senozhatsky@...omium.org>,
 "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: printk NMI splat on boot

On 5/20/25 2:08 PM, John Ogness wrote:
> Added Cc: Paul E. McKenney <paulmck@...nel.org>
> 
> Hi Jens,
> 
> On 2025-05-20, Jens Axboe <axboe@...nel.dk> wrote:
>> This has been going on for a while, and finally getting around to
>> reporting it. For every boot on my Dell R7625, I get one of these:
> 
> Is it possible that you could compile with:
> 
> CONFIG_PRINTK_TIME=y
> CONFIG_PRINTK_CALLER=y
> 
> so that we can see some timestamps and from which CPU the printk calls
> are originating from?

Sure, let me try that right now. Here it is:

[   21.450298] [   T3193] ahci 0000:65:00.0: 4/4 ports implemented (port mask 0xf)
[   21.458971] [     T17] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 32-.... 96-.... } 3 jiffies s: 29 root: 0x44/.
[   21.458980] [     T17] rcu: blocking rcu_node structures (internal RCU debug): l=1:32-47:0x1/. l=1:96-111:0x1/.
[   21.458985] [     T17] Sending NMI from CPU 0 to CPUs 32:
[   21.458990] [     C32] NMI backtrace for cpu 32
[   21.458993] [     C32] CPU: 32 UID: 0 PID: 3193 Comm: kworker/32:1 Not tainted 6.15.0-rc7+ #255 NONE
[   21.458995] [     C32] Hardware name: Dell Inc. PowerEdge R7625/06444F, BIOS 1.11.2 12/18/2024
[   21.458997] [     C32] Workqueue: events work_for_cpu_fn
[   21.459002] [     C32] RIP: 0010:vprintk_emit+0x360/0x430
[   21.459004] [     C32] Code: 87 01 0f 84 91 00 00 00 c6 05 76 ec 8a 01 01 c6 05 7c ec 8a 01 00 0f b6 05 68 ec 8a 01 84 c0 74 0d f3 90 0f b6 05 5b ec 8a 01 <84> c0 75 f3 e8 47 0d 00 00 80 e7 02 74 01 fb c7 05 fb ec 8c 01 00
[   21.459006] [     C32] RSP: 0018:ffffaf75617b7b18 EFLAGS: 00000002
[   21.459007] [     C32] RAX: 0000000000000001 RBX: 0000000000000246 RCX: 0000000000000000
[   21.459008] [     C32] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffffff84019c48
[   21.459009] [     C32] RBP: ffffaf75617b7be8 R08: 00000000fffdffff R09: ffffa1a6e58fffa8
[   21.459009] [     C32] R10: ffffa1a6e4e00000 R11: 0000000000000002 R12: ffffaf75617b7b80
[   21.459010] [     C32] R13: ffffffff8360f9d1 R14: ffffaf75617b7bf8 R15: 0000000000000038
[   21.459011] [     C32] FS:  0000000000000000(0000) GS:ffffa10563a34000(0000) knlGS:0000000000000000
[   21.459012] [     C32] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   21.459012] [     C32] CR2: 0000000000000000 CR3: 000000270b02a001 CR4: 0000000000370ef0
[   21.459013] [     C32] Call Trace:
[   21.459015] [     C32]  <TASK>
[   21.459017] [     C32]  dev_vprintk_emit+0x130/0x140
[   21.459023] [     C32]  dev_printk_emit+0x3e/0x40
[   21.459025] [     C32]  ? __dev_printk+0x2d/0x70
[   21.459028] [     C32]  _dev_info+0x5c/0x5e
[   21.459031] [     C32]  ahci_print_info+0xab/0x30c [libahci]
[   21.459036] [     C32]  ? pci_conf1_read+0xae/0xf0
[   21.459039] [     C32]  ahci_init_one+0xa3a/0xbf5 [ahci]
[   21.459045] [     C32]  ? sched_balance_newidle+0x2aa/0x3c0
[   21.459048] [     C32]  local_pci_probe+0x3c/0x80
[   21.459050] [     C32]  work_for_cpu_fn+0x10/0x20
[   21.459052] [     C32]  process_one_work+0x140/0x2b0
[   21.459054] [     C32]  worker_thread+0x2ea/0x430
[   21.459056] [     C32]  ? process_one_work+0x2b0/0x2b0
[   21.459057] [     C32]  kthread+0xd4/0x1d0
[   21.459059] [     C32]  ? kthreads_online_cpu+0xf0/0xf0
[   21.459061] [     C32]  ret_from_fork+0x2d/0x50
[   21.459062] [     C32]  ? kthreads_online_cpu+0xf0/0xf0
[   21.459064] [     C32]  ret_from_fork_asm+0x11/0x20
[   21.459066] [     C32]  </TASK>
[   21.459989] [     T17] Sending NMI from CPU 0 to CPUs 96:
[   21.460026] [     C96] NMI backtrace for cpu 96
[   21.460028] [     C96] CPU: 96 UID: 0 PID: 503 Comm: kworker/96:0 Not tainted 6.15.0-rc7+ #255 NONE
[   21.460030] [     C96] Hardware name: Dell Inc. PowerEdge R7625/06444F, BIOS 1.11.2 12/18/2024
[   21.460032] [     C96] Workqueue: events work_for_cpu_fn
[   21.460037] [     C96] RIP: 0010:fbcon_putcs+0xf0/0x150
[   21.460041] [     C96] Code: e5 ff ff 4c 63 44 24 0c 48 89 ee 48 89 df 89 c2 44 8b 4c 24 08 4c 89 c0 48 c1 e0 04 4c 29 c0 48 8d 04 c5 40 71 06 84 8b 48 10 <0f> bf 40 0e 41 51 45 89 f9 52 4c 89 e2 4c 8b 54 24 10 44 01 f0 41
[   21.460042] [     C96] RSP: 0018:ffffaf755af47968 EFLAGS: 00000046
[   21.460044] [     C96] RAX: ffffffff84067140 RBX: ffffa0e780215c00 RCX: 0000000000000030
[   21.460045] [     C96] RDX: 0000000000000007 RSI: ffffa0ff018a3800 RDI: ffffa0e780215c00
[   21.460046] [     C96] RBP: ffffa0ff018a3800 R08: 0000000000000000 R09: 0000000000000000
[   21.460047] [     C96] R10: ffffffff82c88930 R11: 000000000000002f R12: ffffa0ff0cca9f5c
[   21.460048] [     C96] R13: 000000000000001f R14: 000000000000001e R15: 000000000000002e
[   21.460049] [     C96] FS:  0000000000000000(0000) GS:ffffa13563a34000(0000) knlGS:0000000000000000
[   21.460050] [     C96] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   21.460051] [     C96] CR2: 00007fe388c23e10 CR3: 0000007885bd2002 CR4: 0000000000370ef0
[   21.460052] [     C96] Call Trace:
[   21.460054] [     C96]  <TASK>
[   21.460055] [     C96]  ? bit_update_start+0x40/0x40
[   21.460059] [     C96]  fbcon_redraw+0x103/0x240
[   21.460061] [     C96]  fbcon_scroll+0x15a/0x1a0
[   21.460063] [     C96]  con_scroll+0xf4/0x200
[   21.460065] [     C96]  ? fbcon_putcs+0x117/0x150
[   21.460066] [     C96]  lf+0x83/0x90
[   21.460067] [     C96]  vt_console_print+0x2b9/0x420
[   21.460069] [     C96]  console_flush_all+0x2da/0x5b0
[   21.460071] [     C96]  console_unlock+0xd9/0x240
[   21.460072] [     C96]  vprintk_emit+0x37e/0x430
[   21.460073] [     C96]  dev_vprintk_emit+0x130/0x140
[   21.460076] [     C96]  ? kernfs_activate+0x80/0xa0
[   21.460078] [     C96]  dev_printk_emit+0x3e/0x40
[   21.460080] [     C96]  ? __dev_printk+0x2d/0x70
[   21.460084] [     C96]  _dev_info+0x5c/0x5e
[   21.460089] [     C96]  usb_add_hcd.cold+0xd2/0x57d [usbcore]
[   21.460103] [     C96]  usb_hcd_pci_probe+0x16f/0x3e0 [usbcore]
[   21.460114] [     C96]  xhci_pci_common_probe+0x5c/0x220 [xhci_pci]
[   21.460118] [     C96]  local_pci_probe+0x3c/0x80
[   21.460121] [     C96]  work_for_cpu_fn+0x10/0x20
[   21.460122] [     C96]  process_one_work+0x140/0x2b0
[   21.460124] [     C96]  worker_thread+0x2ea/0x430
[   21.460126] [     C96]  ? process_one_work+0x2b0/0x2b0
[   21.460128] [     C96]  kthread+0xd4/0x1d0
[   21.460130] [     C96]  ? kthreads_online_cpu+0xf0/0xf0
[   21.460131] [     C96]  ret_from_fork+0x2d/0x50
[   21.460133] [     C96]  ? kthreads_online_cpu+0xf0/0xf0
[   21.460134] [     C96]  ret_from_fork_asm+0x11/0x20
[   21.460136] [     C96]  </TASK>

>> bnxt_en 0000:01:00.2 eno12419np2: renamed from eth2
>> usb 1-1.2: new low-speed USB device number 3 using xhci_hcd
>> usb 1-1.2: New USB device found, idVendor=047b, idProduct=0011, bcdDevice= 1.00
>> usb 3-1: New USB device found, idVendor=1604, idProduct=10c0, bcdDevice= 0.00
>> usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
>> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 96-.... } 3 jiffies s: 53 root: 0x40/.
> 
> The expedited grace period has only been going on for 3 jiffies.
> 
> What values are you using for CONFIG_RCU_EXP_CPU_STALL_TIMEOUT and
> CONFIG_RCU_CPU_STALL_TIMEOUT?

CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=2

> What CONFIG_*HZ* options are you using?

CONFIG_HZ=100

>> rcu: blocking rcu_node structures (internal RCU debug): l=1:96-111:0x1/.
>> Sending NMI from CPU 0 to CPUs 96:
>> NMI backtrace for cpu 96
>> CPU: 96 UID: 0 PID: 3241 Comm: kworker/96:1 Not tainted 6.15.0-rc7+ #254 NONE 
>> Hardware name: Dell Inc. PowerEdge R7625/06444F, BIOS 1.11.2 12/18/2024
>> Workqueue: usb_hub_wq hub_event [usbcore]
>> RIP: 0010:vprintk_emit+0x360/0x430
>> Code: 87 01 0f 84 91 00 00 00 c6 05 36 ec 8a 01 01 c6 05 3c ec 8a 01 00 0f b6 05 28 ec 8a 01 84 c0 74 0d f3 90 0f b6 05 1b ec 8a 01 <84> c0 75 f3 e8 47 0d 00 00 80 e7 02 74 01 fb c7 05 bb ec 8c 01 00
>> RSP: 0018:ffffbe47220c7b78 EFLAGS: 00000002
>> RAX: 0000000000000001 RBX: 0000000000000246 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffffff93019c48
>> RBP: ffffbe47220c7c48 R08: 00000000fffdffff R09: ffffa1bf658fffa8
>> R10: ffffa1bf64e00000 R11: 0000000000000002 R12: ffffbe47220c7be0
>> R13: ffffffff9260f8ee R14: ffffbe47220c7c58 R15: 0000000000000043
>> FS:  0000000000000000(0000) GS:ffffa14dd4a34000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000561066512398 CR3: 00000090a0a11002 CR4: 0000000000370ef0
>> Call Trace:
>>  <TASK>
>>  dev_vprintk_emit+0x130/0x140
>>  dev_printk_emit+0x3e/0x40
>>  ? __dev_printk+0x2d/0x70
>>  _dev_info+0x5c/0x5e
>>  usb_new_device.cold+0x87/0x3a0 [usbcore]
>>  hub_event+0x1113/0x1900 [usbcore]
>>  ? __schedule+0x4e5/0xaf0
>>  process_one_work+0x140/0x2b0
>>  worker_thread+0x2ea/0x430
>>  ? process_one_work+0x2b0/0x2b0
>>  kthread+0xd4/0x1d0
>>  ? kthreads_online_cpu+0xf0/0xf0
>>  ret_from_fork+0x2d/0x50
>>  ? kthreads_online_cpu+0xf0/0xf0
>>  ret_from_fork_asm+0x11/0x20
>>  </TASK>
>> usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
>> usb 1-1.2: Product: Standard USB Keyboard 
>> hub 3-1:1.0: USB hub found
>> usb 1-1.2: Manufacturer: Silitek
>>
>> which appears to be this spin section:
>>
>>         /* We spin waiting for the owner to release us */
>>         spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
>>         /* Owner will clear console_waiter on hand off */
>>         while (READ_ONCE(console_waiter))
>>                 cpu_relax();
>>         spin_release(&console_owner_dep_map, _THIS_IP_);
> 
> This code is waiting for a CPU to finish printing a line so that it can
> take over printing the next line.
> 
> If you are using CONFIG_HZ_1000 and a baud rate of 115200bps, you will
> only get about 14 characters printed per jiffy. The last line
> successfully printed has 68 characters and the expedited grace period
> timed out after only 3 jiffies (which would be around 42 characters). So
> the numbers and the timeout make sense. But why is the grace period so
> short?

I think I just used whatever defaults were there. At least I didn't set
anything low on purpose. I do have a serial console turned on, which I'm
sure would explain why it's so slow (even if not used).

-- 
Jens Axboe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ