lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 23 Apr 2020 10:51:45 +0200 (CEST)
From:   Jiri Kosina <jikos@...nel.org>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
cc:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-rt-users <linux-rt-users@...r.kernel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        Daniel Wagner <dwagner@...e.de>
Subject: [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re:
 [ANNOUNCE] v5.4.28-rt19)

On Mon, 30 Mar 2020, Sebastian Andrzej Siewior wrote:

> I'm pleased to announce the v5.4.28-rt19 patch set. 

First, I don't believe this is necessarily a regression coming with this 
particular version, but this is the first kernel where I tried this and it 
crashed.

When running this kernel in KVM guest, and flooding its 8250-based ttyS0 
serial console quickly leads to

[   52.674358] 000: irq 4: nobody cared (try booting with the "irqpoll" option)
[   52.674381] 000: CPU: 0 PID: 155 Comm: irq/4-ttyS0 Not tainted 5.4.28-rt19+ #7
[   52.674389] 000: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-rebuilt.suse.com 04/01/2014
[   52.674398] 000: Call Trace:
[   52.674417] 000:  <IRQ>
[   52.674426] 000:  dump_stack+0x50/0x6b
[   52.674446] 000:  __report_bad_irq+0x2b/0xb0
[   52.674454] 000:  note_interrupt+0x22e/0x280
[   52.674457] 000:  ? io_serial_out+0x11/0x20
[   52.674463] 000:  handle_irq_event_percpu+0x6e/0x90
[   52.674471] 000:  handle_irq_event+0x48/0x90
[   52.674473] 000:  handle_edge_irq+0x95/0x1f0
[   52.674476] 000:  do_IRQ+0x6c/0x120
[   52.674487] 000:  common_interrupt+0xf/0xf
[   52.674496] 000:  </IRQ>
[   52.674497] 000: RIP: 0010:io_serial_out+0x11/0x20
[   52.674503] 000: Code: 8b 57 30 d3 e6 01 f2 ec 0f b6 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f b6 8f e1 00 00 00 89 d0 8b 57 30 d3 e6 01 f2 ee <c3> 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f b6 87 e2 00 00 00
[   52.674507] 000: RSP: 0018:ffffa860c03a7de8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdb
[   52.674508] 000: RAX: 0000000000000034 RBX: ffffffffb05a3bc0 RCX: 0000000000000000
[   52.674510] 000: RDX: 00000000000003f8 RSI: 0000000000000000 RDI: ffffffffb05a3bc0
[   52.674514] 000: RBP: ffff98c1b9718000 R08: ffffffffafa0b8e0 R09: ffffa860c03a7d80
[   52.674515] 000: R10: ffffa860c03a7c40 R11: 0000000000000000 R12: 0000000000000009
[   52.674516] 000: R13: 0000000000000020 R14: 0000000000000001 R15: ffff98c13714a8b8
[   52.674524] 000:  ? io_serial_in+0x20/0x20
[   52.674529] 000:  serial8250_tx_chars+0x121/0x310
[   52.674535] 000:  ? migrate_disable+0x33/0x90
1339913399133991[   52.674540] 000:  
serial8250_handle_irq.part.27+0xc2/0xe0
[   52.674543] 000:  serial8250_default_handle_irq+0x45/0x50
[   52.674545] 000:  serial8250_interrupt+0x51/0xa0
[   52.674548] 000:  ? irq_finalize_oneshot.part.46+0xd0/0xd0
[   52.674550] 000:  irq_forced_thread_fn+0x2b/0x70
[   52.674552] 000:  irq_thread+0xd8/0x170
[   52.674554] 000:  ? wake_threads_waitq+0x30/0x30
[   52.674556] 000:  ? irq_thread_dtor+0x90/0x90
[   52.674558] 000:  kthread+0x10e/0x130
[   52.674563] 000:  ? kthread_park+0x80/0x80
[   52.674565] 000:  ret_from_fork+0x35/0x40
[   52.674571] 000: handlers:
[   52.674579] 000: [<00000000d13c0583>] irq_default_primary_handler threaded [<00000000f172db97>] serial8250_interrupt
[   52.674580] 000: Disabling IRQ #4

(and things go downhill afterwards with RCU stall on the affected CPU, 
etc).

Reproducing it is as easy as firing up qemu with '-serial stdio', booting 
with console set to ttyS0, and then doing e.g.

	i=0; while test $i -lt 100000; do /usr/bin/echo $i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i; i=$((i+1)); done

in the serial console terminal. It explodes within a few tens of seconds.

Now, whenever this happens, we always seem to be interrupted in 
0010:io_serial_out+0x11/0x20, which is this in Code

	... d3 e6 01 f2 ee <c3> 66 66 2e 0f ...

where '0xee' is the actual outb. So it looks like outb would take so long 
that 99k IRQ4s trigger in the meantime, which would seem to point to KVM / 
qemu 
issue. 

*However* after either

- turning CONFIG_PREEMPT_RT off 

	or

- starting qemu-kvm without any '-smp' args (i.e. starting the guest as 
  UP)

this issue never reproduces any more; the reason for it almost certainly 
never triggering with !CONFIG_PREEMPT_RT is that the above pattern 
obviously can't happen, because we never interrupt that io_serial_out, 
because the spin_lock_irqsave() in serial8250_handle_irq() disables IRQ in 
this region. Why it doesn't reproduce on UP I don't understand.

I haven't done much debugging on this yet, but reporting this right away 
in case anyone has any idea what's happening here.

Thanks,

-- 
Jiri Kosina
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ