Message-ID: <a8baf6415463d2ad20cf556c8148432e17b211e6@linux.dev>
Date: Wed, 04 Feb 2026 09:57:16 +0000
From: "Jiayuan Chen" <jiayuan.chen@...ux.dev>
To: "Greg Kroah-Hartman" <gregkh@...uxfoundation.org>
Cc: linux-serial@...r.kernel.org, "Jiayuan Chen" <jiayuan.chen@...pee.com>,
"Jiri Slaby" <jirislaby@...nel.org>, "Petr Mladek" <pmladek@...e.com>,
"Marcos Paulo de Souza" <mpdesouza@...e.com>, "Krzysztof Kozlowski"
<krzysztof.kozlowski@....qualcomm.com>, "Dr. David Alan Gilbert"
<linux@...blig.org>, "Joseph Tilahun" <jtilahun@...ranis.com>, "Sjur
Braendeland" <sjur.brandeland@...ricsson.com>, "David S. Miller"
<davem@...emloft.net>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] serial: core: fix infinite loop in handle_tx() for
PORT_UNKNOWN
February 4, 2026 at 16:53, "Greg Kroah-Hartman" <gregkh@...uxfoundation.org> wrote:
>
> On Wed, Feb 04, 2026 at 08:29:06AM +0000, Jiayuan Chen wrote:
>
> >
> > 2026/2/4 16:20, "Greg Kroah-Hartman" <gregkh@...uxfoundation.org> wrote:
> >
> >
> >
> > On Wed, Feb 04, 2026 at 03:43:20PM +0800, Jiayuan Chen wrote:
> >
> > >
> > > From: Jiayuan Chen <jiayuan.chen@...pee.com>
> > >
> > > uart_write_room() and uart_write() behave inconsistently when
> > > xmit_buf is NULL (which happens for PORT_UNKNOWN ports that were
> > > never properly initialized):
> > >
> > How does this happen? Why were they not initialized properly, what
> > drivers/hardware cause this?
> >
> >
> > In a QEMU environment, /dev/ttyS3 has type PORT_UNKNOWN (there is no real
> > UART hardware behind it). When uart_port_startup() sees
> > uport->type == PORT_UNKNOWN, it returns early without allocating xmit_buf:
> >
> >     if (uport->type == PORT_UNKNOWN)
> >         return 1; // xmit_buf never allocated
> >
> > So xmit_buf remains NULL.
> >
> But the flags for the port will have TTY_IO_ERROR set on it, which
> should hopefully mean that no data is attempted to be sent through this
> (or a ldisc would be bound to it.)
>
> How does this port work at all? Why is QEMU advertising a broken port
> that can not do anything?
>
> And is this the only place such a check would ever be needed? What
> changed recently to suddenly require this?
This is an artificially constructed reproducer. I chose /dev/ttyS3
specifically because it is PORT_UNKNOWN in QEMU. In real-world usage, users
would not intentionally attach a line discipline to such a port.
> >
> > >
> > > - uart_write_room() returns kfifo_avail() which can be > 0
> > > - uart_write() checks xmit_buf and returns 0 if NULL
> > >
> > > This inconsistency causes an infinite loop in drivers that rely on
> > > tty_write_room() to determine if they can write:
> > >
> > > while (tty_write_room(tty) > 0) {
> > >         written = tty->ops->write(...);
> > >         // written is always 0, loop never exits
> > > }
> > >
> > > For example, caif_serial's handle_tx() enters an infinite loop when
> > > used with PORT_UNKNOWN serial ports, causing system hangs.
> > >
> > > Fix by making uart_write_room() also check xmit_buf and return 0 if
> > > it's NULL, consistent with uart_write().
> > >
> > > Reproducer: https://gist.github.com/mrpre/d9a694cc0e19828ee3bc3b37983fde13
> > >
> > > Fixes: 9b27105b4a44 ("net-caif-driver: add CAIF serial driver (ldisc)")
> > >
> > This really isn't a fix for that driver, but rather something else.
> >
> > You're right, this is awkward. The API inconsistency between uart_write_room()
> > and uart_write() has existed since 2.6.12, but it only became visible as an
> > infinite loop when CAIF was introduced, because CAIF's handle_tx() relies on
> > tty_write_room() to decide whether to call write().
> > The fix location is in uart, but the trigger condition requires CAIF (or
> > similar drivers). I can remove the Fixes tag if you prefer.
> >
> Ok, I think this goes a bit deeper. This might be due to the kfifo
> rewrite of the serial drivers, as in older kernels we did not have a
> kfifo, so if it was not initialized the code checking path is much
> different.
>
> As a "check" can you see if this fails for you on the latest 5.10.y
> tree? That is before the kfifo code was added to the uart layer.
The issue still reproduces on 5.10.248:
[ 56.519143] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [caif_deadloop_r:457]
[ 56.520868] Modules linked in:
[ 56.520903] CPU: 2 PID: 457 Comm: caif_deadloop_r Not tainted 5.10.248 #1
[ 56.520914] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 56.520971] RIP: 0010:_raw_spin_unlock_irqrestore+0x15/0x20
[ 56.520977] Code: e8 a0 5f 38 ff 4c 29 e8 49 39 c6 73 d8 80 0b 04 eb 8d cc cc cc 0f 1f 44 00 00 55 48 89 e5 e8 8a 4e 3b ff 66 90 48 89 f7 57 9d <0f> 1f 44 00 00 5d c3 cc cc cc cc 0f 1f 47
[ 56.520986] RSP: 0018:ffffc90000f8bb60 EFLAGS: 00000282
[ 56.520988] RAX: 0000000000000001 RBX: ffff888100b984e0 RCX: ffff8881024eb800
[ 56.520990] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000282
[ 56.520991] RBP: ffffc90000f8bb60 R08: ffff8881024eb800 R09: 0000000000000000
[ 56.520992] R10: ffff88810086ed00 R11: 0000000000000000 R12: 0000000000000080
[ 56.520993] R13: ffff888102423e10 R14: ffff8881024eb800 R15: ffffffff841eeb58
[ 56.520996] FS: 00007f5c618c7740(0000) GS:ffff888137c00000(0000) knlGS:0000000000000000
[ 56.520997] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 56.520998] CR2: 00007f1767cce200 CR3: 0000000008622005 CR4: 0000000000770ee0
[ 56.521003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 56.521004] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 56.521005] PKRU: 55555554
[ 56.521010] Call Trace:
[ 56.521087] uart_write+0x1ec/0x240
[ 56.521112] handle_tx+0x9a/0x1a0
[ 56.521115] caif_xmit+0x61/0x70
[ 56.521141] dev_hard_start_xmit+0xa6/0x1e0
[ 56.521144] __dev_queue_xmit+0x7b3/0xaa0
[ 56.521165] ? packet_parse_headers+0x17a/0x250
[ 56.521169] dev_queue_xmit+0x10/0x20
[ 56.521175] packet_sendmsg+0x8eb/0x1740
[ 56.521197] ? __wake_up_common_lock+0x88/0xc0
[ 56.521214] __sock_sendmsg+0x70/0x80
[ 56.521217] __sys_sendto+0x142/0x190
[ 56.521223] __x64_sys_sendto+0x24/0x30
[ 56.521233] do_syscall_64+0x37/0x50
[ 56.521236] entry_SYSCALL_64_after_hwframe+0x67/0xd1
[ 56.521251] RIP: 0033:0x7f5c619f60d7
[ 56.521276] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 75 ef 0d 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 50
[ 56.521277] RSP: 002b:00007ffd7a4f64b8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 56.521279] RAX: ffffffffffffffda RBX: 00007ffd7a4f67a8 RCX: 00007f5c619f60d7
[ 56.521281] RDX: 0000000000000080 RSI: 00007ffd7a4f64f0 RDI: 0000000000000004
[ 56.521282] RBP: 00007ffd7a4f6680 R08: 00007ffd7a4f64d0 R09: 0000000000000014
[ 56.521283] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
[ 56.521285] R13: 0000000000000000 R14: 000055c2c648ed58 R15: 00007f5c61b1a000
$ scripts/decode_stacktrace.sh vmlinux < dmesg.txt
[ 56.519143] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [caif_deadloop_r:457]
[ 56.520868] Modules linked in:
[ 56.520903] CPU: 2 PID: 457 Comm: caif_deadloop_r Not tainted 5.10.248 #1
[ 56.520914] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 56.520971] RIP: 0010:_raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:653 ./include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[ 56.520977] Code: e8 a0 5f 38 ff 4c 29 e8 49 39 c6 73 d8 80 0b 04 eb 8d cc cc cc 0f 1f 44 00 00 55 48 89 e5 e8 8a 4e 3b ff 66 90 48 89 f7 57 9d <0f> 1f 44 00 00 5d c3 cc cc cc cc 0f 1f 47
All code
========
0: e8 a0 5f 38 ff call 0xffffffffff385fa5
5: 4c 29 e8 sub %r13,%rax
8: 49 39 c6 cmp %rax,%r14
b: 73 d8 jae 0xffffffffffffffe5
d: 80 0b 04 orb $0x4,(%rbx)
10: eb 8d jmp 0xffffffffffffff9f
12: cc int3
13: cc int3
14: cc int3
15: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
1a: 55 push %rbp
1b: 48 89 e5 mov %rsp,%rbp
1e: e8 8a 4e 3b ff call 0xffffffffff3b4ead
23: 66 90 xchg %ax,%ax
25: 48 89 f7 mov %rsi,%rdi
28: 57 push %rdi
29: 9d popf
2a:* 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) <-- trapping instruction
2f: 5d pop %rbp
30: c3 ret
31: cc int3
32: cc int3
33: cc int3
34: cc int3
35: 0f .byte 0xf
36: 1f (bad)
37: 47 rex.RXB
Code starting with the faulting instruction
===========================================
0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
5: 5d pop %rbp
6: c3 ret
7: cc int3
8: cc int3
9: cc int3
a: cc int3
b: 0f .byte 0xf
c: 1f (bad)
d: 47 rex.RXB
[ 56.520986] RSP: 0018:ffffc90000f8bb60 EFLAGS: 00000282
[ 56.520988] RAX: 0000000000000001 RBX: ffff888100b984e0 RCX: ffff8881024eb800
[ 56.520990] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000282
[ 56.520991] RBP: ffffc90000f8bb60 R08: ffff8881024eb800 R09: 0000000000000000
[ 56.520992] R10: ffff88810086ed00 R11: 0000000000000000 R12: 0000000000000080
[ 56.520993] R13: ffff888102423e10 R14: ffff8881024eb800 R15: ffffffff841eeb58
[ 56.520996] FS: 00007f5c618c7740(0000) GS:ffff888137c00000(0000) knlGS:0000000000000000
[ 56.520997] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 56.520998] CR2: 00007f1767cce200 CR3: 0000000008622005 CR4: 0000000000770ee0
[ 56.521003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 56.521004] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 56.521005] PKRU: 55555554
[ 56.521010] Call Trace:
[ 56.521087] uart_write (drivers/tty/serial/serial_core.c:72 drivers/tty/serial/serial_core.c:598)
[ 56.521112] handle_tx (drivers/net/caif/caif_serial.c:237)
[ 56.521115] caif_xmit (drivers/net/caif/caif_serial.c:284)
[ 56.521141] dev_hard_start_xmit (./include/linux/netdevice.h:4833 ./include/linux/netdevice.h:4847 net/core/dev.c:3601 net/core/dev.c:3617)
[ 56.521144] __dev_queue_xmit (./include/linux/netdevice.h:3322 (discriminator 25) net/core/dev.c:4204 (discriminator 25))
[ 56.521165] ? packet_parse_headers (./include/linux/skbuff.h:2616 (discriminator 1) net/packet/af_packet.c:1954 (discriminator 1))
[ 56.521169] dev_queue_xmit (net/core/dev.c:4237)
[ 56.521175] packet_sendmsg (net/packet/af_packet.c:3086 (discriminator 1) net/packet/af_packet.c:3118 (discriminator 1))
[ 56.521197] ? __wake_up_common_lock (kernel/sched/wait.c:126 (discriminator 1))
[ 56.521214] __sock_sendmsg (net/socket.c:651 (discriminator 1) net/socket.c:663 (discriminator 1))
[ 56.521217] __sys_sendto (./include/linux/file.h:33 net/socket.c:2008)
[ 56.521223] __x64_sys_sendto (net/socket.c:2013)
[ 56.521233] do_syscall_64 (arch/x86/entry/common.c:46 (discriminator 1))
[ 56.521236] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:117)
[ 56.521251] RIP: 0033:0x7f5c619f60d7
[ 56.521276] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 75 ef 0d 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 50
All code
========
0: c7 c0 ff ff ff ff mov $0xffffffff,%eax
6: eb be jmp 0xffffffffffffffc6
8: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
f: 00 00 00
12: 90 nop
13: f3 0f 1e fa endbr64
17: 80 3d 75 ef 0d 00 00 cmpb $0x0,0xdef75(%rip) # 0xdef93
1e: 41 89 ca mov %ecx,%r10d
21: 74 10 je 0x33
23: b8 2c 00 00 00 mov $0x2c,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 69 ja 0x9b
32: c3 ret
33: 55 push %rbp
34: 48 89 e5 mov %rsp,%rbp
37: 50 push %rax
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 69 ja 0x71
8: c3 ret
9: 55 push %rbp
a: 48 89 e5 mov %rsp,%rbp
d: 50 push %rax
[ 56.521277] RSP: 002b:00007ffd7a4f64b8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 56.521279] RAX: ffffffffffffffda RBX: 00007ffd7a4f67a8 RCX: 00007f5c619f60d7
[ 56.521281] RDX: 0000000000000080 RSI: 00007ffd7a4f64f0 RDI: 0000000000000004
[ 56.521282] RBP: 00007ffd7a4f6680 R08: 00007ffd7a4f64d0 R09: 0000000000000014
[ 56.521283] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
[ 56.521285] R13: 0000000000000000 R14: 000055c2c648ed58 R15: 00007f5c61b1a000
> >
> > > ---
> > > drivers/tty/serial/serial_core.c | 5 ++++-
> > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> > > index 2805cad10511..0b2edf185cc7 100644
> > > --- a/drivers/tty/serial/serial_core.c
> > > +++ b/drivers/tty/serial/serial_core.c
> > > @@ -643,7 +643,10 @@ static unsigned int uart_write_room(struct tty_struct *tty)
> > >  	unsigned int ret;
> > >
> > >  	port = uart_port_ref_lock(state, &flags);
> > > -	ret = kfifo_avail(&state->port.xmit_fifo);
> > > +	if (!state->port.xmit_buf)
> > >
> > This feels odd. What ports have no transmit buffers? And why would
> > this be the only check that is needed for such broken devices?
> >
> > Maybe let's fix the root cause here, the driver that does not have a
> > transmit buffer at all?
> >
> >
> > Do you suggest we should prevent setting line discipline (like N_CAIF)
> > on PORT_UNKNOWN ports? Or should CAIF check the port type before using it?
> > Note that CAIF is currently in orphan status (no active maintainer), so
> > I'm not sure about the process for modifying it. The serial core fix
> > might be more straightforward.
> >
> I think you found a real bug here, that is independent of the caif code,
> and might just be due to the kfifo stuff. See above for my questions
> here, and if so, your patch is correct, it's just that the Fixes: tag is
> a bit off.
>
> thanks,
>
> greg k-h
>