Message-ID: <YWPs13XFRCLl+ewj@kroah.com>
Date: Mon, 11 Oct 2021 09:50:47 +0200
From: Greg KH <gregkh@...uxfoundation.org>
To: "guanghui.fgh" <guanghuifeng@...ux.alibaba.com>
Cc: jirislaby@...nel.org, baolin.wang@...ux.alibaba.com,
linux-kernel@...r.kernel.org, zhuo.song@...ux.alibaba.com,
zhangliguang@...ux.alibaba.com
Subject: Re: [PATCH] tty: tty_buffer: Fix the softlockup issue in
flush_to_ldisc
On Mon, Oct 11, 2021 at 03:42:38PM +0800, guanghui.fgh wrote:
>
> On 2021/10/10 21:18, Greg KH wrote:
> > On Fri, Oct 08, 2021 at 03:50:15PM +0800, guanghui.fgh wrote:
> > > On 2021/9/30 13:38, Greg KH wrote:
> > > > On Thu, Sep 30, 2021 at 11:11:38AM +0800, Guanghui Feng wrote:
> > > > > When I run the ltp testcase (ltp/testcases/kernel/pty/pty04.c) on arm64, a soft lockup occurs
> > > > > that looks like this:
> > > > >
> > > > > watchdog: BUG: soft lockup - CPU#41 stuck for 67s! [kworker/u192:2:106867]
> > > > > CPU: 41 PID: 106867 Comm: kworker/u192:2 Kdump: loaded Tainted: G OE 5.10.23 #1
> > > > > Hardware name: H3C R4960 G3/BC82AMDDA, BIOS 1.70 01/07/2021
> > > > > Workqueue: events_unbound flush_to_ldisc
> > > > > pstate: 00c00009 (nzcv daif +PAN +UAO -TCO BTYPE=--)
> > > > > pc : slip_unesc+0x80/0x214 [slip]
> > > > > lr : slip_receive_buf+0x84/0x100 [slip]
> > > > > sp : ffff80005274bce0
> > > > > x29: ffff80005274bce0 x28: 0000000000000000
> > > > > x27: ffff00525626fcc8 x26: ffff800011921078
> > > > > x25: 0000000000000000 x24: 0000000000000004
> > > > > x23: ffff00402b4059c0 x22: ffff00402b405940
> > > > > x21: ffff205d87b81e21 x20: ffff205d87b81b9b
> > > > > x19: 0000000000000000 x18: 0000000000000000
> > > > > x17: 0000000000000000 x16: 0000000000000000
> > > > > x15: 0000000000000000 x14: 5f5f5f5f5f5f5f5f
> > > > > x13: 5f5f5f5f5f5f5f5f x12: 5f5f5f5f5f5f5f5f
> > > > > x11: 5f5f5f5f5f5f5f5f x10: 5f5f5f5f5f5f5f5f
> > > > > x9 : ffff8000097d7628 x8 : ffff205d87b85e20
> > > > > x7 : 0000000000000000 x6 : 0000000000000001
> > > > > x5 : ffff8000097dc008 x4 : ffff8000097d75a4
> > > > > x3 : ffff205d87b81e1f x2 : 0000000000000005
> > > > > x1 : 000000000000005f x0 : ffff00402b405940
> > > > > Call trace:
> > > > > slip_unesc+0x80/0x214 [slip]
> > > > > tty_ldisc_receive_buf+0x64/0x80
> > > > > tty_port_default_receive_buf+0x50/0x90
> > > > > flush_to_ldisc+0xbc/0x110
> > > > > process_one_work+0x1d4/0x4b0
> > > > > worker_thread+0x180/0x430
> > > > > kthread+0x11c/0x120
> > > > > Kernel panic - not syncing: softlockup: hung tasks
> > > > > CPU: 41 PID: 106867 Comm: kworker/u192:2 Kdump: loaded Tainted: G OEL 5.10.23 #1
> > > > > Hardware name: H3C R4960 G3/BC82AMDDA, BIOS 1.70 01/07/2021
> > > > > Workqueue: events_unbound flush_to_ldisc
> > > > > Call trace:
> > > > > dump_backtrace+0x0/0x1ec
> > > > > show_stack+0x24/0x30
> > > > > dump_stack+0xd0/0x128
> > > > > panic+0x15c/0x374
> > > > > watchdog_timer_fn+0x2b8/0x304
> > > > > __run_hrtimer+0x88/0x2c0
> > > > > __hrtimer_run_queues+0xa4/0x120
> > > > > hrtimer_interrupt+0xfc/0x270
> > > > > arch_timer_handler_phys+0x40/0x50
> > > > > handle_percpu_devid_irq+0x94/0x220
> > > > > __handle_domain_irq+0x88/0xf0
> > > > > gic_handle_irq+0x84/0xfc
> > > > > el1_irq+0xc8/0x180
> > > > > slip_unesc+0x80/0x214 [slip]
> > > > > tty_ldisc_receive_buf+0x64/0x80
> > > > > tty_port_default_receive_buf+0x50/0x90
> > > > > flush_to_ldisc+0xbc/0x110
> > > > > process_one_work+0x1d4/0x4b0
> > > > > worker_thread+0x180/0x430
> > > > > kthread+0x11c/0x120
> > > > > SMP: stopping secondary CPUs
> > > > >
> > > > > In the testcase pty04 there are multiple processes, but only the first three matter here.
> > > > > The first process calls the write syscall to send data to the pty master as fast as it can
> > > > > (call chain: tty_write->file_tty_write->do_tty_write->n_tty_write).
> > > > > The second process calls the read syscall to receive data from the pty slave (through a PF_PACKET socket).
> > > > > The third process waits a moment while the first two processes do their work, and then
> > > > > calls ioctl to hang up the pty pair, which stops the first two processes from reading and writing the pty.
> > > > > Before the hangup, the first process sends data to the pty buffhead at high speed. If the
> > > > > workqueue is woken up at the same time, it runs flush_to_ldisc, which pops data from the pty
> > > > > master's buffhead to the line discipline in a loop until no data is left, without ever yielding
> > > > > the CPU voluntarily, so the work in flush_to_ldisc can run for a long time. Because the kernel is
> > > > > configured without CONFIG_PREEMPT, a soft lockup may occur in flush_to_ldisc. So I add cond_resched
> > > > > in the flush_to_ldisc while loop to avoid it.
> > > > Please properly wrap your changelog text at 72 columns.
> > > When I run the ltp testcase (ltp/testcases/kernel/pty/pty04.c) on arm64, a soft lockup occurs
> > > that looks like this:
> > > Call trace:
> > > dump_backtrace+0x0/0x1ec
> > > show_stack+0x24/0x30
> > > dump_stack+0xd0/0x128
> > > panic+0x15c/0x374
> > > watchdog_timer_fn+0x2b8/0x304
> > > __run_hrtimer+0x88/0x2c0
> > > __hrtimer_run_queues+0xa4/0x120
> > > hrtimer_interrupt+0xfc/0x270
> > > arch_timer_handler_phys+0x40/0x50
> > > handle_percpu_devid_irq+0x94/0x220
> > > __handle_domain_irq+0x88/0xf0
> > > gic_handle_irq+0x84/0xfc
> > > el1_irq+0xc8/0x180
> > > slip_unesc+0x80/0x214 [slip]
> > > tty_ldisc_receive_buf+0x64/0x80
> > > tty_port_default_receive_buf+0x50/0x90
> > > flush_to_ldisc+0xbc/0x110
> > > process_one_work+0x1d4/0x4b0
> > > worker_thread+0x180/0x430
> > > kthread+0x11c/0x120
> > >
> > > In the testcase pty04, the first process calls the write syscall to send data to the pty master.
> > > If the workqueue is woken up at the same time, it runs flush_to_ldisc, which pops data
> > > in a loop until no data is left, so the work in flush_to_ldisc can run for a
> > > long time. Because the kernel is configured without CONFIG_PREEMPT, a soft lockup may occur in flush_to_ldisc.
> > Is this a "real" test for something that you have seen in a normal
> > workload? ltp is known for having buggy/confusing tests in it in the
> > past, you might wish to consult with the authors of that test.
>
> Firstly, thanks for your response.
>
> I have checked the ltp pty testcase. I also see the pty softlockup on
> arm64, and it is similar to what others have reported:
>
> https://github.com/victronenergy/venus/issues/350
>
> https://groups.google.com/g/syzkaller-lts-bugs/c/SpkH8yH26js/m/3aifBl_GAwAJ
Hm, ok, can you please resubmit this based on the changes discussed in
this thread and I will review it again.
thanks,
greg k-h
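
For readers following the thread, the change described in the quoted
changelog (calling cond_resched inside the flush_to_ldisc while loop)
can be sketched in userspace C. This is only an illustration of the
idea, not the kernel code or the submitted patch: struct demo_buf,
demo_receive, demo_cond_resched, demo_yields and demo_flush are all
hypothetical names, and sched_yield() is used as a rough userspace
stand-in for the kernel's cond_resched().

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for one queued buffer of received data. */
struct demo_buf {
	struct demo_buf *next;
	size_t len;
};

/* Counts how often the drain loop offered to give up the CPU. */
static unsigned long demo_yields;

/* Hypothetical stand-in for handing one buffer to the line discipline. */
static size_t demo_receive(const struct demo_buf *b)
{
	assert(b != NULL);
	return b->len;
}

/*
 * Userspace approximation of cond_resched(): give the scheduler a
 * chance to run something else. The kernel helper only reschedules
 * when needed; sched_yield() is an assumption made for this sketch.
 */
static void demo_cond_resched(void)
{
	demo_yields++;
	sched_yield();
}

/*
 * Drain every queued buffer. Without the yield at the end of each
 * iteration, a producer that keeps the queue full could keep this
 * loop on the CPU indefinitely on a non-preemptible kernel.
 */
size_t demo_flush(struct demo_buf *head)
{
	size_t total = 0;

	for (struct demo_buf *b = head; b != NULL; b = b->next) {
		total += demo_receive(b);
		demo_cond_resched();
	}
	return total;
}
```

Calling demo_flush() on a three-element list processes all three
buffers and yields once per buffer, so a fast writer can no longer
monopolize the CPU between scheduling points.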