lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1536318728.23560.98.camel@arista.com>
Date:   Fri, 07 Sep 2018 12:12:08 +0100
From:   Dmitry Safonov <dima@...sta.com>
To:     Jiri Slaby <jslaby@...e.cz>,
        kernel test robot <rong.a.chen@...el.com>
Cc:     lkp@...org, Daniel Axtens <dja@...ens.net>,
        Dmitry Safonov <0x7f454c46@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Tan Xiaojun <tanxiaojun@...wei.com>,
        Peter Hurley <peter@...leysoftware.com>,
        Pasi Kärkkäinen <pasik@....fi>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Michael Neuling <mikey@...ling.org>,
        Mikulas Patocka <mpatocka@...hat.com>,
        linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [LKP] [tty] 0b4f83d510: INFO:task_blocked_for_more_than#seconds

On Fri, 2018-09-07 at 08:39 +0200, Jiri Slaby wrote:
> On 09/07/2018, 06:50 AM, kernel test robot wrote:
> > FYI, we noticed the following commit (built with gcc-7):
> > 
> > commit: 0b4f83d510f6fef6bb9da25f122c8d733d50516f ("[PATCH 2/4] tty:
> > Hold tty_ldisc_lock() during tty_reopen()")
> > url: https://github.com/0day-ci/linux/commits/Dmitry-Safonov/tty-Ho
> > ld-write-ldisc-sem-in-tty_reopen/20180829-165618
> > base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git
> > tty-testing
> > 
> > in testcase: trinity
> > with following parameters:
> > 
> > 	runtime: 300s
> > 
> > test-description: Trinity is a linux system call fuzz tester.
> > test-url: http://codemonkey.org.uk/projects/trinity/
> > 
> > 
> > on test machine: qemu-system-x86_64 -enable-kvm -m 256M
> > 
> > caused below changes (please refer to attached dmesg/kmsg for
> > entire log/backtrace):
> > 
> > 
> > +--------------------------------------------------+------------+
> > ------------+
> > >                                                  | 58dd163974 |
> > > 0b4f83d510 |
> > 
> > +--------------------------------------------------+------------+
> > ------------+
> > > boot_successes                                   | 14         |
> > > 4          |
> > > boot_failures                                    | 0          |
> > > 6          |
> > > INFO:task_blocked_for_more_than#seconds          | 0          |
> > > 6          |
> > > Kernel_panic-not_syncing:hung_task:blocked_tasks | 0          |
> > > 6          |
> > 
> > +--------------------------------------------------+------------+
> > ------------+
> > 
> > 
> > 
> > [  244.816801] INFO: task validate_data:655 blocked for more than
> > 120 seconds.
> > [  244.818833]       Not tainted 4.18.0-11684-g0b4f83d #1
> > [  244.820028] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [  244.826965] validate_data   D    0   655    623 0x20020000
> > [  244.828279] Call Trace:
> > [  244.828958]  ? __schedule+0x843/0x950
> > [  244.830173]  ? __ldsem_down_read_nested+0x1c4/0x3b0
> > [  244.834903]  schedule+0x31/0x70
> > [  244.835665]  schedule_timeout+0x34/0x760
> > [  244.836613]  ? ftrace_likely_update+0x35/0x60
> > [  244.837683]  ? __ldsem_down_read_nested+0x1c4/0x3b0
> > [  244.838818]  ? ftrace_likely_update+0x35/0x60
> > [  244.840127]  ? ftrace_likely_update+0x35/0x60
> > [  244.845947]  ? __ldsem_down_read_nested+0x1c4/0x3b0
> > [  244.847882]  __ldsem_down_read_nested+0x23a/0x3b0
> > [  244.849886]  ? tty_ldisc_ref_wait+0x25/0x50
> > [  244.853807]  tty_ldisc_ref_wait+0x25/0x50
> > [  244.854946]  tty_compat_ioctl+0x8a/0x120
> > [  244.855928]  ? this_tty+0x80/0x80
> > [  244.856742]  __ia32_compat_sys_ioctl+0xc28/0x1ce0
> > [  244.857981]  do_int80_syscall_32+0x1d2/0x5f0
> > [  244.859003]  entry_INT80_compat+0x88/0xa0
> > [  244.859972] INFO: task dnsmasq:668 blocked for more than 120
> > seconds.
> > [  244.868315]       Not tainted 4.18.0-11684-g0b4f83d #1
> > [  244.869583] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [  244.871744] dnsmasq         D    0   668      1 0x20020000
> > [  244.873063] Call Trace:
> > [  244.873697]  ? __schedule+0x843/0x950
> > [  244.874572]  ? __ldsem_down_read_nested+0x1c4/0x3b0
> > [  244.875725]  schedule+0x31/0x70
> > [  244.876576]  schedule_timeout+0x34/0x760
> > [  244.877573]  ? ftrace_likely_update+0x35/0x60
> > [  244.878660]  ? __ldsem_down_read_nested+0x1c4/0x3b0
> > [  244.879872]  ? ftrace_likely_update+0x35/0x60
> > [  244.890522]  ? ftrace_likely_update+0x35/0x60
> > [  244.891572]  ? __ldsem_down_read_nested+0x1c4/0x3b0
> > [  244.892746]  __ldsem_down_read_nested+0x23a/0x3b0
> > [  244.893861]  ? tty_ldisc_ref_wait+0x25/0x50
> > [  244.894841]  tty_ldisc_ref_wait+0x25/0x50
> > [  244.895911]  tty_compat_ioctl+0x8a/0x120
> > [  244.896916]  ? this_tty+0x80/0x80
> > [  244.897717]  __ia32_compat_sys_ioctl+0xc28/0x1ce0
> > [  244.898821]  do_int80_syscall_32+0x1d2/0x5f0
> > [  244.899830]  entry_INT80_compat+0x88/0xa0
> > [  244.909466] INFO: task dropbear:734 blocked for more than 120
> > seconds.
> > [  244.911173]       Not tainted 4.18.0-11684-g0b4f83d #1
> > [  244.912394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [  244.914176] dropbear        D    0   734      1 0x20020000
> > [  244.915446] Call Trace:
> > [  244.916068]  ? __schedule+0x843/0x950
> > [  244.916945]  ? __ldsem_down_read_nested+0x1c4/0x3b0
> > [  244.918076]  schedule+0x31/0x70
> > [  244.918832]  schedule_timeout+0x34/0x760
> > [  244.919781]  ? ftrace_likely_update+0x35/0x60
> > [  244.921104]  ? __ldsem_down_read_nested+0x1c4/0x3b0
> > [  244.922304]  ? ftrace_likely_update+0x35/0x60
> > [  244.923347]  ? ftrace_likely_update+0x35/0x60
> > [  244.924369]  ? __ldsem_down_read_nested+0x1c4/0x3b0
> > [  244.925496]  __ldsem_down_read_nested+0x23a/0x3b0
> > [  244.926598]  ? tty_ldisc_ref_wait+0x25/0x50
> > [  244.927578]  tty_ldisc_ref_wait+0x25/0x50
> > [  244.928526]  tty_compat_ioctl+0x8a/0x120
> > [  244.929449]  ? this_tty+0x80/0x80
> > [  244.930240]  __ia32_compat_sys_ioctl+0xc28/0x1ce0
> > [  244.940083]  do_int80_syscall_32+0x1d2/0x5f0
> > [  244.941310]  entry_INT80_compat+0x88/0xa0
> > [  244.944070] 
> > [  244.944070] Showing all locks held in the system:
> > [  244.945558] 1 lock held by khungtaskd/18:
> > [  244.946495]  #0: (____ptrval____) (rcu_read_lock){....}, at:
> > debug_show_all_locks+0x16/0x190
> > [  244.948503] 2 locks held by askfirst/235:
> > [  244.949439]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
> > tty_ldisc_ref_wait+0x25/0x50
> > [  244.951762]  #1: (____ptrval____) (&ldata-
> > >atomic_read_lock){+.+.}, at: n_tty_read+0x13d/0xa00
> 
> Here, it just seems to wait for input from the user.
> 
> > [  244.953799] 1 lock held by validate_data/655:
> > [  244.954814]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
> > tty_ldisc_ref_wait+0x25/0x50
> > [  244.956764] 1 lock held by dnsmasq/668:
> > [  244.957649]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
> > tty_ldisc_ref_wait+0x25/0x50
> > [  244.959598] 1 lock held by dropbear/734:
> > [  244.967564]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
> > tty_ldisc_ref_wait+0x25/0x50
> 
> Hmm, I assume there is another task waiting for write_ldsem and that
> one
> prevents these readers to proceed. Unfortunately, due to the defunct
> __ptrval__ pointer hashing crap, we do not see who is waiting for
> what.
> But I am guessing all are the same locks.

Hmm, and the writer is not able to grab it because there is already
reader who uses console, I suppose.
I wonder if we can add re-acquiring for the reader side..
That will make write acquire less likely to fail under the load.
I'll try to reproduce it.

> 
> So I think, we are forced to limit the waiting to 5 seconds in reopen
> in
> the end too (the same as we do for new open in tty_init_dev).
> 
> Dmitry, could you add the limit and handle the return value of
> tty_ldisc_lock now?

Yeah, will do.

-- 
Thanks,
             Dmitry

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ