lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <57F363EC.8040304@kyup.com>
Date:   Tue, 4 Oct 2016 11:10:20 +0300
From:   Nikolay Borisov <kernel@...p.com>
To:     Peter Hurley <peter@...leysoftware.com>
Cc:     "Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>
Subject: more hangs in the tty layer

Hello Peter, 

I've emailed you before re. spurious hangs in the TTY layer, but 
at that time I was running a rather old (but LTS) 3.12 kernel. Now, 
I'm running a 4.4.10 and I still observe the following lock-ups. 
I have multiple processes which hang with the following callstack: 

[4592064.967445] sulogin         D ffff8839bca53b28     0 38279  38274 0x00000006
[4592064.967448]  ffff8839bca53b28 ffff881ff2ed44c0 ffff883748d85280 ffff8839bca54000
[4592064.967452]  ffff883748d85280 ffff881ff07c988c 00000000ffffffff ffff881ff07c9890
[4592064.967454]  ffff8839bca53b40 ffffffff8161020c ffff881ff07c9888 ffff8839bca53b50
[4592064.967458] Call Trace:
[4592064.967460]  [<ffffffff8161020c>] schedule+0x3c/0x90
[4592064.967463]  [<ffffffff81610505>] schedule_preempt_disabled+0x15/0x20
[4592064.967466]  [<ffffffff816121e0>] __mutex_lock_slowpath+0xb0/0x130
[4592064.967468]  [<ffffffff8161228d>] mutex_lock+0x2d/0x40
[4592064.967471]  [<ffffffff81614388>] tty_lock+0x38/0x90
[4592064.967474]  [<ffffffff8138e6a5>] tty_release+0x55/0x590
[4592064.967478]  [<ffffffff8119a8ea>] __fput+0xea/0x1f0
[4592064.967481]  [<ffffffff8119aa2e>] ____fput+0xe/0x10
[4592064.967485]  [<ffffffff8106fc6f>] task_work_run+0x7f/0xb0
[4592064.967488]  [<ffffffff810554d3>] do_exit+0x373/0xb70
[4592064.967490]  [<ffffffff816120c2>] ? __mutex_unlock_slowpath+0x32/0x40
[4592064.967493]  [<ffffffff816120eb>] ? mutex_unlock+0x1b/0x20
[4592064.967495]  [<ffffffff81390e5b>] ? n_tty_read+0x31b/0x9e0
[4592064.967498]  [<ffffffff81055d53>] do_group_exit+0x43/0xd0
[4592064.967500]  [<ffffffff81061bd1>] get_signal+0x2e1/0x710
[4592064.967503]  [<ffffffff810053c8>] do_signal+0x28/0x770
[4592064.967505]  [<ffffffff81394d36>] ? tty_ldisc_deref+0x16/0x20
[4592064.967508]  [<ffffffff8138cfbc>] ? tty_read+0x9c/0xf0
[4592064.967511]  [<ffffffff811987a8>] ? __vfs_read+0x28/0xd0
[4592064.967514]  [<ffffffff812a7d53>] ? security_file_permission+0xa3/0xc0
[4592064.967517]  [<ffffffff810021da>] exit_to_usermode_loop+0x8a/0xd0
[4592064.967520]  [<ffffffff81002a90>] syscall_return_slowpath+0x80/0xf0
[4592064.967523]  [<ffffffff816145f4>] int_ret_from_sys_call+0x25/0x9f

After debugging the memory dump of the server it turned out the 
following task is holding the mutex on which the sulogin is stuck:

PID: 38274  TASK: ffff88395a109b80  CPU: 13  COMMAND: "sh"
 #0 [ffff8837aba27ac8] __schedule at ffffffff8160fa9a
 #1 [ffff8837aba27b10] schedule at ffffffff8161020c
 #2 [ffff8837aba27b28] schedule_timeout at ffffffff81613472
 #3 [ffff8837aba27ba8] ldsem_down_write at ffffffff81613cc5
 #4 [ffff8837aba27c10] tty_ldisc_lock at ffffffff816142e6
 #5 [ffff8837aba27c28] tty_ldisc_hangup at ffffffff8139561a
 #6 [ffff8837aba27c50] __tty_hangup at ffffffff8138cd03
 #7 [ffff8837aba27ca0] disassociate_ctty at ffffffff8138d2b9
 #8 [ffff8837aba27cd0] disassociate_ctty at ffffffff8138d5d9
 #9 [ffff8837aba27ce0] do_exit at ffffffff81055908
#10 [ffff8837aba27d48] do_group_exit at ffffffff81055d53
#11 [ffff8837aba27d70] get_signal at ffffffff81061bd1
#12 [ffff8837aba27de8] do_signal at ffffffff810053c8
#13 [ffff8837aba27ee0] exit_to_usermode_loop at ffffffff810021da
#14 [ffff8837aba27f18] syscall_return_slowpath at ffffffff81002a90
#15 [ffff8837aba27f50] int_ret_from_sys_call at ffffffff816145f4
    RIP: 00007f18fde88aba  RSP: 00007ffc47a1fa78  RFLAGS: 00000246
    RAX: fffffffffffffe00  RBX: 0000561242611d20  RCX: 00007f18fde88aba
    RDX: 0000000000000000  RSI: 00007ffc47a1fa9c  RDI: 00000000ffffffff
    RBP: 000056124260f948   R8: 0000000000000000   R9: 00007f18fe3a5740
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007ffc47a1fb20
    R13: 00007ffc47a1faa0  R14: 0000000000000000  R15: 0000000000000001
    ORIG_RAX: 000000000000003d  CS: 0033  SS: 002b

But I also observed also the following hung-up task, 
waiting for read access: 

[4592064.968193] index.pl        D ffff881430387ce8     0 47168      1 0x00000004
[4592064.968196]  ffff881430387ce8 ffff881ff2ed0dc0 ffff8808a0960000 ffff881430388000
[4592064.968200]  ffff881ff07c9828 ffff881ff07c9830 ffff8808a0960000 7fffffffffffffff
[4592064.968203]  ffff881430387d00 ffffffff8161020c ffffffff00000004 ffff881430387d80
[4592064.968205] Call Trace:
[4592064.968208]  [<ffffffff8161020c>] schedule+0x3c/0x90
[4592064.968211]  [<ffffffff81613472>] schedule_timeout+0x202/0x260
[4592064.968216]  [<ffffffff812a6a48>] ? security_capable+0x48/0x60
[4592064.968219]  [<ffffffff81613ab2>] ldsem_down_read+0x122/0x240
[4592064.968222]  [<ffffffff813992cc>] ? vt_ioctl+0x6c/0x1460
[4592064.968225]  [<ffffffff81394c90>] tty_ldisc_ref_wait+0x20/0x50
[4592064.968228]  [<ffffffff8138da37>] tty_ioctl+0x417/0xd40
[4592064.968232]  [<ffffffff8115a976>] ? handle_mm_fault+0x19b6/0x19c0
[4592064.968236]  [<ffffffff811abe8f>] do_vfs_ioctl+0x30f/0x560
[4592064.968239]  [<ffffffff811af8f2>] ? dput+0x192/0x230
[4592064.968242]  [<ffffffff811ac159>] SyS_ioctl+0x79/0x90
[4592064.968245]  [<ffffffff8161449b>] entry_SYSCALL_64_fastpath+0x16/0x6e

I checked and all the tasks are working on the same tty_struct. Here
is the state of the struct: http://paste.ubuntu.com/23273973/

To me this seems like a lost wakeup event, especially that I've 
seen patches flying around fixing such issues. Unfortunately I 
cannot reliably reproduce this and booting a production machine
with lockdep would be problematic. 

Regards, 
Nikolay 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ