lists.openwall.net - Open Source and information security mailing list archives
Date:	Tue, 9 Aug 2016 16:56:15 -0700
From:	Bart Van Assche <bart.vanassche@...disk.com>
To:	Oleg Nesterov <oleg@...hat.com>,
	Bart Van Assche <bvanassche@....org>
CC:	Peter Zijlstra <peterz@...radead.org>,
	"mingo@...nel.org" <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Johannes Weiner" <hannes@...xchg.org>, Neil Brown <neilb@...e.de>,
	Michael Shaver <jmshaver@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

On 08/08/2016 09:20 AM, Oleg Nesterov wrote:
> So far _I think_ that the bug is somewhere else... Say, someone clears
> PG_locked without wake_up(). Then SIGKILL sent to the task sleeping in
> sys_read() "adds" the necessary wakeup...

Hello Oleg,

Something that puzzles me is that removing the "else" keyword from 
abort_exclusive_wait() is sufficient to avoid the hang. If there were 
code that clears PG_locked without calling wake_up(), this hang would 
probably also be triggered by workloads that do not wake up 
lock_page_killable() with a signal. BTW, the 
WARN_ONCE(!list_empty(&wait->task_list) && waitqueue_active(q), "mode = 
%#x\n", mode) statement that I added in abort_exclusive_wait() just 
produced the following call stack:

Aug  9 16:16:38 ion-dev-ib-ini kernel: WARNING: CPU: 0 PID: 14767 at kernel/sched/wait.c:284 abort_exclusive_wait+0xe3/0xf0
Aug  9 16:16:38 ion-dev-ib-ini kernel: mode = 0x82
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [last unloaded: brd]
Aug  9 16:16:38 ion-dev-ib-ini kernel: CPU: 0 PID: 14767 Comm: kpartx Tainted: G        W       4.7.0-dbg+ #3
Aug  9 16:16:38 ion-dev-ib-ini kernel: Call Trace:
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [<ffffffff81320157>] dump_stack+0x68/0xa1
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [<ffffffff81061c46>] __warn+0xc6/0xe0
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [<ffffffff81061caa>] warn_slowpath_fmt+0x4a/0x50
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [<ffffffff810ab7d3>] abort_exclusive_wait+0xe3/0xf0
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [<ffffffff8161fb61>] __wait_on_bit_lock+0x61/0xa0
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [<ffffffff8114ff69>] __lock_page_killable+0xb9/0xc0
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [<ffffffff8115305a>] generic_file_read_iter+0x1ea/0x770
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [<ffffffff812134f0>] blkdev_read_iter+0x30/0x40
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [<ffffffff811d26bb>] __vfs_read+0xbb/0x130
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [<ffffffff811d2aa1>] vfs_read+0x91/0x130
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [<ffffffff811d3e24>] SyS_read+0x44/0xa0
Aug  9 16:16:38 ion-dev-ib-ini kernel:  [<ffffffff81624fe5>] entry_SYSCALL_64_fastpath+0x18/0xa8

(gdb) list *(generic_file_read_iter+0x1ea)
0xffffffff8115305a is in do_generic_file_read (mm/filemap.c:1730).
1725                    continue;
1726
1727    page_not_up_to_date:
1728                    /* Get exclusive access to the page ... */
1729                    error = lock_page_killable(page);
1730                    if (unlikely(error))
1731                            goto readpage_error;
1732
1733    page_not_up_to_date_locked:
1734                    /* Did it get truncated before we got the lock? */

Apparently the task that hangs is the same task as the one that
received the signal (PID 14767; state "D" = TASK_UNINTERRUPTIBLE):

[ 3718.134118] sysrq: SysRq : Show Blocked State
[ 3718.136234] kpartx          D ffff8803c7767838     0 14767      1 0x00000006
[ 3718.136928] Call Trace:
[ 3718.137089]  [<ffffffff8161f5b7>] schedule+0x37/0x90
[ 3718.137142]  [<ffffffff81623c0f>] schedule_timeout+0x27f/0x470
[ 3718.137603]  [<ffffffff8161e98f>] io_schedule_timeout+0x9f/0x110
[ 3718.137662]  [<ffffffff8161fd56>] bit_wait_io+0x16/0x60
[ 3718.137714]  [<ffffffff8161fb49>] __wait_on_bit_lock+0x49/0xa0
[ 3718.137764]  [<ffffffff8114fea9>] __lock_page+0xb9/0xc0
[ 3718.137865]  [<ffffffff81165dd0>] truncate_inode_pages_range+0x3e0/0x760
[ 3718.138175]  [<ffffffff81166160>] truncate_inode_pages+0x10/0x20
[ 3718.138477]  [<ffffffff81212a60>] kill_bdev+0x30/0x40
[ 3718.138529]  [<ffffffff81213d81>] __blkdev_put+0x71/0x360
[ 3718.138631]  [<ffffffff812140b9>] blkdev_put+0x49/0x170
[ 3718.138681]  [<ffffffff81214200>] blkdev_close+0x20/0x30
[ 3718.138732]  [<ffffffff811d4928>] __fput+0xe8/0x1f0
[ 3718.138782]  [<ffffffff811d4a69>] ____fput+0x9/0x10
[ 3718.138834]  [<ffffffff810842d3>] task_work_run+0x83/0xb0
[ 3718.138886]  [<ffffffff8106606e>] do_exit+0x3ee/0xc40
[ 3718.138987]  [<ffffffff8106694b>] do_group_exit+0x4b/0xc0
[ 3718.139038]  [<ffffffff81073d9a>] get_signal+0x2ca/0x940
[ 3718.139142]  [<ffffffff8101bf43>] do_signal+0x23/0x660
[ 3718.139247]  [<ffffffff810022b3>] exit_to_usermode_loop+0x73/0xb0
[ 3718.139297]  [<ffffffff81002cb0>] syscall_return_slowpath+0xb0/0xc0
[ 3718.139349]  [<ffffffff81625073>] entry_SYSCALL_64_fastpath+0xa6/0xa8

I'll try to see whether this behavior is reproducible.

Bart.
