Message-ID: <09659231-5e0e-2767-d180-4285fb6e12b3@sandisk.com>
Date:	Tue, 9 Aug 2016 16:10:02 -0700
From:	Bart Van Assche <bart.vanassche@...disk.com>
To:	Oleg Nesterov <oleg@...hat.com>
CC:	Peter Zijlstra <peterz@...radead.org>,
	"mingo@...nel.org" <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Johannes Weiner" <hannes@...xchg.org>, Neil Brown <neilb@...e.de>,
	Michael Shaver <jmshaver@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

On 08/09/2016 11:48 AM, Bart Van Assche wrote:
> [ 1548.018115] sysrq: SysRq : Show Blocked State
> [ 1548.018210]   task                        PC stack   pid father
> [ 1548.018677] systemd-udevd   D ffff8803a9f13be8     0 29908    483 0x00000000
> [ 1548.018792]  ffff8803a9f13be8 ffffffff82584bd0 00ffffff8252b1b0 ffff88046f0569c0
> [ 1548.018961]  ffff88016c98b140 ffff8800757bc9c0 ffff8803a9f14000 ffff88046f0569c0
> [ 1548.019131]  7fffffffffffffff ffffffff8161fcf0 ffff8803a9f13d50 ffff8803a9f13c00
> [ 1548.019316] Call Trace:
> [ 1548.019415]  [<ffffffff8161f567>] schedule+0x37/0x90
> [ 1548.019464]  [<ffffffff81623bbf>] schedule_timeout+0x27f/0x470
> [ 1548.019758]  [<ffffffff8161e93f>] io_schedule_timeout+0x9f/0x110
> [ 1548.019808]  [<ffffffff8161fd06>] bit_wait_io+0x16/0x60
> [ 1548.019856]  [<ffffffff8161f996>] __wait_on_bit+0x56/0x80
> [ 1548.019906]  [<ffffffff81152e1d>] wait_on_page_bit_killable+0xbd/0xc0
> [ 1548.020006]  [<ffffffff81152f50>] generic_file_read_iter+0x130/0x770
> [ 1548.020158]  [<ffffffff812134a0>] blkdev_read_iter+0x30/0x40
> [ 1548.020209]  [<ffffffff811d266b>] __vfs_read+0xbb/0x130
> [ 1548.020258]  [<ffffffff811d2a51>] vfs_read+0x91/0x130
> [ 1548.020305]  [<ffffffff811d3dd4>] SyS_read+0x44/0xa0
> [ 1548.020354]  [<ffffffff81624fa5>] entry_SYSCALL_64_fastpath+0x18/0xa8

(replying to my own e-mail)

The above call stack is probably caused by a missing I/O completion
somewhere in the I/O stack (not in ib_srp) and hence can be ignored in
the context of the discussion about __wait_on_bit_lock(). BTW, I have
made the following local change in abort_exclusive_wait(), in the hope
that, if I can trigger this warning, it will provide more information
about why the __wait_on_bit_lock() hang happens:

diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index f0fdd8e..fad852d 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -280,6 +280,8 @@ void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,

         __set_current_state(TASK_RUNNING);
         spin_lock_irqsave(&q->lock, flags);
+       WARN_ONCE(!list_empty(&wait->task_list) && waitqueue_active(q),
+                 "mode = %#x\n", mode);
         if (!list_empty(&wait->task_list))
                 list_del_init(&wait->task_list);
         else if (waitqueue_active(q))
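
The branch that this WARN_ONCE() instruments can be modelled in userspace to see when the condition holds. What follows is only a sketch under stated assumptions: struct wq_head, struct wq_entry, enum abort_action and model_abort_exclusive_wait() are hypothetical stand-ins, not kernel code, and locking and the actual wakeup are left out. Only the list handling and the condition from the hunk above are mirrored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (illustrative only). */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}

/* Hypothetical, lock-free models of wait_queue_head_t and wait_queue_t. */
struct wq_head { struct list_head task_list; };
struct wq_entry { struct list_head task_list; };

static bool waitqueue_active(const struct wq_head *q)
{
	return !list_empty(&q->task_list);
}

enum abort_action { ABORT_DEQUEUED, ABORT_WOKE_NEXT, ABORT_NOTHING };

/*
 * Models the three-way branch from the hunk. *would_warn records whether
 * the added WARN_ONCE() condition would have been true on entry.
 */
static enum abort_action model_abort_exclusive_wait(struct wq_head *q,
						    struct wq_entry *wait,
						    bool *would_warn)
{
	*would_warn = !list_empty(&wait->task_list) && waitqueue_active(q);

	if (!list_empty(&wait->task_list)) {
		/* The waiter is still queued: take it off the queue. */
		list_del_init(&wait->task_list);
		return ABORT_DEQUEUED;
	}
	if (waitqueue_active(q))
		/* Already dequeued: the kernel wakes another waiter here. */
		return ABORT_WOKE_NEXT;
	return ABORT_NOTHING;
}
```

In this model the warning condition is true exactly when the aborting waiter is still on the queue, because a queued waiter implies a non-empty queue.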

Bart.