Message-ID: <CACVXFVNcJ+tC6RUT+JUA3iw+STB+q4P_u+AvoOSrt04zEw8TZA@mail.gmail.com>
Date: Thu, 7 Jun 2018 11:29:32 +0800
From: Ming Lei <tom.leiming@...il.com>
To: Ming Lei <ming.lei@...hat.com>
Cc: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Jens Axboe <axboe@...nel.dk>,
Bart Van Assche <Bart.VanAssche@....com>,
Dmitry Vyukov <dvyukov@...gle.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-block <linux-block@...r.kernel.org>,
Johannes Thumshirn <jthumshirn@...e.de>,
alan.christopher.jenkins@...il.com,
syzbot+c4f9cebf9d651f6e54de@...kaller.appspotmail.com,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Dan Williams <dan.j.williams@...el.com>,
Christoph Hellwig <hch@....de>,
Oleksandr Natalenko <oleksandr@...alenko.name>,
martin@...htvoll.de, Hannes Reinecke <hare@...e.com>,
syzkaller-bugs@...glegroups.com,
Ross Zwisler <ross.zwisler@...ux.intel.com>,
Keith Busch <keith.busch@...el.com>,
"open list:EXT4 FILE SYSTEM" <linux-ext4@...r.kernel.org>
Subject: Re: INFO: task hung in blk_queue_enter

On Tue, Jun 5, 2018 at 8:41 AM, Ming Lei <ming.lei@...hat.com> wrote:
> On Tue, Jun 05, 2018 at 09:27:41AM +0900, Tetsuo Handa wrote:
>> Jens Axboe wrote:
>> > On 6/1/18 4:10 AM, Tetsuo Handa wrote:
>> > > Tetsuo Handa wrote:
>> > >> Since sum of percpu_count did not change after percpu_ref_kill(), this is
>> > >> not a race condition while folding percpu counter values into atomic counter
>> > >> value. That is, for some reason, someone who is responsible for calling
>> > >> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is
>> > >> unable to call percpu_ref_put().
>> > >> But I don't know how to find someone who is failing to call percpu_ref_put()...
>> > >
>> > > I found the someone. It was already there in the backtrace...
>> > >
>> >
>> > Ahh, nicely spotted! One idea would be the one below. For this case,
>> > we're recursing, so we can either do a non-block queue enter, or we
>> > can just do a live enter.
>> >
>>
>> While "block: don't use blocking queue entered for recursive bio submits" was
>> already applied, syzbot is still reporting a hung task with same signature but
>> different trace.
>>
>> https://syzkaller.appspot.com/text?tag=CrashLog&x=1432cedf800000
>> ----------------------------------------
>> [ 492.512243] INFO: task syz-executor1:20263 blocked for more than 120 seconds.
>> [ 492.519604] Not tainted 4.17.0+ #83
>> [ 492.523793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 492.531787] syz-executor1 D23384 20263 4574 0x00000004
>> [ 492.537443] Call Trace:
>> [ 492.540041] __schedule+0x801/0x1e30
>> [ 492.580958] schedule+0xef/0x430
>> [ 492.610154] blk_queue_enter+0x8da/0xdf0
>> [ 492.716327] generic_make_request+0x651/0x1790
>> [ 492.765680] submit_bio+0xba/0x460
>> [ 492.793198] submit_bio_wait+0x134/0x1e0
>> [ 492.801891] blkdev_issue_flush+0x204/0x300
>> [ 492.806236] blkdev_fsync+0x93/0xd0
>> [ 492.813620] vfs_fsync_range+0x140/0x220
>> [ 492.817702] vfs_fsync+0x29/0x30
>> [ 492.821081] __loop_update_dio+0x4de/0x6a0
>> [ 492.825341] lo_ioctl+0xd28/0x2190
>> [ 492.833442] blkdev_ioctl+0x9b6/0x2020
>> [ 492.872146] block_ioctl+0xee/0x130
>> [ 492.880139] do_vfs_ioctl+0x1cf/0x16a0
>> [ 492.927550] ksys_ioctl+0xa9/0xd0
>> [ 492.931036] __x64_sys_ioctl+0x73/0xb0
>> [ 492.934952] do_syscall_64+0x1b1/0x800
>> [ 492.963624] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [ 493.212768] 1 lock held by syz-executor1/20263:
>> [ 493.217448] #0: 00000000956bf5a3 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8d/0x2190
>> ----------------------------------------
>>
>> Is it OK to call [__]loop_update_dio() between blk_mq_freeze_queue() and
>> blk_mq_unfreeze_queue(), for vfs_fsync() from __loop_update_dio() is calling
>> blk_queue_enter() after blk_mq_freeze_queue() started blocking blk_queue_enter()
>> by calling atomic_inc_return() and percpu_ref_kill()?
>>
>
> The vfs_fsync() isn't necessary in loop_update_dio() since both
> generic_file_write_iter() and generic_file_read_iter() can handle
> buffered io vs dio well.
>
> I will send one patch to remove the vfs_fsync() later.

Hi Tetsuo,

The issue might be fixed by removing this vfs_fsync(), but I'd like to
understand the reason behind the hang, since vfs_fsync() shouldn't have
caused any IO to this loop queue.
I also tried to reproduce it with the following syzbot C reproducer, but
haven't been able to trigger it yet after running it for several hours.

https://syzkaller.appspot.com/x/repro.c?id=4727023951937536

Could you share how you reproduce it?

Thanks,
Ming Lei