[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160827090328.GA9457@dator>
Date: Sat, 27 Aug 2016 11:03:29 +0200
From: Rabin Vincent <rabin@....in>
To: Vegard Nossum <vegard.nossum@...cle.com>
Cc: Jens Axboe <axboe@...com>, Tejun Heo <tj@...nel.org>,
Jan Kara <jack@...e.cz>, Al Viro <viro@...iv.linux.org.uk>,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race
On Sat, Aug 27, 2016 at 09:07:28AM +0200, Vegard Nossum wrote:
> I got this with syzkaller:
>
> general protection fault: 0000 [#1] PREEMPT SMP KASAN
> Dumping ftrace buffer:
> (ftrace buffer empty)
> CPU: 0 PID: 11941 Comm: syz-executor Not tainted 4.8.0-rc2+ #169
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
> task: ffff880110762cc0 task.stack: ffff880102290000
> RIP: 0010:[<ffffffff81f04b7a>] [<ffffffff81f04b7a>] blk_get_backing_dev_info+0x4a/0x70
> RSP: 0018:ffff880102297cd0 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90000bb4000
> RDX: 0000000000000097 RSI: 0000000000000000 RDI: 00000000000004b8
> RBP: ffff880102297cd8 R08: 0000000000000000 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff88011a010a90
> R13: ffff88011a594568 R14: ffff88011a010890 R15: 7fffffffffffffff
> FS: 00007f2445174700(0000) GS:ffff88011aa00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000200047c8 CR3: 0000000107eb5000 CR4: 00000000000006f0
> DR0: 000000000000001e DR1: 000000000000001e DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> Stack:
> 1ffff10020452f9e ffff880102297db8 ffffffff81508daa 0000000000000000
> 0000000041b58ab3 ffffffff844e89e1 ffffffff81508b30 ffffed0020452001
> 7fffffffffffffff 0000000000000000 0000000000000000 7fffffffffffffff
> Call Trace:
> [<ffffffff81508daa>] __filemap_fdatawrite_range+0x27a/0x2e0
> [<ffffffff81508b30>] ? filemap_check_errors+0xe0/0xe0
> [<ffffffff83c24b47>] ? preempt_schedule+0x27/0x30
> [<ffffffff810020ae>] ? ___preempt_schedule+0x16/0x18
> [<ffffffff81508e36>] filemap_fdatawrite+0x26/0x30
> [<ffffffff817191b0>] fdatawrite_one_bdev+0x50/0x70
> [<ffffffff817341b4>] iterate_bdevs+0x194/0x210
> [<ffffffff81719160>] ? fdatawait_one_bdev+0x70/0x70
> [<ffffffff817195f0>] ? sync_filesystem+0x240/0x240
> [<ffffffff817196be>] sys_sync+0xce/0x160
> [<ffffffff817195f0>] ? sync_filesystem+0x240/0x240
> [<ffffffff81002b60>] ? exit_to_usermode_loop+0x190/0x190
> [<ffffffff8150455a>] ? __context_tracking_exit.part.4+0x3a/0x1e0
> [<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
> [<ffffffff83c3276a>] entry_SYSCALL64_slow_path+0x25/0x25
> Code: 89 fa 48 c1 ea 03 80 3c 02 00 75 35 48 8b 9b e0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d bb b8 04 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 17 48 8b 83 b8 04 00 00 5b 5d 48 05 10 02 00 00
> RIP [<ffffffff81f04b7a>] blk_get_backing_dev_info+0x4a/0x70
> RSP <ffff880102297cd0>
>
> The problem is that sync() calls down to blk_get_backing_dev_info()
> without necessarily having the block device opened (if it races with
> another process doing close()):
>
> /**
> * blk_get_backing_dev_info - get the address of a queue's backing_dev_info
> * @bdev: device
> *
> * Locates the passed device's request queue and returns the address of its
> * backing_dev_info. This function can only be called if @bdev is opened <----
> * and the return value is never NULL.
> */
> struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
> {
> struct request_queue *q = bdev_get_queue(bdev);
>
> return &q->backing_dev_info;
> }
>
> bdev_get_queue() crashes on dereferencing bdev->bd_disk->queue because
> ->bd_disk was set to NULL when close() reached __blkdev_put().
>
> Taking bdev->bd_mutex and checking bdev->bd_disk actually seems to be a
> reliable test of whether it's safe to call filemap_fdatawrite() for the
> block device inode and completely fixes the crash for me.
>
> What I don't like about this patch is that it simply skips block devices
> which we don't have any open file descriptors for. That seems wrong to
> me because sync() should do writeback on (and wait for) _all_ devices,
> not just the ones that we happen to have an open file descriptor for.
> Imagine if we opened a device, wrote a lot of data to it, closed it,
> called sync(), and sync() returns. Now we should be guaranteed the data
> was written, but I'm not sure we are in this case. But maybe I've
> misunderstood how the writeback mechanism works.
>
> Another ugly thing is that we're now holding a new mutex over a
> potentially big chunk of code (everything that happens inside
> filemap_fdatawrite()). The only thing I can say is that it seems to
> work here.
>
> I have a reproducer that reliably triggers the problem in ~2 seconds so
> I can easily test other proposed fixes if there are any. I would also be
> happy to submit a new patch with some guidance on the Right Way to fix
> this.
>
> Cc: stable@...r.kernel.org
> Signed-off-by: Vegard Nossum <vegard.nossum@...cle.com>
Don't know what's the right fix, but I posted a slightly different one
for the same crash some months ago:
https://patchwork.kernel.org/patch/8556941/
Powered by blists - more mailing lists