linux-kernel - Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a4a122ba-ed36-f890-56c3-96d1660378a4@oracle.com>
Date:   Sat, 27 Aug 2016 11:30:22 +0200
From:   Vegard Nossum <vegard.nossum@...cle.com>
To:     Rabin Vincent <rabin@....in>
Cc:     Jens Axboe <axboe@...com>, Tejun Heo <tj@...nel.org>,
        Jan Kara <jack@...e.cz>, Al Viro <viro@...iv.linux.org.uk>,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        stable@...r.kernel.org
Subject: Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race

On 08/27/2016 11:03 AM, Rabin Vincent wrote:
> On Sat, Aug 27, 2016 at 09:07:28AM +0200, Vegard Nossum wrote:
>> I got this with syzkaller:
>>
>>     general protection fault: 0000 [#1] PREEMPT SMP KASAN
>>     Dumping ftrace buffer:
>>        (ftrace buffer empty)
>>     CPU: 0 PID: 11941 Comm: syz-executor Not tainted 4.8.0-rc2+ #169
>>     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
>>     task: ffff880110762cc0 task.stack: ffff880102290000
>>     RIP: 0010:[<ffffffff81f04b7a>]  [<ffffffff81f04b7a>] blk_get_backing_dev_info+0x4a/0x70
>>     RSP: 0018:ffff880102297cd0  EFLAGS: 00010202
>>     RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90000bb4000
>>     RDX: 0000000000000097 RSI: 0000000000000000 RDI: 00000000000004b8
>>     RBP: ffff880102297cd8 R08: 0000000000000000 R09: 0000000000000001
>>     R10: 0000000000000000 R11: 0000000000000001 R12: ffff88011a010a90
>>     R13: ffff88011a594568 R14: ffff88011a010890 R15: 7fffffffffffffff
>>     FS:  00007f2445174700(0000) GS:ffff88011aa00000(0000) knlGS:0000000000000000
>>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>     CR2: 00000000200047c8 CR3: 0000000107eb5000 CR4: 00000000000006f0
>>     DR0: 000000000000001e DR1: 000000000000001e DR2: 0000000000000000
>>     DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>>     Stack:
>>      1ffff10020452f9e ffff880102297db8 ffffffff81508daa 0000000000000000
>>      0000000041b58ab3 ffffffff844e89e1 ffffffff81508b30 ffffed0020452001
>>      7fffffffffffffff 0000000000000000 0000000000000000 7fffffffffffffff
>>     Call Trace:
>>      [<ffffffff81508daa>] __filemap_fdatawrite_range+0x27a/0x2e0
>>      [<ffffffff81508b30>] ? filemap_check_errors+0xe0/0xe0
>>      [<ffffffff83c24b47>] ? preempt_schedule+0x27/0x30
>>      [<ffffffff810020ae>] ? ___preempt_schedule+0x16/0x18
>>      [<ffffffff81508e36>] filemap_fdatawrite+0x26/0x30
>>      [<ffffffff817191b0>] fdatawrite_one_bdev+0x50/0x70
>>      [<ffffffff817341b4>] iterate_bdevs+0x194/0x210
>>      [<ffffffff81719160>] ? fdatawait_one_bdev+0x70/0x70
>>      [<ffffffff817195f0>] ? sync_filesystem+0x240/0x240
>>      [<ffffffff817196be>] sys_sync+0xce/0x160
>>      [<ffffffff817195f0>] ? sync_filesystem+0x240/0x240
>>      [<ffffffff81002b60>] ? exit_to_usermode_loop+0x190/0x190
>>      [<ffffffff8150455a>] ? __context_tracking_exit.part.4+0x3a/0x1e0
>>      [<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
>>      [<ffffffff83c3276a>] entry_SYSCALL64_slow_path+0x25/0x25
>>     Code: 89 fa 48 c1 ea 03 80 3c 02 00 75 35 48 8b 9b e0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d bb b8 04 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 17 48 8b 83 b8 04 00 00 5b 5d 48 05 10 02 00 00
>>     RIP  [<ffffffff81f04b7a>] blk_get_backing_dev_info+0x4a/0x70
>>      RSP <ffff880102297cd0>
>>
>> The problem is that sync() calls down to blk_get_backing_dev_info()
>> without necessarily having the block device opened (if it races with
>> another process doing close()):
>>
>>     /**
>>      * blk_get_backing_dev_info - get the address of a queue's backing_dev_info
>>      * @bdev:       device
>>      *
>>      * Locates the passed device's request queue and returns the address of its
>>      * backing_dev_info.  This function can only be called if @bdev is opened      <----
>>      * and the return value is never NULL.
>>      */
>>     struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
>>     {
>>             struct request_queue *q = bdev_get_queue(bdev);
>>
>>             return &q->backing_dev_info;
>>     }
>>
>> bdev_get_queue() crashes on dereferencing bdev->bd_disk->queue because
>> ->bd_disk was set to NULL when close() reached __blkdev_put().
>>
>> Taking bdev->bd_mutex and checking bdev->bd_disk actually seems to be a
>> reliable test of whether it's safe to call filemap_fdatawrite() for the
>> block device inode and completely fixes the crash for me.
>>
>> What I don't like about this patch is that it simply skips block devices
>> which we don't have any open file descriptors for. That seems wrong to
>> me because sync() should do writeback on (and wait for) _all_ devices,
>> not just the ones that we happen to have an open file descriptor for.
>> Imagine if we opened a device, wrote a lot of data to it, closed it,
>> called sync(), and sync() returns. Now we should be guaranteed the data
>> was written, but I'm not sure we are in this case. But maybe I've
>> misunderstood how the writeback mechanism works.
>>
>> Another ugly thing is that we're now holding a new mutex over a
>> potentially big chunk of code (everything that happens inside
>> filemap_fdatawrite()). The only thing I can say is that it seems to
>> work here.
>>
>> I have a reproducer that reliably triggers the problem in ~2 seconds so
>> I can easily test other proposed fixes if there are any. I would also be
>> happy to submit a new patch with some guidance on the Right Way to fix
>> this.
>>
>> Cc: stable@...r.kernel.org
>> Signed-off-by: Vegard Nossum <vegard.nossum@...cle.com>
>
> Don't know what's the right fix, but I posted a slightly different one
> for the same crash some months ago:
> https://patchwork.kernel.org/patch/8556941/
>

Ah, I'm sorry, I didn't see that.

Your patch is 100% identical to my first attempt at a fix and I can
confirm that it also fixes the problem for me.

If people who are more savvy in block/fs code could ack the locking bits
I think we should apply the patch ASAP because it's an easy local DOS if
you have (open/read) access to any block device.


Vegard