Message-ID: <CAOi1vP9igmg_jayELVDq_NYXUHtByZz5a4M5+CqCMMHUxF5ofA@mail.gmail.com>
Date: Mon, 16 Nov 2015 21:59:18 +0100
From: Ilya Dryomov <idryomov@...il.com>
To: Christoph Hellwig <hch@....de>, Tejun Heo <tj@...nel.org>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
linux-fsdevel@...r.kernel.org,
Ceph Development <ceph-devel@...r.kernel.org>
Subject: request_queue use-after-free - inode_detach_wb()
Hello,
Last week, while running an rbd test which does a lot of maps and
unmaps (read: losetup / losetup -d) with slab debugging enabled, I hit
the attached splat. That 6a byte corresponds to the atomic_long_t
count of the percpu_ref refcnt in request_queue::backing_dev_info::wb,
pointing to a percpu_ref_put() on freed memory. It hasn't reproduced
since.
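If you're wondering where the 6a comes from: SLUB fills freed objects
with POISON_FREE bytes, 0x6b, so a single stray atomic decrement -
which is what percpu_ref_put() boils down to when the ref is in atomic
mode - turns the counter's low byte into 0x6a. A trivial userspace toy,
just to illustrate the arithmetic:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            unsigned long count;

            memset(&count, 0x6b, sizeof(count));    /* SLUB POISON_FREE fill */
            count--;                                /* stray percpu_ref_put() */
            printf("%02x\n", (unsigned char)count); /* prints 6a */
            return 0;
    }

Which byte of the poisoned object then reads 6a is just wherever the
counter's low byte happens to sit.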
After a prolonged stare at rbd (we'd just fixed an rbd vs sysfs
lifecycle issue, so I naturally assumed we had either missed something
or it had something to do with that patch), I looked wider and
concluded that the most likely place a stray percpu_ref_put() could
have come from was inode_detach_wb(). It's called from
__destroy_inode(), which means iput(), which means bdput().
Looking at __blkdev_put(), the issue becomes clear: we take
precautions to flush before calling out to ->release() because, at
least according to the comment, ->release() can free the queue; we
record the owner pointer because put_disk() may free both the gendisk
and the queue; and then, after all that, we call bdput(), which may
touch the queue through wb_put() in inode_detach_wb(). (The fun part
is that wb_put() is supposed to be a noop for root wbs, but slab
debugging interferes with that by poisoning the wb->bdi pointer.)
                /*
                 * ->release can cause the queue to disappear, so flush all
                 * dirty data before.
                 */
                bdev_write_inode(bdev);
        }
        if (bdev->bd_contains == bdev) {
                if (disk->fops->release)
                        disk->fops->release(disk, mode);
        }
        if (!bdev->bd_openers) {
                struct module *owner = disk->fops->owner;

                disk_put_part(bdev->bd_part);
                bdev->bd_part = NULL;
                bdev->bd_disk = NULL;
                if (bdev != bdev->bd_contains)
                        victim = bdev->bd_contains;
                bdev->bd_contains = NULL;

                put_disk(disk);               <-- may free q
                module_put(owner);
        }
        mutex_unlock(&bdev->bd_mutex);
        bdput(bdev);                          <-- may touch q.backing_dev_info.wb
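For reference, the helpers on the bdput() side look roughly like this
(CONFIG_CGROUP_WRITEBACK; quoting from memory, so approximately):

    static inline void inode_detach_wb(struct inode *inode)
    {
            if (inode->i_wb) {
                    wb_put(inode->i_wb);
                    inode->i_wb = NULL;
            }
    }

    static inline void wb_put(struct bdi_writeback *wb)
    {
            if (wb != &wb->bdi->wb)         /* noop only for bdi's root wb */
                    percpu_ref_put(&wb->refcnt);
    }

The wb != &wb->bdi->wb check is the root-wb noop mentioned above: once
the slab is poisoned, wb->bdi reads as 0x6b6b..., the check passes and
we percpu_ref_put() into freed memory.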
To reproduce, apply the attached patch (the systemd-udevd condition is
just a convenience: udev reacts to change events by getting the bdev,
which it then has to put), boot with slub_debug=,blkdev_queue and do:
$ sudo modprobe loop
$ sudo losetup /dev/loop0 foo.img
$ sudo dd if=/dev/urandom of=/dev/loop0 bs=1M count=1
$ sudo losetup -d /dev/loop0
$ sudo rmmod loop
(rmmod is key - it's the only way to get loop to do put_disk(). For
rbd, it's just rbd map - dd - rbd unmap.)
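The gist of the delay patch, if you don't want to open the attachment
- this is just the shape of it, see the diff for the real thing - is
to widen the window between put_disk() and bdput() for udev's close
(msleep() from <linux/delay.h>):

        ...
        mutex_unlock(&bdev->bd_mutex);
        if (!strcmp(current->comm, "systemd-udevd"))
                msleep(2000);   /* give rmmod time to free the queue */
        bdput(bdev);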
In the past we used to reassign to default_backing_dev_info here, but
that was nuked in b83ae6d42143 ("fs: remove mapping->backing_dev_info").
Shortly after that, Tejun's cgroup-specific writeback patches got
merged, adding inode::i_wb and the inode_detach_wb() call. The fix
seems to be to detach the inode earlier, but I'm not familiar enough
with the cgroup writeback code, so I'm sending my findings instead of
a patch. Christoph, Tejun?
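To illustrate what I mean by detaching earlier - completely untested
and quite possibly wrong in the details, just the shape of the idea -
something like this in __blkdev_put(), while the queue is still
guaranteed to be alive:

        if (!bdev->bd_openers) {
                ...
                bdev_write_inode(bdev);
                /*
                 * Drop the bdev inode's wb reference now, so the final
                 * bdput() -> inode_detach_wb() has nothing left to put
                 * after put_disk() potentially frees the queue.
                 */
                inode_detach_wb(bdev->bd_inode);
        }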
Thanks,
Ilya
View attachment "blkdev_queue-poison.txt" of type "text/plain" (25222 bytes)
View attachment "blkdev_put-delay.diff" of type "text/plain" (1580 bytes)