Message-ID: <CAOi1vP9igmg_jayELVDq_NYXUHtByZz5a4M5+CqCMMHUxF5ofA@mail.gmail.com>
Date: Mon, 16 Nov 2015 21:59:18 +0100
From: Ilya Dryomov <idryomov@...il.com>
To: Christoph Hellwig <hch@....de>, Tejun Heo <tj@...nel.org>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
linux-fsdevel@...r.kernel.org,
Ceph Development <ceph-devel@...r.kernel.org>
Subject: request_queue use-after-free - inode_detach_wb()
Hello,
Last week, while running an rbd test which does a lot of maps and
unmaps (read: losetup / losetup -d) with slab debugging enabled, I hit
the attached splat. That 6a byte corresponds to the atomic_long_t
count of the percpu_ref refcnt in request_queue::backing_dev_info::wb,
pointing to a percpu_ref_put() on freed memory. It hasn't reproduced
since.
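If you're wondering where the 6a comes from: SLUB fills freed objects
with POISON_FREE bytes, 0x6b, so a single stray atomic decrement -
which is what percpu_ref_put() boils down to when the ref is in atomic
mode - turns the counter's low byte into 0x6a. A trivial userspace toy,
just to illustrate the arithmetic:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            unsigned long count;

            memset(&count, 0x6b, sizeof(count));    /* SLUB POISON_FREE fill */
            count--;                                /* stray percpu_ref_put() */
            printf("%02x\n", (unsigned char)count); /* prints 6a */
            return 0;
    }

Which byte of the poisoned object then reads 6a is just wherever the
counter's low byte happens to sit.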
After a prolonged stare at rbd (we'd just fixed an rbd vs sysfs
lifecycle issue, so I naturally assumed we had either missed something
or it had something to do with that patch), I looked wider and
concluded that the most likely place a stray percpu_ref_put() could
have come from was inode_detach_wb(). It's called from
__destroy_inode(), which means iput(), which means bdput().
Looking at __blkdev_put(), the issue becomes clear: we take
precautions to flush before calling out to ->release() because, at
least according to the comment, ->release() can free the queue; we
record the owner pointer because put_disk() may free both the gendisk
and the queue; and then, after all that, we call bdput(), which may
touch the queue through wb_put() in inode_detach_wb(). (The fun part
is that wb_put() is supposed to be a noop for root wbs, but slab
debugging interferes with that by poisoning the wb->bdi pointer.)
                /*
                 * ->release can cause the queue to disappear, so flush all
                 * dirty data before.
                 */
                bdev_write_inode(bdev);
        }
        if (bdev->bd_contains == bdev) {
                if (disk->fops->release)
                        disk->fops->release(disk, mode);
        }
        if (!bdev->bd_openers) {
                struct module *owner = disk->fops->owner;

                disk_put_part(bdev->bd_part);
                bdev->bd_part = NULL;
                bdev->bd_disk = NULL;
                if (bdev != bdev->bd_contains)
                        victim = bdev->bd_contains;
                bdev->bd_contains = NULL;

                put_disk(disk);               <-- may free q
                module_put(owner);
        }
        mutex_unlock(&bdev->bd_mutex);
        bdput(bdev);                          <-- may touch q.backing_dev_info.wb
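For reference, the helpers on the bdput() side look roughly like this
(CONFIG_CGROUP_WRITEBACK; quoting from memory, so approximately):

    static inline void inode_detach_wb(struct inode *inode)
    {
            if (inode->i_wb) {
                    wb_put(inode->i_wb);
                    inode->i_wb = NULL;
            }
    }

    static inline void wb_put(struct bdi_writeback *wb)
    {
            if (wb != &wb->bdi->wb)         /* noop only for bdi's root wb */
                    percpu_ref_put(&wb->refcnt);
    }

The wb != &wb->bdi->wb check is the root-wb noop mentioned above: once
the slab is poisoned, wb->bdi reads as 0x6b6b..., the check passes and
we percpu_ref_put() into freed memory.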
To reproduce, apply the attached patch (the systemd-udevd condition is
just a convenience: udev reacts to change events by getting the bdev,
which it then has to put), boot with slub_debug=,blkdev_queue and do:
$ sudo modprobe loop
$ sudo losetup /dev/loop0 foo.img
$ sudo dd if=/dev/urandom of=/dev/loop0 bs=1M count=1
$ sudo losetup -d /dev/loop0
$ sudo rmmod loop
(rmmod is key - it's the only way to get loop to do put_disk(). For
rbd, it's just rbd map - dd - rbd unmap.)
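The gist of the delay patch, if you don't want to open the attachment
- this is just the shape of it, see the diff for the real thing - is
to widen the window between put_disk() and bdput() for udev's close
(msleep() from <linux/delay.h>):

        ...
        mutex_unlock(&bdev->bd_mutex);
        if (!strcmp(current->comm, "systemd-udevd"))
                msleep(2000);   /* give rmmod time to free the queue */
        bdput(bdev);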
In the past we used to reassign to default_backing_dev_info here, but
that was nuked in b83ae6d42143 ("fs: remove mapping->backing_dev_info").
Shortly after that, Tejun's cgroup-specific writeback patches got
merged, adding inode::i_wb and the inode_detach_wb() call. The fix
seems to be to detach the inode earlier, but I'm not familiar enough
with the cgroup writeback code, so I'm sending my findings instead of
a patch. Christoph, Tejun?
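To illustrate what I mean by detaching earlier - completely untested
and quite possibly wrong in the details, just the shape of the idea -
something like this in __blkdev_put(), while the queue is still
guaranteed to be alive:

        if (!bdev->bd_openers) {
                ...
                bdev_write_inode(bdev);
                /*
                 * Drop the bdev inode's wb reference now, so the final
                 * bdput() -> inode_detach_wb() has nothing left to put
                 * after put_disk() potentially frees the queue.
                 */
                inode_detach_wb(bdev->bd_inode);
        }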
Thanks,
Ilya
View attachment "blkdev_queue-poison.txt" of type "text/plain" (25222 bytes)
View attachment "blkdev_put-delay.diff" of type "text/plain" (1580 bytes)