[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Ybegi5slVJz3ITx0@elver.google.com>
Date: Mon, 13 Dec 2021 20:35:39 +0100
From: Marco Elver <elver@...gle.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Will Deacon <will@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Kees Cook <keescook@...omium.org>,
Christoph Hellwig <hch@...radead.org>,
Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH v2 8/9] atomic,x86: Alternative atomic_*_overflow() scheme
On Mon, Dec 13, 2021 at 10:24AM -0800, Linus Torvalds wrote:
> On Mon, Dec 13, 2021 at 10:19 AM Marco Elver <elver@...gle.com> wrote:
> >
> > I'm still genuinely worried about this:
> >
> > > 2. Yet another potentially larger issue is if some code
> > > kmalloc()s some structs containing refcount_t, and relies on
> > > GFP_ZERO (kzalloc()) to initialize their data assuming that a
> > > freshly initialized refcount_t contains 0.
> >
> > Even with everything properly wrapped up in atomic_ref_t, it's not going
> > to prevent mis-initialization via kzalloc() and friends.
>
> I agree that it's an issue, but it's not a new issue. We've had the
> exact same thing with a lot of other core data structures.
>
> And a ref-count of zero isn't valid _anyway_. When you allocate a
> structure, a zero ref-count by definition is wrong. You need to set
> the ref-count to the user that allocated it.
>
> So I don't actually think the "implicit zero" is an issue in practice,
> because it would be wrong in the first place. Code that relies on
> kzmalloc() to initialize a refcount cannot work right.
>
> (And by "cannot" I obviously mean "can, if you do wrong things" - it's
> not like it's *impossible* to do an "atomic_inc_ref()" to change a 0
> refcount to a 1, but it's both wrong *AND* actively stupid, since an
> allocation does not need to set the refcount atomically).
I tried to throw syzkaller at this, but just booting results in this
warning:
| WARNING: CPU: 2 PID: 1 at block/blk-mq.c:3080 blk_mq_clear_flush_rq_mapping block/blk-mq.c:3080 [inline]
| WARNING: CPU: 2 PID: 1 at block/blk-mq.c:3080 blk_mq_exit_hctx+0x20e/0x220 block/blk-mq.c:3105
| Modules linked in:
| CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc5+ #3
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
| RIP: 0010:blk_mq_clear_flush_rq_mapping block/blk-mq.c:3080 [inline]
| RIP: 0010:blk_mq_exit_hctx+0x20e/0x220 block/blk-mq.c:3105
| Code: 00 31 d2 bf 9c 00 00 00 e8 7f 20 a2 ff e9 52 ff ff ff e8 65 22 b1 ff 4c 89 e7 e8 6d 77 00 00 e9 58 fe ff ff e8 53 22 b1 ff 90 <0f> 0b 90 e9 7e fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 41 55 41 54
| RSP: 0000:ffff9c10800139f0 EFLAGS: 00010293
| RAX: 0000000000000000 RBX: ffff970c428990b8 RCX: ffffffff9c662ecd
| RDX: ffff970c40880040 RSI: 0000000000000000 RDI: ffff970c4421e200
| RBP: ffff970c43ff9e80 R08: ffff970c43f14fe0 R09: ffff9c1080013a18
| R10: 0000000000000000 R11: 000000000001403d R12: ffff970c4421e200
| R13: ffff970c44237000 R14: 0000000000000100 R15: ffff970c4421e200
| FS: 0000000000000000(0000) GS:ffff970f6fc80000(0000) knlGS:0000000000000000
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 0000000000000000 CR3: 00000002c2c0c001 CR4: 0000000000770ee0
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
| PKRU: 55555554
| Call Trace:
| <TASK>
| blk_mq_exit_hw_queues block/blk-mq.c:3130 [inline]
| blk_mq_exit_queue+0x70/0x1a0 block/blk-mq.c:3780
| blk_cleanup_queue+0xba/0x120 block/blk-core.c:368
| __scsi_remove_device+0x77/0x190 drivers/scsi/scsi_sysfs.c:1458
| scsi_probe_and_add_lun+0xb5c/0xfa0 drivers/scsi/scsi_scan.c:1221
| __scsi_scan_target+0x121/0x660 drivers/scsi/scsi_scan.c:1604
| scsi_scan_channel drivers/scsi/scsi_scan.c:1692 [inline]
| scsi_scan_channel+0x90/0xd0 drivers/scsi/scsi_scan.c:1668
| scsi_scan_host_selected+0x117/0x170 drivers/scsi/scsi_scan.c:1721
| do_scsi_scan_host+0xba/0xd0 drivers/scsi/scsi_scan.c:1860
| scsi_scan_host drivers/scsi/scsi_scan.c:1890 [inline]
| scsi_scan_host+0x1cf/0x1f0 drivers/scsi/scsi_scan.c:1878
| virtscsi_probe+0x3a0/0x3f0 drivers/scsi/virtio_scsi.c:915
| virtio_dev_probe+0x1e4/0x2f0 drivers/virtio/virtio.c:273
| call_driver_probe drivers/base/dd.c:517 [inline]
| really_probe.part.0+0xd3/0x320 drivers/base/dd.c:596
| really_probe drivers/base/dd.c:558 [inline]
| __driver_probe_device+0xbf/0x180 drivers/base/dd.c:751
| driver_probe_device+0x27/0xd0 drivers/base/dd.c:781
| __driver_attach drivers/base/dd.c:1140 [inline]
| __driver_attach+0xb4/0x1a0 drivers/base/dd.c:1092
| bus_for_each_dev+0xab/0x100 drivers/base/bus.c:301
| bus_add_driver+0x1c0/0x220 drivers/base/bus.c:618
| driver_register+0xc4/0x140 drivers/base/driver.c:171
| init+0xa0/0xea drivers/char/virtio_console.c:2257
| do_one_initcall+0x58/0x270 init/main.c:1297
| do_initcall_level init/main.c:1370 [inline]
| do_initcalls init/main.c:1386 [inline]
| do_basic_setup init/main.c:1405 [inline]
| kernel_init_freeable+0x1f4/0x259 init/main.c:1610
| kernel_init+0x19/0x180 init/main.c:1499
| ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:295
| </TASK>
which is
WARN_ON_ONCE(refcount_read(&flush_rq->ref) != 0);
.config is attached. Reverting this patch resolves the warning. I
haven't been able to figure out which bit of that code assumes that
"freed" is represented by 0 refcount internally. There is an
alloc_pages() with GFP_ZERO in that code, which seems to allocate a
'struct request' pool that contains ->ref?
This just confirms my suspicion that this is a can of worms. We can try
to debug one issue after another until they are no more, but the
fundamental issue of saying that 0 is not 0 is a bug-magnet (in the
absence of real "constructors", which C doesn't have).
Thanks,
-- Marco
View attachment ".config" of type "text/plain" (134417 bytes)
Powered by blists - more mailing lists