linux-kernel - Re: [PATCH v2 8/9] atomic,x86: Alternative atomic_*

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Ybegi5slVJz3ITx0@elver.google.com>
Date:   Mon, 13 Dec 2021 20:35:39 +0100
From:   Marco Elver <elver@...gle.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Will Deacon <will@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Kees Cook <keescook@...omium.org>,
        Christoph Hellwig <hch@...radead.org>,
        Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH v2 8/9] atomic,x86: Alternative atomic_*_overflow() scheme

On Mon, Dec 13, 2021 at 10:24AM -0800, Linus Torvalds wrote:
> On Mon, Dec 13, 2021 at 10:19 AM Marco Elver <elver@...gle.com> wrote:
> >
> > I'm still genuinely worried about this:
> >
> > >       2. Yet another potentially larger issue is if some code
> > >          kmalloc()s some structs containing refcount_t, and relies on
> > >          GFP_ZERO (kzalloc()) to initialize their data assuming that a
> > >          freshly initialized refcount_t contains 0.
> >
> > Even with everything properly wrapped up in atomic_ref_t, it's not going
> > to prevent mis-initialization via kzalloc() and friends.
> 
> I agree that it's an issue, but it's not a new issue. We've had the
> exact same thing with a lot of other core data structures.
> 
> And a ref-count of zero isn't valid _anyway_. When you allocate a
> structure, a zero ref-count by definition is wrong. You need to set
> the ref-count to the user that allocated it.
> 
> So I don't actually think the "implicit zero" is an issue in practice,
> because it would be wrong in the first place. Code that relies on
> kzmalloc() to initialize a refcount cannot work right.
> 
> (And by "cannot" I obviously mean "can, if you do wrong things" - it's
> not like it's *impossible* to do an "atomic_inc_ref()" to change a 0
> refcount to a 1, but it's both wrong *AND* actively stupid, since an
> allocation does not need to set the refcount atomically).

I tried to throw syzkaller at this, but just booting results in this
warning:

| WARNING: CPU: 2 PID: 1 at block/blk-mq.c:3080 blk_mq_clear_flush_rq_mapping block/blk-mq.c:3080 [inline]
| WARNING: CPU: 2 PID: 1 at block/blk-mq.c:3080 blk_mq_exit_hctx+0x20e/0x220 block/blk-mq.c:3105
| Modules linked in:
| CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc5+ #3
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
| RIP: 0010:blk_mq_clear_flush_rq_mapping block/blk-mq.c:3080 [inline]
| RIP: 0010:blk_mq_exit_hctx+0x20e/0x220 block/blk-mq.c:3105
| Code: 00 31 d2 bf 9c 00 00 00 e8 7f 20 a2 ff e9 52 ff ff ff e8 65 22 b1 ff 4c 89 e7 e8 6d 77 00 00 e9 58 fe ff ff e8 53 22 b1 ff 90 <0f> 0b 90 e9 7e fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 41 55 41 54
| RSP: 0000:ffff9c10800139f0 EFLAGS: 00010293
| RAX: 0000000000000000 RBX: ffff970c428990b8 RCX: ffffffff9c662ecd
| RDX: ffff970c40880040 RSI: 0000000000000000 RDI: ffff970c4421e200
| RBP: ffff970c43ff9e80 R08: ffff970c43f14fe0 R09: ffff9c1080013a18
| R10: 0000000000000000 R11: 000000000001403d R12: ffff970c4421e200
| R13: ffff970c44237000 R14: 0000000000000100 R15: ffff970c4421e200
| FS:  0000000000000000(0000) GS:ffff970f6fc80000(0000) knlGS:0000000000000000
| CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 0000000000000000 CR3: 00000002c2c0c001 CR4: 0000000000770ee0
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
| PKRU: 55555554
| Call Trace:
|  <TASK>
|  blk_mq_exit_hw_queues block/blk-mq.c:3130 [inline]
|  blk_mq_exit_queue+0x70/0x1a0 block/blk-mq.c:3780
|  blk_cleanup_queue+0xba/0x120 block/blk-core.c:368
|  __scsi_remove_device+0x77/0x190 drivers/scsi/scsi_sysfs.c:1458
|  scsi_probe_and_add_lun+0xb5c/0xfa0 drivers/scsi/scsi_scan.c:1221
|  __scsi_scan_target+0x121/0x660 drivers/scsi/scsi_scan.c:1604
|  scsi_scan_channel drivers/scsi/scsi_scan.c:1692 [inline]
|  scsi_scan_channel+0x90/0xd0 drivers/scsi/scsi_scan.c:1668
|  scsi_scan_host_selected+0x117/0x170 drivers/scsi/scsi_scan.c:1721
|  do_scsi_scan_host+0xba/0xd0 drivers/scsi/scsi_scan.c:1860
|  scsi_scan_host drivers/scsi/scsi_scan.c:1890 [inline]
|  scsi_scan_host+0x1cf/0x1f0 drivers/scsi/scsi_scan.c:1878
|  virtscsi_probe+0x3a0/0x3f0 drivers/scsi/virtio_scsi.c:915
|  virtio_dev_probe+0x1e4/0x2f0 drivers/virtio/virtio.c:273
|  call_driver_probe drivers/base/dd.c:517 [inline]
|  really_probe.part.0+0xd3/0x320 drivers/base/dd.c:596
|  really_probe drivers/base/dd.c:558 [inline]
|  __driver_probe_device+0xbf/0x180 drivers/base/dd.c:751
|  driver_probe_device+0x27/0xd0 drivers/base/dd.c:781
|  __driver_attach drivers/base/dd.c:1140 [inline]
|  __driver_attach+0xb4/0x1a0 drivers/base/dd.c:1092
|  bus_for_each_dev+0xab/0x100 drivers/base/bus.c:301
|  bus_add_driver+0x1c0/0x220 drivers/base/bus.c:618
|  driver_register+0xc4/0x140 drivers/base/driver.c:171
|  init+0xa0/0xea drivers/char/virtio_console.c:2257
|  do_one_initcall+0x58/0x270 init/main.c:1297
|  do_initcall_level init/main.c:1370 [inline]
|  do_initcalls init/main.c:1386 [inline]
|  do_basic_setup init/main.c:1405 [inline]
|  kernel_init_freeable+0x1f4/0x259 init/main.c:1610
|  kernel_init+0x19/0x180 init/main.c:1499
|  ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:295
|  </TASK>

which is

	WARN_ON_ONCE(refcount_read(&flush_rq->ref) != 0);

.config is attached. Reverting this patch resolves the warning.  I
haven't been able to figure out which bit of that code assumes that
"freed" is represented by 0 refcount internally. There is an
alloc_pages() with GFP_ZERO in that code, which seems to allocate a
'struct request' pool that contains ->ref?

This just confirms my suspicion that this is a can of worms. We can try
to debug one issue after another until they are no more, but the
fundamental issue of saying that 0 is not 0 is a bug-magnet (in the
absence of real "constructors", which C doesn't have).

Thanks,
-- Marco

View attachment ".config" of type "text/plain" (134417 bytes)