Date: Sat, 7 Jul 2018 03:41:30 +0200
From: Tomas Bortoli <tomasbortoli@...il.com>
To: dledford@...hat.com, jgg@...pe.ca
Cc: leon@...nel.org, parav@...lanox.com, roland@...estorage.com,
swise@...ngridcomputing.com, linux-rdma@...r.kernel.org,
linux-kernel@...r.kernel.org, syzkaller@...glegroups.com
Subject: [PATCH] KASAN: use-after-free Read in rdma_listen
Hi,
I spent some time debugging the issue found by Syzkaller that is named
in the subject:
https://syzkaller.appspot.com/bug?id=b8febdb3c7c8c1f1b606fb903cee66b21b2fd02f
I traced the UAF back to the cma_listen_on_all() function, which adds
"id_priv->list" to the global variable "listen_any_list"; that element
is then never removed in the rdma_destroy_id() function. (The call to
cma_release_dev() in rdma_destroy_id() looks like it should do the
removal, but it does not get executed.)
Therefore, if a program allocates a "struct rdma_cm_id" (through
ucma_open + ucma_create_id), executes cma_listen_on_all(), frees the
struct and then repeats, the second execution of cma_listen_on_all()
makes the kernel update the pointers of the freed node, triggering the
UAF. I was able to fix the UAF with this ugly patch:
--- b/drivers/infiniband/core/cma.c 2018-07-07 02:28:03.214589868 +0200
+++ a/drivers/infiniband/core/cma.c 2018-07-07 03:35:44.325301216 +0200
@@ -1678,6 +1678,11 @@ void rdma_destroy_id(struct rdma_cm_id *
mutex_lock(&id_priv->handler_mutex);
mutex_unlock(&id_priv->handler_mutex);
+ mutex_lock(&lock);
+ if(id_priv->list.next!=0 && id_priv->list.prev!=0)
+ list_del(&id_priv->list);
+ mutex_unlock(&lock);
+
if (id_priv->cma_dev) {
rdma_restrack_del(&id_priv->res);
if (rdma_cap_ib_cm(id_priv->id.device, 1)) {
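To illustrate the mechanism described above, here is a minimal userspace
sketch (not kernel code). The list helpers are re-implemented stand-ins
for the kernel's list_add_tail() and list_del(); "struct fake_id" and
second_listen_touches_stale_node() are hypothetical names, and a
"destroyed" flag replaces the real free so that the sketch itself never
touches freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Same pointer updates as the kernel's list_add_tail(). */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
	struct list_head *last = head->prev;

	assert(last != NULL);	/* head must have been initialized */
	node->next = head;
	node->prev = last;
	last->next = node;	/* writes into the current tail node */
	head->prev = node;
}

/* Unlink a node; the kernel stores poison values instead of NULL. */
static void list_del(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = NULL;
	node->prev = NULL;
}

/* Hypothetical stand-in for the object that embeds the list node. */
struct fake_id {
	int destroyed;		/* set instead of calling free() */
	struct list_head list;
};

/*
 * Simulate: listen, destroy, listen again.
 * Returns 1 if the second listen wrote into the node of the destroyed
 * object, 0 if the node had been unlinked before destruction.
 */
int second_listen_touches_stale_node(int unlink_on_destroy)
{
	struct list_head listen_any_list;
	struct fake_id first = {0}, second = {0};

	init_list_head(&listen_any_list);
	list_add_tail(&first.list, &listen_any_list);	/* first listen */

	if (unlink_on_destroy)
		list_del(&first.list);
	first.destroyed = 1;				/* "free" */

	list_add_tail(&second.list, &listen_any_list);	/* second listen */
	return first.destroyed && first.list.next == &second.list;
}
```

Without the unlink step, the second list_add_tail() stores a pointer in
the destroyed object's node; in the kernel that object would already
have been freed, which is the kind of access KASAN reports.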
Note: I only tested this patch against the shortest reproducer for this
issue (not against any other use of rdma_cm):
https://syzkaller.appspot.com/text?tag=ReproC&x=1334f10f800000
I had to add that "if" to the patch because running the reproducer for
several iterations provoked a NULL dereference in the added list_del()
call: for some reason I have not figured out yet, the next and prev
pointers of the list in question are sometimes zeroed (by what?).
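For comparison, a common list idiom for this kind of guard is to
initialize the node so that it points at itself, test list_empty()
before unlinking, and unlink with list_del_init(), instead of comparing
next and prev with 0. The userspace sketch below shows only that idiom.
It assumes the node is initialized when the object is created, which is
not verified for id_priv->list here; unlink_if_listed() and
demo_guarded_unlink() are hypothetical names. As an untested guess, a
node that sits in zeroed memory and was never added to any list would
have NULL next and prev pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same semantics as the kernel's INIT_LIST_HEAD(): point at itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Same semantics as the kernel's list_empty(). */
static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	struct list_head *last = head->prev;

	node->next = head;
	node->prev = last;
	last->next = node;
	head->prev = node;
}

/* Same semantics as the kernel's list_del_init(): unlink, self-link. */
static void list_del_init(struct list_head *node)
{
	assert(node->next != NULL && node->prev != NULL);
	node->prev->next = node->next;
	node->next->prev = node->prev;
	init_list_head(node);
}

/*
 * Hypothetical destroy-path helper: unlink only if the node is on a
 * list. Returns 1 if it unlinked the node, 0 if there was nothing to do.
 */
int unlink_if_listed(struct list_head *node)
{
	if (list_empty(node))
		return 0;
	list_del_init(node);
	return 1;
}

/* Returns 1 if every step behaves as the comments state. */
int demo_guarded_unlink(void)
{
	struct list_head head, never_listed, listed;
	int ok = 1;

	init_list_head(&head);
	init_list_head(&never_listed);
	init_list_head(&listed);
	list_add_tail(&listed, &head);

	ok &= unlink_if_listed(&never_listed) == 0; /* never on a list */
	ok &= unlink_if_listed(&listed) == 1;       /* removed from head */
	ok &= unlink_if_listed(&listed) == 0;       /* repeat is a no-op */
	ok &= list_empty(&head);
	return ok;
}
```

With this pattern a node that was never listed, or that was already
removed, is skipped without dereferencing NULL. The caller would still
need to hold the same lock that protects the list.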
Moreover, I noticed that running the reproducer for a long time exhausts
all the available memory. To spot the memory leaks I recompiled with:
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=10000
The reproducer apparently induces two memory leaks, reported by kmemleak:
unreferenced object 0xffff880069f49d40 (size 512):
comm "repro", pid 4263, jiffies 4294722196 (age 688.262s)
hex dump (first 32 bytes):
00 b8 13 5a 00 88 ff ff 40 9d f4 69 00 88 ff ff ...Z....@.......
0a 00 98 a6 00 00 00 00 fe 80 00 00 00 00 00 00 ................
backtrace:
[<0000000075a2f334>] kmem_cache_alloc_trace+0x1b2/0x3d0
[<0000000075fd9fea>] rdma_resolve_ip+0xc0/0x6b0
[<0000000033592b0b>] rdma_resolve_addr+0x490/0x2580
[<00000000d6f2cd9d>] ucma_resolve_ip+0x193/0x260
[<0000000068f1c2b7>] ucma_write+0x2ec/0x3f0
[<00000000015692cc>] __vfs_write+0x107/0x920
[<000000009528b010>] vfs_write+0x189/0x510
[<000000001a5d169b>] ksys_write+0xfa/0x240
[<00000000b747746a>] __x64_sys_write+0x73/0xb0
[<0000000071590ffb>] do_syscall_64+0x18c/0x760
[<000000003c31113f>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0000000059247e9d>] 0xffffffffffffffff
unreferenced object 0xffff88006c0c0bc0 (size 576):
comm "repro", pid 4261, jiffies 4294722191 (age 688.261s)
hex dump (first 32 bytes):
00 02 00 00 00 00 00 00 80 b8 07 6c 00 88 ff ff ...........l....
b0 7d 2c 6b 00 88 ff ff d8 0b 0c 6c 00 88 ff ff .},k.......l....
backtrace:
[<0000000039511ef2>] kmem_cache_alloc+0x1b2/0x3d0
[<00000000106bf668>] radix_tree_node_alloc.constprop.18+0x5e/0x2e0
[<000000005b2f026d>] idr_get_free+0x9f5/0x1000
[<00000000445baa5a>] idr_alloc_u32+0x1bc/0x3d0
[<000000007fd1b6f4>] idr_alloc+0xfd/0x190
[<00000000d706389e>] cma_alloc_port+0xb0/0x170
[<000000008f968f9e>] rdma_bind_addr+0x1252/0x1f00
[<00000000e3361215>] rdma_resolve_addr+0x39e/0x2580
[<00000000d6f2cd9d>] ucma_resolve_ip+0x193/0x260
[<0000000068f1c2b7>] ucma_write+0x2ec/0x3f0
[<00000000015692cc>] __vfs_write+0x107/0x920
[<000000009528b010>] vfs_write+0x189/0x510
[<000000001a5d169b>] ksys_write+0xfa/0x240
[<00000000b747746a>] __x64_sys_write+0x73/0xb0
[<0000000071590ffb>] do_syscall_64+0x18c/0x760
[<000000003c31113f>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
I don't have a background in the usage or internals of the driver in
question, but I hope these clues will help in finding the proper fix.
Tomas