[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230619125015.1541143-2-idosch@nvidia.com>
Date: Mon, 19 Jun 2023 15:50:14 +0300
From: Ido Schimmel <idosch@...dia.com>
To: <netdev@...r.kernel.org>
CC: <davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com>,
<edumazet@...gle.com>, <jiri@...nulli.us>, <petrm@...dia.com>, Ido Schimmel
<idosch@...dia.com>
Subject: [RFC PATCH net-next 1/2] devlink: Hold a reference on parent device
Each devlink instance is associated with a parent device and a pointer
to this device is stored in the devlink structure, but devlink does not
hold a reference on this device.
This is going to be a problem in the next patch where - among other
things - devlink will acquire the device lock during netns dismantle,
before the reload operation. Since netns dismantle is performed
asynchronously and since a reference is not held on the parent device,
it will be possible to hit a use-after-free.
Prepare for the upcoming change by holding a reference on the parent
device.
Unfortunately, with this patch and this reproducer [1], the following
crash can be observed [2]. The last reference is released from the
device asynchronously - after an RCU grace period - when the netdevsim
module is no longer present. This causes device_release() to invoke a
release callback that is no longer present: nsim_bus_dev_release().
It's not clear to me if I'm doing something wrong in devlink (I don't
think so), if it's a bug in netdevsim or alternatively a bug in core
driver code that allows the bus module to go away before all the devices
that were connected to it are released.
The problem can be solved by devlink holding a reference on the backing
module (i.e., dev->driver->owner) or by each netdevsim device holding a
reference on the netdevsim module. However, this will prevent the
removal of the module when devices are present, something that is
possible today.
[1]
#!/bin/bash
for i in $(seq 1 1000); do
echo "$i"
insmod drivers/net/netdevsim/netdevsim.ko
echo "10 0" > /sys/bus/netdevsim/new_device
rmmod netdevsim
done
[2]
BUG: unable to handle page fault for address: ffffffffc0490910
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 12e040067 P4D 12e040067 PUD 12e042067 PMD 100a38067 PTE 0
Oops: 0010 [#1] PREEMPT SMP
CPU: 0 PID: 138 Comm: kworker/0:2 Not tainted 6.4.0-rc5-custom-g42e05937ca59 #299
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
Workqueue: events devlink_release
RIP: 0010:0xffffffffc0490910
Code: Unable to access opcode bytes at 0xffffffffc04908e6.
RSP: 0018:ffffb487802f3e40 EFLAGS: 00010282
RAX: ffffffffc0490910 RBX: ffff92e6c0057800 RCX: 0001020304050608
RDX: 0000000000000001 RSI: ffffffff92b7d763 RDI: ffff92e6c0057800
RBP: ffff92e6c1ef0a00 R08: ffff92e6c0055158 R09: ffff92e6c2be9134
R10: 0000000000000018 R11: fefefefefefefeff R12: ffffffff934c3e80
R13: ffff92e6c2a1a740 R14: 0000000000000000 R15: ffff92e7f7c30b05
FS: 0000000000000000(0000) GS:ffff92e7f7c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc04908e6 CR3: 0000000101f1a004 CR4: 0000000000170ef0
Call Trace:
<TASK>
? __die+0x23/0x70
? page_fault_oops+0x181/0x470
? exc_page_fault+0xa6/0x140
? asm_exc_page_fault+0x26/0x30
? device_release+0x23/0x90
? device_release+0x34/0x90
? kobject_put+0x7d/0x1b0
? devlink_release+0x16/0x30
? process_one_work+0x1e0/0x3d0
? worker_thread+0x4e/0x3b0
? rescuer_thread+0x3a0/0x3a0
? kthread+0xe5/0x120
? kthread_complete_and_exit+0x20/0x20
? ret_from_fork+0x1f/0x30
</TASK>
Modules linked in: [last unloaded: netdevsim]
Signed-off-by: Ido Schimmel <idosch@...dia.com>
---
net/devlink/core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/devlink/core.c b/net/devlink/core.c
index c23ebabadc52..b191112f57af 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -4,6 +4,7 @@
* Copyright (c) 2016 Jiri Pirko <jiri@...lanox.com>
*/
+#include <linux/device.h>
#include <net/genetlink.h>
#include "devl_internal.h"
@@ -91,6 +92,7 @@ static void devlink_release(struct work_struct *work)
mutex_destroy(&devlink->lock);
lockdep_unregister_key(&devlink->lock_key);
+ put_device(devlink->dev);
kfree(devlink);
}
@@ -204,6 +206,7 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
if (ret < 0)
goto err_xa_alloc;
+ get_device(dev);
devlink->dev = dev;
devlink->ops = ops;
xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC);
--
2.40.1
Powered by blists - more mailing lists