[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <600ddf9e-589a-2aa0-7b69-a438f833ca10@samsung.com>
Date: Mon, 15 May 2023 11:09:10 +0200
From: Marek Szyprowski <m.szyprowski@...sung.com>
To: Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org
Cc: kuba@...nel.org, pabeni@...hat.com, davem@...emloft.net,
edumazet@...gle.com, jacob.e.keller@...el.com, saeedm@...dia.com,
moshe@...dia.com
Subject: Re: [patch net] devlink: change per-devlink netdev notifier to
static one
On 10.05.2023 16:46, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@...dia.com>
>
> The commit 565b4824c39f ("devlink: change port event netdev notifier
> from per-net to global") changed original per-net notifier to be
> per-devlink instance. That fixed the issue of non-receiving events
> of netdev uninit if that moved to a different namespace.
> That worked fine in -net tree.
>
> However, later on when commit ee75f1fc44dd ("net/mlx5e: Create
> separate devlink instance for ethernet auxiliary device") and
> commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in
> case of PCI device suspend") were merged, a deadlock was introduced
> when removing a namespace with devlink instance with another nested
> instance.
>
> Here there is the bad flow example resulting in deadlock with mlx5:
> net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) ->
> devlink_pernet_pre_exit() -> devlink_reload() ->
> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
> mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
> mlx5e_destroy_devlink() -> devlink_free() ->
> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem)
>
> Steps to reproduce:
> $ modprobe mlx5_core
> $ ip netns add ns1
> $ devlink dev reload pci/0000:08:00.0 netns ns1
> $ ip netns del ns1
>
> Resolve this by converting the notifier from per-devlink instance to
> a static one registered during init phase and leaving it registered
> forever. Use this notifier for all devlink port instances created
> later on.
>
> Note what a tree needs this fix only in case all of the cited fixes
> commits are present.
>
> Reported-by: Moshe Shemesh <moshe@...dia.com>
> Fixes: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
> Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
> Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
> Signed-off-by: Jiri Pirko <jiri@...dia.com>
This patch landed recently in linux-next as commit e93c9378e33f
("devlink: change per-devlink netdev notifier to static one").
Unfortunately it causes serious regression with kernel compiled from
multi_v7_defconfig (ARM 32bit) on all my test boards. Here is a example
of the boot failure observed on QEMU's virt ARM 32bit machine:
8<--- cut here ---
Unable to handle kernel execution of memory at virtual address e5150010
when execute
[e5150010] *pgd=4267a811, *pte=04750653, *ppte=04750453
Internal error: Oops: 8000000f [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 779 Comm: ip Not tainted 6.4.0-rc2-next-20230515 #6688
Hardware name: Generic DT based system
PC is at 0xe5150010
LR is at notifier_call_chain+0x60/0x11c
pc : [<e5150010>] lr : [<c0365f50>] psr: 60000013
...
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 4397806a DAC: 00000051
...
Process ip (pid: 779, stack limit = 0x82b757b5)
Stack: (0xe9201a00 to 0xe9202000)
...
notifier_call_chain from raw_notifier_call_chain+0x18/0x20
raw_notifier_call_chain from __dev_open+0x74/0x190
__dev_open from __dev_change_flags+0x17c/0x1f4
__dev_change_flags from dev_change_flags+0x18/0x54
dev_change_flags from do_setlink+0x24c/0xefc
do_setlink from rtnl_newlink+0x534/0x818
rtnl_newlink from rtnetlink_rcv_msg+0x250/0x300
rtnetlink_rcv_msg from netlink_rcv_skb+0xb8/0x110
netlink_rcv_skb from netlink_unicast+0x1f8/0x2bc
netlink_unicast from netlink_sendmsg+0x1cc/0x44c
netlink_sendmsg from ____sys_sendmsg+0x9c/0x260
____sys_sendmsg from ___sys_sendmsg+0x68/0x94
___sys_sendmsg from sys_sendmsg+0x4c/0x88
sys_sendmsg from ret_fast_syscall+0x0/0x54
Exception stack(0xe9201fa8 to 0xe9201ff0)
...
---[ end trace 0000000000000000 ]---
[....] Configuring network interfaces...Segmentation fault
ifup: failed to bring up lo
Reverting $subject patch on top of linux next-20230515 fixes this issue.
> ---
> net/devlink/core.c | 16 +++++++---------
> net/devlink/devl_internal.h | 1 -
> net/devlink/leftover.c | 5 ++---
> 3 files changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/net/devlink/core.c b/net/devlink/core.c
> index 777b091ef74d..0e58eee44bdb 100644
> --- a/net/devlink/core.c
> +++ b/net/devlink/core.c
> @@ -204,11 +204,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
> if (ret < 0)
> goto err_xa_alloc;
>
> - devlink->netdevice_nb.notifier_call = devlink_port_netdevice_event;
> - ret = register_netdevice_notifier(&devlink->netdevice_nb);
> - if (ret)
> - goto err_register_netdevice_notifier;
> -
> devlink->dev = dev;
> devlink->ops = ops;
> xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC);
> @@ -233,8 +228,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
>
> return devlink;
>
> -err_register_netdevice_notifier:
> - xa_erase(&devlinks, devlink->index);
> err_xa_alloc:
> kfree(devlink);
> return NULL;
> @@ -266,8 +259,6 @@ void devlink_free(struct devlink *devlink)
> xa_destroy(&devlink->params);
> xa_destroy(&devlink->ports);
>
> - WARN_ON_ONCE(unregister_netdevice_notifier(&devlink->netdevice_nb));
> -
> xa_erase(&devlinks, devlink->index);
>
> devlink_put(devlink);
> @@ -303,6 +294,10 @@ static struct pernet_operations devlink_pernet_ops __net_initdata = {
> .pre_exit = devlink_pernet_pre_exit,
> };
>
> +static struct notifier_block devlink_port_netdevice_nb __net_initdata = {
> + .notifier_call = devlink_port_netdevice_event,
> +};
> +
> static int __init devlink_init(void)
> {
> int err;
> @@ -311,6 +306,9 @@ static int __init devlink_init(void)
> if (err)
> goto out;
> err = register_pernet_subsys(&devlink_pernet_ops);
> + if (err)
> + goto out;
> + err = register_netdevice_notifier(&devlink_port_netdevice_nb);
>
> out:
> WARN_ON(err);
> diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
> index e133f423294a..62921b2eb0d3 100644
> --- a/net/devlink/devl_internal.h
> +++ b/net/devlink/devl_internal.h
> @@ -50,7 +50,6 @@ struct devlink {
> u8 reload_failed:1;
> refcount_t refcount;
> struct rcu_work rwork;
> - struct notifier_block netdevice_nb;
> char priv[] __aligned(NETDEV_ALIGN);
> };
>
> diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
> index dffca2f9bfa7..cd0254968076 100644
> --- a/net/devlink/leftover.c
> +++ b/net/devlink/leftover.c
> @@ -7073,10 +7073,9 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
> struct devlink_port *devlink_port = netdev->devlink_port;
> struct devlink *devlink;
>
> - devlink = container_of(nb, struct devlink, netdevice_nb);
> -
> - if (!devlink_port || devlink_port->devlink != devlink)
> + if (!devlink_port)
> return NOTIFY_OK;
> + devlink = devlink_port->devlink;
>
> switch (event) {
> case NETDEV_POST_INIT:
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Powered by blists - more mailing lists