lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMArcTWV6YBzPN7e6y6Zf6LYAL8gnyVdGAnGuXbV4CFLP65f5w@mail.gmail.com>
Date: Thu, 8 May 2025 18:59:58 +0900
From: Taehee Yoo <ap420073@...il.com>
To: Mina Almasry <almasrymina@...gle.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, 
	edumazet@...gle.com, andrew+netdev@...n.ch, horms@...nel.org, sdf@...ichev.me, 
	netdev@...r.kernel.org, asml.silence@...il.com, dw@...idwei.uk, 
	skhawaja@...gle.com, willemb@...gle.com, jdamato@...tly.com
Subject: Re: [PATCH net v2] net: devmem: fix kernel panic when socket close
 after module unload

On Thu, May 8, 2025 at 3:28 AM Mina Almasry <almasrymina@...gle.com> wrote:
>

Hi Mina,
Thanks a lot for your review!

> On Tue, May 6, 2025 at 7:09 AM Taehee Yoo <ap420073@...il.com> wrote:
> >
> > Kernel panic occurs when a devmem TCP socket is closed after NIC module
> > is unloaded.
> >
> > This is Devmem TCP unregistration scenarios. number is an order.
> > (a)socket close
>
> Lets call this "netlink socket close", to differentiate between
> netlink and data sockets.

Thanks, I will change it in the next patch.

>
> >    (b)pp destroy
> >    (c)uninstall    result
> > 1                  2                3               OK
> > 1                  3                2               (d)Impossible
> > 2                  1                3               OK
> > 3                  1                2               (e)Kernel panic
> > 2                  3                1               (d)Impossible
> > 3                  2                1               (d)Impossible
> >
> > (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
> >     closed.
> > (b) page_pool_destroy() is called when the interface is down.
> > (c) mp_ops->uninstall() is called when an interface is unregistered.
> > (d) There is no scenario in mp_ops->uninstall() is called before
> >     page_pool_destroy().
> >     Because unregister_netdevice_many_notify() closes interfaces first
> >     and then calls mp_ops->uninstall().
> > (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
> >     netdev_lock().
> >     But if the interface module has already been removed, net_device
> >     pointer is invalid, so it causes kernel panic.
> >
> > In summary, there are only 3 possible scenarios.
> >  A. sk close -> pp destroy -> uninstall.
> >  B. pp destroy -> sk close -> uninstall.
> >  C. pp destroy -> uninstall -> sk close.
> >
> > Case C is a kernel panic scenario.
> >
> > In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
> > binding->dev to NULL.
> > It indicates an bound net_device was unregistered.
> >
> > It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
> > if binding->dev is NULL.
> >
> > It inverts socket/netdev lock order like below:
> >     netdev_lock();
> >     mutex_lock(&priv->lock);
> >     mutex_unlock(&priv->lock);
> >     netdev_unlock();
> >
> > Because of inversion of locking ordering, mp_dmabuf_devmem_uninstall()
> > acquires socket lock from now on.
> >
> > Tests:
> > Scenario A:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     kill $pid
> >     ip link set $interface down
> >     modprobe -rv $module
> >
> > Scenario B:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     ip link set $interface down
> >     kill $pid
> >     modprobe -rv $module
> >
> > Scenario C:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     modprobe -rv $module
> >     sleep 5
> >     kill $pid
> >
> > Splat looks like:
> > Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> > KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
> > CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
> > Tainted: [B]=BAD_PAGE, [W]=WARN
> > RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
> > Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
> > RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
> > RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
> > RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
> > RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
> > R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
> > R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
> > FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> >  ...
> >  netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
> >  genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
> >  ...
> >  netlink_release (net/netlink/af_netlink.c:737)
> >  ...
> >  __sock_release (net/socket.c:647)
> >  sock_close (net/socket.c:1393)
> >
> > Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
> > Signed-off-by: Taehee Yoo <ap420073@...il.com>
> > ---
> >
> > v2:
> >  - Fix commit message.
> >  - Correct Fixes tag.
> >  - Inverse locking order.
> >  - Do not put a reference count of binding in
> >    mp_dmabuf_devmem_uninstall().
> >
> > In order to test this patch, driver side implementation of devmem TCP[1]
> > is needed to be applied.
> >
> > [1] https://lore.kernel.org/netdev/20250415052458.1260575-1-ap420073@gmail.com/T/#u
> >
> >  net/core/devmem.c      |  6 ++++++
> >  net/core/devmem.h      |  3 +++
> >  net/core/netdev-genl.c | 27 ++++++++++++++++++---------
> >  3 files changed, 27 insertions(+), 9 deletions(-)
> >
> > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > index 6e27a47d0493..636c1e82b8da 100644
> > --- a/net/core/devmem.c
> > +++ b/net/core/devmem.c
> > @@ -167,6 +167,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> >
> >  struct net_devmem_dmabuf_binding *
> >  net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> > +                      struct netdev_nl_sock *priv,
> >                        struct netlink_ext_ack *extack)
> >  {
> >         struct net_devmem_dmabuf_binding *binding;
> > @@ -189,6 +190,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> >         }
> >
> >         binding->dev = dev;
> > +       binding->priv = priv;
> >
> >         err = xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id,
> >                               binding, xa_limit_32b, &id_alloc_next,
> > @@ -376,12 +378,16 @@ static void mp_dmabuf_devmem_uninstall(void *mp_priv,
> >         struct netdev_rx_queue *bound_rxq;
> >         unsigned long xa_idx;
> >
> > +       mutex_lock(&binding->priv->lock);
> >         xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
> >                 if (bound_rxq == rxq) {
> >                         xa_erase(&binding->bound_rxqs, xa_idx);
> > +                       if (xa_empty(&binding->bound_rxqs))
> > +                               binding->dev = NULL;
>
> priv->lock is meant to protect priv->bindings only. To be honest, I
> find priv->lock being used to protect both priv->bindings and
> bindings->dev is extremely confusing. I think it may be contributing
> to the convoluted lock ordering in netdev_nl_sock_priv_destroy. It
> also makes it such that the binding needs to have a reference to the
> netlink socket that owns it which is mentally confusing to me. The
> binding should abstractly speaking not need to know anything about the
> netlink socket that holds it. Also, AFAICT this may be buggy. The
> binding may outlive the netlink socket. For example when the netlink
> socket is closed but the binding is kept alive because there are
> references to the netmems in it, UAF may happen.

Thanks for sharing your concerns.

>
> Instead of this, if you need to protect concurrent access to
> binding->dev, either:
>
> 1. create new spin_lock, binding->dev_lock, which protects access to
> binding->dev, or

Would a mutex be sufficient, or is a spinlock required?

> 2. Use rcu protection to lockelessly set binding->dev, or
> 3. Use some cmpxchg logic to locklessly set/query binding->dev and
> detect if it got modified in a racing thread.
>
> But please no multiplexing priv->lock to protect both priv->bindings
> and binding->dev.

Okay, I think introducing a new lock for protecting a binding would
be enough. Not only protect binding->dev, all members of a binding.
So, binding->lock would be more fit as a name.

All members except a dev are not required to be protected so far.
Because they are initialized when binding is initialized, and they are
not changed in live.
So, binding is alive, members are valid except for a dev.
However, introducing binding->lock makes the code obvious, and it will
not confuse us someday we want to protect other members of a binding.

>
> >                         break;
> >                 }
> >         }
> > +       mutex_unlock(&binding->priv->lock);
> >  }
> >
> >  static const struct memory_provider_ops dmabuf_devmem_ops = {
> > diff --git a/net/core/devmem.h b/net/core/devmem.h
> > index 7fc158d52729..afd6320b2c9b 100644
> > --- a/net/core/devmem.h
> > +++ b/net/core/devmem.h
> > @@ -11,6 +11,7 @@
> >  #define _NET_DEVMEM_H
> >
> >  #include <net/netmem.h>
> > +#include <net/netdev_netlink.h>
> >
> >  struct netlink_ext_ack;
> >
> > @@ -20,6 +21,7 @@ struct net_devmem_dmabuf_binding {
> >         struct sg_table *sgt;
> >         struct net_device *dev;
> >         struct gen_pool *chunk_pool;
> > +       struct netdev_nl_sock *priv;
> >
> >         /* The user holds a ref (via the netlink API) for as long as they want
> >          * the binding to remain alive. Each page pool using this binding holds
> > @@ -63,6 +65,7 @@ struct dmabuf_genpool_chunk_owner {
> >  void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding);
> >  struct net_devmem_dmabuf_binding *
> >  net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> > +                      struct netdev_nl_sock *priv,
> >                        struct netlink_ext_ack *extack);
> >  void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
> >  int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> > diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> > index 230743bdbb14..b8bb73574276 100644
> > --- a/net/core/netdev-genl.c
> > +++ b/net/core/netdev-genl.c
> > @@ -859,13 +859,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
> >                 goto err_genlmsg_free;
> >         }
> >
> > -       mutex_lock(&priv->lock);
> > -
> >         err = 0;
> >         netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
> >         if (!netdev) {
> >                 err = -ENODEV;
> > -               goto err_unlock_sock;
> > +               goto err_genlmsg_free;
> >         }
> >         if (!netif_device_present(netdev))
> >                 err = -ENODEV;
> > @@ -877,10 +875,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
> >                 goto err_unlock;
> >         }
> >
> > -       binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, info->extack);
> > +       mutex_lock(&priv->lock);
> > +       binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
>
> I'm not so sure about this as well. priv->lock should protect access
> to priv->bindings only. I'm not sure why it's locked before
> net_devmem_bind_dmabuf? Can it be locked around the access to
> priv->bindings only?

As you mentioned, this lock is unnecessary to protect
netdev_devmem_bind_dmabuf().
It is required to protect only priv->bindings.
So,  I will move priv->lock to lock only around priv->bindings
in the next patch.

>
> >         if (IS_ERR(binding)) {
> >                 err = PTR_ERR(binding);
> > -               goto err_unlock;
> > +               goto err_unlock_sock;
> >         }
> >
> >         nla_for_each_attr_type(attr, NETDEV_A_DMABUF_QUEUES,
> > @@ -921,18 +920,17 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
> >         if (err)
> >                 goto err_unbind;
> >
> > -       netdev_unlock(netdev);
> > -
> >         mutex_unlock(&priv->lock);
> > +       netdev_unlock(netdev);
> >
> >         return 0;
> >
> >  err_unbind:
> >         net_devmem_unbind_dmabuf(binding);
> > -err_unlock:
> > -       netdev_unlock(netdev);
> >  err_unlock_sock:
> >         mutex_unlock(&priv->lock);
> > +err_unlock:
> > +       netdev_unlock(netdev);
> >  err_genlmsg_free:
> >         nlmsg_free(rsp);
> >         return err;
> > @@ -948,14 +946,25 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
> >  {
> >         struct net_devmem_dmabuf_binding *binding;
> >         struct net_devmem_dmabuf_binding *temp;
> > +       netdevice_tracker dev_tracker;
> >         struct net_device *dev;
> >
> >         mutex_lock(&priv->lock);
> >         list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
> >                 dev = binding->dev;
> > +               if (!dev) {
> > +                       net_devmem_unbind_dmabuf(binding);
> > +                       continue;
> > +               }
> > +               netdev_hold(dev, &dev_tracker, GFP_KERNEL);
> > +               mutex_unlock(&priv->lock);
> >                 netdev_lock(dev);
> > +               mutex_lock(&priv->lock);
> >                 net_devmem_unbind_dmabuf(binding);
> > +               mutex_unlock(&priv->lock);
> >                 netdev_unlock(dev);
> > +               netdev_put(dev, &dev_tracker);
> > +               mutex_lock(&priv->lock);
> >         }
> >         mutex_unlock(&priv->lock);
> >  }
> > --
> > 2.34.1
> >
>
>
> --
> Thanks,
> Mina

Thanks a lot!
Taehee Yoo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ