Message-ID: <CAMArcTXCKA6uBMwah223Y7V152FyWs7R_nJ483j8pehJ1hF4QA@mail.gmail.com>
Date: Thu, 17 Apr 2025 15:57:47 +0900
From: Taehee Yoo <ap420073@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Stanislav Fomichev <stfomichev@...il.com>, Mina Almasry <almasrymina@...gle.com>, davem@...emloft.net,
pabeni@...hat.com, edumazet@...gle.com, andrew+netdev@...n.ch,
horms@...nel.org, asml.silence@...il.com, dw@...idwei.uk, sdf@...ichev.me,
skhawaja@...gle.com, simona.vetter@...ll.ch, kaiyuanz@...gle.com,
netdev@...r.kernel.org
Subject: Re: [PATCH net] net: devmem: fix kernel panic when socket close after
module unload
On Thu, Apr 17, 2025 at 9:35 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Thu, 17 Apr 2025 00:01:57 +0900 Taehee Yoo wrote:
> > Thank you so much for the detailed guide :)
> > I tried what you suggested, then I tested cases A, B, and C.
> > I can't see any splats from lockdep, kasan, etc.
> > Also, I verified that the bindings are released properly by checking
> > /sys/kernel/debug/dma_buf/bufinfo.
> > I think this approach works well.
> > However, my testing was simple, so I'm not sure yet about race
> > conditions. I need more tests targeting race conditions.
> >
> > I modified the locking order in netdev_nl_bind_rx_doit().
> > The modified netdev_nl_sock_priv_destroy() looks like:
> >
> > void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
> > {
> >         struct net_devmem_dmabuf_binding *binding;
> >         struct net_devmem_dmabuf_binding *temp;
> >         struct net_device *dev;
> >
> >         mutex_lock(&priv->lock);
> >         list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
>
> Not sure you can "for each entry safe" here. Since you drop the lock in
> the loop, what this helper saves as the "temp" / next struct may be
> freed by the time we get to it. I think we need:
>
> mutex_lock()
> while (!list_empty())
>         binding = list_first..
>
> >                 dev = binding->dev;
> >                 if (dev) {
>
Thanks. I will try what you suggested.
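Something like this, I think (a rough sketch, not tested; restarting
from the head of the list on each iteration, so no stale next pointer
is held across the unlock):

        mutex_lock(&priv->lock);
        while (!list_empty(&priv->bindings)) {
                binding = list_first_entry(&priv->bindings,
                                           struct net_devmem_dmabuf_binding,
                                           list);
                /* drop priv->lock, take the instance lock, unbind, ... */
        }
        mutex_unlock(&priv->lock);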
> nit: flip the condition to avoid the indent
>
> but I think the condition is too early; we should protect the pointer
> itself with the same lock as the list. So if the entry is on the list,
> dev must not be NULL.
Yes, I think it would be okay to remove this condition.
>
> >                         netdev_hold(dev, &priv->dev_tracker, GFP_KERNEL);
>
> I think you can declare the tracker on the stack, FWIW
Okay, I will declare the tracker on the stack instead.
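i.e. something like this, if I understood correctly:

        netdevice_tracker dev_tracker;  /* on the stack, not in priv */
        ...
        netdev_hold(dev, &dev_tracker, GFP_KERNEL);
        ...
        netdev_put(dev, &dev_tracker);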
>
> >                         mutex_unlock(&priv->lock);
> >                         netdev_lock(dev);
> >                         mutex_lock(&priv->lock);
> >                         if (binding->dev)
> >                                 net_devmem_unbind_dmabuf(binding);
>
> Mina suggests that we should only release the ref from the socket side.
> I guess that'd be good; it will prevent the binding itself from going
> away. Either way, you need to make sure you hold a ref on the binding,
> either by letting mp_dmabuf_devmem_uninstall() be as is, or by taking
> a new ref before you release the socket lock here.
Thanks, Mina, for the suggestion!
What I would like to do is this:
if binding->dev is NULL, skip the locking but still call
net_devmem_unbind_dmabuf().
Calling net_devmem_unbind_dmabuf() is safe even after module unload,
because binding->bound_rxq is deleted by the uninstall path.
If bound_rxq is empty, binding->dev will not be accessed.
The only code change on the uninstall side is to set binding->dev to
NULL and to take priv->lock there.
This approach was already suggested by Stanislav earlier in this thread.
It inverts the locking order from priv lock -> instance lock to
instance lock -> priv lock.
Mina, Stanislav, and Jakub, can you confirm this?
I put a rough sketch of the whole destroy path at the end of this mail.
>
> >                         mutex_unlock(&priv->lock);
> >                         netdev_unlock(dev);
> >                         netdev_put(dev, &priv->dev_tracker);
> >                         mutex_lock(&priv->lock);
> >                 }
> >         }
> >         mutex_unlock(&priv->lock);
> > }
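To make the proposal concrete, here is a rough, untested sketch of the
whole destroy path with the changes above. It assumes the socket keeps
a reference on the binding (per Mina's suggestion) and that
net_devmem_unbind_dmabuf() removes the binding from priv->bindings:

void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
{
        struct net_devmem_dmabuf_binding *binding;
        netdevice_tracker dev_tracker;
        struct net_device *dev;

        mutex_lock(&priv->lock);
        while (!list_empty(&priv->bindings)) {
                binding = list_first_entry(&priv->bindings,
                                           struct net_devmem_dmabuf_binding,
                                           list);
                dev = binding->dev;
                if (dev) {
                        /* Hold the device, then drop priv->lock so the
                         * instance lock can be taken first, per the
                         * instance lock -> priv lock ordering.
                         */
                        netdev_hold(dev, &dev_tracker, GFP_KERNEL);
                        mutex_unlock(&priv->lock);
                        netdev_lock(dev);
                        mutex_lock(&priv->lock);
                        net_devmem_unbind_dmabuf(binding);
                        mutex_unlock(&priv->lock);
                        netdev_unlock(dev);
                        netdev_put(dev, &dev_tracker);
                        mutex_lock(&priv->lock);
                } else {
                        /* Module already unloaded: the uninstall path
                         * cleared binding->dev and emptied bound_rxq,
                         * so unbind will not touch the device.
                         */
                        net_devmem_unbind_dmabuf(binding);
                }
        }
        mutex_unlock(&priv->lock);
}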