Message-ID: <CAMArcTXCKA6uBMwah223Y7V152FyWs7R_nJ483j8pehJ1hF4QA@mail.gmail.com>
Date: Thu, 17 Apr 2025 15:57:47 +0900
From: Taehee Yoo <ap420073@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Stanislav Fomichev <stfomichev@...il.com>, Mina Almasry <almasrymina@...gle.com>, davem@...emloft.net,
pabeni@...hat.com, edumazet@...gle.com, andrew+netdev@...n.ch,
horms@...nel.org, asml.silence@...il.com, dw@...idwei.uk, sdf@...ichev.me,
skhawaja@...gle.com, simona.vetter@...ll.ch, kaiyuanz@...gle.com,
netdev@...r.kernel.org
Subject: Re: [PATCH net] net: devmem: fix kernel panic when socket close after
module unload
On Thu, Apr 17, 2025 at 9:35 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Thu, 17 Apr 2025 00:01:57 +0900 Taehee Yoo wrote:
> > Thank you so much for the detailed guide :)
> > I tried what you suggested, then I tested cases A, B, and C.
> > I can't see any splats from lockdep, kasan, etc.
> > Also, I verified that the bindings are released properly by checking
> > /sys/kernel/debug/dma_buf/bufinfo.
> > I think this approach works well.
> > However, my testing was simple, so I'm not sure yet about race
> > conditions. I need more tests targeting race conditions.
> >
> > I modified the locking order in netdev_nl_bind_rx_doit().
> > The modified netdev_nl_sock_priv_destroy() looks like:
> >
> > void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
> > {
> >         struct net_devmem_dmabuf_binding *binding;
> >         struct net_devmem_dmabuf_binding *temp;
> >         struct net_device *dev;
> >
> >         mutex_lock(&priv->lock);
> >         list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
>
> Not sure you can "for each entry safe" here. Since you drop the lock in
> the loop, what this helper saves as the "temp" / next struct may be
> freed by the time we get to it. I think we need:
>
> mutex_lock()
> while (!list_empty())
>         binding = list_first..
>
> >                 dev = binding->dev;
> >                 if (dev) {
>
Thanks. I will try what you suggested.
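Something like this, I think (a rough sketch, not tested; restarting
from the head of the list on each iteration, so no stale next pointer
is held across the unlock):

        mutex_lock(&priv->lock);
        while (!list_empty(&priv->bindings)) {
                binding = list_first_entry(&priv->bindings,
                                           struct net_devmem_dmabuf_binding,
                                           list);
                /* drop priv->lock, take the instance lock, unbind, ... */
        }
        mutex_unlock(&priv->lock);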
> nit: flip the condition to avoid the indent
>
> but I think the condition is too early; we should protect the pointer
> itself with the same lock as the list. So if the entry is on the list,
> dev must not be NULL.
Yes, I think it would be okay to remove this condition.
>
> >                         netdev_hold(dev, &priv->dev_tracker, GFP_KERNEL);
>
> I think you can declare the tracker on the stack, FWIW
Okay, I will declare the tracker on the stack instead.
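i.e. something like this, if I understood correctly:

        netdevice_tracker dev_tracker;  /* on the stack, not in priv */
        ...
        netdev_hold(dev, &dev_tracker, GFP_KERNEL);
        ...
        netdev_put(dev, &dev_tracker);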
>
> >                         mutex_unlock(&priv->lock);
> >                         netdev_lock(dev);
> >                         mutex_lock(&priv->lock);
> >                         if (binding->dev)
> >                                 net_devmem_unbind_dmabuf(binding);
>
> Mina suggests that we should only release the ref from the socket side.
> I guess that'd be good; it will prevent the binding itself from going
> away. Either way, you need to make sure you hold a ref on the binding,
> either by letting mp_dmabuf_devmem_uninstall() be as is, or by taking
> a new ref before you release the socket lock here.
Thanks, Mina, for the suggestion!
What I would like to do is this:
if binding->dev is NULL, skip the locking but still call
net_devmem_unbind_dmabuf().
Calling net_devmem_unbind_dmabuf() is safe even after module unload,
because binding->bound_rxq is deleted by the uninstall path.
If bound_rxq is empty, binding->dev will not be accessed.
The only code change on the uninstall side is to set binding->dev to
NULL and to take priv->lock there.
This approach was already suggested by Stanislav earlier in this thread.
It inverts the locking order from priv lock -> instance lock to
instance lock -> priv lock.
Mina, Stanislav, and Jakub, can you confirm this?
I put a rough sketch of the whole destroy path at the end of this mail.
>
> >                         mutex_unlock(&priv->lock);
> >                         netdev_unlock(dev);
> >                         netdev_put(dev, &priv->dev_tracker);
> >                         mutex_lock(&priv->lock);
> >                 }
> >         }
> >         mutex_unlock(&priv->lock);
> > }
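To make the proposal concrete, here is a rough, untested sketch of the
whole destroy path with the changes above. It assumes the socket keeps
a reference on the binding (per Mina's suggestion) and that
net_devmem_unbind_dmabuf() removes the binding from priv->bindings:

void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
{
        struct net_devmem_dmabuf_binding *binding;
        netdevice_tracker dev_tracker;
        struct net_device *dev;

        mutex_lock(&priv->lock);
        while (!list_empty(&priv->bindings)) {
                binding = list_first_entry(&priv->bindings,
                                           struct net_devmem_dmabuf_binding,
                                           list);
                dev = binding->dev;
                if (dev) {
                        /* Hold the device, then drop priv->lock so the
                         * instance lock can be taken first, per the
                         * instance lock -> priv lock ordering.
                         */
                        netdev_hold(dev, &dev_tracker, GFP_KERNEL);
                        mutex_unlock(&priv->lock);
                        netdev_lock(dev);
                        mutex_lock(&priv->lock);
                        net_devmem_unbind_dmabuf(binding);
                        mutex_unlock(&priv->lock);
                        netdev_unlock(dev);
                        netdev_put(dev, &dev_tracker);
                        mutex_lock(&priv->lock);
                } else {
                        /* Module already unloaded: the uninstall path
                         * cleared binding->dev and emptied bound_rxq,
                         * so unbind will not touch the device.
                         */
                        net_devmem_unbind_dmabuf(binding);
                }
        }
        mutex_unlock(&priv->lock);
}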