lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z_6Yg5UjNRLQA9mH@mini-arch>
Date: Tue, 15 Apr 2025 10:33:55 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: Taehee Yoo <ap420073@...il.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
	edumazet@...gle.com, andrew+netdev@...n.ch, horms@...nel.org,
	almasrymina@...gle.com, asml.silence@...il.com, dw@...idwei.uk,
	sdf@...ichev.me, skhawaja@...gle.com, simona.vetter@...ll.ch,
	kaiyuanz@...gle.com, netdev@...r.kernel.org
Subject: Re: [PATCH net] net: devmem: fix kernel panic when socket close
 after module unload

On 04/15, Taehee Yoo wrote:
> Kernel panic occurs when a devmem TCP socket is closed after NIC module
> is unloaded.
> 
> This is Devmem TCP unregistration scenarios. number is an order.
> (a)socket close    (b)pp destroy    (c)uninstall    result
> 1                  2                3               OK
> 1                  3                2               (d)Impossible
> 2		   1                3               OK
> 3                  1                2               (e)Kernel panic
> 2                  3                1               (d)Impossible
> 3                  2                1               (d)Impossible
> 
> (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
>     closed.
> (b) page_pool_destroy() is called when the interface is down.
> (c) mp_ops->uninstall() is called when an interface is unregistered.
> (d) There is no scenario in mp_ops->uninstall() is called before
>     page_pool_destroy().
>     Because unregister_netdevice_many_notify() closes interfaces first
>     and then calls mp_ops->uninstall().
> (e) netdev_nl_sock_priv_destroy() accesses struct net_device.
>     But if the interface module has already been removed, net_device
>     pointer is invalid, so it causes kernel panic.
> 
> In summary, there are only 3 possible scenarios.
>  A. sk close -> pp destroy -> uninstall.
>  B. pp destroy -> sk close -> uninstall.
>  C. pp destroy -> uninstall -> sk close.
> 
> Case C is a kernel panic scenario.
> 
> In order to fix this problem, it makes netdev_nl_sock_priv_destroy() do
> nothing if a module is already removed.
> The mp_ops->uninstall() handles these instead.
> 
> The netdev_nl_sock_priv_destroy() iterates binding->list and releases
> them all with net_devmem_unbind_dmabuf().
> The net_devmem_unbind_dmabuf() has the below steps.
> 1. Delete binding from a list.

[..]

> 2. Call _net_mp_close_rxq() for all rxq's bound to a binding.

This should not happen in the (C) case, right? mp_dmabuf_devmem_uninstall
removes the rxq from the xa. Can you explain more on why specifically
the crash occurs? Are we missing some check in netdev_nl_sock_priv_destroy
that makes sure the device is still alive?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ