[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200813134111.3d22b6ac@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Thu, 13 Aug 2020 13:41:11 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Shay Agroskin <shayagr@...zon.com>
Cc: <davem@...emloft.net>, <netdev@...r.kernel.org>, <dwmw@...zon.com>,
<zorik@...zon.com>, <matua@...zon.com>, <saeedb@...zon.com>,
<msw@...zon.com>, <aliguori@...zon.com>, <nafea@...zon.com>,
<gtzalik@...zon.com>, <netanel@...zon.com>, <alisaidi@...zon.com>,
<benh@...zon.com>, <akiyano@...zon.com>, <sameehj@...zon.com>,
<ndagan@...zon.com>
Subject: Re: [PATCH V1 net 1/3] net: ena: Prevent reset after device
destruction
On Thu, 13 Aug 2020 15:51:46 +0300 Shay Agroskin wrote:
> Long answer:
> The ena_destroy_device() function is called with rtnl_lock() held,
> so it cannot run in parallel with the reset function. Also the
> destroy function clears the bit ENA_FLAG_TRIGGER_RESET without
> which the reset function just exits without doing anything.
>
> A problem can then only happen when some routine sets the
> ENA_FLAG_TRIGGER_RESET bit before the reset function is executed,
> the following describes all functions from which this bit can be
> set:
ena_fw_reset_device() runs from a workqueue, it can be preempted right
before it tries to take the rtnl_lock. Then after arbitrarily long
delay it will start again, take the lock, and dereference
adapter->flags. But adapter could have been long freed at this point.
Unless you flush a workqueue or cancel_work_sync() you can never be
sure it's not scheduled. And I can only see a flush when module is
unloaded now.
Powered by blists - more mailing lists