netdev - Re: [PATCH V1 net 1/3] net: ena: Prevent reset after device destruction

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200813134111.3d22b6ac@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date:   Thu, 13 Aug 2020 13:41:11 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Shay Agroskin <shayagr@...zon.com>
Cc:     <davem@...emloft.net>, <netdev@...r.kernel.org>, <dwmw@...zon.com>,
        <zorik@...zon.com>, <matua@...zon.com>, <saeedb@...zon.com>,
        <msw@...zon.com>, <aliguori@...zon.com>, <nafea@...zon.com>,
        <gtzalik@...zon.com>, <netanel@...zon.com>, <alisaidi@...zon.com>,
        <benh@...zon.com>, <akiyano@...zon.com>, <sameehj@...zon.com>,
        <ndagan@...zon.com>
Subject: Re: [PATCH V1 net 1/3] net: ena: Prevent reset after device
 destruction

On Thu, 13 Aug 2020 15:51:46 +0300 Shay Agroskin wrote:
> Long answer:
> The ena_destroy_device() function is called with rtnl_lock() held, 
> so it cannot run in parallel with the reset function. Also the 
> destroy function clears the bit ENA_FLAG_TRIGGER_RESET without 
> which the reset function just exits without doing anything.
> 
> A problem can then only happen when some routine sets the 
> ENA_FLAG_TRIGGER_RESET bit before the reset function is executed, 
> the following describes all functions from which this bit can be 
> set:

ena_fw_reset_device() runs from a workqueue, it can be preempted right
before it tries to take the rtnl_lock. Then after arbitrarily long
delay it will start again, take the lock, and dereference
adapter->flags. But adapter could have been long freed at this point.

Unless you flush a workqueue or cancel_work_sync() you can never be
sure it's not scheduled. And I can only see a flush when module is
unloaded now.