netdev - Re: [PATCH net 2/3] ionic: no double destroy workqueue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <230d76ef-5ca8-400d-8f59-406292025da3@intel.com>
Date: Wed, 11 Dec 2024 11:23:03 -0800
From: Jacob Keller <jacob.e.keller@...el.com>
To: "Nelson, Shannon" <shannon.nelson@....com>, <netdev@...r.kernel.org>,
	<davem@...emloft.net>, <kuba@...nel.org>, <edumazet@...gle.com>,
	<pabeni@...hat.com>, <andrew+netdev@...n.ch>
CC: <brett.creeley@....com>
Subject: Re: [PATCH net 2/3] ionic: no double destroy workqueue



On 12/10/2024 1:44 PM, Nelson, Shannon wrote:
> On 12/10/2024 1:02 PM, Jacob Keller wrote:
>> On 12/10/2024 9:48 AM, Shannon Nelson wrote:
>>> There are some FW error handling paths that can cause us to
>>> try to destroy the workqueue more than once, so let's be sure
>>> we're checking for that.
>>>
>>> The case where this popped up was in an AER event where the
>>> handlers got called in such a way that ionic_reset_prepare()
>>> and thus ionic_dev_teardown() got called twice in a row.
>>> The second time through the workqueue was already destroyed,
>>> and destroy_workqueue() choked on the bad wq pointer.
>>>
>>> We didn't hit this in AER handler testing before because at
>>> that time we weren't using a private workqueue.  Later we
>>> replaced the use of the system workqueue with our own private
>>> workqueue but hadn't rerun the AER handler testing since then.
>>>
>>> Fixes: 9e25450da700 ("ionic: add private workqueue per-device")
>>> Signed-off-by: Shannon Nelson <shannon.nelson@....com>
>>> ---
>>>   drivers/net/ethernet/pensando/ionic/ionic_dev.c | 5 ++++-
>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_dev.c b/drivers/net/ethernet/pensando/ionic/ionic_dev.c
>>> index 9e42d599840d..57edcde9e6f8 100644
>>> --- a/drivers/net/ethernet/pensando/ionic/ionic_dev.c
>>> +++ b/drivers/net/ethernet/pensando/ionic/ionic_dev.c
>>> @@ -277,7 +277,10 @@ void ionic_dev_teardown(struct ionic *ionic)
>>>        idev->phy_cmb_pages = 0;
>>>        idev->cmb_npages = 0;
>>>
>>> -     destroy_workqueue(ionic->wq);
>>> +     if (ionic->wq) {
>>> +             destroy_workqueue(ionic->wq);
>>> +             ionic->wq = NULL;
>>> +     }
>>
>> This seems like you still could race if two threads call
>> ionic_dev_teardown twice. Is that not possible due to some other
>> synchronization mechanism?
> 
> Good question.  Thanks for looking at this and the other patches.
> 
> This is not a race thing so much as an already-been-here thing.  This 
> function is only called by the probe, remove, and reset_prepare threads, 
> all driven as PCI calls.  I'm reasonably sure that they won't be called 
> my simultaneous threads, so we just need to be sure that we don't break 
> if reset_prepare and remove get called one after the other because some 
> PCI bus element got removed by surprise.
> 
> sln

Ok. This is all serialized by the device/PCI layer then?

Makes sense.

Reviewed-by: Jacob Keller <jacob.e.keller@...el.com>

> 
> 
>>
>> Thanks,
>> Jake
>>
>>>        mutex_destroy(&idev->cmb_inuse_lock);
>>>   }
>>>
>>
>