[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <65b80fd5-7dcf-3cc6-b5e9-00a6a9d8706a@suse.com>
Date: Fri, 19 Apr 2019 21:07:09 +0800
From: Firo <fyang@...e.com>
To: Saeed Mahameed <saeedm@...lanox.com>,
"sathya.perla@...adcom.com" <sathya.perla@...adcom.com>,
"sriharsha.basavapatna@...adcom.com"
<sriharsha.basavapatna@...adcom.com>,
"somnath.kotur@...adcom.com" <somnath.kotur@...adcom.com>,
"ajit.khaparde@...adcom.com" <ajit.khaparde@...adcom.com>,
"davem@...emloft.net" <davem@...emloft.net>
Cc: "firogm@...il.com" <firogm@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH 1/1] be2net: Detach interface for avoiding a system crash
On 4/19/19 2:17 AM, Saeed Mahameed wrote:
> On Thu, 2019-04-18 at 15:05 +0800, Firo wrote:
>>
>> On 4/2/19 12:25 AM, Saeed Mahameed wrote:
>>> On Mon, 2019-04-01 at 20:24 +0800, Firo Yang wrote:
>>>> This crash is triggered by a user-after-free since lake of
>>>> the synchronization of a race condition between
>>>> be_update_queues() modifying multi-purpose channels of
>>>> network device and be_tx_timeout().
>>>>
>>>> BUG: unable to handle kernel NULL pointer dereference at (null)
>>>> Call Trace:
>>>> be_tx_timeout+0xa5/0x360 [be2net]
>>>> dev_watchdog+0x1d8/0x210
>>>> call_timer_fn+0x32/0x140
>>>>
>>>> To fix it, detach the interface before modifying
>>>> multi-purpose channels of network device.
>>>>
>>>> Signed-off-by: Firo Yang <fyang@...e.com>
>>>> ---
>>>> drivers/net/ethernet/emulex/benet/be_main.c | 12 ++++++++----
>>>> 1 file changed, 8 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/emulex/benet/be_main.c
>>>> b/drivers/net/ethernet/emulex/benet/be_main.c
>>>> index d5026909dec5..25d0128bf684 100644
>>>> --- a/drivers/net/ethernet/emulex/benet/be_main.c
>>>> +++ b/drivers/net/ethernet/emulex/benet/be_main.c
>>>> @@ -4705,6 +4705,8 @@ int be_update_queues(struct be_adapter
>>>> *adapter)
>>>> struct net_device *netdev = adapter->netdev;
>>>> int status;
>>>>
>>>> + netif_device_detach(netdev);
>>>> +
>>>
>>> This will reduce the probability, but will not do the trick.
>>> since this will not guarantee that the dev_watchdog is disabled.
>> Hi Saeed,
>>
>> What about using dev_watchdog_down/up() to temporarily disable the
>> dev_watchdog?
>>
>> +++ b/drivers/net/ethernet/emulex/benet/be_main.c
>> @@ -4697,6 +4697,8 @@ int be_update_queues(struct be_adapter
>> *adapter)
>> struct net_device *netdev = adapter->netdev;
>> int status;
>>
>> + dev_watchdog_down();
>> +
>
> there is no such API, currently this is a static function, and i don't
> think it is a good idea to mess around with the watchdog.
>
> If you want to avoid deferred work and explicit locking, you need
> something similar to what you did with the device_detach to flag to the
> watchdog to not look at your tx queues.
>
> by looking at be_close() I see that it calls netif_tx_disable(netdev);
> which provides some kind of state synchronization with the watchdog.
>
> so maybe netif_device_detach() will work, but it is a very heavy gun!
Sorry, I cannot fully understand why you think netif_device_detach() is
very heavy. Except clearing __LINK_STATE_PRESENT, netif_device_detach()
almost does the same thing as netif_tx_disable(); could you please
detail your thought?
> netif_carrier_off(netdev) should do the work, and at the end you will
> need to restore the original carrier state.
netif_carrier_off() might trigger a linkwatch event. From this point of
view, maybe netif_device_detach() is better.
Thanks,
Firo
>
> carrier_ok = netif_carrier_ok(netdev);
> /* must be called before netif_tx_disable() */
> netif_carrier_off(netdev);
>
> // do stuff
>
> if (carrier_ok)
> netif_carrier_on(netdev);
>
>> if (netif_running(netdev))
>> be_close(netdev);
>>
>> @@ -4711,21 +4713,21 @@ int be_update_queues(struct be_adapter
>> *adapter)
>> be_clear_queues(adapter);
>> status = be_cmd_if_destroy(adapter, adapter->if_handle, 0);
>> if (status)
>> - return status;
>> + goto out;
>>
>> if (!msix_enabled(adapter)) {
>> status = be_msix_enable(adapter);
>> if (status)
>> - return status;
>> + goto out;
>> }
>>
>> status = be_if_create(adapter);
>> if (status)
>> - return status;
>> + goto out;
>>
>> status = be_setup_queues(adapter);
>> if (status)
>> - return status;
>> + goto out;
>>
>> be_schedule_worker(adapter);
>>
>> @@ -4741,6 +4743,9 @@ int be_update_queues(struct be_adapter
>> *adapter)
>> if (netif_running(netdev))
>> status = be_open(netdev);
>>
>> +out:
>> + dev_watchdog_up();
>>
>> Thanks,
>> Firo
>>
>>> what you need is proper locking mechanism and/or scheduling the
>>> tx_timeout handling out of atomic context if a mutex will be
>>> required.
>>>
>>> netif_device_detach is a too heavy hammer for such synchronizations
>>> tasks.
>>>
>>>> if (netif_running(netdev))
>>>> be_close(netdev);
>>>>
>>>> @@ -4719,21 +4721,21 @@ int be_update_queues(struct be_adapter
>>>> *adapter)
>>>> be_clear_queues(adapter);
>>>> status = be_cmd_if_destroy(adapter, adapter->if_handle, 0);
>>>> if (status)
>>>> - return status;
>>>> + goto out;
>>>>
>>>> if (!msix_enabled(adapter)) {
>>>> status = be_msix_enable(adapter);
>>>> if (status)
>>>> - return status;
>>>> + goto out;
>>>> }
>>>>
>>>> status = be_if_create(adapter);
>>>> if (status)
>>>> - return status;
>>>> + goto out;
>>>>
>>>> status = be_setup_queues(adapter);
>>>> if (status)
>>>> - return status;
>>>> + goto out;
>>>>
>>>> be_schedule_worker(adapter);
>>>>
>>>> @@ -4748,6 +4750,8 @@ int be_update_queues(struct be_adapter
>>>> *adapter)
>>>> if (netif_running(netdev))
>>>> status = be_open(netdev);
>>>>
>>>> +out:
>>>> + netif_device_attach(netdev);
>>>> return status;
>>>> }
>>>>
Powered by blists - more mailing lists