linux-kernel - Re: [PATCH V2 net-next 08/12] net: hns3: stop schedule reset service while unloading driver

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <7e8b6189-57f8-f2af-f027-1fecfbd5c993@huawei.com>
Date:   Thu, 25 Apr 2019 20:04:11 +0800
From:   tanhuazhong <tanhuazhong@...wei.com>
To:     Neil Horman <nhorman@...hat.com>
CC:     <davem@...emloft.net>, <netdev@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <salil.mehta@...wei.com>,
        <yisen.zhuang@...wei.com>, <linuxarm@...wei.com>,
        Peng Li <lipeng321@...wei.com>
Subject: Re: [PATCH V2 net-next 08/12] net: hns3: stop schedule reset service
 while unloading driver



On 2019/4/25 19:41, Neil Horman wrote:
> On Thu, Apr 25, 2019 at 02:06:07PM +0800, tanhuazhong wrote:
>>
>>
>> On 2019/4/24 21:55, Neil Horman wrote:
>>> On Wed, Apr 24, 2019 at 07:05:27PM +0800, Huazhong Tan wrote:
>>>> This patch uses HCLGE_STATE_REMOVING/HCLGEVF_STATE_REMOVING flag to
>>>> indicate that the driver is unloading, and we should stop new coming
>>>> reset service to be scheduled, otherwise, reset service will access
>>>> some resource which has been freed by unloading.
>>>>
>>>> Signed-off-by: Huazhong Tan <tanhuazhong@...wei.com>
>>>> Signed-off-by: Peng Li <lipeng321@...wei.com>
>>>> ---
>>>> V1->V2: fixes a flag setting error
>>>> ---
>>>>    drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c   | 4 +++-
>>>>    drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 4 +++-
>>>>    drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h | 1 +
>>>>    3 files changed, 7 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>>>> index 4d5568e..ead8308 100644
>>>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>>>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>>>> @@ -2175,7 +2175,8 @@ static void hclge_mbx_task_schedule(struct hclge_dev *hdev)
>>>>    static void hclge_reset_task_schedule(struct hclge_dev *hdev)
>>>>    {
>>>> -	if (!test_and_set_bit(HCLGE_STATE_RST_SERVICE_SCHED, &hdev->state))
>>>> +	if (!test_bit(HCLGE_STATE_REMOVING, &hdev->state) &&
>>>> +	    !test_and_set_bit(HCLGE_STATE_RST_SERVICE_SCHED, &hdev->state))
>>>>    		schedule_work(&hdev->rst_service_task);
>>>>    }
>>>
>>> In what use case do you need an extra bit for this?  From my read, this work
>>> task only gets scheduled from:
>>> 1) Interrupt handlers
>>> 2) Its own service task
>>>
>>> Based on the fact that you are calling cancel_work_sync(...rst_service_task)
>>> from the pci teardown routine, irqs should all be disabled on your devices
>>> already (meaning interrupts shouldn't schedule it), and cancel_work_sync
>>> guarantees that rearming cant happen from within its own service task.
>>>
>>> Neil
>>>
>>
>> Beside these two cases, when the client detects an error and requests a
>> reset, it will call hclge_reset_event and schedule the reset work task to
>> deal with the request. This may happen after calling
>> cancel_work_sync(...rst_service_task) from the pci teardown routine.
>>
>> Best Regards, Huazhong
>>
> But that is handled from either:
> 1) hns_roce_v2_msix_interrupt_abn
> or
> 2) hns3_nic_net_timeout
> or
> 3) hns3_slot_reset
> 
> You should be protected from (1) because interrupts should be disabled before
> you call cancel_work_syn.
> 
> You should be protected from (2) because the network interface will have been
> brought down first (disabling the watchdog timer for the any interfaces on this
> hardware
> 
> You should be protected from (3) because you are tearing down the pci device
> anyway, and the pci subsystem should be ignoring resets
> 
> Neil
> 

Yes, your analysis is right, i will remove this patch.

Thanks, Huazhong.

>>>
>>> .
>>>
>>
> 
> .
>