linux-kernel - Re: [PATCH V2 net-next 08/12] net: hns3: stop schedule reset service while unloading driver

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190425114104.GC15861@hmswarspite.think-freely.org>
Date:   Thu, 25 Apr 2019 07:41:04 -0400
From:   Neil Horman <nhorman@...hat.com>
To:     tanhuazhong <tanhuazhong@...wei.com>
Cc:     davem@...emloft.net, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, salil.mehta@...wei.com,
        yisen.zhuang@...wei.com, linuxarm@...wei.com,
        Peng Li <lipeng321@...wei.com>
Subject: Re: [PATCH V2 net-next 08/12] net: hns3: stop schedule reset service
 while unloading driver

On Thu, Apr 25, 2019 at 02:06:07PM +0800, tanhuazhong wrote:
> 
> 
> On 2019/4/24 21:55, Neil Horman wrote:
> > On Wed, Apr 24, 2019 at 07:05:27PM +0800, Huazhong Tan wrote:
> > > This patch uses HCLGE_STATE_REMOVING/HCLGEVF_STATE_REMOVING flag to
> > > indicate that the driver is unloading, and we should stop new coming
> > > reset service to be scheduled, otherwise, reset service will access
> > > some resource which has been freed by unloading.
> > > 
> > > Signed-off-by: Huazhong Tan <tanhuazhong@...wei.com>
> > > Signed-off-by: Peng Li <lipeng321@...wei.com>
> > > ---
> > > V1->V2: fixes a flag setting error
> > > ---
> > >   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c   | 4 +++-
> > >   drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 4 +++-
> > >   drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h | 1 +
> > >   3 files changed, 7 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> > > index 4d5568e..ead8308 100644
> > > --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> > > +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> > > @@ -2175,7 +2175,8 @@ static void hclge_mbx_task_schedule(struct hclge_dev *hdev)
> > >   static void hclge_reset_task_schedule(struct hclge_dev *hdev)
> > >   {
> > > -	if (!test_and_set_bit(HCLGE_STATE_RST_SERVICE_SCHED, &hdev->state))
> > > +	if (!test_bit(HCLGE_STATE_REMOVING, &hdev->state) &&
> > > +	    !test_and_set_bit(HCLGE_STATE_RST_SERVICE_SCHED, &hdev->state))
> > >   		schedule_work(&hdev->rst_service_task);
> > >   }
> > 
> > In what use case do you need an extra bit for this?  From my read, this work
> > task only gets scheduled from:
> > 1) Interrupt handlers
> > 2) Its own service task
> > 
> > Based on the fact that you are calling cancel_work_sync(...rst_service_task)
> > from the pci teardown routine, irqs should all be disabled on your devices
> > already (meaning interrupts shouldn't schedule it), and cancel_work_sync
> > guarantees that rearming cant happen from within its own service task.
> > 
> > Neil
> > 
> 
> Beside these two cases, when the client detects an error and requests a
> reset, it will call hclge_reset_event and schedule the reset work task to
> deal with the request. This may happen after calling
> cancel_work_sync(...rst_service_task) from the pci teardown routine.
> 
> Best Regards, Huazhong
> 
But that is handled from either:
1) hns_roce_v2_msix_interrupt_abn
or
2) hns3_nic_net_timeout
or
3) hns3_slot_reset

You should be protected from (1) because interrupts should be disabled before
you call cancel_work_syn.

You should be protected from (2) because the network interface will have been
brought down first (disabling the watchdog timer for the any interfaces on this
hardware

You should be protected from (3) because you are tearing down the pci device
anyway, and the pci subsystem should be ignoring resets

Neil

> > 
> > .
> > 
>