linux-kernel - Re: [PATCH V2 rdma-next 3/4] RDMA/hns: Add reset process for RoCE in hip08

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20180528164622.GB17505@ziepe.ca>
Date:   Mon, 28 May 2018 10:46:22 -0600
From:   Jason Gunthorpe <jgg@...pe.ca>
To:     "Wei Hu (Xavier)" <xavier.huwei@...wei.com>
Cc:     dledford@...hat.com, linux-rdma@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 rdma-next 3/4] RDMA/hns: Add reset process for RoCE in
 hip08

On Sat, May 26, 2018 at 09:47:43AM +0800, Wei Hu (Xavier) wrote:
> 
> 
> On 2018/5/25 22:55, Jason Gunthorpe wrote:
> > On Fri, May 25, 2018 at 01:54:31PM +0800, Wei Hu (Xavier) wrote:
> >>
> >> On 2018/5/25 5:31, Jason Gunthorpe wrote:
> >>>>  static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
> >>>>  	.init_instance = hns_roce_hw_v2_init_instance,
> >>>>  	.uninit_instance = hns_roce_hw_v2_uninit_instance,
> >>>> +	.reset_notify = hns_roce_hw_v2_reset_notify,
> >>>>  };
> >>>>  
> >>>>  static struct hnae3_client hns_roce_hw_v2_client = {
> >>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
> >>>> index 1b79a38..ac51372 100644
> >>>> +++ b/drivers/infiniband/hw/hns/hns_roce_main.c
> >>>> @@ -332,6 +332,9 @@ static struct ib_ucontext *hns_roce_alloc_ucontext(struct ib_device *ib_dev,
> >>>>  	struct hns_roce_ib_alloc_ucontext_resp resp = {};
> >>>>  	struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
> >>>>  
> >>>> +	if (!hr_dev->active)
> >>>> +		return ERR_PTR(-EAGAIN);
> >>> This still doesn't make sense, ib_unregister_device already makes sure
> >>> that hns_roce_alloc_ucontext isn't running and won't be called before
> >>> returning, don't need another flag to do that.
> >>>
> >>> Since this is the only place the active flag is tested it can just be deleted
> >>> entirely.
> >> Hi, Jason
> >>    
> >>     roce reset process:
> >>     1. hr_dev->active = false;  //make sure no any process call
> >> ibv_open_device.   
> >>     2. call ib_dispatch_event() function to report IB_EVENT_DEVICE_FATAL
> >> event.
> >>     3. msleep(100);   // for some app to free resources
> >>     4. call ib_unregister_device().   
> >>     5. ...
> >>     6. ...
> >>
> >>     There are 2 steps as above before calling ib_unregister_device(), we
> >> evaluate
> >> hr_dev->active with false to avoid no any process call
> >> ibv_open_device.
> > If you think this is the right flow then it is core issue to block new
> > opens, not an individual driver issue, send a core patch - eg add a
> > 'ib_driver_fatal_error()' call that does the dispatch and call it from
> > all the drivers using this IB_EVENT_DEVICE_FATAL
> Hi, Jason
> 
> It seem to be no difference between calling ib_driver_fatal_error and
> calling ib_dispatch_event  directly in manufacturer driver.
> 
> void ib_driver_fatal_error(struct ib_device *ib_dev, u8 port_num)
>  {
>        struct ib_event event;
>                 
>   event.event = IB_EVENT_DEVICE_FATAL;
>     event.device = ib_dev;
>     event.element.port_num = port_num;
>     ib_dispatch_event(&event);
> }

My point was the core code should block calling
hns_roce_alloc_ucontext after DEVICE_FATAL if we agree that is
correct, the driver shouldn't be doing it.

Jason