lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 25 May 2018 08:55:53 -0600
From:   Jason Gunthorpe <jgg@...pe.ca>
To:     "Wei Hu (Xavier)" <xavier.huwei@...wei.com>
Cc:     dledford@...hat.com, linux-rdma@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 rdma-next 3/4] RDMA/hns: Add reset process for RoCE in
 hip08

On Fri, May 25, 2018 at 01:54:31PM +0800, Wei Hu (Xavier) wrote:
> 
> 
> On 2018/5/25 5:31, Jason Gunthorpe wrote:
> >>  static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
> >>  	.init_instance = hns_roce_hw_v2_init_instance,
> >>  	.uninit_instance = hns_roce_hw_v2_uninit_instance,
> >> +	.reset_notify = hns_roce_hw_v2_reset_notify,
> >>  };
> >>  
> >>  static struct hnae3_client hns_roce_hw_v2_client = {
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
> >> index 1b79a38..ac51372 100644
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_main.c
> >> @@ -332,6 +332,9 @@ static struct ib_ucontext *hns_roce_alloc_ucontext(struct ib_device *ib_dev,
> >>  	struct hns_roce_ib_alloc_ucontext_resp resp = {};
> >>  	struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
> >>  
> >> +	if (!hr_dev->active)
> >> +		return ERR_PTR(-EAGAIN);
> > This still doesn't make sense, ib_unregister_device already makes sure
> > that hns_roce_alloc_ucontext isn't running and won't be called before
> > returning, don't need another flag to do that.
> >
> > Since this is the only place the active flag is tested it can just be deleted
> > entirely.
> Hi, Jason
>    
>     roce reset process:
>     1. hr_dev->active = false;  //make sure no any process call
> ibv_open_device.   
>     2. call ib_dispatch_event() function to report IB_EVENT_DEVICE_FATAL
> event.
>     3. msleep(100);   // for some app to free resources
>     4. call ib_unregister_device().   
>     5. ...
>     6. ...
> 
>     There are 2 steps as above before calling ib_unregister_device(), we
> evaluate
> hr_dev->active with false to avoid no any process call
> ibv_open_device.

If you think this is the right flow then it is core issue to block new
opens, not an individual driver issue, send a core patch - eg add a
'ib_driver_fatal_error()' call that does the dispatch and call it from
all the drivers using this IB_EVENT_DEVICE_FATAL

I'm not completely sure this makes sense though, it might be better
for the core code to force stuff a IB_EVENT_DEVICE_FATAL to contexts
that open after the fatal event..

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ