Date:   Fri, 26 Apr 2019 09:19:39 -0700
From:   Jakub Kicinski <jakub.kicinski@...ronome.com>
To:     Moshe Shemesh <moshe@...lanox.com>
Cc:     Saeed Mahameed <saeedm@...lanox.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        Jiri Pirko <jiri@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next] devlink: Execute devlink health recover as a work

On Fri, Apr 26, 2019 at 6:04 AM Moshe Shemesh <moshe@...lanox.com> wrote:
> On 4/26/2019 5:37 AM, Jakub Kicinski wrote:
> > On Fri, 26 Apr 2019 01:42:34 +0000, Saeed Mahameed wrote:
> >>>> @@ -4813,7 +4831,11 @@ static int
> >>>> devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
> >>>>    if (!reporter)
> >>>>            return -EINVAL;
> >>>>
> >>>> -  return devlink_health_reporter_recover(reporter, NULL);
> >>>> +  if (!reporter->ops->recover)
> >>>> +          return -EOPNOTSUPP;
> >>>> +
> >>>> +  queue_work(devlink->reporters_wq, &reporter->recover_work);
> >>>> +  return 0;
> >>>>   }
> >>>
> >>> So the recover user space request will no longer return the status,
> >>> and it will not actually wait for the recover to happen.  This
> >>> leaves the user pondering - did the recover run and fail, or did it
> >>> not get run yet...
> >>>
> >>
> >> wait_for_completion_interruptible_timeout is missing from the design ?
> >
> > Perhaps, but I think it's better to avoid the async execution of
> > the recover altogether.  Perhaps it's better to refcount the
> > reporters on the call to recover_doit?  Or some such.. :)
> >
>
> I tried using a refcount instead of the devlink lock here. But once I
> get to reporter destroy I have to wait on the refcount, and I am not
> sure if I should release the reporter after some timeout or wait
> endlessly for the refcount to drop. Neither option seems good.

Well, you should "endlessly" wait.  Why would the refcount not drop?
You have to remove the reporter from the list first, so no new
operations can start, right?
In principle there is no difference between waiting for the refcount
to drop, flushing the work, or waiting for the devlink lock if the
reporter holds it?
