linux-kernel - Re: [PATCH net-next] devlink: Execute devlink health recover as a work

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJpBn1wN67rTM2Gn-nsh0hTjq6OM2m6DS+BGU-HhSQ5ehn95Cw@mail.gmail.com>
Date:   Fri, 26 Apr 2019 09:19:39 -0700
From:   Jakub Kicinski <jakub.kicinski@...ronome.com>
To:     Moshe Shemesh <moshe@...lanox.com>
Cc:     Saeed Mahameed <saeedm@...lanox.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        Jiri Pirko <jiri@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next] devlink: Execute devlink health recover as a work

On Fri, Apr 26, 2019 at 6:04 AM Moshe Shemesh <moshe@...lanox.com> wrote:
> On 4/26/2019 5:37 AM, Jakub Kicinski wrote:
> > On Fri, 26 Apr 2019 01:42:34 +0000, Saeed Mahameed wrote:
> >>>> @@ -4813,7 +4831,11 @@ static int
> >>>> devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
> >>>>    if (!reporter)
> >>>>            return -EINVAL;
> >>>>
> >>>> -  return devlink_health_reporter_recover(reporter, NULL);
> >>>> +  if (!reporter->ops->recover)
> >>>> +          return -EOPNOTSUPP;
> >>>> +
> >>>> +  queue_work(devlink->reporters_wq, &reporter->recover_work);
> >>>> +  return 0;
> >>>>   }
> >>>
> >>> So the recover user space request will no longer return the status,
> >>> and
> >>> it will not actually wait for the recover to happen.  Leaving user
> >>> pondering - did the recover run and fail, or did it nor get run
> >>> yet...
> >>>
> >>
> >> wait_for_completion_interruptible_timeout is missing from the design ?
> >
> > Perhaps, but I think its better to avoid the async execution of
> > the recover all together.  Perhaps its better to refcount the
> > reporters on the call to recover_doit?  Or some such.. :)
> >
>
> I tried using refcount instead of devlink lock here. But once I get to
> reporter destroy I wait for the refcount and not sure if I should
> release the reporter after some timeout or have endless wait for
> refcount. Both options seem not good.

Well you should "endlessly" wait.  Why would the refcount not drop,
you have to remove it from the list first, so no new operations can
start, right?
In principle there is no difference between waiting for refcount to
drop, flushing the work, or waiting for the devlink lock if reporter
holds it?