linux-kernel - Re: [PATCH net-next] devlink: Execute devlink health recover as a work

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <43faf9b6-9202-9f4e-18b4-64c195c02b1a@mellanox.com>
Date:   Sat, 27 Apr 2019 17:15:06 +0000
From:   Moshe Shemesh <moshe@...lanox.com>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>
CC:     Saeed Mahameed <saeedm@...lanox.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        Jiri Pirko <jiri@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next] devlink: Execute devlink health recover as a
 work



On 4/26/2019 7:19 PM, Jakub Kicinski wrote:
> On Fri, Apr 26, 2019 at 6:04 AM Moshe Shemesh <moshe@...lanox.com> wrote:
>> On 4/26/2019 5:37 AM, Jakub Kicinski wrote:
>>> On Fri, 26 Apr 2019 01:42:34 +0000, Saeed Mahameed wrote:
>>>>>> @@ -4813,7 +4831,11 @@ static int
>>>>>> devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
>>>>>>     if (!reporter)
>>>>>>             return -EINVAL;
>>>>>>
>>>>>> -  return devlink_health_reporter_recover(reporter, NULL);
>>>>>> +  if (!reporter->ops->recover)
>>>>>> +          return -EOPNOTSUPP;
>>>>>> +
>>>>>> +  queue_work(devlink->reporters_wq, &reporter->recover_work);
>>>>>> +  return 0;
>>>>>>    }
>>>>>
>>>>> So the recover user space request will no longer return the status,
>>>>> and
>>>>> it will not actually wait for the recover to happen.  Leaving user
>>>>> pondering - did the recover run and fail, or did it nor get run
>>>>> yet...
>>>>>
>>>>
>>>> wait_for_completion_interruptible_timeout is missing from the design ?
>>>
>>> Perhaps, but I think its better to avoid the async execution of
>>> the recover all together.  Perhaps its better to refcount the
>>> reporters on the call to recover_doit?  Or some such.. :)
>>>
>>
>> I tried using refcount instead of devlink lock here. But once I get to
>> reporter destroy I wait for the refcount and not sure if I should
>> release the reporter after some timeout or have endless wait for
>> refcount. Both options seem not good.
> 
> Well you should "endlessly" wait.  Why would the refcount not drop,
> you have to remove it from the list first, so no new operations can
> start, right?
> In principle there is no difference between waiting for refcount to
> drop, flushing the work, or waiting for the devlink lock if reporter
> holds it?
> 
Makes sense, I will rewrite this patch, thanks.