netdev - Re: [PATCH net-next] devlink: Execute devlink health recover as a work

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8bb8a789-c189-df07-5d90-fd8f2fa172ca@mellanox.com>
Date:   Fri, 26 Apr 2019 13:04:29 +0000
From:   Moshe Shemesh <moshe@...lanox.com>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>,
        Saeed Mahameed <saeedm@...lanox.com>
CC:     "davem@...emloft.net" <davem@...emloft.net>,
        Jiri Pirko <jiri@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next] devlink: Execute devlink health recover as a
 work



On 4/26/2019 5:37 AM, Jakub Kicinski wrote:
> On Fri, 26 Apr 2019 01:42:34 +0000, Saeed Mahameed wrote:
>>>> @@ -4813,7 +4831,11 @@ static int
>>>> devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
>>>>   	if (!reporter)
>>>>   		return -EINVAL;
>>>>   
>>>> -	return devlink_health_reporter_recover(reporter, NULL);
>>>> +	if (!reporter->ops->recover)
>>>> +		return -EOPNOTSUPP;
>>>> +
>>>> +	queue_work(devlink->reporters_wq, &reporter->recover_work);
>>>> +	return 0;
>>>>   }
>>>
>>> So the recover user space request will no longer return the status,
>>> and
>>> it will not actually wait for the recover to happen.  Leaving user
>>> pondering - did the recover run and fail, or did it nor get run
>>> yet...
>>>    
>>
>> wait_for_completion_interruptible_timeout is missing from the design ?
> 
> Perhaps, but I think its better to avoid the async execution of
> the recover all together.  Perhaps its better to refcount the
> reporters on the call to recover_doit?  Or some such.. :)
> 

I tried using refcount instead of devlink lock here. But once I get to 
reporter destroy I wait for the refcount and not sure if I should 
release the reporter after some timeout or have endless wait for 
refcount. Both options seem not good.