Message-ID: <43faf9b6-9202-9f4e-18b4-64c195c02b1a@mellanox.com>
Date: Sat, 27 Apr 2019 17:15:06 +0000
From: Moshe Shemesh <moshe@...lanox.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
CC: Saeed Mahameed <saeedm@...lanox.com>,
"davem@...emloft.net" <davem@...emloft.net>,
Jiri Pirko <jiri@...lanox.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next] devlink: Execute devlink health recover as a work
On 4/26/2019 7:19 PM, Jakub Kicinski wrote:
> On Fri, Apr 26, 2019 at 6:04 AM Moshe Shemesh <moshe@...lanox.com> wrote:
>> On 4/26/2019 5:37 AM, Jakub Kicinski wrote:
>>> On Fri, 26 Apr 2019 01:42:34 +0000, Saeed Mahameed wrote:
>>>>>> @@ -4813,7 +4831,11 @@ static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
>>>>>>  	if (!reporter)
>>>>>>  		return -EINVAL;
>>>>>>
>>>>>> -	return devlink_health_reporter_recover(reporter, NULL);
>>>>>> +	if (!reporter->ops->recover)
>>>>>> +		return -EOPNOTSUPP;
>>>>>> +
>>>>>> +	queue_work(devlink->reporters_wq, &reporter->recover_work);
>>>>>> +	return 0;
>>>>>>  }
>>>>>
>>>>> So the recover user space request will no longer return the status,
>>>>> and it will not actually wait for the recover to happen. That leaves
>>>>> the user pondering: did the recover run and fail, or did it not get
>>>>> run yet...
>>>>>
>>>>
>>>> Is wait_for_completion_interruptible_timeout missing from the design?
>>>
>>> Perhaps, but I think it's better to avoid the async execution of
>>> the recover altogether. Perhaps it's better to refcount the
>>> reporters on the call to recover_doit? Or some such.. :)
>>>
>>
>> I tried using a refcount instead of the devlink lock here. But once I
>> get to reporter destroy I have to wait for the refcount, and I am not
>> sure whether I should release the reporter after some timeout or wait
>> endlessly for the refcount. Neither option seems good.
>
> Well, you should "endlessly" wait. Why would the refcount not drop?
> You have to remove it from the list first, so no new operations can
> start, right?
> In principle there is no difference between waiting for the refcount
> to drop, flushing the work, or waiting for the devlink lock if the
> reporter holds it?
>
Makes sense, I will rewrite this patch, thanks.
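
For illustration only, here is a minimal userspace sketch of the ordering
Jakub describes: unlist the reporter first so no new operation can take a
reference, then wait without a timeout for the references already held to
drop. It is not the rewritten patch and not kernel code. The struct and
every function name in it (reporter_get, reporter_put, reporter_destroy,
demo_destroy_waits) are hypothetical, and pthreads stand in for the
kernel's locking and wait primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a health reporter; not kernel code. */
struct reporter {
	pthread_mutex_t lock;	/* protects every field below */
	pthread_cond_t idle;	/* signalled when refs drops to zero */
	bool listed;		/* true while new operations may find it */
	int refs;		/* operations currently in flight */
	int recoveries_done;	/* completed recover operations */
};

static void reporter_init(struct reporter *r)
{
	pthread_mutex_init(&r->lock, NULL);
	pthread_cond_init(&r->idle, NULL);
	r->listed = true;
	r->refs = 0;
	r->recoveries_done = 0;
}

/* Lookup: succeeds only while the reporter is listed; takes a reference. */
static bool reporter_get(struct reporter *r)
{
	bool ok;

	pthread_mutex_lock(&r->lock);
	ok = r->listed;
	if (ok)
		r->refs++;
	pthread_mutex_unlock(&r->lock);
	return ok;
}

/* Drop a reference; wake the destroyer when the last one goes away. */
static void reporter_put(struct reporter *r)
{
	pthread_mutex_lock(&r->lock);
	assert(r->refs > 0);
	if (--r->refs == 0)
		pthread_cond_broadcast(&r->idle);
	pthread_mutex_unlock(&r->lock);
}

/* Destroy: unlist first, then wait with no timeout for refs to reach zero. */
static void reporter_destroy(struct reporter *r)
{
	pthread_mutex_lock(&r->lock);
	r->listed = false;	/* from here on reporter_get() fails */
	while (r->refs > 0)	/* finite: no new reference can be taken */
		pthread_cond_wait(&r->idle, &r->lock);
	pthread_mutex_unlock(&r->lock);
}

/* Thread body: finish an already-started recover, then drop its reference. */
static void *finish_recover(void *arg)
{
	struct reporter *r = arg;

	pthread_mutex_lock(&r->lock);
	r->recoveries_done++;
	pthread_mutex_unlock(&r->lock);
	reporter_put(r);
	return NULL;
}

/* Returns 0 if destroy waited for the in-flight recover and blocked new ones. */
int demo_destroy_waits(void)
{
	struct reporter r;
	pthread_t t;
	bool ok;

	reporter_init(&r);
	if (!reporter_get(&r))		/* a recover starts and holds a ref */
		return 1;
	if (pthread_create(&t, NULL, finish_recover, &r) != 0)
		return 2;
	reporter_destroy(&r);		/* returns only after the put */
	ok = r.refs == 0 && r.recoveries_done == 1 && !reporter_get(&r);
	pthread_join(t, NULL);
	pthread_cond_destroy(&r.idle);
	pthread_mutex_destroy(&r.lock);
	return ok ? 0 : 3;
}
```

Calling demo_destroy_waits() from a main() built with -pthread is expected
to return 0. The wait in reporter_destroy() cannot hang as long as every
successful reporter_get() is paired with a reporter_put(), which is the
point made above about unlisting before waiting.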