[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200914143100.06a4641d@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Mon, 14 Sep 2020 14:31:00 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Jiri Pirko <jiri@...nulli.us>
Cc: Vasundhara Volam <vasundhara-v.volam@...adcom.com>,
Moshe Shemesh <moshe@...lanox.com>,
"David S. Miller" <davem@...emloft.net>,
Jiri Pirko <jiri@...lanox.com>,
Netdev <netdev@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
Michael Chan <michael.chan@...adcom.com>
Subject: Re: [PATCH net-next RFC v4 01/15] devlink: Add reload action option
to devlink reload command
On Mon, 14 Sep 2020 13:28:29 +0200 Jiri Pirko wrote:
> Mon, Sep 14, 2020 at 11:54:55AM CEST, vasundhara-v.volam@...adcom.com wrote:
> >On Mon, Sep 14, 2020 at 3:02 PM Jiri Pirko <jiri@...nulli.us> wrote:
> >> >> +mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink, enum devlink_reload_action action,
> >> >> + struct netlink_ext_ack *extack,
> >> >> + unsigned long *actions_performed)
> >> >Sorry for repeating again, for fw_activate action on our device, all
> >> >the driver entities undergo reset asynchronously once user initiates
> >> >"devlink dev reload action fw_activate" and reload_up does not have
> >> >much to do except reporting actions that will be/being performed.
> >> >
> >> >Once reset is complete, the health reporter will be notified using
> >>
> >> Hmm, how is the fw reset related to health reporter recovery? Recovery
> >> happens after some error event. I don't believe it is wise to mix it.
> >Our device has a fw_reset health reporter, which is updated on reset
> >events and firmware activation is one among them. All non-fatal
> >firmware reset events are reported on fw_reset health reporter.
>
> Hmm, interesting. In that case, assuming this is fine, should we have
> some standard in this. I mean, if the driver supports reset, should it
> also define the "fw_reset" reporter to report such events?
>
> Jakub, what is your take here?
Sounds doubly wrong to me.
As you say health reporters should trigger on error events,
communicating completion of an action requested by the user
seems very wrong. IIUC operators should monitor and collect
health failures. In this case looks like all events from fw_reset
would need to be discarded, since they are not meaningful
without the context of what triggered them.
And secondly, reporting the completion via some async mechanism
that user has to monitor is just plain lazy. That's pushing out
the work that has to be done out to user space. Wait for the
completion in the driver.
> >> Instead, why don't you block in reload_up() until the reset is complete?
> >
> >Though user initiate "devlink dev reload" event on a single interface,
> >all driver entities undergo reset and all entities recover
> >independently. I don't think we can block the reload_up() on the
> >interface(that user initiated the command), until whole reset is
> >complete.
>
> Why not? mlxsw reset takes up to like 10 seconds for example.
+1, why?
Powered by blists - more mailing lists