[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20191107094236.GA2200@nanopsycho>
Date: Thu, 7 Nov 2019 10:42:36 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc: Ido Schimmel <idosch@...sch.org>, netdev@...r.kernel.org,
davem@...emloft.net, jiri@...lanox.com, shalomt@...lanox.com,
mlxsw@...lanox.com, Ido Schimmel <idosch@...lanox.com>
Subject: Re: [PATCH net-next 0/6] mlxsw: Add extended ACK for EMADs
Wed, Nov 06, 2019 at 06:26:47PM CET, jakub.kicinski@...ronome.com wrote:
>On Wed, 6 Nov 2019 09:20:39 +0100, Jiri Pirko wrote:
>> Tue, Nov 05, 2019 at 10:48:58PM CET, jakub.kicinski@...ronome.com wrote:
>> >On Tue, 5 Nov 2019 22:48:26 +0200, Ido Schimmel wrote:
>> >> On Tue, Nov 05, 2019 at 09:54:48AM -0800, Jakub Kicinski wrote:
>> >> > Hm, the firmware has no log that it keeps? Surely FW runs a lot of
>> >> > periodic jobs etc which may encounter some error conditions, how do
>> >> > you deal with those?
>> >>
>> >> There are intrusive out-of-tree modules that can get this information.
>> >> It's currently not possible to retrieve this information from the
>> >> driver. We try to move away from such methods, but it can't happen
>> >> overnight. This set and the work done in the firmware team to add this
>> >> new TLV is one step towards that goal.
>> >>
>> >> > Bottom line is I don't like when data from FW is just blindly passed
>> >> > to user space.
>> >>
>> >> The same information will be passed to user space regardless if you use
>> >> ethtool / devlink / printk.
>> >
>> >Sure, but the additional hoop to jump through makes it clear that this
>> >is discouraged and it keeps clear separation between the Linux
>> >interfaces and proprietary custom FW.. "stuff".
>>
>> Hmm, let me try to understand. So you basically have problem with
>> passing random FW generated data and putting it to user (via dmesg,
>> extack). However, ethtool dump is fine. Devlink health reporter is also
>> fine.
>
>Yup.
>
>> That is completely sufficient for async events/errors.
>> However in this case, we have MSG sent to fw which generates an ERROR
>> and this error is sent from FW back, as a reaction to this particular
>> message.
>
>Well, outputting to dmesg is not more synchronous than putting it in
>some other device specific facility.
Well, not really. In dmesg, you see not only the fw msg, you see it along
with the tid (sequence number) and emad register name.
>
>> What do you suggest we should use in order to maintain the MSG-ERROR
>> pairing? Perhaps a separate devlink health reporter just for this?
>
>Again, to be clear - that's future work, right? Kernel logs as
>implemented here do not maintain MSG-ERROR pairing.
As I described above, they actually do.
>
>> What do you think?
>
>In all honesty it's hard to tell for sure, because we don't see the FW
>and what it needs. That's kind of the point, it's a black box.
>
>I prefer the driver to mediate all the information in a meaningful way.
>If you want to report an error regarding a parameter the FW could
>communicate some identification of the field and the reason and the
>driver can control the output, for example format a string to print to
>logs?
You are right, it is a blackbox. But the blackbox is sending texts about
what is went wrong with some particular operation. And these texts are
quite handy to figure out what is going on there. Either we ignore them,
or we show them somehow. It is very valuable to show them.
Powered by blists - more mailing lists