Message-ID: <1899d771-cf4b-7f22-3261-5b39feea2f7e@intel.com>
Date: Tue, 26 May 2020 14:13:22 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jakub Kicinski <kuba@...nel.org>, Jiri Pirko <jiri@...nulli.us>
Cc: Ido Schimmel <idosch@...sch.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
petrm@...lanox.com, amitc@...lanox.com
Subject: Re: devlink interface for asynchronous event/messages from firmware?
On 5/22/2020 10:46 AM, Jakub Kicinski wrote:
> On Fri, 22 May 2020 13:00:28 +0200 Jiri Pirko wrote:
>> Thu, May 21, 2020 at 11:51:13PM CEST, kuba@...nel.org wrote:
>>> For pure debug/tracing perhaps trace_devlink_hwerr() is the right fit?
>>
>> Well, trace_devlink_hwerr() is for simple errors that are mapped 1:1
>> with some string.
>
> Ah, damn, I missed it takes char :/
Using trace_devlink_hwerr is better than what we *have* been doing, at
least. :) I think if we instead made our own driver tracepoint, it might
work well enough.
>
>> From what I got, Jacob needs to pass some data structures to the
>> user. Something more similar to health reporter dumps and their fmsg.
>
> For health reporters AFAIU right now every health reporter event
> indicates something bad has happened, so it should be logged and
> potentially reported to the vendor.
>
Right, that's why I don't think it's a great fit.
> My understanding is that Jake needs more of a tracing infra, for
> debug messages. Is that true? Do you need an on/off switch for
> those as well?
>
The messages come from different "modules" of the firmware. I think we
have ~16-20 or so modules, so ideally we'd have an on/off switch for
each module. There's also a message level range, which is sort of
like the dbg, info, err messaging.
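
To make the per-module switch plus level range concrete, here is a
minimal userspace C sketch. It is not taken from any driver: the names
(fwlog_cfg, fwlog_set_level, fwlog_enabled), the module limit of 32, and
the ordering of the levels are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound; the thread only says roughly 16-20 modules. */
#define FWLOG_MAX_MODULES 32

/* Hypothetical levels, ordered from least to most verbose. */
enum fwlog_level {
	FWLOG_LEVEL_NONE = 0,	/* module switched off */
	FWLOG_LEVEL_ERR,
	FWLOG_LEVEL_INFO,
	FWLOG_LEVEL_DBG,
};

/* One maximum level per firmware module; zero-initialised means all off. */
struct fwlog_cfg {
	uint8_t level[FWLOG_MAX_MODULES];
};

/* Set the most verbose level a module may emit. Returns 0, or -1 on bad input. */
static int fwlog_set_level(struct fwlog_cfg *cfg, unsigned int module,
			   enum fwlog_level level)
{
	if (module >= FWLOG_MAX_MODULES || level > FWLOG_LEVEL_DBG)
		return -1;
	cfg->level[module] = (uint8_t)level;
	return 0;
}

/* True if a message from 'module' at 'level' should be captured. */
static bool fwlog_enabled(const struct fwlog_cfg *cfg, unsigned int module,
			  enum fwlog_level level)
{
	if (module >= FWLOG_MAX_MODULES || level == FWLOG_LEVEL_NONE)
		return false;
	return level <= cfg->level[module];
}
```

With this layout, "off by default" falls out of zero-initialising the
structure, and setting module 3 to FWLOG_LEVEL_INFO would let its err and
info messages through while still dropping dbg.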

The current solution relies on a custom driver build that enables the
logging and the messaging, and uses some module parameters to configure
this stuff. The big downside is that we don't feel the current
implementation can be left in, certainly not upstream. This means that
any time a firmware engineer says "please get us firmware logs", we have
to reproduce the issue with the custom build of the driver.

The value of having this information is a significant increase in
productivity when debugging issues that might be occurring in the
firmware, or in misuse of fw<->driver interfaces, or missed expectations
between developers, etc.

Our goal is to find something that we can safely leave in the driver
that will be off by default, but can be enabled if necessary to capture
the logging data.

From the sounds of it, maybe the best solution is to implement this as a
trace event. Possibly we could just implement it as a driver-specific
trace event so it'd show up in tracing/events/<driver>/fwlogs, or
something like that. That still leaves open the question of the best way
to configure which modules and levels are enabled...

This debug logging is separate from a similar-sounding system that is
intended to report non-debug messages such as link-failure reason. I do
agree that something like that ought to instead be handled by the driver
determining "oh this is a link failure indication, so I'll report it
over the ethtool netlink interface, and convert it to the value expected
by that interface".

I'm not sure what other data besides link-failure reporting is
intended to be sent in this simpler format, as I haven't gotten any
other examples yet. The intent was to have these messages displayed by
doing a simple lookup from code to message, as there would be
significantly fewer of these and they are intended to help guide system
administrators. But given that the only example I've seen so far is the
link messages, it's unclear to me what else they would be used for.
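
The "simple lookup from code to message" could look roughly like the
following C sketch. The codes and strings are invented purely for
illustration (no real firmware codes appear in this thread), and
fw_msg_entry / fw_msg_lookup are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One firmware message code and the text shown to the administrator. */
struct fw_msg_entry {
	uint16_t code;
	const char *text;
};

/* Hypothetical codes and strings, made up for this example only. */
static const struct fw_msg_entry fw_msg_table[] = {
	{ 1, "Link down: no cable detected" },
	{ 2, "Link down: unsupported module" },
	{ 3, "Link down: autonegotiation failed" },
};

/* Map a code to its message; unknown codes get a generic fallback string. */
static const char *fw_msg_lookup(uint16_t code)
{
	size_t i;

	for (i = 0; i < sizeof(fw_msg_table) / sizeof(fw_msg_table[0]); i++) {
		if (fw_msg_table[i].code == code)
			return fw_msg_table[i].text;
	}
	return "Unknown firmware message";
}
```

A linear scan is enough here because the table is expected to stay
small; the fallback string keeps the interface usable when firmware
reports a code the driver does not know about.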

And just to clarify, in either case the intention is that these are
one-way and read-only interfaces.

Thanks,
Jake