Message-ID: <20200522110305.GD2478@nanopsycho>
Date: Fri, 22 May 2020 13:03:05 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: Jacob Keller <jacob.e.keller@...el.com>
Cc: Ido Schimmel <idosch@...sch.org>, Jakub Kicinski <kuba@...nel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
petrm@...lanox.com, amitc@...lanox.com
Subject: Re: devlink interface for asynchronous event/messages from firmware?
Thu, May 21, 2020 at 10:59:32PM CEST, jacob.e.keller@...el.com wrote:
>
>
>On 5/21/2020 1:52 PM, Ido Schimmel wrote:
>> On Thu, May 21, 2020 at 01:22:34PM -0700, Jacob Keller wrote:
>>> On 5/20/2020 5:16 PM, Jakub Kicinski wrote:
>>>> On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote:
>>>>> Hi Jiri, Jakub,
>>>>>
>>>>> I've been asked to investigate using devlink as a mechanism for
>>>>> reporting asynchronous events/messages from firmware including
>>>>> diagnostic messages, etc.
>>>>>
>>>>> Essentially, the ice firmware can report various status or diagnostic
>>>>> messages which are useful for debugging internal behavior. We want to be
>>>>> able to get these messages (and relevant data associated with them) in a
>>>>> format beyond just "dump it to the dmesg buffer and recover it later".
>>>>>
>>>>> It seems like this would be an appropriate use of devlink. I thought
>>>>> maybe this would work with devlink health:
>>>>>
>>>>> i.e. we create a devlink health reporter, and then when firmware sends a
>>>>> message, we use devlink_health_report.
>>>>>
>>>>> But when I dug into this, it doesn't seem like a natural fit. The health
>>>>> reporters expect to see an "error" state, and don't seem to really fit
>>>>> the notion of "log a message from firmware".
>>>>>
>>>>> One of the issues is that the health reporter only keeps one dump, when
>>>>> what we really want is a way to have a monitoring application get the
>>>>> dump and then store its contents.
>>>>>
>>>>> Thoughts on what might make sense for this? It feels like a stretch of
>>>>> the health interface...
>>>>>
>>>>> I mean basically what I am thinking of having is using the devlink_fmsg
>>>>> interface to just send a netlink message that then gets sent over the
>>>>> devlink monitor socket and gets dumped immediately.
>>>>
>>>> Why does user space need a raw firmware interface in the first place?
>>>>
>>>> Examples?
>>>>
>>>
>>> So the ice firmware can optionally send diagnostic debug messages via
>>> its control queue. The current solutions we've used internally
>>> essentially hex-dump the binary contents to the kernel log, and then
>>> these get scraped and converted into a useful format for human consumption.
>>>
>>> I'm not 100% sure of the format, but I know it's based on a decoding file
>>> that is specific to a given firmware image, and thus attempting to tie
>>> this into the driver is problematic.
>>
>> You explained how it works, but not why it's needed :)
>
>Well, the reason we want it is to be able to read the debug/diagnostics
>data in order to debug issues that might be related to firmware or
>software mis-use of firmware interfaces.
I think that the health reporter would be able to serve this purpose.
An event occurs in firmware -> the event is propagated to the user.
The limitation we have in devlink health right now is that we only store
the last event. So perhaps we need to extend it to optionally hold a
list/ring-buffer of events?
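To illustrate the list/ring-buffer idea, here is a minimal userspace sketch
of a fixed-size ring that keeps the last N events and overwrites the oldest
one when full. This is not devlink code and not a proposed API: the names
(fw_event, fw_event_ring, FW_EVENT_RING_SIZE, FW_EVENT_DATA_MAX) and the
sizes are assumptions made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limits, chosen only for illustration. */
#define FW_EVENT_RING_SIZE 4
#define FW_EVENT_DATA_MAX  32

/* One stored firmware event: an opaque code plus a small payload. */
struct fw_event {
	uint32_t code;
	size_t len;
	uint8_t data[FW_EVENT_DATA_MAX];
};

/* Ring of the most recent events; the oldest entry is overwritten when full. */
struct fw_event_ring {
	struct fw_event slots[FW_EVENT_RING_SIZE];
	size_t head;	/* slot the next event will be written to */
	size_t count;	/* number of valid events, at most FW_EVENT_RING_SIZE */
};

static void fw_event_ring_init(struct fw_event_ring *r)
{
	memset(r, 0, sizeof(*r));
}

/* Store an event; payloads longer than FW_EVENT_DATA_MAX are truncated. */
static void fw_event_ring_push(struct fw_event_ring *r, uint32_t code,
			       const void *data, size_t len)
{
	struct fw_event *ev = &r->slots[r->head];

	if (len > FW_EVENT_DATA_MAX)
		len = FW_EVENT_DATA_MAX;
	ev->code = code;
	ev->len = len;
	if (len)
		memcpy(ev->data, data, len);

	r->head = (r->head + 1) % FW_EVENT_RING_SIZE;
	if (r->count < FW_EVENT_RING_SIZE)
		r->count++;
}

/* Return the idx-th retained event (0 is the oldest), or NULL if out of range. */
static const struct fw_event *fw_event_ring_get(const struct fw_event_ring *r,
						size_t idx)
{
	size_t oldest;

	if (idx >= r->count)
		return NULL;
	oldest = (r->head + FW_EVENT_RING_SIZE - r->count) % FW_EVENT_RING_SIZE;
	return &r->slots[(oldest + idx) % FW_EVENT_RING_SIZE];
}
```

With a ring like this, a reader that shows up late still sees the last few
events instead of only the most recent one. A real implementation would
also need locking and a way to dump the entries over netlink.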
>
>By having it be a separate interface rather than trying to scrape from
>the kernel message buffer, it becomes something we can have as a
>possibility for debugging in the field.
>
>>
>>> There is also a plan to provide a simpler interface for some of the
>>> diagnostic messages, with a simple one-to-one mapping from code to
>>> message for a handful of events, for example when the link engine can
>>> detect a known reason why it wasn't able to get link. I suppose these
>>> could be translated and immediately printed by the driver without a
>>> special interface.
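For reference, the code-to-message translation described above could be a
plain lookup table. This is only a sketch: the codes, the strings, and the
names link_diag_msg / link_diag_lookup are hypothetical and do not come
from the ice firmware or any driver.

```c
#include <stddef.h>
#include <stdint.h>

/* One table entry: a firmware-reported code and its human-readable text. */
struct link_diag_msg {
	uint32_t code;
	const char *msg;
};

/* Hypothetical codes and messages, for illustration only. */
static const struct link_diag_msg link_diag_table[] = {
	{ 1, "no cable detected" },
	{ 2, "unsupported module" },
	{ 3, "autonegotiation failed" },
};

/* Return the message for a code, or NULL if the code is not in the table. */
static const char *link_diag_lookup(uint32_t code)
{
	size_t i;

	for (i = 0; i < sizeof(link_diag_table) / sizeof(link_diag_table[0]); i++) {
		if (link_diag_table[i].code == code)
			return link_diag_table[i].msg;
	}
	return NULL;
}
```

A driver could print the returned string directly and fall back to logging
the raw code when the lookup returns NULL.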
>>
>> Petr worked on something similar last year:
>> https://lore.kernel.org/netdev/cover.1552672441.git.petrm@mellanox.com/
>>
>> Amit is currently working on a new version based on ethtool (netlink).
>>
>
>I'll take a look, thanks!
>
>-Jake