Date:   Tue, 14 Feb 2023 16:07:04 -0800
From:   Jacob Keller <jacob.e.keller@...el.com>
To:     Jakub Kicinski <kuba@...nel.org>
CC:     Paul M Stillwell Jr <paul.m.stillwell.jr@...el.com>,
        Tony Nguyen <anthony.l.nguyen@...el.com>,
        <davem@...emloft.net>, <pabeni@...hat.com>, <edumazet@...gle.com>,
        <netdev@...r.kernel.org>, <jiri@...dia.com>, <idosch@...sch.org>
Subject: Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice
 driver



On 2/14/2023 3:19 PM, Jakub Kicinski wrote:
> On Tue, 14 Feb 2023 14:39:18 -0800 Jacob Keller wrote:
>> On 2/14/2023 8:14 AM, Paul M Stillwell Jr wrote:
>>>> I believe that's in line with devlink health. The devlink health log
>>>> is "formatted" but I really doubt that any user can get far in debugging
>>>> without vendor support.
>>>>  
>>>
>>> I agree, I just don't see what the trigger is in our case for FW logging.
>>>   
>>
>> Here's the thoughts I had for devlink health:
>>
>> 1) support health reporters storing more than a single event. Currently
>> all health reporters respond to a single event and then do not allow
>> storing new captures until the current one is processed. This breaks for
>> our firmware logging because we get separate events from firmware for
>> each buffer of messages. We could make this configurable such that we
>> limit the total maximum to prevent kernel memory overrun. (and some
>> policy for how to discard events when the buffer is full?)
> 
> I think the idea is that the system keeps a continuous ring buffer of
> logs and dumps it whenever bad events happens. That's in case of logs,
> obviously you can expose other types of state with health.
> 

OK, so in our case the firmware presumably has some internal storage for
messages, and some set of masks controlling which messages it will
"store" and forward to us.

We can enable logging and select which modules and logging levels are
active. This enables firmware to log messages to its buffer, which is
then periodically flushed to us. The messages may be for "bad" events,
but may not be. Sometimes it's tracing information, or other details
about how the firmware is operating. This is all encoded, and we (driver
devs) are not given the tools to decode it. My understanding is that
this is partly out of fear of tying firmware to specific driver versions.

We're supposed to dump these binary contents and give them to firmware
devs, who can then decode them using the appropriate decoder for the
given version.

The information is very helpful for debugging problems, but we can't
exactly translate it into usable information on our own. The mere
presence of some information (once logging is enabled) is *not*
sufficient to know if there was a problem. Basically every flow can be
enabled to log all kinds of detail.

Additionally, we get messages in blocks, and don't have any mechanism in
the driver to split these blocks into individual messages, or to tell
what log level or module the messages in a block are from.

>> 2a) add some knobs to enable/disable a health reporter
> 
> For ad-hoc collection of the ring buffer there are dump and diagnose
> callbacks which can be triggered at any time.

Sure. The point I was making is that we need some indication from user
space whether to enable or disable the logging, because otherwise we'd
potentially just consume memory indefinitely, or we'd have limited space
to store messages and would overflow it.

The volume of logs can be very high depending on what modules or log
level is enabled. Thus storing all messages all the time is not really
viable.
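To make the "multiple captures with bounded memory" idea concrete, here
is a rough Python sketch (illustrative only; the cap size and the
drop-oldest policy are assumptions, not anything the kernel currently
implements) of storing whole firmware blocks with an eviction policy:

```python
from collections import deque

class FwLogBuffer:
    """Illustrative driver-side store for opaque firmware log blocks.

    Blocks arrive whole from firmware and cannot be split into
    individual messages, so each entry is stored as-is. A cap on the
    number of stored blocks bounds memory use; when the cap is hit,
    the oldest block is dropped (one possible discard policy).
    """

    def __init__(self, max_blocks=64):
        self.blocks = deque(maxlen=max_blocks)  # drop-oldest when full
        self.dropped = 0  # blocks lost to overflow

    def store(self, blob: bytes):
        if len(self.blocks) == self.blocks.maxlen:
            self.dropped += 1  # deque will evict the oldest entry
        self.blocks.append(blob)

    def dump(self):
        """Hand all stored blocks to user space and clear the store."""
        out = list(self.blocks)
        self.blocks.clear()
        return out
```

Whether to drop the oldest or the newest block on overflow is exactly
the policy question raised above.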

One of our early prototypes for this feature hex-encoded the messages to
syslog but we pretty much always lost messages doing this because we'd
overflow the syslog buffer. That is what led us to use debugfs for
reading/writing the binary data. (Not to mention that since it's binary
data only decodable by Intel, there is no reason to spam the syslog
buffer...)
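Part of why the syslog approach overflowed so easily is simple
arithmetic: hex encoding doubles the size of every block before it even
reaches the (line-oriented, rate-limited) syslog buffer. A minimal
demonstration:

```python
import binascii

# A hypothetical 4 KiB firmware log block (contents are arbitrary).
blob = bytes(range(256)) * 16

# Hex encoding emits two ASCII characters per input byte,
# so the payload doubles before any syslog framing overhead.
hexed = binascii.hexlify(blob)
assert len(hexed) == 2 * len(blob)
```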

> 
>> 2b) add some firmware logging specific knobs as a "build on top of
>> health reporters" or by creating a separate firmware logging bit that
>> ties into a reporter. These knobs would be how to set level, etc.
> 
> Right, the level setting is the part that I'm the least sure of.
> That sounds like something more fitting to ethtool dumps.
> 

I don't feel like this fits into ethtool at all, as it's not network
specific, and tying it to a netdev feels weird.

>> 3) for ice, once the health reporter is enabled we request the firmware
>> to send us logging, then we get our admin queue message and simply copy
>> this into the health reporter as a new event
>>
>> 4) user space is in charge of monitoring health reports and can decide
>> how to copy events out to disk and when to delete the health reports
>> from the kernel.
> 
> That's also out of what's expected with health reporters. User should
> not have to run vendor tools with devlink health. Decoding of the dump
> may require vendor tools but checking if system is healthy or something
> crashed should happen without any user space involvement.
> 

So this wasn't about using a specific "vendor" tool, but more that
devlink health can decide when to delete a given dump?

Ultimately we have to take the binary data and give it to a vendor
specific tool to decode (whether I like that or not...). The information
required to decode the messages is not something we have permission to
share and code into the driver.

>> Basically: extend health reporters to allow multiple captures and add a
>> related module to configure firmware logging via a health reporter,
>> where the "event" is just "I have a new blob to store".
>>
>> How does this sound?
>>
>> For the specifics of 2b) I think we can probably agree that levels is
>> fairly generic (i.e. the specifics of what each level are is vendor
>> specific but the fact that there are numbers and that higher or lower
>> numbers means more severe is fairly standard)
>>
>> I know the ice firmware has many such modules we can enable or disable
>> and we would ideally be able to set which modules are active or not.
>> However all messages come through in the same blobs so we can't separate
>> them and report them to individual health reporter events. I think we
>> could have modules as a separate option for toggling which ones are on
>> or off. I would expect other vendors to have something similar or have
>> no modules at all and just an on/off switch?
> 
> I bet all vendors at this point have separate modules in the FW.
> It's been the case for a while, that's why we have multiple versions
> supported in devlink dev info.

So one key here is that a "module" for us refers to various
sub-components of our main firmware; it does not tie into the devlink
dev info modules at all, nor would that even make sense for us.

It's more like sections, i.e.:

DCB,
MDIO,
NVM,
Scheduler,
Tx queue management,
SyncE,
LLDP,
Link Management,
...

I believe when a firmware dev adds a log message they choose an
appropriate section and log level for when it should be reported.
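The per-module level filtering described above could be sketched roughly
like this (illustrative only: the module names, numeric levels, and the
"lower number is more severe" convention are assumptions for the
example, not the real ice firmware encoding, which is not public):

```python
# Hypothetical level numbering; assumes lower = more severe.
LEVELS = {"none": 0, "error": 1, "warning": 2, "normal": 3, "verbose": 4}

class FwLogConfig:
    """Illustrative per-module log-level configuration."""

    def __init__(self):
        # Module -> maximum enabled level; messages at or below the
        # configured level would be stored by firmware.
        self.module_level = {}

    def set_module(self, module, level):
        self.module_level[module] = LEVELS[level]

    def should_log(self, module, level):
        enabled = self.module_level.get(module, 0)  # default: disabled
        return enabled > 0 and LEVELS[level] <= enabled
```

A generic interface would only need to agree on the existence of
modules and an ordered level scale; the meaning of each level stays
vendor specific.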

This makes me think the right approach is to add a new "devlink fwlog"
section entirely where we can define its semantics. It doesn't quite
line up with the current intention of health reporters.

We also considered some sort of extension to devlink regions, where each
new batch of messages from firmware would be a new snapshot.

Again this still requires some form of controls for whether to enable
logging, how many snapshots to store, how to discard old snapshots if we
run out of space, and what modules and log levels to enable.
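Tying those controls together, the region-snapshot variant could look
something like this sketch (illustrative only; the snapshot cap,
numbering, and discard policy are assumptions, not existing devlink
region semantics):

```python
class FwLogRegion:
    """Illustrative region-like store: each batch of firmware messages
    becomes a numbered snapshot that user space reads and deletes."""

    def __init__(self, max_snapshots=8, discard_oldest=True):
        self.enabled = False              # logging off by default
        self.max_snapshots = max_snapshots
        self.discard_oldest = discard_oldest
        self.snapshots = {}               # snapshot id -> opaque blob
        self.next_id = 0

    def on_fw_batch(self, blob: bytes):
        """Called when firmware flushes a buffer of messages to us."""
        if not self.enabled:
            return
        if len(self.snapshots) >= self.max_snapshots:
            if self.discard_oldest:
                self.snapshots.pop(min(self.snapshots))
            else:
                return                    # drop the new batch instead
        self.snapshots[self.next_id] = blob
        self.next_id += 1

    def delete(self, snap_id):
        self.snapshots.pop(snap_id, None)
```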

Thanks,
Jake
