lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230214151910.419d72cf@kernel.org>
Date:   Tue, 14 Feb 2023 15:19:10 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     Jacob Keller <jacob.e.keller@...el.com>
Cc:     Paul M Stillwell Jr <paul.m.stillwell.jr@...el.com>,
        Tony Nguyen <anthony.l.nguyen@...el.com>,
        <davem@...emloft.net>, <pabeni@...hat.com>, <edumazet@...gle.com>,
        <netdev@...r.kernel.org>, <jiri@...dia.com>, <idosch@...sch.org>
Subject: Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice
 driver

On Tue, 14 Feb 2023 14:39:18 -0800 Jacob Keller wrote:
> On 2/14/2023 8:14 AM, Paul M Stillwell Jr wrote:
> >> I believe that's in line with devlink health. The devlink health log
> >> is "formatted" but I really doubt that any user can get far in debugging
> >> without vendor support.
> >>  
> > 
> > I agree, I just don't see what the trigger is in our case for FW logging.
> >   
> 
> Here's the thoughts I had for devlink health:
> 
> 1) support health reporters storing more than a single event. Currently
> all health reporters respond to a single event and then do not allow
> storing new captures until the current one is processed. This breaks for
> our firmware logging because we get separate events from firmware for
> each buffer of messages. We could make this configurable such that we
> limit the total maximum to prevent kernel memory overrun. (and some
> policy for how to discard events when the buffer is full?)

I think the idea is that the system keeps a continuous ring buffer of
logs and dumps it whenever bad events happens. That's in case of logs,
obviously you can expose other types of state with health.

> 2a) add some knobs to enable/disable a health reporter

For ad-hoc collection of the ring buffer there are dump and diagnose
callbacks which can be triggered at any time.

> 2b) add some firmware logging specific knobs as a "build on top of
> health reporters" or by creating a separate firmware logging bit that
> ties into a reporter. These knows would be how to set level, etc.

Right, the level setting is the part that I'm the least sure of.
That sounds like something more fitting to ethtool dumps.

> 3) for ice, once the health reporter is enabled we request the firmware
> to send us logging, then we get our admin queue message and simply copy
> this into the health reporter as a new event
> 
> 4) user space is in charge of monitoring health reports and can decide
> how to copy events out to disk and when to delete the health reports
> from the kernel.

That's also out of what's expected with health reporters. User should
not have to run vendor tools with devlink health. Decoding of the dump
may require vendor tools but checking if system is healthy or something
crashed should happen without any user space involvement.

> Basically: extend health reporters to allow multiple captures and add a
> related module to configure firmware logging via a health reporter,
> where the "event" is just "I have a new blob to store".
> 
> How does this sound?
> 
> For the specifics of 2b) I think we can probably agree that levels is
> fairly generic (i.e. the specifics of what each level are is vendor
> specific but the fact that there are numbers and that higher or lower
> numbers means more severe is fairly standard)
> 
> I know the ice firmware has many such modules we can enable or disable
> and we would ideally be able to set which modules are active or not.
> However all messages come through in the same blobs so we can't separate
> them and report them to individual health reporter events. I think we
> could have modules as a separate option for toggling which ones are on
> or off. I would expect other vendors to have something similar or have
> no modules at all and just an on/off switch?

I bet all vendors at this point have separate modules in the FW.
It's been the case for a while, that's why we have multiple versions
supported in devlink dev info.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ