Message-ID: <bc8fe848-b590-fa4c-cc6b-5ccdf89ce0fa@intel.com>
Date: Tue, 10 Oct 2023 16:00:13 -0700
From: Paul M Stillwell Jr <paul.m.stillwell.jr@...el.com>
To: Jakub Kicinski <kuba@...nel.org>, Tony Nguyen <anthony.l.nguyen@...el.com>
CC: <davem@...emloft.net>, <pabeni@...hat.com>, <edumazet@...gle.com>,
	<netdev@...r.kernel.org>, <jacob.e.keller@...el.com>,
	<vaishnavi.tipireddy@...el.com>, <horms@...nel.org>, <leon@...nel.org>,
	<corbet@....net>, <linux-doc@...r.kernel.org>, <rdunlap@...radead.org>
Subject: Re: [PATCH net-next v4 5/5] ice: add documentation for FW logging

On 10/6/2023 4:46 PM, Jakub Kicinski wrote:
> On Thu,  5 Oct 2023 10:01:10 -0700 Tony Nguyen wrote:
>> From: Paul M Stillwell Jr <paul.m.stillwell.jr@...el.com>
>>
>> Add documentation for FW logging in
>> Documentation/networking/device-drivers/ethernet/intel/ice.rst
> 
> Wrong spelling, I think, because no such file.
> 

Sorry, hyphen vs underscore issue, will fix.

>> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@...el.com>
>> Signed-off-by: Tony Nguyen <anthony.l.nguyen@...el.com>
> 
>> +Firmware (FW) logging
>> +---------------------
> 
> I think you need empty lines after the headers.
> Did you try to build this documentation and check the warnings?
> 

I believe this is correct. It matches the GNSS section above it and 
renders correctly. I ran 'make htmldocs' on this and got no errors or 
warnings, and the page looks correct.

>> +The driver supports FW logging via the debugfs interface on PF 0 only. In order
>> +for FW logging to work, the NVM must support it. The 'fwlog' file will only get
>> +created in the ice debugfs directory if the NVM supports FW logging.
> 
> Odd phrasing - "in order to work it needs to be supported"
> 
> also NVM == non-volatile memory, you mean the logging goes into NVM
> or NVM as in FW in the NVM needs to support it?
> 

Yeah, I can see that it's oddly phrased. What I'm trying to say is that 
the NVM image on the NIC has to support FW logging; if it doesn't, the 
'fwlog' directory will not be created. I'll take another run at it to 
make it less confusing.

>> +Module configuration
>> +~~~~~~~~~~~~~~~~~~~~
>> +To see the status of FW logging, read the 'fwlog/modules' file like this::
>> +
>> +  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
>> +
>> +To configure FW logging, write to the 'fwlog/modules' file like this::
>> +
>> +  # echo <fwlog_event> <fwlog_level> > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
>> +
>> +where
>> +
>> +* fwlog_level is a name as described below. Each level includes the
>> +  messages from the previous/lower level
>> +
>> +      *	NONE
>> +      *	ERROR
>> +      *	WARNING
>> +      *	NORMAL
>> +      *	VERBOSE
> 
> Is this going to give us a nice list when we render the docs?
> White space looks odd.
> 

Yes, it does give a nice list

>> +* fwlog_event is a name that represents the module to receive events for. The
>> +  module names are
>> +
>> +      *	GENERAL
>> +      *	CTRL
>> +      *	LINK
>> +      *	LINK_TOPO
>> +      *	DNL
>> +      *	I2C
>> +      *	SDP
>> +      *	MDIO
>> +      *	ADMINQ
>> +      *	HDMA
>> +      *	LLDP
>> +      *	DCBX
>> +      *	DCB
>> +      *	XLR
>> +      *	NVM
>> +      *	AUTH
>> +      *	VPD
>> +      *	IOSF
>> +      *	PARSER
>> +      *	SW
>> +      *	SCHEDULER
>> +      *	TXQ
>> +      *	RSVD
>> +      *	POST
>> +      *	WATCHDOG
>> +      *	TASK_DISPATCH
>> +      *	MNG
>> +      *	SYNCE
>> +      *	HEALTH
>> +      *	TSDRV
>> +      *	PFREG
>> +      *	MDLVER
>> +      *	ALL
>> +
>> +The name ALL is special and specifies setting all of the modules to the
>> +specified fwlog_level.
>> +
>> +Example usage to configure the modules::
>> +
>> +  # echo LINK VERBOSE > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
>> +
>> +Enabling FW log
>> +~~~~~~~~~~~~~~~
>> +Once the desired modules are configured the user enables logging. To do
>> +this the user can write a 1 (enable) or 0 (disable) to 'fwlog/enable'. An
>> +example is::
>> +
>> +  # echo 1 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/enable
> 
> Hm, so we "select" the module and then enable / disable?
> 
> It'd feel more natural to steal the +/- thing from dynamic printing.
> To enable:
> 
>   # echo '+LINK VERBOSE' > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/active
> 
> To disable:
> 
>   # echo '-LINK VERBOSE' > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/active
> 
> No?
> 

I like this idea, but not sure if it will work or not for us. What I'm 
trying to do is reduce the number of AQ commands we send to the FW when 
configuring/enabling logging.

What normally happens is that the user sets up multiple modules with 
different log levels, so my initial thought is to allow the user to do 
all the configuration first and then 'enable' that configuration. This 
way there is only 1 AQ write to the FW instead of a bunch of them, and 
we know that once logging is 'enabled' the data we get from the FW is 
the data that we expect to see.

If we enable each module individually then we are going to get data 
coming from the FW as each module gets enabled. That can confuse the FW 
team as they look at the log data, because they may not see all the 
events they expect at any given time if the event wasn't enabled yet.
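
The configure-first, enable-once flow described above can be sketched as 
a small shell helper. This is only an illustration: the function name 
fwlog_configure_and_enable and its argument layout are hypothetical, and 
the 'modules' and 'enable' file names are the ones from the v4 
documentation quoted in this thread, which may change in a later 
revision.

```shell
#!/bin/sh
# Sketch only: stage every module's level first, then enable logging once.
# $1 is the fwlog debugfs directory (assumed layout: 'modules', 'enable').
# Remaining arguments are "MODULE LEVEL" pairs such as "LINK VERBOSE".
fwlog_configure_and_enable() {
    dir=$1
    shift
    for pair in "$@"; do
        # One write per module; nothing is enabled at this point.
        echo "$pair" > "$dir/modules" || return 1
    done
    # Enable only after all modules have been configured.
    echo 1 > "$dir/enable"
}
```

A call might look like this (the PCI address is the one used in the 
examples above):

  # fwlog_configure_and_enable /sys/kernel/debug/ice/0000\:18\:00.0/fwlog "LINK VERBOSE" "NVM ERROR"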

>> +Retrieving FW log data
>> +~~~~~~~~~~~~~~~~~~~~~~
>> +The FW log data can be retrieved by reading from 'fwlog/data'. The user can
>> +write to 'fwlog/data' to clear the data. The data can only be cleared when FW
>> +logging is disabled.
> 
> Oh, now it sounds like only one thing can be enabled at a time.
> Can you clarify?
> 

What I'm trying to describe here is a mechanism to read all the data 
(whatever modules have been enabled) as it's coming in and to also be 
able to clear the data in case the user wants to start fresh (by writing 
0 to the file). Does that make sense? I probably wasn't clear in the 
previous section that the user can enable many modules at the same time.
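
As a rough sketch of the read-then-clear sequence, assuming the 'enable' 
and 'data' files behave as the quoted documentation describes (data can 
only be cleared while logging is disabled, and writing 0 clears it). The 
helper name fwlog_capture_and_clear and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: stop logging, save the collected data, then start fresh.
# $1 is the fwlog debugfs directory, $2 is the output file for the log.
fwlog_capture_and_clear() {
    dir=$1
    out=$2
    # Clearing is only allowed while logging is disabled, so disable first.
    echo 0 > "$dir/enable" || return 1
    # Save whatever the enabled modules have produced so far.
    cat "$dir/data" > "$out" || return 1
    # Writing 0 to 'data' clears the stored log data.
    echo 0 > "$dir/data"
}
```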

>> The FW log data is a binary file that is sent to Intel and
>> +used to help debug user issues.
>> +
>> +An example to read the data is::
>> +
>> +  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data > fwlog.bin
>> +
>> +An example to clear the data is::
>> +
>> +  # echo 0 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data
>> +
>> +Changing how often the log events are sent to the driver
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The driver receives FW log data from the Admin Receive Queue (ARQ). The
>> +frequency that the FW sends the ARQ events can be configured by writing to
>> +'fwlog/resolution'. The range is 1-128 (1 means push every log message, 128
>> +means push only when the max AQ command buffer is full). The suggested value is
>> +10. The user can see what the value is configured to by reading
>> +'fwlog/resolution'. An example to set the value is::
>> +
>> +  # echo 50 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/resolution
> 
> Resolution doesn't sound quite right, batch_size maybe?
> 

I agree, resolution is what the FW team uses, but I'll change this to 
some other name

>> +Configuring the number of buffers used to store FW log data
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The driver stores FW log data in a ring within the driver. The default size of
>> +the ring is 256 4K buffers. Some use cases may require more or less data so
>> +the user can change the number of buffers that are allocated for FW log data.
>> +To change the number of buffers write to 'fwlog/nr_buffs'. The value must be one
>> +of: 64, 128, 256, or 512. FW logging must be disabled to change the value. An
>> +example of changing the value is::
>> +
>> +  # echo 128 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/nr_buffs
> 
> Why 4K? The number of buffers is irrelevant to the user, why not let
> the user configure the size in bytes (which is how much DRAM the
> driver will hold hostage)?

I'm trying to keep the numbers small for the user :). I could say 
1048576 bytes (256 x 4096), but those kinds of numbers get unwieldy to a 
user (IMO).
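
For reference, the byte totals behind each allowed buffer count can be 
worked out with a few lines of shell. This assumes the 4K (4096 byte) 
buffer size and the allowed counts (64, 128, 256, 512) from the quoted 
documentation; the function name fwlog_ring_bytes is hypothetical.

```shell
#!/bin/sh
# Sketch only: convert an allowed nr_buffs value into the ring size in bytes,
# assuming each buffer is 4096 bytes.
fwlog_ring_bytes() {
    case $1 in
        64|128|256|512)
            echo $(( $1 * 4096 ))
            ;;
        *)
            echo "unsupported buffer count: $1" >&2
            return 1
            ;;
    esac
}
```

With these assumptions, 64 buffers is 262144 bytes, 256 buffers is 
1048576 bytes, and 512 buffers is 2097152 bytes.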

The FW logs generate a LOT of data depending on what modules are enabled 
so we typically need a lot of buffers to handle them.

In the past we have tried to use the syslog mechanism, but we generate 
SO much data that we overwhelm that and lose data. That's why the idea 
of using static buffers is appealing to us. We could still overrun the 
buffers, but at least we will have contiguous data. The problem then 
becomes one of allocating enough space for what the user is trying to 
catch instead of trying to start/stop logging and hoping you get all the 
events in the log.

I can drop the mention of 4K buffers in the documentation. Or we could 
use terms like 1M, 2M, 512K, etc. That would require string parsing in 
the driver though and I'm trying to avoid that if possible. What do you 
think?
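
One way to get the friendlier units without any parsing in the driver 
would be to do the conversion in userspace. The sketch below is only an 
illustration of that idea, not something the patch provides: the helper 
name fwlog_size_to_nr_buffs is hypothetical, and it assumes 4096 byte 
buffers and the allowed counts of 64, 128, 256, and 512.

```shell
#!/bin/sh
# Sketch only: translate a size such as 512K, 1M, or 2M (or a plain byte
# count) into the matching nr_buffs value, assuming 4096-byte buffers.
fwlog_size_to_nr_buffs() {
    case $1 in
        *K) bytes=$(( ${1%K} * 1024 )) ;;
        *M) bytes=$(( ${1%M} * 1024 * 1024 )) ;;
        *)  bytes=$1 ;;
    esac
    # Reject sizes that are not a whole number of 4096-byte buffers.
    [ $(( bytes % 4096 )) -eq 0 ] || return 1
    case $(( bytes / 4096 )) in
        64|128|256|512) echo $(( bytes / 4096 )) ;;
        *) return 1 ;;
    esac
}
```

The result could then be written to 'fwlog/nr_buffs' as in the example 
above, so the driver still only ever sees a small integer.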
