Message-ID: <bc8fe848-b590-fa4c-cc6b-5ccdf89ce0fa@intel.com>
Date: Tue, 10 Oct 2023 16:00:13 -0700
From: Paul M Stillwell Jr <paul.m.stillwell.jr@...el.com>
To: Jakub Kicinski <kuba@...nel.org>, Tony Nguyen <anthony.l.nguyen@...el.com>
CC: <davem@...emloft.net>, <pabeni@...hat.com>, <edumazet@...gle.com>,
<netdev@...r.kernel.org>, <jacob.e.keller@...el.com>,
<vaishnavi.tipireddy@...el.com>, <horms@...nel.org>, <leon@...nel.org>,
<corbet@....net>, <linux-doc@...r.kernel.org>, <rdunlap@...radead.org>
Subject: Re: [PATCH net-next v4 5/5] ice: add documentation for FW logging

On 10/6/2023 4:46 PM, Jakub Kicinski wrote:
> On Thu, 5 Oct 2023 10:01:10 -0700 Tony Nguyen wrote:
>> From: Paul M Stillwell Jr <paul.m.stillwell.jr@...el.com>
>>
>> Add documentation for FW logging in
>> Documentation/networking/device-drivers/ethernet/intel/ice.rst
>
> Wrong spelling, I think, because no such file.
>
Sorry, hyphen vs. underscore issue in the path; will fix.
>> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@...el.com>
>> Signed-off-by: Tony Nguyen <anthony.l.nguyen@...el.com>
>
>> +Firmware (FW) logging
>> +---------------------
>
> I think you need empty lines after the headers.
> Did you try to build this documentation and checked the warnings?
>
I believe this is correct. It matches the GNSS section above it, and the
page looks correct once the build completes. I ran 'make htmldocs' on
this and got no errors or warnings.
>> +The driver supports FW logging via the debugfs interface on PF 0 only. In order
>> +for FW logging to work, the NVM must support it. The 'fwlog' file will only get
>> +created in the ice debugfs directory if the NVM supports FW logging.
>
> Odd phrasing - "in order to work it needs to be supported"
>
> also NVM == non-volatile memory, you mean the logging goes into NVM
> or NVM as in FW in the NVM needs to support it?
>
Yeah, I can see that it's oddly phrased. What I'm trying to say is that
the NVM image on the NIC has to support FW logging; if it doesn't, the
'fwlog' directory will not be created. I'll take another run at it to
make it less confusing.
>> +Module configuration
>> +~~~~~~~~~~~~~~~~~~~~
>> +To see the status of FW logging, read the 'fwlog/modules' file like this::
>> +
>> + # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
>> +
>> +To configure FW logging, write to the 'fwlog/modules' file like this::
>> +
>> + # echo <fwlog_event> <fwlog_level> > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
>> +
>> +where
>> +
>> +* fwlog_level is a name as described below. Each level includes the
>> + messages from the previous/lower level
>> +
>> + * NONE
>> + * ERROR
>> + * WARNING
>> + * NORMAL
>> + * VERBOSE
>
> Is this going to give us a nice list when we render the docs?
> White space looks odd.
>
Yes, it does render as a nice list.
>> +* fwlog_event is a name that represents the module to receive events for. The
>> + module names are
>> +
>> + * GENERAL
>> + * CTRL
>> + * LINK
>> + * LINK_TOPO
>> + * DNL
>> + * I2C
>> + * SDP
>> + * MDIO
>> + * ADMINQ
>> + * HDMA
>> + * LLDP
>> + * DCBX
>> + * DCB
>> + * XLR
>> + * NVM
>> + * AUTH
>> + * VPD
>> + * IOSF
>> + * PARSER
>> + * SW
>> + * SCHEDULER
>> + * TXQ
>> + * RSVD
>> + * POST
>> + * WATCHDOG
>> + * TASK_DISPATCH
>> + * MNG
>> + * SYNCE
>> + * HEALTH
>> + * TSDRV
>> + * PFREG
>> + * MDLVER
>> + * ALL
>> +
>> +The name ALL is special and specifies setting all of the modules to the
>> +specified fwlog_level.
>> +
>> +Example usage to configure the modules::
>> +
>> + # echo LINK VERBOSE > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
>> +
>> +Enabling FW log
>> +~~~~~~~~~~~~~~~
>> +Once the desired modules are configured the user enables logging. To do
>> +this the user can write a 1 (enable) or 0 (disable) to 'fwlog/enable'. An
>> +example is::
>> +
>> + # echo 1 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/enable
>
> Hm, so we "select" the module and then enable / disable?
>
> It'd feel more natural to steal the +/- thing from dynamic printing.
> To enable:
>
> # echo '+LINK VERBOSE' > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/active
>
> To disable:
>
> # echo '-LINK VERBOSE' > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/active
>
> No?
>
I like this idea, but I'm not sure whether it will work for us. What I'm
trying to do is reduce the number of AQ commands we send to the FW when
configuring/enabling logging.

What normally happens is that the user sets up multiple modules with
different log levels, so my initial thought was to let the user do all
the configuration first and then 'enable' that configuration. This way
there is only 1 AQ write to the FW instead of a bunch of them, and we
know that once logging is 'enabled' the data we get from the FW is the
data we expect to see.

If we enable each module individually, then data starts coming from the
FW as each module gets enabled. That can confuse the FW team when they
look at the log data, because they may not see all the events they
expect at any given time if an event wasn't enabled yet.
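A minimal sketch of that configure-first, enable-once flow, using only the
file names from the quoted documentation. The helper name fwlog_start and
the FWLOG_DIR variable are hypothetical, and the sketch assumes (as
described above) that writes to 'modules' only stage the configuration
until 'enable' is written:

```shell
# FWLOG_DIR is an assumption so the sketch can point at any PF 0 path.
FWLOG_DIR=${FWLOG_DIR:-/sys/kernel/debug/ice/0000:18:00.0/fwlog}

# Hypothetical helper.
# Usage: fwlog_start "LINK VERBOSE" "NVM WARNING" ...
fwlog_start() {
    # Stage every "<module> <level>" pair first; logging is not on yet.
    for pair in "$@"; do
        echo "$pair" > "$FWLOG_DIR/modules" || return 1
    done
    # A single write turns on the whole staged configuration.
    echo 1 > "$FWLOG_DIR/enable"
}
```

Called as fwlog_start "LINK VERBOSE" "NVM WARNING", this writes both
module settings before the one write to 'enable'.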
>> +Retrieving FW log data
>> +~~~~~~~~~~~~~~~~~~~~~~
>> +The FW log data can be retrieved by reading from 'fwlog/data'. The user can
>> +write to 'fwlog/data' to clear the data. The data can only be cleared when FW
>> +logging is disabled.
>
> Oh, now it sounds like only one thing can be enabled at a time.
> Can you clarify?
>
What I'm trying to describe here is a mechanism to read all the data
(whatever modules have been enabled) as it's coming in and to also be
able to clear the data in case the user wants to start fresh (by writing
0 to the file). Does that make sense? I probably wasn't clear in the
previous section that the user can enable many modules at the same time.
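As a rough sketch of the read-then-clear sequence, based only on the
'enable' and 'data' files from the quoted documentation (the helper name
fwlog_collect and the FWLOG_DIR variable are hypothetical):

```shell
# FWLOG_DIR is an assumption so the sketch can point at any PF 0 path.
FWLOG_DIR=${FWLOG_DIR:-/sys/kernel/debug/ice/0000:18:00.0/fwlog}

# Hypothetical helper.
# Usage: fwlog_collect fwlog.bin
fwlog_collect() {
    out=$1
    # Stop logging first: per the quoted text, the data can only be
    # cleared while FW logging is disabled.
    echo 0 > "$FWLOG_DIR/enable" || return 1
    # Save the binary log; it covers whatever modules were enabled.
    cat "$FWLOG_DIR/data" > "$out" || return 1
    # Writing 0 clears the stored data so the next capture starts fresh.
    echo 0 > "$FWLOG_DIR/data"
}
```

The saved file is the binary blob that would then be sent to Intel for
decoding.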
>> The FW log data is a binary file that is sent to Intel and
>> +used to help debug user issues.
>> +
>> +An example to read the data is::
>> +
>> + # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data > fwlog.bin
>> +
>> +An example to clear the data is::
>> +
>> + # echo 0 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data
>> +
>> +Changing how often the log events are sent to the driver
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The driver receives FW log data from the Admin Receive Queue (ARQ). The
>> +frequency that the FW sends the ARQ events can be configured by writing to
>> +'fwlog/resolution'. The range is 1-128 (1 means push every log message, 128
>> +means push only when the max AQ command buffer is full). The suggested value is
>> +10. The user can see what the value is configured to by reading
>> +'fwlog/resolution'. An example to set the value is::
>> +
>> + # echo 50 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/resolution
>
> Resolution doesn't sound quite right, batch_size maybe?
>
I agree. 'resolution' is the term the FW team uses, but I'll change this
to some other name.
>> +Configuring the number of buffers used to store FW log data
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The driver stores FW log data in a ring within the driver. The default size of
>> +the ring is 256 4K buffers. Some use cases may require more or less data so
>> +the user can change the number of buffers that are allocated for FW log data.
>> +To change the number of buffers write to 'fwlog/nr_buffs'. The value must be one
>> +of: 64, 128, 256, or 512. FW logging must be disabled to change the value. An
>> +example of changing the value is::
>> +
>> + # echo 128 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/nr_buffs
>
> Why 4K? The number of buffers is irrelevant to the user, why not let
> the user configure the size in bytes (which is how much DRAM the
> driver will hold hostage)?

I'm trying to keep the numbers small for the user :). I could say
1048576 bytes (256 x 4096), but numbers like that get unwieldy for a
user (IMO).

The FW logs generate a LOT of data depending on which modules are
enabled, so we typically need a lot of buffers to handle them.

In the past we tried to use the syslog mechanism, but we generate SO
much data that we overwhelm it and lose data. That's why the idea of
using static buffers is appealing to us. We could still overrun the
buffers, but at least we will have contiguous data. The problem then
becomes one of allocating enough space for what the user is trying to
catch, instead of starting/stopping logging and hoping all the events
end up in the log.

I can drop the mention of 4K buffers in the documentation. Or we could
use terms like 1M, 2M, 512K, etc. That would require string parsing in
the driver, though, and I'm trying to avoid that if possible. What do
you think?
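For reference, the mapping between byte sizes and the allowed buffer
counts is simple arithmetic (count = bytes / 4096). A minimal userspace
sketch, assuming the 4K buffer size and the 64/128/256/512 limits from the
quoted documentation; the helper name size_to_nr_buffs is hypothetical and
is not part of the patch:

```shell
# Hypothetical helper: convert a size such as 512K, 1M or 1048576 into
# the matching number of 4K buffers, or fail if it is not an allowed one.
size_to_nr_buffs() {
    num=${1%[KM]}                      # strip one optional K/M suffix
    case $num in
        ''|*[!0-9]*) echo "bad size: $1" >&2; return 1 ;;
    esac
    case $1 in
        *K) bytes=$(( num * 1024 )) ;;
        *M) bytes=$(( num * 1024 * 1024 )) ;;
        *)  bytes=$num ;;
    esac
    # Only the four ring sizes listed in the documentation are accepted.
    case $bytes in
        262144)  echo 64 ;;            #  64 x 4096
        524288)  echo 128 ;;           # 128 x 4096
        1048576) echo 256 ;;           # 256 x 4096 (the default)
        2097152) echo 512 ;;           # 512 x 4096
        *) echo "unsupported size: $1" >&2; return 1 ;;
    esac
}
```

With something like this in a wrapper script, size_to_nr_buffs 1M prints
256, which could then be written to 'fwlog/nr_buffs', keeping the suffix
parsing out of the driver.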